What is the formula for sentiment calculation?

Kanwal · Nov 5, 2015 · Viewed 9.3k times

What is the actual formula for computing sentiment using a sentiment-rated lexicon? The lexicon I am using contains ratings in the range -5 to 5. I want to compute sentiment for individual sentences. Should I compute the average of all sentiment-rated words in the sentence, or only sum them up?

Answer

Ken Benoit · Nov 5, 2015

There are several methods for computing an index from scored sentiment components of sentences. Each is based on comparing positive and negative words, and each has advantages and disadvantages.

For your scale, a measure of the central tendency of the word scores, such as the mean, would be a fair measure, where the denominator is the number of scored words. This is a form of the "relative proportional difference" measure described below. You would probably not want to divide the total of the sentiment words' scores by the count of all words, since this makes each sentence's measure strongly affected by non-sentiment terms.
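As a minimal sketch of the averaging approach described above, the function below divides the summed scores by the number of scored words only. The names `mean_scored_sentiment` and `toy_lexicon` and the lexicon entries are hypothetical, and returning `None` for a sentence with no scored words is an assumption, not part of any particular lexicon's specification.

```python
from typing import Mapping, Optional, Sequence


def mean_scored_sentiment(
    tokens: Sequence[str], lexicon: Mapping[str, float]
) -> Optional[float]:
    """Mean lexicon score over the scored words only (not over all words)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        # Assumption: a sentence with no scored words has no defined sentiment.
        return None
    return sum(scores) / len(scores)


# Hypothetical lexicon with ratings in the range -5 to 5.
toy_lexicon = {"good": 3, "great": 4, "bad": -3, "awful": -4}

sentence = "the food was great but the service was bad".split()
print(mean_scored_sentiment(sentence, toy_lexicon))  # (4 + -3) / 2 = 0.5
```

Dividing the same total by all nine words instead would give 1/9, which shows how non-sentiment terms pull the measure toward zero.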

If you do not believe that the 11-point rating you describe is accurate, you could just classify each word as positive or negative depending on the sign of its score. Then you could apply the following methods, where P and N refer to the counts of the positive and negative coded sentiment words, and O is the count of all other words (so that the total number of words = P + N + O).

  1. Absolute Proportional Difference. Bounds: [-1, 1]

    Sentiment = (P - N) / (P + N + O)

    Disadvantage: A sentence's score is affected by non-sentiment-related content.

  2. Relative Proportional Difference. Bounds: [-1, 1]

    Sentiment = (P - N) / (P + N)

    Disadvantage: Sentence scores may tend to cluster very strongly near the scale endpoints (because a sentence may contain primarily or exclusively positive, or primarily or exclusively negative, words).

  3. Logit scale. Bounds: [-infinity, +infinity]

    Sentiment = log(P + 0.5) - log(N + 0.5)

    This tends to have the smoothest properties and is symmetric around zero. The 0.5 is a smoother to prevent log(0).
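The three measures above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names and `toy_lexicon` are hypothetical, words scored exactly 0 are counted as "other", and returning 0.0 when a denominator is zero is a choice made here for convenience.

```python
import math
from typing import Mapping, Sequence, Tuple


def count_pno(tokens: Sequence[str], lexicon: Mapping[str, float]) -> Tuple[int, int, int]:
    """Count positive (P), negative (N), and other (O) words by score sign."""
    p = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    n = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    o = len(tokens) - p - n  # unscored words and words scored exactly 0
    return p, n, o


def absolute_proportional_difference(p: int, n: int, o: int) -> float:
    """Difference of positive and negative counts over all words."""
    total = p + n + o
    return (p - n) / total if total else 0.0  # assumption: 0.0 for empty input


def relative_proportional_difference(p: int, n: int) -> float:
    """Difference of positive and negative counts over sentiment words only."""
    scored = p + n
    return (p - n) / scored if scored else 0.0  # assumption: 0.0 if no scored words


def logit_scale(p: int, n: int) -> float:
    """log(P + 0.5) - log(N + 0.5); the 0.5 avoids log(0)."""
    return math.log(p + 0.5) - math.log(n + 0.5)


# Hypothetical lexicon with ratings in the range -5 to 5.
toy_lexicon = {"good": 3, "great": 4, "bad": -3, "awful": -4}

tokens = "the food was great and the staff was good but parking was awful".split()
p, n, o = count_pno(tokens, toy_lexicon)  # P = 2, N = 1, O = 10
print(absolute_proportional_difference(p, n, o))
print(relative_proportional_difference(p, n))
print(logit_scale(p, n))
```

For this sentence the absolute measure is 1/13 and the relative measure is 1/3, so the ten non-sentiment words shrink the first but leave the second untouched.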

For details, please see William Lowe, Kenneth Benoit, Slava Mikhaylov, and Michael Laver (2011), "Scaling Policy Preferences From Coded Political Texts," Legislative Studies Quarterly 36(1, Feb): 123-155, where we compare the properties of these measures for right-left ideology. Everything we discuss there also applies to positive-negative sentiment.