How is sentiment analysis computed in TextBlob?

MarieJ · Dec 29, 2015

I use the following to compute the sentiment of 200 short sentences. I did not use a training data set:

for sentence in textblob.sentences:
    print(sentence.sentiment)

The analysis returns two values: polarity and subjectivity. From what I read online, the polarity score is a float within the range [-1.0, 1.0] where 0 indicates neutral, +1 a very positive attitude and -1 a very negative attitude. The subjectivity is a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.

So, now my question: How are those scores computed?

Almost half of the sentences get a polarity score of 0, and I am wondering whether a zero indicates neutrality or rather that the sentence contains no words that carry a polarity. I have the same question about another sentiment analyzer, NaiveBayesAnalyzer.

Thank you for your help!
Marie

Answer

Luke · Dec 29, 2015

The TextBlob NaiveBayesAnalyzer is built on NLTK (the Natural Language Toolkit): it trains NLTK's Naive Bayes classifier on NLTK's movie_reviews corpus. The Naive Bayes algorithm in general is explained here: A simple explanation of Naive Bayes Classification

and its application to sentiment and objectivity is described here: http://nlp.stanford.edu/courses/cs224n/2009/fp/24.pdf

Basically, certain words will be labeled something like "40% positive / 60% negative" based on how they were used in some body of training data (for TextBlob's NaiveBayesAnalyzer, the training data is NLTK's corpus of movie reviews). The probabilities of all the words in your sentence are then multiplied together, along with each class's prior probability, and normalized to produce the sentence score.
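For reference, here is the standard Naive Bayes rule behind that multiplication, written as a sketch: c is a class (pos or neg), w_1, ..., w_n are the words of the sentence, and the words are assumed to be independent of each other given the class (the "naive" assumption). NLTK's implementation differs in details such as how the word probabilities are smoothed.

```latex
P(c \mid w_1, \dots, w_n)
  = \frac{P(c) \prod_{i=1}^{n} P(w_i \mid c)}
         {\sum_{c' \in \{\mathrm{pos}, \mathrm{neg}\}} P(c') \prod_{i=1}^{n} P(w_i \mid c')}
```

TextBlob's NaiveBayesAnalyzer reports the two resulting posteriors as p_pos and p_neg.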

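I haven't tested this, but I expect that if the library returns exactly 0.0, your sentence didn't contain any words with a known polarity, rather than being judged neutral. Strictly speaking, the NaiveBayesAnalyzer doesn't return a polarity at all: it returns a classification plus the probabilities p_pos and p_neg. NLTK's classifier ignores words that never appeared in its training data (for example because they are rare), so a sentence made up entirely of such words just gets the class priors, which are 0.5/0.5 for the balanced movie-review corpus.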
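To see why unseen words leave a Naive Bayes classifier at its prior, here is a from-scratch toy classifier in Python. It is a sketch, not TextBlob's or NLTK's code: the four training sentences are made up, it uses add-one (Laplace) smoothing as an assumption, and the only behavior it borrows from NLTK is discarding words never seen in training. Once those words are dropped there is nothing left to multiply, so the posterior equals the class prior.

```python
import math
from collections import Counter

# Hypothetical toy training set (NOT NLTK's movie_reviews corpus).
TRAIN = [
    ("a great and fun film", "pos"),
    ("great acting", "pos"),
    ("a dull and boring film", "neg"),
    ("boring plot", "neg"),
]


def train(data):
    """Count how often each class occurs and how often each word occurs per class."""
    class_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in data:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def prob_classify(text, class_counts, word_counts, vocab):
    """Return {label: posterior probability} for a whitespace-tokenized text."""
    # Like NLTK's NaiveBayesClassifier, drop words never seen in training.
    words = [w for w in text.split() if w in vocab]
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior P(c)
        # Add-one smoothing: an assumption of this sketch, not NLTK's estimator.
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    # Normalize in log space so the posteriors sum to 1.
    best = max(log_scores.values())
    weights = {label: math.exp(s - best) for label, s in log_scores.items()}
    z = sum(weights.values())
    return {label: wt / z for label, wt in weights.items()}


model = train(TRAIN)
print(prob_classify("great fun", *model))            # pos ≈ 0.857, neg ≈ 0.143
print(prob_classify("quantum spreadsheet", *model))  # {'pos': 0.5, 'neg': 0.5}
```

Because the toy training set has two positive and two negative examples, its priors are 0.5 each, which mirrors the balanced movie-review corpus.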

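That goes for the Naive Bayes analyzer. Regarding the PatternAnalyzer (TextBlob's default), the TextBlob docs say it's based on the "pattern" library but don't seem to document how it works. Pattern takes a lexicon-based approach rather than training a classifier: it looks words up in a hand-annotated lexicon, mostly of adjectives, that assigns each entry a polarity and a subjectivity score, and averages the scores of the words it finds, adjusting for modifiers such as "very" and negations such as "not". Function words such as "the", "a" and "and" are not in the lexicon, so if none of a sentence's words match, both polarity and subjectivity come back as 0.0. So, much as with Naive Bayes, a 0.0 there often means "no known words" rather than a genuinely neutral judgment.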
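Here is a minimal Python sketch of lexicon-based scoring in the spirit of pattern. The three-word lexicon and its scores are invented for illustration (they are not pattern's real values), and the sketch leaves out the handling of modifiers and negation. It also returns the matched words, so a 0.0 caused by positive and negative words cancelling out can be told apart from a 0.0 caused by no match at all.

```python
# Hypothetical mini-lexicon: word -> (polarity, subjectivity).
# These numbers are invented for illustration; they are NOT pattern's real values.
LEXICON = {
    "good": (0.7, 0.6),
    "bad": (-0.7, 0.6),
    "great": (0.8, 0.75),
}


def lexicon_sentiment(sentence):
    """Average polarity and subjectivity over the words found in LEXICON.

    Returns (polarity, subjectivity, matched_words). Modifiers ("very") and
    negation ("not") are deliberately ignored in this sketch.
    """
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    matched = [w for w in words if w in LEXICON]
    if not matched:
        # Nothing to average: report 0.0 for both scores.
        return 0.0, 0.0, matched
    polarity = sum(LEXICON[w][0] for w in matched) / len(matched)
    subjectivity = sum(LEXICON[w][1] for w in matched) / len(matched)
    return polarity, subjectivity, matched


print(lexicon_sentiment("The film was good, but the ending was bad."))  # (0.0, 0.6, ['good', 'bad'])
print(lexicon_sentiment("The meeting is on Tuesday."))                  # (0.0, 0.0, [])
```

By analogy, with TextBlob's PatternAnalyzer a polarity of 0.0 together with a subjectivity of 0.0 is a good hint that no lexicon word matched, while 0.0 polarity with non-zero subjectivity means the matched words cancelled out or were themselves neutral.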