SentiWordNet scoring with Python

pechdara · Jul 8, 2016 · Viewed 28.8k times

I have been working on a research project on Twitter sentiment analysis. I know a little Python, and I have looked into how to analyze sentiment with it. So far I have completed two steps: 1. tokenization of the tweets, and 2. POS tagging of the tokens. The remaining step is calculating the positive and negative sentiment scores, which is where I am stuck and need your help.

Below is my code example:

import nltk
sentence = "Iphone6 camera is awesome for low light "
token = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(token)

Can anybody show me an example of using SentiWordNet from Python to calculate the positive and negative scores of tweets that have already been POS tagged? Thanks in advance.

Answer

Saravana Kumar · Jul 8, 2016

It's a little unclear what exactly your question is. Do you need a guide to using SentiWordNet? If so, check out this link:

http://www.nltk.org/howto/sentiwordnet.html

Since you've already tokenized and POS tagged the words, all you need to do now is use this syntax:

from nltk.corpus import sentiwordnet as swn
swn.senti_synset('breakdown.n.03')

Breaking down the argument:

  • 'breakdown' = the word you need scores for
  • 'n' = part of speech
  • '03' = sense number (01 is the most common usage; higher numbers indicate less common usages)

So for each tuple in your tagged list, create a string as above and pass it to the senti_synset function. The object it returns gives you the positive, negative and objective scores for that word through pos_score(), neg_score() and obj_score().

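Caveat: The POS tagger gives you a different tag than the one senti_synset accepts: nltk.pos_tag returns Penn Treebank tags (NN, VB, JJ, RB, ...), while senti_synset expects the single-letter WordNet codes. Use the following to convert to synset notation.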

n - NOUN 
v - VERB 
a - ADJECTIVE 
s - ADJECTIVE SATELLITE 
r - ADVERB 

(Credits to Using Sentiwordnet 3.0 for the above notation)
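The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names penn_to_wordnet, synset_name and score_tagged are made up for this example, and it assumes the first sense (01) of every word, which will not always be the sense actually meant. The scoring function also assumes the NLTK wordnet and sentiwordnet corpora are installed.

```python
def penn_to_wordnet(penn_tag):
    """Map a Penn Treebank tag from nltk.pos_tag to a WordNet POS letter."""
    if penn_tag.startswith('NN'):
        return 'n'
    if penn_tag.startswith('VB'):
        return 'v'
    if penn_tag.startswith('JJ'):
        return 'a'
    if penn_tag.startswith('RB'):
        return 'r'
    return None  # determiners, prepositions, etc. are skipped


def synset_name(word, penn_tag, sense=1):
    """Build a 'word.pos.NN' string, or None if the tag has no WordNet POS."""
    pos = penn_to_wordnet(penn_tag)
    if pos is None:
        return None
    return '{}.{}.{:02d}'.format(word.lower(), pos, sense)


def score_tagged(tagged):
    """Sum positive and negative scores over (word, tag) pairs."""
    # Imported here so the helpers above work without the NLTK corpora.
    from nltk.corpus import sentiwordnet as swn
    from nltk.corpus.reader.wordnet import WordNetError

    positive = negative = 0.0
    for word, tag in tagged:
        name = synset_name(word, tag)
        if name is None:
            continue
        try:
            senti = swn.senti_synset(name)
        except WordNetError:
            continue  # word or sense not in WordNet (common with tweets)
        if senti is None:
            continue  # synset exists but has no SentiWordNet entry
        positive += senti.pos_score()
        negative += senti.neg_score()
    return positive, negative
```

With the tagged list from the question, score_tagged(tagged) would return a (positive, negative) pair of summed scores. Summing is only one possible way to aggregate; averaging over the scored words is another.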

That being said, it is generally not a great idea to use SentiWordNet for Twitter sentiment analysis, and here's why:

Tweets are full of typos and non-dictionary words that SentiWordNet often does not recognize. To counter this problem, either lemmatize/stem your tweets before you POS tag them, or use a machine learning classifier such as Naive Bayes, for which NLTK has built-in functions. As for the training dataset for the classifier, either manually annotate a dataset or use a pre-labelled set such as the Sentiment140 corpus.
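For the classifier route, a minimal sketch with NLTK's NaiveBayesClassifier could look like the following. The function names tweet_features and train_classifier, the bag-of-words feature scheme and the toy tweets are all made up for illustration; a usable model would need a far larger labelled set and better tokenization than a plain split().

```python
def tweet_features(tweet):
    """Bag-of-words features: one boolean feature per lowercased token."""
    return {'contains({})'.format(word): True for word in tweet.lower().split()}


def train_classifier(labelled_tweets):
    """Train NLTK's Naive Bayes classifier on (tweet, label) pairs."""
    import nltk  # imported here so tweet_features works on its own
    train_set = [(tweet_features(text), label) for text, label in labelled_tweets]
    return nltk.NaiveBayesClassifier.train(train_set)


# Made-up toy data for illustration only; a real model needs a large
# labelled set, e.g. a manually annotated sample or Sentiment140.
toy_tweets = [
    ('camera is awesome', 'positive'),
    ('love the low light shots', 'positive'),
    ('battery is terrible', 'negative'),
    ('hate the new update', 'negative'),
]
```

After classifier = train_classifier(toy_tweets), a new tweet can be labelled with classifier.classify(tweet_features('awesome shots')).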

If you are not interested in performing the sentiment analysis yourself and only need a sentiment tag for a given tweet, you can always use the Sentiment140 API for this purpose.