Negation handling in NLP

Tim Daubenschütz · Feb 25, 2015

I'm currently working on a project where I want to extract emotion from text. Because I'm using conceptnet5 (a semantic network), I can't simply prefix the words in a sentence that contains a negation word, as those prefixed words would not show up in conceptnet5's API.

Here's an example:

The movie wasn't that good.

Hence, I figured that I could use wordnet's lemma functionality to replace adjectives with their antonyms in sentences that contain negation words (not, ...).

In the previous example, the algorithm would detect wasn't and replace it with was not. It would then detect the negation word not, remove it, and replace good with its antonym bad. The sentence would then read:

The movie was that bad.
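
To make the idea concrete, here is a minimal sketch of the replacement step. It assumes the sentence is already tokenized, and it uses a small hand-written antonym table (the names ANTONYMS, NEGATIONS and flip_negated_words are hypothetical) as a stand-in for a real antonym lookup such as wordnet's. It is not a complete solution, only an illustration of the transformation described above.

```python
# Toy antonym table: an assumption for illustration, standing in for a
# real lookup (for example antonyms retrieved from wordnet).
ANTONYMS = {"good": "bad", "bad": "good", "happy": "sad", "sad": "happy"}

# Negation tokens to look for; "n't" is what a tokenizer may produce
# when it splits a contraction such as "wasn't".
NEGATIONS = {"not", "n't", "never"}


def flip_negated_words(tokens, antonyms=ANTONYMS, negations=NEGATIONS):
    """Drop each negation token and swap the next word that has a known antonym."""
    result = []
    pending = False  # True while a dropped negation still waits for a word to flip
    for token in tokens:
        lower = token.lower()
        if lower in negations:
            pending = True
            continue  # the negation token itself is removed
        if pending and lower in antonyms:
            result.append(antonyms[lower])
            pending = False
        else:
            result.append(token)
    return result


tokens = ["The", "movie", "was", "not", "that", "good", "."]
print(" ".join(flip_negated_words(tokens)))
```

Note the weakness of this sketch: if no word with a known antonym follows the negation, the negation is dropped and nothing is flipped, so the meaning of the sentence is silently inverted.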

While I see that this isn't the most elegant way and that it will probably produce the wrong result in many cases, I'd still like to handle negation this way, as I frankly don't know a better approach.

As for my problem: unfortunately, I did not find any library that would let me expand all occurrences of contracted negations (wasn't => was not). I could do it manually by replacing the occurrences with a regex, but then I would be stuck with the English language.

Therefore I'd like to ask whether any of you know a library, function, or better method that could help me here. I'm currently using Python's nltk, which doesn't seem to contain such functionality, but I may be wrong.

Thanks in advance :)

Answer

Nikita Astrakhantsev · Feb 25, 2015

Cases like wasn't can be handled by tokenization alone (tokens = nltk.word_tokenize(sentence)): wasn't turns into the two tokens was and n't.

But negative meaning can also be expressed by 'Quasi negative words, like hardly, barely, seldom' and 'Implied negatives, such as fail, prevent, reluctant, deny, absent'; look into this paper. An even more detailed analysis can be found in Christopher Potts' On the negativity of negation.

As for your initial problem, sentiment analysis: most modern approaches, as far as I know, don't process negations explicitly; instead, they use supervised learning with high-order n-grams. Those that do process negation usually add a special prefix NOT_ to every word between the negation and the next punctuation mark.
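
A minimal sketch of that NOT_ marking, assuming the sentence has already been tokenized so that wasn't appears as was and n't. The negation and punctuation sets and the function name mark_negation_scope are illustrative choices, not part of any library:

```python
# Illustrative word lists; a real system would likely use larger ones.
NEGATIONS = {"not", "no", "never", "n't"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def mark_negation_scope(tokens):
    """Prefix NOT_ to every token between a negation word and the next punctuation mark."""
    marked = []
    in_scope = False  # True while we are inside a negated span
    for token in tokens:
        if token in PUNCTUATION:
            in_scope = False  # punctuation closes the negated span
            marked.append(token)
        elif token.lower() in NEGATIONS:
            in_scope = True  # the negation word itself stays unmarked
            marked.append(token)
        elif in_scope:
            marked.append("NOT_" + token)
        else:
            marked.append(token)
    return marked


tokens = ["The", "movie", "was", "n't", "that", "good", ",", "but", "I", "stayed", "."]
print(mark_negation_scope(tokens))
```

The marked tokens (NOT_that, NOT_good) can then be fed to a classifier as ordinary features, so good and NOT_good are learned as separate words. Note that this does not help with a lookup-based resource like conceptnet5, since the prefixed words will not be found there.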