Stanford NLP, demo'd here, gives output like this:
Colorless/JJ green/JJ ideas/NNS sleep/VBP furiously/RB ./.
What do the part-of-speech tags mean? I am unable to find an official list. Is it Stanford's own system, or are they using universal tags? (What is JJ, for instance?)
Also, when I am iterating through the sentences looking for nouns, for instance, I end up doing something like checking whether the tag .contains('N'). This feels pretty weak. Is there a better way to programmatically search for a certain part of speech?
The tags come from the Penn Treebank Project. Look at the Part-of-speech tagging ps.
JJ is adjective. NNS is noun, plural. VBP is verb, non-3rd person singular present. RB is adverb.
That's for English. For Chinese, it's the Penn Chinese Treebank. And for German, it's the NEGRA corpus.
- CC Coordinating conjunction
- CD Cardinal number
- DT Determiner
- EX Existential there
- FW Foreign word
- IN Preposition or subordinating conjunction
- JJ Adjective
- JJR Adjective, comparative
- JJS Adjective, superlative
- LS List item marker
- MD Modal
- NN Noun, singular or mass
- NNS Noun, plural
- NNP Proper noun, singular
- NNPS Proper noun, plural
- PDT Predeterminer
- POS Possessive ending
- PRP Personal pronoun
- PRP$ Possessive pronoun
- RB Adverb
- RBR Adverb, comparative
- RBS Adverb, superlative
- RP Particle
- SYM Symbol
- TO to
- UH Interjection
- VB Verb, base form
- VBD Verb, past tense
- VBG Verb, gerund or present participle
- VBN Verb, past participle
- VBP Verb, non-3rd person singular present
- VBZ Verb, 3rd person singular present
- WDT Wh-determiner
- WP Wh-pronoun
- WP$ Possessive wh-pronoun
- WRB Wh-adverb