I need to build a POS tagger in Java and need to know how to get started. Are there code examples or other resources that help illustrate how POS taggers work?
Try Apache OpenNLP. It includes a POS tagger, and you can download ready-to-use English models from here.
The documentation provides details about how to use it from a Java application. Basically you need the following:
Load the POS model
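import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

// Declared outside the try block so it is still in scope when the tagger is created
POSModel model = null;
// try-with-resources closes the stream even if loading fails
try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
    model = new POSModel(modelIn);
}
catch (IOException e) {
    // Model loading failed, handle the error
    e.printStackTrace();
}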
Instantiate the POS tagger
POSTaggerME tagger = new POSTaggerME(model);
Execute it
String[] sent = new String[]{"Most", "large", "cities", "in", "the", "US", "had", "morning", "and", "afternoon", "newspapers", "."};
// tags[i] is the tag predicted for sent[i]
String[] tags = tagger.tag(sent);
Note that the POS tagger expects a tokenized sentence, so the raw text has to be split into sentences and each sentence into tokens first. Apache OpenNLP also provides tools and models for sentence detection and tokenization.
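If you just want to experiment before wiring in OpenNLP's own tokenizer, here is a minimal sketch of a regex-based tokenizer in plain Java. The class name SimpleSentenceTokenizer and the regex are my own assumptions, not part of OpenNLP, and the output will not always match the tokenization conventions the downloadable models were trained on (contractions, for example, stay as one token here), so treat it as a stand-in only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class: splits one sentence into tokens.
class SimpleSentenceTokenizer {

    // A token is either a run of letters/digits (optionally joined by
    // internal apostrophes or hyphens) or any single non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*|\\S");

    static String[] tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] sent = tokenize(
                "Most large cities in the US had morning and afternoon newspapers.");
        // Print the tokens separated by " | " so the boundaries are visible
        System.out.println(String.join(" | ", sent));
    }
}
```

The resulting String[] can be passed to tagger.tag(...) in place of the hand-written array above.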
If you need to train your own model, refer to this documentation.
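To get a feel for what "training" means here, it can help to look at the simplest possible tagger: one that remembers the most frequent tag for each word in the training data. This is a rough sketch in plain Java, not how OpenNLP's maxent tagger works internally (that one also uses features of the surrounding context). The class name BaselineTagger, the "NN" default, and the word_TAG input format with one sentence per line are assumptions made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical most-frequent-tag baseline, for illustration only.
class BaselineTagger {
    // word (lowercased) -> tag -> number of times seen in training
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    // tag -> total number of times seen, used for unknown words
    private final Map<String, Integer> tagTotals = new HashMap<>();

    // Expects one sentence of word_TAG pairs separated by whitespace.
    void train(String taggedSentence) {
        for (String pair : taggedSentence.trim().split("\\s+")) {
            int sep = pair.lastIndexOf('_');
            if (sep <= 0) continue; // skip malformed pairs
            String word = pair.substring(0, sep).toLowerCase();
            String tag = pair.substring(sep + 1);
            counts.computeIfAbsent(word, k -> new HashMap<>())
                  .merge(tag, 1, Integer::sum);
            tagTotals.merge(tag, 1, Integer::sum);
        }
    }

    // Known words get their most frequent tag; unknown words get the
    // tag that was most frequent overall ("NN" if nothing was trained).
    String[] tag(String[] tokens) {
        String fallback = mostFrequent(tagTotals, "NN");
        String[] tags = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            Map<String, Integer> seen = counts.get(tokens[i].toLowerCase());
            tags[i] = (seen == null) ? fallback : mostFrequent(seen, fallback);
        }
        return tags;
    }

    private static String mostFrequent(Map<String, Integer> freq, String dflt) {
        String best = dflt;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BaselineTagger tagger = new BaselineTagger();
        tagger.train("the_DT dog_NN food_NN smells_VBZ ._.");
        tagger.train("she_PRP saw_VBD the_DT dog_NN ._.");
        String[] tags = tagger.tag(new String[]{"the", "dog", "smells", "."});
        System.out.println(String.join(" ", tags));
    }
}
```

A baseline like this cannot use context, so a word such as "saw" always gets the same tag whether it is a noun or a verb in the sentence at hand. Statistical taggers such as the one in OpenNLP exist largely to resolve that kind of ambiguity.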