Java or Python for Natural Language Processing

Jin Ling picture Jin Ling · Apr 7, 2014 · Viewed 66.8k times · Source

I would like to know which programming language is better for natural language processing. Java or Python? I have found lots of questions and answers regarding about it. But I am still lost in choosing which one to use.

And I want to know which NLP library to use for Java since there are lots of libraries (LingPipe, GATE, OpenNLP, StandfordNLP). For Python, most programmers recommend NLTK.

But if I am to do some text processing or information extraction from unstructured data (just free formed plain English text) to get some useful information, what is the best option? Java or Python? Suitable library?

Updated

What I want to do is to extract useful product information from unstructured data (E.g. users make different forms of advertisement about mobiles or laptops with not very standard English language)

Answer

alvas picture alvas · Apr 7, 2014

Java vs Python for NLP is very much a preference or necessity. Depending on the company/projects you'll need to use one or the other and often there isn't much of a choice unless you're heading a project.

Other than NLTK (www.nltk.org), there are actually other libraries for text processing in python:

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)

For Java, there're tonnes of others but here's another list:

This is a nice comparison for basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html

A useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4

If you're uncertain, which is the language to go for NLP, personally i say, "any language that will give you the desired analysis/output", see Which language or tools to learn for natural language processing?

Here's a pretty recent (2017) of NLP tools: https://github.com/alvations/awesome-community-curated-nlp

An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp


Other than language processing tools, you would very much need machine learning tools to incorporate into NLP pipelines.

There's a whole range in Python and Java, and once again it's up to preference and whether the libraries are user-friendly enough:

Machine Learning libraries in python:

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)


With the recent (2015) deep learning tsunami in NLP, possibly you could consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

I'll avoid listing deep learning tools out of non-favoritism / neutrality.


Other Stackoverflow questions that also asked for NLP/ML tools: