I'm doing a project for a college class I'm taking.
I'm using PHP to build a simple web app that classify tweets as "positive" (or happy) and "negative" (or sad) based on a set of dictionaries. The algorithm I'm thinking of right now is Naive Bayes classifier or decision tree.
However, I can't find any PHP library that helps me do some serious language processing. Python has NLTK (http://www.nltk.org). Is there anything like that for PHP?
I'm planning to use WEKA as the back end of the web app (by calling Weka in command line from within PHP), but it doesn't seem that efficient.
Do you have any idea what I should use for this project? Or should I just switch to Python?
Thanks
If you're going to be using a Naive Bayes classifier, you don't really need a whole ton of NL processing. All you'll need is an algorithm to stem the words in the tweets and if you want, remove stop words.
Stemming algorithms abound and aren't difficult to code. Removing stop words is just a matter of searching a hash map or something similar. I don't see a justification to switch your development platform to accomodate the NLTK, although it is a very nice tool.