I'm looking to do some sentence analysis (mostly for Twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby?
This is similar to "Is there a good natural language processing library?", but for Ruby. I'd prefer something very general, but any leads are appreciated!
Three excellent and mature NLP packages are Stanford CoreNLP, OpenNLP and LingPipe. There are Ruby bindings to the Stanford CoreNLP tools (GPL license) as well as to the OpenNLP tools (Apache License).
On the more experimental side of things, I maintain a Text Retrieval, Extraction and Annotation Toolkit (Treat), released under the GPL, that provides a common API for almost every NLP-related gem that exists for Ruby. The following list of Treat's features can also serve as a good reference for stable natural language processing gems compatible with Ruby 1.9:
- Sentence segmentation and tokenization (punkt-segmenter, tactful_tokenizer, srx-english, scalpel).
- Syntactic parsing (stanford-core-nlp).
- Word inflection (linguistics) and stemming (ruby-stemmer, uea-stemmer, lingua, etc.).
- A WordNet interface (rwordnet) and POS taggers (rbtagger, engtagger, etc.).
- Language (whatlanguage), date/time (chronic, kronic, nickel) and keyword (lda-ruby) extraction.
- Text indexing and full-text search (ferret).
- Named entity recognition (stanford-core-nlp).
- Machine learning with decision trees (decisiontree), MLPs (ruby-fann), SVMs (rb-libsvm) and linear classification (tomz-liblinear-ruby-swig).
- String and text similarity (levenshtein-ffi, fuzzy-string-match, tf-idf-similarity).

Not included in Treat, but relevant to NLP: hotwater (string distance algorithms), yomu (bindings to Apache Tika for reading .doc, .docx, .pages, .odt, .rtf and .pdf files) and graph-rank (graph-based ranking algorithms such as PageRank and TextRank).
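To give a feel for what the string-distance gems above (levenshtein-ffi, hotwater) compute, here is a minimal pure-Ruby sketch of Levenshtein edit distance. It is an illustration of the metric only and does not use either gem's API; the method name `levenshtein` is hypothetical. It keeps just two rows of the dynamic-programming table, so memory grows with the length of the second string only.

```ruby
# Hypothetical pure-Ruby Levenshtein (edit) distance: the minimum number of
# single-character insertions, deletions and substitutions turning a into b.
# Illustrative only; it does not call levenshtein-ffi or hotwater.
def levenshtein(a, b)
  a = a.to_s
  b = b.to_s
  return b.length if a.empty?
  return a.length if b.empty?

  # previous[j] = distance between the first (i - 1) chars of a and first j chars of b
  previous = (0..b.length).to_a

  a.each_char.with_index(1) do |ca, i|
    current = [i] # turning the first i chars of a into "" takes i deletions
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      current << [
        previous[j] + 1,        # delete ca
        current[j - 1] + 1,     # insert cb
        previous[j - 1] + cost  # substitute ca with cb (free if they match)
      ].min
    end
    previous = current
  end

  previous.last
end

puts levenshtein("kitten", "sitting") # => 3
```

The native gems wrap compiled code, so they will generally be the better choice when comparing large batches of tweets; a sketch like this is mainly useful for understanding or testing what those gems return.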