Snowball Stemmer Usage

Lemonio · Jul 30, 2013

I'd like to use the stemmer here for merging word counts.
http://snowball.tartarus.org/download.html
The page has a download link, but I'm not sure how to integrate the files into my Eclipse project.
It's not just a jar to drop into my lib folder, it's a whole directory tree. Does anyone know of documentation explaining this? I didn't see any on the website.
(As in, what do I import, how do I call it, etc.)

Answer

mjaque · Apr 27, 2014

Build the jar file and add it to your Build Path.

Details:

  • Download the tgz with the code from http://snowball.tartarus.org/download.php
  • Uncompress it.
  • Go to the libstemmer_java directory and read the README.
  • Follow its instructions to compile (using javac).
  • You might have to correct or remove java/org/tartarus/snowball/ext/frenchStemmer.java because it has an error and doesn't compile.
  • Create the jar file: go to the libstemmer_java/java directory, then run: jar cvf libstemmer.jar *
  • Add libstemmer.jar to your Build Path (in Eclipse: Project > Properties > Java Build Path > Libraries tab).

Then you can use the stemmers with something like:

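import org.tartarus.snowball.ext.spanishStemmer;
...
// The stemmer classes have lowercase names, one per language.
spanishStemmer stemmer = new spanishStemmer();
stemmer.setCurrent("torero");                  // set the word to stem
if (stemmer.stem()) {
    System.out.println(stemmer.getCurrent());  // read back the stemmed word
}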
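
For the original goal of merging word counts, here is a minimal sketch of how counts for words sharing a stem could be summed. It is an illustration under assumptions, not part of Snowball: the class StemCountMerger and its methods are hypothetical names, and toyStem is a placeholder that only strips a trailing "s" so the example runs without libstemmer.jar.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Snowball: merges counts of words that share a stem.
class StemCountMerger {

    // Sums the counts of all words that map to the same stem.
    static Map<String, Integer> mergeByStem(Map<String, Integer> counts,
                                            UnaryOperator<String> stem) {
        Map<String, Integer> merged = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            merged.merge(stem.apply(e.getKey()), e.getValue(), Integer::sum);
        }
        return merged;
    }

    // Placeholder stemmer for illustration only: strips a trailing "s".
    // This is NOT the Snowball algorithm.
    static String toyStem(String word) {
        return word.endsWith("s") && word.length() > 1
                ? word.substring(0, word.length() - 1)
                : word;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("torero", 3);
        counts.put("toreros", 2);
        counts.put("plaza", 1);
        // prints {plaza=1, torero=5}
        System.out.println(mergeByStem(counts, StemCountMerger::toyStem));
    }
}
```

With libstemmer.jar on the Build Path, the placeholder could be swapped for a function that calls setCurrent, stem and getCurrent on a stemmer instance, as in the snippet above, and returns the result of getCurrent.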