On Tika's website it says (concerning tika-app-1.2.jar) it can be used in server mode. Does anyone know how to send documents and receive parsed text from this server once it is running?
Tika supports two "server" modes. The simpler and original is the --server
flag of Tika-App. The more functional, but also more recent is the JAX-RS JSR-311 server component, which is an additional jar.
The Tika-App Network Server is very simple to use. Simply start Tika-App with the --server
flag, and a --port ###
flag telling it what port to listen on. Then, connect to that port and send it a single file. You'll get back the html version. NetCat works well for this, something like java -jar tika-app.jar --server --port 12345
followed by nc 127.0.0.1 12345 < MyFileToExtract
will get you back the html
The JAX-RS JSR-311 server component supports a few different urls, for things like metadata, plain text etc. You start the server with java -jar tika-server.jar
, then do HTTP put calls to the appropriate url with your input document and you'll get the resource back. There are loads of details and examples (including using curl for testing) on the wiki page
The Tika App Network Server is fairly simple, only supports one mode (extract to HTML), and is generally used for testing / demos / prototyping / etc. The Tika JAXRS Server is a fully RESTful service which talks HTTP, and exposes a wide range of Tika's modes. It's the generally recommended way these days to interface with Tika over the network, and/or from non-Java stacks.