Top "Apache-tika" questions

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries.

Read Content from Files which are inside Zip file

I am trying to create a simple Java program which reads and extracts the content from the file(s) inside …

java zip extract apache-tika
Indexing PDF with Solr

Can anyone point me to a tutorial? My main experience with Solr is indexing CSV files, but I cannot find …

solr full-text-search solrj apache-tika solr-cell
How do I index documents in Solr?

I'm running Solr 1.4 on Ubuntu 10.04 (installed via apt-get solr-tomcat) and it seems to be working fine. I'm having some difficulty …

solr full-text-search apache-tika solr-cell
How to determine appropriate file extension from MIME Type in Java

I am uploading files to an Amazon S3 bucket and have access to the InputStream and a String containing the …

java amazon-s3 apache-tika
How to get file extension from content type?

I'm using Apache Tika, and I have files (without extensions) of a particular content type that need to be renamed to …

java content-type apache-tika
Getting MimeType subtype with Apache Tika

I need to get the iana.org MediaType rather than application/zip or application/x-tika-msoffice for documents like odt, ppt, …

java mime-types detection apache-tika
java.lang.IllegalArgumentException: protocol = http host = null

For this link, http://bits.blogs.nytimes.com/2014/09/02/uber-banned-across-germany-by-frankfurt-court/?partner=rss&emc=rss, this code doesn't work but …

java url apache-tika
How to use Tika in server mode

Tika's website says (concerning tika-app-1.2.jar) that it can be used in server mode. Does anyone know how to …

apache-tika
How can I use the HTML parser with Apache Tika in Java to extract all HTML tags?

I downloaded the tika-core and tika-parser libraries, but I could not find example code for parsing HTML documents to a string. …

java html apache apache-tika
Convert .docx to HTML using Java

I tried converting .doc to HTML by using WordToHtmlConverter and it worked perfectly. But when I tried to convert .docx …

java apache-tika