Search Engine in Java?

lana · Oct 28, 2011
  1. I am trying to create a search engine just to learn and get more experience in Java.

    My intention is to store about 100 files on a server, a mixture of HTML, XML, DOC and TXT, with metadata for each file.

    So when I search for a keyword, it should display the matching file with its meta description, like Google does.

    My question is: apart from HTML, can you add metadata to other file formats, so that the meta description is shown?

  2. Could you point me towards a Java search engine that can search within files of these formats (txt, html) and display the results?

    I am working on my own code for this, but would like to look at other people's code for some help.

Answer

Dave Newton · Oct 28, 2011

Lucene is the canonical Java search engine.
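At its core, Lucene is built around an inverted index: a map from each term to the set of documents that contain it. The toy sketch below is plain Java with no Lucene dependency and is *not* Lucene's API. The class name `TinyIndex`, its naive ASCII-only tokenizer, and the AND-only query semantics are illustrative assumptions, meant only to show the idea you would be reimplementing in your own code.

```java
import java.util.*;

// Toy inverted index: maps each lowercase term to the sorted set of document ids containing it.
// Illustrates the core idea behind engines like Lucene; this is NOT Lucene's API.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Naive "analyzer": lowercase the text and split it into runs of ASCII letters/digits.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Record every term of the document under the document's id.
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: ids of documents containing every query term, in sorted order.
    Set<String> search(String query) {
        List<String> terms = tokenize(query);
        if (terms.isEmpty()) return Collections.emptySet();
        Set<String> result = null;
        for (String term : terms) {
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("intro.txt", "Lucene is a Java search library.");
        index.add("tika.html", "Tika extracts text and metadata from many file formats.");
        index.add("solr.xml", "Solr is a search server built on Lucene.");
        System.out.println(index.search("lucene"));        // [intro.txt, solr.xml]
        System.out.println(index.search("search server")); // [solr.xml]
    }
}
```

Real Lucene replaces the naive tokenizer with configurable analyzers, ranks hits by relevance instead of returning a plain set, and keeps the index on disk, which is why it is worth using rather than rebuilding.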

For adding documents from a variety of sources, take a look at Apache Tika, and for a full-blown system with service/web interfaces, Apache Solr.

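Lucene allows arbitrary metadata to be associated with its documents, so you do not need to embed metadata in the files themselves: store it as extra fields on each indexed document. Tika automatically extracts metadata from a variety of formats.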
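To make the "store metadata next to the document" idea concrete, here is a plain-Java sketch with no Lucene or Tika dependency; none of it is either library's API. `MetaSearch`, `htmlMetaDescription` and `stripTags` are hypothetical names. The regex-based extraction of `<meta name="description" content="...">` is a deliberately crude stand-in for what Tika's parsers do properly, and it assumes `name` comes before `content` in the tag. For a `.txt` file, which has no place to hold metadata, the description is simply supplied when the file is indexed.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: keep free-form metadata (here, a "description") alongside each indexed document,
// similar in spirit to Lucene's stored fields. NOT the Lucene or Tika API.
class MetaSearch {
    // One indexed document: its path, searchable body text, and metadata fields.
    static final class Doc {
        final String path;
        final String text;
        final Map<String, String> meta;

        Doc(String path, String text, Map<String, String> meta) {
            this.path = path;
            this.text = text;
            this.meta = meta;
        }
    }

    // Crude: assumes name="description" appears before content="..." in the tag.
    private static final Pattern META_DESCRIPTION = Pattern.compile(
            "<meta\\s+name=[\"']description[\"']\\s+content=[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    private final List<Doc> docs = new ArrayList<>();

    // Hypothetical helper: pull the meta description out of an HTML string, if present.
    static Optional<String> htmlMetaDescription(String html) {
        Matcher m = META_DESCRIPTION.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Hypothetical helper: replace every tag with a space so only the visible text is searched.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    void add(String path, String text, Map<String, String> meta) {
        docs.add(new Doc(path, text, meta));
    }

    // Return "path - description" for every document whose text contains the keyword (case-insensitive).
    List<String> search(String keyword) {
        String k = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (d.text.toLowerCase(Locale.ROOT).contains(k)) {
                hits.add(d.path + " - " + d.meta.getOrDefault("description", "(no description)"));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"description\" content=\"Intro to Lucene\"></head>"
                + "<body>Lucene indexes text.</body></html>";
        MetaSearch s = new MetaSearch();

        // HTML: metadata comes from the file itself.
        Map<String, String> htmlMeta = new HashMap<>();
        htmlMetaDescription(html).ifPresent(d -> htmlMeta.put("description", d));
        s.add("intro.html", stripTags(html), htmlMeta);

        // Plain text carries no metadata of its own, so it is supplied at indexing time.
        s.add("notes.txt", "Notes on Lucene analyzers.", Map.of("description", "My analyzer notes"));
        s.add("todo.txt", "Buy milk.", Map.of());

        System.out.println(s.search("lucene"));
        // [intro.html - Intro to Lucene, notes.txt - My analyzer notes]
    }
}
```

In a real setup, Tika would supply both the extracted text and the metadata for HTML, DOC, XML and TXT files, and each value would become a stored field on the Lucene document so it can be displayed with the search hit.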