Solr commit and optimize questions

user188962 · Jan 26, 2010 · Viewed 52k times

I have a classifieds website. Users may put ads, edit ads, view ads etc.

Whenever a user puts an ad, I am adding a document to Solr. I don't know, however, when to commit it. Commit slows things down from what I have read.

How should I do it? Autocommit every 12 hours or so?

Also, how should I do it with optimize?

Answer

James Roland · Sep 17, 2010

A little more detail on Commit/Optimize:

Commit: When you index documents into Solr, none of the changes you make will appear in search results until you run the commit command. So when to run commit really depends on how quickly you want the changes to appear on your site through the search engine. However, it is a heavy operation, so it should be done in batches, not after every update.
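To make the "commit in batches" idea concrete, here is a minimal sketch in Python. It assumes a Solr version that accepts JSON documents on its update handler and the `commit=true` request parameter. The class name `BatchingIndexer`, the URL, the core name `ads` and the batch size are all made up for illustration.

```python
import json
import urllib.request


class BatchingIndexer:
    """Buffer documents and send them to Solr in batches, one commit per batch."""

    def __init__(self, update_url, batch_size=500, send=None):
        self.update_url = update_url      # hypothetical, e.g. http://localhost:8983/solr/ads/update
        self.batch_size = batch_size      # assumed value; tune for your own load
        self.pending = []
        self._send = send or self._http_post

    def add(self, doc):
        """Queue one document; flush automatically once the batch is full."""
        self.pending.append(doc)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send all queued documents in one request and commit once."""
        if not self.pending:
            return
        body = json.dumps(self.pending).encode("utf-8")
        self._send(self.update_url + "?commit=true", body)
        self.pending = []

    @staticmethod
    def _http_post(url, body):
        # Plain HTTP POST of a JSON array of documents.
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

You would call `add()` for each new or edited ad and `flush()` from a timer, so an ad never waits longer than you are willing to accept before it becomes searchable.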

Optimize: This is similar to a defrag command on a hard drive. It merges the index segments together (by default into a single segment), which can increase search speed, and it removes any deleted (replaced) documents. Solr's index is effectively append-only, so every time you re-index a document it marks the old document as deleted and then creates a brand new document to replace it. Optimize removes these deleted documents. You can compare the searchable document count with the total by going to the Solr Statistics page and looking at the numDocs vs. maxDoc numbers. The difference between the two is the number of deleted (non-searchable) documents in the index.
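As a small worked example of that arithmetic, the sketch below computes the deleted-document count and its share of the index from the two counters on the Statistics page. The function names and the sample numbers are made up for illustration.

```python
def deleted_docs(num_docs, max_doc):
    """Deleted (non-searchable) documents still held in the index."""
    if max_doc < num_docs:
        raise ValueError("max_doc cannot be smaller than num_docs")
    return max_doc - num_docs


def deleted_ratio(num_docs, max_doc):
    """Fraction of the index taken up by deleted documents."""
    if max_doc == 0:
        return 0.0
    return deleted_docs(num_docs, max_doc) / max_doc


# Illustrative numbers only: 90,000 searchable documents out of 120,000 total.
print(deleted_docs(90_000, 120_000))   # 30000
print(deleted_ratio(90_000, 120_000))  # 0.25
```

A ratio like this can help decide whether an optimize is worth its cost. Any threshold you pick is a judgment call for your own index.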

Also, Optimize builds a whole new index from the old one and then switches to the new index when complete. The command therefore needs up to double the index's disk space while it runs, so make sure the size of your index does not exceed 50% of your available hard drive space. (This is a rule of thumb; it usually needs less than 50% because deleted documents are dropped from the new index.)
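The disk-space rule of thumb can be checked before running the command. This is a rough sketch using only the Python standard library. It assumes the index lives in a single directory on the same drive that the new copy will be written to, and the function names are made up for illustration.

```python
import os
import shutil


def index_size_bytes(index_dir):
    """Total size in bytes of all files under the index directory."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_room_for_optimize(index_bytes, free_bytes):
    """Rule of thumb: the rewritten index may need as much free space
    as the current index already occupies."""
    return free_bytes >= index_bytes


def check_index_dir(index_dir):
    """Compare the index size with the free space on its drive."""
    size = index_size_bytes(index_dir)
    free = shutil.disk_usage(index_dir).free
    return has_room_for_optimize(size, free)
```

Because deleted documents are dropped during the rewrite, this check is conservative: it can report too little room when the optimize would in fact fit.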

Index Server / Search Server: Paul Brown was right that the best design for Solr is to have a server dedicated and tuned to indexing, and then replicate the changes to the searching servers. You can set up the index server with multiple index endpoints.

e.g. http://solrindex01/index1 and http://solrindex01/index2

And since the index server does not serve searches, you can give it a different memory footprint, different index warming commands, etc. from the search servers.

Hope this is useful info for everyone.