Using Sphinx search with MongoDB as data source

mongodb sphinx

aurora · Nov 5, 2009 · Viewed 14.7k times

We decided to use MongoDB for a web application (instead of MySQL) but want to stay with Sphinx for indexing and searching all data stored in MongoDB. Since the MongoDB object ID is a hash by default, and we want to keep it that way, this creates a problem with Sphinx. As the Sphinx documentation says:

ALL DOCUMENT IDS MUST BE UNIQUE UNSIGNED NON-ZERO INTEGER NUMBERS (32-BIT OR 64-BIT, DEPENDING ON BUILD TIME SETTINGS).

So what's the best way to solve this problem? How can we map the MongoDB object ID to a non-zero integer (and back)?

UPDATE

casey's answer points in the right direction. However, as it turns out, string attributes in the current dev version are only available for the SQL data source. For xmlpipe it's necessary to apply a patch to the checked-out source. More information on this can be found in the Sphinx forum.

Answer

You can't use the object ID as a Sphinx document ID: MongoDB object IDs are 12 bytes (96 bits), which is bigger than the maximum size of Sphinx's document IDs (32 or 64 bits, depending on the build).
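To make the size mismatch concrete, here is a small Python sketch. The sample ObjectId string and the helper names are made up for illustration; the only facts relied on are that an ObjectId is written as 24 hex characters and that Sphinx document IDs are at most 64-bit.

```python
MAX_SPHINX_ID_BITS = 64  # 32 on builds without 64-bit document IDs


def object_id_bits(hex_id):
    """Width in bits of an ObjectId given as a hex string (4 bits per digit)."""
    return len(hex_id) * 4


def fits_in_sphinx_id(hex_id, bits=MAX_SPHINX_ID_BITS):
    """True if the ObjectId, read as an unsigned integer, fits in `bits` bits."""
    return int(hex_id, 16) < 2 ** bits


sample = "4af9f23d8ead0e1d32000000"  # hypothetical ObjectId for illustration
print(object_id_bits(sample), fits_in_sphinx_id(sample))
```

Truncating or hashing the value down to 64 bits would make it fit, but the mapping would no longer be reversible, which is why a separate integer ID is suggested.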

Instead, you could increment a unique ID while generating the XML that Sphinx is going to process (I'm assuming you are using xmlpipe to get your Mongo data into Sphinx?) and store the MongoDB object ID as a string attribute in Sphinx.

You'll need the latest development version of Sphinx to do this - see my answer to this question for a little more detail: Sphinx without using an auto_increment id
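The approach described in this answer could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the documents are plain dicts with an `_id` and a `content` key (in practice they would come from a MongoDB query), the field name `content` and attribute name `mongo_id` are invented, and `type="string"` for an xmlpipe2 attribute depends on a Sphinx build that supports string attributes for xmlpipe, as noted in the question's update.

```python
from xml.sax.saxutils import escape


def build_xmlpipe2(docs):
    """Return (xml_text, id_map) where id_map maps sequential ID -> ObjectId string."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<sphinx:docset>',
        '<sphinx:schema>',
        '<sphinx:field name="content"/>',
        # String attribute carrying the original MongoDB ObjectId (assumed name).
        '<sphinx:attr name="mongo_id" type="string"/>',
        '</sphinx:schema>',
    ]
    id_map = {}
    # Sphinx document IDs must be non-zero, so counting starts at 1.
    for doc_id, doc in enumerate(docs, start=1):
        mongo_id = str(doc["_id"])
        id_map[doc_id] = mongo_id
        lines.append('<sphinx:document id="%d">' % doc_id)
        lines.append('<content>%s</content>' % escape(doc["content"]))
        lines.append('<mongo_id>%s</mongo_id>' % escape(mongo_id))
        lines.append('</sphinx:document>')
    lines.append('</sphinx:docset>')
    return "\n".join(lines), id_map


if __name__ == "__main__":
    sample_docs = [  # hypothetical data standing in for a MongoDB cursor
        {"_id": "4af9f23d8ead0e1d32000000", "content": "first post"},
        {"_id": "4af9f23d8ead0e1d32000001", "content": "second & last"},
    ]
    xml_text, _ = build_xmlpipe2(sample_docs)
    print(xml_text)
```

Note that IDs assigned by position like this change whenever the iteration order changes, so search results should be mapped back through the `mongo_id` attribute rather than through the integer ID. Storing a counter-based integer on each MongoDB document would be an alternative if stable IDs are needed.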
