I am current in planning on creating a big database (2+ million rows) with a variety of data from separate sources. I would like to avoid structuring the database around auto_increment ids to help prevent against sync issues with replication, and also because each item inserted will have a alphanumeric product code that is guaranteed to be unique - it seems to me more sense to use that instead.
I am looking at a search engine to index this database with Sphinx looking rather appealing due to its design around indexing relational databases. However, looking at various tutorials and documentation seems to show database designs being dependent on an auto_increment field in one form or another and a rather bold statement in the documentation saying that document ids must be 32/64bit integers only or things break.
Is there a way to have a database indexed by Sphinx without auto_increment fields as the id?
Sure - that's easy to work around. If you need to make up your own IDs just for Sphinx and you don't want them to collide, you can do something like this in your sphinx.conf (example code for MySQL)
source products {
# Use a variable to store a throwaway ID value
sql_query_pre = SELECT @id := 0
# Keep incrementing the throwaway ID.
# "code" is present twice because Sphinx does not full-text index attributes
sql_query = SELECT @id := @id + 1, code AS code_attr, code, description FROM products
# Return the code so that your app will know which records were matched
# this will only work in Sphinx 0.9.10 and higher!
sql_attr_string = code_attr
}
The only problem is that you still need a way to know what records were matched by your search. Sphinx will return the id (which is now meaningless) plus any columns that you mark as "attributes".
Sphinx 0.9.10 and above will be able to return your product code to you as part of the search results because it has string attributes support.
0.9.10 is not an official release yet but it is looking great. It looks like Zawodny is running it over at Craig's List so I wouldn't be too nervous about relying on this feature.