In Solr, can I sort on the matching value from a multi-valued field?

Michiel van Oosterhout · Dec 27, 2013

We are considering a schema with two multi-valued fields. Search is performed on the first field, but sorting should be done on the second field, using the corresponding value. E.g. if documents match because of the n-th value in the first field (where n may be different for each match), then they should be returned sorted by the n-th value in the second field.

Is that possible?

Background: each document has a list of similar documents (IDs) and a corresponding list of similarity scores (value between 0 and 1). Given ID 42, we need to return all similar documents (e.g. documents with 42 in the first field), sorted by their similarity to document 42.
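To make the requested ordering concrete, here is a minimal sketch in plain Python (no Solr involved). The `Doc` type, the field names and the sample scores are made up for illustration; the sketch only shows the semantics of "sort by the n-th value of the second field", not a way to get Solr to do it.

```python
from typing import List, NamedTuple, Tuple


class Doc(NamedTuple):
    doc_id: int
    similar_ids: List[int]   # first multi-valued field
    scores: List[float]      # second multi-valued field, parallel to similar_ids


def similar_to(docs: List[Doc], target_id: int) -> List[Tuple[int, float]]:
    """Return (doc_id, score) pairs for docs listing target_id, best score first."""
    hits = []
    for doc in docs:
        # Walk both lists in lockstep so the score at position n
        # is paired with the ID at position n.
        for sid, score in zip(doc.similar_ids, doc.scores):
            if sid == target_id:
                hits.append((doc.doc_id, score))
                break
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical sample data
docs = [
    Doc(1, [42, 7], [0.56, 0.90]),
    Doc(2, [7, 42], [0.10, 0.81]),
    Doc(3, [7], [0.99]),
]

print(similar_to(docs, 42))  # [(2, 0.81), (1, 0.56)]
```

Document 3 has the highest score overall but does not list 42, so it is not returned; documents 1 and 2 are ordered by the score stored at the position where 42 matched.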

Other schemas we are considering are:

  1. Dynamic fields for each ID, so we can sort by the field Similarity_ID42 when searching for documents similar to 42. This does not seem to scale: at 800K+ documents, CPU goes to 100% during indexing.
  2. A single multi-valued field storing "ID.score" as a decimal (e.g. 42.563) and then searching for all documents that have a value that is > 42 AND < 43, and sorting by that value (I'm not even sure this is possible).
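The encoding in option 2 could be sketched roughly as follows. The helper names and the field name `similar` are hypothetical, and the sketch only covers building the value and the range query; it does not show that Solr can sort on the result. One thing it makes visible: a score of exactly 1.0 would encode to the next ID (42 + 1.0 = 43.0), so the sketch rejects it.

```python
def encode(similar_id: int, score: float) -> float:
    """Pack an ID and a score in [0, 1) into one decimal, e.g. (42, 0.563) -> 42.563."""
    if not 0.0 <= score < 1.0:
        # 42 + 1.0 would be indistinguishable from ID 43 with score 0.0
        raise ValueError("score must be >= 0 and < 1 for this encoding")
    return similar_id + score


def range_query(field: str, similar_id: int) -> str:
    """Build a range query with exclusive bounds: > similar_id AND < similar_id + 1."""
    return f"{field}:{{{similar_id} TO {similar_id + 1}}}"


print(encode(42, 0.563))
print(range_query("similar", 42))  # similar:{42 TO 43}
```

Because the bounds are exclusive, a score of exactly 0.0 (encoded as 42.0) would not match the query, which is another edge case to decide on before using this scheme.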

Answer

cheffe · Dec 30, 2013

This approach will not work: you can search on a multi-valued field, but you cannot sort by one. This is pointed out in "Sorting with Multivalued Field in Solr" and stated in Solr's Wiki:

Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)

Update

As for the alternatives: since you need to find the documents similar to one given ID, why not create a second core with a schema like this:

<fields>
    <field name="doc_id" type="int" indexed="true" stored="true" />
    <field name="similar_to_id" type="int" indexed="true" stored="true" />
    <!-- numeric type, so that sorting by similarity is numeric rather than lexicographic -->
    <field name="similarity" type="float" indexed="true" stored="true" />
</fields>

<types>
    <fieldType name="int" class="solr.TrieIntField"/>
    <fieldType name="float" class="solr.TrieFloatField"/>
</types>
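With a schema like this, each (document, similar document) pair becomes its own single-valued document in the second core. A rough sketch of that flattening step is below; the function name and the sample values are hypothetical, and how the resulting documents are sent to Solr is left out.

```python
import json
from typing import Dict, List, Union


def to_pair_docs(
    doc_id: int, similar_ids: List[int], scores: List[float]
) -> List[Dict[str, Union[int, float]]]:
    """Turn one document's parallel lists into one flat record per pair."""
    if len(similar_ids) != len(scores):
        raise ValueError("similar_ids and scores must have the same length")
    return [
        {"doc_id": doc_id, "similar_to_id": sid, "similarity": score}
        for sid, score in zip(similar_ids, scores)
    ]


# Hypothetical document 1, similar to 42 (0.56) and to 7 (0.9)
pairs = to_pair_docs(1, [42, 7], [0.56, 0.9])
print(json.dumps(pairs, indent=2))
```

Since every field in these records holds exactly one value, the restriction on sorting by multi-valued fields no longer applies to `similarity`.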

Then you could run a second query after performing the actual search:

q=similar_to_id:42&sort=similarity desc
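For illustration, the request for that second query could be assembled like this. It is a sketch that assumes Solr's usual `field:value` query syntax and an explicit sort direction; the base URL, the core name `similarity` and the `rows` default are placeholders, not something from your setup.

```python
from urllib.parse import urlencode


def similar_query_url(base_url: str, target_id: int, rows: int = 10) -> str:
    """Build a select URL returning pairs for target_id, highest similarity first."""
    params = {
        "q": f"similar_to_id:{target_id}",
        "sort": "similarity desc",  # most similar documents first
        "rows": rows,
        "wt": "json",
    }
    # urlencode escapes the colon and the space in the parameter values
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name
url = similar_query_url("http://localhost:8983/solr/similarity", 42)
print(url)
# http://localhost:8983/solr/similarity/select?q=similar_to_id%3A42&sort=similarity+desc&rows=10&wt=json
```

The `doc_id` values in the response can then be used to fetch the full documents from the main core.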