Boosting by date field in Solr is defined as:
{!boost b=recip(ms(NOW,datefield),3.16e-11,1,1)}
I looked everywhere (for example, Solr Dismax Config for Boost Scoring and Solr boost for multivalued date field, which all reference the SolrRelevancyFAQ), and they all use the same definition. But I found that this is not boosting my results sufficiently. How can I make this date boosting stronger?
The user is searching for two keywords. Both items contain both keywords (in the same order) in both the title and the description. Neither keyword is repeated.
And the Solr debug output is way too confusing for me to understand the problem.
Now, this is not a huge problem. 99% of queries work fine and produce the expected results, so it's not like Solr isn't working at all. I just found this one situation that is very confusing to me, and I don't know how to proceed.
recip(x, m, a, b) implements f(x) = a/(xm+b), with:
- x: the document age in ms, defined as ms(NOW,<datefield>).
- m: a constant that defines the time scale used to apply the boost. It should be the inverse of what you consider an old document age (a reference_time), in milliseconds. For example, choosing a reference_time of 1 year (3.16e10 ms) implies using its inverse: 3.16e-11 (1/3.16e10, rounded).
- a and b: constants (defined arbitrarily).
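To check the numbers outside of Solr, here is a minimal Python sketch that mirrors the recip(x, m, a, b) = a/(xm+b) formula and derives m from a 1-year reference_time. The 365.25-day year is my own assumption; Solr does not define a "year" constant, and it is only used here to turn the reference_time into milliseconds.

```python
# Sketch: Solr's recip(x, m, a, b) = a / (m*x + b), reproduced in plain Python.
# Assumption: a "year" is a Julian year of 365.25 days (not a Solr constant).
MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY  # ~3.156e10 ms


def recip(x: float, m: float, a: float, b: float) -> float:
    """Return a / (m*x + b), the value Solr's recip() computes."""
    return a / (m * x + b)


# m is the inverse of the chosen reference_time (here: one year).
m_one_year = 1 / MS_PER_YEAR

print(f"m for a 1-year reference_time: {m_one_year:.3e}")
```

The printed value is about 3.169e-11, close to the 3.16e-11 used in the FAQ; a document exactly one year old then gets recip(MS_PER_YEAR, m_one_year, 1, 1) = 0.5.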
- xm = 1 when the document is 1 reference_time old (multiplier = a/(1+b)).
- xm ≈ 0 when the document is new, resulting in a value close to a/b.
Using the same value for a and b ensures the multiplier never exceeds 1, even for the most recent documents.
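A small sketch of how a and b set the two reference points (a/b for a new document, a/(1+b) for a document one reference_time old). To keep the numbers readable, ages are expressed in reference_times, so m = 1 here; that scaling is an assumption of the example, not something Solr does.

```python
# Sketch: how a and b set the multiplier for new and 1-reference_time-old documents.
# Ages are in reference_times, so m = 1 (an assumption that keeps numbers simple).

def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr-style recip: a / (m*x + b)."""
    return a / (m * x + b)


for a, b in [(1, 1), (0.5, 0.5), (2, 1)]:
    new_doc = recip(0, 1, a, b)       # x*m = 0 -> a/b
    one_ref_old = recip(1, 1, a, b)   # x*m = 1 -> a/(1+b)
    print(f"a={a}, b={b}: new={new_doc:.3f}, one reference_time old={one_ref_old:.3f}")
```

With a = b the new-document multiplier is exactly 1, while a = 2, b = 1 doubles the score of brand-new documents, i.e. the multiplier exceeds 1.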
With a = b = 1, a document 1 reference_time old has a multiplier of about 1/2, a document 2 reference_times old has a multiplier of about 1/3, and so on.
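With a = b = 1 the multiplier for a document n reference_times old reduces to 1/(n+1). The sketch below uses exact fractions to make that sequence visible; again, ages are measured in reference_times (m = 1) purely for illustration.

```python
from fractions import Fraction

# Sketch: with a = b = 1 and ages in reference_times (m = 1),
# recip reduces to 1 / (n + 1) for a document n reference_times old.
def multiplier(n_reference_times: int) -> Fraction:
    a = b = 1
    return Fraction(a, n_reference_times * 1 + b)


for n in range(5):
    print(n, multiplier(n))
```

This prints 1, 1/2, 1/3, 1/4 and 1/5 for ages 0 through 4.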
How to make date boosting stronger?
Increase m: choose a lower reference_time, for example 6 months, which gives m = 6.33e-11. Compared to a 1-year reference, the multiplier decreases twice as fast as the document age increases.
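A quick comparison of the 1-year and 6-month reference_times with a = b = 1. As before, the 365.25-day year (and "6 months" as half of it) is an assumption made for the sketch.

```python
# Sketch: compare the 1-year and 6-month reference_times with a = b = 1.
# Assumption: 1 year = 365.25 days and 6 months = half of that.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000
M_ONE_YEAR = 1 / MS_PER_YEAR          # ~3.17e-11
M_SIX_MONTHS = 1 / (MS_PER_YEAR / 2)  # ~6.34e-11, i.e. twice as large


def recip(x: float, m: float, a: float = 1, b: float = 1) -> float:
    """Solr-style recip: a / (m*x + b)."""
    return a / (m * x + b)


for years in (0.5, 1, 2):
    age_ms = years * MS_PER_YEAR
    print(f"{years} y old: 1-year scale={recip(age_ms, M_ONE_YEAR):.3f}, "
          f"6-month scale={recip(age_ms, M_SIX_MONTHS):.3f}")
```

Doubling m means a document gets the multiplier that the 1-year scale would give a document twice its age: a 1-year-old document drops from 0.5 to about 0.333.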
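Decrease a and b: this changes the shape of the response curve and can be very aggressive, see this example (page 8). With a = b, the multiplier drops to 1/2 at an age of b reference_times instead of 1, so a = b = 0.1 halves the boost after only a tenth of the reference_time.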
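The effect of shrinking a and b (keeping a = b) can be seen by evaluating the curve at a few ages. Ages are in reference_times (m = 1), an assumption for readability; the half-point follows from solving a/(x + b) = 1/2 with a = b, which gives x = b.

```python
# Sketch: effect of shrinking a and b (keeping a = b) on the multiplier.
# Ages are in reference_times, so m = 1. With a = b, the multiplier is 1/2
# at an age of b reference_times (solve a / (x + b) = 1/2 with a = b -> x = b).

def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr-style recip: a / (m*x + b)."""
    return a / (m * x + b)


for ab in (1.0, 0.1):
    values = [recip(age, 1, ab, ab) for age in (0, 0.1, 0.5, 1)]
    print(f"a=b={ab}: " + ", ".join(f"{v:.3f}" for v in values))
```

With a = b = 1 the values at ages 0, 0.1, 0.5 and 1 are 1.000, 0.909, 0.667 and 0.500; with a = b = 0.1 they fall to 1.000, 0.500, 0.167 and 0.091.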
Apply a boost to the boost function itself with the bf (boost functions) parameter. This is a DisMax parameter, so it requires the DisMax or eDisMax query parser, e.g.:
bf=recip(ms(NOW,datefield),3.16e-11,1,1)^2.0
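For reference, here is a sketch that builds two eDisMax request URLs: one using the additive bf parameter with the ^2.0 weight, and one using eDisMax's multiplicative boost parameter. The core URL, the query text, the qf field names and the date field name "datefield" are all hypothetical placeholders; the script only builds the URLs and does not contact Solr.

```python
from urllib.parse import urlencode

# Sketch of two eDisMax requests. Hypothetical values (adjust to your setup):
# the core URL, the query text, the qf fields and the date field "datefield".
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"  # hypothetical core
DATE_BOOST = "recip(ms(NOW,datefield),3.16e-11,1,1)"

additive = {
    "defType": "edismax",
    "q": "keyword1 keyword2",     # placeholder query
    "qf": "title description",
    "bf": DATE_BOOST + "^2.0",    # additive: adds roughly 2.0 * recip(...) to the score
}

multiplicative = {
    "defType": "edismax",
    "q": "keyword1 keyword2",
    "qf": "title description",
    "boost": DATE_BOOST,          # eDisMax only: multiplies the score by recip(...)
}

print(SOLR_SELECT + "?" + urlencode(additive))
print(SOLR_SELECT + "?" + urlencode(multiplicative))
```

urlencode takes care of escaping the parentheses, commas and the ^ in the function query, which is easy to get wrong when building the URL by hand.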
It is important to note a few things:
bf is an additive boost and acts as a bonus added to the score of newer documents.
{!boost b} is a multiplicative boost and acts more as a penalty applied to the score of older documents.
A bf score (the "bonus" added to the global score) is calculated independently of the query's relevancy score, so a result set with high relevancy scores is affected less than a result set with low scores. In contrast, a multiplicative boost scales every score by the same factor regardless of how relevant the result set is, which is why it is usually preferred.
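To illustrate the difference, here is a deliberately simplified scoring model (my assumption, not Solr's exact internal scoring): additive means relevancy + weight × recip, multiplicative means relevancy × recip. Both result sets have the same 10% relevancy gap between an old document (2 reference_times old) and a new one; only their absolute score levels differ.

```python
# Simplified scoring model (an assumption, not Solr's exact internals):
#   additive:       final = relevancy + weight * recip
#   multiplicative: final = relevancy * recip
# recip uses a = b = 1 with ages in reference_times: 1 / (age + 1).

def recip(age_in_refs: float) -> float:
    return 1 / (age_in_refs + 1)


def additive(score: float, age: float, weight: float = 2.0) -> float:
    return score + weight * recip(age)


def multiplicative(score: float, age: float) -> float:
    return score * recip(age)


# Each result set: (relevancy of an old doc aged 2 refs, relevancy of a new doc aged 0).
result_sets = {"high scores": (20.0, 18.0), "low scores": (2.0, 1.8)}

for name, (old_score, new_score) in result_sets.items():
    add_winner = "new" if additive(new_score, 0) > additive(old_score, 2) else "old"
    mul_winner = "new" if multiplicative(new_score, 0) > multiplicative(old_score, 2) else "old"
    print(f"{name}: additive -> {add_winner} first, multiplicative -> {mul_winner} first")
```

In this model the additive bonus reorders the low-scoring set but not the high-scoring one, while the multiplicative boost puts the new document first in both.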
Do not use recip() for dates more than one reference_time in the future (more precisely, more than b reference_times), or it will yield negative values.
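The future-date problem follows directly from the formula: a future date gives a negative x, so xm + b shrinks toward zero and then goes negative. A minimal sketch with a = b = 1 and ages in reference_times (m = 1, an assumption for readability):

```python
# Sketch: recip(x, m, 1, 1) for documents dated in the future (negative age).
# Ages are in reference_times, so m = 1 (an assumption for readability).

def recip(x: float, m: float = 1, a: float = 1, b: float = 1) -> float:
    """Solr-style recip: a / (m*x + b)."""
    return a / (m * x + b)


for age in (1, 0, -0.5, -2):
    print(f"age {age:>4} reference_times -> multiplier {recip(age):.3f}")
# At exactly age = -b/m (here -1) the denominator is 0; Python would raise
# ZeroDivisionError, so that point is left out of the loop.
```

A document half a reference_time in the future gets a multiplier of 2.0 (above the intended maximum of 1), and one 2 reference_times in the future gets -1.0.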
See also this very insightful post by Nolan Lawson on Comparing boost methods in Solr.