How do I calculate TF-IDF of a query?

Codarus · May 9, 2016 · Viewed 9.8k times

How do I calculate tf-idf for a query? I understand how to calculate tf-idf for a set of documents with the following definitions:

tf = occurrences in document / total words in document

idf = log(#documents / #documents where term occurs)
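The two definitions above can be sketched in Python as follows. This is only an illustration: the function names, the three-document corpus, and the choice of a base-10 logarithm are assumptions (the definitions do not fix a base), and documents are treated as whitespace-tokenized word lists.

```python
import math
from collections import Counter


def term_frequency(term, document_tokens):
    """tf = occurrences in document / total words in document."""
    if not document_tokens:
        return 0.0
    return Counter(document_tokens)[term] / len(document_tokens)


def inverse_document_frequency(term, corpus):
    """idf = log(#documents / #documents where term occurs)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        # Assumption: a term that occurs nowhere gets idf 0 instead of
        # causing a division by zero.
        return 0.0
    # Assumption: base-10 logarithm.
    return math.log10(len(corpus) / containing)


# Hypothetical corpus, made up for illustration only.
corpus = [
    "life is learning".split(),
    "learning never stops".split(),
    "enjoy the day".split(),
]

tf_life = term_frequency("life", corpus[0])            # 1 of 3 words
idf_life = inverse_document_frequency("life", corpus)  # log10(3 / 1)
tf_idf_life = tf_life * idf_life
```

Note that `inverse_document_frequency` needs the whole corpus, while `term_frequency` only needs one document.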

But I don't understand how that correlates to queries.

For example, I read a resource that gave the following values for the query "life learning":

life | tf = .5 | idf = 1.405507153 | tf_idf = 0.702753576
learning | tf = .5 | idf = 1.405507153 | tf_idf = 0.702753576

The tf values I understand: each term appears once out of the two terms in the query, thus 1/2. But I have no idea where the idf comes from.
I would think that #documents = 1 and #documents where the term occurs = 1, so idf = log(1) = 0, but this doesn't seem to be the case. Is it based on whatever documents you're using? How do you calculate tf-idf for a query?

Answer

Amir · Oct 18, 2017

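Assume your query is "best car insurance", your total vocabulary contains the terms car, best, auto, and insurance, and you have N = 1,000,000 documents. The idf of each query term is taken from the document collection, not from the query itself. So your query is something like below: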

[image not preserved: TF-IDF table for the query]
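As a rough sketch of how such a query vector could be computed: the tf comes from the query text, while the idf comes from document frequencies in the collection. The document-frequency counts below are hypothetical values chosen for illustration, and the base-10 logarithm and the name `query_tfidf` are assumptions as well.

```python
import math
from collections import Counter

N = 1_000_000  # number of documents in the collection

# Hypothetical document frequencies: in how many of the N documents
# each vocabulary term occurs. These numbers are made up.
doc_freq = {"auto": 5_000, "best": 50_000, "car": 10_000, "insurance": 1_000}


def query_tfidf(query, doc_freq, n_docs):
    """Return a term -> tf-idf weight mapping over the whole vocabulary."""
    tokens = query.lower().split()
    counts = Counter(tokens)
    weights = {}
    for term, df in doc_freq.items():
        # tf is measured on the query itself.
        tf = counts[term] / len(tokens) if tokens else 0.0
        # idf is measured on the collection (base 10 is an assumption).
        idf = math.log10(n_docs / df)
        weights[term] = tf * idf
    return weights


query_vector = query_tfidf("best car insurance", doc_freq, N)
```

Terms that are in the vocabulary but not in the query, such as "auto" here, get a weight of 0 because their tf is 0.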

And one of your documents could be:

[image not preserved: TF-IDF table for the document]

Now calculate the cosine similarity between the TF-IDF vectors of your query and the document.
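A minimal sketch of that last step, assuming both vectors are stored as term-to-weight dictionaries. The weights in `query_vec` and `doc_vec` are placeholders invented for the example, not the values from the answer, and `cosine_similarity` is an illustrative helper rather than a library function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term -> weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder TF-IDF weights, made up for illustration.
query_vec = {"best": 1.0, "car": 2.0, "insurance": 3.0}
doc_vec = {"auto": 2.0, "car": 2.0, "insurance": 1.0}

score = cosine_similarity(query_vec, doc_vec)
```

Computing this score for every document and sorting in descending order gives a ranking for the query; only terms shared by both vectors contribute to the dot product.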