I found the following code on the internet for calculating TFIDF:
https://github.com/timtrueman/tf-idf/blob/master/tf-idf.py
I added "1+" to the denominator in the function idf(word, documentList) so I won't get a division-by-zero error:
return math.log(len(documentList) / (1 + float(numDocsContaining(word,documentList))))
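To show what that change does, here is a minimal self-contained sketch. It does not copy the linked script: `num_docs_containing` is a hypothetical stand-in for `numDocsContaining`, and I am assuming it counts the documents whose whitespace-separated tokens include the word. The three sample documents are made up for illustration.

```python
import math

def num_docs_containing(word, document_list):
    # Hypothetical stand-in for numDocsContaining (assumption): counts the
    # documents whose whitespace-separated tokens include the word.
    return sum(1 for doc in document_list if word in doc.split())

def idf(word, document_list):
    # The "1 +" keeps the denominator non-zero for a word found in no document.
    return math.log(len(document_list) / (1 + float(num_docs_containing(word, document_list))))

# Made-up sample documents, for illustration only.
docs = ["the cat sat", "the dog ran", "a bird flew"]

unseen = idf("zebra", docs)  # in 0 of 3 documents: log(3 / 1), no error
common = idf("the", docs)    # in 2 of 3 documents: log(3 / 3)
```

One side effect of this form, as far as I can tell: a word that appears in every document gets log(N / (N + 1)), which is slightly negative, so its TF-IDF score would be negative too.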
But I am confused about two things:
Code:
documentNumber = 0
for word in documentList[documentNumber].split(None):
    words[word] = tfidf(word,documentList[documentNumber],documentList)
Should TFIDF be calculated on the first document only?