this page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:
As tf–idf is a very often used for text features, there is also another class called TfidfVectorizer that combines all the option of CountVectorizer and TfidfTransformer in a single model.
then I followed the code and use fit_transform() on my corpus. How to get the weight of each feature computed by fit_transform()?
I tried:
In [39]: vectorizer.idf_
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-5475eefe04c0> in <module>()
----> 1 vectorizer.idf_
AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'
but this attribute is missing.
Thanks
Since version 0.15, the tf-idf score of each feature can be retrieved via the attribute idf_
of the TfidfVectorizer
object:
from sklearn.feature_extraction.text import TfidfVectorizer
corpus = ["This is very strange",
"This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)
idf = vectorizer.idf_
print dict(zip(vectorizer.get_feature_names(), idf))
Output:
{u'is': 1.0,
u'nice': 1.4054651081081644,
u'strange': 1.4054651081081644,
u'this': 1.0,
u'very': 1.0}
As discussed in the comments, prior to version 0.15, a workaround is to access the attribute idf_
via the supposedly hidden _tfidf
(an instance of TfidfTransformer
) of the vectorizer:
idf = vectorizer._tfidf.idf_
print dict(zip(vectorizer.get_feature_names(), idf))
which should give the same output as above.