I am trying to run an example from Stack Overflow (linked here).
I have copied the code below:
from sklearn.feature_extraction.text import TfidfVectorizer
text_files = ['file1.txt', 'file2.txt']
documents = [open(f) for f in text_files]
tfidf = TfidfVectorizer().fit_transform(documents)
# no need to normalize, since Vectorizer will return normalized tf-idf
pairwise_similarity = tfidf * tfidf.T
The only thing I added is this line:
text_files = ['file1.txt', 'file2.txt']
When I run the code, I get this error:
File "C:\Python33\lib\site-packages\sklearn\feature_extraction\text.py", line 195, in <lambda>
return lambda x: strip_accents(x.lower())
AttributeError: '_io.TextIOWrapper' object has no attribute 'lower'
file1.txt and file2.txt are input text files. Am I using the wrong format for text_files? What is the reason for this error, and how can I fix it? I would really appreciate any help.
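open(f) returns an _io.TextIOWrapper object (an open file handle), not a string. By default the vectorizer expects each document to be a string and calls .lower() on it, which is why it fails.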
Try changing
documents = [open(f) for f in text_files]
to
documents = [open(f).read() for f in text_files]
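A slightly more defensive variant of the same fix is sketched below. It reads each file into a string and closes the handle afterwards. The helper names read_documents and pairwise_similarity are made up for illustration, and the UTF-8 encoding is an assumption about the input files, not something stated in the question.

```python
from pathlib import Path


def read_documents(paths, encoding="utf-8"):
    """Return the contents of each file as a string.

    Hypothetical helper; the utf-8 default is an assumption about the files.
    Path.read_text opens the file, reads it, and closes it again.
    """
    return [Path(p).read_text(encoding=encoding) for p in paths]


def pairwise_similarity(paths):
    """Compute the tf-idf similarity matrix for the given text files."""
    # Imported here so read_documents can be used without scikit-learn.
    from sklearn.feature_extraction.text import TfidfVectorizer

    documents = read_documents(paths)  # list of str, not file handles
    tfidf = TfidfVectorizer().fit_transform(documents)
    # Rows are already L2-normalized, so this product gives cosine similarities.
    return tfidf * tfidf.T
```

Calling pairwise_similarity(['file1.txt', 'file2.txt']) should then return a 2x2 sparse matrix. Alternatively, TfidfVectorizer takes an input parameter: with input='filename' you can pass the list of paths directly, and with input='file' you can pass open file objects, as in the original code.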