I trained a ridge classifier on a large amount of data, using a TF-IDF vectorizer
to vectorize the data, and it used to work fine. But now I am getting an error:
'max_df corresponds to < documents than min_df'
The data is stored in MongoDB.
I tried various options to solve it, and finally, when I deleted a collection in MongoDB that had only 1 document (1 record), it worked normally and completed the training as usual.
But I need a solution which does not require deleting the record as I need that record.
Also, I do not understand the error, as it occurs only on my machine. The script used to work fine on my system even while this record was present in the DB. The script works fine on other systems as well.
Could someone help please?
That error is telling you that your max_df value, once converted to a number of documents, is less than your min_df value.
For example:
max_df = 0.7 # Removes terms with DF higher than 70% of the documents
min_df = 5 # Terms must have DF >= 5 to be considered
Now suppose that the total number of documents in your corpus is 7. Then max_df becomes 0.7 * 7 = 4.9 documents while min_df is still 5, so max_df < min_df. That should never happen, because it means that no terms will be considered: no term can have a DF that is both at most 4.9 and at least 5.
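
To make the arithmetic concrete, here is a minimal sketch of the comparison described above. It is not scikit-learn's own code; the helper names to_doc_count and thresholds_conflict are hypothetical, and it assumes that an integer setting is an absolute document count while a float is a proportion of the corpus.

```python
def to_doc_count(threshold, n_docs):
    """Convert a min_df/max_df setting into a document count.

    Assumption: ints are absolute counts, floats are a proportion of n_docs.
    """
    if isinstance(threshold, int):
        return threshold
    return threshold * n_docs


def thresholds_conflict(n_docs, min_df, max_df):
    """Return True when max_df maps to fewer documents than min_df."""
    return to_doc_count(max_df, n_docs) < to_doc_count(min_df, n_docs)


# The example above: 7 documents, max_df = 0.7, min_df = 5.
print(thresholds_conflict(7, min_df=5, max_df=0.7))    # True: about 4.9 < 5

# The same settings on a larger corpus of 100 documents.
print(thresholds_conflict(100, min_df=5, max_df=0.7))  # False: 70 >= 5
```

The same check also flags a corpus of a single document when max_df is a proportion below 1.0 and min_df is 1, since 0.7 * 1 < 1. Whether that matches your setup depends on how many documents actually reach the vectorizer and on your min_df and max_df values, so it is worth printing both before fitting.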