Top "Information-retrieval" questions

Information retrieval is the area of study concerned with retrieving documents, information, or metadata from a collection of unstructured or semi-structured data.

Python: tf-idf-cosine: to find document similarity

I was following a tutorial which was available at Part 1 & Part 2. Unfortunately, the author didn't have the time for …

python machine-learning nltk information-retrieval tf-idf
What is the best way to compute trending topics or tags?

Many sites offer some statistics like "The hottest topics in the last 24h". For example, Topix.com shows this in …

algorithm tags information-retrieval
Cosine similarity and tf-idf

I am confused by the following comment about TF-IDF and Cosine Similarity. I was reading up on both and then …

information-retrieval vsm cosine-similarity tf-idf
How to specify two Fields in Lucene QueryParser?

I read "How to incorporate multiple fields in QueryParser?" but I didn't get it. At the moment I have a …

java parsing lucene lucene.net information-retrieval
Reverse sort and argsort in python

I'm trying to write a function in Python (still a noob!) which returns indices and scores of documents ordered by …

python numpy scipy information-retrieval sparse-matrix
TF-IDF implementations in python

What are the standard tf-idf implementations/APIs available in Python? I've come across the one in nltk. I want to …

python nltk information-retrieval tf-idf
Wikipedia text download

I am looking to download full Wikipedia text for my college project. Do I have to write my own spider …

text wikipedia web-crawler information-retrieval
What is the default list of stopwords used in Lucene's StopFilter?

Lucene has a default StopFilter (http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/core/StopFilter.html), does anyone …

java apache lucene information-retrieval stop-words
Java Open Source Text Mining Frameworks

I want to know what is the best open source Java-based framework for Text Mining, to use both Machine …

java frameworks machine-learning nlp information-retrieval
How to parse the data from Google Alerts?

Firstly, how would you get Google Alerts information into a database other than to parse the text of the email …

database information-retrieval google-alerts