Top "Information-retrieval" questions

Information retrieval is an area of study concerned with retrieving documents, information, or metadata from a collection of unstructured or semi-structured data.

How to build a simple inverted index?

I want to build a simple indexing function for a search engine without using any API such as Lucene. In the inverted index, …

indexing information-retrieval
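
For the inverted-index question above, a minimal sketch of the general idea in plain Python, with no search library involved. The function names (build_inverted_index, search), the regex tokenizer, and the sample documents are illustrative assumptions, not taken from the question or its answers.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (an assumption): runs of word characters, lowercased.
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample collection, keyed by document id.
docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "quick brown dogs"}
index = build_inverted_index(docs)
print(search(index, "quick brown"))
```

Storing posting lists as sorted ids keeps the AND query a simple set intersection; a real engine would likely add stemming, positions, and ranking on top of this.
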
How can I extract only the main textual content from an HTML page?

Update: Boilerpipe appears to work really well, but I realized that I don't need only the main content because many …

java html information-retrieval jsoup
How to clear the cache in Solr?

I'm trying to compare the performance of different Solr queries. In order to get a fair test, I want to …

caching solr lucene information-retrieval
Why is log used when calculating term frequency weight and IDF, inverse document frequency?

The formula for IDF is log(N / df_t) instead of just N / df_t, where N = total documents in …

information-retrieval tf-idf
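
To make the IDF question above concrete, here is a small sketch comparing the raw ratio N / df_t with its logarithm. The base-10 logarithm, the function name idf, and the collection size of one million documents are assumptions chosen for illustration; the excerpt does not specify them.

```python
import math


def idf(n_docs, doc_freq):
    """Inverse document frequency as log10(N / df_t); base 10 is an assumption."""
    return math.log10(n_docs / doc_freq)


# Hypothetical collection size.
N = 1_000_000

# Compare the raw ratio with its logarithm for rare and common terms.
for df in (1, 100, 10_000, 1_000_000):
    print(f"df={df:>9}  N/df={N / df:>11.1f}  idf={idf(N, df):.1f}")
```

The raw ratio spans six orders of magnitude across these terms, while the logarithm compresses it to a range of 0 to 6, so a very rare term does not completely swamp the other terms in a weighted score.
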
Good documentation on structure tcp_info

I am working on getting the performance parameters of a TCP connection, and one of these parameters is the bandwidth. I …

tcp connection for-loop information-retrieval
Get image height and width of image stored on Amazon S3

I plan to store images on Amazon S3. How can I retrieve the following from Amazon S3: 1) file size, 2) image height, 3) image width?

image amazon-s3 information-retrieval
How to select stop words using tf-idf? (non-English corpus)

I have managed to evaluate the tf-idf function for a given corpus. How can I find the stopwords and the …

information-retrieval text-mining stop-words tf-idf
How to evaluate a search/retrieval engine using trec_eval?

Is there anybody who has used TREC_EVAL? I need a "TREC_EVAL for dummies". I'm trying to evaluate …

search-engine information-retrieval data-retrieval
Document search on partial words

I am looking for a document search engine (like Xapian, Whoosh, Lucene, Solr, Sphinx or others) which is capable of …

lucene solr information-retrieval xapian whoosh
Information retrieval (IR) vs data mining vs Machine Learning (ML)

People often throw around the terms IR, ML, and data mining, but I have noticed a lot of overlap between …

machine-learning data-mining information-retrieval