Top "Corpus" questions

A corpus most commonly refers to a collection of structured text.

What is the difference between a corpus and a lexicon in NLTK (Python)?

Can someone tell me the difference between corpora, a corpus, and a lexicon in NLTK? What is the movie data set? …

machine-learning nlp nltk corpus lexical
Create a corpus from many HTML files in R

I would like to create a Corpus for the collection of downloaded HTML files, and then read them in R …

html r xml-parsing text-mining corpus
R Corpus Is Messing Up My UTF-8 Encoded Text

I am simply trying to create a corpus from Russian, UTF-8 encoded text. The problem is, the Corpus method from …

r encoding utf-8 tm corpus
R: find most frequent group of words in corpus

Is there an easy way to find not only the most frequent terms, but also expressions (so more than one …

tm corpus word-frequency
NLTK - how to find out what corpora are installed from within Python?

I'm trying to load some corpora I installed with the NLTK installer but I got a: >>> from …

python nlp nltk corpus
Keep document ID with R corpus

I have searched Stack Overflow and the web and can only find partial solutions or some that don't work due to …

r text text-mining tm corpus
What is the meaning of the categories in the NLTK Reuters corpus?

I ran into problems when doing text topic classification. I got the data from the NLTK "reuters" corpus. However, when I …

python nlp nltk corpus
How do I count words in an NLTK plaintextcorpus faster?

I have a set of documents, and I want to return a list of tuples where each tuple has the …

python nlp nltk corpus