Top "Corpus" questions

A corpus most commonly refers to a large, structured collection of texts.

Find all locations / cities / places in a text

If I have a text containing, for example, a newspaper article in Catalan, how could I find …

python nltk corpus text-analysis tagged-corpus
R text mining documents from CSV file (one row per doc)

I am trying to work with the tm package in R, and have a CSV file of customer feedback with …

r text-mining documents corpus tm
Is there any free Treebank?

Is there any place where I can download a Treebank of English phrases for free or for less than $100? I need training data containing …

nlp tagging corpus
R's tm package for word count

I have a corpus with over 5000 text files. I would like to get individual word counts for each file after …

r word-count tm corpus text-analysis
Classification using movie review corpus in NLTK/Python

I'm looking to do some classification in the vein of NLTK Chapter 6. The book seems to skip a step in …

python nlp nltk sentiment-analysis corpus
Unable to convert a Corpus to Data Frame in R

I've looked at the other similar questions that have been posted here (like this), but the problem persists. I have …

r text-mining tm corpus
TermDocumentMatrix errors in R

I have been working through numerous online examples of the {tm} package in R, attempting to create a TermDocumentMatrix. Creating …

r text-mining tm corpus term-document-matrix
Counting words in a single document from a corpus in R and putting them in a data frame

I have text documents, each containing TV series spoilers. Each of the documents is …

r dataframe text-mining corpus
More efficient means of creating a corpus and DTM with 4M rows

My file has over 4M rows and I need a more efficient way of converting my data to a corpus …

r data.table corpus term-document-matrix qdap
Free Tagged Corpus for Named Entity Recognition

I am looking for a free tagged corpus on which to train a Named Entity Recognition system. Most …

nltk corpus named-entity-recognition tagged-corpus