Top "Text-mining" questions

Text mining is the process of deriving high-quality information from unstructured (textual) data.

Recognize PDF table using R

I'm trying to extract data from tables inside some PDF reports. I've seen some examples using either pdftools and similar …

r text-mining pdf-scraping
Sentiment analysis - WordNet, SentiWordNet lexicon

I need a list of positive and negative words with the weights assigned to words according to how strong and …

nlp text-mining wordnet sentiment-analysis
How to use OpenNLP to get POS tags in R?

Here is the R Code: library(NLP) library(openNLP) tagPOS <- function(x, ...) { s <- as.String(x) …

r nlp text-mining opennlp pos-tagger
Row sums for a large term-document matrix / simple_triplet_matrix {tm package}

So I have a very large term-document matrix: > class(ph.DTM) [1] "TermDocumentMatrix" "simple_triplet_matrix" > ph.DTM A …

r text-mining
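The matrix in this question is a slam `simple_triplet_matrix`, which stores only the non-zero entries as parallel vectors `i`, `j`, `v`. Calling `as.matrix()` on it materializes every zero, whereas in R `slam::row_sums()` works on the triplets directly. Below is a minimal Python sketch of the same idea. It assumes 0-based indices, and the function and variable names are illustrative rather than taken from the question.

```python
def triplet_row_sums(i, v, nrow):
    """Row sums of a sparse matrix stored in triplet form.

    i    -- row index of each stored (non-zero) entry, 0-based here
            (slam's simple_triplet_matrix uses 1-based indices)
    v    -- value of each stored entry, parallel to i
    nrow -- number of rows, so rows with no stored entries still get 0
    """
    sums = [0] * nrow
    # Only the stored entries are visited; zeros are never materialized.
    for row, value in zip(i, v):
        sums[row] += value
    return sums


# 3 terms x 4 documents; column indices (j) are not needed for row sums.
i = [0, 0, 1, 2, 2]
v = [2, 1, 5, 1, 1]
print(triplet_row_sums(i, v, nrow=3))  # [3, 5, 2]
```

The cost is proportional to the number of non-zero entries rather than rows × columns. That is what makes this approach viable for a very large term-document matrix.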
tm: read in data frame, keep text IDs, construct DTM and join to other dataset

I'm using the tm package. Say I have a data frame with 2 columns and 500 rows. The first column is ID which is …

r text-mining tm
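The essential point in this workflow is to carry each text's ID through as the document's name, so that rows of the document-term matrix can be joined back to the other dataset by that key. The following is a minimal Python sketch of that pattern, not tm code. It assumes the input is a mapping from ID to raw text, uses a naive whitespace tokenizer, and the helper names `build_dtm` and `join_on_id` are hypothetical.

```python
from collections import Counter


def build_dtm(docs):
    """Build sparse term counts keyed by document ID.

    docs -- dict mapping document ID -> raw text (hypothetical input shape)
    Returns (vocab, rows): the sorted vocabulary and a dict ID -> Counter.
    """
    # Naive lowercase + whitespace tokenization; real preprocessing does more.
    rows = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    vocab = sorted(set().union(*rows.values()))
    return vocab, rows


def join_on_id(rows, other):
    """Inner join: attach term counts to each record in `other` with the same ID."""
    return {
        doc_id: {**other[doc_id], "terms": counts}
        for doc_id, counts in rows.items()
        if doc_id in other
    }


docs = {"a1": "Good service good food", "a2": "slow service"}
ratings = {"a1": {"rating": 5}, "a2": {"rating": 2}, "a3": {"rating": 4}}
vocab, rows = build_dtm(docs)
joined = join_on_id(rows, ratings)
print(vocab)                          # ['food', 'good', 'service', 'slow']
print(joined["a1"]["terms"]["good"])  # 2
```

Because the join is keyed on the ID rather than on row position, reordering or dropping documents during preprocessing cannot silently misalign the two datasets.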
How to calculate TF*IDF for a single new document to be classified?

I am using document-term vectors to represent a collection of document. I use TF*IDF to calculate the term weight …

machine-learning classification information-retrieval text-mining document-classification
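A common answer to this question is to compute the IDF weights once from the training collection and reuse them unchanged for any new document. Terms never seen in training are dropped, so the new vector lives in the same feature space as the training vectors. Below is a minimal Python sketch, assuming raw term counts for TF and the plain idf(t) = log(N / df(t)) form. Smoothed and sublinear-TF variants are also widely used, and the function names are illustrative.

```python
import math
from collections import Counter


def fit_idf(corpus):
    """Compute idf(t) = log(N / df(t)) from the training collection only.

    corpus -- list of token lists, one per training document
    """
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Weight a new document with the *training* IDF; unseen terms are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


corpus = [["cat", "sat"], ["dog", "sat"], ["cat", "ran"]]
idf = fit_idf(corpus)
vec = tfidf_vector(["cat", "cat", "bird"], idf)
# Only 'cat' survives: 'bird' never appeared in the training collection.
print(vec)
```

Recomputing IDF with the new document included would shift every training weight. Freezing the training IDF keeps the classifier's inputs consistent between training and prediction.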
Better text documents clustering than tf/idf and cosine similarity?

I'm trying to cluster the Twitter stream. I want to put each tweet into a cluster that talks about the …

machine-learning data-mining cluster-analysis text-mining
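To have a reference point for comparing alternatives, here is a minimal Python sketch of the baseline this question starts from. It computes cosine similarity over sparse TF-IDF-style vectors and applies a simple single-pass "leader" clustering that suits streams. Each incoming vector joins the most similar existing cluster if the similarity reaches a threshold; otherwise it starts a new cluster. The threshold value and the use of each cluster's first member as its representative are assumptions for illustration, not the asker's method.

```python
import math


def cosine(a, b):
    """Cosine similarity between sparse vectors stored as term -> weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def leader_cluster(vectors, threshold):
    """Single-pass clustering: compare each vector to every cluster's leader
    (its first member) and join the most similar one if sim >= threshold."""
    leaders, labels = [], []
    for vec in vectors:
        best, best_sim = None, threshold
        for idx, leader in enumerate(leaders):
            sim = cosine(vec, leader)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:  # nothing similar enough: start a new cluster
            leaders.append(vec)
            best = len(leaders) - 1
        labels.append(best)
    return labels


tweets = [{"a": 1.0}, {"a": 1.0, "b": 0.1}, {"c": 1.0}]
print(leader_cluster(tweets, threshold=0.5))  # [0, 0, 1]
```

Each tweet is compared against every leader, so cost grows with the number of clusters. Short, sparse tweets also share few terms, which is one reason plain TF-IDF with cosine similarity often performs poorly on this data.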
Removing overly common words (those that occur in more than 80% of the documents) in R

I am working with the 'tm' package in R to create a corpus. I have done most of the preprocessing steps. …

r text-mining tm
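The filter this question asks for is based on document frequency: count the documents each term appears in, then drop terms whose share exceeds the cutoff. In tm, the `TermDocumentMatrix` documentation describes a `bounds` control option for document-frequency limits. Below is a language-neutral sketch of the computation in Python, assuming tokenized documents and a strict "more than 80%" cutoff. The function name is illustrative.

```python
from collections import Counter


def drop_common_terms(docs, max_doc_fraction=0.8):
    """Remove terms that occur in more than max_doc_fraction of the documents.

    docs -- list of token lists
    Returns (filtered_docs, dropped_terms).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency, not raw term count
    too_common = {t for t, c in df.items() if c / n_docs > max_doc_fraction}
    filtered = [[t for t in tokens if t not in too_common] for tokens in docs]
    return filtered, too_common


docs = [["the", "data"], ["the", "data", "mining"], ["the", "data"],
        ["the", "data", "text"], ["the", "r"]]
filtered, dropped = drop_common_terms(docs)
print(dropped)      # {'the'}  ('data' is in exactly 80%, so it is kept)
print(filtered[1])  # ['data', 'mining']
```

Note the strict inequality: a term present in exactly 80% of documents survives. Use `>=` instead if the boundary case should be removed too.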
Best clustering algorithm? (simply explained)

Imagine the following problem: You have a database containing about 20,000 texts in a table called "articles". You want to connect …

algorithm text cluster-analysis data-mining text-mining