Top "Text-mining" questions

Text mining is the process of deriving high-quality information from unstructured text.

Are there APIs for text analysis/mining in Java?

I want to know if there is an API to do text analysis in Java. Something that can extract all …

java api nlp analysis text-mining
Text classification/categorization algorithm

My objective is to [semi]automatically assign texts to different categories. There's a set of user-defined categories and a …

algorithm text-mining document-classification
R Text Mining: Counting the number of times a specific word appears in a corpus?

I have seen this question answered in other languages but not in R. [Specifically for R text mining] I have …

r count text-mining phrase
Bigrams instead of single words in a term-document matrix using R and RWeka

I've found a way to use bigrams instead of single tokens in a term-document matrix. The solution has been …

r text text-mining
How to select stop words using tf-idf? (non-English corpus)

I have managed to evaluate the tf-idf function for a given corpus. How can I find the stopwords and the …

information-retrieval text-mining stop-words tf-idf
TermDocumentMatrix errors in R

I have been working through numerous online examples of the {tm} package in R, attempting to create a TermDocumentMatrix. Creating …

r text-mining tm corpus term-document-matrix
R: invalid multibyte string 1

I'm new to R and am now studying text mining using the "tm" package. I have a problem mapping text to …

r utf-8 text-mining multibyte
How can I cluster documents using k-means (FLANN with Python)?

I want to cluster documents based on similarity. I have tried ssdeep (similarity hashing), which is very fast, but I was told …

nlp cluster-analysis data-mining k-means text-mining
Emoticons in Twitter Sentiment Analysis in R

How do I handle/get rid of emoticons so that I can sort tweets for sentiment analysis? Getting: Error in …

r text-mining iconv sentiment-analysis
Counting words in a single document from a corpus in R and putting them in a data frame

I have text documents, each containing TV series spoilers. Each of the documents is …

r dataframe text-mining corpus