I have a corpus with over 5000 text files. I would like to get individual word counts for each file after pre-processing each one (converting to lowercase, removing stopwords, etc.). I haven't had any luck getting the word counts for the individual text files. Any help would be appreciated.
library(tm)
revs <- Corpus(DirSource("data/"))
revs <- tm_map(revs, tolower)
revs <- tm_map(revs, removeWords, stopwords("english"))
revs <- tm_map(revs, removePunctuation)
revs <- tm_map(revs, removeNumbers)
revs <- tm_map(revs, stripWhitespace)
dtm <- DocumentTermMatrix(revs)
As Tyler notes, your question is incomplete without a reproducible example. Here's how to make a reproducible example for this kind of question: use the data that comes built in with the package:
library("tm") # version 0.6, you seem to be using an older version
data(crude)
revs <- tm_map(crude, content_transformer(tolower))
revs <- tm_map(revs, removeWords, stopwords("english"))
revs <- tm_map(revs, removePunctuation)
revs <- tm_map(revs, removeNumbers)
revs <- tm_map(revs, stripWhitespace)
dtm <- DocumentTermMatrix(revs)
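If you want to sanity-check the matrix before counting, tm's inspect() will print a small slice of it (the 1:5 indices here are just an arbitrary peek, not anything special):

# Documents are rows, distinct terms are columns
dim(dtm)
# Show a small corner of the sparse matrix
inspect(dtm[1:5, 1:5])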
And here's how to get a word count per document: each row of the dtm is one document, so summing across the columns of a row gives the word count for that document:
# Word count per document
rowSums(as.matrix(dtm))
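One caveat for your 5000-file corpus: as.matrix() converts the sparse dtm into a dense matrix, which can use a lot of memory. The slam package (which tm uses internally) can sum the sparse matrix directly; row_sums() is slam's sparse equivalent of rowSums():

library(slam)
# Same per-document counts, without densifying the dtm
row_sums(dtm)

Either way, the result is a named vector (document name -> count), and the counts reflect your pre-processing, so stopwords, punctuation, and numbers are already excluded.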