FUN-error after running 'tolower' while making Twitter wordcloud

Anne Boysen · Jan 3, 2015 · Viewed 16.2k times

I am trying to create a word cloud from Twitter data, but I get the following error:

Error in FUN(X[[72L]], ...) : 
  invalid input '������������❤������������ "@xxx:bla, bla, bla... http://t.co/56Fb78aTSC"' in 'utf8towcs' 

The error appears after running the line "mytwittersearch_corpus <- tm_map(mytwittersearch_corpus, tolower)" in the following code:

mytwittersearch_list <- sapply(mytwittersearch, function(x) x$getText())

mytwittersearch_corpus <- Corpus(VectorSource(mytwittersearch_list))
mytwittersearch_corpus <- tm_map(mytwittersearch_corpus, tolower)
mytwittersearch_corpus <- tm_map(mytwittersearch_corpus, removePunctuation)
mytwittersearch_corpus <- tm_map(mytwittersearch_corpus, function(x) removeWords(x, stopwords()))

I have read elsewhere that this can happen when R has difficulty processing symbols, emoticons, and letters from non-English languages, but that does not appear to be the issue with the tweets that trigger the error. I also tried the following two lines:

mytwittersearch_corpus <- tm_map(mytwittersearch_corpus, function(x) iconv(enc2utf8(x), sub = "byte"))
mytwittersearch_corpus <- tm_map(mytwittersearch_corpus, content_transformer(function(x) iconv(enc2utf8(x), sub = "bytes")))

Neither of these helps. I also get an error saying that the function content_transformer cannot be found, even though the tm package is checked in the RStudio package list and loaded.

I'm running this on OS X 10.6.8 and using the latest RStudio.

Answer

RUser · Jan 4, 2015

I use this code to get rid of the problem characters:

tweets$text <- sapply(tweets$text,function(row) iconv(row, "latin1", "ASCII", sub=""))
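
As a rough illustration of where that line could sit in the question's pipeline, here is a minimal sketch. The sample strings and the names tweet_text, clean_text and lower_text are made up for the example. It assumes that dropping every non-ASCII character (accented letters and emoji included) is acceptable for a word cloud. The tm part only runs if tm is installed, and content_transformer is, as far as I know, only available from tm 0.6 onwards, which may be why it was not found in the question.

```r
# Hypothetical sample data standing in for the tweet texts.
# The second string contains raw bytes above 0x7F, much like an emoji would.
tweet_text <- c("Hello WORLD http://t.co/abc",
                "Caf\xe9 LOVE \xe2\x9d\xa4 #R")

# Treat every byte as latin1 and drop whatever has no ASCII equivalent.
# iconv() is vectorised, so sapply() is not strictly needed.
clean_text <- iconv(tweet_text, from = "latin1", to = "ASCII", sub = "")

# tolower() now only sees plain ASCII and should no longer fail.
lower_text <- tolower(clean_text)
print(lower_text)

# Optional: build the corpus from the cleaned text if tm is installed.
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(clean_text))
  # Assumption: content_transformer() exists only in tm 0.6 or later.
  if (exists("content_transformer", envir = asNamespace("tm"))) {
    corpus <- tm::tm_map(corpus, tm::content_transformer(tolower))
  }
  corpus <- tm::tm_map(corpus, tm::removePunctuation)
  corpus <- tm::tm_map(corpus, tm::removeWords, tm::stopwords("english"))
}
```

The point of the sketch is the ordering: the text is cleaned before it goes into the corpus, so every later tm_map step works on ASCII only. If the accented letters or emoji matter for the analysis, this approach is too aggressive and a different conversion would be needed.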