R: invalid multibyte string 1

Damien · Nov 5, 2014 · Viewed 13.5k times

I'm new to R.

I'm currently studying text mining with the "tm" package.

I have a problem converting text to lower case:

library(tm)
sms_raw <- read.csv(............)
sms_corpus <- Corpus(VectorSource(sms_raw$text))
sms_corpus <- tm_map(sms_corpus, content_transformer(tolower))
error: invalid multibyte string 1

I thought my CSV file might not be UTF-8, so I re-saved it as UTF-8, but that didn't work.

My OS is Windows 8.1.

If anyone has a solution to this problem, please let me know.

Answer

Damien · Nov 7, 2014

I solved the error easily with an encoding function.

The column named text in my file contains multibyte characters.

So I typed:

sms_raw$text <- iconv(enc2utf8(sms_raw$text), sub = "byte")

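This command converts the multibyte 'text' column to UTF-8. With sub = "byte", any bytes that cannot be converted are replaced by their hex codes in the form <xx>, so tolower no longer fails on them.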
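
A minimal base-R sketch of why this helps. It reproduces the problem without tm by calling tolower directly, and assumes a UTF-8 session locale. The sample string, the small data frame standing in for the real sms_raw, and the guess that the stray byte is Latin-1 are all made up for illustration; the encoding of the real file is not known.

```r
# Hypothetical sample: 0xE9 is "é" in Latin-1, but on its own it is not valid UTF-8.
bad <- "caf\xe9 SMS"

# validUTF8() flags elements that are not valid UTF-8 (FALSE here).
validUTF8(bad)

# In a UTF-8 locale this typically fails with "invalid multibyte string 1";
# tryCatch keeps the script running and captures the message instead.
msg <- tryCatch(tolower(bad), error = function(e) conditionMessage(e))

# Option 1: keep the text and replace unconvertible bytes with <xx> hex codes.
fixed <- iconv(bad, from = "UTF-8", to = "UTF-8", sub = "byte")
tolower(fixed)  # "caf<e9> sms"

# Option 2: if the real encoding is known (ASSUMED Latin-1 here), convert properly.
converted <- iconv(bad, from = "latin1", to = "UTF-8")

# Applied to a stand-in data frame (hypothetical, not the real sms_raw).
# stringsAsFactors = FALSE keeps text as character, which iconv() expects.
sms_raw <- data.frame(text = c("Hello WORLD", bad), stringsAsFactors = FALSE)
sms_raw$text <- iconv(sms_raw$text, from = "UTF-8", to = "UTF-8", sub = "byte")
lowered <- tolower(sms_raw$text)
```

Option 1 never fails but leaves markers such as <e9> in the text. Option 2 preserves the original characters, but only if the source encoding is guessed correctly. Passing fileEncoding to read.csv is another place to declare the source encoding, if it is known.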