Error faced while using TM package's VCorpus in R

Question 1

r text-mining tm text-analysis

Saharsh Gandhi · Nov 21, 2017 · Viewed 7.5k times · Source

Answer

I ran into the same problem after updating the tm package to version 0.7-2. The documentation for DataframeSource() describes the layout the data frame must have:

Details

A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.

I solved it with the following code:

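# Read the input without converting strings to factors
df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

# Build the two columns DataframeSource() expects: doc_id first, then text
df_title <- data.frame(doc_id = row.names(df_cmp),
                       text = df_cmp$English.title,
                       stringsAsFactors = FALSE)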

Alternatively, try renaming the relevant columns of your existing data frame to doc_id and text.
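As a rough sketch of the same idea for an arbitrary data frame, the helper below moves a chosen text column into the documented layout and keeps the remaining columns as metadata. The function name as_tm_frame, its arguments, and the sample data are all made up for illustration; they are not part of tm. The call into tm at the end is guarded so the reshaping step can be tried even where tm is not installed.

```r
# Hypothetical helper (not part of tm): reshape a data frame so that
# doc_id is the first column and text is the second.
as_tm_frame <- function(df, text_col, id_col = NULL) {
  stopifnot(is.data.frame(df), text_col %in% names(df))

  # Use the given id column, or fall back to row positions
  ids <- if (is.null(id_col)) seq_len(nrow(df)) else df[[id_col]]
  ids <- as.character(ids)
  stopifnot(!anyDuplicated(ids))  # identifiers must be unique

  # Every other column is carried along as document-level metadata
  meta <- df[, setdiff(names(df), c(text_col, id_col)), drop = FALSE]

  out <- data.frame(doc_id = ids,
                    text = as.character(df[[text_col]]),
                    stringsAsFactors = FALSE)
  cbind(out, meta)
}

# Made-up sample data standing in for the real file
data <- data.frame(title = c("first document", "second document"),
                   year = c(2016, 2017),
                   stringsAsFactors = FALSE)

prepared <- as_tm_frame(data, text_col = "title")
print(names(prepared))

# Only build the corpus when tm is available
if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::DataframeSource(prepared))
  print(length(corpus))
}
```

Converting the identifiers with as.character() matters because the documentation asks for a string identifier, and the uniqueness check catches duplicate ids before they reach DataframeSource().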

Question 2

I get the error below when using the tm package in R.

library("tm")
Loading required package: NLP
Warning messages:
1: package ‘tm’ was built under R version 3.4.2 
2: package ‘NLP’ was built under R version 3.4.1

corpus <- VCorpus(DataframeSource(data))

Error: all(!is.na(match(c("doc_id", "text"), names(x)))) is not TRUE

I have tried several things, such as reinstalling the package and updating to a newer version of R, but the error persists. The same code runs on the same data file on another system with the same version of R.
