Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

LG3555 · Aug 12, 2019

Using the tidytext package, I want to transform my tibble into a one-token-per-document-per-row format. I converted the text column of my tibble from factor to character, but I still get the same error.

text_df <- tibble(line = 1:3069, text = text)

My tibble looks like this, with a column as character:

# A tibble: 3,069 x 2
line text$text  
<int> <chr> 

However, when I try to apply unnest_tokens:

text_df %>%
  unnest_tokens(word, text$text)

I always get the same error:

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1.

What is the issue in my code?

PS: I've looked at different posts on the topic but no luck.

Thank you

Answer

shs · Aug 18, 2019

At least part of the problem is that the variable name contains a "$". What you are effectively doing in your code is trying to get the element "text" from the object "text", which is likely the function graphics::text and is not subsettable.

Change the name of "text$text" or wrap it in backticks:

text_df %>% 
   unnest_tokens(word, `text$text`)

In general you should avoid using special characters in variable names, because it only leads to errors like this one.
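
Another possible cause, offered as a guess since the original `text` object is not shown: a header printed as `text$text` is how tibble displays a data-frame column, which happens when the object passed to tibble() is itself a data frame with a column called `text`. The sketch below uses a made-up two-row `text` data frame (an assumption, not the asker's data) to show the difference between passing the data frame and passing its character column.

```r
library(tibble)
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the asker's `text` object:
# a data frame with a single character column named `text`.
text <- data.frame(
  text = c("first line here", "second line"),
  stringsAsFactors = FALSE
)

# Passing the whole data frame produces a data-frame column,
# which tibble prints with the header `text$text`.
packed_df <- tibble(line = seq_len(nrow(text)), text = text)

# Passing the character vector produces a plain <chr> column named `text`.
text_df <- tibble(line = seq_len(nrow(text)), text = text$text)

# With a plain character column, the input can be referred to by its bare name.
tokens <- text_df %>%
  unnest_tokens(word, text)
```

If this is what happened, building the tibble from the character vector (or renaming the column) gives unnest_tokens an ordinary column name to work with.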

If your problem persists, please provide a minimal reproducible example: How to make a great R reproducible example