Removing stop words with tidytext

DIGSUM · Apr 16, 2017 · Viewed 12.9k times

Using tidytext, I have this code:

data(stop_words)
tidy_documents <- tidy_documents %>%
      anti_join(stop_words)

I want to use the stop words built into the package to overwrite the data frame tidy_documents with a copy of itself in which every word that appears in stop_words has been removed.

I get this error:

Error: No common variables. Please specify `by` param.
Traceback:

1. tidy_documents %>% anti_join(stop_words)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(expr, envir, enclos)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. anti_join(., stop_words)
10. anti_join.tbl_df(., stop_words)
11. common_by(by, x, y)
12. stop("No common variables. Please specify `by` param.", call. = FALSE)

Answer

Rohit · Oct 19, 2017

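You can skip the confusing anti_join() function and use the simpler filter() instead. This assumes the column holding the tokens in tidy_documents is named word; substitute your own column name if it differs: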

tidy_documents <- tidy_documents %>%
  filter(!word %in% stop_words$word)
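
If you would rather keep anti_join(), the error message suggests the fix: the two data frames share no column name, so the join has to be told which columns to match through `by`. Below is a minimal sketch, assuming the token column is called `term`. That name and the toy data are made up for illustration, since the question does not show what tidy_documents actually contains. The stop_words dataset itself has the columns `word` and `lexicon`.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for tidy_documents. The token column is deliberately
# named "term" so that it shares no column name with stop_words, which
# reproduces the situation behind the "No common variables" error.
tidy_documents <- tibble(
  document = c(1, 1, 1, 2, 2),
  term     = c("the", "fox", "jumps", "of", "tidytext")
)

# A named `by` vector maps the column on the left ("term") to the
# column on the right ("word"); rows with a match in stop_words are dropped.
cleaned <- tidy_documents %>%
  anti_join(stop_words, by = c("term" = "word"))

# The filter() approach from above, adapted to the same column name.
cleaned_filter <- tidy_documents %>%
  filter(!term %in% stop_words$word)
```

Both versions should keep the same rows. The filter() form is shorter, while anti_join() with an explicit `by` stays within the join vocabulary used elsewhere in tidytext workflows.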