Select columns based on string match - dplyr::select

Timm S. picture Timm S. · Sep 19, 2014 · Viewed 99.9k times · Source

I have a data frame ("data") with lots and lots of columns. Some of the columns contain a certain string ("search_string").

How can I use dplyr::select() to give me a subset including only the columns that contain the string?

I tried:

# columns as boolean vector
select(data, grepl("search_string",colnames(data)))

# columns as vector of column names names 
select(data, colnames(data)[grepl("search_string",colnames(data))]) 

Neither of them work.

I know that select() accepts numeric vectors as substitute for columns e.g.:

select(data,5,7,9:20)

But I don't know how to get a numeric vector of columns IDs from my grepl() expression.

Answer

joran picture joran · Sep 19, 2014

Within the dplyr world, try:

select(iris,contains("Sepal"))

See the Selection section in ?select for numerous other helpers like starts_with, ends_with, etc.