split string with regex

Jeff Keller · Mar 22, 2013 · Viewed 14k times

I'm looking to split a string of a generic form, where square brackets denote the "sections" of the string. For example:

x <- "[a] + [bc] + 1"

And return a character vector that looks like:

"[a]"  " + "  "[bc]" " + 1"

EDIT: Ended up using this:

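x <- "[a] + [bc] + 1"
x <- gsub("\\[", ",[", x)   # insert a comma before each "["
x <- gsub("\\]", "],", x)   # insert a comma after each "]"
strsplit(x, ",")            # split on the inserted commas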
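
A note on the result of the approach above: strsplit returns a list, and because the modified string starts with a comma, the first piece is an empty string. The following is a minimal sketch of one way to tidy that up, assuming the input itself contains no commas (the variable name parts is only illustrative):

```r
x <- "[a] + [bc] + 1"
x <- gsub("\\[", ",[", x)
x <- gsub("\\]", "],", x)

parts <- strsplit(x, ",")[[1]]  # strsplit returns a list; take its first element
parts <- parts[nzchar(parts)]   # drop the empty string produced by the leading ","
parts
```

If the input could contain commas, a separator character that cannot occur in the data would be needed instead.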

Answer

IRTFM · Mar 22, 2013

I've seen TylerRinker's code and suspect it may be clearer than this, but this may serve as a way to learn a different set of functions. (I liked his better before I noticed that it split on spaces.) I tried adapting this to work with strsplit, but that function always removes the separators. Maybe this could be adapted to make a newstrsplit that splits at the separators but leaves them in? It would probably need to avoid splitting at the first or last position and to distinguish between opening and closing separators.

x <- "[a] + [bc] + 1"
# Note: the "." in ".\\[" also replaces the character before "[", so the space before "[bc]" is lost.
scan(text =                            # use scan to separate after insertion of commas
       gsub("\\]", "],",               # put a comma after each "]"
            gsub(".\\[", ",[", x)),    # put a comma before each "[" unless at first position
     what = "", sep = ",")             # read character fields separated by ","
#Read 4 items
#[1] "[a]"  " +"   "[bc]" " + 1"
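
The idea of a split that leaves the separators in can also be approached from the other direction: instead of splitting, extract every piece with regmatches and gregexpr. This is only a sketch, not the method used above; the function name split_keep_brackets is hypothetical, and it assumes the brackets are balanced and not nested.

```r
# Hypothetical helper: return the bracketed sections and the text between
# them, keeping the brackets and all whitespace.
split_keep_brackets <- function(x) {
  # Match either a "[...]" section or a run of characters containing no "["
  pattern <- "\\[[^\\]]*\\]|[^\\[]+"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

split_keep_brackets("[a] + [bc] + 1")
#[1] "[a]"  " + "  "[bc]" " + 1"
```

Because nothing is inserted into or removed from the string, the spaces around "+" survive and no special separator character is needed.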