Remove all text between two brackets

Michael Davidson · May 31, 2014

Suppose I have some text like this,

text<-c("[McCain]: We need tax policies that respect the wage earners and job creators. [Obama]: It's harder to save. It's harder to retire. [McCain]: The biggest problem with American healthcare system is that it costs too much. [Obama]: We will have a healthcare system, not a disease-care system. We have the chance to solve problems that we've been talking about... [Text on screen]: Senators McCain and Obama are talking about your healthcare and financial security. We need more than talk. [Obama]: ...year after year after year after year. [Announcer]: Call and make sure their talk turns into real solutions. AARP is responsible for the content of this advertising.")

and I would like to remove all of the text between the [ and ] (and the brackets themselves). What's the best way to do this? Here is my feeble attempt using regex and the stringr package:

str_extract(text, "\\[[a-z]*\\]")

Thanks for any help!

Answer

zx81 · May 31, 2014

You can do this with gsub, replacing every match with an empty string:

gsub("\\[[^\\]]*\\]", "", text, perl = TRUE)

What the regex means:

  \[                       # a literal '['
  [^\]]*                   # any character except ']', 0 or more times
                           # (greedy: matches as many as possible)
  \]                       # a literal ']'
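
To see the pattern in action, here is a minimal sketch in base R. The shortened input string and the variable names (no_tags, no_tags_posix, cleaned) are made up for illustration and are not from the question. Note that removing only the tags leaves the ": " that followed each speaker name, so the last variant, which also drops an optional colon and trailing whitespace, is an assumption about what output you probably want.

```r
# Hypothetical shortened input, for illustration only
text <- "[McCain]: We need tax policies. [Obama]: It's harder to save."

# Remove each bracketed tag. perl = TRUE is needed here so that the
# escaped \\] inside the character class is read as a literal ']'.
no_tags <- gsub("\\[[^\\]]*\\]", "", text, perl = TRUE)

# Same idea with R's default regex engine: a ']' placed first in a
# negated class is taken literally, so no escape and no perl = TRUE.
no_tags_posix <- gsub("\\[[^]]*\\]", "", text)

# Assumed extra step: also drop the colon and whitespace after each tag.
cleaned <- trimws(gsub("\\[[^\\]]*\\]:?\\s*", "", text, perl = TRUE))

print(no_tags)
print(cleaned)
```

Since gsub is vectorised, the same calls should work unchanged on a character vector with many transcripts.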