How to backreference in Ruby regular expression (regex) with gsub when I use grouping?

Konstantin picture Konstantin · Aug 22, 2012 · Viewed 17.4k times · Source

I would like to patch some text data extracted from web pages. sample:

t="First sentence. Second sentence.Third sentence."

There is no space after the point at the end of the second sentence. This sign me that the 3rd sentence was in a separate line (after a br tag) in the original document.

I want to use this regexp to insert "\n" character into the proper places and patch my text. My regex:

t2=t.gsub(/([.\!?])([A-Z1-9])/,$1+"\n"+$2)

But unfortunately it doesn't work: "NoMethodError: undefined method `+' for nil:NilClass" How can I properly backreference to the matched groups? It was so easy in Microsoft Word, I just had to use \1 and \2 symbols.

Answer

Joshua Cheek picture Joshua Cheek · Aug 22, 2012

You can backreference in the substitution string with \1 (to match capture group 1).

t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2") # => "First sentence. Second sentence.\nThird sentence!\nFourth sentence?\nFifth sentence."