Regex (grep) for multi-line search needed

Ciaran Archer picture Ciaran Archer · Sep 15, 2010 · Viewed 217.7k times · Source

Possible Duplicate:
How can I search for a multiline pattern in a file ? Use pcregrep

I'm running a grep to find any *.sql file that has the word select followed by the word customerName followed by the word from. This select statement can span many lines and can contain tabs and newlines.

I've tried a few variations on the following:

$ grep -liIr --include="*.sql" --exclude-dir="\.svn*" --regexp="select[a-zA-Z0-
9+\n\r]*customerName[a-zA-Z0-9+\n\r]*from"

This, however, just runs forever. Can anyone help me with the correct syntax please?

Answer

albfan picture albfan · Aug 23, 2011

Without the need to install the grep variant pcregrep, you can do multiline search with grep.

$ grep -Pzo "(?s)^(\s*)\N*main.*?{.*?^\1}" *.c

Explanation:

-P activate perl-regexp for grep (a powerful extension of regular expressions)

-z suppress newline at the end of line, substituting it for null character. That is, grep knows where end of line is, but sees the input as one big line.

-o print only matching. Because we're using -z, the whole file is like a single big line, so if there is a match, the entire file would be printed; this way it won't do that.

In regexp:

(?s) activate PCRE_DOTALL, which means that . finds any character or newline

\N find anything except newline, even with PCRE_DOTALL activated

.*? find . in non-greedy mode, that is, stops as soon as possible.

^ find start of line

\1 backreference to the first group (\s*). This is a try to find the same indentation of method.

As you can imagine, this search prints the main method in a C (*.c) source file.