Matching a multi-line pattern with PHP's preg_match()

Dmitriy Ryabinin · Jan 22, 2012 · Viewed 84.1k times

How can I extract the text subject from this HTML code with a PHP preg_match() regular expression?

      <table border=0>
  <tr>
  <td>


  <h2>subject</h2>



    </td>

All the whitespace and newlines are left in on purpose. The problem is extracting the subject name with a pattern that spans multiple lines.

Answer

mathematical.coffee · Jan 22, 2012

If you're looking for (e.g.) an h2 tag nested within a td tag, with only whitespace between the two, just use \s, which matches any whitespace character (spaces, tabs, newlines, etc.). For example:

preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i',$str,$matches);
// result is in $matches[1]
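
As a quick check, the pattern above can be run against the snippet from the question. This is only a sketch: the $str value below is the question's HTML retyped as a PHP string, which is an assumption about how the input arrives.

```php
<?php
// Sample input retyped from the question, with its odd whitespace kept.
$str = "      <table border=0>\n  <tr>\n  <td>\n\n\n  <h2>subject</h2>\n\n\n\n    </td>\n";

// \s* absorbs the spaces and newlines between the tags;
// (.*?) lazily captures the heading text.
$found = preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i', $str, $matches);

if ($found === 1) {
    echo $matches[1], PHP_EOL; // prints "subject"
}
```

preg_match() returns 1 on a match, 0 on no match, and false on error, so comparing against 1 avoids treating an error as "no match".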

For reference, the PHP manual lists the pattern modifiers you can pass to the preg_* functions. Two that may interest you are:

  • s ("dotall"): makes . match every character, including newlines. If the content of your <h2>...</h2> were spread over multiple lines, you would need

    preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#is',$str,$matches);
    

    so that the .*? can match across multiple lines (note the extra s after the closing delimiter).

  • m ("multiline"): makes ^ and $ match at the start/end of each line instead of only at the start/end of the whole string. You only need it if your pattern uses ^ or $ and you want them to match at every line in your input.
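
To make the difference between the two modifiers concrete, here is a small sketch. The input strings are made up for illustration and do not come from the question.

```php
<?php
// Hypothetical input: the heading text itself spans two lines.
$str = "<td>\n<h2>first line\nsecond line</h2>\n</td>";

// Without s, . cannot cross the newline inside the heading, so nothing matches.
$withoutS = preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#i', $str, $m1);

// With s, . also matches "\n", so the capture spans both lines.
$withS = preg_match('#<td>\s*<h2>(.*?)</h2>\s*</td>#is', $str, $m2);

// m affects only ^ and $: with it, ^ matches at the start of every line.
$text = "alpha\nbeta\ngamma";
$lineStarts  = preg_match_all('#^\w+#m', $text, $m3); // alpha, beta, gamma
$stringStart = preg_match_all('#^\w+#', $text, $m4);  // alpha only

var_dump($withoutS, $withS, $lineStarts, $stringStart); // int(0) int(1) int(3) int(1)
```

Note that the original pattern needs neither modifier for the question's input: \s already matches newlines, and the pattern contains no ^ or $.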