python regex match optional square brackets

user740875 · Aug 26, 2014

I have the following strings:

"R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003",
"H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]",
'''GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND
ERIC JOHN WATSON CA CA51/03 26 May 2003'''

I am trying to find a regular expression that matches all of them. I don't know how to match the optional square brackets around the date at the end of the string, e.g. [16 May 2014].

casename = re.compile(r'(^[A-Z][A-Za-z\'\(\) ]+\b[v|V]\b[A-Za-z\'\(\) ]+(.*?)[ \[ ]\d+    \w+ \d\d\d\d[\] ])', re.S) 

The date part at the end of the regex only matches dates in square brackets, not the ones without.

Thanks to everybody who answered. @Matt Clarkson, what I am trying to match is a judicial decision 'handle' in a much larger text. There is a large variation within those handles, but they all start at the beginning of a line, have 'v' for versus between the party names, and end with a date. The names of the parties are mostly, but not exclusively, in capitals. I am trying to get only one match per document and no false positives.

Answer

RevanProdigalKnight · Aug 26, 2014

I got all of them to match using this (You'll need to add the case-insensitive flag):

(^[a-z][a-z\'&\(\) ]+\bv\b[a-z&\'\(\) ]+(?:.*?) \[?\d+ \w+ \d{4}\]?)

Explanation:

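  • ( Begin capture group
    • ^ Match the start of the string (or of a line, if the multiline flag is set)
    • [a-z] Match a single letter
    • [a-z\'&\(\) ]+ Match one or more characters from this set (letters, apostrophe, ampersand, parentheses, space)
    • \b Match a word boundary
    • v Match the character 'v' literally (also 'V', given the case-insensitive flag)
    • \b Match a word boundary
    • [a-z&\'\(\) ]+ Match one or more characters from this set
    • (?: Begin non-capturing group
      • .*? Match any characters, as few as possible
    • ) End non-capturing group
    • \[?\d+ \w+ \d{4}\]? Match a date (day, month name, four-digit year), optionally surrounded by square brackets
  • ) End capture group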
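
As a rough check, the pattern above can be run against the three strings from the question. This is only a sketch: the names CASENAME, SAMPLES and find_handle are made up for illustration, and the re.S flag is an assumption added here (it is not part of the answer) so that .*? can cross the line break inside the third string.

```python
import re

# The answer's pattern, unchanged. re.I is the flag the answer asks for;
# re.S is an added assumption so that `.*?` can cross the line break
# inside the third sample string.
CASENAME = re.compile(
    r"(^[a-z][a-z\'&\(\) ]+\bv\b[a-z&\'\(\) ]+(?:.*?) \[?\d+ \w+ \d{4}\]?)",
    re.I | re.S,
)

# The three strings from the question (hypothetical variable name).
SAMPLES = [
    "R J BRUCE & OTHERS V B J & W L A EDWARDS And Ors CA CA19/02 27 February 2003",
    "H v DIRECTOR OF PROCEEDINGS [2014] NZHC 1031 [16 May 2014]",
    "GREGORY LANCASTER AND JOHN HENRY HUNTER V CULLEN INVESTMENTS LIMITED AND\n"
    "ERIC JOHN WATSON CA CA51/03 26 May 2003",
]


def find_handle(text):
    """Return the first matched case handle in `text`, or None."""
    match = CASENAME.search(text)
    return match.group(1) if match else None


for sample in SAMPLES:
    print(repr(find_handle(sample)))
```

Note that \[? and \]? are optional independently of each other, so a date with only one of the two brackets would match as well. If the handle sits in the middle of a larger text, re.M would additionally be needed so that ^ can match at the start of any line.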