How do I search for a pattern within a text file using Python combining regex & string/file operations and store instances of the pattern?

Carl Carlson picture Carl Carlson · May 7, 2012 · Viewed 171.6k times · Source

So essentially I'm looking for specifically a 4 digit code within two angle brackets within a text file. I know that I need to open the text file and then parse line by line, but I am not sure the best way to go about structuring my code after checking "for line in file".

I think I can either somehow split it, strip it, or partition, but I also wrote a regex which I used compile on and so if that returns a match object I don't think I can use that with those string based operations. Also I'm not sure whether my regex is greedy enough or not...

I'd like to store all instances of those found hits as strings within either a tuple or a list.

Here is my regex:

regex = re.compile("(<(\d{4,5})>)?")

I don't think I need to include all that much code considering its fairly basic so far.

Answer

Eli Bendersky picture Eli Bendersky · May 7, 2012
import re
pattern = re.compile("<(\d{4,5})>")

for i, line in enumerate(open('test.txt')):
    for match in re.finditer(pattern, line):
        print 'Found on line %s: %s' % (i+1, match.group())

A couple of notes about the regex:

  • You don't need the ? at the end and the outer (...) if you don't want to match the number with the angle brackets, but only want the number itself
  • It matches either 4 or 5 digits between the angle brackets

Update: It's important to understand that the match and capture in a regex can be quite different. The regex in my snippet above matches the pattern with angle brackets, but I ask to capture only the internal number, without the angle brackets.

More about regex in python can be found here : Regular Expression HOWTO