Finding Acronyms Using Regex In Python

jmulmer · Jul 22, 2013 · Viewed 7.3k times

I'm trying to use regex in Python to match acronyms separated by periods. I have the following code:

import re
test_string = "U.S.A."
pattern = r'([A-Z]\.)+'
print(re.findall(pattern, test_string))

The result of this is:

['A.']

I'm confused as to why this is the result. I know + is greedy, but why are the first occurrences of [A-Z]\. ignored?

Answer

Ro Yo Mi · Jul 22, 2013

Description

This regex will:

  • match period-separated acronyms like U.S.A. in a sentence
  • avoid matching uppercase words such as RADAR at the end of a sentence

(?:(?<=\.|\s)[A-Z]\.)+
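
As for why the original pattern returned ['A.']: when a pattern contains a capturing group, re.findall returns the text of that group rather than the whole match, and a repeated group keeps only its last repetition. The non-capturing group (?:...) above avoids this. A minimal sketch comparing the two forms, reusing the test string from the question (the variable names are only for illustration):

```python
import re

test_string = "U.S.A."

# Capturing group: findall returns the group's text, and a repeated
# group retains only its final repetition.
capturing = re.findall(r'([A-Z]\.)+', test_string)

# Non-capturing group: findall returns the entire match.
non_capturing = re.findall(r'(?:[A-Z]\.)+', test_string)

print(capturing)      # ['A.']
print(non_capturing)  # ['U.S.A.']
```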


Example

Live Example: http://www.rubular.com/r/9bslFxvfzQ

Sample Text

This is the U.S.A. we have RADAR.

Matches

U.S.A.
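
The same pattern can be tried in Python. The sketch below runs it on the sample text and on the bare string from the question. Because the lookbehind needs a period or whitespace before each letter, an acronym at the very start of a string loses its first letter. The `variant` pattern is a hypothetical adjustment, not part of the answer, that also accepts the start of the string.

```python
import re

# The answer's pattern, unchanged.
pattern = re.compile(r'(?:(?<=\.|\s)[A-Z]\.)+')

sample = "This is the U.S.A. we have RADAR."
sample_matches = pattern.findall(sample)
print(sample_matches)   # ['U.S.A.']

# Nothing precedes the first letter here, so the lookbehind fails on "U".
start_matches = pattern.findall("U.S.A.")
print(start_matches)    # ['S.A.']

# Hypothetical variant (an assumption, not from the answer):
# allow either the start of the string or a preceding period/whitespace.
variant = re.compile(r'(?:(?:^|(?<=[.\s]))[A-Z]\.)+')
variant_matches = variant.findall("U.S.A.")
print(variant_matches)  # ['U.S.A.']
```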