regex street address match

isuelt · Feb 22, 2012 · Viewed 35.5k times

While I know that matching a street address will never be perfect, I'm looking to create a couple of regex statements that will get close most of the time.

I'm trying to highlight an address. I'm not good at regex; I've tried to get close, but could someone help me understand how I can make this better?

String:

6 am - 11 pM , Palma Sola Elementary, 6806 Fifth Ave NW, Bradenton, FL 34209 Come find just near the dsfsd sa fsa fasdf asfsds 5001 west your momma doesn't live here my 2005 ford ranger,

Regex 1:

/\s+(\d{2,5}\s+)(?![a|p]m\b)(([a-zA-Z|\s+]{1,5}){1,2})?([\s|\,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)([\s|\,|.|\;]+)?(([a-zA-Z|\s+]{1,30}){1,2})([\s|\,|.]+)?\b(AK|AL|AR|AZ|CA|CO|CT|DC|DE|FL|GA|GU|HI|IA|ID|IL|IN|KS|KY|LA|MA|MD|ME|MI|MN|MO|MS|MT|NC|ND|NE|NH|NJ|NM|NV|NY|OH|OK|OR|PA|RI|SC|SD|TN|TX|UT|VA|VI|VT|WA|WI|WV|WY)([\s|\,|.]+)?(\s+\d{5})?([\s|\,|.]+)/i

(Sometimes there's just a street and city, but no state or zip)

Regex 2:

/\b(\d{2,5}\s+)(?![a|p]m\b)(NW|NE|SW|SE|north|south|west|east|n|e|s|w)?([\s|\,|.]+)?(([a-zA-Z|\s+]{1,30}){1,4})(court|ct|street|st|drive|dr|lane|ln|road|rd|blvd)/i

Fiddle with it: http://jsfiddle.net/isuelt/rMC6P/11/

Answer

Matt · Feb 22, 2012

US addresses are not a regular language, and cannot be matched reliably using regular expressions. Regexes are helpful in some isolated cases, but in general they will fail you, especially for input like that.
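For the isolated cases where a regex is good enough, here is a rough sketch of a tightened version of the second pattern from the question. One thing worth knowing: inside a character class, `|` is a literal pipe, not alternation, so `[a|p]` also matches `|` and `[\s|\,|.]` matches whitespace, pipe, comma, or period. The sketch writes those as `[ap]` and plain whitespace instead. The names `STREET_PATTERN` and `findStreets` are made up for illustration, and adding `avenue|ave` plus a trailing quadrant (`NW`, `NE`, ...) is my own assumption, since the sample address ends in "Ave NW" and the question's suffix list does not cover it.

```javascript
// Street-type suffixes from the question, plus "avenue|ave" (an assumption,
// added because the sample address uses "Ave").
const SUFFIXES = "court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|avenue|ave";

// Illustrative pattern, not a complete address grammar.
const STREET_PATTERN = new RegExp(
  "\\b(\\d{2,5})\\s+" +                 // house number, 2-5 digits
  "(?![ap]m\\b)" +                      // reject times such as "11 pm"
  "(?:(?:NW|NE|SW|SE|north|south|east|west|[nesw])\\.?\\s+)?" + // optional leading directional
  "((?:[a-z]+\\s+){1,4}?)" +            // 1-4 street-name words, as few as possible
  "(" + SUFFIXES + ")\\b\\.?" +         // street type, optional trailing period
  "(?:\\s+(NW|NE|SW|SE))?",             // optional trailing quadrant (assumption)
  "gi"
);

// Returns every candidate street with its position in the input text.
function findStreets(text) {
  return Array.from(text.matchAll(STREET_PATTERN), (m) => ({
    text: m[0],
    start: m.index,               // zero-based offset of the first character
    end: m.index + m[0].length,   // offset just past the last character
  }));
}
```

On the sample string from the question this should pick out `6806 Fifth Ave NW` and skip `11 pM`, `5001 west your momma`, and `2005 ford ranger`. It will still miss or misfire on plenty of real-world input, which is the point of the paragraph above.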

I used to work at an address verification company. In answer to your question, to "highlight an address" in a string of text, I recommend you try an extraction utility. There are a few out there and I suggest you look around, but here is ours using the input from your question. As you can see, it found the address and validated it:

LiveAddress extraction example

The API endpoint returns JSON which contains the start and end positions of each address, as well as plenty of information about each one. (See the CSV output at the bottom of the picture above.)
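Once you have start and end positions, the highlighting itself is simple string slicing. The sketch below is only an illustration: the `{ start, end }` shape, the zero-based offsets with an exclusive `end`, and the `highlightSpans` name are all assumptions, not the actual response format of the API, so check the real field names and offset conventions before relying on it.

```javascript
// Wraps each span of `text` in open/close markers.
// `spans` is a hypothetical array of { start, end } objects with zero-based
// offsets, where `end` points just past the last character (assumption).
function highlightSpans(text, spans, open = "<mark>", close = "</mark>") {
  // Work left to right so the output can be built in one pass.
  const ordered = [...spans].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const { start, end } of ordered) {
    // Skip spans that are empty, out of range, or overlap an earlier one.
    if (start < cursor || end > text.length || end <= start) continue;
    out += text.slice(cursor, start) + open + text.slice(start, end) + close;
    cursor = end;
  }
  // Append whatever follows the last highlighted span.
  return out + text.slice(cursor);
}
```

For example, `highlightSpans("at 6806 Fifth Ave NW, ok", [{ start: 3, end: 20 }])` wraps just the street portion in `<mark>` tags. The sketch does not HTML-escape the text, so escape it first if the input is untrusted.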

I commend you for braving those regular expressions you tried! Hopefully this is helpful.