I have a website which needs to obtain the Latitude and Longitude for the address entered by the customer.
Google/Bing/Yahoo are too expensive for us so we went with OpenStreetMap/Nominatim.
Unfortunately, while it worked OK during testing, it's failing to find about 50% of the addresses entered, which is a big issue.
There are 3 things I am interested in knowing:
What is the best way to deal with the situation where the customer really does enter an incorrect address - send them an email and ask them to correct it? Use segments of the address until something is found?
What is the best way to handle the situation where the address is fine but I can't find it with OpenStreetMap? Or am I doing something wrong with my query to Nominatim?
Does anyone know of a free/cheap alternative if OpenStreetMap isn't up to the task? I know it's an open source collaboration and therefore not complete, but I thought it did have pretty good coverage, and that it would return a nearby location if it didn't have the exact location. Maybe it does and I'm using it wrong.
Here is an example:
182 livington ave,albany,New York,12210,US
Google maps finds that easily. Nominatim finds nothing: http://nominatim.openstreetmap.org/search?format=xml&addressdetails=0&q=182%20livington%20ave,albany,New%20York,12210,US
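For reference, here is a minimal sketch of how the query above could be built programmatically instead of by hand. The helper names are hypothetical. The second function uses Nominatim's structured search parameters (`street`, `city`, `state`, `postalcode`, `country`) instead of the free-form `q`; whether that improves the hit rate for a given data set is an assumption to test, not a guarantee.

```python
from urllib.parse import urlencode, quote

# Endpoint taken from the example URL above.
NOMINATIM_SEARCH = "http://nominatim.openstreetmap.org/search"


def _encode(params: dict) -> str:
    # quote_via=quote encodes spaces as %20; safe="," leaves commas readable.
    return urlencode(params, quote_via=quote, safe=",")


def free_form_url(address: str) -> str:
    """Build the same kind of free-form query shown in the example."""
    params = {"format": "xml", "addressdetails": "0", "q": address}
    return NOMINATIM_SEARCH + "?" + _encode(params)


def structured_url(street: str, city: str, state: str,
                   postalcode: str, country: str) -> str:
    """Build a structured query; these parameters replace q, they do not combine with it."""
    params = {
        "format": "xml",
        "addressdetails": "0",
        "street": street,
        "city": city,
        "state": state,
        "postalcode": postalcode,
        "country": country,
    }
    return NOMINATIM_SEARCH + "?" + _encode(params)


if __name__ == "__main__":
    print(free_form_url("182 livington ave,albany,New York,12210,US"))
    print(structured_url("182 livington ave", "albany", "New York", "12210", "US"))
```

Neither function sends a request; they only build the URL, so the typo in the street name is passed through unchanged.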
I think what you're looking for is address verification. Google, Nominatim, and others only perform address approximation, which is good for finding addresses when you aren't sure what they are, but the results are only a best guess.
I helped develop an API called LiveAddress, which verifies and geocodes addresses according to stringent CASS™ requirements. I ran your sample address through Google, Nominatim, and the LiveAddress API, and these are the results:
Google found the address despite the typo in "Livingston" but could not guarantee its validity, saying, "Address is approximate." -- then again, it says that for just about every address you try.
Nominatim does not find it because of the typo. Perhaps a drawback to using Nominatim is that it doesn't try to compensate for typos, verify the accuracy or completeness of addresses, etc. Fixing the typo returned some information, but the failed query gave no hint about what had to be fixed or why it failed.
LiveAddress doesn't recognize the address as entered because of the typo. Missing the "s" in "Livingston" is dramatic because there are streets named "Livington," leaving the query ambiguous, and the results were too much of a mismatch to return according to CASS™ specs. Changing the name to a different typo, "Livingstn," however, produced a valid result. Nominatim didn't accept that typo either:
... for some reason I have to break out of my bullet points for code to render properly:
[
  {
    "input_index": 0,
    "candidate_index": 0,
    "delivery_line_1": "182 Livingston Ave",
    "last_line": "Albany NY 12210-2512",
    "delivery_point_barcode": "122102512824",
    "components": {
      "primary_number": "182",
      "street_name": "Livingston",
      "street_suffix": "Ave",
      "city_name": "Albany",
      "state_abbreviation": "NY",
      "zipcode": "12210",
      "plus4_code": "2512",
      "delivery_point": "82",
      "delivery_point_check_digit": "4"
    },
    "metadata": {
      "record_type": "S",
      "county_fips": "36001",
      "county_name": "Albany",
      "carrier_route": "C011",
      "congressional_district": "21",
      "rdi": "Residential",
      "latitude": 42.66033,
      "longitude": -73.75285,
      "precision": "Zip9"
    },
    "analysis": {
      "dpv_match_code": "Y",
      "dpv_footnotes": "AABB",
      "dpv_cmra": "N",
      "dpv_vacant": "N",
      "active": "Y",
      "ews_match": false,
      "footnotes": "M#"
    }
  }
]
The analysis footnote "M#" indicates a match was achieved by fixing the spelling of the street name. The resulting DPV footnotes "AABB" indicate that the entire address matched a street + city/state on the national ZIP+4 file. Also note the Zip9 precision, which is (currently) the most precise level of geocoding: accurate to block level or closer.
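To make the fields discussed above concrete, here is a rough sketch of how a client might pull the useful parts out of one candidate from that response. The field names come from the JSON shown; the function name is hypothetical, and treating a `dpv_match_code` of "Y" as "confirmed" is an assumption that should be checked against the API documentation.

```python
import json

# Trimmed copy of the response shown above (only the fields used here).
SAMPLE_RESPONSE = """
[{"delivery_line_1": "182 Livingston Ave",
  "last_line": "Albany NY 12210-2512",
  "metadata": {"latitude": 42.66033, "longitude": -73.75285, "precision": "Zip9"},
  "analysis": {"dpv_match_code": "Y", "dpv_footnotes": "AABB", "footnotes": "M#"}}]
"""


def summarize_candidate(candidate: dict) -> dict:
    """Collect the address, coordinates, and match flags from one candidate."""
    analysis = candidate.get("analysis", {})
    metadata = candidate.get("metadata", {})
    return {
        "address": candidate.get("delivery_line_1", "") + ", " + candidate.get("last_line", ""),
        # Assumption: "Y" means the delivery point was confirmed.
        "confirmed": analysis.get("dpv_match_code") == "Y",
        # "M#" is the footnote described above: street spelling was corrected.
        "spelling_fixed": "M#" in analysis.get("footnotes", ""),
        "lat_lon": (metadata.get("latitude"), metadata.get("longitude")),
        "precision": metadata.get("precision"),
    }


if __name__ == "__main__":
    for candidate in json.loads(SAMPLE_RESPONSE):
        print(summarize_candidate(candidate))
```

Surfacing `spelling_fixed` lets the site show the customer the corrected street name instead of silently replacing what they typed.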
So, in answer to your questions:
That depends. Are your customers entering an address on a website form? Tell them right away, before they continue, that the address isn't valid. SmartyStreets has a jQuery plugin which verifies addresses on website forms (just copy-and-paste). When an address is typed, it is automatically verified. If it is wrong, the plugin slides up a notification asking the user if they'd like to fix it. Sometimes the address is ambiguous and returns a few valid results (try: "100, new york, ny"); in that case the plugin shows a few suggestions and the user can pick one. The form does not submit until the user gets a valid address or says "Use mine anyway; I guarantee it's right." If the address is correct, the plugin puts the standardized results in the address fields and displays a green notice: "Address verified!"
I think I discussed this above. Your query is fine; it seems to be a shortcoming in Nominatim.
As suggested, you could try LiveAddress. Try it with a large set of your addresses to get a better idea (comparing on one address alone is, I'll admit, a weak indication), but so far it seems like, for your needs, LiveAddress is somewhere between Google Maps and Nominatim.
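The form behaviour described in the first answer boils down to a decision on how many candidates the verifier returns. A minimal server-side sketch of that decision follows; the function and action names are hypothetical, and it assumes an empty candidate list means the address could not be verified.

```python
from typing import List


def next_form_action(candidates: List[dict], user_override: bool = False) -> str:
    """Decide what the address form should do with a verification result.

    candidates:    results returned by the verifier for one input address
    user_override: True once the user has chosen "Use mine anyway"
    """
    if len(candidates) == 1:
        # Exactly one match: fill the fields with the standardized address.
        return "accept_standardized"
    if len(candidates) > 1:
        # Ambiguous input: let the user pick one of the suggestions.
        return "show_suggestions"
    # No match: block submission unless the user insists the address is right.
    return "submit_as_entered" if user_override else "ask_user_to_fix"


if __name__ == "__main__":
    print(next_form_action([]))
    print(next_form_action([{"delivery_line_1": "182 Livingston Ave"}]))
```

Addresses submitted through the override can be flagged for manual follow-up, since no verified coordinates will be available for them.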
I ran out of room in the comments.
Here is another address causing us issues: "7580 E Big Cannon Drive,Anaheim Hills,Anaheim Hills,California,92808,US". Even "7580 E Big Cannon Drive,California,92808,US" didn't seem to work with your site.
I did some research on the USPS site and with some other service providers as well. None returned any valid results or suggestions. But I found out what the issues are with the address as you submitted it:
Misspelled street name. No biggie; LiveAddress corrected this to Big Canyon.
Bad primary number. There's not much hope here if the primary number is incorrect. There's generally no way for a computer or human to infer what you really meant. In these cases, the address will fail verification and the user must supply something valid to go on. I found a valid primary number at 7584.
Master-planned community, not city/county. "Anaheim Hills" is the name of a master-planned community. Google found it in its business listings, but that has nothing to do with the address.
"Anaheim Hills" twice. It's confusing the parser. Unfortunately, with extra unnecessary information (esp. in a single-line address), it's nearly impossible to tell what part of it is dubious. That second "Anaheim Hills" has to go, but the first one can stay and it will be fine.
Country information. Most of the services I tried your address on got confused with the country in front and put it in the "Company/Firm Name" field. We deal with US addresses, so you can omit the country. It'll reduce the size of your request too.
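Two of the problems listed above (the repeated "Anaheim Hills" and the country component) can be removed before the address is ever sent. Here is a minimal sketch, assuming comma-separated input like the examples in this thread; the function name and the list of country tokens are hypothetical. It does not touch the misspelled street or the bad primary number, which still need the verifier or the user.

```python
def clean_single_line(address: str, country_tokens=("US", "USA")) -> str:
    """Drop a trailing country component and consecutive duplicate components."""
    parts = [p.strip() for p in address.split(",") if p.strip()]

    # Remove the country, since only US addresses are being verified.
    if parts and parts[-1].upper() in country_tokens:
        parts.pop()

    # Collapse components repeated back to back, e.g. "Anaheim Hills" twice.
    deduped = []
    for part in parts:
        if not deduped or deduped[-1].lower() != part.lower():
            deduped.append(part)

    return ", ".join(deduped)


if __name__ == "__main__":
    raw = "7580 E Big Cannon Drive,Anaheim Hills,Anaheim Hills,California,92808,US"
    print(clean_single_line(raw))
```

Only adjacent duplicates are collapsed, so a legitimately repeated name elsewhere in the line (for example a street and a city that share a name) is left alone.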
LiveAddress was actually able to verify the address in these forms, both as a single-line address and split into components:
7584 E Big Cannon Drive anaheim hills ca 92808
7584 bg cannon 92808
7584 big cannon ave aneheim hills ca
The most significant help was finding a valid primary number. In the case that no valid addresses come back, you should alert the user and suggest fixing the primary number and making sure the city/state (if given) align with the zip code ('cause if those two are fighting, it's also impossible to tell what you meant).