Generating test data - how to generate a valid address for a given US zipcode?

Joseph · Apr 3, 2018 · Viewed 13.7k times

I am creating a tool which depends on addresses. For the purposes of testing, I'd like to create a large number of valid US addresses. I have the GeoNames postal code data and I would like to generate some number of real addresses for each of the ~41,000 zip codes in the United States.

I've found sites like FakeAddressGenerator and FakeName which claim to generate random, valid US addresses. How do these sites work? How can I do the same thing without relying on scraping these websites?

Ideally, I'd like to be able to do this in Python; utilizing a web service is fine (it doesn't seem that either FakeAddressGenerator or FakeName provide such a web service).

Thanks!

Answer

Lynx-Lab · Apr 27, 2018

Googling your issue, I found two links of interest:

  1. https://github.com/EthanRBrown/rrad provides approximately 3200 real, anonymised addresses.
  2. https://openaddresses.io also links to their open-source GitHub project with the complete data set.

I don't recommend scraping the fake address generators, as they do not guarantee that the addresses exist. I would not sample from Google Maps either, as you will most likely get blacklisted.

Extracting data from the files downloaded from link 2 is easy: they are zip archives containing CSV files with the full address, zip code, latitude, longitude, etc.
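As a rough illustration of reading such an archive, here is a minimal Python sketch that groups every CSV row by zip code. The function name `addresses_by_zip` is made up for this example, and the column name `POSTCODE` is an assumption about the CSV header, so check the files you actually download.

```python
import csv
import io
import zipfile
from collections import defaultdict


def addresses_by_zip(archive, postcode_field="POSTCODE"):
    """Group every CSV row found in a zip archive by its postcode column.

    `archive` is a file path or a binary file-like object.
    `postcode_field` is an assumed column name; adjust it to the real header.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip READMEs, licence files, etc.
            with zf.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    postcode = (row.get(postcode_field) or "").strip()
                    if postcode:
                        # Keep the first five characters so ZIP+4 values
                        # such as 35051-1234 land under 35051.
                        grouped[postcode[:5]].append(row)
    return dict(grouped)
```

Calling `addresses_by_zip("some-download.zip")` (a placeholder file name) would return a dictionary mapping each five-digit zip code to the list of rows found for it.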

The two data sets above "guarantee" that the addresses exist. I don't know how strict your other condition is, namely having at least one valid address for each of the 41k zip codes. If this is a hard constraint, I doubt you will find such a data set as open source.
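To see how far a data set is from that constraint, one option is to compare its zip codes against your full list (for example the GeoNames one) and then sample a few addresses per code. The sketch below assumes the addresses are already held in a dictionary keyed by zip code; both function names are invented for illustration.

```python
import random


def missing_zip_codes(all_zips, grouped):
    """Return, sorted, the zip codes that have no address in `grouped`."""
    return sorted(set(all_zips) - set(grouped))


def sample_per_zip(grouped, n, seed=None):
    """Pick up to `n` addresses per zip code; pass a seed for repeatable output."""
    rng = random.Random(seed)
    return {
        zip_code: rng.sample(rows, min(n, len(rows)))
        for zip_code, rows in grouped.items()
    }
```

Whatever `missing_zip_codes` returns is the set of codes you would need to fill from another source.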


EDIT:

If you have a list of all postcodes in the US, a fully automatable solution is to use Nominatim, a service from OpenStreetMap (subject to their terms and conditions!).

1) Get the lat, lon (centre point or default address) of each postcode:

https://nominatim.openstreetmap.org/search/?format=xml&addressdetails=1&limit=1&country_codes=us&postalcode=35051

2) Get the address at this lat, lon:

https://nominatim.openstreetmap.org/reverse?format=xml&lat=33.178764&lon=-86.619038&zoom=18&addressdetails=1

Trying this for Columbiana in Alabama (postcode 35051) yields 397 West College Street.
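The two steps above could be scripted in Python roughly as follows. This is a sketch under several assumptions: it asks for JSON rather than the XML used in the URLs above, it spells the country filter `countrycodes` (the name I understand the Nominatim search documentation to use), and the `USER_AGENT` string and all function names are placeholders. The public Nominatim server asks clients to identify themselves and to stay at or below one request per second, hence the pause between calls.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://nominatim.openstreetmap.org"
# Placeholder identifier: replace with your own application name and contact.
USER_AGENT = "address-test-data-sketch/0.1 (you@example.com)"


def search_url(postcode):
    """Step 1: build the URL that looks up the lat/lon of a US postcode."""
    query = urllib.parse.urlencode({
        "format": "json",
        "addressdetails": 1,
        "limit": 1,
        "countrycodes": "us",
        "postalcode": postcode,
    })
    return f"{BASE_URL}/search?{query}"


def reverse_url(lat, lon):
    """Step 2: build the URL that reverse-geocodes a lat/lon to an address."""
    query = urllib.parse.urlencode({
        "format": "json",
        "lat": lat,
        "lon": lon,
        "zoom": 18,
        "addressdetails": 1,
    })
    return f"{BASE_URL}/reverse?{query}"


def fetch_json(url):
    """GET a URL with an identifying User-Agent and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def address_for_postcode(postcode, pause=1.0):
    """Return the address dict for a postcode's centre point, or None."""
    hits = fetch_json(search_url(postcode))
    if not hits:
        return None  # the postcode was not found
    time.sleep(pause)  # wait between the two requests
    place = fetch_json(reverse_url(hits[0]["lat"], hits[0]["lon"]))
    return place.get("address")  # absent if the reverse lookup failed
```

With two requests per postcode, covering all ~41,000 zip codes at one request per second takes close to a day, so check the usage policy before running something like `address_for_postcode("35051")` in a loop.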

Nominatim documentation is at: https://wiki.openstreetmap.org/wiki/Nominatim