I am creating a tool which depends on addresses. For the purposes of testing, I'd like to create a large number of valid US addresses. I have the GeoNames postal code data and I would like to generate some number of real addresses for each of the ~41,000 zip codes in the United States.
I've found sites like FakeAddressGenerator and FakeName which claim to generate random, valid US addresses. How do these sites work? How can I do the same thing without relying on scraping these websites?
Ideally, I'd like to be able to do this in Python; using a web service is fine (neither FakeAddressGenerator nor FakeName seems to provide one).
Thanks!
Googling your issue, I found two links of interest:
I don't recommend scraping the fake address generators, as they do not guarantee that the addresses actually exist. I would not sample addresses from Google Maps either, as you will almost certainly get blacklisted.
Extracting the data from the zip files downloaded from link 2 is easy: each archive contains CSV files with the full address, zip code, lat, lon, etc.
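As a minimal sketch of that extraction, assuming the archive holds UTF-8 CSV files with a header row (the function name `iter_address_rows` and any column names are hypothetical, so check them against the files you actually download):

```python
import csv
import io
import zipfile


def iter_address_rows(zip_source):
    """Yield one dict per row from every CSV file inside a zip archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            # Skip anything that is not a CSV file (readme, licence, ...).
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Assumption: the files are UTF-8 encoded with a header row.
                # newline="" lets the csv module handle line endings itself.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield row
```

Each yielded row is a plain dict keyed by the CSV header, so grouping rows by zip code afterwards is a matter of reading whichever column holds it.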
The two data sets above "guarantee" that the addresses exist. I don't know how strict your other condition is, namely having at least one valid address for each of the 41k zip codes. If this is a hard constraint, I doubt you will find such a data set as open data.
EDIT:
If you have a list of all postcodes in the US, a fully automatable solution is to use Nominatim, a service from OpenStreetMap (subject to their terms and conditions!):
1) get the lat, lon (centre point or default address) of each post code:
2) get the related address of this lat, lon:
Trying this for Columbiana, Alabama (postcode 35051) yields 397 West College Street.
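The two steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not a definitive client: it uses the public Nominatim `/search` and `/reverse` endpoints with JSON output, the helper names and the `USER_AGENT` value are hypothetical, and the one-second pause reflects my understanding of the public server's usage policy (check it before running this over 41k postcodes). The reverse lookup does not always return a house number, so treat a `None` result as "no address found".

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://nominatim.openstreetmap.org"
# Assumption: replace with a value that identifies your own application.
USER_AGENT = "address-test-data-example/0.1"


def search_url(postcode, country_code="us"):
    """Step 1: URL that looks up the lat/lon of a postcode."""
    params = {"postalcode": postcode, "countrycodes": country_code,
              "format": "json", "limit": 1}
    return BASE_URL + "/search?" + urllib.parse.urlencode(params)


def reverse_url(lat, lon):
    """Step 2: URL that looks up the address nearest to a lat/lon."""
    params = {"lat": lat, "lon": lon, "format": "json", "addressdetails": 1}
    return BASE_URL + "/reverse?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Perform the HTTP GET and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def format_address(reverse_result):
    """Return 'house_number road' from a reverse result, or None if absent."""
    address = reverse_result.get("address", {})
    parts = [address.get("house_number"), address.get("road")]
    street = " ".join(p for p in parts if p)
    return street or None


def address_for_postcode(postcode, fetch=fetch_json, pause=1.0):
    """Chain both steps; `fetch` can be swapped out for offline testing."""
    hits = fetch(search_url(postcode))
    if not hits:
        return None
    time.sleep(pause)  # stay polite between the two requests
    result = fetch(reverse_url(hits[0]["lat"], hits[0]["lon"]))
    return format_address(result)
```

Calling `address_for_postcode("35051")` performs the two requests in order. Passing your own `fetch` function makes it easy to cache responses on disk so that a rerun does not hit the service again.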
Nominatim documentation is at: https://wiki.openstreetmap.org/wiki/Nominatim