Where can I find historical raw weather data for a project I am doing with focus on the USA and Canada. I need temperatures mainly, but other details would be nice. I am having a very hard time finding this data. I really dont want to have to scrape a weather site.
I found myself asking this same question, and will share my experience for future Googlers.
I wanted raw data, and lots of it... an API wouldn't do. I needed to head directly to the source. The best source for all of that data seemed to be either the NCEP or NCDC NOMADS servers:
http://nomads.ncdc.noaa.gov/dods/ <- good for historical data
http://nomads.ncep.noaa.gov/dods/ <- good for recent data
(Note: A commenter indicated that you must now use https rather than http. I haven't tested it yet, but if you're having issues, try that!)
To give an idea of the amount of data, their data goes all the way back to 1979! If you're looking for Canada and the US, the North American Regional Reanalysis dataset is probably your best answer.
I'm a big python user, and either pydap or NetCDF seemed like good tools to use. For no particular reason, I started playing around with pydap.
To give an example of how to get all of the temperature data for a particular location from the nomads website, try the following in python:
from pydap.client import open_url
# setup the connection
url = 'http://nomads.ncdc.noaa.gov/dods/NCEP_NARR_DAILY/197901/197901/narr-a_221_197901dd_hh00_000'
modelconn = open_url(url)
tmp2m = modelconn['tmp2m']
# grab the data
lat_index = 200 # you could tie this to tmp2m.lat[:]
lon_index = 200 # you could tie this to tmp2m.lon[:]
print tmp2m.array[:,lat_index,lon_index]
The above snippet will get you a time series (every three hours) of data for the entire month of January, 1979! If you needed multiple locations or all of the months, the above code would easily be modified to accommodate.
I wasn't happy stopping there. I wanted this data in a SQL database so that I could easily slice and dice it. A great option for doing all of this is the python forecasting module.
Disclosure: I put together the code behind the module. The code is all open source -- you can modify it to better meet your needs (maybe you're forecasting for Mars?) or pull out little snippets for your project.
My goal was to be able to grab the latest forecast from the Rapid Refresh model (your best bet if you want accurate info on current weather):
from forecasting import Model
rap = Model('rap')
rap.connect(database='weather', user='chef')
fields = ['tmp2m']
rap.transfer(fields)
and then to plot the data on a map of the good 'ole USA:
The data for the plot came directly from SQL and could easily modify the query to get out any type of data desired.
If the above example isn't enough, check out the documentation, where you can find more examples.