I have a numpy array roughly like so:
data
array([(datetime.datetime(2009, 1, 6, 2, 30), 17924.0, 0.0),....
(datetime.datetime(2009, 1, 29, 16, 30), 35249.2, 521.25)],
dtype=[('timestamp', '|O4'), ('x1', '<f8'), ('x2', '<f8')])
I would like to index the data on the first column (i.e. the datetime objects), so that I can pull out a particular year's, month's, or day's worth of data with something like this:
data[data['timestamp'].year == 2009]
This obviously doesn't work, since data['timestamp'] is an object array and has no .year attribute. The only thing I can think of is adding extra columns (e.g. a "year" column), so that this would work:
data[data['year'] == 2009]
That seems like a fairly inefficient way of doing things (it would duplicate a lot of data), particularly if I want to index on all the other time intervals as well. Is there a better way to do this?
Thanks in advance.
Use pandas. "pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language."
There are plenty of examples in the documentation, but you can do what you are after like this:
import pandas
import numpy as np
import datetime as dt
# example values
dates = np.asarray(pandas.date_range('1/1/2000', periods=8))
# create a dataframe
df = pandas.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
# date you want
date=dt.datetime(2000,1,2)
# magic :)
print(df.xs(date))
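To get a whole year or month rather than a single date, a boolean mask on the index's datetime components should do it. This is only a sketch: the sample rows are made up to resemble the array in the question, and names such as year_2009 and jan_6 are purely illustrative.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the structured array in the question.
data = np.array(
    [
        (dt.datetime(2008, 12, 31, 23, 30), 100.0, 1.0),
        (dt.datetime(2009, 1, 6, 2, 30), 17924.0, 0.0),
        (dt.datetime(2009, 1, 29, 16, 30), 35249.2, 521.25),
        (dt.datetime(2009, 2, 1, 0, 0), 40000.0, 600.0),
    ],
    dtype=[("timestamp", "O"), ("x1", "<f8"), ("x2", "<f8")],
)

# Build a DataFrame whose index is the timestamp column.
df = pd.DataFrame(
    {"x1": data["x1"], "x2": data["x2"]},
    index=pd.to_datetime(data["timestamp"]),
)

# Boolean masks built from the datetime components of the index.
year_2009 = df[df.index.year == 2009]
jan_2009 = df[(df.index.year == 2009) & (df.index.month == 1)]

# For a single day, drop the time-of-day part before comparing.
jan_6 = df[df.index.normalize() == pd.Timestamp("2009-01-06")]

print(year_2009)
print(jan_2009)
print(jan_6)
```

No extra "year" or "month" columns are stored; the components are computed from the index when the mask is built.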
I suggest learning this module ASAP; it is absolutely exceptional. This is a very simple example, so check out the documentation, which is very thorough.
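If pulling in pandas is not an option, a similar selection can be sketched in plain NumPy by converting the object column to datetime64 and truncating it to the unit you care about. This is a different technique from the pandas approach above, and it assumes every entry in the timestamp column is a plain datetime.datetime; the sample rows and variable names are made up for illustration.

```python
import datetime as dt

import numpy as np

# Hypothetical sample data shaped like the structured array in the question.
data = np.array(
    [
        (dt.datetime(2008, 12, 31, 23, 30), 100.0, 1.0),
        (dt.datetime(2009, 1, 6, 2, 30), 17924.0, 0.0),
        (dt.datetime(2009, 1, 29, 16, 30), 35249.2, 521.25),
        (dt.datetime(2009, 2, 1, 0, 0), 40000.0, 600.0),
    ],
    dtype=[("timestamp", "O"), ("x1", "<f8"), ("x2", "<f8")],
)

# Convert the object column to datetime64 once (microsecond resolution).
ts = data["timestamp"].astype("datetime64[us]")

# Truncate to year, month, or day, then compare to build a boolean mask.
in_2009 = ts.astype("datetime64[Y]") == np.datetime64("2009")
in_jan_2009 = ts.astype("datetime64[M]") == np.datetime64("2009-01")
on_jan_6 = ts.astype("datetime64[D]") == np.datetime64("2009-01-06")

print(data[in_2009])
print(data[in_jan_2009])
print(data[on_jan_6])
```

The masks are temporary arrays, so nothing is duplicated in data itself.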