Subset an xarray Dataset or DataArray in the time dimension

aswann · Aug 23, 2018

I'm trying to calculate a monthly climatology for a subset of the time dimension in an xarray dataset. Time is defined using datetime64.

This works fine if I want to use the whole time series:

monthly_avr=ds_clm.groupby('time.month').mean(dim='time')
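A minimal sketch of that full-record climatology, run on a small hypothetical stand-in for ds_clm (random float32 values, a 2 x 3 grid instead of 192 x 288, and month-start rather than month-end timestamps):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds_clm: 1980 monthly steps starting in 1850,
# on a tiny 2 x 3 grid. Month-start stamps are used for simplicity;
# the real file uses month-end stamps.
time = pd.date_range("1850-01-01", periods=1980, freq="MS")
rng = np.random.default_rng(0)
ds_clm = xr.Dataset(
    {"EFLX_LH_TOT": (("time", "lat", "lon"),
                     rng.random((time.size, 2, 3)).astype("float32"))},
    coords={"time": time, "lat": [-90.0, 0.0], "lon": [0.0, 1.25, 2.5]},
)

# Monthly climatology over the full record: one mean per calendar month.
monthly_avr = ds_clm.groupby("time.month").mean(dim="time")
print(monthly_avr.sizes)  # 'time' is replaced by a 12-element 'month' dimension
```

groupby('time.month') puts every January in one group, every February in another, and so on, so the reduction over time leaves a month dimension with values 1 to 12.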

But I only want the years from 2001 onward. Neither of these works:

monthly_avr2=ds_clm.where(ds_clm.time>'2001-01-01').groupby('time.month').mean('time')
monthly_avr3=ds_clm.isel(time=slice('2001-01-01', '2018-01-01')).groupby('time.month').mean('time')

Here is what my dataset looks like:

<xarray.Dataset>
Dimensions:       (hist_interval: 2, lat: 192, lon: 288, time: 1980)
Coordinates:
  * lon           (lon) float32 0.0 1.25 2.5 3.75 5.0 6.25 7.5 8.75 10.0 ...
  * lat           (lat) float32 -90.0 -89.057594 -88.11518 -87.172775 ...
  * time          (time) datetime64[ns] 1850-01-31 1850-02-28 1850-03-31 ...
Dimensions without coordinates: hist_interval
Data variables:
    EFLX_LH_TOT   (time, lat, lon) float32 0.26219246 0.26219246 0.26219246 ...

Does anyone know the correct syntax for subsetting in time using datetime64?

Answer

jhamman · Aug 23, 2018

Indexing and selecting data in xarray by coordinate value is typically done with the sel() method; isel() selects by integer position, which is why passing it date strings does not work. In your case, something like the following should work:

monthly_avr3 = ds_clm.sel(
    time=slice('2001-01-01', '2018-01-01')).groupby('time.month').mean('time')
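To make the sel()/isel() difference concrete, here is a sketch on a hypothetical synthetic series (month-start stamps from 1850-01 to 2014-12, values equal to the step index). sel() matches the date strings against the time labels, while isel() only accepts positions, so the date has to be converted to an index first (here with numpy's searchsorted, one of several ways to do it). Both routes select the same steps:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series standing in for ds_clm (1850-01 to 2014-12).
time = pd.date_range("1850-01-01", periods=1980, freq="MS")
ds_clm = xr.Dataset(
    {"EFLX_LH_TOT": ("time", np.arange(time.size, dtype="float32"))},
    coords={"time": time},
)

# Label-based: sel() interprets the strings as dates. Both ends are inclusive,
# and an end point past the last time step is simply clipped to the data.
by_label = ds_clm.sel(time=slice("2001-01-01", "2018-01-01"))

# Position-based: isel() needs integers, so find the first index >= 2001-01-01.
start = int(np.searchsorted(ds_clm["time"].values, np.datetime64("2001-01-01", "ns")))
by_position = ds_clm.isel(time=slice(start, None))

assert by_label.identical(by_position)
monthly_avr3 = by_label.groupby("time.month").mean("time")
print(by_label["time"].values[[0, -1]])  # first and last selected steps
```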

The where() method can also be useful sometimes, but for your use case you would also need to pass drop=True, so that the masked time steps are removed rather than kept and filled with NaN:

monthly_avr2 = ds_clm.where(
    ds_clm['time.year'] > 2000, drop=True).groupby('time.month').mean('time')
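A sketch of what drop=True changes, again on hypothetical synthetic data (month-start stamps from 1850-01 to 2014-12). Without drop=True, where() keeps all 1980 time steps and fills the pre-2001 ones with NaN; with it, those steps are removed and the result equals the sel() subset. The mask is written with the time.dt.year accessor, which gives the same years as ds_clm['time.year']:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for ds_clm: monthly values from 1850-01 to 2014-12.
time = pd.date_range("1850-01-01", periods=1980, freq="MS")
ds_clm = xr.Dataset(
    {"EFLX_LH_TOT": ("time", np.arange(time.size, dtype="float32"))},
    coords={"time": time},
)

after_2000 = ds_clm["time"].dt.year > 2000  # boolean mask along time

masked = ds_clm.where(after_2000)              # still 1980 steps, NaN before 2001
dropped = ds_clm.where(after_2000, drop=True)  # only the 2001-2014 steps remain

# With drop=True the result matches the label-based sel() subset.
subset = ds_clm.sel(time=slice("2001-01-01", "2018-01-01"))
xr.testing.assert_equal(dropped, subset)

monthly_avr2 = dropped.groupby("time.month").mean("time")
```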