How do you extract data for multiple years in an nc (NetCDF) file?

I typically find the best approach in these cases (where a simple range will not suffice) is to construct a boolean array with the same length as the time coordinate, with True wherever the date should be included in the selection and False everywhere else. I can then pass this boolean array as an indexer to sel to get the selection I’d like.
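As a minimal sketch of the boolean-indexer idea on its own, the snippet below builds a small synthetic daily dataset and selects every June date. The coordinate name days, the variable name temperature, and the date range are all assumptions made for illustration; they are not taken from the question's file.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data. The names "days" and "temperature" are placeholders,
# not taken from the question's file.
days = pd.date_range("2001-01-01", "2002-12-31", freq="D")
ds = xr.Dataset(
    {"temperature": ("days", np.arange(days.size, dtype=float))},
    coords={"days": days},
)

# Boolean DataArray with one entry per time step: True for dates in June.
mask = ds.days.dt.month == 6

# A boolean indexer keeps only the positions where the mask is True.
june = ds.sel(days=mask)
```

The same pattern applies to any condition that can be evaluated element-wise on the time coordinate, and several conditions can be combined with & and |.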

For the date range in this question (May 30 through August 18 of each year from 2001 to 2018), I would make use of the dayofyear, year, and is_leap_year attributes of the datetime accessor in xarray:

import pandas as pd

# Note dayofyear is the 1-based day number within the year (January 1 is 1),
# so every date from March 1 onward is one higher in leap years than in
# non-leap years. Compute the day-of-year range separately for each case.
may_30_leap = pd.Timestamp("2000-05-30").dayofyear
august_18_leap = pd.Timestamp("2000-08-18").dayofyear
range_leap = range(may_30_leap, august_18_leap + 1)

may_30_noleap = pd.Timestamp("2001-05-30").dayofyear
august_18_noleap = pd.Timestamp("2001-08-18").dayofyear
range_noleap = range(may_30_noleap, august_18_noleap + 1)

# range() excludes its stop value, so this covers 2001 through 2018.
year_range = range(2001, 2019)

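# ds is assumed to be an already-opened xarray Dataset with a datetime
# coordinate named days.
# Keep a date if its day of year falls in the range matching its year type...
indexer = ((ds.days.dt.dayofyear.isin(range_leap) & ds.days.dt.is_leap_year) |
           (ds.days.dt.dayofyear.isin(range_noleap) & ~ds.days.dt.is_leap_year))
# ...and its year is in the requested range.
indexer = indexer & ds.days.dt.year.isin(year_range)

# The mask was built from ds.days, so select along that same dimension.
result = ds.sel(days=indexer)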

The leap year logic is a bit clunky, but I can’t think of a cleaner way.
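One possible alternative, offered only as a sketch and not as part of the original answer, is to compare on month and day directly instead of on dayofyear. Encoding each date as month * 100 + day gives a number that is the same in leap and non-leap years, so a single pair of comparisons covers both cases. The dataset below is synthetic, and the names days and value are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily dataset; "days" and "value" are placeholder names.
days = pd.date_range("2000-01-01", "2019-12-31", freq="D")
ds = xr.Dataset(
    {"value": ("days", np.arange(days.size, dtype=float))},
    coords={"days": days},
)

# Encode each date as month * 100 + day: May 30 -> 530, August 18 -> 818.
# The encoding does not shift between leap and non-leap years.
month_day = ds.days.dt.month * 100 + ds.days.dt.day
in_season = (month_day >= 530) & (month_day <= 818)

# Restrict to the years 2001 through 2018 inclusive.
year = ds.days.dt.year
in_years = (year >= 2001) & (year <= 2018)

result = ds.sel(days=in_season & in_years)
```

On this synthetic data the selection contains 81 days per year (May 30 through August 18 inclusive) for each of the 18 years. Note that this encoding only works as written when the window does not wrap around the end of the year.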
