Visual return value for pd.date_range incorrect (Jupyter Notebook)

Indeed, that is the intended behavior. The values are displayed as dates because you constructed the pd.date_range with a frequency of one day or coarser and no timezone, so every time component is midnight. The underlying values, however, are Timestamps.

In the first case, when you slice directly by a string representation of the datetime (e.g. "2020-01-31"), Pandas uses what it calls “partial string indexing”. This is a convenience built into Pandas that lets you select by just a year, month or date, even if the index is a full-fledged DatetimeIndex (which in your case it is). As the documentation shows, if you slice in the following way, you include all the timestamps (rows / index values) of the last date:

dft['2013-1':'2013-2-28'] # example from documentation

“This specifies a stop time that includes all of the times on the last day.” In other words, all timestamps from the start of January 2013 up to the very last timestamp on Feb 28, 2013 are included in the selection.
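The effect of a date-only stop value can be checked with a small sketch. The Series below is made-up data (a value every 12 hours), not the documentation's actual dft, and .loc is used to make the label-based slicing explicit:

```python
import pandas as pd

# Hypothetical data: one value every 12 hours from Jan 1 to Mar 1, 2013.
idx = pd.date_range("2013-01-01", "2013-03-01 12:00", freq=pd.Timedelta(hours=12))
dft = pd.Series(range(len(idx)), index=idx)

# Partial string indexing: the stop "2013-2-28" covers the whole of Feb 28.
selected = dft.loc["2013-1":"2013-2-28"]

first_ts = selected.index[0]
last_ts = selected.index[-1]
print(first_ts)  # first timestamp in January
print(last_ts)   # last timestamp on Feb 28, not midnight of Feb 28
```

The last selected row is the noon entry on Feb 28, which shows that the stop string is treated as the whole day rather than as midnight.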

On the other hand, in the second case (display(test.loc[:split_list[0]])), you are specifying one exact timestamp for the end of your slice. This timestamp is '2020-01-31 00:00:00', which is midnight (the earliest time) on the 31st of January. This means that the remaining timestamps on that date will be excluded.

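Another way to demonstrate this is to compare slicing by a date string with slicing by a value taken from the index itself:

test.loc[:'2020-01-01']

# output

2020-01-01 00:00:00 0
2020-01-01 06:00:00 1
2020-01-01 12:00:00 2
2020-01-01 18:00:00 3

test.loc[:test.index[0]]

# output

2020-01-01  0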

What happens here is that in test.loc[:'2020-01-01'], '2020-01-01' is interpreted as a string representation of a date, not a datetime. Due to the Pandas convention mentioned above, it will filter the index for all datetimes up to and including the date ‘2020-01-01’. Therefore you don’t lose any of the timestamps on that date.

On the other hand, test.loc[:test.index[0]] compares against test.index[0], which is exactly the timestamp Timestamp('2020-01-01 00:00:00', freq='6H'). The slice therefore says “find me all the datetimes in the index up to and including the timestamp ‘2020-01-01 00:00:00’”, which means that only this one timestamp will be selected for the 1st of January. Every other timestamp on that date happens after midnight (notice that midnight is the earliest time of the day, not the latest).
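A minimal, self-contained sketch of the two cases, assuming a stand-in for test with one value every 6 hours (the original construction of test is not shown here, so the data is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for `test`: a value every 6 hours over two days.
idx = pd.date_range("2020-01-01", periods=8, freq=pd.Timedelta(hours=6))
test = pd.Series(range(8), index=idx)

# Date string: partial string indexing, the whole of Jan 1 is included.
by_string = test.loc[:"2020-01-01"]

# Exact Timestamp (midnight): only rows up to 2020-01-01 00:00:00.
by_timestamp = test.loc[:test.index[0]]

print(len(by_string))     # 4
print(len(by_timestamp))  # 1
```

The only difference between the two slices is the type of the stop value: a date string is expanded to the full day, while a Timestamp is matched exactly.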

Finally, about representation: as mentioned in the comment, split_list is in your case a DatetimeIndex, so although you only see string representations of dates when you print it, it consists of datetimes. You can see this by printing the first element, for example:

Timestamp('2020-01-31 00:00:00', freq='M')

Since this is a date range with a frequency of one day or coarser, it will print as a date, not a timestamp.
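This can be reproduced with a short sketch. The split_list below is a hypothetical month-end range built with pd.offsets.MonthEnd(), which may differ from how the original was constructed:

```python
import pandas as pd

# Hypothetical stand-in for `split_list`: three month ends starting in 2020.
split_list = pd.date_range("2020-01-01", periods=3, freq=pd.offsets.MonthEnd())

# The index prints date-only strings, since every time component is midnight.
print(split_list)

# Each element is still a full Timestamp with a time of 00:00:00.
first = split_list[0]
print(type(first).__name__)
print(first.hour, first.minute, first.second)
```

Printing the index hides the time component, but each element still carries it, which is why slicing with split_list[0] stops at midnight.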
