How can you scale the accumulated data by seconds and reset it using a pandas DataFrame?

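After resampling, compute the difference directly; no groupby over days is needed. The start of each day can be detected with a diff-ne construct: take the diff of the date column and flag the rows where it is not zero days. At those rows the plain difference is wrong, because it subtracts the previous day's final total, so the correction is to add the previous cumulative value back.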

Data

The index holds pd.Timestamp values, and total is the accumulated value that restarts each day.

print(df)
                     total
index                     
2020-11-05 23:59:48    100
2020-11-05 23:59:59    150
2020-11-06 00:00:01     10
2020-11-06 00:00:02     20
2020-11-06 00:00:12     40
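
To reproduce the printout above, the sample frame can be rebuilt as in the following sketch. The values are copied from the printout; building the index with pd.DatetimeIndex is an assumption about how the original frame was created.

```python
import pandas as pd

# Sample data copied from the printout above; "total" is a counter
# that restarts after midnight.
df = pd.DataFrame(
    {"total": [100, 150, 10, 20, 40]},
    index=pd.DatetimeIndex(
        [
            "2020-11-05 23:59:48",
            "2020-11-05 23:59:59",
            "2020-11-06 00:00:01",
            "2020-11-06 00:00:02",
            "2020-11-06 00:00:12",
        ],
        name="index",
    ),
)
print(df)
```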

Code

import pandas as pd

# Not accurate for the test data, so replaced
# df2 = df.resample("10S").last()

# Label each row with the last second of its 10-second bucket,
# then keep the last value seen in each bucket
df2 = df.copy()
df2["new_index"] = df2.index.map(lambda ts: ts + pd.Timedelta(9 - ts.second % 10, unit="s"))
df2 = df2.groupby("new_index").last()

# 1. de-accumulate without groupby
df2["diff"] = df2["total"].diff()
# 2. get date change locations (where diff != 0 Days)
df2["date"] = df2.index.date
df2["add"] = df2["date"].diff().ne("0D")
# 3. add the previous total back
df2.loc[df2["add"], "diff"] += df2["total"].shift()[df2["add"]]
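
The same three steps can also be wrapped in a function. The sketch below is not the code tested above: the function name deaccumulate and its parameters are hypothetical, the bucket labels are computed with vectorized index arithmetic instead of index.map, and day changes are detected by comparing each normalized date with the previous one (shift plus ne) rather than with the diff-ne construct. On the sample data it should flag the same rows.

```python
import pandas as pd


def deaccumulate(df, column="total", bucket_seconds=10):
    """Hypothetical helper: per-bucket increments of a counter that resets daily."""
    # Label each row with the last second of its bucket (e.g. :49, :59, :09).
    offset = (bucket_seconds - 1) - df.index.second % bucket_seconds
    bucket_end = (df.index + pd.to_timedelta(offset, unit="s")).rename("new_index")
    out = df.groupby(bucket_end).last()

    # 1. de-accumulate without grouping by day
    out["diff"] = out[column].diff()
    # 2. flag the first bucket of each day (the very first row is flagged too)
    day = pd.Series(out.index.normalize(), index=out.index)
    is_new_day = day.ne(day.shift())
    # 3. add the previous total back at the flagged rows
    prev_total = out[column].shift()
    out.loc[is_new_day, "diff"] += prev_total[is_new_day]
    return out


# Sample data copied from the Data section.
df = pd.DataFrame(
    {"total": [100, 150, 10, 20, 40]},
    index=pd.DatetimeIndex(
        [
            "2020-11-05 23:59:48",
            "2020-11-05 23:59:59",
            "2020-11-06 00:00:01",
            "2020-11-06 00:00:02",
            "2020-11-06 00:00:12",
        ],
        name="index",
    ),
)
result = deaccumulate(df)
print(result)
```

Adding the previous total back cancels the subtraction done by diff, so the first bucket of a new day ends up equal to its own total. That is only correct under the assumption that the counter restarts from zero at midnight.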

Result

The diff column is what you want. Its first value is NaN because there is no earlier bucket to subtract from.

print(df2)
                     total  diff        date    add
new_index                                          
2020-11-05 23:59:49    100   NaN  2020-11-05   True
2020-11-05 23:59:59    150  50.0  2020-11-05  False
2020-11-06 00:00:09     20  20.0  2020-11-06   True
2020-11-06 00:00:19     40  20.0  2020-11-06  False

Tested on Python 3.7.9 and pandas 1.1.3.
