Grouped time difference when a condition is met

Use GroupBy.cumcount with ascending=False, grouping by column id and a helper Series: take the has_err column in reverse order with Series.iloc, run Series.cumsum, then reverse it back so it aligns with the original row order:

g = df['has_err'].iloc[::-1].cumsum().iloc[::-1]
df['days_to_err'] = df.groupby(['id', g])['has_err'].cumcount(ascending=False)
print(df)
     id        date  info_a_cnt  info_b_cnt  has_err  days_to_err
0   123  2020-01-01         123          32        0            2
1   123  2020-01-02           2          43        0            1
2   123  2020-01-03          43           4        1            0
3   123  2020-01-04          43           4        0            3
4   123  2020-01-05          43           4        0            2
5   123  2020-01-06          43           4        0            1
6   123  2020-01-07          43           4        1            0
7   123  2020-01-08          43           4        0            0
8   232  2020-01-04          56           4        0            3
9   232  2020-01-05          97           1        0            2
10  232  2020-01-06          23          74        0            1
11  232  2020-01-07          91          85        1            0
12  232  2020-01-08          91          85        0            2
13  232  2020-01-09          91          85        0            1
14  232  2020-01-10          91          85        1            0
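To make the helper grouper easier to follow, here is a minimal sketch on a hypothetical five-row frame with only the id and has_err columns (not the original sample data): the reversed cumulative sum gives every run of rows that ends in an error its own label, and cumcount then counts down to that error row.

```python
import pandas as pd

# Hypothetical minimal frame mirroring the id/has_err columns of the sample.
df = pd.DataFrame({
    'id': [123, 123, 123, 232, 232],
    'has_err': [0, 0, 1, 0, 1],
})

# Reversed cumsum labels each block of rows ending in an error;
# reversing back restores the original row order.
g = df['has_err'].iloc[::-1].cumsum().iloc[::-1]
print(g.tolist())  # [2, 2, 2, 1, 1]

# Within each (id, g) group, count rows from the end down to 0.
out = df.groupby(['id', g]).cumcount(ascending=False)
print(out.tolist())  # [2, 1, 0, 1, 0]
```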

EDIT: To get the cumulative sum of date differences instead of a plain row count, use a custom lambda function with GroupBy.transform:

# reverse the dates within each group, so the cumulative difference
# counts down to the error row (a whole-column [::-1] would misalign
# groups whenever the group-length pattern is not symmetric)
df['days_to_err'] = (df.groupby(['id', df['has_err'].iloc[::-1].cumsum()])['date']
                       .transform(lambda x: -x.iloc[::-1].diff().dt.days.cumsum().to_numpy()[::-1])
                       .fillna(0))
print(df)
     id       date  info_a_cnt  info_b_cnt  has_err  days_to_err
0   123 2020-01-01         123          32        0          2.0
1   123 2020-01-02           2          43        0          1.0
2   123 2020-01-03          43           4        1          0.0
3   123 2020-01-04          43           4        0          3.0
4   123 2020-01-05          43           4        0          2.0
5   123 2020-01-06          43           4        0          1.0
6   123 2020-01-07          43           4        1          0.0
7   123 2020-01-08          43           4        0          0.0
8   232 2020-01-04          56           4        0          3.0
9   232 2020-01-05          97           1        0          2.0
10  232 2020-01-06          23          74        0          1.0
11  232 2020-01-07          91          85        1          0.0
12  232 2020-01-08          91          85        0          2.0
13  232 2020-01-09          91          85        0          1.0
14  232 2020-01-10          91          85        1          0.0
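The same day counts can also be obtained without a lambda, e.g. by broadcasting the last date of each group (the error date) with GroupBy.transform('last') and subtracting. This is a sketch on hypothetical sample rows, not part of the original answer:

```python
import pandas as pd

# Hypothetical sample: one id, daily dates, errors on the 3rd and 5th day.
df = pd.DataFrame({
    'id': [123, 123, 123, 123, 123],
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                            '2020-01-04', '2020-01-05']),
    'has_err': [0, 0, 1, 0, 1],
})

g = df['has_err'].iloc[::-1].cumsum().iloc[::-1]
# Last date per group is the error date; the difference is days to error.
df['days_to_err'] = (df.groupby(['id', g])['date']
                       .transform('last')
                       .sub(df['date'])
                       .dt.days)
print(df['days_to_err'].tolist())  # [2, 1, 0, 1, 0]
```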

EDIT1: For differences in minutes, use Series.dt.total_seconds and divide by 60:

# some sample data cleaning
df = pd.read_csv("https://pastebin.com/raw/YZukAhBz", parse_dates=['date'])
df['date'] = df['date'].apply(lambda x: x.replace(month=1, day=1))
print(df)

# same within-group reversal as above, with the differences
# converted to minutes via total_seconds / 60
df['days_to_err'] = (df.groupby(['id', df['has_err'].iloc[::-1].cumsum()])['date']
                       .transform(lambda x: -x.iloc[::-1].diff().dt.total_seconds().div(60).cumsum().to_numpy()[::-1])
                       .fillna(0))
print(df)


     id                date  info_a_cnt  info_b_cnt  has_err  days_to_err
0   123 2020-01-01 12:00:00         123          32        0         30.0
1   123 2020-01-01 12:15:00           2          43        0         15.0
2   123 2020-01-01 12:30:00          43           4        1          0.0
3   123 2020-01-01 12:45:00          43           4        0         45.0
4   123 2020-01-01 13:00:00          43           4        0         30.0
5   123 2020-01-01 13:15:00          43           4        0         15.0
6   123 2020-01-01 13:30:00          43           4        1          0.0
7   123 2020-01-01 13:45:00          43           4        0          0.0
8   232 2020-01-01 17:00:00          56           4        0         45.0
9   232 2020-01-01 17:15:00          97           1        0         30.0
10  232 2020-01-01 17:30:00          23          74        0         15.0
11  232 2020-01-01 17:45:00          91          85        1          0.0
12  232 2020-01-01 18:00:00          91          85        0         30.0
13  232 2020-01-01 18:15:00          91          85        0         15.0
14  232 2020-01-01 18:30:00          91          85        1          0.0
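The transform('last') variant works for minutes as well; only the timedelta conversion changes. A minimal sketch with hypothetical 15-minute timestamps:

```python
import pandas as pd

# Hypothetical sample: one id, 15-minute timestamps, error on the last row.
df = pd.DataFrame({
    'id': [123, 123, 123],
    'date': pd.to_datetime(['2020-01-01 12:00', '2020-01-01 12:15',
                            '2020-01-01 12:30']),
    'has_err': [0, 0, 1],
})

g = df['has_err'].iloc[::-1].cumsum().iloc[::-1]
# Minutes until the group's error row: (last date - date) in seconds / 60.
df['mins_to_err'] = (df.groupby(['id', g])['date']
                       .transform('last')
                       .sub(df['date'])
                       .dt.total_seconds()
                       .div(60))
print(df['mins_to_err'].tolist())  # [30.0, 15.0, 0.0]
```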
