Find percent diff and diff with consecutive but odd number of dates

Here is one way to do it, assuming I understand your logic correctly.

The idea is to use `shift` within each `id` group to get the previous row's date and value, then compute the difference and the percent change against that previous value:

result = (
    df.sort_values(["id", "date", "value"])
    .assign(
        # number of rows per id; used below to decide whether
        # the first row of a group can be dropped
        counter=lambda x: x.groupby("id").date.transform("size"),
        # previous date and value within each id
        date_shift=lambda x: x.groupby("id").date.shift(1),
        value_shift=lambda x: x.groupby("id").value.shift(1),
        diff=lambda x: x.value - x.value_shift,
        percent=lambda x: x["diff"].div(x.value_shift).mul(100).round(2),
    )
    # `shift` leaves nulls in the first row of every group.
    # Drop that row when the group has more than one row;
    # keep it when it is the group's only row.
    .query("not (date_shift.isna() and counter > 1)")
    .loc[:, ["id", "date", "diff", "percent"]]
    # a single-row group has nothing to compare against, so report 0
    .fillna(0)
)


   id        date  diff  percent
2   1  10/01/2020   5.0    33.33
0   1  11/01/2020 -10.0   -50.00
3   2  10/01/2020  20.0   200.00
1   2  11/01/2020 -25.0   -83.33
6   3  11/01/2020   0.0     0.00
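
To try this end to end, here is a minimal, self-contained sketch. The input frame is an assumption: the question's data is not reproduced here, so the values below were chosen only because they give the output shown above. The sketch also swaps `query` for an equivalent boolean mask, so it does not depend on which expression engine is installed, and it fills the nulls left in a single-row group with 0, which is what the output above shows for id 3.

```python
import pandas as pd

# Hypothetical input, reverse-engineered from the output above.
df = pd.DataFrame(
    {
        "id": [1, 2, 1, 2, 1, 2, 3],
        "date": [
            "11/01/2020", "11/01/2020",
            "10/01/2020", "10/01/2020",
            "09/01/2020", "09/01/2020",
            "11/01/2020",
        ],
        "value": [10, 5, 20, 30, 15, 10, 7],
    }
)

result = (
    # The dates are plain strings here; sorting them only works because
    # they are zero-padded and fall in the same year.
    df.sort_values(["id", "date", "value"])
    .assign(
        # rows per id, used to spot single-row groups
        counter=lambda x: x.groupby("id")["date"].transform("size"),
        # previous date and value within each id
        date_shift=lambda x: x.groupby("id")["date"].shift(1),
        value_shift=lambda x: x.groupby("id")["value"].shift(1),
        diff=lambda x: x["value"] - x["value_shift"],
        percent=lambda x: x["diff"].div(x["value_shift"]).mul(100).round(2),
    )
    # Boolean-mask version of the `query` step: drop the first row of a
    # group unless it is the group's only row.
    .loc[lambda x: ~(x["date_shift"].isna() & (x["counter"] > 1))]
    .loc[:, ["id", "date", "diff", "percent"]]
    # a single-row group has no previous value, so its nulls become 0
    .fillna(0)
)

print(result)
```

Ids 1 and 2 each have three dates, so their earliest row is dropped and two rows remain; id 3 has a single date, so its only row is kept.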
