How can you fill a column in pandas with values computed over a date window, lagged by one row, when dates repeat?

You could use a time-based rolling window, with 'Date' converted to datetime and set as the index:

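import pandas as pd

# time-based rolling needs a datetime index sorted in ascending order
df['Date'] = pd.to_datetime(df['Date'])
df['Win%'] = (
    df.set_index('Date')
      .rolling('1000d')  # last 1000 days, current row included
      ['Position']
      # share of first places in the window, as a rounded percentage
      .apply(lambda s: round(s.eq(1).sum() / len(s) * 100))
      .shift()   # lag by one row so each row only sees earlier results
      .values    # drop the Date index and assign by position
)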

Output:

        Date  Position  TrainerID  Win%
0 2017-09-03         4       1788   NaN
1 2017-09-16         5       1788   0.0
2 2017-10-14         1       1788   0.0
3 2017-10-14         3       1788  33.0
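
To make the effect of the shift visible, here is a minimal, self-contained sketch. The sample frame is an assumption, rebuilt from the output table above; the names `unshifted` and `lagged` are only for illustration. It compares the rolling percentage before and after `.shift()`:

```python
import pandas as pd

# Sample data rebuilt from the output table (assumed, not the asker's full data).
df = pd.DataFrame({
    'Date': ['2017-09-03', '2017-09-16', '2017-10-14', '2017-10-14'],
    'Position': [4, 5, 1, 3],
    'TrainerID': [1788, 1788, 1788, 1788],
})
df['Date'] = pd.to_datetime(df['Date'])

# Rolling win percentage where each window includes the current row.
unshifted = (
    df.set_index('Date')['Position']
      .rolling('1000d')
      .apply(lambda s: round(s.eq(1).sum() / len(s) * 100))
)

# Lag by one row so a row never counts its own result.
lagged = unshifted.shift()

df['Win%'] = lagged.to_numpy()
print(df)
```

Because the shift is by row rather than by date, the second 2017-10-14 row already counts the win recorded on the first 2017-10-14 row, which is where the 33.0 comes from.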

NB. It is not clear whether there are several "TrainerID" values. If there are, you could perform the whole computation grouped by "TrainerID".

Applying per group:
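df['Win%'] = (
    df.set_index('Date')
      .groupby('TrainerID')
      .rolling('1000d')['Position']
      .apply(lambda s: round(s.eq(1).sum() / len(s) * 100))
      .groupby('TrainerID').shift()  # lag within each trainer only
      # result is ordered by TrainerID; .values assigns by position,
      # so df must already be sorted by TrainerID, then Date
      .values
)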

Output:

        Date  Position  TrainerID  Win%
0 2017-09-03         4       1788   NaN
1 2017-09-16         5       1788   0.0
2 2017-10-14         1       1788   0.0
3 2017-10-14         3       1788  33.0
4 2017-09-03         4       1789   NaN
5 2017-09-16         5       1789   0.0
6 2017-10-14         1       1789   0.0
7 2017-10-14         3       1789  33.0
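
The grouped snippet assigns with `.values`, which is positional, while `groupby(...).rolling(...)` returns its result ordered by "TrainerID". That works when `df` is already sorted by trainer, as in the table above. If the rows are interleaved, one possible variant (a sketch, not the original answer's method) computes each trainer separately and aligns the result on the original row labels. The helper `lagged_win_pct` and the interleaved sample frame are assumptions made for illustration, and the sketch assumes `df` has a unique index:

```python
import pandas as pd

# Hypothetical sample: the same two trainers as above, but with rows interleaved.
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2017-09-03', '2017-09-03', '2017-09-16', '2017-09-16',
        '2017-10-14', '2017-10-14', '2017-10-14', '2017-10-14',
    ]),
    'Position': [4, 4, 5, 5, 1, 1, 3, 3],
    'TrainerID': [1788, 1789, 1788, 1789, 1788, 1789, 1788, 1789],
})

def lagged_win_pct(group):
    """Lagged rolling win percentage for one trainer's rows (sorted by Date)."""
    rolled = (
        group.set_index('Date')['Position']
             .rolling('1000d')
             .apply(lambda s: round(s.eq(1).sum() / len(s) * 100))
             .shift()
    )
    # Re-attach the original row labels so the assignment below can align on them.
    return pd.Series(rolled.to_numpy(), index=group.index)

# A stable sort keeps the original order of rows that share a date.
ordered = df.sort_values('Date', kind='stable')
parts = [lagged_win_pct(g) for _, g in ordered.groupby('TrainerID')]

# Assigning a Series aligns on the index, so the row order of df no longer matters.
df['Win%'] = pd.concat(parts)
print(df)
```

The trade-off is a Python-level loop over trainers, which is slower than a single grouped rolling call when there are many groups.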
