Pandas: Fastest way to group by max and summing over the group

In general I believe your approach works, but a few improvements are possible:

# no need to set_index. Do so on smaller/filtered data if needed
# df = df.set_index('A') 

# this is good 
df['sum'] = df.groupby('A')['D'].transform('sum')

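# there's a subtle difference between the string `'max'` and the builtin `max`:
# the string uses pandas' optimized built-in aggregation, while a Python
# callable may be applied group by group, which is slower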
idx = df.groupby(['A'])['C'].transform('max') == df['C']

df = df[idx]
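
To illustrate the comment about `'max'` above, here is a minimal sketch on made-up data (the values and the names `fast`, `slow`, and `same` are assumptions for illustration, not from the question). It passes the string alias and then an equivalent Python callable to `transform` and checks that both give the same values; only the speed should differ on large data.

```python
import pandas as pd

# Hypothetical sample data; column names A and C follow the question.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "C": [1, 3, 5, 2],
})

grouped_c = df.groupby("A")["C"]

# String alias: dispatched to pandas' built-in groupby max.
fast = grouped_c.transform("max")

# Python callable: invoked once per group, so usually slower on many groups.
slow = grouped_c.transform(lambda s: s.max())

# Both broadcast the per-group maximum back onto the original rows.
same = bool((fast == slow).all())
print(same)
```

On a frame with many groups, timing the two calls (for example with `%timeit`) should show how much the string alias helps.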

Another improvement is to create the groupby object once and reuse it. It is lazy, so nothing is computed until you call a method on it:

groups = df.groupby('A')

df['sum'] = groups['D'].transform('sum')

idx = groups['C'].transform('max') == df['C']

df = df[idx]
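
As a sanity check, the steps above can be run end to end on a small frame. This is a sketch under assumptions: the sample values and the name `result` are made up, and only the column names `A`, `C`, and `D` come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names A, C, D follow the question.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y"],
    "C": [1, 3, 3, 5, 2],
    "D": [10, 20, 30, 40, 50],
})

# Build the groupby object once and reuse it for both transforms.
groups = df.groupby("A")

# Per-group sum of D, broadcast back to every row of the group.
df["sum"] = groups["D"].transform("sum")

# Boolean mask: True where a row's C equals its group's maximum C.
idx = groups["C"].transform("max") == df["C"]

# Keep only the rows holding the group maximum (ties are all kept).
result = df[idx]
print(result)
```

Note that when several rows in a group share the maximum `C`, all of them are kept. If you need exactly one row per group, drop duplicates on `A` afterwards.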
