Select rows of pandas dataframe based on column values with duplicates

Create another dummy dataframe with the ids you wish to have, keeping the duplicates:

import pandas as pd
df2 = pd.DataFrame({'customer_id': [1, 2, 2]})

   customer_id
0            1
1            2
2            2

Then merge it with the given dataframe:

df.merge(df2)

Desired result:

   customer_id some_data
0            1         A
1            1         D
2            2         B
3            2         B
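
For anyone who wants to try this end to end, here is a minimal sketch. The contents of df are an assumption on my part (the question's actual data is not reproduced here), chosen so that the merge yields the rows shown above. The explicit on='customer_id' and the final sort are optional additions, not part of the one-liner.

```python
import pandas as pd

# Assumed input data: a hypothetical stand-in for the asker's dataframe.
df = pd.DataFrame({
    'customer_id': [1, 2, 3, 1],
    'some_data': ['A', 'B', 'C', 'D'],
})

# The ids to select; id 2 is listed twice on purpose.
df2 = pd.DataFrame({'customer_id': [1, 2, 2]})

# Default merge is an inner join on the shared column. Every row of df2
# is paired with every matching row of df, so a repeated id repeats rows.
result = df.merge(df2, on='customer_id')

# Optional: group the rows by id for easier reading.
result = result.sort_values('customer_id', kind='stable').reset_index(drop=True)
print(result)
```

Customer 3 is dropped because it is not in df2, and customer 2 shows up twice because its id is listed twice.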

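Most importantly: your code will work, but it is very slow for large data. The reason for the long processing time is your for loop, which scans the dataframe again on every iteration. To optimize it, aim to vectorize: let pandas do the whole lookup in one operation instead of one step per id.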
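
To make the loop versus vectorized point concrete, here is a rough sketch. The loop below is only a hypothetical stand-in for the asker's code, which is not shown here, and the data is assumed for illustration. Both versions select the same rows; the loop filters the whole dataframe once per id, while the merge handles all ids in a single call.

```python
import pandas as pd

# Assumed example data, not the asker's real dataframe.
df = pd.DataFrame({
    'customer_id': [1, 2, 3, 1],
    'some_data': ['A', 'B', 'C', 'D'],
})
ids = [1, 2, 2]

# Loop version (hypothetical): one boolean filter over df per id.
looped = pd.concat(
    [df[df['customer_id'] == i] for i in ids],
    ignore_index=True,
)

# Vectorized version: a single inner merge performs all lookups at once.
merged = pd.DataFrame({'customer_id': ids}).merge(df, on='customer_id')

# Compare the selected rows without relying on row order.
same_rows = (
    sorted(zip(looped['customer_id'].tolist(), looped['some_data'].tolist()))
    == sorted(zip(merged['customer_id'].tolist(), merged['some_data'].tolist()))
)
print(same_rows)
```

How large the speed difference is depends on the size of your data and the number of ids, so it is worth timing both on your own dataframe.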
