Find duplicates using Python and, if there are any, create a new column marking the matches

You can mark duplicates with ‘Yes’ and ‘No’ by flagging them with DataFrame.duplicated and mapping the resulting booleans to strings:

# keep=False flags every occurrence of a repeated Audit ID, not just the later ones
df['Matches'] = df.duplicated('Audit ID', keep=False).map({True: 'Yes', False: 'No'})
df

Out:

  Offender Name  Issue Date      Audit ID Matches
0           Joe  12/02/2020  Joe-12/02/20     Yes
1           Nic  20/02/2020  Nic-20/02/20      No
2           Mat  01/02/2020  Mat-01/02/20      No
3           Joe  12/02/2020  Joe-12/02/20     Yes
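To reproduce this end to end, here is a minimal sketch that rebuilds the sample dataframe from the table above (the data is copied from that output; nothing else is assumed). It also shows why keep=False matters: with the default keep='first', only the second Joe row would be flagged, so the first one would wrongly get ‘No’.

```python
import pandas as pd

# Sample data copied from the output table above
df = pd.DataFrame({
    'Offender Name': ['Joe', 'Nic', 'Mat', 'Joe'],
    'Issue Date': ['12/02/2020', '20/02/2020', '01/02/2020', '12/02/2020'],
    'Audit ID': ['Joe-12/02/20', 'Nic-20/02/20', 'Mat-01/02/20', 'Joe-12/02/20'],
})

# Default keep='first': only the second and later occurrences are flagged
print(df.duplicated('Audit ID').tolist())               # [False, False, False, True]

# keep=False: every row whose Audit ID appears more than once is flagged
print(df.duplicated('Audit ID', keep=False).tolist())   # [True, False, False, True]

# Turn the booleans into the 'Yes'/'No' labels used in the answer
df['Matches'] = df.duplicated('Audit ID', keep=False).map({True: 'Yes', False: 'No'})
print(df)
```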

The Audit ID column is redundant: it only combines Offender Name and Issue Date, which are already in your dataframe. You can check for duplicates on those two columns directly:

# Check duplicates on the combination of both columns instead of Audit ID
df['Matches'] = df.duplicated(['Offender Name', 'Issue Date'], keep=False).map({True: 'Yes', False: 'No'})
df

Out:

  Offender Name  Issue Date      Audit ID Matches
0           Joe  12/02/2020  Joe-12/02/20     Yes
1           Nic  20/02/2020  Nic-20/02/20      No
2           Mat  01/02/2020  Mat-01/02/20      No
3           Joe  12/02/2020  Joe-12/02/20     Yes
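If Issue Date is stored as text, two rows only match when the strings are identical. A sketch under one assumption (that the dates are day-first, as the sample data suggests) is to parse the column to real dates before comparing, so the match is made on the actual date rather than on its formatting. The Audit ID column is simply left out here.

```python
import pandas as pd

# Same sample data as above, without the redundant Audit ID column
df = pd.DataFrame({
    'Offender Name': ['Joe', 'Nic', 'Mat', 'Joe'],
    'Issue Date': ['12/02/2020', '20/02/2020', '01/02/2020', '12/02/2020'],
})

# Assumption: dates are written day-first (DD/MM/YYYY); parsing them means
# duplicates are detected on the date value, not on the string formatting
df['Issue Date'] = pd.to_datetime(df['Issue Date'], dayfirst=True)

# A row is a match if its (name, date) pair occurs more than once
dup = df.duplicated(['Offender Name', 'Issue Date'], keep=False)
df['Matches'] = dup.map({True: 'Yes', False: 'No'})
print(df)

# Keep only the rows that have a match
print(df[df['Matches'] == 'Yes'])
```

The last line is one way to pull out just the matching rows once the flag column exists.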
