How do you specify pandas replace instead of str.replace when defining a function?

From this line:

txt.replace(rf"\b({'|'.join(words)})\b", '', regex=True)

This call matches the signature of pd.Series.replace, so your function expects a Series as input. On the other hand,

df['old_text'].apply(removeWords)

applies the function to each cell of df['old_text'] separately. That means txt is just a plain string, and str.replace does not accept a regex keyword argument, so passing regex=True raises a TypeError.
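To see the difference in isolation, here is a minimal sketch that calls replace on a plain string and on a Series. The sample text and the variable names (text, series, str_error, cleaned) are made up for illustration and are not from the question.

```python
import pandas as pd

text = "i will visit you"
series = pd.Series([text])

# str.replace takes (old, new[, count]) and has no `regex` keyword,
# so this call fails before any replacement happens.
try:
    text.replace("i", "", regex=True)
    str_error = None
except TypeError as exc:
    str_error = exc

# pd.Series.replace accepts regex=True and treats the first argument
# as a regular expression applied to every string in the Series.
cleaned = series.replace(r"\bi\b", "", regex=True)

print(type(str_error).__name__)
print(repr(cleaned.iloc[0]))
```

The first call is what happens inside .apply, where each cell arrives as a str; the second is what happens when the whole column is passed.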

TL;DR: pass the whole column to the function instead of using apply:

df['new_text'] = removeWords(df['old_text'])

Output:

   id                      old_text                new_text
0   0     my favorite color is blue    favorte color s blue
1   1                you have a dog              have a dog
2   2  we built the house ourselves   bult the house selves
3   3              i will visit you                wll vst 

But as you can see, the function also replaces the i inside other words, which is what a pattern without word boundaries does. To replace only whole words, wrap the alternation in the word-boundary indicator \b:

def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    
    # note the `\b` here
    return txt.replace(rf"\b({'|'.join(words)})\b", '', regex=True)

Output:

   id                      old_text                 new_text
0   0     my favorite color is blue   favorite color is blue
1   1                you have a dog               have a dog
2   2  we built the house ourselves         built the house 
3   3              i will visit you              will visit 
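For reference, here is a self-contained sketch that rebuilds the sample frame from the output above and runs the whole-word version. It also shows one possible way to keep .apply, if you prefer that: a per-cell helper based on re.sub. The helper name remove_words_from_string, the module-level WORDS and PATTERN constants, and the new_text_apply column are assumptions added for illustration, not part of the original code.

```python
import re

import pandas as pd

WORDS = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
         'you', 'your', 'yours', 'yourself']
# \b on both sides so that only whole words match
PATTERN = rf"\b({'|'.join(WORDS)})\b"


def removeWords(txt):
    # txt is a whole Series here, so this is pd.Series.replace
    return txt.replace(PATTERN, '', regex=True)


def remove_words_from_string(text):
    # hypothetical per-cell variant: text is a plain str, so use re.sub
    return re.sub(PATTERN, '', text)


df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'old_text': [
        'my favorite color is blue',
        'you have a dog',
        'we built the house ourselves',
        'i will visit you',
    ],
})

# Series-level call: pass the column itself
df['new_text'] = removeWords(df['old_text'])

# Cell-level call: .apply hands each string to the helper
df['new_text_apply'] = df['old_text'].apply(remove_words_from_string)

print(df)
```

Both columns should hold the same strings. Note that the spaces around a removed word stay in place, which is why the results start or end with a space.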
