How do you replace a regex group with a value from another column in PySpark?

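You can do this with expr, which lets you write the call as a Spark SQL expression, so the replacement argument of regexp_replace can refer to another column (here, year) instead of a fixed string.
I’m using ([0-9]{4}) as the regex pattern to detect the year in filename.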

from pyspark.sql.functions import expr

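# Replace the four-digit year in filename with the row's value from the year column.
# truncate=False keeps the full filenames visible in the printed output.
df.withColumn(
    "reqd_filename",
    expr("regexp_replace(filename, '([0-9]{4})', year)"),
).show(truncate=False)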

+--------------------------+----+--------------------------+
|filename                  |year|reqd_filename             |
+--------------------------+----+--------------------------+
|blah_2020_v1_blah_blah.csv|1975|blah_1975_v1_blah_blah.csv|
|blah_2019_v1_blah_blah.csv|1984|blah_1984_v1_blah_blah.csv|
+--------------------------+----+--------------------------+
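
If you want to check the pattern without a Spark session, here is a minimal plain-Python sketch of what the expression computes for each row. It uses re.sub rather than Spark, the helper name replace_year is made up for illustration, and it assumes year holds an integer. Treat it as a way to reason about the regex, not as a substitute for the Spark code.

```python
import re

# Same pattern as in the Spark expression: any run of four digits.
YEAR_PATTERN = re.compile(r"([0-9]{4})")


def replace_year(filename: str, year: int) -> str:
    """Hypothetical helper: swap every four-digit run in filename for year."""
    # A function replacement avoids any escape handling in the replacement text.
    return YEAR_PATTERN.sub(lambda _match: str(year), filename)


# The two sample rows from the output above.
rows = [
    ("blah_2020_v1_blah_blah.csv", 1975),
    ("blah_2019_v1_blah_blah.csv", 1984),
]

for filename, year in rows:
    print(filename, "->", replace_year(filename, year))
```

One thing this makes easy to see: the pattern matches every run of four digits, not only the first one, and regexp_replace likewise replaces all matches. A name such as blah_20201231.csv would therefore have both 2020 and 1231 replaced, so you may need a stricter pattern if your filenames contain other digit runs.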
