PySpark: max string length for each column in a DataFrame

Compute the length of every value, then take the column-wise max with an empty groupby:

>>> from pyspark.sql import functions as sf
>>> df = sc.parallelize([['a', 'bbbbb', 'ccc', 'ddd'], ['aaaa', 'bbb', 'ccccccc', 'dddd']]).toDF(["column1", "column2", "column3", "column4"])
>>> # replace each value with its string length, keeping the column names
>>> df1 = df.select([sf.length(col).alias(col) for col in df.columns])
>>> # groupby() with no keys aggregates over the whole DataFrame
>>> df1.groupby().max().show()
+------------+------------+------------+------------+
|max(column1)|max(column2)|max(column3)|max(column4)|
+------------+------------+------------+------------+
|           4|           5|           7|           4|
+------------+------------+------------+------------+
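
The same numbers can also be computed in one pass by aggregating max over length directly; a minimal sketch against the same df (output derived from the sample data above):

>>> df.select([sf.max(sf.length(c)).alias(c) for c in df.columns]).show()
+-------+-------+-------+-------+
|column1|column2|column3|column4|
+-------+-------+-------+-------+
|      4|      5|      7|      4|
+-------+-------+-------+-------+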

Then melt (unpivot) the previous DataFrame to get one row per column, as sketched below.
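
A minimal sketch, assuming the one-row result from above: Spark 3.4+ has DataFrame.melt/unpivot built in, while older versions can unpivot with a SQL stack() expression (the column and max_length names here are illustrative):

>>> wide = df1.groupby().max()
>>> # stack(n, 'label1', col1, ...) emits one (label, value) row per pair;
>>> # backticks are needed because the column names contain parentheses
>>> pairs = ", ".join("'{0}', `{0}`".format(c) for c in wide.columns)
>>> wide.select(sf.expr("stack({0}, {1}) as (column, max_length)".format(len(wide.columns), pairs))).show()
+------------+----------+
|      column|max_length|
+------------+----------+
|max(column1)|         4|
|max(column2)|         5|
|max(column3)|         7|
|max(column4)|         4|
+------------+----------+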
