Scala Spark: convert a DataFrame to get all unique IDs and their types from each row

Wrap the required columns in array(struct(<column list>)), then explode that array column.

Check the code below.

Given Data

+---------+---------+-------------+-------+
|device_id|master_id|time         |user_id|
+---------+---------+-------------+-------+
|X        |M        |1604609299000|A      |
|Z        |M        |1604609318000|A      |
|Y        |N        |1604610161000|B      |
+---------+---------+-------------+-------+
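The sample data above can be reproduced locally with a small DataFrame. This is a sketch for illustration; the String/Long column types and the local-mode SparkSession setup are assumptions, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("sample").getOrCreate()
import spark.implicits._

// Build the sample DataFrame shown above (column types assumed:
// String ids and a Long epoch-milliseconds time column).
val df = Seq(
  ("X", "M", 1604609299000L, "A"),
  ("Z", "M", 1604609318000L, "A"),
  ("Y", "N", 1604610161000L, "B")
).toDF("device_id", "master_id", "time", "user_id")

df.show(false)
```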

Creating expressions

scala> val colExpr = array(
         df.columns
           .filterNot(_ == "time")      // keep every column except time
           .map(c => struct(
             col(c).as("id"),           // id column
             col("time").as("time"),    // time column
             lit(c).as("type")          // type column
           )): _*                       // expand the mapped structs as varargs
       )

The above code produces the expression below.

Note: the expression below is shown only to illustrate how the code above is translated into Spark expressions. (Do not execute it.)

colExpr: org.apache.spark.sql.Column = array(
    named_struct(NamePlaceholder(), device_id AS `id`, NamePlaceholder(), time AS `time`, type, device_id AS `type`), 
    named_struct(NamePlaceholder(), master_id AS `id`, NamePlaceholder(), time AS `time`, type, master_id AS `type`), 
    named_struct(NamePlaceholder(), user_id AS `id`, NamePlaceholder(), time AS `time`, type, user_id AS `type`)
)

Applying Expression

+---+-------------+---------+
|id |time         |type     |
+---+-------------+---------+
|X  |1604609299000|device_id|
|M  |1604609299000|master_id|
|A  |1604609299000|user_id  |
|Z  |1604609318000|device_id|
|M  |1604609318000|master_id|
|A  |1604609318000|user_id  |
|Y  |1604610161000|device_id|
|N  |1604610161000|master_id|
|B  |1604610161000|user_id  |
+---+-------------+---------+
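For completeness, here is a self-contained sketch of the whole flow, from the sample data to the exploded output. The local-mode SparkSession setup and the column types are assumptions for illustration; the array/struct/explode steps follow the answer above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

val spark = SparkSession.builder().master("local[*]").appName("explode-ids").getOrCreate()
import spark.implicits._

// Sample data from the question (types assumed: String ids, Long epoch millis).
val df = Seq(
  ("X", "M", 1604609299000L, "A"),
  ("Z", "M", 1604609318000L, "A"),
  ("Y", "N", 1604610161000L, "B")
).toDF("device_id", "master_id", "time", "user_id")

// One struct per non-time column, collected into a single array column.
val colExpr = array(
  df.columns
    .filterNot(_ == "time")
    .map(c => struct(col(c).as("id"), col("time").as("time"), lit(c).as("type"))): _*
)

// explode turns each 3-element array into 3 rows; "data.*" flattens the struct
// back into top-level id, time, and type columns.
val result = df.select(explode(colExpr).as("data")).select("data.*")
result.show(false)
```

Because explode emits one output row per array element, 3 input rows with 3 structs each yield the 9 rows shown above.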
