Scala Spark: Convert a DataFrame and get all unique IDs and their type from each row

Add the required columns inside array(struct(<column list>)) and then explode the resulting array column.

Check the code below.

Given Data

scala> df.show(false)
+---------+---------+-------------+-------+
|device_id|master_id|time         |user_id|
+---------+---------+-------------+-------+
|X        |M        |1604609299000|A      |
|Z        |M        |1604609318000|A      |
|Y        |N        |1604610161000|B      |
+---------+---------+-------------+-------+
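If you want to reproduce this, a minimal sketch for building the sample DataFrame in the spark-shell could look like the following (the column types are assumed from the output above, with time held as an epoch-millis Long; spark.implicits._ is available by default in the shell, which is what makes toDF work here):

scala> val df = Seq(
         ("X", "M", 1604609299000L, "A"),
         ("Z", "M", 1604609318000L, "A"),
         ("Y", "N", 1604610161000L, "B")
       ).toDF("device_id", "master_id", "time", "user_id")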

Creating expressions

scala> import org.apache.spark.sql.functions.{array, col, lit, struct}

scala> val colExpr = array(
         df
           .columns
           .filterNot(_ == "time")        // keep every column except "time"
           .map(c =>
             struct(
               col(c).as("id"),           // id column: the value of the current column
               col("time").as("time"),    // time column: carried along unchanged
               lit(c).as("type")          // type column: the original column name
             )
           ): _*
       )

The above code builds the expression shown below.

Note: the expression below is shown only to illustrate what the above code expands to. (Do not execute it.)

colExpr: org.apache.spark.sql.Column = array(
    named_struct(NamePlaceholder(), device_id AS `id`, NamePlaceholder(), time AS `time`, type, device_id AS `type`), 
    named_struct(NamePlaceholder(), master_id AS `id`, NamePlaceholder(), time AS `time`, type, master_id AS `type`), 
    named_struct(NamePlaceholder(), user_id AS `id`, NamePlaceholder(), time AS `time`, type, user_id AS `type`)
)
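For reference, with the three non-time columns in this example, the dynamically built colExpr is equivalent to writing the array out by hand (colExprExplicit is just an illustrative name):

scala> val colExprExplicit = array(
         struct(col("device_id").as("id"), col("time").as("time"), lit("device_id").as("type")),
         struct(col("master_id").as("id"), col("time").as("time"), lit("master_id").as("type")),
         struct(col("user_id").as("id"), col("time").as("time"), lit("user_id").as("type"))
       )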

Applying the expression

scala> df.select(explode(colExpr).as("data")).select("data.*").show(false)
+---+-------------+---------+
|id |time         |type     |
+---+-------------+---------+
|X  |1604609299000|device_id|
|M  |1604609299000|master_id|
|A  |1604609299000|user_id  |
|Z  |1604609318000|device_id|
|M  |1604609318000|master_id|
|A  |1604609318000|user_id  |
|Y  |1604610161000|device_id|
|N  |1604610161000|master_id|
|B  |1604610161000|user_id  |
+---+-------------+---------+
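If you only need the distinct id/type pairs, as the question title suggests, a small follow-up step like the sketch below should work (dropping the time column before deduplicating is an assumption about what "unique" means here):

scala> df.select(explode(colExpr).as("data"))
         .select("data.id", "data.type")
         .distinct
         .show(false)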
