Scala Spark: convert a DataFrame to get all unique IDs and their types from each row

Wrap the required columns in array(struct(<column list>)), then explode that array column.

Check the code below.

Given Data

+---------+---------+-------------+-------+
|device_id|master_id|time         |user_id|
+---------+---------+-------------+-------+
|X        |M        |1604609299000|A      |
|Z        |M        |1604609318000|A      |
|Y        |N        |1604610161000|B      |
+---------+---------+-------------+-------+
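The sample data above can be reproduced locally with a small DataFrame. This is a sketch for illustration; the String/Long column types and the local-mode SparkSession setup are assumptions, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("sample").getOrCreate()
import spark.implicits._

// Build the sample DataFrame shown above (column types assumed:
// String ids and a Long epoch-milliseconds time column).
val df = Seq(
  ("X", "M", 1604609299000L, "A"),
  ("Z", "M", 1604609318000L, "A"),
  ("Y", "N", 1604610161000L, "B")
).toDF("device_id", "master_id", "time", "user_id")

df.show(false)
```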

Creating expressions

scala> val colExpr = array(
         df.columns
           .filterNot(_ == "time")      // keep every column except time
           .map(c => struct(
             col(c).as("id"),           // id column
             col("time").as("time"),    // time column
             lit(c).as("type")          // type column
           )): _*                       // expand the mapped structs as varargs
       )

The above code produces the expression below.

Note: the expression below is shown only to illustrate how the code above is translated into Spark expressions. (Do not execute it.)

colExpr: org.apache.spark.sql.Column = array(
    named_struct(NamePlaceholder(), device_id AS `id`, NamePlaceholder(), time AS `time`, type, device_id AS `type`), 
    named_struct(NamePlaceholder(), master_id AS `id`, NamePlaceholder(), time AS `time`, type, master_id AS `type`), 
    named_struct(NamePlaceholder(), user_id AS `id`, NamePlaceholder(), time AS `time`, type, user_id AS `type`)
)

Applying Expression

+---+-------------+---------+
|id |time         |type     |
+---+-------------+---------+
|X  |1604609299000|device_id|
|M  |1604609299000|master_id|
|A  |1604609299000|user_id  |
|Z  |1604609318000|device_id|
|M  |1604609318000|master_id|
|A  |1604609318000|user_id  |
|Y  |1604610161000|device_id|
|N  |1604610161000|master_id|
|B  |1604610161000|user_id  |
+---+-------------+---------+
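For completeness, here is a self-contained sketch of the whole flow, from the sample data to the exploded output. The local-mode SparkSession setup and the column types are assumptions for illustration; the array/struct/explode steps follow the answer above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

val spark = SparkSession.builder().master("local[*]").appName("explode-ids").getOrCreate()
import spark.implicits._

// Sample data from the question (types assumed: String ids, Long epoch millis).
val df = Seq(
  ("X", "M", 1604609299000L, "A"),
  ("Z", "M", 1604609318000L, "A"),
  ("Y", "N", 1604610161000L, "B")
).toDF("device_id", "master_id", "time", "user_id")

// One struct per non-time column, collected into a single array column.
val colExpr = array(
  df.columns
    .filterNot(_ == "time")
    .map(c => struct(col(c).as("id"), col("time").as("time"), lit(c).as("type"))): _*
)

// explode turns each 3-element array into 3 rows; "data.*" flattens the struct
// back into top-level id, time, and type columns.
val result = df.select(explode(colExpr).as("data")).select("data.*")
result.show(false)
```

Because explode emits one output row per array element, 3 input rows with 3 structs each yield the 9 rows shown above.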
