Compare a struct field to another column in Spark

One way is to use explode to generate one row per element of the data array, then keep only the rows whose stat field equals max_stat.

import org.apache.spark.sql.functions.explode
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark

ds
  .withColumn("data", explode($"data"))
  .filter($"data.stat" === $"max_stat")
  .drop($"max_stat")
  .show()

Output:

+---------+-------+
|     data| naming|
+---------+-------+
|[3, 0.89]|example|
+---------+-------+
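
The input dataset is not shown here, so the following is only a minimal sketch of a setup that produces a result like the one above. The case classes `Stat` and `Record`, the field name `id`, and the sample values are assumptions made for illustration; only `data`, `stat`, `naming` and `max_stat` come from the snippet. It can be pasted into spark-shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Hypothetical schema: `id` and the sample values are assumptions, not from the original question.
case class Stat(id: Int, stat: Double)
case class Record(data: Seq[Stat], naming: String, max_stat: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explode-sketch")
  .getOrCreate()
import spark.implicits._

val ds = Seq(
  Record(Seq(Stat(1, 0.12), Stat(2, 0.45), Stat(3, 0.89)), "example", 0.89)
).toDS()

val exploded = ds
  .withColumn("data", explode($"data"))   // one row per array element
  .filter($"data.stat" === $"max_stat")   // keep the element whose stat matches
  .drop("max_stat")

exploded.show()
```

With this sample data a single row remains, holding the struct with id 3. The exact rendering of the struct in show() differs between Spark versions.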

However, explode is a costly operation and can become a problem on large datasets. Another way to do this without explode is:

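import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType

ds
  // keep only rows where some element's stat equals max_stat
  .filter(array_contains($"data.stat", $"max_stat"))
  // array_position returns a 1-based Long index; element_at expects an Int for arrays
  .withColumn("max_stat_idx", array_position($"data.stat", $"max_stat").cast(IntegerType))
  // replace the array with the single matching struct
  .withColumn("data", element_at($"data", $"max_stat_idx"))
  .drop("max_stat", "max_stat_idx")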

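Basically, this finds the index of the matching value in the data array and then uses that index to fetch the corresponding element. Both array_position and element_at use 1-based indices, so the index can be passed through directly. Note that array_position returns the first occurrence only, so if several elements share the same stat, only the first one is kept, whereas the explode version returns one row per match.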
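
To try the explode-free version end to end, here is a hedged sketch on made-up data. As before, `Stat`, `Record`, the `id` field and the values are assumptions for illustration. The second record has no element matching max_stat, which shows why the array_contains filter comes first: without it, array_position would return 0 for that row, which is not a valid index for element_at.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, array_position, element_at}
import org.apache.spark.sql.types.IntegerType

// Hypothetical schema: `id` and the sample values are assumptions, not from the original question.
case class Stat(id: Int, stat: Double)
case class Record(data: Seq[Stat], naming: String, max_stat: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("element-at-sketch")
  .getOrCreate()
import spark.implicits._

val ds = Seq(
  Record(Seq(Stat(1, 0.12), Stat(2, 0.45), Stat(3, 0.89)), "example", 0.89),
  Record(Seq(Stat(4, 0.10)), "no_match", 0.50) // no element has stat == max_stat
).toDS()

val result = ds
  .filter(array_contains($"data.stat", $"max_stat"))  // drops the "no_match" row
  .withColumn("max_stat_idx", array_position($"data.stat", $"max_stat").cast(IntegerType))
  .withColumn("data", element_at($"data", $"max_stat_idx"))
  .drop("max_stat", "max_stat_idx")

result.show()
```

Only the "example" row should survive, with data reduced to the struct whose id is 3. These functions require Spark 2.4 or later.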
