Compare a struct field to another column in Spark

One option is to use explode to generate one row per element of data, then keep only the rows whose stat field equals max_stat.

import org.apache.spark.sql.functions.explode

  .withColumn("data", explode($"data"))
  .filter($"data.stat" === $"max_stat")


+---------+-------+
|     data| naming|
+---------+-------+
|[3, 0.89]|example|
+---------+-------+
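For reference, here is a minimal self-contained sketch of the explode approach. The input data is made up for illustration: the `Item` case class, its `id` field, the sample values, and the `sample`/`keepMaxStat` helper names are assumptions, since the original only names the `data`, `stat`, `naming`, and `max_stat` columns. The final `drop("max_stat")` is also an addition, included so that only `data` and `naming` remain.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.explode

// Hypothetical element type: the text only names `stat`; `id` is an assumed field name.
case class Item(id: Int, stat: Double)

object ExplodeFilterExample {
  // One input row whose `data` array holds three structs (made-up values).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((Seq(Item(1, 0.12), Item(2, 0.45), Item(3, 0.89)), "example", 0.89))
      .toDF("data", "naming", "max_stat")
  }

  // Explode to one row per struct, then keep the rows where stat equals max_stat.
  def keepMaxStat(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.withColumn("data", explode($"data"))
      .filter($"data.stat" === $"max_stat")
      .drop("max_stat") // assumption: the helper column is not wanted in the result
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-filter")
      .getOrCreate()
    keepMaxStat(spark, sample(spark)).show()
    spark.stop()
  }
}
```

Note that explode multiplies the row count by the array length before the filter runs, which is where the cost comes from on large inputs.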

However, explode is a costly operation and can become a problem if your dataset is big. Another way to do this, without using explode, is:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.IntegerType

  .filter(array_contains($"data.stat", $"max_stat"))
  .withColumn("max_stat_idx", array_position($"data.stat", $"max_stat").cast(IntegerType))
  .withColumn("data", element_at($"data", $"max_stat_idx"))
  .drop("max_stat", "max_stat_idx")

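Basically, it searches for the index of the matching value in the data.stat array and then uses that index to pick the corresponding element of data. Both array_position and element_at use 1-based indexes, so the position can be passed straight through. array_position returns a long, hence the cast to an integer, and it returns 0 when there is no match, which is why the array_contains filter comes first: element_at raises an error for index 0. If several elements share the same stat, only the first one is returned.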
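The index-based lookup can be sketched end to end as follows. This is an illustration under stated assumptions rather than a definitive implementation: it assumes a Spark 3.x build, where the Scala `array_position` and `element_at` functions accept a Column as their second argument, and the `Item` case class, its `id` field, the sample rows, and the `sample`/`pickMaxStat` helper names are made up for the example. The second row has no element matching `max_stat`, to show that the `array_contains` filter removes it before any lookup happens.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, array_position, element_at}
import org.apache.spark.sql.types.IntegerType

// Hypothetical element type: the text only names `stat`; `id` is an assumed field name.
case class Item(id: Int, stat: Double)

object ArrayPositionExample {
  // Two made-up rows: one with a matching stat, one without.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (Seq(Item(1, 0.12), Item(2, 0.45), Item(3, 0.89)), "example", 0.89),
      (Seq(Item(4, 0.10)), "no_match", 0.99)
    ).toDF("data", "naming", "max_stat")
  }

  def pickMaxStat(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df
      // Keep only rows where some element's stat equals max_stat.
      .filter(array_contains($"data.stat", $"max_stat"))
      // 1-based position of the first match; array_position yields a long.
      .withColumn("max_stat_idx",
        array_position($"data.stat", $"max_stat").cast(IntegerType))
      // Replace the array with the single struct at that position.
      .withColumn("data", element_at($"data", $"max_stat_idx"))
      .drop("max_stat", "max_stat_idx")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-position")
      .getOrCreate()
    pickMaxStat(spark, sample(spark)).show()
    spark.stop()
  }
}
```

Unlike the explode version, the row count never grows here: each input row yields at most one output row.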
