In PySpark, how can I multiply a single column by the values of another column?

This should be a working solution for you: just use `groupBy()` and `sum()`.

Create the DataFrame here (assumes an active `spark` session; the `F` alias needs to be imported):

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("2017-03-10", "Laptop", 2), ("2017-03-12", "Laptop", 2), ("2017-03-10", "Mobile", 1), ("2017-03-10", "Laptop", 2), ("2017-03-11", "TV", 1), ("2017-03-12", "TV", 1), ("2017-03-13", "TV", 2)], ["col1", "col2", "qty"])
    df.show(truncate=False)
    # Sum qty per (col1, col2) pair
    df_grp = df.groupBy("col1", "col2").agg(F.sum("qty").alias("tot_qty"))
    df_grp.show()

Input

    +----------+------+---+
    |col1      |col2  |qty|
    +----------+------+---+
    |2017-03-10|Laptop|2  |
    |2017-03-12|Laptop|2  |
    |2017-03-10|Mobile|1  |
    |2017-03-10|Laptop|2  |
    |2017-03-11|TV    |1  |
    |2017-03-12|TV    |1  |
    |2017-03-13|TV    |2  |
    +----------+------+---+

Output

    +----------+------+-------+
    |      col1|  col2|tot_qty|
    +----------+------+-------+
    |2017-03-12|Laptop|      2|
    |2017-03-13|    TV|      2|
    |2017-03-12|    TV|      1|
    |2017-03-10|Mobile|      1|
    |2017-03-10|Laptop|      4|
    |2017-03-11|    TV|      1|
    +----------+------+-------+
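If you literally need to multiply one column by another, row by row (as the question asks), `withColumn` with `F.col` does that directly. A minimal, self-contained sketch; the `price` column and the column names here are assumptions for illustration, not part of the original DataFrame:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.master("local[1]").appName("mul-demo").getOrCreate()

    # Hypothetical data: quantity and unit price per product
    df = spark.createDataFrame(
        [("Laptop", 2, 500.0), ("Mobile", 1, 300.0)],
        ["col2", "qty", "price"],
    )

    # Multiply one column by another, per row
    df = df.withColumn("total", F.col("qty") * F.col("price"))
    df.show()

    totals = sorted(row["total"] for row in df.collect())
    spark.stop()

You can combine both steps, e.g. compute the per-row product first and then `groupBy(...).agg(F.sum("total"))` to aggregate it.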
