You can do it like below:
from pyspark.sql import Row
from pyspark.sql import functions as F

# Build a small example DataFrame
row = Row('id', 'Name', 'age', 'gender')
row_df = spark.createDataFrame(
    [row(1, 'Test', '12', 'Male'), row(2, 'Test2', '15', 'Female')])
row_df.show()

# Add each column only if it is not already present
if 'gender' not in row_df.columns:
    row_df = row_df.withColumn('gender', F.lit(None))
if 'city' not in row_df.columns:
    row_df = row_df.withColumn('city', F.lit(None))
if 'contact' not in row_df.columns:
    row_df = row_df.withColumn('contact', F.lit(None))
row_df.show()
Output:
+---+-----+---+------+
| id| Name|age|gender|
+---+-----+---+------+
| 1| Test| 12| Male|
| 2|Test2| 15|Female|
+---+-----+---+------+
+---+-----+---+------+----+-------+
| id| Name|age|gender|city|contact|
+---+-----+---+------+----+-------+
| 1| Test| 12| Male|null| null|
| 2|Test2| 15|Female|null| null|
+---+-----+---+------+----+-------+
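If you have more than a couple of columns to guarantee, the same idea can be written as a loop. This is only a sketch: the list name required_cols and the cast to string are my own illustrative choices, not part of the original answer, and it reuses row_df and F from the snippet above.

# Hypothetical list of column names that must exist in the DataFrame
required_cols = ['gender', 'city', 'contact']

df = row_df
for col_name in required_cols:
    # Only add the column when it is missing; existing columns stay untouched
    if col_name not in df.columns:
        # Cast the null literal so the new column gets a concrete (string) type
        df = df.withColumn(col_name, F.lit(None).cast('string'))
df.show()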