There seem to be several problems with the code, or perhaps you have not provided all of it:
- Have you defined `fullpath`?
- You have set `header=False`, so Spark treats the first row as data and names the columns `_c0`, `_c1`, and so on. How would it know that there is an "id" column?
- Your indentation under the `for` loop looks wrong.
- `full_data` has not been defined yet, so how can you use it on the right-hand side of the assignment inside the `for` loop? I suspect you initialized it to the first CSV file and are then joining it with that same first CSV again.
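As a rough analogy for the `header` point, here is a small sketch using Python's standard `csv` module rather than Spark. The sample data is hypothetical. Without a header, the row `id,name` is just another data row and columns are only positional; with a header, the first row supplies the column names, so there is something called `id` to join on.

```python
import csv
import io

# Hypothetical sample data standing in for one of the CSV files.
sample = "id,name\n1,Alice\n2,Bob\n"

# Like header=False: every row, including "id,name", is data; columns are positional.
rows_no_header = list(csv.reader(io.StringIO(sample)))

# Like header=True: the first row becomes the column names, so rows have an "id" key.
rows_with_header = list(csv.DictReader(io.StringIO(sample)))

print(rows_no_header[0])       # the header line read back as a data row
print(rows_with_header[0]["id"])
```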
I ran a small test with the code below, which worked for me and addresses the points I raised above. You can adjust it to your needs.
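
    from pyspark.sql import SparkSession

    # Reuse the notebook's existing session if there is one
    spark = SparkSession.builder.getOrCreate()

    # Directory that holds Book1.csv, Book2.csv and Book3.csv
    fullpath = '/content/sample_data/'

    # Start from the first file so full_data exists before the loop uses it
    full_data = spark.read.csv(fullpath + 'Book1.csv', header=True, inferSchema=True)

    name_file = ['Book2', 'Book3']
    for n in name_file:
        df = spark.read.csv(fullpath + n + '.csv', header=True, inferSchema=True)
        # Inner join on the shared "id" column
        full_data = full_data.join(df, ["id"])

    full_data.show(5)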
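If you want to see what the loop computes without starting Spark, here is a minimal pure-Python sketch of the same pattern: initialize the accumulator from the first file, then inner-join each remaining file on `id`. The file contents and the helpers `read_csv` and `inner_join` are hypothetical stand-ins written for illustration, not part of PySpark.

```python
import csv
import io

# Hypothetical contents standing in for Book1.csv, Book2.csv and Book3.csv.
FILES = {
    "Book1": "id,a\n1,x\n2,y\n3,z\n",
    "Book2": "id,b\n1,10\n2,20\n",
    "Book3": "id,c\n1,True\n2,False\n4,True\n",
}


def read_csv(name):
    """Hypothetical helper: read one CSV with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(FILES[name])))


def inner_join(left, right, key):
    """Hypothetical helper: inner-join two lists of dicts on `key`,
    keeping a single copy of the key column (as join(df, ["id"]) does)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


# Initialize the accumulator from the first file, then join the others into it.
full_data = read_csv("Book1")
for name in ["Book2", "Book3"]:
    full_data = inner_join(full_data, read_csv(name), "id")

for row in full_data:
    print(row)
```

Only ids present in every file survive an inner join, so here just `1` and `2` remain. Unlike this sketch, Spark does not guarantee row order after a join.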