There seem to be several problems with the code, or perhaps you have not posted all of it.
- Have you defined fullpath?
- You have set header=False, so how will Spark know that there is an "id" column? Without a header row, Spark names the columns _c0, _c1, and so on.
- Your indentation looks wrong under the for loop.
- full_data has not been defined yet, so how can you use it on the right-hand side of the assignment inside the for loop? I suspect you initialized it to the first CSV file and are then joining it with that same first CSV again.
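To illustrate the header point outside Spark, here is a small analogy using Python's standard csv module (not Spark's reader). When the first row is not treated as a header, "id" is just a data value and there is no column called "id" to join on. The sample CSV text and the helper names are made up for illustration.

```python
import csv
import io

# Hypothetical sample file contents; the first line is a header row.
SAMPLE = "id,name\n1,alice\n2,bob\n"

def columns_without_header(text):
    """Read rows positionally, like header=False: the header line is just data."""
    rows = list(csv.reader(io.StringIO(text)))
    # Columns are only addressable by position (Spark would call them _c0, _c1, ...).
    return [f"_c{i}" for i in range(len(rows[0]))], rows

def columns_with_header(text):
    """Read rows using the first line as column names, like header=True."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows

names, rows = columns_without_header(SAMPLE)
print(names)    # ['_c0', '_c1']
print(rows[0])  # ['id', 'name'], the header ended up as a data row
names, rows = columns_with_header(SAMPLE)
print(names)    # ['id', 'name']
print(rows[0])  # {'id': '1', 'name': 'alice'}
```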
I ran a small test with the code below, which worked for me and addresses the questions I've raised above. You can adjust it to your needs.
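from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

fullpath = '/content/sample_data/'
# Start from the first file; header=True makes Spark use the first row (including "id") as column names
full_data = spark.read.csv(fullpath + 'Book1.csv', header=True, inferSchema=True)

name_file = ['Book2', 'Book3']
for n in name_file:
    df = spark.read.csv(fullpath + n + '.csv', header=True, inferSchema=True)
    # Join each remaining file onto the accumulated result on the shared "id" column
    full_data = full_data.join(df, ["id"])

full_data.show(5)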
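The loop above is a left fold: start from the first table and inner-join each remaining table onto the accumulated result on "id". The sketch below shows the same pattern in plain Python with functools.reduce over in-memory lists of dicts, so it runs without Spark; the tables, their columns and the inner_join_on helper are hypothetical. In PySpark the same fold could be written as reduce(lambda left, right: left.join(right, ["id"]), dataframes).

```python
from functools import reduce

def inner_join_on(key, left, right):
    """Inner-join two lists of dict rows on `key`, mimicking df.join(other, [key])."""
    # Index the right-hand rows by key so each left row finds its matches quickly.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            # The key appears once; same-named non-key columns would be overwritten
            # here, whereas Spark keeps both copies.
            joined.append({**lrow, **rrow})
    return joined

# Hypothetical stand-ins for Book1, Book2 and Book3.
book1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
book2 = [{"id": 1, "b": 10}, {"id": 2, "b": 20}]
book3 = [{"id": 2, "c": True}, {"id": 1, "c": False}]

# Start from the first table and fold the rest in, one join per table.
full_data = reduce(lambda acc, t: inner_join_on("id", acc, t), [book2, book3], book1)
for row in full_data:
    print(row)
```

Rows whose id is missing from any of the files (id 3 here) drop out, which is inner-join behaviour and matches the default how='inner' of Spark's DataFrame.join.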