Update table in Apache Spark / Databricks using multiple columns

First off, make sure you are using Delta Lake as the table format. Second, I think you are looking for upserts, which are defined as:

An operation that inserts rows into a database table if they do not already exist, or updates them if they do.

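To do so, use MERGE INTO with a WHEN MATCHED clause that runs the UPDATE and a WHEN NOT MATCHED clause that runs the INSERT. The ON clause is the matching expression; to match on multiple columns, join the conditions with AND. Here’s an example: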

MERGE INTO events
USING updates
ON events.eventId = updates.eventId
WHEN MATCHED THEN
  UPDATE SET events.data = updates.data
WHEN NOT MATCHED
  THEN INSERT (date, eventId, data) VALUES (date, eventId, data)
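Since the question is about matching on more than one column, here is a minimal sketch of the same statement with a composite key. It assumes that eventId and date together identify a row, which the original example does not state; the aliases t and s are only for readability.

```sql
-- Sketch only: assumes (eventId, date) is the composite key of both tables.
MERGE INTO events AS t
USING updates AS s
-- Match on several columns by combining the conditions with AND.
ON t.eventId = s.eventId AND t.date = s.date
WHEN MATCHED THEN
  -- The row already exists: overwrite the non-key column.
  UPDATE SET t.data = s.data
WHEN NOT MATCHED THEN
  -- No row with this key combination: insert the source row.
  INSERT (date, eventId, data) VALUES (s.date, s.eventId, s.data)
```

If several rows in updates match the same row in events, the merge is likely to fail, so it is worth deduplicating the source on the key columns first.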

See the Databricks documentation on MERGE INTO for more details.
