An upsert is an RDBMS feature that lets the author of a DML statement insert a row or, if the row already exists, update that existing row instead.

From my experience building multiple Azure Data Platforms, I have developed reusable ELT functions that I carry from project to project, one being an Azure SQL upsert function.

Today I'm going to share how to create an Azure SQL upsert function using PySpark. It is flexible and can be reused across Databricks workflows with minimal effort.
Basic Upsert Logic
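As a rough illustration of the insert-or-update decision, here is a minimal in-memory sketch in plain Python. It is not the Azure SQL implementation: the `upsert` helper, the dictionary standing in for a table, and the `id` key column are all assumptions made for the example.

```python
def upsert(table, row, key="id"):
    """Insert `row` into `table`, or update the existing row with the same key.

    `table` is a dict mapping key values to row dicts (a stand-in for a
    database table); `key` names the hypothetical primary-key column.
    """
    key_value = row[key]
    if key_value in table:
        # Row already exists: overwrite its columns with the incoming values.
        table[key_value].update(row)
        return "updated"
    # Row does not exist yet: insert a copy of it.
    table[key_value] = dict(row)
    return "inserted"


customers = {}
first = upsert(customers, {"id": 1, "name": "Ada"})
second = upsert(customers, {"id": 1, "name": "Ada Lovelace"})
```

The first call finds no matching key and inserts; the second call finds the key and updates the existing row in place, so the table never holds two rows with the same key.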
Azure SQL Upsert PySpark Function
Functionality
Before writing code, it is critical to understand a limitation of the Spark Azure SQL Database connector: it does not support preUpdate or postUpdate statements when writing to a table. For this reason, we need to write the DataFrame to a staging table and then pass a valid SQL MERGE statement to PyODBC to execute the upsert.
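The second half of that approach, generating the merge statement, might be sketched as below. This is a minimal sketch under assumptions, not the finished function: `build_merge_statement` and its parameters are hypothetical names, the table and column names are assumed to come from trusted code rather than user input, and the staging table is assumed to have the same columns as the target.

```python
def build_merge_statement(target_table, staging_table, key_columns, columns):
    """Build a T-SQL MERGE that upserts staging rows into the target table.

    Hypothetical helper: `columns` lists every column to write, and
    `key_columns` is the subset used to match staging rows to target rows.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")

    # Columns that are not part of the match key get updated on a match.
    non_key = [c for c in columns if c not in key_columns]

    on_clause = " AND ".join(f"t.[{c}] = s.[{c}]" for c in key_columns)
    insert_cols = ", ".join(f"[{c}]" for c in columns)
    insert_vals = ", ".join(f"s.[{c}]" for c in columns)

    lines = [
        f"MERGE INTO {target_table} AS t",
        f"USING {staging_table} AS s",
        f"ON {on_clause}",
    ]
    if non_key:
        set_clause = ", ".join(f"t.[{c}] = s.[{c}]" for c in non_key)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {set_clause}")
    lines.append(
        f"WHEN NOT MATCHED BY TARGET THEN INSERT ({insert_cols}) "
        f"VALUES ({insert_vals});"
    )
    return "\n".join(lines)


merge_sql = build_merge_statement(
    target_table="dbo.customer",            # illustrative table names
    staging_table="dbo.customer_staging",
    key_columns=["id"],
    columns=["id", "name", "email"],
)
```

The DataFrame write to the staging table and the database call are left out because they need a live cluster and database. In a notebook, the returned string would be passed to `cursor.execute` on a `pyodbc` connection and then committed, once the staging table has been written.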