Of late, ACID compliance on Hadoop-like Data Lakes has gained a lot of traction, and Databricks Delta Lake and Uber’s Hudi have been the major contributors and competitors. Both solve a major problem by providing different flavors of abstraction over the Parquet file format, so it’s very hard to pick one as the better choice. In this blog, we use a very basic example to understand how these tools work under the hood. We will leave it to readers to weigh the functionalities as pros and cons.
We are following a reverse approach: in the next article in this series, we will discuss the importance of a Hadoop-like Data Lake, why the need for systems like Delta/Hudi arose in the first place, and how Data Engineers used to build siloed and error-prone ACID systems for Lakes.