This article is a follow-up to "Data Platform as a Service" and "Data Platform: The New Generation Data Lakes". Here, I describe how to design and build an automated Data Ingestion Engine based on Spark and Databricks features.
The most important principle in designing a Data Ingestion Engine is to follow an automation paradigm. Automation provides several key advantages, some of which are shown in the following diagram:
There are several questions that we have to ask ourselves before starting: