In this article, learn how to design and build an automated Data Ingestion Engine based on Spark and Databricks features.

Introduction

This article is a follow-up to Data Platform as a Service and Data Platform: The New Generation Data Lakes. This time, I will describe how to design and build an automated Data Ingestion Engine based on Spark and Databricks features.

The most important principle when designing a Data Ingestion Engine is to follow an automation paradigm. Automation provides a set of key advantages for success, some of which are shown in the following diagram:

Automation diagram.
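
To make the automation idea a bit more concrete, here is a minimal sketch of a metadata-driven ingestion loop in PySpark on Databricks. The configuration entries, paths, and formats below are hypothetical placeholders rather than anything from the original design; the point is that onboarding a new source means adding a metadata entry, not writing a new job.

```python
# Minimal sketch of metadata-driven ingestion (hypothetical config and paths).
# Assumes a Databricks runtime where the `spark` SparkSession is already available.
ingestion_config = [
    {"name": "sales",     "source": "/mnt/raw/sales/*.csv",      "format": "csv",  "target": "/mnt/bronze/sales"},
    {"name": "customers", "source": "/mnt/raw/customers/*.json", "format": "json", "target": "/mnt/bronze/customers"},
]

for entity in ingestion_config:
    # Every entity goes through the same generic code path;
    # the behavior is driven entirely by the metadata entry.
    df = (spark.read
          .format(entity["format"])
          .option("header", "true")   # only meaningful for CSV, ignored for other formats
          .load(entity["source"]))

    (df.write
       .format("delta")
       .mode("append")
       .save(entity["target"]))
```

In a real engine this metadata would typically live in a configuration table or files rather than in code, but the loop illustrates the kind of repetitive work the automation paradigm is meant to eliminate.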

There are several questions that we have to ask ourselves before starting:

  • What is the goal?
  • What does automation mean in our case?
  • How much time and/or effort can we save with this automation?
  • What is the value of this automation to our users and/or product?
  • Are there open-source or commercial tools available to achieve the goal, or do we have to develop our own?
