In data engineering, there are two common approaches for transforming and loading data. The first is ETL (Extract, Transform, Load). The other is ELT (Extract, Load, Transform). The only difference between the acronyms is that the transform and load steps, TL versus LT, are swapped. However, this small swap in wording has much larger implications for how data is processed. To see why, let’s start with a quick explanation of ETL.

ETL stands for Extract, Transform, and Load. It is a high-level architecture for processing and loading data. The first step, Extract, is the process of pulling data from a source. This can mean calling an API to receive the data, picking up the data from SFTP or S3, or downloading the data from a URI. The next step, Transform, covers the transformations applied to the data to conform it, clean it, and aggregate it into its final form. This can involve changing the data format, such as converting JSON to CSV, or other modifications such as joining the dataset to other data sources. The last step, Load, refers to loading the fully transformed data into a database.
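The three steps can be sketched as three small Python functions. This is only a minimal illustration under stated assumptions, not a production pipeline: the payload in RAW_JSON, the payments table, and the function names are all hypothetical, an in-memory string stands in for the API or file source, and SQLite stands in for the target database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw payload standing in for an API response or a file from SFTP/S3.
RAW_JSON = (
    '[{"id": 1, "name": " Alice ", "amount": "10.50"},'
    ' {"id": 2, "name": "Bob", "amount": "4.25"}]'
)


def extract(raw):
    """Extract: parse the raw source payload into a list of dict records."""
    return json.loads(raw)


def transform(records):
    """Transform: clean and conform each record before it reaches the database."""
    return [
        {"id": int(r["id"]), "name": r["name"].strip(), "amount": float(r["amount"])}
        for r in records
    ]


def to_csv(records):
    """Optional format change: serialize the cleaned records as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "name", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def load(records, conn):
    """Load: write the fully transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO payments (id, name, amount) VALUES (:id, :name, :amount)",
        records,
    )
    conn.commit()


# Run the steps in ETL order: the data is transformed before it is loaded.
conn = sqlite3.connect(":memory:")
cleaned = transform(extract(RAW_JSON))
load(cleaned, conn)
```

The point of the sketch is the ordering: load() only ever sees records that transform() has already cleaned, so the database never holds the raw payload.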

Simplified Example: ETL of Data for Ice Cream Stores

In our simplified example, we will build an ETL process that gathers data for ice cream stores. We will call an API to extract transaction data about the stores, roll the data up to the store level, and store the final result in our database.

Source Data from API

[
 {
  "transaction_id": "1",
  "store": "10",
  "date": "05/01/2020 10:05:01",
  "price": 100.5
 },
 {
  "transaction_id": "2",
  "store": "10",
  "date": "05/01/2020 10:06:02",
  "price": 120.5
 }
]
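As a rough sketch of the roll-up described above, the two sample transactions can be aggregated to one row per store and then loaded. The function names, the store_sales table, and the use of an in-memory SQLite database are assumptions made for illustration; the records are hard-coded here in place of the API call.

```python
import sqlite3
from collections import defaultdict

# The two sample transactions from the source data, hard-coded in place of an API call.
transactions = [
    {"transaction_id": "1", "store": "10", "date": "05/01/2020 10:05:01", "price": 100.5},
    {"transaction_id": "2", "store": "10", "date": "05/01/2020 10:06:02", "price": 120.5},
]


def roll_up_by_store(rows):
    """Transform: aggregate individual transactions into one summary row per store."""
    totals = defaultdict(lambda: {"transaction_count": 0, "total_sales": 0.0})
    for row in rows:
        totals[row["store"]]["transaction_count"] += 1
        totals[row["store"]]["total_sales"] += row["price"]
    return [{"store": store, **stats} for store, stats in sorted(totals.items())]


def load_store_sales(summary_rows, conn):
    """Load: write the store-level summary into a hypothetical store_sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store_sales "
        "(store TEXT PRIMARY KEY, transaction_count INTEGER, total_sales REAL)"
    )
    conn.executemany(
        "INSERT INTO store_sales (store, transaction_count, total_sales) "
        "VALUES (:store, :transaction_count, :total_sales)",
        summary_rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
summary = roll_up_by_store(transactions)
load_store_sales(summary, conn)
print(summary)
```

Because the aggregation runs before the load, only the store-level totals reach the database and the individual transactions are never stored.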


ETL Versus ELT | Explained With Examples