Building a Data Lake with AWS

Learn about the benefits of data lakes and how to set them up quickly with AWS Lake Formation

Every day, companies large and small collect more and more data. Enterprises typically gather data about their operations, clients, competitors, products, and more. They need to store, process, and analyze all this information efficiently.

The traditional solution of setting up data warehouses and databases struggles to meet these needs once a company is dealing with very large amounts of data. These solutions also do little to facilitate the analytics and machine learning techniques that have become popular in recent years.

The limitations of traditional warehouses contributed to the rise of cloud storage and cloud computing technologies, which in turn led to the concept of a data lake.

In this tutorial, you will learn what data lakes are and how to set one up with AWS.

What is a data lake?

The term data lake was first used in 2010 by James Dixon, who described it in these words:

‘If you think of a data mart as a store of bottled water, cleansed and packaged and structured for easy consumption, the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in or take samples.’

What does that mean in terms of storing and analyzing data?

Data lakes are essentially repositories that store all sorts of data: structured (rows and columns), semi-structured (XML, JSON, etc.), and unstructured (text documents, etc.). They can also hold media files such as photos, videos, and audio. This means there is one centralized location where all company data can be accessed, viewed, and analyzed.
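To make the idea of one centralized location for mixed data more concrete, here is a minimal Python sketch that assigns incoming files to the categories described above and builds a storage key for each. The extension mapping, the `raw/<category>/<source>/` key layout, and the function names `categorize` and `lake_key` are all assumptions made for illustration; they are not an AWS convention or part of any AWS API.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the data categories named above.
CATEGORY_BY_EXTENSION = {
    ".csv": "structured",
    ".parquet": "structured",
    ".json": "semi-structured",
    ".xml": "semi-structured",
    ".txt": "unstructured",
    ".pdf": "unstructured",
    ".jpg": "media",
    ".mp4": "media",
    ".mp3": "media",
}


def categorize(filename: str) -> str:
    """Return the data category for a file, defaulting to 'unstructured'."""
    suffix = PurePosixPath(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unstructured")


def lake_key(source: str, filename: str) -> str:
    """Build an illustrative object key: raw/<category>/<source>/<file name>."""
    name = PurePosixPath(filename).name
    return f"raw/{categorize(filename)}/{source}/{name}"


if __name__ == "__main__":
    incoming = [
        ("crm", "clients.csv"),
        ("web", "events.json"),
        ("support", "call_0142.mp3"),
    ]
    for source, filename in incoming:
        print(lake_key(source, filename))
```

The point of the sketch is that very different kinds of files can live side by side in a single store, separated only by a naming scheme, instead of each needing its own dedicated system.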

pub.towardsai.net