The history of data storage dates back to the 1950s, when punch cards were used to store data generated by computers. A lot has changed since then, and this article covers one of the latest trends in the industry: the Lakehouse.

Before we jump into what a Lakehouse is and how you can benefit from it, let’s take a quick look at the two data management paradigms most widely used today.

Data Warehouse

The Data Warehouse architecture was developed in the 1980s to support companies in their decision-making processes. The central idea is to process historical data and store it in two forms: aggregated and granular.

Aggregated data contains high-level information, summarized by group and exposing measures such as sums and averages; granular data contains information at the lowest level of detail that is relevant for business analysis.
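To make the distinction concrete, here is a minimal sketch in Python using pandas; the sales records, column names, and grouping are made up purely for illustration.

```python
import pandas as pd

# Granular data: one row per individual sale, at the lowest level of
# detail relevant for analysis (illustrative, made-up records).
granular = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region":   ["North", "North", "South", "South"],
    "amount":   [120.0, 80.0, 200.0, 50.0],
})

# Aggregated data: the same information summarized by group, exposing
# high-level measures such as sums and averages.
aggregated = granular.groupby("region")["amount"].agg(total="sum", average="mean")
print(aggregated)
#         total  average
# region
# North   200.0    100.0
# South   250.0    125.0
```

A warehouse typically keeps both layers: the granular rows to answer detailed questions, and precomputed aggregates like the one above to serve dashboards quickly.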

This data is then consumed by BI tools, where executives and other staff can visualize and analyze it in the form of reports and charts.

Data Lake

With the advent of big data, traditional architectures like the data warehouse had to be rethought. With data arriving from different sources, in different formats, and usually in far greater volume, a new paradigm was needed to fill this gap.

In a data lake, data is stored in its raw format and queried only when a business question arises; the relevant data is then retrieved and analyzed to help answer the question. The data lives in cloud storage such as Amazon S3, which has become one of the largest and most cost-effective storage systems in the world because it makes it possible to store practically limitless amounts of data in its native format at low cost.
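As a rough illustration of this “store raw now, query later” pattern, the sketch below lands a raw JSON event in S3 with boto3 and reads it back when a question comes up; the bucket name, key layout, and event shape are hypothetical.

```python
import json
import boto3

s3 = boto3.client("s3")

# Write the event to the lake exactly as it arrived: no schema is
# imposed at write time. Bucket and key names here are hypothetical.
event = {"user_id": 42, "action": "click", "ts": "2021-05-01T12:00:00Z"}
s3.put_object(
    Bucket="my-data-lake",
    Key="raw/events/2021/05/01/event-0001.json",
    Body=json.dumps(event).encode("utf-8"),
)

# Later, when a business question arises, retrieve only the relevant
# objects and parse them for analysis.
obj = s3.get_object(Bucket="my-data-lake", Key="raw/events/2021/05/01/event-0001.json")
record = json.loads(obj["Body"].read())
```

In practice the read side is usually a query engine scanning many such objects at once, but the principle is the same: structure is applied when the data is read, not when it is written.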

