Learn how to detect, capture, and propagate changes in source databases to target systems in a real-time, event-driven manner with Change Data Capture (CDC). Event-Driven Change Data Capture: Introduction, Use Cases, and Tools.
This post serves as an introduction to the Change Data Capture (CDC) practice, rather than a deep-dive on a particular tool. First, I will explore the motivation behind CDC and illustrate the components of a real-time event-driven CDC system. The latter parts discuss some potential use cases where CDC is applicable and conclude with some open-source tools available in the market
Applications start with a small data footprint. Initially, a single database fulfills every data need of the application.
When applications evolve, they need to support different data models and data access patterns. For example, they might need a search index to perform full-text searches, a cache to speed up the reads, and a data warehouse for complex analytics on data.
Eventually, that simple architecture evolves into something like this.
Practically speaking, no one database can satisfy all those needs simultaneously. Consequently, applications have to use different data storage technologies such as indexes, caches, and warehouses together in their architecture. That forces them to keep their data in multiple places, in a redundant and denormalised manner.
A data lake is totally different from a data warehouse in terms of structure and function. Here is a truly quick explanation of "Data Lake vs Data Warehouse".
Understand how data changes in a fast growing company makes working with data challenging. In the last article, we looked at how users view data and the challenges they face while using data.
Understanding how users view data and their pain points when using data. In this article, I would like to share some of the things that I have learnt while managing terabytes of data in a fintech company.
Data Observability: How to Prevent Broken Data Pipelines. The relationship between data downtime, observability, and reliable insights
Intro to Data Engineering for Data Scientists: An overview of data infrastructure which is frequently asked during interviews