What is Data Lineage?

Data lineage is the life cycle of data: it shows the complete flow of data from its origin to its destination, including every transformation applied to the dataset along the way. It is also the process of understanding, documenting, and visualizing that flow from origin to consumption. With lineage in place, users get a clearer picture of what happened to the data throughout its life cycle.

It also enables companies to trace errors, implement process changes, and carry out system migrations while saving time and resources. Another approach to data lineage combines data discovery and a Data Catalog that captures data asset metadata with a data mapping framework.

Data lineage helps users confirm that data comes from a reliable source, that transformations are applied appropriately, and that the data is loaded correctly into the designated location. It plays an important role wherever key decisions rely on accurate information. Without appropriate technology and processes in place, tracking data can be virtually impossible, or at the very least a costly and time-consuming endeavor.

Data lineage enables tracking of the data stream from both endpoints to ensure the data is accurate and consistent. It allows users to trace the data in both directions, forward and backward, between its origin and its destination.
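To make forward and backward tracing concrete, here is a minimal Python sketch that treats lineage as a directed graph, with an edge from each source dataset to the dataset derived from it. The class name `LineageGraph`, its methods, and the dataset names are hypothetical and chosen only for illustration; real lineage tools typically store far richer metadata.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph: nodes are dataset names, edges run source -> derived."""

    def __init__(self):
        self.downstream = defaultdict(set)  # dataset -> datasets derived from it
        self.upstream = defaultdict(set)    # dataset -> datasets it was derived from

    def add_edge(self, source, target):
        """Record that `target` was produced from `source`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start, neighbours):
        """Breadth-first walk collecting every dataset reachable from `start`."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)  # do not report the starting dataset itself
        return seen

    def trace_forward(self, dataset):
        """Everything that depends on `dataset` (towards the destination)."""
        return self._walk(dataset, self.downstream)

    def trace_backward(self, dataset):
        """Everything `dataset` was built from (towards the origin)."""
        return self._walk(dataset, self.upstream)


# Hypothetical pipeline used only as an example.
graph = LineageGraph()
graph.add_edge("crm_export", "cleaned_customers")
graph.add_edge("cleaned_customers", "sales_report")
graph.add_edge("orders_db", "sales_report")

print(sorted(graph.trace_backward("sales_report")))
print(sorted(graph.trace_forward("crm_export")))
```

Tracing backward from `sales_report` answers "where did this come from?", while tracing forward from `crm_export` answers "what is affected if this source changes?".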

Data lineage answers questions such as the following for any specific dataset:

  • Who created the data?
  • What information does the data contain?
  • Where is the data located?
  • When was the data created?
  • Why does the data exist?

We will discuss these questions in a later section.
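As a rough illustration of how the five questions above might be captured per dataset, the following Python sketch stores them in a small record. The `LineageRecord` class, its field names, and the sample values are assumptions made for this example, not a standard schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical metadata record; each field maps to one lineage question."""

    created_by: str        # Who created the data?
    description: str       # What information does the data contain?
    location: str          # Where is the data located?
    created_at: datetime   # When was the data created?
    purpose: str           # Why does the data exist?

    def answers(self):
        """Return the record keyed by the question word it answers."""
        return {
            "who": self.created_by,
            "what": self.description,
            "where": self.location,
            "when": self.created_at.isoformat(),
            "why": self.purpose,
        }


# Sample values are invented for illustration only.
record = LineageRecord(
    created_by="etl_service",
    description="Monthly sales totals per region",
    location="warehouse.reports.sales_monthly",
    created_at=datetime(2021, 3, 1, tzinfo=timezone.utc),
    purpose="Feeds the regional sales dashboard",
)

for question, answer in record.answers().items():
    print(f"{question}: {answer}")
```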
