In this article, you’ll learn how to work with Excel/CSV files in a Python environment to clean and transform raw data into a more ingestible format. This is typically useful for data integration.

This example will touch on many common ETL operations, such as filter, reduce, explode, and flatten.
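To make these four operations concrete before diving in, here is a minimal sketch using plain pandas. The records below are invented for illustration and are not the QuickBooks sample data. "Reduce" is interpreted here as trimming the frame down to the columns of interest, which is an assumption about how the term is used.

```python
import pandas as pd

# Hypothetical invoice records, invented for illustration only
records = [
    {"id": 1, "customer": {"name": "Acme", "country": "US"},
     "items": ["widget", "gear"], "total": 120.0},
    {"id": 2, "customer": {"name": "Globex", "country": "DE"},
     "items": ["bolt"], "total": 0.0},
    {"id": 3, "customer": {"name": "Initech", "country": "US"},
     "items": ["nut", "bolt", "gear"], "total": 75.5},
]

# Flatten: expand the nested "customer" dict into top-level columns
# (customer_name, customer_country); list values in "items" stay as lists
flat = pd.json_normalize(records, sep="_")

# Filter: keep only the rows with a positive total
paid = flat[flat["total"] > 0]

# Reduce (assumed meaning): keep only the columns we care about
slim = paid[["id", "customer_name", "items", "total"]]

# Explode: turn each list in "items" into one row per element
exploded = slim.explode("items").reset_index(drop=True)

print(exploded)
```

Each step returns a new DataFrame rather than modifying the previous one, so the intermediate results can be inspected one at a time while debugging a pipeline.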


The code for these examples is available publicly on GitHub here, along with descriptions that mirror the information I’ll walk you through.

These samples rely on two open source Python packages:

  • pandas: a widely used open source data analysis and manipulation tool. More info on their site and PyPI.
  • gluestick: a small open source Python package containing utility functions for ETL, maintained by the hotglue team. More info on PyPI and GitHub.

Without further ado, let’s dive in!


This example leverages sample QuickBooks data from the QuickBooks Sandbox environment, and was initially created in hotglue, a lightweight data integration tool for startups.


How to write ETL operations in Python