Open-source data integration is not new. It started 16 years ago with Talend. But since then, the whole industry has changed. The likes of Snowflake, BigQuery, and Redshift have changed how data is hosted, managed, and accessed, making it both easier and a lot cheaper. The data integration industry has evolved as well.

On one hand, new open-source projects emerged, such as Singer.io in 2017. This enabled more data integration connectors to become accessible to more teams, even though it still required a significant amount of manual work.

On the other hand, data integration became accessible to more teams (analysts, scientists, business intelligence teams). Companies like Fivetran benefited from Snowflake’s rise, empowering non-engineering teams to set up and manage their data integration connectors by themselves, so they can work with the data autonomously.

But even with this progress, a large majority of teams still build their own connectors in-house. The build vs. buy decision leans strongly toward build. That’s why we think it’s time to take a fresh look at the landscape of open-source technologies around data integration.

The idea for this article came from an awesome debate on dbt’s Slack last week. The discussion centered around two things:

  • The state of open-source alternatives to Fivetran, and
  • Whether an open-source (OSS) approach is more relevant than a commercial software approach in addressing the data integration problem.

Even Fivetran’s CEO was involved in the debate.

We already synthesized the second point in a previous article. In this article, we want to analyze the first point: the landscape of open-source data integration technologies.

TL;DR

Here is a table summarizing our analysis.

The items in orange are what we’re building at Airbyte over the next few weeks.

To better understand this table, we invite you to read the details of our landscape analysis below.


Open-Source Data Integration and ETL in 2020