A Tutorial For Iterating Over Automatically Introspected Database Objects

For a recent data migration project that used Airflow, I needed to connect to a database and automatically introspect its schemas and tables. One of the databases had hundreds of schemas, and each schema had dozens of tables, most of them with different columns and column constraints.

In other words, we had database metadata made up of tens or hundreds of thousands of objects that no one would manually type out into config files or hard-coded Python lists. Knowing that Airflow uses the SQLAlchemy module to communicate with databases in as dialect- and engine-agnostic a way as possible, I had a hunch I could leverage Airflow’s SQLAlchemy connection to the database to generate the lists of schemas, tables, columns, or whatever other database objects I needed to access.

My hunch was right! SQLAlchemy and Airflow’s DB API hook make it easy to get a hold of the SQLAlchemy engine for any connected database.
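As a rough illustration of the idea, here is a minimal sketch that walks every schema, table, and column visible to a SQLAlchemy engine using SQLAlchemy's Inspector. The function names `introspect` and `engine_from_airflow` are hypothetical, and the Airflow helper assumes a Postgres connection with the Postgres provider package installed; any hook built on Airflow's DB API hook should offer the same `get_sqlalchemy_engine()` method. The demo at the bottom uses an in-memory SQLite database so it runs without Airflow.

```python
from sqlalchemy import create_engine, inspect, text


def introspect(engine):
    """Yield (schema, table, column_name) for every column the engine can see."""
    inspector = inspect(engine)
    for schema in inspector.get_schema_names():
        for table in inspector.get_table_names(schema=schema):
            for column in inspector.get_columns(table, schema=schema):
                yield schema, table, column["name"]


def engine_from_airflow(conn_id):
    """Hypothetical helper: build an engine from an Airflow connection ID.

    Assumes a Postgres connection and the Postgres provider package;
    swap in the hook that matches your database.
    """
    # Imported lazily so the rest of this sketch runs without Airflow installed.
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    return PostgresHook(postgres_conn_id=conn_id).get_sqlalchemy_engine()


if __name__ == "__main__":
    # Stand-in for a real database: an in-memory SQLite engine with one table.
    engine = create_engine("sqlite://")
    with engine.begin() as conn:
        conn.execute(
            text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        )
    for schema, table, column in introspect(engine):
        print(schema, table, column)
```

Inside a DAG, the same generator could be fed with `introspect(engine_from_airflow("my_conn_id"))`, where `my_conn_id` is a placeholder for a connection configured in Airflow. Because the results are generated lazily, a database with hundreds of schemas can be filtered or batched without first building one large list.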

#sqlalchemy #database #introspect #airflow #introspecting databases with airflow and sqlalchemy

Introspecting Databases with Airflow and SQLAlchemy