In terms of technical ingenuity, Google BigQuery is probably the most impressive data warehouse on the market. BigQuery differs from other data warehouses in that its underlying architecture is shared by everybody on Google Cloud. That means you don't need to pay for a dedicated cluster of expensive servers just to occasionally run queries for large-scale data analysis.
Moving data into data warehouses traditionally involves dumping a shit ton of unstructured or semi-structured files into storage such as S3, Google Cloud Storage, or a data lake before loading them into their destination. It's no surprise that engineers raised on MapReduce keep perpetuating outdated processes, as fans of Java typically do (don't @ me). Google upheld this status quo with BigQuery, which we painfully worked through in a previous tutorial. Luckily, there are mysterious heroes among us, known only as "third-party developers."