I’m a big fan of dbt, an open-source project that helps me build data pipelines around BigQuery using only SQL.
There’s a lot already written about BigQuery and dbt. For example, there’s this official tutorial to set up dbt with BigQuery, which goes into a lot more detail than I do here (thanks Claire Carroll). The goal of this post is to share some GCP secrets that make the installation as easy as possible.
Good news: You don’t need a credit card to have your own Google Cloud account. You’ll get a free terabyte of queries in BigQuery every month, and also a free shell environment you can use through your browser.
While creating your account, you’ll also create your first Google Cloud project. We’ll use its ID later.
On console.cloud.google.com click on the “cloud shell” icon on top, and a shell environment will be ready for your use in a minute or so:
Find the Cloud Shell button
A cloud shell will open
Once in the cloud shell, installing dbt is really easy. To avoid problems, skip installing the full dbt and install only the dbt-bigquery parts with:
$ pip3 install --user --upgrade dbt-bigquery
Notes:
- pip3 instead of pip, to make sure we are in the Python 3 world.
- --user to avoid installing at the root level.
- --upgrade just in case an older version of dbt was installed.
If you get an error like /usr/local/bin/python3.7: bad interpreter: No such file or directory, uninstall dbt and reinstall.
$ pip3 uninstall dbt-core
$ pip3 install --user --upgrade dbt-bigquery
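Because --user installs the dbt executable under ~/.local/bin/, that directory may not be on your PATH yet. A minimal sketch of adding it for the current session, assuming a bash-compatible shell (the variable name USER_BIN is only illustrative):

```shell
# Directory where `pip3 install --user` places executables such as dbt.
# USER_BIN is an illustrative name, not something dbt or pip defines.
USER_BIN="$HOME/.local/bin"

# Prepend the directory to PATH only if it is not already listed there.
case ":$PATH:" in
  *":$USER_BIN:"*) ;;                      # already present, nothing to do
  *) export PATH="$USER_BIN:$PATH" ;;      # otherwise put it first
esac
```

This only affects the current shell session. To make it stick, you could append the same lines to your ~/.bashrc. The rest of this post keeps using the full ~/.local/bin/dbt path, so this step is optional.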
$ ~/.local/bin/dbt init first_project
Notes:
- ~/.local/bin/ is the path to the just-installed dbt. Consider adding this directory to the PATH environment variable so you don’t need to prepend it.
$ cd first_project/
$ ~/.local/bin/dbt run
That could have run! But it didn’t. Instead, we got an error message like Credentials in profile “default”, target “dev” invalid: Runtime Error. That means we still need to configure a way for dbt to connect and authenticate to BigQuery. The good news: This is really easy, since we are already in a Google Cloud environment.