I’m a big fan of dbt — an open source project that helps me build data pipelines around BigQuery using only SQL.

Get started with BigQuery and dbt

There’s already a lot written about BigQuery and dbt. For example, there’s this official tutorial to set up dbt with BigQuery, which goes into a lot more detail than I do here (thanks Claire Carroll). The goal of this post is to share some GCP secrets that make the installation as easy as possible.

Step 1: Create a free Google Cloud account

Good news: You don’t need a credit card to have your own Google Cloud account. You’ll get a free terabyte of queries in BigQuery every month, and also a free shell environment you can use through your browser.

While creating your account, you’ll also create your first Google Cloud project. We’ll use its project ID later.

Related reading: “BigQuery without a credit card: Discover, learn and share” (towardsdatascience.com)

Step 2: Welcome to the free Cloud Shell

On console.cloud.google.com, click the “Cloud Shell” icon at the top of the page, and a shell environment will be ready to use in a minute or so:

[Image: Find the Cloud Shell button]

[Image: A cloud shell will open]

Step 3: pip3 install dbt

Once in the cloud shell, installing dbt is really easy. To avoid problems, skip installing the full dbt package and install only the dbt-bigquery parts with:

$ pip3 install --user --upgrade dbt-bigquery

Notes:

  • pip3 instead of pip, to make sure we are in the Python 3 world.
  • --user to avoid installing at the root level.
  • --upgrade just in case an older version of dbt was installed.
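If you are curious where --user puts things, Python can report its per-user base directory. This is a small sketch, assuming a Linux environment like Cloud Shell, where command-line scripts installed with --user end up in the bin folder under that base. The variable name user_base is just for illustration.

```shell
# Ask Python for the per-user base directory used by `pip3 install --user`.
user_base="$(python3 -m site --user-base)"

# On Linux, scripts such as dbt are placed in the bin folder under it.
echo "pip3 --user places command-line scripts in: ${user_base}/bin"

# List what is there so far (the folder may not exist before the first install).
ls "${user_base}/bin" 2>/dev/null || echo "nothing installed there yet"
```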

Step 3.1: debug

If you get an error like /usr/local/bin/python3.7: bad interpreter: No such file or directory, uninstall dbt and reinstall.

$ pip3 uninstall dbt-core
$ pip3 install --user --upgrade dbt-bigquery

Step 4: start your first dbt project

$ ~/.local/bin/dbt init first_project

Notes:

  • ~/.local/bin/ is the path to the freshly installed dbt. Consider adding this directory to your PATH environment variable so you don’t need to prepend it.
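One possible way to add that directory to PATH is sketched here. It assumes your shell is bash and that it reads ~/.bashrc on startup, which I believe is the case in Cloud Shell, but check your own setup.

```shell
# Prepend the user-level bin directory for the current session.
export PATH="$HOME/.local/bin:$PATH"

# Persist it for future sessions; assumes bash reads ~/.bashrc.
# The grep guard avoids appending the same line twice.
line='export PATH="$HOME/.local/bin:$PATH"'
grep -qxF "$line" ~/.bashrc 2>/dev/null || echo "$line" >> ~/.bashrc

# Check whether the shell now resolves dbt without the full path.
command -v dbt || echo "dbt not found on PATH yet"
```

After this, a plain dbt should work in place of ~/.local/bin/dbt. The commands in this post keep the full path so they work either way.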

Step 4.1: try to run dbt

$ cd first_project/
$ ~/.local/bin/dbt run

That could have run! But it didn’t. Instead, we got an error message like Credentials in profile “default”, target “dev” invalid: Runtime Error. That means we still need to configure a way for dbt to connect and authenticate to BigQuery. The good news: this is really easy, since we are already in a Google Cloud environment.
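To give an idea of what that configuration can look like, here is a minimal sketch of a dbt profile for BigQuery, written to a scratch folder so it doesn’t overwrite anything. Everything specific in it is an assumption on my side: the folder name dbt_profile_sketch, the placeholder your-project-id, the dataset name dbt_first_project, and the choice of the oauth method (which relies on the Google credentials already available in your environment). Replace the placeholders with your own project ID and dataset.

```shell
# Hypothetical placeholders: set PROJECT_ID and DATASET to your own values.
PROJECT_ID="${PROJECT_ID:-your-project-id}"
DATASET="${DATASET:-dbt_first_project}"

# Write the sketch to a scratch folder instead of ~/.dbt/ to avoid
# clobbering any profile that already exists.
mkdir -p ./dbt_profile_sketch
cat > ./dbt_profile_sketch/profiles.yml <<EOF
default:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: ${PROJECT_ID}
      dataset: ${DATASET}
      threads: 1
EOF

# Show the result for review.
cat ./dbt_profile_sketch/profiles.yml
```

The profile name default and target dev match the names in the error message above. Once the values look right, you can point dbt at the folder with the --profiles-dir option, or copy the file to ~/.dbt/profiles.yml, where dbt looks by default.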
