Publish Cloud Dataprep Profile Results to BigQuery

This article describes how to use webhooks and Cloud Functions to automatically publish Dataprep-generated profile information into BigQuery (after making an intermediate stop in GCS).

Use Case

For data governance purposes, customers often want to store the profile metadata generated by Cloud Dataprep Premium when jobs are run. In this scenario, a customer wants to retain the profiling metadata in BigQuery for reporting purposes.

We will build the following automated process:

  1. Run a Cloud Dataprep job with profiling enabled.
  2. In Cloud Dataprep, invoke a webhook that calls a Cloud Function.
  3. The Cloud Function calls the GET profile results API.
  4. The Cloud Function saves the API response to GCS.
  5. The Cloud Function triggers a separate Cloud Dataprep job to process the JSON API response and publish a BigQuery table.
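The Cloud Function's part of this process (steps 3 to 5) can be sketched roughly as follows. This is a minimal outline, not the article's actual function: the `jobid` field in the webhook body, the `profiles/<id>.json` object name, and the three callables passed in are all hypothetical stand-ins for the real API call, the GCS upload, and the follow-up job trigger.

```python
import json
from typing import Callable


def handle_job_finished(
    payload: dict,
    fetch_profile: Callable[[int], dict],
    save_to_gcs: Callable[[str, str], None],
    run_followup_job: Callable[[], None],
) -> str:
    """Run steps 3-5 for one finished Dataprep job and return the object name."""
    # Hypothetical webhook body field carrying the ID of the finished job.
    job_id = int(payload["jobid"])

    # Step 3: retrieve the profile results for that job.
    profile = fetch_profile(job_id)

    # Step 4: store the raw JSON response (object naming is an assumption).
    object_name = f"profiles/{job_id}.json"
    save_to_gcs(object_name, json.dumps(profile))

    # Step 5: start the second Dataprep job that loads the JSON into BigQuery.
    run_followup_job()
    return object_name
```

Passing the three operations in as callables keeps the ordering logic separate from the network and storage code, so the flow can be exercised locally with in-memory stand-ins before it is deployed.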

If you don’t already have access to Cloud Dataprep Premium and want to try this yourself, [you can sign up here](https://console.cloud.google.com/marketplace/product/endpoints/cloud-dataprep-editions-v2?utm_source=trifacta&utm_medium=mediumcontent&utm_campaign=connormedium).

Step-by-step instructions

Step 1: Understand the API output containing the profile metadata

Whenever you run a job with profiling enabled, Cloud Dataprep generates metadata about the profiling results. There are three types of profile metadata information that Cloud Dataprep will output:

  1. profilerRules: Contains information about each DQ rule and the number of passing and failing rows for each rule.
  2. profilerTypeCheckHistograms: Contains information about the number of missing, mismatched, and valid records for each column in your dataset.
  3. profilerValidValueHistograms: Contains information about min/max/median values for numeric or date columns, and the top 20 unique values by count for string columns.
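As a rough illustration, the three profile types can be pulled out of a JSON response like this. The sketch assumes they appear as top-level keys with exactly the names listed above; the internal layout of each section is not assumed, and the helper name `split_profile` is hypothetical.

```python
import json

# Section names as listed above; assumed to be top-level keys of the response.
PROFILE_SECTIONS = (
    "profilerRules",
    "profilerTypeCheckHistograms",
    "profilerValidValueHistograms",
)


def split_profile(raw_json: str) -> dict:
    """Return the three profile sections, using None for any that are absent."""
    response = json.loads(raw_json)
    return {name: response.get(name) for name in PROFILE_SECTIONS}
```

Using `None` for a missing section lets a caller distinguish "not present in the response" from an empty result.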

These profile results appear in the Cloud Dataprep UI, and can also be retrieved through an API call. In order to publish the profile metadata to BigQuery, you will need to make an API call to return the JSON representation of the profile information.

You can read about the API call at this link: https://api.trifacta.com/dataprep-premium/index.html#operation/getProfilingInformationForJobGroup
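For orientation, a request to that endpoint might be prepared as shown here. The base URL, the `/jobGroups/{id}/profile` path, and bearer-token authentication are assumptions and should be checked against the API reference linked above; `build_profile_request` is a hypothetical helper that only builds the request and does not send it.

```python
import urllib.request

# Assumed base URL for the Dataprep API; confirm it in the API reference.
API_BASE = "https://api.clouddataprep.com/v4"


def build_profile_request(job_group_id: int, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a job's profile results."""
    url = f"{API_BASE}/jobGroups/{job_group_id}/profile"
    headers = {
        "Authorization": f"Bearer {access_token}",  # assumed auth scheme
        "Accept": "application/json",
    }
    # With no request body, urllib issues this as a GET.
    return urllib.request.Request(url, headers=headers)
```

To send it, pass the returned object to `urllib.request.urlopen` and parse the response body with `json.load`.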
