Publish Cloud Dataprep Profile Results to BigQuery

This article describes how to use webhooks and Cloud Functions to automatically publish Dataprep-generated profile information into BigQuery (after making an intermediate stop in GCS).

Data Observability: How to Prevent Broken Data Pipelines

The relationship between data downtime, observability, and reliable insights

Why Data Management remains a challenge in the Data and AI-first era

What challenges companies face with data management and how to begin tackling them

Why Data Quality is Key to Successful ML Ops

In this post, we are going to look at ML Ops, a recent development in ML that bridges the gap between ML and traditional software engineering, and highlight how data quality is key to ML Ops workflows in order to accelerate data teams and maintain trust in your data.

Data Observability: The Next Frontier of Data Engineering

Introducing a better approach to building data pipelines

The cost of poor data quality

Bad data makes data scientists work harder, not smarter! Data is one of the most important factors (besides all the technical depth around an ML solution) that dictate whether.

Data Preparation: The Case for Using Automated, ML-Based Tools

We’ll specifically talk about data preparation as the most critical challenge and how an ML-based data preparation tool or software can make it easier to process data in the data lake.

Scraping a table in a PDF, reliably and then test data quality

In this article, I’m going to walk you through how you can scrape a table embedded in a PDF file, unit test that data using Great Expectations, and then, if it is valid, save the file in S3 on AWS.

What Does it Take to Succeed as a CDO in the Age of COVID-19?

3 key takeaways from the 2020 Chief Data Officer Symposium. When COVID-19 hit the U.S. in March 2020, the first role hired for by the Centers for Disease Control and Prevention (CDC) was a Chief Data Officer. Not surprisingly, COVID-19 and GDPR, its close cousin in terms of cultural relevance, were top of mind for everyone at this year’s all-virtual MIT Chief Data Officer Symposium. In this article, I share three key takeaways from the event and propose next steps for CDOs to retain their competitive edge in this remote-first, data-obsessed world.

Data Validation Framework in Apache Spark

Quality assurance testing is one of the key areas in Big Data. Data quality issues may ruin the success of many Big Data, data lake, and ETL projects.

Why I Built an Opensource Tool for Big Data Testing

I’ve developed an open-source data testing and quality tool called data-flare. It aims to help data engineers and data scientists assure the data quality of large datasets using Spark.

How to improve data quality for Machine Learning?

The ultimate goal of every data scientist or Machine Learning evangelist is to create a better model with higher predictive accuracy.

How to Calculate the Cost of Data Downtime

Introducing a better way to measure the financial impact of your bad data. In addition to wasted time and sleepless nights, data quality issues lead to compliance risks.

Data Warehouse Quality Matters

There are a few simple data quality checks you can build in your Data Warehouse process to detect data inconsistencies due to errors within your ETL/ELT pipelines or connection failures.

Tired of Data Quality Discussions?

Five steps to avoid these discussions

How to Fix Your Data Quality Problem

Data quality is top of mind for every data professional — and for good reason. Bad data costs companies valuable time, resources, and most of all, revenue.

Quality Data Drives the success of Machine Learning and Artificial Intelligence

History says the 16th century was the time during which the rise of Western civilization occurred. During this time, Spain and Portugal explored the Indian Ocean and opened worldwide oceanic trade routes, and Vasco da Gama was given permission by the Indian Sultans to settle in the wealthy Bengal Sultanate. Large parts of the New World became Spanish and Portuguese colonies.

How can AI help to make Enterprise Data Quality smarter?

Hardly anyone relying on data can say their data is perfect. There is always that difference between the dataset you have and the dataset you wish you had. This difference is what Data Quality is all about.

How Marketing Analytics Became Snake Oil

Learn why marketing analytics often fails marketers and how data scientists can fix the problem. Industry insiders have always claimed that the Great Recession was a good thing for marketing analytics.

Data Governance for Data Scientists?

Have you ever asked yourself why Data Governance has a huge impact on your Machine Learning models? Let me explain it to you in 5 minutes.