This article describes how to use webhooks and Cloud Functions to automatically publish Dataprep-generated profile information into BigQuery (after making an intermediate stop in GCS).
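The flow can be sketched as a small Cloud Function handler. This is a minimal sketch only: the payload fields (`jobId`, `jobStatus`), the bucket name, and the destination table are illustrative assumptions, not Dataprep's actual webhook schema.

```python
import json

# Hypothetical Cloud Function entry point for a Dataprep job-completion
# webhook. Field names and resource names below are assumptions.
def handle_dataprep_webhook(request_body: str) -> dict:
    """Parse a Dataprep webhook and build a BigQuery load configuration."""
    event = json.loads(request_body)
    if event.get("jobStatus") != "Complete":
        # Ignore webhooks for jobs that did not finish successfully.
        return {"skipped": True}
    job_id = event["jobId"]
    # The profile output is assumed to land in GCS first (the "intermediate stop").
    gcs_uri = f"gs://my-dataprep-profiles/{job_id}/profile.json"
    # In a real function, this config would be handed to
    # google.cloud.bigquery.Client().load_table_from_uri(...).
    return {
        "source_uri": gcs_uri,
        "destination": "analytics.dataprep_profiles",
        "source_format": "NEWLINE_DELIMITED_JSON",
    }
```

The handler only builds the load configuration; the actual BigQuery client call is left out so the shape of the webhook-driven flow stays visible.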
Data Observability: How to Prevent Broken Data Pipelines. The relationship between data downtime, observability, and reliable insights
Why Data Management remains a challenge in the Data and AI-first era. What challenges companies face with data management and how to begin tackling them
In this post, we are going to look at ML Ops, a recent development in ML that bridges the gap between ML and traditional software engineering, and highlight how data quality is key to ML Ops workflows in order to accelerate data teams and maintain trust in your data.
Data Observability: The Next Frontier of Data Engineering. Introducing a better approach to building data pipelines
Bad data makes data scientists work harder, not smarter! Data is one of the most important factors (beyond all the technical depth around an ML solution) that dictates whether a project succeeds.
We’ll specifically talk about data preparation as the most critical challenge and how an ML-based data preparation tool or software can make it easier to process data in the data lake.
In this article I’m going to walk you through how you can scrape a table embedded in a PDF file, unit test that data using Great Expectations and then if valid, save the file in S3 on AWS.
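The validation step can be illustrated with a plain-Python stand-in (not Great Expectations itself) that checks a scraped table before it is written out; the column names and rules here are illustrative assumptions.

```python
# Minimal stand-in for the "unit test the data" step: validate rows
# parsed from a PDF table. Column names ("country", "population")
# are hypothetical examples.
def validate_table(rows: list[dict]) -> list[str]:
    """Return a list of failed expectations; an empty list means the data is valid."""
    failures = []
    if not rows:
        failures.append("table must contain at least one row")
    for i, row in enumerate(rows):
        if not row.get("country"):
            failures.append(f"row {i}: 'country' must not be empty")
        try:
            if float(row.get("population", "")) < 0:
                failures.append(f"row {i}: 'population' must be non-negative")
        except ValueError:
            failures.append(f"row {i}: 'population' must be numeric")
    return failures
```

Only if `validate_table` returns no failures would the file be uploaded to S3, mirroring the validate-then-save flow the article describes.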
3 key takeaways from the 2020 Chief Data Officer Symposium. When COVID-19 hit the U.S. in March 2020, the first role the Centers for Disease Control and Prevention (CDC) hired for was a Chief Data Officer. Not surprisingly, COVID-19 and GDPR, its close cousin in terms of cultural relevance, were top of mind for everyone at this year’s all-virtual MIT Chief Data Officer Symposium. In this article, I share three key takeaways from the event and propose next steps for CDOs to retain their competitive edge in this remote-first, data-obsessed world.
Quality Assurance Testing is one of the key areas in Big Data. Data quality issues can ruin the success of many Big Data, data lake, and ETL projects.
I’ve developed an open-source data testing and quality tool called data-flare. It aims to help data engineers and data scientists assure the data quality of large datasets using Spark.
How to improve data quality for Machine Learning? The ultimate goal of every data scientist or Machine Learning evangelist is to create a better model with higher predictive accuracy.
Introducing a better way to measure the financial impact of your bad data. In addition to wasted time and sleepless nights, data quality issues lead to compliance risks.
There are a few simple data quality checks you can build in your Data Warehouse process to detect data inconsistencies due to errors within your ETL/ELT pipelines or connection failures.
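Checks like these can be expressed as a few SQL queries run after each load. Below is a sketch against an in-memory SQLite table; in a real warehouse the same queries would target your staging schema, and the table and key column names are assumptions.

```python
import sqlite3

# Illustrative post-load data quality checks: row count, null keys,
# and duplicate keys. A connection failure in the pipeline often
# surfaces as an unexpectedly empty table.
def run_quality_checks(conn: sqlite3.Connection, table: str) -> dict:
    cur = conn.cursor()
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    null_ids = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE id IS NULL"
    ).fetchone()[0]
    dup_ids = cur.execute(
        f"SELECT COUNT(*) FROM "
        f"(SELECT id FROM {table} GROUP BY id HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {
        "non_empty": row_count > 0,
        "no_null_keys": null_ids == 0,
        "no_duplicate_keys": dup_ids == 0,
    }
```

A failed check can then halt the pipeline or raise an alert before inconsistent data reaches downstream consumers.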
Tired of Data Quality Discussions? Five steps to avoid them
Data quality is top of mind for every data professional — and for good reason. Bad data costs companies valuable time, resources, and most of all, revenue.
Hardly anyone relying on data can say their data is perfect. There is always that difference between the dataset you have and the dataset you wish you had. This difference is what Data Quality is all about.
Learn why marketing analytics often fails marketers and how data scientists can fix the problem. Industry insiders have always claimed that the Great Recession was a good thing for marketing analytics.
Have you ever asked yourself why Data Governance has a huge impact on your Machine Learning models? Let me explain in five minutes.