Introduction to Schema: A Python Library to Validate Your Data

Motivation

Your script works with the training data, but when you run it on new, supposedly similar data, you get an error. What is going on? The structure of the new data may not be what you expected.

It can be difficult to look at every row of the new data to find where the problem is. Manually inspecting the data every time a new batch arrives is also time-consuming.

It is even worse if the data changes but your code does not throw any error. As a result, the **performance of your model** might get worse because the data is different from what you expected.

If we can write tests for functions with tools such as Pytest, is there a way to write tests for data as well?

We can do that with schema, a Python library for validating data. This article will show you how to use schema in a variety of scenarios.



towardsdatascience.com
