Preparing your data before using it to train or test a machine learning model is essential for getting accurate results. It can also be a tiresome task, because analyzing the data and shaping it to your requirements takes a lot of time and effort.

DataPrep is an open-source Python library that lets you prepare your data with just a few lines of code. Preparing data here means analyzing the properties of the attributes in the dataset. The current version of DataPrep includes a very useful module named EDA (Exploratory Data Analysis).

In this article, we will explore what we can do with DataPrep and its features.


Implementation of DataPrep

Like any other Python library, DataPrep has to be installed before use, which can be done with pip:

pip install dataprep
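After installing, a quick check can confirm that the packages used in this article are importable. The snippet below is only a minimal sketch using the standard library; `is_installed` is a hypothetical helper name introduced here for illustration, not part of DataPrep.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be located for import."""
    return importlib.util.find_spec(package_name) is not None


# "dataprep" and "plotly" are the import names used in this article
for name in ("dataprep", "plotly"):
    if is_installed(name):
        print(f"{name}: found")
    else:
        print(f"{name}: missing (try: pip install {name})")
```

If either package is reported as missing, installing it with pip and rerunning the check should resolve it.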

  1. Importing required libraries

DataPrep provides different functions for different operations. We will start by importing the plot function, which visualizes statistical plots and properties of the dataset. We will also import plotly express, which we will use to load the dataset we will be working on.


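# plotly express supplies the sample dataset
import plotly.express as px

# plot is the visualization function from DataPrep's EDA module
from dataprep.eda import plot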

  2. Loading the Dataset

In this article, we will use the sample dataset named ‘tips’, which can be loaded through plotly express. The dataset contains attributes related to restaurant bills and tips.

# Load the 'tips' sample dataset
df = px.data.tips()

# In a notebook, this displays the loaded data
df
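Before any plotting, it can help to glance at the columns with plain pandas. The sketch below is not DataPrep's API. `column_overview` is a hypothetical helper name, and it assumes pandas is installed and that `df` is a pandas DataFrame, as plotly express returns by default.

```python
import pandas as pd


def column_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: its dtype, missing-value count, and distinct-value count."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),   # storage type of each column
            "missing": df.isna().sum(),       # number of missing entries
            "unique": df.nunique(),           # distinct non-missing values
        }
    )
```

Calling `column_overview(df)` on the tips data loaded above would give a compact table showing which attributes are numeric, which are categorical, and whether any values are missing.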




A Python Library to Prepare Your Data Before Training