In data science and machine learning projects, we spend a lot of time analyzing the data and preprocessing it to clean the dataset. Pandas is one of the most widely used open-source libraries for data science and analysis, and it is especially popular for ad-hoc data manipulation. The dataset we use will very likely contain missing data, null values, or duplicate rows that we need to handle, or we might simply want to drop a column because we think the feature is not important for building the model.
In my last blog, I discussed the dropna() and fillna() functions in Pandas, which are used to deal with missing data or NaN values. As a continuation of that, in this blog I want to discuss two other powerful built-in functions in Pandas, drop() and drop_duplicates(), which are widely used for data preprocessing.
Let’s begin by importing the Pandas library.
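A minimal sketch of the import, using the conventional `pd` alias. The small DataFrame below is hypothetical sample data made up purely for illustration; it is not a dataset from this blog.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Bob"],
    "age": [28, 35, 41, 35],
    "city": ["Oslo", "Pune", "Lima", "Pune"],
})

print(df.shape)  # (rows, columns)
```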
The Pandas drop() function removes specified rows and/or columns from a DataFrame.
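As a quick illustration, here is a sketch that drops one column and one row. The DataFrame and its column names are assumptions invented for this example. By default drop() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Bob"],
    "age": [28, 35, 41, 35],
    "city": ["Oslo", "Pune", "Lima", "Pune"],
})

# Drop a column by name
without_city = df.drop(columns=["city"])

# Drop a row by its index label
without_first_row = df.drop(index=[0])

print(without_city)
print(without_first_row)
```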
Syntax:
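The exact signature can vary slightly between Pandas versions, so one way to check it for the version you have installed is to inspect it directly. This is only a sketch that relies on Python's standard `inspect` module; the variable names are illustrative.

```python
import inspect

import pandas as pd

# Look up the signature of DataFrame.drop in the installed Pandas version
signature = inspect.signature(pd.DataFrame.drop)
param_names = list(signature.parameters)

print(signature)
print(param_names)
```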
The parameters in the syntax are defined as follows:
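To see how the most commonly used parameters behave in practice, here is a hedged sketch. The DataFrame, its labels, and the missing "salary" column are all hypothetical and chosen only to exercise `labels`, `axis`, `index`, `columns`, `errors`, and `inplace`.

```python
import pandas as pd

# Hypothetical sample data with string index labels
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [28, 35, 41]},
    index=["a", "b", "c"],
)

# labels + axis: axis=0 (the default) targets rows, axis=1 targets columns
rows_dropped = df.drop("a", axis=0)
cols_dropped = df.drop("age", axis=1)

# index= and columns= are keyword alternatives to labels + axis
same_rows = df.drop(index="a")

# errors="ignore" skips labels that do not exist instead of raising KeyError
unchanged = df.drop(columns=["salary"], errors="ignore")

# inplace=True modifies the DataFrame itself and returns None
working = df.copy()
result = working.drop(index="b", inplace=True)

print(rows_dropped)
print(cols_dropped)
print(working)
```

Because `inplace=True` returns None, assigning its result back to the same variable would lose the DataFrame, which is one reason the default of returning a new object is often the safer choice.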