Pandas is a widely-used data analysis and manipulation library for Python. It provides numerous functions and methods that expedite the data analysis and preprocessing steps.

Due to its popularity, there are lots of articles and tutorials about Pandas. This one will be one of them but heavily focusing on the practical side. I will do examples on a  customer churn dataset that is available on Kaggle.

The examples will cover almost all the functions and methods you are likely to use in a typical data analysis process.

Let’s start by reading the csv file into a pandas dataframe.

import numpy as np
import pandas as pd

df = pd.read_csv("/content/churn.csv")
df.shape
(1000,14)
df.columns
Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography', 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard','IsActiveMember','EstimatedSalary', 'Exited'], dtype='object')

1. Dropping columns

The drop function is used to drop columns and rows. We pass the labels of rows or columns to be dropped.

df.drop(['RowNumber', 'CustomerId', 'Surname', 'CreditScore'], axis=1, inplace=True)

df.shape
(10000,10)

The axis parameter is set as 1 to drop columns and 0 for rows. The inplace parameter is set as True to save the changes. We dropped 4 columns so the number of columns reduced to 10 from 14.

2. Select particular columns while reading

We can read only some of the columns from the csv file. The list of columns is passed to the usecols parameter while reading. It is better than dropping later on if you know the column names beforehand.

df_spec = pd.read_csv("/content/churn.csv", usecols=['Gender', 'Age', 'Tenure', 'Balance'])

df_spec.head()

Image for post

(image by author)

#artificial-intelligence #data-science #programming #machine-learning #data-analysis

30 Examples to Master Pandas
65.15 GEEK