In computer programming, a class is a blueprint for a user-defined data type. Classes are defined in terms of attributes (data) and methods (functions). Grouping data together with the functions that operate on it makes code easier to reuse and extend. In this post, we will define a Python class that generates simple summary statistics and performs some exploratory data analysis (EDA) on a data set.
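To make the idea of attributes and methods concrete, here is a minimal sketch. The class name `NumberSummary` and its methods are hypothetical and exist only for illustration; this is not the class we build later in the post.

```python
class NumberSummary:
    """Hypothetical example class: stores numbers and summarizes them."""

    def __init__(self, values):
        # Attribute: the data held by each instance
        self.values = list(values)

    def mean(self):
        # Method: a function that operates on the instance's data
        return sum(self.values) / len(self.values)

    def value_range(self):
        # Difference between the largest and smallest value
        return max(self.values) - min(self.values)


ages = NumberSummary([20, 25, 30])
print(ages.mean())         # 25.0
print(ages.value_range())  # 10
```

Each instance carries its own `values`, so the same methods can be reused on any list of numbers.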

Let’s get started!

For our purposes, we will be working with the FIFA 19 data set, which can be found here.

To start, let’s import the pandas package:

import pandas as pd

Next, let’s set the maximum number of display columns and rows to None, so that pandas shows every column and row instead of truncating the output:

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Now, let’s read in our data:

df = pd.read_csv('fifa_data.csv')

Next we will print the first five rows of data to get an idea of the column types and their values (column results are truncated):


[Image: the first five rows of the data set, with columns truncated]
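As a rough sketch of this step, the snippet below reads a small made-up CSV from memory and prints its first five rows with `head()`. The column names and values are illustrative stand-ins, not the real contents of `fifa_data.csv`; with the real file loaded as above, the equivalent call would be `print(df.head())`.

```python
import io

import pandas as pd

# Made-up stand-in for 'fifa_data.csv'; columns and values are illustrative only
csv_text = """Name,Age,Overall
Player A,31,94
Player B,33,94
Player C,26,92
Player D,27,91
Player E,27,91
Player F,27,91
"""

sample_df = pd.read_csv(io.StringIO(csv_text))

# head() returns the first five rows by default
first_rows = sample_df.head()
print(first_rows)
```

Passing a number, as in `sample_df.head(10)`, changes how many rows are returned.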


Python Classes for Statistics with Pandas