Pandas is the de facto standard library for data analysis in Python. Wes McKinney began developing it in 2008 while working at AQR Capital Management, and it was open-sourced for general use in 2009.

It rapidly became the go-to tool for data analysis for Python users and now has a huge array of features for data extraction, manipulation, visualisation and analysis.

Pandas has many useful methods and functions. Here are ten things you might not know about the library.

Pandas can be installed with pip if you don’t already have it. The full documentation, with some excellent general data analysis tutorials, can be found here.

pip install pandas

Throughout the article, I will provide code examples using the ‘autos’ data set, which records a variety of characteristics for each car along with its corresponding insurance risk rating. This data set is typically used for a machine learning classification task where the objective is to predict the risk rating of the car.

Data analysis is an important preliminary step before building a machine learning model.

If you have Scikit-learn installed, the data set can be imported using the code below. Alternatively, it can be downloaded here.

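import pandas as pd
from sklearn.datasets import fetch_openml

# Download the 'autos' data set from OpenML as a feature DataFrame and a target Series
X, y = fetch_openml("autos", version=1, as_frame=True, return_X_y=True)

# Work on a copy so that adding the target column does not modify X
data = X.copy()
data['target'] = y
data.head()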
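
Once the frame is loaded, a quick first look usually covers its shape, its missing values and the balance of the target classes. The sketch below does this on a tiny hand-made stand-in frame so that it runs without a download. The column names and values are illustrative assumptions, not rows from the real ‘autos’ data, and `summarise` is a hypothetical helper rather than a Pandas function.

```python
import pandas as pd

# Hypothetical stand-in for the 'autos' frame: made-up rows, assumed column names.
sample = pd.DataFrame(
    {
        "make": ["audi", "bmw", "audi", "toyota"],
        "horsepower": [102.0, 101.0, None, 62.0],
        "target": ["2", "2", "1", "0"],
    }
)


def summarise(df: pd.DataFrame) -> dict:
    """Collect a few first-look facts about a DataFrame (illustrative helper)."""
    return {
        # (number of rows, number of columns)
        "shape": df.shape,
        # column name -> count of missing values
        "missing": df.isna().sum().to_dict(),
        # target class -> number of rows with that class
        "target_counts": df["target"].value_counts().to_dict(),
    }


summary = summarise(sample)
print(summary["shape"])
print(summary["missing"])
print(summary["target_counts"])
```

The same helper could be called as `summarise(data)` on the real frame, since it only relies on a column named `target` being present.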


10 Things You Didn’t Know About Pandas