In this article, you’ll learn some of the most helpful Pandas tricks to speed up your data analysis.
Please check out my GitHub repo for the source code.
Here are the data types of the Titanic DataFrame:
df.dtypes
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
Let’s say you need to select the numeric columns.
df.select_dtypes(include='number').head()
This includes both int and float columns. You can also use this method to select or exclude other data types:
# select just object columns
df.select_dtypes(include='object')
# select multiple data types
df.select_dtypes(include=['int', 'datetime', 'object'])
# exclude certain data types
df.select_dtypes(exclude='int')
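To make the behavior of include and exclude concrete, here is a minimal sketch on a small stand-in DataFrame. The frame `sample` and its values are made up for illustration (they only borrow a few Titanic column names), and the Name column is given an explicit object dtype because some pandas versions may infer a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical stand-in for a few Titanic columns (values are made up)
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    # Explicit object dtype so the column matches the dtypes listing above
    'Name': pd.Series(['A', 'B', 'C'], dtype=object),
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.28, 7.92],
})

# 'number' matches both integer and float columns
numeric_cols = sample.select_dtypes(include='number').columns.tolist()

# 'object' matches the string column stored with object dtype
object_cols = sample.select_dtypes(include='object').columns.tolist()

# exclude drops the matching columns and keeps everything else
non_float_cols = sample.select_dtypes(exclude='float').columns.tolist()

print(numeric_cols)
print(object_cols)
print(non_float_cols)
```

Column order in the result follows the order in the original DataFrame, so the selection can be passed straight back into `sample[...]` if needed.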
There are two ways to convert strings to numbers in Pandas:
astype() method
to_numeric() method
Let’s create an example DataFrame to have a look at the difference.
import pandas as pd

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'price': ['10', '20', '30', '40'],
    'sales': ['20', '-', '60', '-']
})
The price and sales columns are stored as strings and so result in object columns:
df.dtypes
product object
price object
sales object
dtype: object
We can use the first method, astype(), to convert the price column as follows:
# Use Python type
df['price'] = df['price'].astype(int)
# alternatively, pass { col: dtype }
df = df.astype({'price': 'int'})
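The { col: dtype } form is handy when several columns need different target types in one call. The sketch below assumes a hypothetical `stock` column that is not part of the example above, and it checks the resulting dtypes with helpers from `pd.api.types` rather than relying on a specific integer width, which can vary by platform.

```python
import pandas as pd

# Small demo frame; 'stock' is a hypothetical extra column for illustration
df_demo = pd.DataFrame({
    'product': ['A', 'B'],
    'price': ['10', '20'],
    'stock': ['3', '4'],
})

# Convert two columns at once; columns not listed keep their dtype
converted = df_demo.astype({'price': 'int', 'stock': 'float'})

price_is_int = pd.api.types.is_integer_dtype(converted['price'])
stock_is_float = pd.api.types.is_float_dtype(converted['stock'])

print(converted.dtypes)
```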
However, this would raise an error if we tried it on the sales column, because the value '-' cannot be parsed as an integer. To fix that, we can use to_numeric() with the argument errors='coerce':
df['sales'] = pd.to_numeric(df['sales'], errors='coerce')
Now the invalid values (-) are converted to NaN, and the column's data type is float.
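The difference between the two methods can be checked side by side. This is a minimal sketch on a standalone Series that reuses the sales values from the example. The names `sales`, `astype_failed`, and `coerced` are chosen here for illustration, and the Series is given an explicit object dtype to mirror the object column shown earlier.

```python
import pandas as pd

# Same values as the sales column, stored with object dtype
sales = pd.Series(['20', '-', '60', '-'], dtype=object, name='sales')

# astype(int) cannot parse '-' and raises a ValueError
try:
    sales.astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' turns unparseable values into NaN
coerced = pd.to_numeric(sales, errors='coerce')

print(astype_failed)
print(coerced)
```

The result is a float column because NaN is a floating-point value, so a plain integer column cannot hold it.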