A hands-on guide to mastering the first baby steps in building Machine Learning applications.
Machine Learning is continuously evolving, and with that evolution comes a spike in demand and importance. Corporations and startups need Data Scientists and Machine Learning Engineers now more than ever to turn their troves of data into useful insight. There’s probably no better time than now (except 5 years ago) to delve into Machine Learning, and arguably no better tool for building those applications than Python. Python has a vibrant and active community. Many of its developers come from the scientific community, which has given Python a vast number of libraries for scientific computing.
In this article, we will discuss some of the features of Python’s key scientific libraries and also employ them in a proper Data Analysis and Machine Learning workflow.
What you will learn;
If you’d like to go straight to code, it is here on GitHub.
Disclaimer: This article assumes that
Pandas is an extraordinary tool for data analysis. Its stated aim is to become the most powerful and flexible open-source data analysis/manipulation tool available in any language. Let’s take a look at what Pandas is capable of:
import numpy as np
import pandas as pd
from sklearn import datasets
iris_data = datasets.load_iris()
iris_data.keys()
# Output of iris_data.keys() (the exact keys vary by scikit-learn version):
# dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])
iris_data['target_names']
# Output of iris_data['target_names']:
# array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
df_data = pd.DataFrame(iris_data['data'], columns=iris_data['feature_names'])
df_target = pd.DataFrame(iris_data['target'], columns=['species'])
df = pd.concat([df_data, df_target], axis=1)
df
In the cells above, I imported the classic Iris dataset using Scikit-learn (we’ll explore Scikit-learn later on). I then passed the data into a Pandas DataFrame, using the feature names as column headers. I also created a second DataFrame to hold the iris species, which are encoded as 0 for setosa, 1 for versicolor, and 2 for virginica. The final step was to concatenate the two DataFrames into a single DataFrame.
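The integer codes in the species column are compact but not very readable. One way to make them friendlier is to map each code back to its name. The snippet below is a minimal sketch of that idea, not part of the original notebook: the `species_name` column and the `code_to_name` helper are hypothetical names introduced here for illustration.

```python
import pandas as pd
from sklearn import datasets

iris_data = datasets.load_iris()

# Rebuild the combined DataFrame exactly as in the walkthrough.
df_data = pd.DataFrame(iris_data['data'], columns=iris_data['feature_names'])
df_target = pd.DataFrame(iris_data['target'], columns=['species'])
df = pd.concat([df_data, df_target], axis=1)

# Build a {code: name} lookup from the dataset's own target names,
# e.g. 0 -> 'setosa'. 'code_to_name' is a hypothetical helper name.
code_to_name = {code: str(name) for code, name in enumerate(iris_data['target_names'])}

# 'species_name' is a hypothetical column added for illustration only.
df['species_name'] = df['species'].map(code_to_name)

# Count how many rows belong to each species.
print(df['species_name'].value_counts())
```

Keeping the numeric `species` column alongside the readable one is a deliberate choice here, since most Scikit-learn estimators expect numeric targets.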
When working with data that fits on a single machine, Pandas is the ultimate tool. Think of it as Excel on steroids. Just as in Excel, the basic units of operation are rows and columns: each column of data is a Series, and a collection of Series makes up a DataFrame.
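To see the Series/DataFrame relationship concretely, here is a small sketch using a hand-built table. The values and the `sepal ratio` column are made up for illustration and are not taken from the article's notebook.

```python
import pandas as pd

# A tiny illustrative table; the values are invented for this sketch.
df = pd.DataFrame({
    'sepal length (cm)': [5.0, 6.0, 7.0],
    'sepal width (cm)': [2.5, 3.0, 3.5],
})

# Selecting a single column returns a Series.
sepal_length = df['sepal length (cm)']
print(type(sepal_length).__name__)

# Selecting a list of columns returns a DataFrame, even for one column.
one_column_frame = df[['sepal length (cm)']]
print(type(one_column_frame).__name__)

# A single row selected by position is also a Series, indexed by column name.
first_row = df.iloc[0]
print(first_row)

# Column arithmetic is applied element-wise, much like a spreadsheet
# formula filled down a column. 'sepal ratio' is a hypothetical column.
df['sepal ratio'] = df['sepal length (cm)'] / df['sepal width (cm)']
print(df)
```

Because each column is a Series, operations such as the division above run over the whole column at once rather than row by row.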