How to get started with Machine Learning in about 10 minutes

With the rise of Machine Learning across industries, the need for a tool that lets you iterate through the process quickly has become vital. Python, a rising star in Machine Learning technology, is often the first choice for success.

Introduction to Machine Learning with Python

So, why Python? In my experience, Python is one of the easiest programming languages to learn. Machine Learning work demands rapid iteration, and a data scientist does not need deep knowledge of the language to be productive; you can get the hang of it quickly.

How easy?

for anything in the_list:
    print(anything)

That easy. The syntax closely resembles English (a human language, not a machine one). And there are no silly curly brackets to confuse humans. I have a colleague in Quality Assurance, not a Software Engineer, and she was able to write production-level Python code within a day. (For real!)

That is why the builders of the libraries we’ll discuss below chose Python. And as data analysts and scientists, we can simply use their masterpieces to complete our tasks. These are the incredible, must-have libraries for Machine Learning with Python.

1. Numpy

The famous numerical analysis library. It will help you do many things, from computing the median of a data distribution to processing multidimensional arrays.
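For instance (a toy snippet with made-up numbers), computing a median is a single call:

import numpy as np

ages = np.array([22, 38, 26, 35, 28])  # a small, made-up sample of ages
print(np.median(ages))                 # 28.0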

2. Pandas

For processing CSV files and tables. Of course, you will need to process some tables and look at statistics, and this is the right tool for the job.
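A minimal taste, assuming you have a CSV file named data.csv on disk:

import pandas as pd

df = pd.read_csv('data.csv')  # load the table from disk
print(df.describe())          # summary statistics for each numeric column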

3. Matplotlib

After you have the data stored in Pandas data frames, you might need some visualizations to understand more about the data. Images are still better than thousands of words.
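For example, a quick line plot takes just a few lines:

import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [1, 4, 9, 16])  # x values against y values
plt.xlabel('x')
plt.ylabel('y')
plt.show()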

4. Seaborn

This is also another visualization tool, but more focused on statistical visualization. Things like histograms, or pie charts, or curves, or maybe correlation tables.
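For example, a histogram with a density curve is one line (assuming a DataFrame df with an Age column):

import seaborn as sns

sns.histplot(df['Age'], kde=True)  # histogram plus a smoothed density curve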

5. Scikit-Learn

This is the final boss of Machine Learning with Python. What people call “Machine Learning with Python” usually means this guy: Scikit-Learn. Everything you need, from algorithms to improvements, is here.
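Everything follows the same fit/predict pattern; a minimal sketch on toy data:

from sklearn.linear_model import LogisticRegression

X = [[0], [1], [2], [3]]  # toy feature values
y = [0, 0, 1, 1]          # toy labels
clf = LogisticRegression().fit(X, y)
print(clf.predict([[3.0]]))  # most likely array([1])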

6. Tensorflow and Pytorch

I won’t say too much about these two. But if you are interested in Deep Learning, take a look at them; it will be worth your time. (I will write another tutorial about Deep Learning, stay tuned!)

Python Machine Learning Projects

Of course, reading and studying alone will not get you where you need to go. You need actual practice. As I said on my blog, learning the tools is pointless if you do not jump into the data. So let me introduce you to a place where you can easily find Python Machine Learning projects.

Kaggle is a platform where you can dive directly into the data. You’ll solve projects and get really good at Machine Learning. Something that might make you more interested in it: the Machine Learning competitions it holds may give a prize as big as $100,000. And you might want to try your luck. Haha.

But the most important thing is not the money: Kaggle is really a place where you can find Machine Learning with Python projects. There are lots of projects to try. And if you are a newbie, and I assume you are, you will want to join the competition below.

Here’s an example project that we’ll use in the below tutorial:

Titanic: Machine Learning from Disaster

Yes, the infamous Titanic. A tragic disaster in 1912 that took the lives of 1,502 of its 2,224 passengers and crew. This Kaggle competition (or, I should say, tutorial) gives you the real data about the disaster. Your task is to analyze the data and predict whether a given person survived the incident.

Machine Learning with Python Tutorial

Before going deep into the data of the Titanic, let’s install some tools you need.

Of course, Python. You need to install it first from the Python official website. You need version 3.6+ to keep up to date with the libraries.

After that, you need to install all the libraries via Python pip. Pip should be installed automatically with the distribution of Python you just downloaded.

Then install the things you need via pip. Open your terminal, command line, or Powershell, and write the following:

pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install scikit-learn
pip install jupyter

Well, everything looks good. But wait, what is Jupyter? Jupyter is named after Julia, Python, and R, the three languages it originally supported; the spelling was tweaked from the odd-looking “Jupytr” to just “Jupyter”. It is a famous notebook environment where you can write and run Python code interactively.

Just type jupyter notebook in your terminal and a notebook page will open in your browser. Create a new notebook there, and you can write and evaluate Python code interactively in its cells.

Now you have installed all the tools. Let’s get going!

Data Exploration

The first step is to explore the data. You need to download the data from the Titanic page in Kaggle. Then put the extracted data inside a folder where you start your Jupyter notebook.

Then import the necessary libraries:

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')  # hide library warnings in the notebook
%matplotlib inline

Then load the data:
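A minimal loading snippet, assuming the Kaggle training file is saved as train.csv next to your notebook:

train_df = pd.read_csv('train.csv')
train_df.head()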


The first five rows of the table will be displayed. That is our data, and it has the following columns:

  1. PassengerId, the identifier of the passenger
  2. Survived, whether he/she survived or not
  3. Pclass, the ticket class: 1 is first class, 2 is second class, and 3 is third class
  4. Name, the name of the passenger
  5. Sex
  6. Age
  7. SibSp, the number of siblings and spouses on board
  8. Parch, the number of parents and children on board
  9. Ticket, ticket detail
  10. Cabin, their cabin. NaN means unknown
  11. Embarked, the origin of embarkation, S for Southampton, Q for Queenstown, C for Cherbourg

While exploring data, we often find missing data. Let’s see them:

def missingdata(data):
    total = data.isnull().sum().sort_values(ascending=False)
    percent = (data.isnull().sum()/data.isnull().count()*100).sort_values(ascending=False)
    ms = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    ms = ms[ms["Percent"] > 0]
    f, ax = plt.subplots(figsize=(8, 6))
    fig = sns.barplot(x=ms.index, y=ms["Percent"], color="green", alpha=0.8)
    plt.xlabel('Features', fontsize=15)
    plt.ylabel('Percent of missing values', fontsize=15)
    plt.title('Percent missing data by feature', fontsize=15)
    return ms

missingdata(train_df)


We will see a bar chart showing the percent of missing data by feature.

The Cabin, Age, and Embarked columns have missing values, and the cabin information is largely missing. We need to do something about them. This is what we call Data Cleaning.

Data Cleaning

This is where we spend 90% of our time. We will do Data Cleaning a lot in every single Machine Learning project. When the data is clean, we can easily jump to the next step without worrying about anything.

The most common technique in Data Cleaning is filling in missing data. You can fill missing values with the mode, mean, or median. There is no absolute rule for these choices; you can try one after another and compare the performance. But as a rule of thumb, use the mode only for categorical data, and use the mean or median for continuous data.

So let’s fill the Embarked data with the mode and the Age data with the median.

train_df['Embarked'].fillna(train_df['Embarked'].mode()[0], inplace = True)
train_df['Age'].fillna(train_df['Age'].median(), inplace = True)

The next important technique is to simply drop the column, especially when it is largely missing. Let’s do that for the Cabin data.

drop_column = ['Cabin']
train_df.drop(drop_column, axis=1, inplace = True)

Now we can check the data we have cleaned.

print('check the nan value in train data')
print(train_df.isnull().sum())

Perfect! No missing values remain, which means the data has been cleaned.

Feature Engineering

Now we have cleaned the data. The next thing we can do is Feature Engineering.

Feature Engineering is basically a technique for deriving new features from the data you already have. There are several ways to do it. More often than not, it is about common sense.

Let’s take a look at the Embarked data: it is filled with Q, S, or C. Machine Learning algorithms cannot process strings, only numbers. So you need to do something called One Hot Encoding, turning the column into three columns, say Embarked_Q, Embarked_S, and Embarked_C, each filled with 0 or 1 depending on whether the person embarked from that harbor or not.
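A minimal sketch with pandas (the tutorial applies the full encoding to all categorical columns later):

embarked_dummies = pd.get_dummies(train_df['Embarked'], prefix='Embarked')
# produces Embarked_C, Embarked_Q, Embarked_S columns filled with 0s and 1s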

The other example is SibSp and Parch. Maybe neither column is interesting on its own, but you might want to know how big the family of each passenger on board was. You might assume that a bigger family increased the chance of survival, since family members could help each other, while solo passengers would have had it hard.

So you want to create another column, FamilySize, which is SibSp + Parch + 1 (the passenger themself).

The last example is called bin columns. Binning groups ranges of values together, on the assumption that it is hard to differentiate things with similar values. For example, Age: is there any significant difference between a person aged 5 and one aged 6? Or between a person aged 45 and one aged 46?

That’s why we create bin columns. For age, we will create 4 bins: Children (0–14 years), Teenagers (14–20), Adults (20–40), and Elders (40+).

Let’s code them:

all_data = [train_df]  # a list, so the same loops still work if you later add a test set

for dataset in all_data:
    dataset['FamilySize'] = dataset['SibSp'] + dataset['Parch'] + 1

import re

# Define a function to extract titles from passenger names
def get_title(name):
    title_search = re.search(r' ([A-Za-z]+)\.', name)
    # If the title exists, extract and return it
    if title_search:
        return title_search.group(1)
    return ""
# Create a new feature Title, containing the titles of passenger names
for dataset in all_data:
    dataset['Title'] = dataset['Name'].apply(get_title)
# Group all non-common titles into one single grouping "Rare"
for dataset in all_data:
    dataset['Title'] = dataset['Title'].replace(['Lady', 'Countess','Capt', 'Col','Don', 
                                                 'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer', 'Dona'], 'Rare')

    dataset['Title'] = dataset['Title'].replace('Mlle', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Ms', 'Miss')
    dataset['Title'] = dataset['Title'].replace('Mme', 'Mrs')

for dataset in all_data:
    dataset['Age_bin'] = pd.cut(dataset['Age'], bins=[0,14,20,40,120], labels=['Children','Teenage','Adult','Elder'])

for dataset in all_data:
    dataset['Fare_bin'] = pd.cut(dataset['Fare'], bins=[0,7.91,14.45,31,120],
                                 labels=['Low_fare','median_fare','Average_fare','high_fare'])

for dataset in all_data:
    drop_column = ['Age','Fare','Name','Ticket']
    dataset.drop(drop_column, axis=1, inplace=True)

drop_column = ['PassengerId']
train_df.drop(drop_column, axis=1, inplace=True)
train_df = pd.get_dummies(train_df, columns=["Sex","Title","Age_bin","Embarked","Fare_bin"])

Now, you have finished all the features. Let’s take a look into the correlation for each feature:

sns.heatmap(train_df.corr(), annot=True, cmap='RdYlGn', linewidths=0.2)  # plot the correlation matrix

A correlation of 1 means strongly positively correlated, and -1 strongly negatively correlated. For example, Sex_male and Sex_female correlate negatively, since every passenger is labeled as one or the other. Beyond that, nothing correlates highly with anything else, except for the features created via feature engineering. This means we are good to go.

What happens if two features are highly correlated with each other? One of them adds no new information, since both carry essentially the same signal, so we can eliminate one of the columns.

Machine Learning with Python

Now we have arrived at the apex of the tutorial: Machine Learning modeling.

from sklearn.model_selection import train_test_split #for split the data
from sklearn.metrics import accuracy_score  #for accuracy_score
from sklearn.model_selection import KFold #for K-fold cross validation
from sklearn.model_selection import cross_val_score #score evaluation
from sklearn.model_selection import cross_val_predict #prediction
from sklearn.metrics import confusion_matrix #for confusion matrix
all_features = train_df.drop("Survived", axis=1)
Targeted_feature = train_df["Survived"]
X_train,X_test,y_train,y_test = train_test_split(all_features,Targeted_feature,test_size=0.3,random_state=42)

You can choose many algorithms included inside the scikit-learn library.

  1. Logistic Regression
  2. Random Forest
  3. SVM
  4. K Nearest Neighbor
  5. Naive Bayes
  6. Decision Trees
  7. AdaBoost
  8. LDA
  9. Gradient Boosting

You might be overwhelmed trying to figure out what is what. Don’t worry, just treat it as a black box: choose the one with the best performance. (I will write a whole article on these algorithms later.)

Let’s try it with my favorite one: the Random Forest Algorithm

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(criterion='gini', n_estimators=700, random_state=1)
model.fit(X_train, y_train)
prediction_rm = model.predict(X_test)

print('--------------The Accuracy of the model----------------------------')
print('The accuracy of the Random Forest Classifier is', round(accuracy_score(y_test, prediction_rm)*100, 2))

kfold = KFold(n_splits=10, shuffle=True, random_state=22)  # k=10, split the data into 10 equal parts
result_rm = cross_val_score(model, all_features, Targeted_feature, cv=kfold, scoring='accuracy')
print('The cross validated score for Random Forest Classifier is:', round(result_rm.mean()*100, 2))

y_pred = cross_val_predict(model, all_features, Targeted_feature, cv=10)
sns.heatmap(confusion_matrix(Targeted_feature, y_pred), annot=True, fmt='d')
plt.title('Confusion_matrix', y=1.05, size=15)

Wow! It gives us 83% accuracy. That’s good enough for our first time.

The cross-validated score comes from the K-Fold validation method. With K = 10, the data is split into 10 folds; the model trains on 9 folds and validates on the remaining 1, repeating this 10 times, and the mean of all scores is the final score.

Fine Tuning

Now you are done with the main steps of Machine Learning with Python. But there is one more step that can bring you better results: fine tuning. Fine tuning means finding the best parameters for the Machine Learning algorithm. If you look at the code for the random forest above:

model = RandomForestClassifier(criterion='gini', n_estimators=700, random_state=1)

There are many parameters you can set, and you can change them however you want. But of course, experimenting by hand takes a lot of time.

Don’t worry — there is a tool called Grid Search, which finds the optimal parameters automatically. Sounds great, right?

# Random Forest Classifier parameter tuning
from sklearn.model_selection import GridSearchCV

model = RandomForestClassifier()
n_estim = range(100, 1000, 100)  # candidate values to try; pick any range you like

## Search grid for optimal parameters
param_grid = {"n_estimators": n_estim}

model_rf = GridSearchCV(model, param_grid=param_grid, cv=5, scoring="accuracy", n_jobs=4, verbose=1)
model_rf.fit(X_train, y_train)

# Best score
print(model_rf.best_score_)

# Best estimator
print(model_rf.best_estimator_)
Well, you can try it out for yourself. And have fun with Machine Learning.


How was it? It doesn’t seem very difficult, does it? Machine Learning with Python is easy. Everything has been laid out for you. You can just do the magic. And bring happiness to people.

This piece was originally released on my blog at


Top Machine Learning Framework: 5 Machine Learning Frameworks of 2019

Machine Learning (ML) is one of the fastest-growing technologies today. ML has many frameworks for building a successful app, so as a developer you might be confused about which one to use. Here we have curated the top 5 machine learning frameworks that put cutting-edge technology in your hands.

Thanks to machine learning frameworks, mobile phones and tablets are now powerful enough to run software that can learn and react in real time. Machine learning is a complex discipline, but implementing ML models is far less daunting and difficult than it used to be: models now improve automatically over time, with interactions, experiences, and, most importantly, the acquisition of useful data pertaining to the tasks allocated.

As we know, ML is considered a subset of Artificial Intelligence (AI): the scientific study of statistical models and algorithms that help a computing system accomplish designated tasks efficiently. Now, as a mobile app developer planning to choose a machine learning framework, you must keep the following things in mind.

The framework should be performance-oriented
It should be quick to grasp and code with
It should support parallelization, so the computational process can be distributed
It should provide a facility to create models and a developer-friendly tool
Let’s learn about the top five machine learning frameworks so you can make the right choice for your next ML application development project. Before we dive deeper into them, here are the different types of ML frameworks available on the web:

Mathematical oriented
Neural networks-based
Linear algebra tools
Statistical tools
Now, let’s have an insight into ML frameworks that will help you in selecting the right framework for your ML application.

Don’t Miss Out on These 5 Machine Learning Frameworks of 2019
#1 TensorFlow
TensorFlow is an open-source software library for dataflow programming across a range of tasks. The framework is built around computational graphs, which are essentially networks of nodes, where each node represents a mathematical operation as simple or as complex as multivariate analysis. It is said to be among the best ML libraries, as it supports complicated tasks and algorithms such as regression, classification, and neural networks.
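A minimal sketch in TensorFlow 2.x, where tf.function traces Python code into such a graph (the numbers are purely illustrative):

import tensorflow as tf

@tf.function  # traces this Python function into a computational graph
def affine(x):
    return tf.matmul(x, tf.constant([[2.0], [1.0]])) + 1.0

print(affine(tf.constant([[1.0, 3.0]])))  # tf.Tensor([[6.]], shape=(1, 1), dtype=float32)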

This machine learning library demands extra effort to learn, but once you have grasped Python’s frameworks and libraries, working with the framework’s n-dimensional arrays becomes much easier.

The key benefit of this framework is flexibility. TensorFlow runs on GPUs, CPUs, servers, desktops, and mobile devices, and it provides automatic differentiation and strong performance; note, though, that migration to newer versions is not automatic. Giants like Airbus, Twitter, and IBM have used TensorFlow innovatively.

#2 FireBase ML Kit
The Firebase ML Kit is a library that provides highly accurate, pre-trained deep models with minimal, effortless code. We at Space-O Technologies use this machine learning technology for image classification and object detection. The Firebase framework offers models both on-device and on the Google Cloud.

Here is one of our ML tutorials to help you understand the Firebase framework. First, we collected photos of an empty glass, a half-full glass, and a full glass, and fed them into the machine learning algorithms. This let the machine search and analyze according to the nature, behavior, and patterns of the object placed in front of it.

The first photo we targeted through the machine learning algorithms was an empty glass. The app could analyze it and arrive at the correct answer because we had provided it with a set of empty-glass images prior to the experiment.
The next photo was a half-full glass. The core of a machine learning app is to assemble data and manage it through analysis; the app recognized the image accurately thanks to the glass images given to it beforehand.
The last one was a full-glass recognition image.
Note: for correct recognition, each label must carry at least 100 images of the particular object.

#3 CAFFE (Convolutional Architecture for Fast Feature Embedding)
The CAFFE framework is one of the fastest ways to apply deep neural networks. It is best known for its Model Zoo, a collection of pre-trained ML models capable of performing a great variety of tasks. Image classification, machine vision, and recommender systems are some of the tasks this ML library handles easily.

This framework is mostly written in C++. It can run on a range of hardware and can switch between CPU and GPU with a single flag, and it provides well-organized Matlab and Python interfaces.

If you are building a machine learning app, CAFFE is mainly used in academic research projects and for designing startup prototypes. It is well suited to both research experiments and industry deployment: the framework can process 60 million images per day on a single Nvidia K40 GPU.

#4 Apache Spark
Apache Spark is a cluster-computing framework with APIs in several languages, including Java, Scala, R, and Python. Spark’s machine learning library, MLlib, is considered foundational to Spark’s success. Building MLlib on top of Spark makes it possible to tackle many distinct needs with a single tool instead of many disjointed ones.

The advantages of this ML library include lower learning curves and less complex development and production environments, which ultimately result in a shorter time to deliver high-performing models. The key benefit of MLlib is that it lets data scientists solve multiple data problems in addition to their machine learning problems.

It can easily handle graph computations (via GraphX), streaming (real-time calculations), and real-time interactive query processing with Spark SQL and DataFrames. Data professionals can focus on solving data problems instead of learning and maintaining a different tool for each scenario.

#5 Scikit-Learn
Scikit-learn is said to be one of the greatest feats of the Python community. This machine learning framework efficiently handles data mining and supports many practical tasks. It is built on foundations like SciPy, NumPy, and matplotlib, and it is known for its supervised and unsupervised learning algorithms as well as cross-validation. Scikit-learn is largely written in Python, with some core algorithms in Cython for performance.

The framework can work on multiple tasks without compromising on speed. Some remarkable products and organizations, such as Spotify, Evernote, AWeber, and Inria, use it.

With the help of machine learning, building ML-powered iOS and Android apps has become quite an easy process. With this emerging technology trend, data is plentiful, computational processing is cheaper and more powerful, and data storage is affordable. So if you are an app developer, or have an idea for a machine learning app, you should definitely dive into this niche.

Still have questions about ML frameworks, machine learning app development, the difference between Artificial Intelligence and machine learning, ML algorithms from scratch, or how this technology can help your business? Just fill in our contact form. Our sales representatives will get back to you shortly and resolve your queries. The consultation is absolutely free of cost.

Author Bio: This blog was written with the help of Jigar Mistry, who has over 13 years of experience in the web and mobile app development industry. He has guided the development of over 200 mobile apps and has special expertise in categories like Uber-like apps, health and fitness apps, on-demand apps, and machine learning apps. So we took his help to write this complete guide on machine learning technology and machine learning app development.

Introduction to Machine Learning with TensorFlow.js

Learn how to build and train Neural Networks using the most popular Machine Learning framework for javascript, TensorFlow.js.

This is a practical workshop where you'll learn "hands-on" by building several different applications from scratch using TensorFlow.js.

If you have ever been interested in Machine Learning, if you want to get a taste for what this exciting field has to offer, if you want to be able to talk to other Machine Learning/AI specialists in a language they understand, then this workshop is for you.

Thanks for reading

If you liked this post, share it with all of your programming buddies!

Further reading about Machine Learning and TensorFlow.js

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning In Node.js With TensorFlow.js

Machine Learning in JavaScript with TensorFlow.js

A Complete Machine Learning Project Walk-Through in Python

Top 10 Machine Learning Algorithms You Should Know to Become a Data Scientist

TensorFlow Vs PyTorch: Comparison of the Machine Learning Libraries

Libraries play an important role when developers decide to work on Machine Learning or Deep Learning research. In this article, we list 10 comparisons between TensorFlow and PyTorch, two popular Machine Learning libraries.

According to a survey of 1,616 ML developers and data scientists, for every one developer using PyTorch, there are 3.4 developers using TensorFlow. Below are 10 comparisons between these two Machine Learning libraries.

1 - Origin

PyTorch was developed by Facebook and is based on Torch, while TensorFlow, an open-source Machine Learning library developed by Google Brain, is based on the idea of dataflow graphs for building models.

2 - Features

TensorFlow has some attractive features, such as TensorBoard, a great option for visualising a Machine Learning model, and TensorFlow Serving, a gRPC server used when deploying models in production. PyTorch has several distinguishing features too, such as dynamic computation graphs, native support for Python, and support for CUDA, which reduces running time and increases performance.

3 - Community

TensorFlow has been adopted by many researchers across fields such as academia and business organisations. It has a much bigger community than PyTorch, which means it is easier to find resources or solutions. There is a vast amount of tutorials, code, and support for TensorFlow; PyTorch, being the newcomer, lacks some of these benefits.

4 - Visualisation

Visualisation plays a central role when presenting any project in an organisation. TensorFlow has TensorBoard for visualising Machine Learning models, which helps during training and lets you spot errors quickly. It is a real-time representation of a model’s graphs that depicts not only the graph structure but also accuracy curves as they evolve. PyTorch lacks a comparable built-in tool.

5 - Defining Computational Graphs

In TensorFlow, defining a computational graph is a lengthy process, as you have to build and run the computations within sessions and use constructs such as placeholders and variable scoping. PyTorch wins this point with its dynamic computation graphs: the graph is built at every point of execution, and you can manipulate it at run time.
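A minimal sketch of PyTorch’s dynamic graphs (illustrative, not from the article): the graph is assembled as each operation executes, so it can be inspected and changed on the fly.

import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x * x).sum()   # the graph is built on the fly as this line runs
y.backward()        # backpropagate through the graph just constructed
print(x.grad)       # tensor([2., 4.])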

6 - Debugging

Because PyTorch builds its computation graphs dynamically, debugging is a painless process: you can use standard Python debugging tools like pdb or ipdb. For instance, you can put pdb.set_trace() at any line of code, then step through further computations and pinpoint the cause of errors. For TensorFlow, you have to use the dedicated TensorFlow debugger, tfdbg, which lets you view the internal structure and state of running TensorFlow graphs during training and inference.
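For instance, a minimal sketch with a hypothetical function you want to inspect:

import pdb

def forward(x):
    pdb.set_trace()  # execution pauses here; inspect x and step onward
    return x * 2

forward(21)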

7 - Deployment

For now, deployment in TensorFlow is much better supported than in PyTorch. TensorFlow has the advantage of TensorFlow Serving, a flexible, high-performance serving system for deploying Machine Learning models, designed for production environments. In PyTorch, you can instead deploy models with Flask, the Python microframework.

8 - Documentation

Documentation for both frameworks is broadly available, with examples and tutorials in abundance for both libraries. You could say it is a tie between the two.

See the official TensorFlow and PyTorch documentation for details.

9 - Serialisation

Serialisation in TensorFlow is one of its advantages: you can save your entire graph as a protocol buffer and later load it in other supported languages. PyTorch lacks this feature.

10 - Device Management

By default, TensorFlow maps nearly all the GPU memory of every GPU visible to the process, which is a drawback; on the other hand, its well-set defaults mean it automatically assumes you want to run on the GPU, resulting in fair device management out of the box. PyTorch instead keeps track of the currently selected GPU, and all CUDA tensors you create are allocated on it.