Build a Deep Learning model in 15 minutes

Basic python knowledge is all that is required to get started. You can spend the rest of the year exploring the intricacies of machine learning, if your machine refuses to learn.

  • Estimators from popular packages are supported: Keras (TensorFlow/Theano/CNTK), XGBoost and SciKit Learn. They can all be subclassed with build, fit or predict overridden to completely customize your algorithm and architecture, while still benefiting from everything else.
  • Pipelines avoid information leaks between train and test sets, and one pipeline allows experimentation with many different estimators. A disk-based pipeline is available if you exceed your machine's available RAM.
  • Transformers standardize advanced feature engineering. For example, convert an American first name to its statistical age or gender using US Census data. Extract the geographic area code from a free-form phone number string. Common date, time and string operations are supported efficiently through pandas.
  • Encoders offer robust input to your estimators, and avoid common problems with missing and long-tail values. They are well tested to save you from garbage in/garbage out.
  • IO connections are configured and pooled in a standard way across the app for popular (no)sql databases, with transaction management and read/write optimizations for bulk data, rather than typical ORM single-row operations. Connections share a configurable query cache, in addition to encrypted S3 buckets for distributing models and datasets.
  • Dependency Management for each individual app in development, that can be 100% replicated to production. No manual activation, or magic env vars, or hidden files that break python for everything else. No knowledge required of venv, pyenv, pyvenv, virtualenv, virtualenvwrapper, pipenv, conda. Ain’t nobody got time for that.
  • Tests for your models can be run in your Continuous Integration environment, allowing Continuous Deployment for code and training updates, without increased work for your infrastructure team.
  • Workflow Support whether you prefer the command line, a python console, jupyter notebook, or IDE. Every environment gets readable logging and timing statements configured for both production and development.
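To make the Transformer idea concrete, here is a hypothetical minimal area-code extractor in plain Python. This is an illustration only, not Lore's actual Transformer API; the function name and normalization rules are assumptions:

```python
import re

def area_code(phone):
    # Hypothetical helper, not Lore's actual API: pull the 3-digit
    # area code out of a free-form US phone number string.
    digits = re.sub(r'\D', '', phone)            # keep only the digits
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]                      # drop the US country code
    return digits[:3] if len(digits) == 10 else None
```

For example, `area_code('(415) 555-2671')` returns `'415'`, and strings that don't contain a 10-digit US number yield `None`.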

15 minute Outline


  1. Create a new app (3 min)
  2. Design a model (1 min)
  3. Generate a scaffold (2 min)
  4. Implement a pipeline (5 min)
  5. Test the code (1 min)
  6. Train the model (1 min)
  7. Deploy to production (2 min)

1) Create a new app

Lore manages each project’s dependencies independently, to avoid conflicts with your system python or other projects. Install Lore as a standard pip package:

install.sh

# On Linux
$ pip install lore
# On OS X use homebrew python 2 or 3
$ brew install python3 && pip3 install lore

It’s difficult to repeat someone else’s work when you can’t reproduce their environment. Lore preserves your system python the way your OS likes it to prevent cryptic dependency errors and project conflicts. Each Lore app gets its own directory, with its own python installation and only the dependencies it needs locked to specified versions in runtime.txt and requirements.txt. This makes sharing Lore apps efficient, and brings us one step closer to trivial repeatability for machine learning projects.

With Lore installed, you can create a new app for deep learning projects while you read on. Lore is modular and slim by default, so we’ll need to specify --keras to install deep learning dependencies for this project.

init.sh

$ lore init my_app --python-version=3.6.4 --keras

2) Design a model

For the demo we’re going to build a model to predict how popular a product will be on Instacart’s website based solely on its name and the department we put it in. Manufacturers around the world test product names with various focus groups while retailers optimize their placement in stores to maximize appeal. Our simple AI will provide the same service so retailers and manufacturers can better understand merchandising in our new marketplace.

“What’s in a name? That which we call a banana, by any other name would smell as sweet.”
One of the hardest parts of machine learning is acquiring good data. Fortunately, Instacart has published 3 million anonymized grocery orders, which we’ll repurpose for this task. We can then formulate our question into a supervised learning regression model that predicts annual sales based on 2 features: product name and department.

Note that the model we will build is just for illustration purposes — in fact, it kind of sucks. We leave building a good model as an exercise to the curious reader.

3) Generate a scaffold

generate.sh

$ cd my_app
$ lore generate scaffold product_popularity --keras --regression --holdout

Every lore Model consists of a Pipeline to load and encode the data, and an Estimator that implements a particular machine learning algorithm. The interesting part of a model is in the implementation details of the generated classes.

Pipelines start with raw data on the left side, and encode it into the desired form on the right. The estimator is then trained with the encoded data, early stopping on the validation set, and evaluated on the test set. Everything can be serialized to the model store, and loaded again for deployment with a one liner.
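The split-then-train flow can be sketched in a few lines of plain Python. This is a simplified illustration of the holdout idea, not Lore's implementation; the function name and fractions are assumptions:

```python
import random

def holdout_split(rows, validation=0.1, test=0.1, seed=42):
    # Shuffle deterministically, then carve off validation and test sets.
    # lore.pipelines.holdout.Base manages this (plus encoding) for you.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test)
    n_validation = int(len(rows) * validation)
    return (rows[n_test + n_validation:],        # train
            rows[n_test:n_test + n_validation],  # validation
            rows[:n_test])                       # test
```

The fixed seed matters: a reproducible split is what lets early stopping on the validation set and evaluation on the test set stay honest between runs.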

4) Implement a Pipeline

It’s rare to be handed raw data that is well suited for a machine learning algorithm. Usually we load it from a database or download a CSV, encode it suitably for the algorithm, and split it into training and test sets. The base classes in lore.pipelines encapsulate this logic in a standard workflow.

lore.pipelines.holdout.Base will split our data into training, validation and test sets, and encode those for our machine learning algorithm. Our subclass will be responsible for defining 3 methods: get_data, get_encoders, and get_output_encoder.

The published data from Instacart is spread across multiple csv files, like database tables.

Our pipeline’s get_data will download the raw Instacart data, and use pandas to join it into a DataFrame with the features (product_name, department) and response (sales) in total units. Here is the implementation of get_data:

my_app.pipelines.product_popularity.py

# my_app/pipelines/product_popularity.py part 1

import os

from lore.encoders import Token, Unique, Norm
import lore.io
import lore.pipelines
import lore.env

import pandas


class Holdout(lore.pipelines.holdout.Base):
    # You can inspect the source data csv's yourself from the command line with:
    # $ wget https://s3.amazonaws.com/instacart-datasets/instacart_online_grocery_shopping_2017_05_01.tar.gz
    # $ tar -xzvf instacart_online_grocery_shopping_2017_05_01.tar.gz

    def get_data(self):
        url = 'https://s3.amazonaws.com/instacart-datasets/instacart_online_grocery_shopping_2017_05_01.tar.gz'

        # Lore will extract and cache files in lore.env.data_dir by default
        lore.io.download(url, cache=True, extract=True)
        
        # Defined to DRY up paths to 3rd party file hierarchy
        def read_csv(name):
            path = os.path.join(
                lore.env.data_dir,
                'instacart_2017_05_01',
                name + '.csv')
            return pandas.read_csv(path, encoding='utf8')
        
        # Published order data was split into irrelevant prior/train
        # sets, so we will combine them to re-purpose all the data.
        orders = read_csv('order_products__prior')
        # DataFrame.append was removed in pandas 2.0; concat works everywhere
        orders = pandas.concat([orders, read_csv('order_products__train')])

        # count how many times each product_id was ordered
        data = orders.groupby('product_id').size().to_frame('sales')

        # add product names and department ids to ordered product ids
        products = read_csv('products').set_index('product_id')
        data = data.join(products)

        # add department names to the department ids
        departments = read_csv('departments').set_index('department_id')
        data = data.set_index('department_id').join(departments)
        
        # Only return the columns we need for training
        data = data.reset_index()
        return data[['product_name', 'department', 'sales']]
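To see the shape of that groupby-and-join logic, here it is run on tiny hand-made frames (illustrative data only, standing in for the real CSVs):

```python
import pandas

orders = pandas.DataFrame({'product_id': [1, 1, 2]})
products = pandas.DataFrame({
    'product_id': [1, 2],
    'product_name': ['Banana', 'Organic Banana'],
    'department_id': [4, 4],
}).set_index('product_id')
departments = pandas.DataFrame({
    'department_id': [4],
    'department': ['produce'],
}).set_index('department_id')

# count how many times each product_id was ordered
data = orders.groupby('product_id').size().to_frame('sales')
# attach product names and department ids, then department names
data = data.join(products)
data = data.set_index('department_id').join(departments)
result = data.reset_index()[['product_name', 'department', 'sales']]
```

Here 'Banana' was ordered twice and 'Organic Banana' once, so the resulting frame has sales of 2 and 1 in the 'produce' department.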

Next, we need to specify an Encoder for each column. A computer scientist might think of encoders as a form of type annotation for effective machine learning. Some products have ridiculously long names, so we’ll truncate those to the first 15 words.

my_app.pipelines.product_popularity.get_encoders.py

# my_app/pipelines/product_popularity.py part 2

    def get_encoders(self):
        return (
            # An encoder to tokenize product names into max 15 tokens that
            # occur in the corpus at least 10 times. We also want the 
            # estimator to spend 5x as many resources on name vs department
            # since there are so many more words in english than there are
            # grocery store departments.
            Token('product_name', sequence_length=15, minimum_occurrences=10, embed_scale=5),
            # An encoder to translate department names into unique
            # identifiers that occur at least 50 times
            Unique('department', minimum_occurrences=50)
        )

    def get_output_encoder(self):
        # Sales is floating point which we could Pass encode directly to the
        # estimator, but Norm will bring it to small values around 0,
        # which are more amenable to deep learning.
        return Norm('sales')
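To make the Norm encoding concrete, here is a hypothetical minimal mean/std encoder. It is an illustrative stand-in for lore.encoders.Norm, not its actual implementation:

```python
class NormEncoder:
    # Illustrative stand-in for lore.encoders.Norm: shift and scale a
    # column to mean 0 / standard deviation 1.
    def fit(self, values):
        self.mean = sum(values) / len(values)
        variance = sum((v - self.mean) ** 2 for v in values) / len(values)
        self.std = variance ** 0.5 or 1.0   # guard against constant columns
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def reverse_transform(self, values):
        # invert the scaling so predictions come back in original units
        return [v * self.std + self.mean for v in values]
```

The reverse_transform is the important part in production: the estimator predicts in normalized space, and the encoder maps that back to unit sales.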

That’s it for the pipeline. Our beginning estimator will be a simple subclass of lore.estimators.keras.Regression which implements a classic deep learning architecture, with reasonable defaults.

# my_app/estimators/product_popularity.py

import lore.estimators.keras


class Keras(lore.estimators.keras.Regression):
    pass

Finally, our model specifies the high-level properties of our deep learning architecture by delegating them back to the estimator, and pulls its data from the pipeline we built.

# my_app/models/product_popularity.py

import lore.models.keras

import my_app.pipelines.product_popularity
import my_app.estimators.product_popularity


class Keras(lore.models.keras.Base):
    def __init__(self, pipeline=None, estimator=None):
        # honor caller-supplied overrides instead of ignoring them
        super(Keras, self).__init__(
            pipeline or my_app.pipelines.product_popularity.Holdout(),
            estimator or my_app.estimators.product_popularity.Keras(
                hidden_layers=2,
                embed_size=4,
                hidden_width=256,
                batch_size=1024,
                sequence_embedding='lstm',
            )
        )

5) Test the code

A smoke test was created automatically for this model when you generated the scaffolding. The first run will take some time to download the 200MB data set for testing. A good practice would be to trim down the files cached in ./tests/data and check them into your repo to remove a network dependency and speed up test runs.

test.sh

$ lore test tests.unit.test_product_popularity

6) Train the model

Training a model will cache data in ./data and save artifacts in ./models

fit.sh

$ lore fit my_app.models.product_popularity.Keras --test --score

Follow the logs in a second terminal to see how Lore is spending its time.

logs.sh

$ tail -f logs/development.log

Try adding more hidden layers to see if that helps with your model’s score. You can edit the model file, or pass any property directly via the command line call to fit, e.g. --hidden_layers=5. It should take about 30 seconds with a cached data set.

Inspect the model’s features

You can run jupyter notebooks in your lore env. Lore will install a custom jupyter kernel that will reference your app’s virtual env for both lore notebook and lore console.

$ lore notebook

Browse to notebooks/product_popularity/features.ipynb and “run all” to see some visualizations for the last fitting of your model.

You can see how well the model’s predictions (blue) track the test set (gold) when aggregated for a particular feature. In this case there are 21 departments with fairly good overlap, except for “produce” where the model does not fully account for how much of an outlier it is.

You can also see the deep learning architecture that was generated by running the notebook in notebooks/product_popularity/architecture.ipynb.

15 tokenized parts of the name run through an LSTM on the left side, and the department name is fed into an embedding on the right side, then both go through the hidden layers.
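That two-branch architecture can be sketched with the Keras functional API. This assumes TensorFlow 2.x is installed, and the layer sizes and vocabulary size below are illustrative assumptions, not the exact values Lore generates:

```python
from tensorflow import keras  # assumes TensorFlow 2.x is installed

name = keras.Input(shape=(15,), name='product_name')     # 15 token ids
name_path = keras.layers.Embedding(10000, 20)(name)      # assumed vocab size
name_path = keras.layers.LSTM(32)(name_path)             # left side: LSTM

department = keras.Input(shape=(1,), name='department')  # 1 department id
dept_path = keras.layers.Embedding(25, 4)(department)    # right side: embedding
dept_path = keras.layers.Flatten()(dept_path)

hidden = keras.layers.Concatenate()([name_path, dept_path])
for _ in range(2):                                       # hidden_layers=2
    hidden = keras.layers.Dense(256, activation='relu')(hidden)
sales = keras.layers.Dense(1)(hidden)                    # regression output

model = keras.Model(inputs=[name, department], outputs=sales)
```

Both branches are concatenated before the dense hidden layers, which is why the two features can interact when predicting sales.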

Serve your model

Lore apps can be run locally as an HTTP API to models. By default, models will expose their “predict” method via an HTTP GET endpoint.

predict.sh

$ lore server &
$ curl "http://localhost:5000/product_popularity.Keras/predict.json?product_name=Banana&department=produce"
$ curl "http://localhost:5000/product_popularity.Keras/predict.json?product_name=Organic%20Banana&department=produce"
$ curl "http://localhost:5000/product_popularity.Keras/predict.json?product_name=Green%20Banana&department=produce"
$ curl "http://localhost:5000/product_popularity.Keras/predict.json?product_name=Brown%20Banana&department=produce"
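If you prefer composing those URLs programmatically, the route is just the model class plus one query parameter per pipeline feature. A sketch using only the standard library, with the host and route copied from the calls above (the helper name is ours, not Lore's):

```python
from urllib.parse import urlencode

def predict_url(model, host='http://localhost:5000', **features):
    # One query parameter per pipeline feature; urlencode handles escaping
    return '%s/%s/predict.json?%s' % (host, model, urlencode(features))

url = predict_url('product_popularity.Keras',
                  product_name='Organic Banana', department='produce')
```

urlencode takes care of the %20-style escaping that the curl examples spell out by hand.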

My results indicate adding “Organic” to “Banana” will sell more than twice as much fruit in our “produce” department. “Green Banana” is predicted to sell worse than “Brown Banana”. If you really want to juice sales — wait for it — “Organic Yellow Banana”. Who knew?

7) Deploy to production

Lore apps are deployable via any infrastructure that supports Heroku buildpacks. Buildpacks install the specifications in runtime.txt and requirements.txt in a container for deploys. If you want horizontal scalability in the cloud, you can follow the getting started guide on Heroku.

You can see the results of each time you issued the lore fit command in ./models/my_app.models.product_popularity/Keras/. This directory and ./data/ are in .gitignore by default because your code can always recreate them. A simple strategy for deployment is to check in the model version you want to publish.

git.sh

$ git init .
$ git add .
$ git add -f models/my_app.models.product_popularity/Keras/1 # or your preferred fitting number to deploy
$ git commit -m "My first lore app!"

Heroku makes it easy to publish an app. Check out their getting started guide.

Here’s the TL;DR:

publish.sh

$ heroku login
$ heroku create
$ heroku config:set LORE_PROJECT=my_app
$ heroku config:set LORE_ENV=production
$ git push heroku master
$ heroku open

$ curl "`heroku info -s | grep web_url | cut -d= -f2`product_popularity.Keras/predict.json?product_name=Banana&department=produce"


Now you can replace http://localhost:5000/ with your heroku app name and access your predictions from anywhere!


Next Steps

We consider the 0.5 release a strong foundation to build up with the community to 1.0. Patch releases will avoid breaking changes, but minor versions may change functionality depending on community needs. We’ll deprecate and issue warnings to maintain a clear upgrade path for existing apps.

Here are some features we want to add before 1.0:


What would make Lore more useful for your machine learning work?


The best machine learning and deep learning libraries

Why TensorFlow, Spark MLlib, Scikit-learn, PyTorch, MXNet, and Keras shine for building and training machine learning and deep learning models. If you’re starting a new machine learning or deep learning project, you may be confused about which framework to choose...

There is a difference between a machine learning framework and a deep learning framework. Essentially, a machine learning framework covers a variety of learning methods for classification, regression, clustering, anomaly detection, and data preparation, and may or may not include neural network methods.

A deep learning or deep neural network framework covers a variety of neural network topologies with many hidden layers. Keras, MXNet, PyTorch, and TensorFlow are deep learning frameworks. Scikit-learn and Spark MLlib are machine learning frameworks. (Click any of the previous links to read my stand-alone review of the product.)

In general, deep neural network computations run much faster on a GPU (specifically an Nvidia CUDA general-purpose GPU), TPU, or FPGA, rather than on a CPU. In general, simpler machine learning methods don’t benefit from a GPU.

While you can train deep neural networks on one or more CPUs, the training tends to be slow, and by slow I’m not talking about seconds or minutes. The more neurons and layers that need to be trained, and the more data available for training, the longer it takes. When the Google Brain team trained its language translation models for the new version of Google Translate in 2016, they ran their training sessions for a week at a time, on multiple GPUs. Without GPUs, each model training experiment would have taken months.

Since then, the Intel Math Kernel Library (MKL) has made it possible to train some neural networks on CPUs in a reasonable amount of time. Meanwhile GPUs, TPUs, and FPGAs have gotten even faster.

The training speed of all of the deep learning packages running on the same GPUs is nearly identical. That’s because the training inner loops spend most of their time in the Nvidia CuDNN package.

Apart from training speed, each of the deep learning libraries has its own set of pros and cons, and the same is true of Scikit-learn and Spark MLlib. Let’s dive in.

Keras

Keras is a high-level, front-end specification and implementation for building neural network models that ships with support for three back-end deep learning frameworks: TensorFlow, CNTK, and Theano. Amazon is currently working on developing an MXNet back-end for Keras. It’s also possible to use PlaidML (an independent project) as a back-end for Keras to take advantage of PlaidML’s OpenCL support for all GPUs.

TensorFlow is the default back-end for Keras, and the one recommended for many use cases involving GPU acceleration on Nvidia hardware via CUDA and cuDNN, as well as for TPU acceleration in Google Cloud. TensorFlow also contains an internal tf.keras implementation, separate from an external Keras installation.

Keras has a high-level environment that makes adding a layer to a neural network as easy as one line of code in its Sequential model, and requires only one function call each for compiling and training a model. Keras lets you work at a lower level if you want, with its Model or functional API.
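The one-line-per-layer, one-call-to-compile workflow looks like this. A minimal sketch using the tf.keras Sequential model, assuming TensorFlow 2.x is installed; layer sizes are arbitrary:

```python
from tensorflow import keras  # assumes TensorFlow 2.x is installed

model = keras.Sequential([
    keras.Input(shape=(10,)),
    keras.layers.Dense(64, activation='relu'),  # one line per layer
    keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')     # one call to compile
# model.fit(X, y, epochs=10)                    # one call to train
```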

Keras allows you to drop down even farther, to the Python coding level, by subclassing keras.Model, but prefers the functional API when possible. Keras also has a scikit-learn API, so that you can use the Scikit-learn grid search to perform hyperparameter optimization in Keras models.

Cost: Free open source.

Platform: Linux, MacOS, Windows, or Raspbian; TensorFlow, Theano, or CNTK back-end.

MXNet

MXNet has evolved and improved quite a bit since moving under the Apache Software Foundation umbrella early in 2017. While there has been work on Keras with an MXNet back-end, a different high-level interface has become much more important: Gluon. Prior to the incorporation of Gluon, you could either write easy imperative code or fast symbolic code in MXNet, but not both at once. With Gluon, you can combine the best of both worlds, in a way that competes with both Keras and PyTorch.

The advantages claimed for Gluon include:

  • Simple, easy-to-understand code: Gluon offers a full set of plug-and-play neural network building blocks, including predefined layers, optimizers, and initializers.
  • Flexible, imperative structure: Gluon does not require the neural network model to be rigidly defined, but rather brings the training algorithm and model closer together to provide flexibility in the development process.
  • Dynamic graphs: Gluon enables developers to define neural network models that are dynamic, meaning they can be built on the fly, with any structure, and using any of Python’s native control flow.
  • High performance: Gluon provides all of the above benefits without impacting the training speed that the underlying engine provides.

These four advantages, along with a vastly expanded collection of model examples, bring Gluon/MXNet to rough parity with Keras/TensorFlow and PyTorch for ease of development and training speed. You can see code examples for each of these on the main Gluon page and repeated on the overview page for the Gluon API.

The Gluon API includes functionality for neural network layers, recurrent neural networks, loss functions, dataset methods and vision datasets, a model zoo, and a set of contributed experimental neural network methods. You can freely combine Gluon with standard MXNet and NumPy modules, for example module, autograd, and ndarray, as well as with Python control flows.

Gluon has a good selection of layers for building models, including basic layers (Dense, Dropout, etc.), convolutional layers, pooling layers, and activation layers. Each of these is a one-line call. These can be used, among other places, inside of network containers such as gluon.nn.Sequential().

Cost: Free open source.

Platform: Linux, MacOS, Windows, Docker, Raspbian, and Nvidia Jetson; Python, R, Scala, Julia, Perl, C++, and Clojure (experimental). MXNet is included in the AWS Deep Learning AMI.

PyTorch

PyTorch builds on the old Torch and the new Caffe2 framework. As you might guess from the name, PyTorch uses Python as its scripting language, and it uses an evolved Torch C/CUDA back-end. The production features of Caffe2 are being incorporated into the PyTorch project.

PyTorch is billed as “Tensors and dynamic neural networks in Python with strong GPU acceleration.” What does that mean?

Tensors are a mathematical construct used heavily in physics and engineering. A tensor of rank two is a special kind of matrix; taking the inner product of a vector with the tensor yields another vector with a new magnitude and a new direction. TensorFlow takes its name from the way tensors (of synapse weights) flow around its network model. NumPy also uses tensors, but calls them ndarrays.

GPU acceleration is a given for most modern deep neural network frameworks. A dynamic neural network is one that can change from iteration to iteration, for example allowing a PyTorch model to add and remove hidden layers during training to improve its accuracy and generality. PyTorch recreates the graph on the fly at each iteration step. In contrast, TensorFlow by default creates a single dataflow graph, optimizes the graph code for performance, and then trains the model.

While eager execution mode is a fairly new option in TensorFlow, it’s the only way PyTorch runs: API calls execute when invoked, rather than being added to a graph to be run later. That might seem like it would be less computationally efficient, but PyTorch was designed to work that way, and it is no slouch when it comes to training or prediction speed.
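A tiny example of what eager execution means in practice, assuming PyTorch is installed: each operation runs when invoked, and ordinary Python control flow can branch on intermediate values mid-computation:

```python
import torch  # assumes PyTorch is installed

x = torch.tensor([2.0], requires_grad=True)
y = x * x + 3 * x      # runs immediately: y == 10, no graph to compile
if y.item() > 5:       # branch on an intermediate value, mid-computation
    y = y * 2
y.backward()           # autograd traced the actual path taken
# x.grad is d(2(x^2 + 3x))/dx = 2(2x + 3) = 14 at x = 2
```

The branch depends on a value computed in the same forward pass, which is exactly what a static, compile-then-run dataflow graph makes awkward.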

PyTorch integrates acceleration libraries such as Intel MKL and Nvidia cuDNN and NCCL (Nvidia Collective Communications Library) to maximize speed. Its core CPU and GPU Tensor and neural network back-ends—TH (Torch), THC (Torch CUDA), THNN (Torch Neural Network), and THCUNN (Torch CUDA Neural Network)—are written as independent libraries with a C99 API. At the same time, PyTorch is not a Python binding into a monolithic C++ framework—the intention is for it to be deeply integrated with Python and to allow the use of other Python libraries.

Cost: Free open source.

Platform: Linux, MacOS, Windows; CPUs and Nvidia GPUs.

Scikit-learn

The Scikit-learn Python framework has a wide selection of robust machine learning algorithms, but no deep learning. If you’re a Python fan, Scikit-learn may well be the best option for you among the plain machine learning libraries.

Scikit-learn is a robust and well-proven machine learning library for Python with a wide assortment of well-established algorithms and integrated graphics. It is relatively easy to install, learn, and use, and it has good examples and tutorials.

On the con side, Scikit-learn does not cover deep learning or reinforcement learning, lacks graphical models and sequence prediction, and it can’t really be used from languages other than Python. It doesn’t support PyPy, the Python just-in-time compiler, or GPUs. That said, except for its minor foray into neural networks, it doesn’t really have speed problems. It uses Cython (the Python to C compiler) for functions that need to be fast, such as inner loops.

Scikit-learn has a good selection of algorithms for classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It has good documentation and examples for all of these, but lacks any kind of guided workflow for accomplishing these tasks.

Scikit-learn earns top marks for ease of development, mostly because the algorithms all work as documented, the APIs are consistent and well-designed, and there are few “impedance mismatches” between data structures. It’s a pleasure to work with a library whose features have been thoroughly fleshed out and whose bugs have been thoroughly flushed out.
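That API consistency is easy to demonstrate: two very different algorithms share the same fit/predict contract, so swapping one for the other is a one-word change. A minimal sketch on toy data:

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 2.0, 4.0, 6.0]

predictions = {}
for Estimator in (LinearRegression, DecisionTreeRegressor):
    model = Estimator().fit(X, y)  # the identical call for both algorithms
    predictions[Estimator.__name__] = float(model.predict([[1.5]])[0])
```

The same pattern extends to clustering, preprocessing, and model selection: everything is an estimator with fit, and usually predict or transform.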

On the other hand, the library does not cover deep learning or reinforcement learning, which leaves out the current hard but important problems, such as accurate image classification and reliable real-time language parsing and translation. Clearly, if you’re interested in deep learning, you should look elsewhere.

Nevertheless, there are many problems—ranging from building a prediction function linking different observations, to classifying observations, to learning the structure of an unlabeled dataset—that lend themselves to plain old machine learning without needing dozens of layers of neurons, and for those areas Scikit-learn is very good indeed.

Cost: Free open source.

Platform: Requires Python, NumPy, SciPy, and Matplotlib. Releases are available for MacOS, Linux, and Windows.

Spark MLlib

Spark MLlib, the open source machine learning library for Apache Spark, provides common machine learning algorithms such as classification, regression, clustering, and collaborative filtering (but not deep neural networks). It also includes tools for feature extraction, transformation, dimensionality reduction, and selection; tools for constructing, evaluating, and tuning machine learning pipelines; and utilities for saving and loading algorithms, models, and pipelines, for data handling, and for doing linear algebra and statistics.

Spark MLlib is written in Scala, and uses the linear algebra package Breeze. Breeze depends on netlib-java for optimized numerical processing, although in the open source distribution that means optimized use of the CPU. Databricks offers customized Spark clusters that use GPUs, which can potentially get you another 10x speed improvement for training complex machine learning models with big data.

Spark MLlib implements a truckload of common algorithms and models for classification and regression, to the point where a novice could become confused, but an expert would be likely to find a good choice of model for the data to be analyzed, eventually. To this plethora of models Spark 2.x adds the important feature of hyperparameter tuning, also known as model selection. Hyperparameter tuning allows the analyst to set up a parameter grid, an estimator, and an evaluator, and let the cross-validation method (time-consuming but accurate) or train validation split method (faster but less accurate) find the best model for the data.

Spark MLlib has full APIs for Scala and Java, mostly-full APIs for Python, and sketchy partial APIs for R. You can get a good feel for the coverage by counting the samples: 54 Java and 60 Scala machine learning examples, 52 Python machine learning examples, and only five R examples. In my experience Spark MLlib is easiest to work with using Jupyter notebooks, but you can certainly run it in a console if you tame the verbose Spark status messages.

Spark MLlib supplies pretty much anything you’d want in the way of basic machine learning, feature selection, pipelines, and persistence. It does a pretty good job with classification, regression, clustering, and filtering. Given that it is part of Spark, it has great access to databases, streams, and other data sources. On the other hand, Spark MLlib is not really set up to model and train deep neural networks in the same way as TensorFlow, PyTorch, MXNet, and Keras.

Cost: Free open source.

Platform: Spark runs on both Windows and Unix-like systems (e.g. Linux, MacOS), with Java 7 or later, Python 2.6/3.4 or later, and R 3.1 or later. For the Scala API, Spark 2.0.1 uses Scala 2.11. Spark can read from Hadoop/HDFS but does not require it; it also runs in standalone mode.

TensorFlow

TensorFlow is probably the gold standard for deep neural network development, although it is not without its defects. Two of the biggest issues with TensorFlow historically were that it was too hard to learn and that it took too much code to create a model. Both issues have been addressed over the last few years.

To make TensorFlow easier to learn, the TensorFlow team has produced more learning materials as well as clarifying the existing “getting started” tutorials. A number of third parties have produced their own tutorial materials (including InfoWorld). There are now multiple TensorFlow books in print, and several online TensorFlow courses. You can even follow the CS20 course at Stanford, TensorFlow for Deep Learning Research, which posts all the slides and lecture notes online.

There are several new sections of the TensorFlow library that offer interfaces that require less programming to create and train models. These include tf.keras, which provides a TensorFlow-only version of the otherwise engine-neutral Keras package, and tf.estimator, which provides a number of high-level facilities for working with models, including both regressors and classifiers for linear models, deep neural networks, and combined linear and deep neural networks, plus a base class from which you can build your own estimators. In addition, the Dataset API enables you to build complex input pipelines from simple, reusable pieces. You don’t have to choose just one. As this tutorial shows, you can usefully make tf.keras, tf.data.Dataset, and tf.estimator work together.
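A minimal sketch of how two of these pieces can work together: tf.keras defines and trains the model, while tf.data.Dataset builds the input pipeline. The layer sizes and the random data are purely illustrative.

```python
import numpy as np
import tensorflow as tf

# tf.keras: define and compile a small classifier
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# tf.data.Dataset: build an input pipeline from simple, reusable pieces
features = np.random.rand(32, 4).astype("float32")
labels = np.random.randint(0, 3, size=32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(8)

model.fit(dataset, epochs=1, verbose=0)
```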

TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices, which enables on-device machine learning inference (but not training) with low latency and a small binary size. TensorFlow Lite also supports hardware acceleration with the Android Neural Networks API. TensorFlow Lite models are small enough to run on mobile devices, and can serve the offline use case.

The basic idea of TensorFlow Lite is that you train a full-blown TensorFlow model and convert it to the TensorFlow Lite model format. Then you can use the converted file in your mobile application on Android or iOS.
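The conversion step can be sketched roughly as follows; the model here is a trivial stand-in, whereas a real application would convert a fully trained model.

```python
import tensorflow as tf

# A trivial stand-in for a fully trained TensorFlow model
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])

# Convert the model to the TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# The resulting bytes are what you would bundle into an Android or iOS app
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```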

Alternatively, you can use one of the pre-trained TensorFlow Lite models for image classification or smart replies. Smart replies are contextually relevant messages that can be offered as response options; this essentially provides the same reply prediction functionality as found in Google’s Gmail clients.

Yet another option is to retrain an existing TensorFlow model against a new tagged dataset, an important technique called transfer learning, which reduces training times significantly. A hands-on tutorial on this process is called TensorFlow for Poets.

Cost: Free open source.

Platform: Ubuntu 14.04 or later, MacOS 10.11 or later, Windows 7 or later; Nvidia GPU and CUDA recommended. Most clouds now support TensorFlow with Nvidia GPUs. TensorFlow Lite runs trained models on Android and iOS.

Machine learning or deep learning?

Sometimes you know that you’ll need a deep neural network to solve a particular problem effectively, for example to classify images, recognize speech, or translate languages. Other times, you don’t know whether that’s necessary, for example to predict next month’s sales figures or to detect outliers in your data.

If you do need a deep neural network, then Keras, MXNet with Gluon, PyTorch, and TensorFlow with Keras or Estimators are all good choices. If you aren’t sure, then start with Scikit-learn or Spark MLlib and try all the relevant algorithms. If you get satisfactory results from the best model or an ensemble of several models, you can stop.

If you need better results, then try to perform transfer learning on a trained deep neural network. If you still don’t get what you need, then try building and training a deep neural network from scratch. To refine your model, try hyperparameter tuning.

No matter what method you use to train a model, remember that the model is only as good as the data you use for training. Remember to clean it, to standardize it, and to balance the sizes of your training classes.

Deep Learning vs. Conventional Machine Learning

Over the past few years, deep learning has given rise to a massive collection of ideas and techniques which are disruptive to conventional machine learning practices. However, are those ideas totally different from the traditional methods? Where are the connections and differences? What are the advantages and disadvantages? How practical are the deep learning methods for business applications? Chao will share her thoughts on those questions based on her readings and hands on experiments in the areas of text analytics (question answering system, sentiment analysis) and healthcare image classification.

Thanks for reading

If you liked this post, share it with all of your programming buddies!

Follow us on Facebook | Twitter

Further reading about Deep Learning and Machine Learning

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning for Front-End Developers

The Future of Machine Learning and JavaScript

Deep Learning With TensorFlow 2.0

How to get started with Python for Deep Learning and Data Science

Machine Learning vs Deep Learning: What's the Difference?

This post talks about the differences and relationships between artificial intelligence (AI), machine learning (ML), and deep learning (DL).

**In this tutorial, you will learn:**

  • What is Artificial intelligence (AI)?
  • What is Machine Learning (ML)?
  • What is Deep Learning (DL)?
  • Machine Learning Process
  • Deep Learning Process
  • Automate Feature Extraction using DL
  • Difference between Machine Learning and Deep Learning
  • When to use ML or DL?
What is AI?

Artificial intelligence means imparting a cognitive ability to a machine. The benchmark for **AI** is human intelligence with regard to reasoning, speech, and vision. That benchmark is still far off in the future.

AI has three different levels:

  1. Narrow AI: An artificial intelligence is said to be narrow when the machine can perform a specific task better than a human. Current AI research is at this stage
  2. General AI: An artificial intelligence reaches the general state when it can perform any intellectual task with the same accuracy level as a human would
  3. Active AI: An AI is active when it can beat humans in many tasks

Early AI systems used pattern matching and expert systems.

What is ML?

Machine learning is the best tool so far to analyze, understand, and identify patterns in data. One of the main ideas behind machine learning is that the computer can be trained to automate tasks that would be exhausting or impossible for a human being. The clear break from traditional analysis is that machine learning can make decisions with minimal human intervention.

Machine learning uses data to feed an algorithm that can understand the relationship between the input and the output. Once the machine has finished learning, it can predict the value or the class of a new data point.
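This loop of learning a relationship and then predicting a new data point can be illustrated with a toy regression (the numbers are made up to follow y = 2x):

```python
from sklearn.linear_model import LinearRegression

# Toy data: the output is exactly twice the input
X = [[1], [2], [3], [4]]
y = [2, 4, 6, 8]

# The algorithm learns the input-output relationship...
model = LinearRegression().fit(X, y)

# ...and then predicts the value of a new data point
print(model.predict([[5]]))  # close to 10
```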

What is Deep Learning?

Deep learning is computer software that mimics the network of neurons in a brain. It is a subset of machine learning, called deep learning because it makes use of deep neural networks. The machine uses different layers to learn from the data, and the depth of the model is represented by the number of layers. Deep learning is the new state of the art in terms of AI. In deep learning, the learning phase is done through a neural network: an architecture where the layers are stacked on top of each other.

Machine Learning Process

Imagine you are asked to build a program that recognizes objects. To train the model, you will use a classifier. A classifier uses the features of an object to try to identify the class it belongs to.

In the example, the classifier will be trained to detect which class of object appears in an image.

These object classes are what the classifier has to recognize. To construct a classifier, you need some data as input, with a label assigned to each example. The algorithm takes these data, finds a pattern, and then classifies new examples into the corresponding class.

This task is called supervised learning. In supervised learning, the training data you feed to the algorithm includes a label.

Training an algorithm requires following a few standard steps:

  • Collect and choose the data (the features)
  • Choose an algorithm and train the classifier
  • Use the trained model to make predictions

The first step is essential: choosing the right data will make the algorithm a success or a failure. The data you choose to train the model are called features. In the object example, the features are the pixels of the images.

Each image is a row in the data while each pixel is a column. If your images are 28x28 pixels, the dataset contains 784 columns (28x28). Each picture is transformed into a feature vector, and the label tells the computer what object is in the image.
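The flattening step can be sketched in a few lines (the pixel values here are random placeholders for a real image):

```python
import numpy as np

# A 28x28 grayscale image: 2-D grid of pixel intensities
image = np.random.rand(28, 28)

# Flatten it into one row of 784 feature columns (28 * 28)
features = image.reshape(-1)
print(features.shape)  # (784,)
```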

The objective is to use these training data to classify the type of object. The first step consists of creating the feature columns. Then, the second step involves choosing an algorithm to train the model. When the training is done, the model will predict what picture corresponds to what object.

After that, it is easy to use the model to predict new images. For each new image fed into the model, the machine will predict the class it belongs to. For example, suppose an entirely new image without a label goes through the model. For a human being, it is trivial to recognize the image as a car; the machine uses its previous knowledge to predict that the image is a car as well.

Deep Learning Process

In deep learning, the learning phase is done through a neural network. A neural network is an architecture where the layers are stacked on top of each other.

Consider the same image example above. The training set would be fed to a neural network.

Each input goes into a neuron and is multiplied by a weight. The result of the multiplication flows to the next layer and becomes its input. This process is repeated for each layer of the network. The final layer is named the output layer; it provides an actual value for a regression task and a probability for each class for a classification task. The neural network uses a mathematical algorithm to update the weights of all the neurons. The neural network is fully trained when the values of the weights give outputs close to reality. For instance, a well-trained neural network can recognize the object in a picture with higher accuracy than a traditional machine learning model.
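The forward pass described above can be sketched in plain NumPy; the layer sizes and random weights are purely illustrative, and a real network would also train these weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)                       # input features
W1 = rng.random((4, 8))                 # weights into the hidden layer
W2 = rng.random((8, 3))                 # weights into the output layer

# Each input is multiplied by a weight; the result flows to the next layer
hidden = np.maximum(0, x @ W1)          # hidden layer with ReLU activation

# The output layer gives a probability for each class (softmax)
logits = hidden @ W2
probs = np.exp(logits) / np.exp(logits).sum()
print(probs)  # three class probabilities summing to 1
```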

Automate Feature Extraction using DL

A dataset can contain from a dozen to hundreds of features. The system will learn from the relevance of these features. However, not all features are meaningful for the algorithm. A crucial part of machine learning is finding a relevant set of features to make the system learn something.

One way to perform this part in machine learning is to use feature extraction. Feature extraction combines existing features to create a more relevant set of features. It can be done with PCA, t-SNE, or any other dimensionality reduction algorithm.
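For example, PCA can reduce a dataset's original features to a smaller combined set in two lines (the Iris dataset and the target of two components are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)                 # 150 rows, 4 original features
X_reduced = PCA(n_components=2).fit_transform(X)  # combine them into 2 features
print(X_reduced.shape)  # (150, 2)
```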

For example, in image processing, the practitioner needs to extract features from the image manually, such as the eyes, the nose, the lips, and so on. Those extracted features are fed to the classification model.

Deep learning solves this issue, especially with convolutional neural networks. The first layer of a neural network learns small details from the picture; the next layers combine the previous knowledge to build more complex information. In a convolutional neural network, feature extraction is done with filters. The network applies a filter to the picture to see if there is a match, i.e., whether the shape of the filter matches a part of the image. If there is a match, the network uses this filter. The process of feature extraction is therefore done automatically.
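A toy sketch of how a filter "matches" part of an image: sliding a small vertical-edge filter over a tiny synthetic image produces a large response exactly where the filter's shape matches the underlying region (a trained network would learn such filters rather than hand-code them).

```python
import numpy as np

# Tiny synthetic image with a vertical edge down the middle
image = np.zeros((5, 5))
image[:, 2:] = 1.0

# Hand-coded 3x3 vertical-edge filter (illustrative only)
kernel = np.array([[-1, 0, 1]] * 3)

# Slide the filter over the image and record the response at each position
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = (image[i:i + 3, j:j + 3] * kernel).sum()

print(out)  # large values where the filter overlaps the edge, zero elsewhere
```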

Difference between Machine Learning and Deep Learning

When to use ML or DL?

We can summarize the differences between machine learning and deep learning as follows.

With machine learning, you need less data to train the algorithm than with deep learning, which requires an extensive and diverse dataset to identify the underlying structure. Machine learning also provides a faster-trained model; the most advanced deep learning architectures can take from days to a week to train. The advantage of deep learning over machine learning is that it is highly accurate, and you do not need to understand which features best represent the data; the neural network learns to select the critical features. In machine learning, you need to choose for yourself which features to include in the model.

Summary

Artificial intelligence is imparting a cognitive ability to a machine. Early AI systems used pattern matching and expert systems.

The idea behind machine learning is that the machine can learn without human intervention. The machine needs to find a way to learn how to solve a task given the data.

Deep learning is the breakthrough in the field of artificial intelligence. When there is enough data to train on, **deep learning** achieves impressive results, especially for image recognition and text translation. The main reason is that feature extraction is done automatically in the different layers of the network.