Royce  Reinger

Royce Reinger


High-level Library to Help with Training and Evaluating Neural Network


Ignite is a high-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.

PyTorch-Ignite teaser 

Click on the image to see complete code


Less code than pure PyTorch while ensuring maximum control and simplicity

Library approach and no program's control inversion - Use ignite where and when you need

Extensible API for metrics, experiment managers, and other components

Table of Contents

Why Ignite?

Ignite is a library that provides three high-level features:

  • Extremely simple engine and event system
  • Out-of-the-box metrics to easily evaluate models
  • Built-in handlers to compose training pipeline, save artifacts and log parameters and metrics

Simplified training and validation loop

No more coding for/while loops on epochs and iterations. Users instantiate engines and run them.


from ignite.engine import Engine, Events, create_supervised_evaluator
from ignite.metrics import Accuracy

# Setup training engine:
def train_step(engine, batch):
    # Users can do whatever they need on a single iteration
    # Eg. forward/backward pass for any number of models, optimizers, etc
    # ...

trainer = Engine(train_step)

# Setup single model evaluation engine
evaluator = create_supervised_evaluator(model, metrics={"accuracy": Accuracy()})

def validation():
    state =
    # print computed metrics
    print(trainer.state.epoch, state.metrics)

# Run model's validation at the end of each epoch
trainer.add_event_handler(Events.EPOCH_COMPLETED, validation)

# Start the training, max_epochs=100)

Power of Events & Handlers

The cool thing with handlers is that they offer unparalleled flexibility (compared to, for example, callbacks). Handlers can be any function: e.g. lambda, simple function, class method, etc. Thus, we do not require to inherit from an interface and override its abstract methods which could unnecessarily bulk up your code and its complexity.

Execute any number of functions whenever you wish


trainer.add_event_handler(Events.STARTED, lambda _: print("Start training"))

# attach handler with args, kwargs
mydata = [1, 2, 3, 4]
logger = ...

def on_training_ended(data):
    print(f"Training is ended. mydata={data}")
    # User can use variables from another scope"Training is ended")

trainer.add_event_handler(Events.COMPLETED, on_training_ended, mydata)
# call any number of functions on a single event
trainer.add_event_handler(Events.COMPLETED, lambda engine: print(engine.state.times))

def log_something(engine):

Built-in events filtering


# run the validation every 5 epochs
def run_validation():
    # run validation

# change some training variable once on 20th epoch
def change_training_variable():
    # ...

# Trigger handler with customly defined frequency
def log_gradients():
    # ...

Stack events to share some actions


Events can be stacked together to enable multiple calls:

@trainer.on(Events.COMPLETED | Events.EPOCH_COMPLETED(every=10))
def run_validation():
    # ...

Custom events to go beyond standard events


Custom events related to backward and optimizer step calls:

from ignite.engine import EventEnum

class BackpropEvents(EventEnum):
    BACKWARD_STARTED = 'backward_started'
    BACKWARD_COMPLETED = 'backward_completed'
    OPTIM_STEP_COMPLETED = 'optim_step_completed'

def update(engine, batch):
    # ...
    loss = criterion(y_pred, y)
    # ...

trainer = Engine(update)

def function_before_backprop(engine):
    # ...

Out-of-the-box metrics

Metrics for various tasks: Precision, Recall, Accuracy, Confusion Matrix, IoU etc, ~20 regression metrics.

Users can also compose their metrics with ease from existing ones using arithmetic operations or torch methods.


precision = Precision(average=False)
recall = Recall(average=False)
F1_per_class = (precision * recall * 2 / (precision + recall))
F1_mean = F1_per_class.mean()  # torch mean method
F1_mean.attach(engine, "F1")


From pip:

pip install pytorch-ignite

From conda:

conda install ignite -c pytorch

From source:

pip install git+

Nightly releases

From pip:

pip install --pre pytorch-ignite

From conda (this suggests to install pytorch nightly release instead of stable version as dependency):

conda install ignite -c pytorch-nightly

Docker Images

Using pre-built images

Pull a pre-built docker image from our Docker Hub and run it with docker v19.03+.

docker run --gpus all -it -v $PWD:/workspace/project --network=host --shm-size 16G pytorchignite/base:latest /bin/bash

List of available pre-built images


  • pytorchignite/base:latest
  • pytorchignite/apex:latest
  • pytorchignite/hvd-base:latest
  • pytorchignite/hvd-apex:latest
  • pytorchignite/msdp-apex:latest


  • pytorchignite/vision:latest
  • pytorchignite/hvd-vision:latest
  • pytorchignite/apex-vision:latest
  • pytorchignite/hvd-apex-vision:latest
  • pytorchignite/msdp-apex-vision:latest


  • pytorchignite/nlp:latest
  • pytorchignite/hvd-nlp:latest
  • pytorchignite/apex-nlp:latest
  • pytorchignite/hvd-apex-nlp:latest
  • pytorchignite/msdp-apex-nlp:latest

For more details, see here.

Getting Started

Few pointers to get you started:


Additional Materials



Reproducible Training Examples

Inspired by torchvision/references, we provide several reproducible baselines for vision tasks:

  • ImageNet - logs on Ignite Trains server coming soon ...
  • Pascal VOC2012 - logs on Ignite Trains server coming soon ...


Code-Generator application

The easiest way to create your training scripts with PyTorch-Ignite:


GitHub issues: questions, bug reports, feature requests, etc.

Discuss.PyTorch, category "Ignite".

PyTorch-Ignite Discord Server: to chat with the community

GitHub Discussions: general library-related discussions, ideas, Q&A, etc.

User feedback

We have created a form for "user feedback". We appreciate any type of feedback, and this is how we would like to see our community:

  • If you like the project and want to say thanks, this the right place.
  • If you do not like something, please, share it with us, and we can see how to improve it.

Thank you!


Please see the contribution guidelines for more information.

As always, PRs are welcome :)

Projects using Ignite

Research papersBlog articles, tutorials, booksToolkitsOthers

See other projects at "Used by"

If your project implements a paper, represents other use-cases not covered in our official tutorials, Kaggle competition's code, or just your code presents interesting results and uses Ignite. We would like to add your project to this list, so please send a PR with brief description of the project.

Citing Ignite

If you use PyTorch-Ignite in a scientific publication, we would appreciate citations to our project.

  author = {V. Fomin and J. Anmol and S. Desroziers and J. Kriss and A. Tejani},
  title = {High-level library to help with training neural networks in PyTorch},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{}},

About the team & Disclaimer

PyTorch-Ignite is a NumFOCUS Affiliated Project, operated and maintained by volunteers in the PyTorch community in their capacities as individuals (and not as representatives of their employers). See the "About us" page for a list of core contributors. For usage questions and issues, please see the various channels here. For all other questions and inquiries, please send an email to

Download Details: 
Author: Pytorch
Source Code: 
License: BSD-3-Clause License

#python  #machine-learning #deep-learning 

What is GEEK

Buddha Community

High-level Library to Help with Training and Evaluating Neural Network
Mckenzie  Osiki

Mckenzie Osiki


No Code introduction to Neural Networks

The simple architecture explained

Neural networks have been around for a long time, being developed in the 1960s as a way to simulate neural activity for the development of artificial intelligence systems. However, since then they have developed into a useful analytical tool often used in replace of, or in conjunction with, standard statistical models such as regression or classification as they can be used to predict or more a specific output. The main difference, and advantage, in this regard is that neural networks make no initial assumptions as to the form of the relationship or distribution that underlies the data, meaning they can be more flexible and capture non-standard and non-linear relationships between input and output variables, making them incredibly valuable in todays data rich environment.

In this sense, their use has took over the past decade or so, with the fall in costs and increase in ability of general computing power, the rise of large datasets allowing these models to be trained, and the development of frameworks such as TensforFlow and Keras that have allowed people with sufficient hardware (in some cases this is no longer even an requirement through cloud computing), the correct data and an understanding of a given coding language to implement them. This article therefore seeks to be provide a no code introduction to their architecture and how they work so that their implementation and benefits can be better understood.

Firstly, the way these models work is that there is an input layer, one or more hidden layers and an output layer, each of which are connected by layers of synaptic weights¹. The input layer (X) is used to take in scaled values of the input, usually within a standardised range of 0–1. The hidden layers (Z) are then used to define the relationship between the input and output using weights and activation functions. The output layer (Y) then transforms the results from the hidden layers into the predicted values, often also scaled to be within 0–1. The synaptic weights (W) connecting these layers are used in model training to determine the weights assigned to each input and prediction in order to get the best model fit. Visually, this is represented as:

#machine-learning #python #neural-networks #tensorflow #neural-network-algorithm #no code introduction to neural networks

Marlon  Boyle

Marlon Boyle


Autonomous Driving Network (ADN) On Its Way

Talking about inspiration in the networking industry, nothing more than Autonomous Driving Network (ADN). You may hear about this and wondering what this is about, and does it have anything to do with autonomous driving vehicles? Your guess is right; the ADN concept is derived from or inspired by the rapid development of the autonomous driving car in recent years.

Image for post

Driverless Car of the Future, the advertisement for “America’s Electric Light and Power Companies,” Saturday Evening Post, the 1950s.

The vision of autonomous driving has been around for more than 70 years. But engineers continuously make attempts to achieve the idea without too much success. The concept stayed as a fiction for a long time. In 2004, the US Defense Advanced Research Projects Administration (DARPA) organized the Grand Challenge for autonomous vehicles for teams to compete for the grand prize of $1 million. I remembered watching TV and saw those competing vehicles, behaved like driven by drunk man, had a really tough time to drive by itself. I thought that autonomous driving vision would still have a long way to go. To my surprise, the next year, 2005, Stanford University’s vehicles autonomously drove 131 miles in California’s Mojave desert without a scratch and took the $1 million Grand Challenge prize. How was that possible? Later I learned that the secret ingredient to make this possible was using the latest ML (Machine Learning) enabled AI (Artificial Intelligent ) technology.

Since then, AI technologies advanced rapidly and been implemented in all verticals. Around the 2016 time frame, the concept of Autonomous Driving Network started to emerge by combining AI and network to achieve network operational autonomy. The automation concept is nothing new in the networking industry; network operations are continually being automated here and there. But this time, ADN is beyond automating mundane tasks; it reaches a whole new level. With the help of AI technologies and other critical ingredients advancement like SDN (Software Defined Network), autonomous networking has a great chance from a vision to future reality.

In this article, we will examine some critical components of the ADN, current landscape, and factors that are important for ADN to be a success.

The Vision

At the current stage, there are different terminologies to describe ADN vision by various organizations.
Image for post

Even though slightly different terminologies, the industry is moving towards some common terms and consensus called autonomous networks, e.g. TMF, ETSI, ITU-T, GSMA. The core vision includes business and network aspects. The autonomous network delivers the “hyper-loop” from business requirements all the way to network and device layers.

On the network layer, it contains the below critical aspects:

  • Intent-Driven: Understand the operator’s business intent and automatically translate it into necessary network operations. The operation can be a one-time operation like disconnect a connection service or continuous operations like maintaining a specified SLA (Service Level Agreement) at the all-time.
  • **Self-Discover: **Automatically discover hardware/software changes in the network and populate the changes to the necessary subsystems to maintain always-sync state.
  • **Self-Config/Self-Organize: **Whenever network changes happen, automatically configure corresponding hardware/software parameters such that the network is at the pre-defined target states.
  • **Self-Monitor: **Constantly monitor networks/services operation states and health conditions automatically.
  • Auto-Detect: Detect network faults, abnormalities, and intrusions automatically.
  • **Self-Diagnose: **Automatically conduct an inference process to figure out the root causes of issues.
  • **Self-Healing: **Automatically take necessary actions to address issues and bring the networks/services back to the desired state.
  • **Self-Report: **Automatically communicate with its environment and exchange necessary information.
  • Automated common operational scenarios: Automatically perform operations like network planning, customer and service onboarding, network change management.

On top of those, these capabilities need to be across multiple services, multiple domains, and the entire lifecycle(TMF, 2019).

No doubt, this is the most ambitious goal that the networking industry has ever aimed at. It has been described as the “end-state” and“ultimate goal” of networking evolution. This is not just a vision on PPT, the networking industry already on the move toward the goal.

David Wang, Huawei’s Executive Director of the Board and President of Products & Solutions, said in his 2018 Ultra-Broadband Forum(UBBF) keynote speech. (David W. 2018):

“In a fully connected and intelligent era, autonomous driving is becoming a reality. Industries like automotive, aerospace, and manufacturing are modernizing and renewing themselves by introducing autonomous technologies. However, the telecom sector is facing a major structural problem: Networks are growing year by year, but OPEX is growing faster than revenue. What’s more, it takes 100 times more effort for telecom operators to maintain their networks than OTT players. Therefore, it’s imperative that telecom operators build autonomous driving networks.”

Juniper CEO Rami Rahim said in his keynote at the company’s virtual AI event: (CRN, 2020)

“The goal now is a self-driving network. The call to action is to embrace the change. We can all benefit from putting more time into higher-layer activities, like keeping distributors out of the business. The future, I truly believe, is about getting the network out of the way. It is time for the infrastructure to take a back seat to the self-driving network.”

Is This Vision Achievable?

If you asked me this question 15 years ago, my answer would be “no chance” as I could not imagine an autonomous driving vehicle was possible then. But now, the vision is not far-fetch anymore not only because of ML/AI technology rapid advancement but other key building blocks are made significant progress, just name a few key building blocks:

  • software-defined networking (SDN) control
  • industry-standard models and open APIs
  • Real-time analytics/telemetry
  • big data processing
  • cross-domain orchestration
  • programmable infrastructure
  • cloud-native virtualized network functions (VNF)
  • DevOps agile development process
  • everything-as-service design paradigm
  • intelligent process automation
  • edge computing
  • cloud infrastructure
  • programing paradigm suitable for building an autonomous system . i.e., teleo-reactive programs, which is a set of reactive rules that continuously sense the environment and trigger actions whose continuous execution eventually leads the system to satisfy a goal. (Nils Nilsson, 1996)
  • open-source solutions

#network-automation #autonomous-network #ai-in-network #self-driving-network #neural-networks

Sofia  Maggio

Sofia Maggio


Neural networks forward propagation deep dive 102

Forward propagation is an important part of neural networks. Its not as hard as it sounds ;-)

This is part 2 in my series on neural networks. You are welcome to start at part 1 or skip to part 5 if you just want the code.

So, to perform gradient descent or cost optimisation, we need to write a cost function which performs:

  1. Forward propagation
  2. Backward propagation
  3. Calculate cost & gradient

In this article, we are dealing with (1) forward propagation.

In figure 1, we can see our network diagram with much of the details removed. We will focus on one unit in level 2 and one unit in level 3. This understanding can then be copied to all units. (ps. one unit is one of the circles below)

Our goal in forward prop is to calculate A1, Z2, A2, Z3 & A3

Just so we can visualise the X features, see figure 2 and for some more info on the data, see part 1.

Initial weights (thetas)

As it turns out, this is quite an important topic for gradient descent. If you have not dealt with gradient descent, then check this article first. We can see above that we need 2 sets of weights. (signified by ø). We often still calls these weights theta and they mean the same thing.

We need one set of thetas for level 2 and a 2nd set for level 3. Each theta is a matrix and is size(L) * size(L-1). Thus for above:

  • Theta1 = 6x4 matrix

  • Theta2 = 7x7 matrix

We have to now guess at which initial thetas should be our starting point. Here, epsilon comes to the rescue and below is the matlab code to easily generate some random small numbers for our initial weights.

function weights = initializeWeights(inSize, outSize)
  epsilon = 0.12;
  weights = rand(outSize, 1 + inSize) * 2 * epsilon - epsilon;

After running above function with our sizes for each theta as mentioned above, we will get some good small random initial values as in figure 3

. For figure 1 above, the weights we mention would refer to rows 1 in below matrix’s.

Now, that we have our initial weights, we can go ahead and run gradient descent. However, this needs a cost function to help calculate the cost and gradients as it goes along. Before we can calculate the costs, we need to perform forward propagation to calculate our A1, Z2, A2, Z3 and A3 as per figure 1.

#machine-learning #machine-intelligence #neural-network-algorithm #neural-networks #networks

Evaluating Performance of a Neural Network


A small insurance company, Texas Giant Insurance (TGI) focuses on providing commercial and personal insurance programs to its clients. TGI is an independent insurance company with an in-depth knowledge of multiple insurance products and carriers. They proactively provide service to their policyholders and present them to their clients.

The goal of this project is to first, validate that a NN model is more powerful in accuracy than other models and two, how we can leverage this information to mitigate customers from leaving and reclaim customers that have left TGI.


The dataset we received was of TGI customers between January 2017 and December 2019. The dataset was not properly formatted to be consumed by our models, but we did not have any missing values. As with insurance companies, their data is usually stored in a system that was not made for analysis but rather for accounting purposes. A significant amount of time was spent to learn the data features and determine any meaningful features that should be extracted. After going back and forward with the client (TGI), we ended up getting access to the data of 794 customers (observations). However, 81 of these observations were of customers who had inquired about products and services from TGI but never ended up becoming a customer. We ignored these observations, and this reduced our dataset to 713 observations. Since the insurance industry is heavily regulated, I was not able to get additional demographic information of the customer and had to do the best I could with the provided dataset.

Image for post

Table 1: Selected & Newly Created Features

We created new features from the dataset that was provided and formatted the data, so each observation is associated with that customer. One of the features we created was the duration of the customer (DurationAsCust) so that even if the duration of the policy changed or the type of policies changed between the years, we could capture the entire value of the customer. Another feature we created was to capture the significance of the customer so if the customer had multiple policies per year, we wanted to capture the sum of all those policies for the life of the customer (Total Duration).

We created new features from the dataset that was provided and formatted the data, so each observation is associated with that customer. One of the features we created was the duration of the customer (DurationAsCust) so that even if the duration of the policy changed or the type of policies changed between the years, we could capture the entire value of the customer. Another feature we created was to capture the significance of the customer so if the customer had multiple policies per year, we wanted to capture the sum of all those policies for the life of the customer (Total Duration).

Exploratory Data Analysis (EDA):

Most of the EDA figures, as well as, Histograms, Correlation Plot, Mean, Standard Deviations, Minimum, Maximum and other summary statistic as part of EDA are provided in the report (see PDF file in Github).

Figure 1 shows us the split of our response (target) variable: StillCustomer (0: Not a Customer, 1: Still a Customer). Out of 713 observations, 62.7% (448) are still a customer and 37.2% (265) are no longer a customer. While we want a good balance between the classes in our response variable, the 63% to 37% split is not terrible. We did execute class weights function in the sklearn library to balance the model but realized that it was not making a significant impact. Therefore, we elected to not balance the data as we did not want to make the model more complicated than it was necessary.

Image for post

Image for post

Figure 1: Proportionate of Customers

A feature that we had created was looking at whether during the life of the customer, it ever paid a premium in full instead of financing it or paying it in installments. Since we did not have any socioeconomic information about the customer, we wanted to derive any information that would be indicative of their economic standing. Figure 2 shows that there is a split among customers who are no longer active and whether they have ever paid full their premium. However, if we look at those who are still a customer, we see a large portion of these customers having paid their premiums in full at the least once during their lifetime at TGI.

Image for post

Figure 2: Comparing Customers with having Paid Full Premium Before

Duration of a customer and total value derived from a customer are quite important when looking at ways to improve customer experience and ultimately increase revenue. Figure 3 shows us a Kernel Density Estimation (KDE) plot to estimate the Probability Density Function (PDF) of duration in months compared to whether the customer is still active. What is interesting is that there is an intersection between the two classes at approximately 40 months. It would require further analysis to gauge whether that intersection exists because of the type of service that occurred with the customer at that time.

#data-science #customer-churn #bayesian-optimization #neural-networks #pycaret #neural networks

A Comparative Analysis of Recurrent Neural Networks

Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. RNN models are mostly used in the fields of natural language processing and speech recognition.

The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.

Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU.

1D Convolution_ layer_ creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. It is very effective for deriving features from a fixed-length segment of the overall dataset. A 1D CNN works well for natural language processing (NLP).

DATASET: IMDb Movie Review

TensorFlow Datasets is a collection of datasets ready to use, with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed as [](, enabling easy-to-use and high-performance input pipelines.


This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.

Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Load the Dataset

import tensorflow as tf
import tensorflow_datasets

imdb, info=tensorflow_datasets.load("imdb_reviews", with_info=True, as_supervised=True)

Image for post


Image for post

Training and Testing Data

train_data, test_data=imdb['train'], imdb['test']

for s,l in train_data:
for s,l in test_data:

Tokenization and Padding

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer= Tokenizer(num_words=vocab_size, oov_token=oov_tok)
padded=pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
testing_padded=pad_sequences(testing_sequences, maxlen=max_length)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding

Multi-layer Bidirectional LSTM

#imdb #convolutional-network #long-short-term-memory #recurrent-neural-network #gated-recurrent-unit #neural networks