Solving the Vanishing Gradient Problem with Self-Normalizing Neural Networks using Keras

Problem Statement

Training deep neural networks can be a challenging task, especially for very deep models. A major part of this difficulty is due to the instability of the gradients computed via backpropagation. In this post, we will learn **how to create a self-normalizing deep feed-forward neural network using Keras. **This will solve the gradient instability issue, speeding up training convergence, and improving model performance.

Disclaimer: This article is a brief summary with focus on implementation. Please read the cited papers for full details and mathematical justification (link in sources section).

Background

In their 2010 landmark paper, Xavier Glorot and Yoshua Bengio provided invaluable insights concerning the difficulty of training deep neural networks.

It turns out the then-popular choice of activation function and weight initialization technique were directly contributing to what is known as the Vanishing/Exploding Gradient Problem.

In succinct terms, this is when the gradients start shrinking or increasing so much that they make training impossible.

Saturating Activation Functions

Before the wide adoption of the now ubiquitous ReLU function and its variants, sigmoid functions (S-shaped) were the most popular choice of the activation function. One such example of a sigmoid activation is the logistic function:

Image for post

**Source: **https://www.mhnederlof.nl/logistic.html

**One major disadvantage of sigmoid functions is that they saturate. **In the case of the logistic function, the outputs saturate to either 0 or 1 for negative and positive inputs respectively. This leads to smaller and smaller gradients (_very _close to 0) as the magnitude of the inputs increases.

#neural-networks #ai #convergence #machine-learning #keras #deep-learning

What is GEEK

Buddha Community

Solving the Vanishing Gradient Problem with Self-Normalizing Neural Networks using Keras

Solving the Vanishing Gradient Problem with Self-Normalizing Neural Networks using Keras

Problem Statement

Training deep neural networks can be a challenging task, especially for very deep models. A major part of this difficulty is due to the instability of the gradients computed via backpropagation. In this post, we will learn **how to create a self-normalizing deep feed-forward neural network using Keras. **This will solve the gradient instability issue, speeding up training convergence, and improving model performance.

Disclaimer: This article is a brief summary with focus on implementation. Please read the cited papers for full details and mathematical justification (link in sources section).

Background

In their 2010 landmark paper, Xavier Glorot and Yoshua Bengio provided invaluable insights concerning the difficulty of training deep neural networks.

It turns out the then-popular choice of activation function and weight initialization technique were directly contributing to what is known as the Vanishing/Exploding Gradient Problem.

In succinct terms, this is when the gradients start shrinking or increasing so much that they make training impossible.

Saturating Activation Functions

Before the wide adoption of the now ubiquitous ReLU function and its variants, sigmoid functions (S-shaped) were the most popular choice of the activation function. One such example of a sigmoid activation is the logistic function:

Image for post

**Source: **https://www.mhnederlof.nl/logistic.html

**One major disadvantage of sigmoid functions is that they saturate. **In the case of the logistic function, the outputs saturate to either 0 or 1 for negative and positive inputs respectively. This leads to smaller and smaller gradients (_very _close to 0) as the magnitude of the inputs increases.

#neural-networks #ai #convergence #machine-learning #keras #deep-learning

Vincent Lab

Vincent Lab

1605176864

How to do Problem Solving as a Developer

In this video, I will be talking about problem-solving as a developer.

#problem solving skills #problem solving how to #problem solving strategies #problem solving #developer

Marlon  Boyle

Marlon Boyle

1594312560

Autonomous Driving Network (ADN) On Its Way

Talking about inspiration in the networking industry, nothing more than Autonomous Driving Network (ADN). You may hear about this and wondering what this is about, and does it have anything to do with autonomous driving vehicles? Your guess is right; the ADN concept is derived from or inspired by the rapid development of the autonomous driving car in recent years.

Image for post

Driverless Car of the Future, the advertisement for “America’s Electric Light and Power Companies,” Saturday Evening Post, the 1950s.

The vision of autonomous driving has been around for more than 70 years. But engineers continuously make attempts to achieve the idea without too much success. The concept stayed as a fiction for a long time. In 2004, the US Defense Advanced Research Projects Administration (DARPA) organized the Grand Challenge for autonomous vehicles for teams to compete for the grand prize of $1 million. I remembered watching TV and saw those competing vehicles, behaved like driven by drunk man, had a really tough time to drive by itself. I thought that autonomous driving vision would still have a long way to go. To my surprise, the next year, 2005, Stanford University’s vehicles autonomously drove 131 miles in California’s Mojave desert without a scratch and took the $1 million Grand Challenge prize. How was that possible? Later I learned that the secret ingredient to make this possible was using the latest ML (Machine Learning) enabled AI (Artificial Intelligent ) technology.

Since then, AI technologies advanced rapidly and been implemented in all verticals. Around the 2016 time frame, the concept of Autonomous Driving Network started to emerge by combining AI and network to achieve network operational autonomy. The automation concept is nothing new in the networking industry; network operations are continually being automated here and there. But this time, ADN is beyond automating mundane tasks; it reaches a whole new level. With the help of AI technologies and other critical ingredients advancement like SDN (Software Defined Network), autonomous networking has a great chance from a vision to future reality.

In this article, we will examine some critical components of the ADN, current landscape, and factors that are important for ADN to be a success.

The Vision

At the current stage, there are different terminologies to describe ADN vision by various organizations.
Image for post

Even though slightly different terminologies, the industry is moving towards some common terms and consensus called autonomous networks, e.g. TMF, ETSI, ITU-T, GSMA. The core vision includes business and network aspects. The autonomous network delivers the “hyper-loop” from business requirements all the way to network and device layers.

On the network layer, it contains the below critical aspects:

  • Intent-Driven: Understand the operator’s business intent and automatically translate it into necessary network operations. The operation can be a one-time operation like disconnect a connection service or continuous operations like maintaining a specified SLA (Service Level Agreement) at the all-time.
  • **Self-Discover: **Automatically discover hardware/software changes in the network and populate the changes to the necessary subsystems to maintain always-sync state.
  • **Self-Config/Self-Organize: **Whenever network changes happen, automatically configure corresponding hardware/software parameters such that the network is at the pre-defined target states.
  • **Self-Monitor: **Constantly monitor networks/services operation states and health conditions automatically.
  • Auto-Detect: Detect network faults, abnormalities, and intrusions automatically.
  • **Self-Diagnose: **Automatically conduct an inference process to figure out the root causes of issues.
  • **Self-Healing: **Automatically take necessary actions to address issues and bring the networks/services back to the desired state.
  • **Self-Report: **Automatically communicate with its environment and exchange necessary information.
  • Automated common operational scenarios: Automatically perform operations like network planning, customer and service onboarding, network change management.

On top of those, these capabilities need to be across multiple services, multiple domains, and the entire lifecycle(TMF, 2019).

No doubt, this is the most ambitious goal that the networking industry has ever aimed at. It has been described as the “end-state” and“ultimate goal” of networking evolution. This is not just a vision on PPT, the networking industry already on the move toward the goal.

David Wang, Huawei’s Executive Director of the Board and President of Products & Solutions, said in his 2018 Ultra-Broadband Forum(UBBF) keynote speech. (David W. 2018):

“In a fully connected and intelligent era, autonomous driving is becoming a reality. Industries like automotive, aerospace, and manufacturing are modernizing and renewing themselves by introducing autonomous technologies. However, the telecom sector is facing a major structural problem: Networks are growing year by year, but OPEX is growing faster than revenue. What’s more, it takes 100 times more effort for telecom operators to maintain their networks than OTT players. Therefore, it’s imperative that telecom operators build autonomous driving networks.”

Juniper CEO Rami Rahim said in his keynote at the company’s virtual AI event: (CRN, 2020)

“The goal now is a self-driving network. The call to action is to embrace the change. We can all benefit from putting more time into higher-layer activities, like keeping distributors out of the business. The future, I truly believe, is about getting the network out of the way. It is time for the infrastructure to take a back seat to the self-driving network.”

Is This Vision Achievable?

If you asked me this question 15 years ago, my answer would be “no chance” as I could not imagine an autonomous driving vehicle was possible then. But now, the vision is not far-fetch anymore not only because of ML/AI technology rapid advancement but other key building blocks are made significant progress, just name a few key building blocks:

  • software-defined networking (SDN) control
  • industry-standard models and open APIs
  • Real-time analytics/telemetry
  • big data processing
  • cross-domain orchestration
  • programmable infrastructure
  • cloud-native virtualized network functions (VNF)
  • DevOps agile development process
  • everything-as-service design paradigm
  • intelligent process automation
  • edge computing
  • cloud infrastructure
  • programing paradigm suitable for building an autonomous system . i.e., teleo-reactive programs, which is a set of reactive rules that continuously sense the environment and trigger actions whose continuous execution eventually leads the system to satisfy a goal. (Nils Nilsson, 1996)
  • open-source solutions

#network-automation #autonomous-network #ai-in-network #self-driving-network #neural-networks

Mckenzie  Osiki

Mckenzie Osiki

1623135499

No Code introduction to Neural Networks

The simple architecture explained

Neural networks have been around for a long time, being developed in the 1960s as a way to simulate neural activity for the development of artificial intelligence systems. However, since then they have developed into a useful analytical tool often used in replace of, or in conjunction with, standard statistical models such as regression or classification as they can be used to predict or more a specific output. The main difference, and advantage, in this regard is that neural networks make no initial assumptions as to the form of the relationship or distribution that underlies the data, meaning they can be more flexible and capture non-standard and non-linear relationships between input and output variables, making them incredibly valuable in todays data rich environment.

In this sense, their use has took over the past decade or so, with the fall in costs and increase in ability of general computing power, the rise of large datasets allowing these models to be trained, and the development of frameworks such as TensforFlow and Keras that have allowed people with sufficient hardware (in some cases this is no longer even an requirement through cloud computing), the correct data and an understanding of a given coding language to implement them. This article therefore seeks to be provide a no code introduction to their architecture and how they work so that their implementation and benefits can be better understood.

Firstly, the way these models work is that there is an input layer, one or more hidden layers and an output layer, each of which are connected by layers of synaptic weights¹. The input layer (X) is used to take in scaled values of the input, usually within a standardised range of 0–1. The hidden layers (Z) are then used to define the relationship between the input and output using weights and activation functions. The output layer (Y) then transforms the results from the hidden layers into the predicted values, often also scaled to be within 0–1. The synaptic weights (W) connecting these layers are used in model training to determine the weights assigned to each input and prediction in order to get the best model fit. Visually, this is represented as:

#machine-learning #python #neural-networks #tensorflow #neural-network-algorithm #no code introduction to neural networks

Keras Tutorial - Ultimate Guide to Deep Learning - DataFlair

Welcome to DataFlair Keras Tutorial. This tutorial will introduce you to everything you need to know to get started with Keras. You will discover the characteristics, features, and various other properties of Keras. This article also explains the different neural network layers and the pre-trained models available in Keras. You will get the idea of how Keras makes it easier to try and experiment with new architectures in neural networks. And how Keras empowers new ideas and its implementation in a faster, efficient way.

Keras Tutorial

Introduction to Keras

Keras is an open-source deep learning framework developed in python. Developers favor Keras because it is user-friendly, modular, and extensible. Keras allows developers for fast experimentation with neural networks.

Keras is a high-level API and uses Tensorflow, Theano, or CNTK as its backend. It provides a very clean and easy way to create deep learning models.

Characteristics of Keras

Keras has the following characteristics:

  • It is simple to use and consistent. Since we describe models in python, it is easy to code, compact, and easy to debug.
  • Keras is based on minimal substructure, it tries to minimize the user actions for common use cases.
  • Keras allows us to use multiple backends, provides GPU support on CUDA, and allows us to train models on multiple GPUs.
  • It offers a consistent API that provides necessary feedback when an error occurs.
  • Using Keras, you can customize the functionalities of your code up to a great extent. Even small customization makes a big change because these functionalities are deeply integrated with the low-level backend.

Benefits of using Keras

The following major benefits of using Keras over other deep learning frameworks are:

  • The simple API structure of Keras is designed for both new developers and experts.
  • The Keras interface is very user friendly and is pretty optimized for general use cases.
  • In Keras, you can write custom blocks to extend it.
  • Keras is the second most popular deep learning framework after TensorFlow.
  • Tensorflow also provides Keras implementation using its tf.keras module. You can access all the functionalities of Keras in TensorFlow using tf.keras.

Keras Installation

Before installing TensorFlow, you should have one of its backends. We prefer you to install Tensorflow. Install Tensorflow and Keras using pip python package installer.

Starting with Keras

The basic data structure of Keras is model, it defines how to organize layers. A simple type of model is the Sequential model, a sequential way of adding layers. For more flexible architecture, Keras provides a Functional API. Functional API allows you to take multiple inputs and produce outputs.

Keras Sequential model

Keras Functional API

It allows you to define more complex models.

#keras tutorials #introduction to keras #keras models #keras tutorial #layers in keras #why learn keras