1601150400
Machine learning and deep learning techniques have found their way into almost every domain you can think of. With the increasing availability of digitized data and the growing computational power of modern machines, the field will likely keep flourishing in the near future. As part of this trend, more and more people are taking up the task of building and training models every day. As a newcomer to the domain, I too have had to build and train a few neural network models. Most of the time, my primary goal is to build a model that is both highly accurate and well generalized. To achieve that, I usually rack my brain over finding the right hyper-parameters, deciding which regularization techniques to apply, or wondering whether I need to make my model deeper, and so on. But I often forget to play around with weight-initialization techniques, and I believe the same is true for many others like me.
Wait, weight initialization? Does it matter at all? I dedicate this article to finding out. When I searched the internet for an answer, I found an overwhelming amount of information: some articles dive into the mathematics behind these techniques, some compare them theoretically, and some focus on whether uniform initialization is better than normal initialization. So I resorted to a results-based approach: I tried to measure the impact of weight-initialization techniques through a small experiment. I applied a few of these techniques to a model and visualized the training process itself, setting aside the goal of achieving high accuracy for a while.
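As a taste of what such an experiment compares, here is a minimal NumPy sketch of a few commonly used initialization schemes. The function name and scheme labels are my own, not from the article; the formulas are the standard Glorot/He definitions:

```python
import numpy as np

def init_weights(n_in, n_out, scheme="glorot_uniform", rng=None):
    """Return an (n_out, n_in) weight matrix drawn with the given scheme.

    Schemes follow the commonly used definitions:
      - "zeros":          all-zero weights (kept only as a bad baseline)
      - "normal":         small random normal values, std 0.01
      - "glorot_uniform": Uniform(-limit, limit), limit = sqrt(6 / (n_in + n_out))
      - "he_normal":      Normal(0, sqrt(2 / n_in)), suited to ReLU layers
    """
    rng = np.random.default_rng(rng)
    if scheme == "zeros":
        return np.zeros((n_out, n_in))
    if scheme == "normal":
        return rng.normal(0.0, 0.01, size=(n_out, n_in))
    if scheme == "glorot_uniform":
        limit = np.sqrt(6.0 / (n_in + n_out))
        return rng.uniform(-limit, limit, size=(n_out, n_in))
    if scheme == "he_normal":
        return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))
    raise ValueError(f"unknown scheme: {scheme}")
```

Swapping the `scheme` argument while keeping everything else fixed is exactly the kind of controlled comparison the experiment below performs.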
#model-training #neural-networks #machine-learning #weight-initialization #deep-learning
1603011600
Weights and biases are the adjustable parameters of a neural network; during the training phase, they are updated by the gradient descent algorithm to minimize the network's cost function. However, they must be initialized before training can start, and this initialization step has an important effect on training. In this article, I will first explain the importance of weight initialization and then discuss the different methods that can be used for this purpose.
Notation
Currently, Medium supports superscripts only for numbers and has no support for subscripts. So to write variable names, I use this notation: every character after ^ is a superscript character, and every character after _ (and before ^, if it is present) is a subscript character. For example
is written as w_ij^[l] in this notation.
Before we discuss the weight initialization methods, we briefly review the equations that govern the feedforward neural networks. For a detailed discussion of these equations, you can refer to reference [1]. Suppose that you have a feedforward neural network as shown in Figure 1. Here
is the network’s input vector. Each x_i is an input feature. The network has L layers and the number of neurons in layer l is n^[l]. The input layer is considered as layer zero. So the number of input features is n^[0]. The output or activation of neuron i in layer l is a_i^[l].
Figure 1 (Image by Author)
The weights for neuron i in layer l can be represented by the vector
where w_ij^[l] represents the weight for the input j (coming from neuron j in layer l-1) going into neuron i in layer l (Figure 2).
Figure 2 (Image by Author)
In layer l, each neuron receives the outputs of all the neurons in the previous layer multiplied by its weights, w_i1^[l], w_i2^[l], . . . , w_in^[l]. The weighted inputs are summed, and a constant value called the bias (b_i^[l]) is added to them to produce the net input of the neuron
The net input of neurons in layer l can be represented by the vector
Similarly, the activation of neurons in layer l can be represented by the activation vector
So Eq. 3 can be written as
where the summation has been replaced by the inner product of the weight and activation vectors. The net input is then passed through the activation function g to produce the output or activation of neuron i
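This single-neuron computation can be sketched in a few lines of NumPy. The function name and the choice of tanh as a placeholder activation are mine, not the article's:

```python
import numpy as np

def neuron_forward(w, a_prev, b, g=np.tanh):
    """Net input and activation of one neuron i in layer l.

    w      : weight vector w_i^[l], shape (n^[l-1],)
    a_prev : activation vector a^[l-1] of the previous layer
    b      : scalar bias b_i^[l]
    g      : activation function (tanh used here as a placeholder)
    """
    z = np.dot(w, a_prev) + b  # net input: inner product plus bias
    return g(z)                # activation a_i^[l] = g(z_i^[l])
```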
We usually assume that the input layer is the layer zero and
So for the first layer, Eq. 7 is written as
We can combine all the weights of a layer into a weight matrix for that layer
So W^[l] is an n^[l] × n^[l-1] matrix, and the (i,j) element of this matrix gives the weight of the connection that goes from the neuron j in layer l-1 to the neuron i in layer l. We can also have a bias vector for each layer
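Using the weight matrices W^[l] and bias vectors b^[l] defined above, the whole forward pass reduces to a loop of matrix-vector products. This is a minimal sketch under the article's conventions (one shared activation g for all layers is my simplification):

```python
import numpy as np

def forward(x, weights, biases, g=np.tanh):
    """Forward pass through an L-layer feedforward network.

    x       : input vector, plays the role of a^[0]
    weights : list of L matrices; weights[l-1] is W^[l], shape (n^[l], n^[l-1])
    biases  : list of L vectors; biases[l-1] is b^[l], shape (n^[l],)
    """
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b  # net input vector z^[l] = W^[l] a^[l-1] + b^[l]
        a = g(z)       # activation vector a^[l] = g(z^[l])
    return a
```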
#neural-networks #weight-initialization #deep-learning #machine-learning
1623135499
Neural networks have been around for a long time, having been developed in the 1960s as a way to simulate neural activity for the development of artificial intelligence systems. Since then, however, they have developed into a useful analytical tool, often used in place of, or in conjunction with, standard statistical models such as regression or classification, as they can be used to predict or model a specific output. The main difference, and advantage, in this regard is that neural networks make no initial assumptions about the form of the relationship or distribution that underlies the data, meaning they can be more flexible and capture non-standard and non-linear relationships between input and output variables, making them incredibly valuable in today's data-rich environment.
In this sense, their use has taken off over the past decade or so, driven by the fall in cost and rise in capability of general computing power, the availability of large datasets on which these models can be trained, and the development of frameworks such as TensorFlow and Keras. These have allowed anyone with sufficient hardware (in some cases no longer even a requirement, thanks to cloud computing), the right data, and an understanding of a given coding language to implement them. This article therefore seeks to provide a no-code introduction to their architecture and how they work, so that their implementation and benefits can be better understood.
Firstly, the way these models work is that there is an input layer, one or more hidden layers, and an output layer, each of which is connected by a layer of synaptic weights¹. The input layer (X) takes in scaled values of the input, usually within a standardised range of 0–1. The hidden layers (Z) then define the relationship between the input and output using weights and activation functions. The output layer (Y) transforms the results from the hidden layers into the predicted values, often also scaled to be within 0–1. The synaptic weights (W) connecting these layers are adjusted during model training to determine the weight assigned to each input and prediction in order to get the best model fit. Visually, this is represented as:
#machine-learning #python #neural-networks #tensorflow #neural-network-algorithm #no-code-introduction-to-neural-networks
1594312560
Talking about inspiration in the networking industry, nothing beats the Autonomous Driving Network (ADN). You may have heard of it and wondered what it is about, and whether it has anything to do with autonomous driving vehicles. Your guess is right; the ADN concept is derived from, or inspired by, the rapid development of autonomous driving cars in recent years.
Driverless Car of the Future, the advertisement for “America’s Electric Light and Power Companies,” Saturday Evening Post, the 1950s.
The vision of autonomous driving has been around for more than 70 years, but engineers' continuous attempts to achieve it met with little success, and the concept stayed fiction for a long time. In 2004, the US Defense Advanced Research Projects Agency (DARPA) organized the Grand Challenge, in which teams of autonomous vehicles competed for a grand prize of $1 million. I remember watching those competing vehicles on TV; they behaved as if driven by a drunk driver and had a really tough time driving by themselves. I thought the autonomous driving vision still had a long way to go. To my surprise, the next year, 2005, Stanford University's vehicle autonomously drove 131 miles through California's Mojave Desert without a scratch and took the $1 million Grand Challenge prize. How was that possible? Later I learned that the secret ingredient was the latest ML (Machine Learning) enabled AI (Artificial Intelligence) technology.
Since then, AI technologies have advanced rapidly and been implemented across all verticals. Around 2016, the concept of the Autonomous Driving Network started to emerge, combining AI and networking to achieve network operational autonomy. The automation concept is nothing new in the networking industry; network operations are continually being automated here and there. But ADN goes beyond automating mundane tasks; it reaches a whole new level. With the help of AI technologies and advances in other critical ingredients such as SDN (Software Defined Networking), autonomous networking has a great chance of moving from vision to reality.
In this article, we will examine some critical components of the ADN, the current landscape, and the factors that are important for ADN to succeed.
At the current stage, various organizations use different terminologies to describe the ADN vision.
Despite the slightly different terminologies, the industry, through bodies such as TMF, ETSI, ITU-T, and GSMA, is moving toward a common term and consensus: autonomous networks. The core vision includes both business and network aspects. The autonomous network delivers the “hyper-loop” from business requirements all the way down to the network and device layers.
On the network layer, this involves the critical aspects below:
On top of those, these capabilities need to span multiple services, multiple domains, and the entire lifecycle (TMF, 2019).
No doubt, this is the most ambitious goal the networking industry has ever aimed at. It has been described as the “end-state” and “ultimate goal” of networking evolution. And this is not just a vision on slides; the networking industry is already on the move toward the goal.
David Wang, Huawei’s Executive Director of the Board and President of Products & Solutions, said in his 2018 Ultra-Broadband Forum (UBBF) keynote speech (David W., 2018):
“In a fully connected and intelligent era, autonomous driving is becoming a reality. Industries like automotive, aerospace, and manufacturing are modernizing and renewing themselves by introducing autonomous technologies. However, the telecom sector is facing a major structural problem: Networks are growing year by year, but OPEX is growing faster than revenue. What’s more, it takes 100 times more effort for telecom operators to maintain their networks than OTT players. Therefore, it’s imperative that telecom operators build autonomous driving networks.”
Juniper CEO Rami Rahim said in his keynote at the company’s virtual AI event (CRN, 2020):
“The goal now is a self-driving network. The call to action is to embrace the change. We can all benefit from putting more time into higher-layer activities, like keeping distributors out of the business. The future, I truly believe, is about getting the network out of the way. It is time for the infrastructure to take a back seat to the self-driving network.”
If you had asked me this question 15 years ago, my answer would have been “no chance,” as I could not imagine an autonomous driving vehicle being possible then. But now the vision is no longer far-fetched, not only because of the rapid advancement of ML/AI technology but also because other key building blocks have made significant progress. To name a few:
#network-automation #autonomous-network #ai-in-network #self-driving-network #neural-networks