The sound of birdsong is varied, beautiful, and relaxing. In the pre-Covid times, I made a focus timer which would play some recorded bird sounds during breaks, and I always wondered whether such sounds could be generated. After some trial and error, I landed on a proof-of-concept architecture which can both successfully reproduce a single chirp and has parameters which can be adjusted to alter the generated sound.
Since generating bird sounds seems like a somewhat novel application, I think it is worth sharing this approach. Along the way, I also learned how to take TensorFlow models apart and graft parts of them together. The code blocks below show how this is done. The full code can be found here.
The generator is composed of two parts. The first part takes the entire sound and encodes key information about its overall shape in a small number of parameters.
The second part will take a small bit of sound, along with the information about the overall shape, and predict the next little bit of sound.
The second part can be called iteratively on itself with adjusted parameters to produce an entirely new chirp!
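The iterative step can be sketched as a simple autoregressive loop. This is only an illustration of the idea, not the model from this article: `predict_next` and `toy_predictor` are hypothetical stand-ins for the trained second part, and the window size and chunk size are made-up values.

```python
from typing import Callable, List, Sequence


def generate_chirp(
    seed: Sequence[float],
    shape_params: Sequence[float],
    predict_next: Callable[[Sequence[float], Sequence[float]], List[float]],
    total_length: int,
) -> List[float]:
    """Grow a sound by repeatedly feeding the latest window back into the predictor."""
    window = len(seed)
    sound = list(seed)
    while len(sound) < total_length:
        recent = sound[-window:]                    # the last small bit of sound
        next_bit = predict_next(recent, shape_params)
        if not next_bit:                            # guard against a predictor that emits nothing
            break
        sound.extend(next_bit)
    return sound[:total_length]


def toy_predictor(recent: Sequence[float], shape_params: Sequence[float]) -> List[float]:
    """Hypothetical placeholder: emits two damped samples scaled by the first parameter."""
    gain = shape_params[0]
    return [gain * recent[-1] * 0.5, gain * recent[-1] * 0.25]


chirp = generate_chirp([0.0, 0.5, 1.0], [1.0], toy_predictor, total_length=9)
```

Changing `shape_params` between calls is what would let the same loop produce a different chirp.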
An autoencoder structure is used for deriving the key parameters of the sound. This structure takes the entire soundwave and reduces it, through a series of (encoding) layers, down to a small number of components (the waist), before reproducing the sound in full from a series of expanding (decoding) layers. Once trained, the autoencoder model is cut off at the waist so that all it does is reduce the full sound down to the key parameters.
For the proof of concept, a single chirp was used; this chirp:
Soundwave representation of employed chirp.
It comes from the Cornell Guide to Bird Sounds: Essential Set for North America, the same set used for the Bird Sounds Chrome Experiment.
One problem with using just a single sound is that the autoencoder might simply hide all the information about the sound in the biases of the decoding layers, leaving the waist with all-zero weights. To mitigate this, the sound was morphed during training by altering its amplitude and shifting it around a little.
The encoder portion of the autoencoder consists of a series of convolutional layers which compress a sound wave of roughly 3000 samples down to around 20 numbers, hopefully retaining important information along the way. Since sounds are composed of many different sine waves, allowing many convolutional filters of different sizes to pass over the sound can in theory capture key information about the composite waves. A waist size of 20 was chosen mainly because it seems like a manageable number of adjustable parameters.
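The morphing step could look something like the sketch below. The gain range, the maximum shift, and the use of a circular shift are all assumptions made for illustration; the article does not say exactly how the amplitude and position were varied.

```python
import math
import random
from typing import List, Optional, Sequence


def morph(
    sound: Sequence[float],
    max_shift: int = 50,
    min_gain: float = 0.5,
    max_gain: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a copy of `sound` with a random gain and a small circular time shift."""
    rng = rng or random.Random()
    gain = rng.uniform(min_gain, max_gain)
    shift = rng.randint(-max_shift, max_shift)
    n = len(sound)
    # Circular shift: output sample i is taken from input sample (i - shift).
    return [gain * sound[(i - shift) % n] for i in range(n)]


# A sine wave stands in for the recorded chirp here.
chirp_like = [math.sin(i / 10) for i in range(300)]
morphed = morph(chirp_like, rng=random.Random(0))
```

Keeping the gain at or below 1 means the morphed wave stays within the same -1 to 1 range as the original.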
In this first approach, the layers are stacked sequentially. In a future version, it may be advantageous to use a structure akin to inception-net blocks to run convolutions of different sizes in parallel.
The decoder portion of the model consists of two dense layers, one of length 400, and one of length 3000 — the same length as the input sound. The activation function of the final layer is tanh, as the sound wave representations have values between -1 and 1.
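To make the shapes concrete, here is a NumPy sketch of a decoder with those layer sizes. It is not the trained TensorFlow model: the weights are random, and the ReLU activation on the first dense layer is an assumption, since only the final tanh is specified above.

```python
import numpy as np

rng = np.random.default_rng(0)

WAIST, HIDDEN, OUTPUT = 20, 400, 3000   # sizes quoted in the text

# Randomly initialised weights stand in for trained ones.
W1 = rng.normal(scale=0.1, size=(WAIST, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, OUTPUT))
b2 = np.zeros(OUTPUT)


def decode(waist: np.ndarray) -> np.ndarray:
    """Map the waist parameters to a sound wave with values in [-1, 1]."""
    hidden = np.maximum(0.0, waist @ W1 + b1)   # ReLU here is an assumption
    return np.tanh(hidden @ W2 + b2)            # tanh output, as described above


sound = decode(rng.normal(size=WAIST))
n_params = W1.size + b1.size + W2.size + b2.size
```

With these sizes the two dense layers hold 1,211,400 parameters, almost all of them in the final 400-to-3000 layer.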
#bird-sounds #machine-learning #convolutional-network #generative-model #tensorflow
Neural networks have been around for a long time, having been developed in the 1960s as a way to simulate neural activity for the development of artificial intelligence systems. Since then, however, they have developed into a useful analytical tool, often used in place of, or in conjunction with, standard statistical models such as regression or classification, as they can be used to predict or model a specific output. The main difference, and advantage, in this regard is that neural networks make no initial assumptions as to the form of the relationship or distribution that underlies the data. This means they can be more flexible and capture non-standard and non-linear relationships between input and output variables, making them incredibly valuable in today's data-rich environment.
In this sense, their use has taken off over the past decade or so, with the fall in cost and increase in capability of general computing power, the rise of large datasets on which these models can be trained, and the development of frameworks such as TensorFlow and Keras. These frameworks allow anyone with sufficient hardware (in some cases no longer even a requirement, thanks to cloud computing), the right data, and an understanding of a given coding language to implement them. This article therefore seeks to provide a no-code introduction to their architecture and how they work, so that their implementation and benefits can be better understood.
Firstly, the way these models work is that there is an input layer, one or more hidden layers and an output layer, each of which are connected by layers of synaptic weights¹. The input layer (X) is used to take in scaled values of the input, usually within a standardised range of 0–1. The hidden layers (Z) are then used to define the relationship between the input and output using weights and activation functions. The output layer (Y) then transforms the results from the hidden layers into the predicted values, often also scaled to be within 0–1. The synaptic weights (W) connecting these layers are used in model training to determine the weights assigned to each input and prediction in order to get the best model fit. Visually, this is represented as:
#machine-learning #python #neural-networks #tensorflow #neural-network-algorithm #no code introduction to neural networks
Talk about inspiration in the networking industry, and nothing stands out more than the Autonomous Driving Network (ADN). You may have heard of it and wondered what it is about, and whether it has anything to do with autonomous vehicles. Your guess is right: the ADN concept is derived from, or inspired by, the rapid development of autonomous cars in recent years.
Driverless Car of the Future, the advertisement for “America’s Electric Light and Power Companies,” Saturday Evening Post, the 1950s.
The vision of autonomous driving has been around for more than 70 years, but engineers' repeated attempts to realize it met with little success, and the concept remained fiction for a long time. In 2004, the US Defense Advanced Research Projects Agency (DARPA) organized the Grand Challenge, in which teams entered autonomous vehicles to compete for a grand prize of $1 million. I remember watching on TV as the competing vehicles, behaving as if driven by a drunk, had a really tough time driving by themselves. I thought the autonomous driving vision still had a long way to go. To my surprise, the next year, 2005, Stanford University’s vehicle drove 131 miles autonomously through California’s Mojave Desert without a scratch and took the Grand Challenge prize, by then raised to $2 million. How was that possible? Later I learned that the secret ingredient was the latest AI (Artificial Intelligence) technology enabled by ML (Machine Learning).
Since then, AI technologies have advanced rapidly and been adopted in all verticals. Around 2016, the concept of the Autonomous Driving Network started to emerge, combining AI and networking to achieve autonomous network operations. Automation is nothing new in the networking industry; network operations are continually being automated here and there. But this time, ADN goes beyond automating mundane tasks and reaches a whole new level. With the help of AI technologies and advances in other critical ingredients such as SDN (Software-Defined Networking), autonomous networking has a great chance of moving from vision to reality.
In this article, we will examine some critical components of the ADN, current landscape, and factors that are important for ADN to be a success.
At the current stage, various organizations use different terminology to describe the ADN vision.
Despite the slightly different terminology, the industry (e.g. TMF, ETSI, ITU-T, GSMA) is converging on a common term and consensus: autonomous networks. The core vision includes business and network aspects. The autonomous network delivers the “hyper-loop” from business requirements all the way to the network and device layers.
On the network layer, it contains the below critical aspects:
On top of those, these capabilities need to span multiple services, multiple domains, and the entire lifecycle (TMF, 2019).
No doubt, this is the most ambitious goal that the networking industry has ever aimed at. It has been described as the “end-state” and “ultimate goal” of networking evolution. This is not just a vision on PowerPoint slides; the networking industry is already on the move toward the goal.
David Wang, Huawei’s Executive Director of the Board and President of Products & Solutions, said in his keynote speech at the 2018 Ultra-Broadband Forum (UBBF) (David W., 2018):
“In a fully connected and intelligent era, autonomous driving is becoming a reality. Industries like automotive, aerospace, and manufacturing are modernizing and renewing themselves by introducing autonomous technologies. However, the telecom sector is facing a major structural problem: Networks are growing year by year, but OPEX is growing faster than revenue. What’s more, it takes 100 times more effort for telecom operators to maintain their networks than OTT players. Therefore, it’s imperative that telecom operators build autonomous driving networks.”
Juniper CEO Rami Rahim said in his keynote at the company’s virtual AI event: (CRN, 2020)
“The goal now is a self-driving network. The call to action is to embrace the change. We can all benefit from putting more time into higher-layer activities, like keeping distributors out of the business. The future, I truly believe, is about getting the network out of the way. It is time for the infrastructure to take a back seat to the self-driving network.”
If you had asked me this question 15 years ago, my answer would have been “no chance,” as I could not imagine then that an autonomous vehicle was possible. But now the vision is no longer far-fetched, not only because of the rapid advancement of ML/AI technology but also because other key building blocks have made significant progress. To name a few:
#network-automation #autonomous-network #ai-in-network #self-driving-network #neural-networks
Forward propagation is an important part of neural networks. It's not as hard as it sounds ;-)
So, to perform gradient descent or cost optimisation, we need to write a cost function which performs:
In this article, we are dealing with (1) forward propagation.
In figure 1, we can see our network diagram with much of the details removed. We will focus on one unit in level 2 and one unit in level 3. This understanding can then be copied to all units. (ps. one unit is one of the circles below)
Our goal in forward prop is to calculate A1, Z2, A2, Z3 & A3
Just so we can visualise the X features, see figure 2 and for some more info on the data, see part 1.
As it turns out, this is quite an important topic for gradient descent. If you have not dealt with gradient descent, then check this article first. We can see above that we need 2 sets of weights (signified by ø). We often still call these weights theta and they mean the same thing.
We need one set of thetas for level 2 and a second set for level 3. Each theta is a matrix of size size(L) x (size(L-1) + 1), where the extra column holds the weights for the bias unit. Thus for above:
Theta1 = 6x4 matrix
Theta2 = 7x7 matrix
We now have to guess which initial thetas should be our starting point. Here, epsilon comes to the rescue; below is the MATLAB code to easily generate some small random numbers for our initial weights.
function weights = initializeWeights(inSize, outSize)
  % Random values in [-epsilon, epsilon]; the extra column is for the bias unit
  epsilon = 0.12;
  weights = rand(outSize, 1 + inSize) * 2 * epsilon - epsilon;
end
After running the above function with the sizes for each theta mentioned above, we get some good small random initial values, as in figure 3. For figure 1 above, the weights we mention refer to row 1 in the matrices below.
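For readers who prefer Python, here is a rough NumPy port of the MATLAB function above. The layer sizes 3, 6 and 7 are inferred from the 6x4 and 7x7 theta shapes (one extra column for the bias unit), and the `rng` argument is an addition for reproducibility.

```python
from typing import Optional

import numpy as np


def initialize_weights(
    in_size: int,
    out_size: int,
    epsilon: float = 0.12,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Return an (out_size, 1 + in_size) matrix of values in [-epsilon, epsilon)."""
    rng = rng or np.random.default_rng()
    # rng.random gives values in [0, 1); rescale them to [-epsilon, epsilon).
    return rng.random((out_size, 1 + in_size)) * 2 * epsilon - epsilon


theta1 = initialize_weights(3, 6)   # 6x4: 3 inputs + bias -> 6 units
theta2 = initialize_weights(6, 7)   # 7x7: 6 units + bias -> 7 units
```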
Now that we have our initial weights, we can go ahead and run gradient descent. However, this needs a cost function to help calculate the cost and gradients as it goes along. Before we can calculate the costs, we need to perform forward propagation to calculate our A1, Z2, A2, Z3 and A3 as per figure 1.
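As a preview of where this is heading, the five quantities can be computed in a few lines of NumPy. This is a sketch under assumptions: a sigmoid activation, a bias column of ones prepended at each level, and layer sizes 3, 6 and 7 inferred from the theta shapes above. The input `X` is random data, not the dataset from part 1.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def forward_propagate(X: np.ndarray, theta1: np.ndarray, theta2: np.ndarray):
    """Compute A1, Z2, A2, Z3 and A3 for a three-level network."""
    m = X.shape[0]                                   # number of examples
    A1 = np.hstack([np.ones((m, 1)), X])             # inputs plus bias column
    Z2 = A1 @ theta1.T                               # weighted sums for level 2
    A2 = np.hstack([np.ones((m, 1)), sigmoid(Z2)])   # activations plus bias column
    Z3 = A2 @ theta2.T                               # weighted sums for level 3
    A3 = sigmoid(Z3)                                 # network output
    return A1, Z2, A2, Z3, A3


rng = np.random.default_rng(0)
epsilon = 0.12
X = rng.random((5, 3))                               # 5 made-up examples, 3 features
theta1 = rng.random((6, 4)) * 2 * epsilon - epsilon
theta2 = rng.random((7, 7)) * 2 * epsilon - epsilon
A1, Z2, A2, Z3, A3 = forward_propagate(X, theta1, theta2)
```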
#machine-learning #machine-intelligence #neural-network-algorithm #neural-networks #networks
Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. RNN models are mostly used in the fields of natural language processing and speech recognition.
The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. They happen because the gradient is multiplicative and can decrease or increase exponentially with the number of layers, which makes it difficult to capture long-term dependencies.
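A toy calculation shows how quickly a repeated multiplication shrinks or blows up. The per-step factors 0.9 and 1.1 are made-up numbers; a real network multiplies by Jacobian matrices rather than a single scalar.

```python
def gradient_scale(factor: float, steps: int) -> float:
    """Size of a gradient of 1.0 after being multiplied by `factor` at each step."""
    return factor ** steps


vanishing = gradient_scale(0.9, 100)   # roughly 2.7e-5: the signal has almost disappeared
exploding = gradient_scale(1.1, 100)   # roughly 1.4e4: the signal has blown up
```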
Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU.
A 1D convolution layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. It is very effective for deriving features from a fixed-length segment of the overall dataset. A 1D CNN works well for natural language processing (NLP).
TensorFlow Datasets is a collection of ready-to-use datasets for TensorFlow or other Python ML frameworks, such as JAX. All datasets are exposed as [_tf.data.Datasets_](https://www.tensorflow.org/api_docs/python/tf/data/Dataset), enabling easy-to-use and high-performance input pipelines.
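What "convolved over a single dimension" means can be shown in plain Python. This is a minimal sketch with no padding, a stride of 1, and a single hand-picked kernel; a real layer learns many kernels and adds a bias and activation. Like deep learning frameworks, it slides the kernel without flipping it.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide `kernel` along `signal` and return the dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# A difference kernel responds to the slope of the input.
features = conv1d_valid([1, 2, 3, 4, 5], [1, 0, -1])   # [-2, -2, -2]
```

An input of length 5 and a kernel of length 3 give 5 - 3 + 1 = 3 outputs, one per position the kernel fits into.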
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import tensorflow as tf
import tensorflow_datasets

# Load the IMDB reviews dataset as (text, label) pairs, plus its metadata
imdb, info = tensorflow_datasets.load("imdb_reviews", with_info=True, as_supervised=True)
imdb
train_data, test_data = imdb['train'], imdb['test']

training_sentences = []
training_label = []
testing_sentences = []
testing_label = []

# Each element is a (text, label) pair of tensors; convert them to Python values.
# The text arrives as bytes, so decode it to get a clean string.
for s, l in train_data:
    training_sentences.append(s.numpy().decode('utf-8'))
    training_label.append(l.numpy())

for s, l in test_data:
    testing_sentences.append(s.numpy().decode('utf-8'))
    testing_label.append(l.numpy())

training_label_final = np.array(training_label)
testing_label_final = np.array(testing_label)
vocab_size = 10000
embedding_dim = 16
max_length = 120
trunc_type = 'post'
oov_tok = '<oov>'

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Build the vocabulary from the training reviews only
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index

# Convert reviews to integer sequences and pad them to a fixed length
sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding
#imdb #convolutional-network #long-short-term-memory #recurrent-neural-network #gated-recurrent-unit #neural networks
The purpose of this project is to build and evaluate Recurrent Neural Networks (RNNs) for sentence-level classification tasks. I evaluate three architectures: a two-layer Long Short-Term Memory network (LSTM), a two-layer Bidirectional Long Short-Term Memory network (BiLSTM), and a two-layer BiLSTM with a word-level attention layer. Although all three learn useful vector representations, the BiLSTM with an attention mechanism also focuses on the most relevant tokens when learning the text representation. For this purpose, I’m using the dataset published by Google Jigsaw on Kaggle in 2019, titled “Jigsaw Unintended Bias in Toxicity Classification.” The dataset includes 1,804,874 user comments, each with a toxicity level between 0 and 1. The final models can be used for filtering online posts and comments, social media policing, and user education.
RNNs are neural networks used for problems that require sequential data processing. For instance:
At each time step t of the input sequence, RNNs compute the output y_t and an internal state update h_t using the input x_t and the previous hidden state h_{t-1}. They then pass information about the current time step of the network to the next. The hidden state h_t summarizes the task-relevant aspects of the input sequence up to t, allowing information to persist over time.
Recurrent Neural Network
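The recurrence can be sketched in NumPy as follows. The exact update rule is not spelled out in this article, so the common "vanilla" form is assumed here: a tanh of a linear combination of the input and the previous hidden state, followed by a linear output. All sizes and weight names are illustrative.

```python
import numpy as np


def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    """One time step: update the hidden state, then compute the output from it."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)
    y_t = W_hy @ h_t + b_y
    return y_t, h_t


rng = np.random.default_rng(0)
input_size, hidden_size, output_size, seq_len = 4, 8, 2, 5

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

h = np.zeros(hidden_size)            # initial hidden state
outputs = []
for x_t in rng.normal(size=(seq_len, input_size)):
    # The same weight matrices are used at every time step.
    y_t, h = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)
    outputs.append(y_t)
```

Because the loop carries only `h` forward, the same function handles a sequence of any length.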
During training, RNNs re-use the same weight matrices at each time step. Parameter sharing enables the network to generalize to different sequence lengths. The total loss is a sum of all losses at each time step, the gradients with respect to the weights are the sum of the gradients at each time step, and the parameters are updated to minimize the loss function.
Forward pass: compute the loss function
Backward pass: compute the gradients
Although RNNs learn contextual representations of sequential data, they suffer from the exploding and vanishing gradient phenomena in long sequences. These problems occur due to the multiplicative gradient that can exponentially increase or decrease through time. RNNs commonly use three activation functions: ReLU, tanh, and sigmoid. Because the gradient calculation also involves the gradient with respect to the non-linear activations, architectures that use a ReLU activation can suffer from the exploding gradient problem, while architectures that use tanh or sigmoid can suffer from the vanishing gradient problem. Gradient clipping, which limits the gradient to a specific range, can be used to remedy the exploding gradient. For the vanishing gradient problem, however, a more complex recurrent unit with gates, such as the Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM), can be used.
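Gradient clipping itself takes only a few lines. The sketch below shows two common variants in plain Python: clipping each component to a range, which matches the description above, and rescaling the whole gradient when its norm is too large. The thresholds are arbitrary example values.

```python
import math
from typing import List, Sequence


def clip_by_value(gradient: Sequence[float], low: float, high: float) -> List[float]:
    """Force every component of the gradient into the range [low, high]."""
    return [max(low, min(high, g)) for g in gradient]


def clip_by_norm(gradient: Sequence[float], max_norm: float) -> List[float]:
    """Rescale the gradient so its Euclidean norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


by_value = clip_by_value([-3.0, 0.5, 7.0], -1.0, 1.0)   # [-1.0, 0.5, 1.0]
by_norm = clip_by_norm([3.0, 4.0], max_norm=1.0)        # direction kept, length reduced to 1
```

Clipping by norm preserves the direction of the update, whereas clipping by value can change it.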
#ai #recurrent-neural-network #attention-network #machine-learning #neural-network