Crowd Counting Using Bayesian Multi Scale Neural Networks

Convolutional neural networks that estimate a density map over the image have been highly successful for crowd counting. However, dense crowd counting remains an open problem because of severe occlusion and perspective distortion, under which people appear at various shapes and sizes. This blog presents our research on crowd counting, combining convolutional neural networks with uncertainty quantification.

Important Points

  1. We propose a new network that uses a ResNet-based feature extractor, a downsampling block built from dilated convolutions, and an upsampling block built from transposed convolutions.
  2. We present a novel aggregation module which makes our network robust to the perspective view problem.
  3. We present the optimization details, loss functions and the algorithm used in our work.
  4. We used ShanghaiTech, UCF-CC-50 and UCF-QNRF datasets for training and testing.
  5. Using MSE and MAE as evaluation metrics, our network outperforms previous state-of-the-art approaches while providing uncertainty estimates in a principled Bayesian manner.
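For reference, the MAE and MSE metrics in point 5 compare predicted and ground-truth counts per image; the counts below are made up purely for illustration:

```python
import numpy as np

# Hypothetical ground-truth and predicted crowd counts for 4 test images.
y_true = np.array([120.0, 950.0, 310.0, 45.0])
y_pred = np.array([130.0, 900.0, 300.0, 50.0])

mae = np.mean(np.abs(y_pred - y_true))  # mean absolute error
mse = np.mean((y_pred - y_true) ** 2)   # mean squared error
print(mae, mse)  # 18.75 681.25
```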


Crowd counting has a range of applications, such as counting the number of participants in political rallies, social events, sports events, etc.

Crowd Counting is a difficult problem especially in dense crowds due to two main reasons:

  1. There is often clutter, overlap and occlusions present.
  2. In perspective view, it is difficult to account for the shapes and sizes of objects relative to the background.

Many algorithms have been proposed in the literature for this problem. Most use some form of convolutional neural network to predict a density map over the input image, which is then summed to obtain the object count.
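For intuition, counting from a density map is just integration; the 4×4 map below is a made-up example, not real model output:

```python
import numpy as np

# Hypothetical predicted density map over a 4x4 image region: each cell
# holds the estimated fraction of a person centred there.
density_map = np.array([
    [0.0, 0.2, 0.3, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.4, 0.5, 0.0],
    [0.0, 0.0, 0.2, 0.0],
])

# The crowd count is the integral (sum) of the density map.
count = density_map.sum()
print(round(float(count), 1))  # 3.5
```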


The following datasets were used in this work for training and testing the network:

  1. ShanghaiTech is made up of two datasets labelled as part A and part B. In part A, there are 300 images for training and 182 images for testing while Part B has 400 training images and 316 testing images.
  2. UCF-CC-50 contains 50 gray images with different resolutions. The average count for each image is 1,280, and the minimum and maximum counts are 94 and 4,532, respectively.
  3. UCF-QNRF is the third dataset used in this work which has 1535 images with 1.25 million point annotations. The training set has 1,201 images and 334 images are used for testing.

Network Architecture

The network architecture used in this work is described in the following points:

  1. A ResNet-based feature extractor with dilated convolutions forms the downsampling block. This helps extract details of objects at various scales, addressing the perspective-view problem faced by earlier approaches.
  2. The upsampling block uses transposed convolutions, with skip connections between the two blocks creating an additional pathway and thus helping avoid overfitting.
  3. The last part has three heads: a density-map head, whose output when integrated gives the absolute count, and epistemic- and aleatoric-uncertainty heads.
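To illustrate point 1: a dilated convolution is an ordinary convolution whose kernel taps are spaced `dilation` samples apart, enlarging the receptive field without adding parameters. A minimal 1-D sketch (not our actual network code):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Valid 1-D convolution with a dilated kernel (no padding, stride 1)."""
    k = len(w)
    span = (k - 1) * dilation + 1          # effective receptive field
    out_len = len(x) - span + 1
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(8, dtype=float)              # [0, 1, ..., 7]
w = np.array([1.0, 1.0, 1.0])              # 3-tap kernel

# With dilation=2, the 3-tap kernel covers 5 input positions.
out = dilated_conv1d(x, w, dilation=2)
print(out)  # [ 6.  9. 12. 15.]
```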

The network architecture along with layerwise details used in this work is shown in Figure 1:


Figure 1: Our Neural network architecture

Here, 1×1 and 3×3 denote filter sizes; 64, 128, and 256 denote receptive fields; conv denotes a dilated convolutional layer; and conv-2 denotes a transposed convolutional layer.


To mitigate the vanishing gradient problem, instance normalization was used after both the dilated convolutional and transposed convolutional layers, as defined in Equation 1:

Equation 1: y = γ((w ∗ x + b − µ) / σ) + β

where w and b are the weight and bias of the convolutional layer, γ and β are the weight and bias of the instance normalization layer, and µ and σ are the mean and standard deviation of the input.
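As a sketch of Equation 1's normalization step (leaving out the convolution itself), instance normalization over one feature map can be written in a few lines of NumPy; the shapes below are illustrative:

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance normalization over spatial dims of one (C, H, W) sample."""
    mu = x.mean(axis=(1, 2), keepdims=True)    # per-channel mean
    sigma = x.std(axis=(1, 2), keepdims=True)  # per-channel std
    return gamma * (x - mu) / (sigma + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=(3, 8, 8))       # conv output (C=3, 8x8)
y = instance_norm(x, gamma=np.ones((3, 1, 1)), beta=np.zeros((3, 1, 1)))

# With gamma=1 and beta=0, each channel is re-centred and re-scaled.
print(np.allclose(y.mean(axis=(1, 2)), 0.0, atol=1e-6))  # True
```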

We propose a new technique to aggregate filters of sizes 1×1, 3×3, and 5×5. ReLU is applied after every convolutional and transposed convolutional layer. Our novel aggregation module is shown in Figure 2:


Figure 2: The architecture of our aggregation module

The parallel filter branches make our network robust, and the design can be extended with more filter sizes to tackle crowd counting in dense scenes. Stacking our aggregation modules on top of each other behaves like an ensemble, reducing overfitting, which is a common challenge when training deep networks.
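As a toy sketch of the aggregation idea (not our actual module), parallel branches with different kernel sizes can be run over the same input and stacked; 1-D convolutions stand in for the 2-D ones here:

```python
import numpy as np

def branch(x, kernel_size):
    """One aggregation branch: 'same' 1-D conv with an averaging kernel + ReLU."""
    w = np.ones(kernel_size) / kernel_size
    return np.maximum(np.convolve(x, w, mode="same"), 0.0)

def aggregate(x):
    """Run 1x1-, 3x3- and 5x5-style branches in parallel and stack the outputs."""
    return np.stack([branch(x, k) for k in (1, 3, 5)])

x = np.array([0.0, 1.0, 4.0, 1.0, 0.0])
out = aggregate(x)
print(out.shape)  # (3, 5): one output row per branch
```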

#deep-learning #computer-vision #machine-learning

Adaline Kulas

Multi-cloud Spending: 8 Tips To Lower Cost

A multi-cloud approach means leveraging two or more cloud platforms to meet the various business requirements of an enterprise. A multi-cloud IT environment incorporates different clouds from multiple vendors and removes dependence on a single public cloud service provider. Enterprises can thus choose specific services from multiple public clouds and reap the benefits of each.

Given its affordability and agility, most enterprises opt for a multi-cloud approach in cloud computing now. A 2018 survey on the public cloud services market points out that 81% of the respondents use services from two or more providers. Subsequently, the cloud computing services market has reported incredible growth in recent times. The worldwide public cloud services market is all set to reach $500 billion in the next four years, according to IDC.

By choosing multi-cloud solutions strategically, enterprises can optimize the benefits of cloud computing and aim for key competitive advantages. They can avoid the lengthy and cumbersome processes involved in buying, installing, and testing high-priced systems. IaaS and PaaS solutions have become a windfall for enterprise budgets, as they do not incur huge up-front capital expenditure.

However, cost optimization remains a challenge when operating a multi-cloud environment, and a large number of enterprises end up overpaying, with or without realizing it. The tips below will help you ensure money is spent wisely on cloud computing services.

  • Deactivate underused or unattached resources

Most organizations get simple things wrong, and these turn out to be the root cause of needless spending and resource wastage. The first step to cost optimization in your cloud strategy is identifying the underutilized resources you have been paying for.

Enterprises often continue to pay for resources that were purchased earlier but are no longer useful. Identifying such unused and unattached resources and deactivating them on a regular basis brings you one step closer to cost optimization. If needed, you can deploy automated cloud management tools, which are largely helpful in providing the analytics needed to optimize cloud spending and cut costs on an ongoing basis.

  • Figure out idle instances

Another key cost optimization strategy is to identify idle computing instances and consolidate them into fewer instances. An idle computing instance may have a CPU utilization of only 1-5%, yet the service provider may bill you for 100% of that instance.

Every enterprise will have such non-production instances that occupy unnecessary storage space and lead to overpaying. Re-evaluating your resource allocations regularly and removing unnecessary storage can save you significant money. Resource allocation is not only a matter of CPU and memory; it is also linked to storage, network, and various other factors.

  • Deploy monitoring mechanisms

The key to efficient cost reduction in cloud computing technology lies in proactive monitoring. A comprehensive view of the cloud usage helps enterprises to monitor and minimize unnecessary spending. You can make use of various mechanisms for monitoring computing demand.

For instance, you can use a heatmap to visualize the highs and lows in computing demand. The start and stop times it reveals can in turn lead to reduced costs. You can also deploy automated tools that schedule instances to start and stop. By following a heatmap, you can determine whether it is safe to shut down servers on holidays or weekends.

#cloud-computing-services #hybrid-cloud #multi-cloud #cloud-spend

Vern Greenholt

Bayesian Neural Networks: 2 Fully Connected in TensorFlow and Pytorch

This chapter continues the series on Bayesian deep learning. In it we'll explore alternative solutions to conventional dense neural networks. These alternatives place probability distributions over each weight in the neural network, resulting in a single model that effectively contains an infinite ensemble of neural networks trained on the same data. We'll use this knowledge to solve an important problem of our age: how long to boil an egg.

Chapter Objectives:

  • Become familiar with variational inference with dense Bayesian models
  • Learn how to convert a normal fully connected (dense) neural network to a Bayesian neural network
  • Appreciate the advantages and shortcomings of the current implementation

The data is from an experiment in egg boiling. The boil durations are provided along with each egg's weight in grams and the finding on cutting it open. Findings are categorised into one of three classes: under-cooked, soft-boiled, and hard-boiled. We want to predict the egg's outcome from its weight and boiling time. The problem is insanely simple, so much so that the data is close to being linearly separable¹. But not quite, as the egg's pre-boil life (fridge temperature or cupboard storage at room temperature) isn't provided, and as you'll see, this swings cooking times. Without the missing data we can't be certain what we'll find when opening an egg up. Knowing how certain we are, we can influence the outcome, as with most problems. In this case, if we're relatively confident an egg is undercooked, we'll cook it more before cracking it open.
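The "infinite ensemble" idea above can be sketched numerically: give each weight a Gaussian posterior and sample a fresh weight vector on every forward pass. The layer size, means, and standard deviations below are made-up toy values, not the chapter's trained model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Variational posterior for a single dense layer: each weight has its own
# learned mean and standard deviation (made-up values for a 2 -> 1 layer).
w_mu = np.array([0.8, -0.3])
w_sigma = np.array([0.1, 0.05])

x = np.array([70.0 / 100.0, 5.0 / 10.0])   # scaled (weight_g, boil_minutes)

# Each forward pass samples fresh weights: one member of the implicit
# ensemble the chapter describes. The spread of outputs is the uncertainty.
samples = np.array([
    float(x @ (w_mu + w_sigma * rng.standard_normal(2)))
    for _ in range(5000)
])

print(round(samples.mean(), 2))  # close to x @ w_mu = 0.41
```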


Let’s have a look at the data first to see what we’re dealing with. If you want to feel the difference for yourself, you can get the data at You’ll need Pandas and Matplotlib for exploring the data (pip install --upgrade pandas matplotlib). Download the dataset to the same directory you’re working from. From a Jupyter notebook, type pwd on its own in a cell to find out where that directory is if unsure.


Figure 2.01 Scatter plot of egg outcomes

And let’s see it now as a histogram.


Figure 2.02 Histogram of egg times by outcome

It seems I wasn’t so good at getting my eggs soft-boiled as I like them, so we see a fairly large class imbalance, with twice as many underdone instances and three times as many hard-boiled instances relative to the soft-boiled lovelies. This class imbalance can spell trouble for conventional neural networks, causing them to underperform, and imbalanced class sizes are a common finding in practice.

Note that we’re not setting density to True (False is the default, so it doesn’t need to be specified), as we’re interested in comparing actual counts. If instead we were comparing probabilities sampled from one of the three random variables, we’d set density=True to normalise the histogram so it integrates to 1.0.
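The effect of the density flag can be checked with NumPy's histogram, which uses the same semantics as Matplotlib's hist; the boil times below are illustrative, not the chapter's dataset:

```python
import numpy as np

# Hypothetical boil durations in minutes for one outcome class.
times = np.array([3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 6.0, 7.0])

# density=False (the default): raw counts per bin.
counts, edges = np.histogram(times, bins=4)
print(int(counts.sum()))  # 8, the number of observations

# density=True: bar heights are normalised so the histogram integrates to 1.
dens, edges = np.histogram(times, bins=4, density=True)
print(float((dens * np.diff(edges)).sum()))  # 1.0
```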

#editors-pick #bayesian-machine-learning #deep-learning #bayesian-neural-network #neural-networks #deep learning

Mckenzie Osiki

No Code introduction to Neural Networks

The simple architecture explained

Neural networks have been around for a long time, first developed in the 1960s as a way to simulate neural activity for the development of artificial intelligence systems. Since then, however, they have developed into a useful analytical tool, often used in place of, or in conjunction with, standard statistical models such as regression or classification, as they can be used to predict or model a specific output. The main difference, and advantage, in this regard is that neural networks make no initial assumptions about the form of the relationship or distribution that underlies the data, meaning they can be more flexible and capture non-standard and non-linear relationships between input and output variables, making them incredibly valuable in today's data-rich environment.

In this sense, their use has taken off over the past decade or so, with the fall in cost and increase in capability of general computing power, the rise of large datasets on which these models can be trained, and the development of frameworks such as TensorFlow and Keras that allow anyone with sufficient hardware (in some cases no longer even a requirement, thanks to cloud computing), the right data, and an understanding of a given coding language to implement them. This article therefore seeks to provide a no-code introduction to their architecture and how they work so that their implementation and benefits can be better understood.

Firstly, these models consist of an input layer, one or more hidden layers, and an output layer, each connected by layers of synaptic weights¹. The input layer (X) takes in scaled values of the input, usually within a standardised range of 0–1. The hidden layers (Z) then define the relationship between the input and output using weights and activation functions. The output layer (Y) transforms the results from the hidden layers into the predicted values, often also scaled to be within 0–1. The synaptic weights (W) connecting these layers are adjusted during model training to determine the weighting of each input and prediction that gives the best model fit. Visually, this is represented as:
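Although this is a no-code introduction, the X → Z → Y structure just described can be sketched in a few lines for readers who want to see it concretely; the layer sizes and random weights below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Squashes any value into the 0-1 range, a common activation function."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

X = np.array([0.2, 0.7, 0.5])    # input layer: scaled features in 0-1
W1 = rng.normal(size=(3, 4))     # synaptic weights, input -> hidden
W2 = rng.normal(size=4)          # synaptic weights, hidden -> output

Z = sigmoid(X @ W1)              # hidden layer: weighted sum + activation
Y = sigmoid(Z @ W2)              # output layer: prediction scaled to 0-1

print(0.0 < float(Y) < 1.0)      # True
```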

#machine-learning #python #neural-networks #tensorflow #neural-network-algorithm #no code introduction to neural networks

Marlon Boyle

Autonomous Driving Network (ADN) On Its Way

Talking about inspiration in the networking industry, nothing beats the Autonomous Driving Network (ADN). You may have heard of it and wondered what it is about, and whether it has anything to do with autonomous driving vehicles. Your guess is right; the ADN concept is derived from, or inspired by, the rapid development of autonomous driving cars in recent years.


Driverless Car of the Future, the advertisement for “America’s Electric Light and Power Companies,” Saturday Evening Post, the 1950s.

The vision of autonomous driving has been around for more than 70 years, but engineers' continual attempts to achieve it met with little success, and the concept stayed fiction for a long time. In 2004, the US Defense Advanced Research Projects Agency (DARPA) organized the Grand Challenge, in which teams of autonomous vehicles competed for a grand prize of $1 million. I remember watching those competing vehicles on TV: behaving as if driven by a drunk driver, they had a really tough time driving by themselves. I thought the autonomous driving vision still had a long way to go. To my surprise, the next year, 2005, Stanford University's vehicle autonomously drove 131 miles through California's Mojave desert without a scratch and took the $1 million Grand Challenge prize. How was that possible? Later I learned that the secret ingredient was the latest ML (Machine Learning) enabled AI (Artificial Intelligence) technology.

Since then, AI technologies have advanced rapidly and been implemented across all verticals. Around 2016, the concept of the Autonomous Driving Network started to emerge, combining AI and networking to achieve network operational autonomy. The automation concept is nothing new in the networking industry; network operations are continually being automated here and there. But ADN goes beyond automating mundane tasks; it reaches a whole new level. With the help of AI technologies and the advancement of other critical ingredients like SDN (Software Defined Networking), autonomous networking has a great chance of going from vision to future reality.

In this article, we will examine some critical components of ADN, the current landscape, and the factors that are important for ADN to succeed.

The Vision

At the current stage, there are different terminologies to describe ADN vision by various organizations.

Even though the terminologies differ slightly, the industry is moving toward common terms and a consensus called autonomous networks (e.g., TMF, ETSI, ITU-T, GSMA). The core vision includes business and network aspects. The autonomous network delivers a “hyper-loop” from business requirements all the way down to the network and device layers.

On the network layer, it contains the below critical aspects:

  • **Intent-Driven:** Understand the operator’s business intent and automatically translate it into the necessary network operations. An operation can be one-time, like disconnecting a connection service, or continuous, like maintaining a specified SLA (Service Level Agreement) at all times.
  • **Self-Discover:** Automatically discover hardware/software changes in the network and propagate the changes to the necessary subsystems to maintain an always-in-sync state.
  • **Self-Config/Self-Organize:** Whenever network changes happen, automatically configure the corresponding hardware/software parameters so that the network stays at the pre-defined target states.
  • **Self-Monitor:** Constantly and automatically monitor network/service operational states and health conditions.
  • **Auto-Detect:** Detect network faults, abnormalities, and intrusions automatically.
  • **Self-Diagnose:** Automatically run an inference process to figure out the root causes of issues.
  • **Self-Healing:** Automatically take the necessary actions to address issues and bring the networks/services back to the desired state.
  • **Self-Report:** Automatically communicate with its environment and exchange necessary information.
  • **Automated common operational scenarios:** Automatically perform operations like network planning, customer and service onboarding, and network change management.

On top of those, these capabilities need to span multiple services, multiple domains, and the entire lifecycle (TMF, 2019).

No doubt, this is the most ambitious goal the networking industry has ever aimed at. It has been described as the “end-state” and “ultimate goal” of networking evolution. And this is not just a vision on slides; the networking industry is already on the move toward the goal.

David Wang, Huawei’s Executive Director of the Board and President of Products & Solutions, said in his 2018 Ultra-Broadband Forum (UBBF) keynote speech (David W., 2018):

“In a fully connected and intelligent era, autonomous driving is becoming a reality. Industries like automotive, aerospace, and manufacturing are modernizing and renewing themselves by introducing autonomous technologies. However, the telecom sector is facing a major structural problem: Networks are growing year by year, but OPEX is growing faster than revenue. What’s more, it takes 100 times more effort for telecom operators to maintain their networks than OTT players. Therefore, it’s imperative that telecom operators build autonomous driving networks.”

Juniper CEO Rami Rahim said in his keynote at the company’s virtual AI event: (CRN, 2020)

“The goal now is a self-driving network. The call to action is to embrace the change. We can all benefit from putting more time into higher-layer activities, like keeping distributors out of the business. The future, I truly believe, is about getting the network out of the way. It is time for the infrastructure to take a back seat to the self-driving network.”

Is This Vision Achievable?

If you had asked me this question 15 years ago, my answer would have been “no chance,” as I could not imagine an autonomous driving vehicle being possible then. But now the vision is not far-fetched anymore, not only because of the rapid advancement of ML/AI technology but because other key building blocks have also made significant progress. To name a few:

  • software-defined networking (SDN) control
  • industry-standard models and open APIs
  • Real-time analytics/telemetry
  • big data processing
  • cross-domain orchestration
  • programmable infrastructure
  • cloud-native virtualized network functions (VNF)
  • DevOps agile development process
  • everything-as-service design paradigm
  • intelligent process automation
  • edge computing
  • cloud infrastructure
  • programming paradigm suitable for building an autonomous system, i.e., teleo-reactive programs: a set of reactive rules that continuously sense the environment and trigger actions whose continuous execution eventually leads the system to satisfy a goal (Nils Nilsson, 1996)
  • open-source solutions
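The teleo-reactive idea from the list above can be sketched as an ordered rule loop: the first rule whose condition holds fires, and the cycle repeats until the goal is satisfied. The toy network state, rules, and names below are entirely hypothetical:

```python
# A minimal teleo-reactive loop: rules are ordered by priority; on each
# cycle the first rule whose condition holds fires, until the goal is met.
def run_tr(state, rules, goal, max_steps=20):
    for _ in range(max_steps):
        if goal(state):
            return state
        for condition, action in rules:      # first matching rule fires
            if condition(state):
                state = action(state)
                break
    return state

# Toy "network" state: a link that may be down and overloaded.
rules = [
    (lambda s: not s["link_up"], lambda s: {**s, "link_up": True}),       # heal link
    (lambda s: s["load"] > 0.8,  lambda s: {**s, "load": s["load"] / 2}), # re-route
]
goal = lambda s: s["link_up"] and s["load"] <= 0.8

final = run_tr({"link_up": False, "load": 1.0}, rules, goal)
print(final)  # {'link_up': True, 'load': 0.5}
```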

#network-automation #autonomous-network #ai-in-network #self-driving-network #neural-networks

Evaluating Performance of a Neural Network


A small insurance company, Texas Giant Insurance (TGI), focuses on providing commercial and personal insurance programs to its clients. TGI is an independent insurance company with in-depth knowledge of multiple insurance products and carriers. They proactively provide service to their policyholders and present suitable products to their clients.

The goal of this project is, first, to validate that a neural network (NN) model is more accurate than other models and, second, to determine how we can leverage this information to keep customers from leaving and reclaim customers that have left TGI.


The dataset we received covered TGI customers between January 2017 and December 2019. The dataset was not properly formatted for consumption by our models, but it had no missing values. As with most insurance companies, the data is stored in a system built for accounting rather than analysis. A significant amount of time was spent learning the data features and determining which meaningful features should be extracted. After going back and forth with the client (TGI), we ended up with access to the data of 794 customers (observations). However, 81 of these observations were customers who had inquired about products and services from TGI but never ended up becoming customers. We dropped these observations, which reduced our dataset to 713 observations. Since the insurance industry is heavily regulated, I was not able to get additional demographic information about the customers and had to do the best I could with the provided dataset.


Table 1: Selected & Newly Created Features

We created new features from the provided dataset and formatted the data so that each observation is associated with one customer. One feature we created was the duration of the customer relationship (DurationAsCust), so that even if the duration or types of policies changed between years, we could capture the entire value of the customer. Another captured the significance of the customer: if the customer had multiple policies per year, we summed all those policies over the life of the customer (Total Duration).

Exploratory Data Analysis (EDA):

Most of the EDA figures, as well as histograms, a correlation plot, means, standard deviations, minimums, maximums, and other summary statistics produced as part of EDA, are provided in the report (see the PDF file in GitHub).

Figure 1 shows the split of our response (target) variable, StillCustomer (0: not a customer, 1: still a customer). Out of 713 observations, 62.8% (448) are still customers and 37.2% (265) are no longer customers. While we want a good balance between the classes in our response variable, a 63/37 split is not terrible. We did try the class-weights functionality in the sklearn library to balance the model but realized it was not making a significant impact. Therefore, we elected not to balance the data, as we did not want to make the model more complicated than necessary.
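For reference, scikit-learn's "balanced" class-weight heuristic weights each class by n_samples / (n_classes * class_count); computing it by hand with the class counts above shows the rarer (churned) class getting the larger weight:

```python
import numpy as np

# Class counts from the article: 265 churned (0), 448 still customers (1).
counts = np.array([265, 448])
n_samples, n_classes = counts.sum(), len(counts)

# scikit-learn's "balanced" heuristic: n_samples / (n_classes * count).
weights = n_samples / (n_classes * counts)
print(np.round(weights, 3))  # [1.345 0.796]
```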


Figure 1: Proportion of Customers

One feature we created looked at whether, during the life of the customer, they ever paid a premium in full instead of financing it or paying it in installments. Since we did not have any socioeconomic information about the customers, we wanted to derive anything indicative of their economic standing. Figure 2 shows an even split, among customers who are no longer active, between those who ever paid their premium in full and those who did not. Among those who are still customers, however, a large portion paid their premiums in full at least once during their lifetime at TGI.


Figure 2: Comparing Customers with having Paid Full Premium Before

Duration of a customer and total value derived from a customer are quite important when looking for ways to improve customer experience and ultimately increase revenue. Figure 3 shows a Kernel Density Estimation (KDE) plot estimating the Probability Density Function (PDF) of duration in months, compared by whether the customer is still active. Interestingly, the two class densities intersect at approximately 40 months. Further analysis would be required to gauge whether that intersection reflects the type of service those customers received at that time.
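A KDE like the one in Figure 3 is just an average of Gaussian bumps centred on the data points; a minimal NumPy sketch with made-up durations (not the TGI data):

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Kernel density estimate: average of Gaussian bumps centred on samples."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Hypothetical customer durations in months for one class.
durations = np.array([6.0, 12.0, 18.0, 36.0, 40.0, 44.0, 60.0])
grid = np.linspace(0.0, 80.0, 801)

pdf = gaussian_kde(durations, grid, bandwidth=5.0)

# The estimated density integrates to ~1 over the grid, as a PDF should.
print(round(float(pdf.sum() * (grid[1] - grid[0])), 1))  # 1.0
```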

#data-science #customer-churn #bayesian-optimization #neural-networks #pycaret #neural networks