This repository provides an implementation of an aesthetic and technical image quality model based on Google's research paper "NIMA: Neural Image Assessment". You can find a quick introduction on their Research Blog.
NIMA consists of two models that aim to predict the aesthetic and technical quality of images, respectively. The models are trained via transfer learning, where ImageNet pre-trained CNNs are used and fine-tuned for the classification task.
For more information on how we used NIMA for our specific problem, see our write-up in two blog posts:
The provided code allows you to use any of the pre-trained models in Keras. We further provide Docker images for local CPU training and remote GPU training on AWS EC2, as well as pre-trained models on the AVA and TID2013 datasets.
Read the full documentation at: https://idealo.github.io/image-quality-assessment/.
Image quality assessment is compatible with Python 3.6 and is distributed under the Apache 2.0 license. We welcome all kinds of contributions, especially new model architectures and/or hyperparameter combinations that improve the performance of the currently published models (see Contribute).
We provide trained models, for both aesthetic and technical classifications, that use MobileNet as the base CNN. The models and their respective config files are stored under models/MobileNet. They achieve the following performance:
Model | Dataset | EMD | LCC | SRCC |
---|---|---|---|---|
MobileNet aesthetic | AVA | 0.071 | 0.626 | 0.609 |
MobileNet technical | TID2013 | 0.107 | 0.652 | 0.675 |
Install jq
Install Docker
Build the Docker image:
docker build -t nima-cpu . -f Dockerfile.cpu
In order to train remotely on AWS EC2:
Install Docker Machine
Install AWS Command Line Interface
In order to run predictions on an image or a batch of images, you can run the prediction script:
Single image file
./predict \
--docker-image nima-cpu \
--base-model-name MobileNet \
--weights-file $(pwd)/models/MobileNet/weights_mobilenet_technical_0.11.hdf5 \
--image-source $(pwd)/src/tests/test_images/42039.jpg
All image files in a directory
./predict \
--docker-image nima-cpu \
--base-model-name MobileNet \
--weights-file $(pwd)/models/MobileNet/weights_mobilenet_technical_0.11.hdf5 \
--image-source $(pwd)/src/tests/test_images
Download dataset (see instructions under Datasets)
Run the local training script (e.g. for TID2013 dataset)
./train-local \
--config-file $(pwd)/models/MobileNet/config_technical_cpu.json \
--samples-file $(pwd)/data/TID2013/tid_labels_train.json \
--image-dir /path/to/image/dir/local
This will start a training container from the Docker image nima-cpu and create a timestamped train job folder under train_jobs, where the trained model weights and logs will be stored. The --image-dir argument requires the path of the image directory on your local machine.
In order to stop the last launched container, run:
CONTAINER_ID=$(docker ps -l -q)
docker container stop $CONTAINER_ID
In order to stream logs from the last launched container, run:
CONTAINER_ID=$(docker ps -l -q)
docker logs $CONTAINER_ID --follow
Configure your AWS CLI. Ensure that your account has limits for GPU instances and read/write access to the S3 bucket specified in the config file:
aws configure
Launch an EC2 instance with Docker Machine. Choose an Ubuntu AMI based on your region (https://cloud-images.ubuntu.com/locator/ec2/). For example, to launch a p2.xlarge EC2 instance named ec2-p2, run (NB: change the region, VPC ID and AMI ID as per your setup):
docker-machine create --driver amazonec2 \
--amazonec2-region eu-west-1 \
--amazonec2-ami ami-58d7e821 \
--amazonec2-instance-type p2.xlarge \
--amazonec2-vpc-id vpc-abc \
ec2-p2
SSH into the EC2 instance:
docker-machine ssh ec2-p2
Update NVIDIA drivers and install nvidia-docker (see this blog post for more details)
# update NVIDIA drivers
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt-get update
sudo apt-get install -y nvidia-375 nvidia-settings nvidia-modprobe
# install nvidia-docker
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb && rm /tmp/nvidia-docker_1.0.1-1_amd64.deb
Download the dataset to the EC2 instance (see instructions under Datasets). We recommend saving the AMI with the downloaded data for future use.
Run the remote EC2 training script (e.g. for AVA dataset)
./train-ec2 \
--docker-machine ec2-p2 \
--config-file $(pwd)/models/MobileNet/config_aesthetic_gpu.json \
--samples-file $(pwd)/data/AVA/ava_labels_train.json \
--image-dir /path/to/image/dir/remote
The training progress will be streamed to your terminal. After the training has finished, the train outputs (logs and best model weights) will be stored on S3 in a timestamped folder. The S3 output bucket can be specified in the config file. The --image-dir argument requires the path of the image directory on your remote instance.
We welcome all kinds of contributions and will publish the performance of new models in the performance table under Trained models.
For example, to train a new aesthetic NIMA model based on InceptionV3 ImageNet weights, you just have to change the base_model_name parameter in the config file models/MobileNet/config_aesthetic_gpu.json to "InceptionV3". You can also control all major hyperparameters in the config file, like learning rate, batch size, or dropout rate.
See the Contribution guide for more details.
This project uses two datasets to train the NIMA model:
For training on AWS EC2 we recommend building a custom AMI with the AVA images stored on it. This has proven much more viable than copying the entire dataset from S3 to the instance for each training job.
The train script requires JSON label files in the format
[
{
"image_id": "231893",
"label": [2,8,19,36,76,52,16,9,3,2]
},
{
"image_id": "746672",
"label": [1,2,7,20,38,52,20,11,1,3]
},
...
]
The label for each image is the normalized or un-normalized frequency distribution of ratings 1-10.
For the AVA dataset these frequency distributions are given in the raw data files. For the TID2013 dataset we inferred the normalized frequency distribution, i.e. probability distribution, by finding the maximum entropy distribution that satisfies the mean score. The code to generate the TID2013 labels can be found under data/TID2013/get_labels.py.
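As an illustration of that idea (a sketch of the maximum entropy construction, not the code in data/TID2013/get_labels.py): on the scores 1-10, the maximum entropy distribution with a fixed mean has the form p_i proportional to exp(lambda * i), where lambda is chosen so the expected score matches the mean opinion score.

import numpy as np
from scipy.optimize import brentq

def max_entropy_distribution(mean_score, scores=np.arange(1, 11)):
    # p_i is proportional to exp(lam * score_i); solve for lam so that E[score] = mean_score
    def mean_gap(lam):
        w = np.exp(lam * scores)
        p = w / w.sum()
        return p @ scores - mean_score
    lam = brentq(mean_gap, -10, 10)
    w = np.exp(lam * scores)
    return w / w.sum()

print(max_entropy_distribution(5.2))  # sums to 1, mean equals 5.2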
For both datasets we provide train and test set label files stored under
data/AVA/ava_labels_train.json
data/AVA/ava_labels_test.json
and
data/TID2013/tid2013_labels_train.json
data/TID2013/tid2013_labels_test.json
For the AVA dataset we randomly assigned 90% of samples to the train set and 10% to the test set; throughout training, a 5% validation set is split from the training set to evaluate the training performance after each epoch. For the TID2013 dataset we split the train/test sets by reference image, to ensure that no reference image, or any of its distortions, appears in both the train and test set.
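A rough sketch of that splitting logic (illustrative only, not the repository's code):

import random

samples = list(range(1000))              # stand-in for the list of AVA samples
random.shuffle(samples)

split = int(0.9 * len(samples))          # 90% train, 10% test
train, test = samples[:split], samples[split:]

val_split = int(0.95 * len(train))       # hold out 5% of the train set for validation
train, val = train[:val_split], train[val_split:]
print(len(train), len(val), len(test))   # 855 45 100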
TensorFlow versions of both the technical and aesthetic MobileNet models are provided, along with the script to generate them from the original Keras files, under the contrib/tf_serving directory. There is also an already configured TFS Dockerfile that you can use.
To get predictions from the aesthetic or technical model:
docker build -t tfs_nima contrib/tf_serving
docker run -d --name tfs_nima -p 8500:8500 tfs_nima
virtualenv -p python3 contrib/tf_serving/venv_tfs_nima
source contrib/tf_serving/venv_tfs_nima/bin/activate
pip install -r contrib/tf_serving/requirements.txt
python -m contrib.tf_serving.tfs_sample_client --image-path src/tests/test_images/42039.jpg --model-name mobilenet_aesthetic
python -m contrib.tf_serving.tfs_sample_client --image-path src/tests/test_images/42039.jpg --model-name mobilenet_technical
Please cite Image Quality Assessment in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{idealods2018imagequalityassessment,
title={Image Quality Assessment},
author={Christopher Lennan and Hao Nguyen and Dat Tran},
year={2018},
howpublished={\url{https://github.com/idealo/image-quality-assessment}},
}
Author: idealo
Source Code: https://github.com/idealo/image-quality-assessment
License: Apache-2.0 license
#machinelearning #python #aws #computervision #deeplearning #tensorflow
A simple yet comprehensive approach to the concepts
Convolutional Neural Networks
Artificial intelligence has seen tremendous growth over the last few years. The gap between machines and humans is slowly but steadily decreasing. One important difference between humans and machines is (or rather was!) the human perception of images and sound. How do we train a machine to recognize images and sound as we do?
At this point we can ask ourselves a few questions!!!
How would machines perceive images and sound?
How would machines be able to differentiate between different images, for example between a cat and a dog?
Can machines identify and differentiate between different human beings, for example tell a male from a female, or identify Leonardo DiCaprio or Brad Pitt just by being fed their images?
Let’s attempt to find out!!!
The Colour coding system:
Let's get a basic idea of what the colour coding system for machines is.
RGB decimal system: It is denoted as rgb(255, 0, 0). It consists of three channels representing RED, GREEN and BLUE respectively. RGB defines how much red, green or blue you'd like to have displayed, as a decimal value somewhere between 0, which is no representation of the colour, and 255, the highest possible concentration of the colour. So, in the example rgb(255, 0, 0), we'd get a very bright red. If we wanted all green, our RGB would be rgb(0, 255, 0). For a simple blue, it would be rgb(0, 0, 255). As all colours can be obtained as a combination of red, green and blue, we can obtain the coding for any colour we want.
Grayscale: Grayscale consists of just one channel (0 to 255), with 0 representing black and 255 representing white. The values in between represent the different shades of gray.
Computers ‘see’ in a different way than we do. Their world consists of only numbers.
Every image can be represented as an array of numbers, known as pixels: a 2-dimensional grid for a grayscale image, and one such grid per colour channel for a colour image.
But the fact that they perceive images in a different way, doesn’t mean we can’t train them to recognize patterns, like we do. We just have to think of what an image is in a different way.
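As a small illustration (a toy NumPy example, not from the original post), here is what those numbers look like to a machine:

import numpy as np

red_pixel = np.array([255, 0, 0], dtype=np.uint8)     # rgb(255, 0, 0): a bright red pixel
blue_pixel = np.array([0, 0, 255], dtype=np.uint8)    # rgb(0, 0, 255): a simple blue pixel

# A tiny 2x2 grayscale image: one channel, 0 = black, 255 = white
gray_image = np.array([[0, 128],
                       [200, 255]], dtype=np.uint8)

# A colour image stacks the channels: height x width x 3 (R, G, B)
color_image = np.zeros((2, 2, 3), dtype=np.uint8)
color_image[0, 0] = red_pixel                          # top-left pixel is red
print(gray_image.shape, color_image.shape)             # (2, 2) and (2, 2, 3)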
Now that we have a basic idea of how images can be represented, let us try to understand the architecture of a CNN.
CNN architecture
Convolutional Neural Networks have a different architecture than regular Neural Networks. Regular Neural Networks transform an input by putting it through a series of hidden layers. Every layer is made up of a set of neurons, where each layer is fully connected to all neurons in the layer before. Finally, there is a last fully-connected layer — the output layer — that represent the predictions.
Convolutional Neural Networks are a bit different. First of all, the layers are organised in 3 dimensions: width, height and depth. Further, the neurons in one layer do not connect to all the neurons in the next layer but only to a small region of it. Lastly, the final output will be reduced to a single vector of probability scores, organised along the depth dimension.
A typical CNN architecture
As can be seen above CNNs have two components:
In this part, the network will perform a series of **convolutions** and pooling operations during which the features are detected. If you had a picture of a tiger, this is the part where the network would recognize the stripes, four legs, two eyes, one nose, the distinctive orange colour, and so on.
Here, the fully connected layers will serve as a classifier on top of these extracted features. They will assign a **probability** for the object in the image being what the algorithm predicts it is.
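To make those two components concrete, here is a minimal Keras sketch (the 64x64x3 input size and the 10 output classes are arbitrary choices for illustration):

from tensorflow.keras import layers, models

model = models.Sequential([
    # Feature extraction: convolutions detect local patterns, pooling shrinks the feature maps
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Classification: flatten the feature maps and map them to class probabilities
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.summary()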
Before we proceed any further, we need to understand what "convolution" means; we will come back to the architecture later:
What do we mean by the “convolution” in Convolutional Neural Networks?
Let us decode!!!
#convolutional-neural-net #convolution #computer-vision #neural networks
Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. RNN models are mostly used in the fields of natural language processing and speech recognition.
The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. They happen because it is difficult to capture long-term dependencies: the multiplicative gradient can decrease or increase exponentially with respect to the number of layers.
Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU.
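For a quick feel of these layers in Keras (an illustrative snippet with arbitrary sizes):

import tensorflow as tf

x = tf.random.normal((1, 120, 16))                                 # (batch, timesteps, features)
print(tf.keras.layers.GRU(32)(x).shape)                            # (1, 32): last hidden state
print(tf.keras.layers.LSTM(32, return_sequences=True)(x).shape)    # (1, 120, 32): full sequence of hidden states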
A 1D convolution layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. It is very effective for deriving features from a fixed-length segment of the overall dataset. A 1D CNN works well for natural language processing (NLP).
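For example, a Conv1D layer sliding a kernel of size 5 over a sequence of word embeddings (illustrative shapes):

import tensorflow as tf

x = tf.random.normal((1, 120, 16))          # (batch, sequence length, embedding dim)
conv = tf.keras.layers.Conv1D(filters=32, kernel_size=5, activation='relu')
print(conv(x).shape)                         # (1, 116, 32): 120 - 5 + 1 positions, 32 filters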
TensorFlow Datasets is a collection of datasets ready to use with TensorFlow or other Python ML frameworks, such as Jax. All datasets are exposed as [_tf.data.Datasets_](https://www.tensorflow.org/api_docs/python/tf/data/Dataset), enabling easy-to-use and high-performance input pipelines.
“imdb_reviews”
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import tensorflow as tf
import tensorflow_datasets

# Load the IMDB reviews dataset (25,000 training and 25,000 test reviews with labels)
imdb, info = tensorflow_datasets.load("imdb_reviews", with_info=True, as_supervised=True)
imdb
info
train_data, test_data = imdb['train'], imdb['test']

# Extract the raw review texts and labels into Python lists
training_sentences = []
training_label = []
testing_sentences = []
testing_label = []
for s, l in train_data:
    training_sentences.append(str(s.numpy()))
    training_label.append(l.numpy())
for s, l in test_data:
    testing_sentences.append(str(s.numpy()))
    testing_label.append(l.numpy())
training_label_final = np.array(training_label)
testing_label_final = np.array(testing_label)

# Tokenization hyperparameters
vocab_size = 10000
embedding_dim = 16
max_length = 120
trunc_type = 'post'
oov_tok = '<oov>'

# Fit the tokenizer on the training texts and convert both sets to padded sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length, truncating=trunc_type)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length)

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding
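The snippet above stops at the model imports. Purely as an illustration of how they might be used (the LSTM layer and all layer sizes below are assumptions, not the author's exact model):

from tensorflow.keras.layers import LSTM

model = Sequential([
    Embedding(vocab_size, embedding_dim, input_length=max_length),
    LSTM(32),
    Dropout(0.5),
    Dense(24, activation='relu'),
    Dense(1, activation='sigmoid'),          # binary sentiment: positive / negative
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(padded, training_label_final,
          epochs=5,
          validation_data=(testing_padded, testing_label_final))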
#imdb #convolutional-network #long-short-term-memory #recurrent-neural-network #gated-recurrent-unit #neural networks
In this post, we’re gonna take a close look at one of the well-known Graph neural networks named GCN. First, we’ll get the intuition to see how it works, then we’ll go deeper into the maths behind it.
Many problems are graphs in their true nature. In our world, much of the data we see forms graphs, such as molecules, social networks, and paper citation networks.
Examples of graphs. (Picture from [1])
In the graph, we have node features (the data of nodes) and the structure of the graph (how nodes are connected).
For the former, we can easily get the data from each node. But when it comes to the structure, it is not trivial to extract useful information from it. For example, if 2 nodes are close to one another, should we treat them differently to other pairs? How about high and low degree nodes? In fact, each specific task can consume a lot of time and effort just for Feature Engineering, i.e., to distill the structure into our features.
Feature engineering on graphs. (Picture from [1])
It would be much better to somehow get both the node features and the structure as the input, and let the machine figure out what information is useful by itself.
That’s why we need Graph Representation Learning.
We want the graph to learn the "feature engineering" by itself. (Picture from [1])
Paper: Semi-supervised Classification with Graph Convolutional Networks (2017) [3]
GCN is a type of convolutional neural network that can work directly on graphs and take advantage of their structural information.
It solves the problem of classifying nodes (such as documents) in a graph (such as a citation network), where labels are only available for a small subset of nodes (semi-supervised learning).
Example of semi-supervised learning on graphs. Some nodes don't have labels (unknown nodes).
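A tiny NumPy sketch of a single GCN propagation step from the paper, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), with the graph, features and weights below chosen arbitrarily for illustration:

import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)      # adjacency matrix of a 3-node path graph
H = np.random.randn(3, 4)                   # node features: 3 nodes, 4 features each
W = np.random.randn(4, 2)                   # learnable weight matrix: 4 -> 2 hidden units

A_hat = A + np.eye(3)                                             # add self-loops
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)                   # normalise by node degree
H_next = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)   # ReLU activation
print(H_next.shape)                                               # (3, 2): new features per node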
#graph-neural-networks #graph-convolution-network #deep-learning #neural-networks
Neural networks have been around for a long time, having been developed in the 1960s as a way to simulate neural activity for the development of artificial intelligence systems. However, since then they have developed into a useful analytical tool, often used in place of, or in conjunction with, standard statistical models such as regression or classification, as they can be used to predict or model a specific output. The main difference, and advantage, in this regard is that neural networks make no initial assumptions as to the form of the relationship or distribution that underlies the data, meaning they can be more flexible and capture non-standard and non-linear relationships between input and output variables, making them incredibly valuable in today's data-rich environment.
In this sense, their use has taken off over the past decade or so, with the fall in costs and increase in capability of general computing power, the rise of large datasets allowing these models to be trained, and the development of frameworks such as TensorFlow and Keras. These have allowed people with sufficient hardware (in some cases this is no longer even a requirement thanks to cloud computing), the correct data and an understanding of a given coding language to implement them. This article therefore seeks to provide a no-code introduction to their architecture and how they work so that their implementation and benefits can be better understood.
Firstly, the way these models work is that there is an input layer, one or more hidden layers and an output layer, each of which are connected by layers of synaptic weights¹. The input layer (X) is used to take in scaled values of the input, usually within a standardised range of 0–1. The hidden layers (Z) are then used to define the relationship between the input and output using weights and activation functions. The output layer (Y) then transforms the results from the hidden layers into the predicted values, often also scaled to be within 0–1. The synaptic weights (W) connecting these layers are used in model training to determine the weights assigned to each input and prediction in order to get the best model fit.
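Although the article itself is no-code, a tiny numerical sketch of that forward pass may help (the sizes and the sigmoid activation are illustrative assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0.2, 0.7, 0.1]])     # input layer: one sample, three scaled inputs in 0-1
W1 = np.random.randn(3, 4)          # synaptic weights: input -> hidden
W2 = np.random.randn(4, 1)          # synaptic weights: hidden -> output

Z = sigmoid(X @ W1)                 # hidden layer activations
Y = sigmoid(Z @ W2)                 # predicted output, scaled to 0-1
print(Y)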
#machine-learning #python #neural-networks #tensorflow #neural-network-algorithm #no code introduction to neural networks
I am not a deep learning researcher, but I’ve come to know a few things about neural networks through various exposures. I’ve always heard that CNN is a type of neural network that’s particularly good at image-related problems. But, what does that really mean? What’s with the word “convolutional”? What’s so unusual about an image-related problem that a different network is required?
Recently I had the opportunity to work on a COVID-19 image classification problem and built a CNN-based classifier using tensorflow.keras that achieved an 85% accuracy rate. Finally, I think I've figured out the answers to those questions. Let me share those answers with you in a math-free way. If you are already familiar with CNNs, this post should feel like a good refresher. If not, take a look; you could gain an intuitive understanding of the motivation behind CNNs and the unique features that define a CNN.
#deep-learning #convolutional-network #data-science #machine-learning #neural-networks