Easy Image Classification with TensorFlow 2.0

TensorFlow 2.0 enables eager execution by default, without sacrificing the performance optimizations of graph-based execution, and brings tighter Keras integration as the high-level API.

Image Classification is one of the fundamental supervised tasks in the world of machine learning. TensorFlow’s new 2.0 release provides a totally new development ecosystem with Eager Execution enabled by default. I assume most TF developers had a hard time with TF 2.0 at first, as we were so used to tf.Session and tf.placeholder that we couldn’t imagine TensorFlow without them.

Today, we start with simple image classification without using TF Keras, so that we can take a look at the new API changes in TensorFlow 2.0.

You can take a look at the Colab notebook for this story.


Let’s import the data. For simplicity, we will use TensorFlow Datasets.

Data pipelines can be frustrating (sometimes!).

We want to play around with the low-level TF APIs rather than with input pipelines, so we import a well-designed dataset directly from TensorFlow Datasets. We will use the Horses Or Humans dataset.

img_classify_tf2.py

import tensorflow_datasets as tfds

batch_size = 32  # was undefined in the original snippet; any reasonable value works

dataset_name = 'horses_or_humans'

dataset = tfds.load( name=dataset_name , split=tfds.Split.TRAIN )
dataset = dataset.shuffle( 1024 ).batch( batch_size )

We can get a number of datasets readily available with TF Datasets.
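For instance, we can list every available dataset builder by name (a minimal sketch):

import tensorflow_datasets as tfds

# Print the names of all dataset builders registered in TF Datasets.
print( tfds.list_builders() )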

Defining the model and related ops.

Remember what we needed for a CNN in Keras: Conv2D, MaxPooling2D, Flatten and Dense layers, right? We will create these layers ourselves using the tf.nn module.

img_classify_tf2_1.py

import tensorflow as tf

leaky_relu_alpha = 0.2
dropout_rate = 0.5
padding = 'SAME'  # was undefined in the original snippet; 'SAME' preserves spatial size

def conv2d( inputs , filters , stride_size ):
    out = tf.nn.conv2d( inputs , filters , strides=[ 1 , stride_size , stride_size , 1 ] , padding=padding )
    return tf.nn.leaky_relu( out , alpha=leaky_relu_alpha )

def maxpool( inputs , pool_size , stride_size ):
    return tf.nn.max_pool2d( inputs , ksize=[ 1 , pool_size , pool_size , 1 ] , padding='VALID' , strides=[ 1 , stride_size , stride_size , 1 ] )

def dense( inputs , weights ):
    x = tf.nn.leaky_relu( tf.matmul( inputs , weights ) , alpha=leaky_relu_alpha )
    return tf.nn.dropout( x , rate=dropout_rate )

Also, we would require some weights. The shapes for our kernels ( filters ) need to be calculated.

img_classify_tf2_2.py

initializer = tf.initializers.glorot_uniform()
output_classes = 2  # was undefined in the original snippet; horses_or_humans has two classes

def get_weight( shape , name ):
    return tf.Variable( initializer( shape ) , name=name , trainable=True , dtype=tf.float32 )

shapes = [
    [ 3 , 3 , 3 , 16 ] , 
    [ 3 , 3 , 16 , 16 ] , 
    [ 3 , 3 , 16 , 32 ] , 
    [ 3 , 3 , 32 , 32 ] ,
    [ 3 , 3 , 32 , 64 ] , 
    [ 3 , 3 , 64 , 64 ] ,
    [ 3 , 3 , 64 , 128 ] , 
    [ 3 , 3 , 128 , 128 ] ,
    [ 3 , 3 , 128 , 256 ] , 
    [ 3 , 3 , 256 , 256 ] ,
    [ 3 , 3 , 256 , 512 ] , 
    [ 3 , 3 , 512 , 512 ] ,
    [ 8192 , 3600 ] , 
    [ 3600 , 2400 ] ,
    [ 2400 , 1600 ] , 
    [ 1600 , 800 ] ,
    [ 800 , 64 ] ,
    [ 64 , output_classes ] ,
]

weights = []
for i in range( len( shapes ) ):
    weights.append( get_weight( shapes[ i ] , 'weight{}'.format( i ) ) )

Note the trainable=True argument to tf.Variable. It is the default, but we set it explicitly: only trainable variables are watched automatically by the gradient tape, so only they receive gradients during differentiation.

Each weight is a tf.Variable with the trainable=True parameter, which is important. Also, in TF 2.0, we get the tf.initializers module, which makes it easier to initialize weights for neural networks. We collect our weights in a weights list, which will later be used with tf.optimizers.Adam for optimization.
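As a minimal sketch of this behaviour (the variables here are illustrative, not part of the model):

import tensorflow as tf

v = tf.Variable( 3.0 , trainable=True )
c = tf.Variable( 2.0 , trainable=False )

with tf.GradientTape() as tape:
    y = v * v * c

# Only the trainable variable is watched automatically by the tape;
# the gradient for the non-trainable one comes back as None.
print( tape.gradient( y , [ v , c ] ) )  # [ 12.0 , None ]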

Now, we assemble all the ops together to have a Keras-like model.

img_classify_tf2_3.py

def model( x ) :
    x = tf.cast( x , dtype=tf.float32 )  # images from TF Datasets arrive as uint8
    c1 = conv2d( x , weights[ 0 ] , stride_size=1 ) 
    c1 = conv2d( c1 , weights[ 1 ] , stride_size=1 ) 
    p1 = maxpool( c1 , pool_size=2 , stride_size=2 )
    
    c2 = conv2d( p1 , weights[ 2 ] , stride_size=1 )
    c2 = conv2d( c2 , weights[ 3 ] , stride_size=1 ) 
    p2 = maxpool( c2 , pool_size=2 , stride_size=2 )
    
    c3 = conv2d( p2 , weights[ 4 ] , stride_size=1 ) 
    c3 = conv2d( c3 , weights[ 5 ] , stride_size=1 ) 
    p3 = maxpool( c3 , pool_size=2 , stride_size=2 )
    
    c4 = conv2d( p3 , weights[ 6 ] , stride_size=1 )
    c4 = conv2d( c4 , weights[ 7 ] , stride_size=1 )
    p4 = maxpool( c4 , pool_size=2 , stride_size=2 )

    c5 = conv2d( p4 , weights[ 8 ] , stride_size=1 )
    c5 = conv2d( c5 , weights[ 9 ] , stride_size=1 )
    p5 = maxpool( c5 , pool_size=2 , stride_size=2 )

    c6 = conv2d( p5 , weights[ 10 ] , stride_size=1 )
    c6 = conv2d( c6 , weights[ 11 ] , stride_size=1 )
    p6 = maxpool( c6 , pool_size=2 , stride_size=2 )

    flatten = tf.reshape( p6 , shape=( tf.shape( p6 )[0] , -1 ))  # ( batch , 8192 )

    d1 = dense( flatten , weights[ 12 ] )
    d2 = dense( d1 , weights[ 13 ] )
    d3 = dense( d2 , weights[ 14 ] )
    d4 = dense( d3 , weights[ 15 ] )
    d5 = dense( d4 , weights[ 16 ] )
    logits = tf.matmul( d5 , weights[ 17 ] )

    return tf.nn.softmax( logits )

Q. Why are we declaring the model as a function? Later on, we will pass a batch of data to this function and get the outputs. We do not use Session, as eager execution is enabled by default. See this guide.
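For example, with eager execution a forward pass on one batch is just a function call (a minimal sketch; dataset and model are the objects defined above):

for features in dataset.take( 1 ):
    preds = model( features[ 'image' ] )
    print( preds.shape )  # ( batch_size , output_classes )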

The loss function is easy.

def loss( pred , target ):
    return tf.losses.categorical_crossentropy( target , pred )

Next comes the part that is most confusing for a beginner (it was for me too!). We will use tf.GradientTape for optimizing the model.

img_classify_tf2_4.py

learning_rate = 0.001  # was undefined in the original snippet; a reasonable default for Adam

optimizer = tf.optimizers.Adam( learning_rate )

def train_step( model , inputs , outputs ):
    with tf.GradientTape() as tape:
        current_loss = loss( model( inputs ) , outputs )
    grads = tape.gradient( current_loss , weights )
    optimizer.apply_gradients( zip( grads , weights ) )
    print( tf.reduce_mean( current_loss ) )

num_epochs = 256

for e in range( num_epochs ):
    for features in dataset:
        image , label = features[ 'image' ] , features[ 'label' ]
        train_step( model , image , tf.one_hot( label , depth=output_classes ) )

What’s happening here?

  1. We declare a tf.GradientTape, and within its scope we call the model() and loss() methods. All operations executed inside the scope are recorded, so they can be differentiated during backpropagation.
  2. We obtain the gradients using the tape.gradient method.
  3. We update the weights using the optimizer.apply_gradients method (earlier we used optimizer.minimize, which is still available).

Read more about it here.

TensorFlow.js Image Classification Made Easy

In this video you're going to discover an easy way to train a convolutional neural network for image classification, and then use the created TensorFlow.js image classifier to score x-ray images locally in your web browser.

TensorFlow.js is a great JavaScript-based machine learning framework for running your machine learning models locally in the web browser, as well as on your server using Node.js. But defining your model structure and training it is far more complex than just using a trained model. Azure Custom Vision, one of the various Cognitive Services, offers you an easy way to avoid this hassle.

Build Your Own Image Classifier In Tensorflow

A Convolutional Neural Network (CNN) is a special type of deep neural network that performs impressively in computer vision problems such as image classification, object detection, etc. In this article, we are going to create an image classifier with Tensorflow by implementing a CNN to classify cats & dogs.

With traditional programming, it is not possible to build scalable solutions for problems like computer vision, since it is not feasible to write an algorithm that is generalized enough to identify the nature of images. With machine learning, we can build an approximation that is sufficient for such use-cases by training a model on given examples and predicting on unseen data.

How do CNNs work?

A CNN is constructed from multiple convolution layers, pooling layers, and dense layers.

The idea of the convolution layer is to transform the input image in order to extract features (e.g. the ears, nose, and legs of cats & dogs) that distinguish the classes correctly. This is done by convolving the image with a kernel. Each kernel is specialized to extract certain features, and multiple kernels can be applied to a single image to capture multiple features.


How a kernel is applied to an image to extract features
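As a minimal sketch of a single convolution (the kernel values here are an illustrative edge-detection filter, not taken from the article):

import tensorflow as tf

# A 3x3 edge-detection kernel, reshaped to ( H , W , in_channels , out_channels ).
kernel = tf.constant( [ [ -1. , -1. , -1. ] ,
                        [ -1. ,  8. , -1. ] ,
                        [ -1. , -1. , -1. ] ] )
kernel = tf.reshape( kernel , [ 3 , 3 , 1 , 1 ] )

# A dummy grayscale image batch: 1 image of 28x28 with 1 channel.
image = tf.random.normal( [ 1 , 28 , 28 , 1 ] )

features = tf.nn.conv2d( image , kernel , strides=1 , padding='SAME' )
print( features.shape )  # ( 1 , 28 , 28 , 1 )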

Usually, an activation function (e.g. tanh, relu) is applied to the convolved values to introduce non-linearity.

The job of the pooling layer is to reduce the image size. It keeps only the most important features and discards the rest of the image, which also reduces the computational cost. The most popular pooling strategies are max-pooling and average-pooling.

The size of the pooling window determines the image reduction; e.g. a 2x2 window reduces each spatial dimension of the image by 50%.

How max-pooling and average-pooling work
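As a minimal sketch of both pooling strategies on a dummy input:

import tensorflow as tf

x = tf.random.normal( [ 1 , 4 , 4 , 1 ] )

# 2x2 max-pooling with stride 2 halves each spatial dimension: 4x4 -> 2x2.
max_pooled = tf.nn.max_pool2d( x , ksize=2 , strides=2 , padding='VALID' )

# Average-pooling keeps the mean of each 2x2 window instead of the maximum.
avg_pooled = tf.nn.avg_pool2d( x , ksize=2 , strides=2 , padding='VALID' )

print( max_pooled.shape , avg_pooled.shape )  # ( 1 , 2 , 2 , 1 ) ( 1 , 2 , 2 , 1 )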

This series of convolution and pooling layers helps to identify the features, and it is followed by dense layers for learning and prediction.


Layers of a CNN

Building the Image Classifier

A CNN is a deep neural network that needs considerable computation power for training. Moreover, to obtain sufficient accuracy, there should be a large dataset in order to construct a model that generalizes to unseen data. Hence I am running the code in Google Colab, a platform for research purposes. Colab supports GPU-enabled hardware, which gives a huge boost for training as well.

Download and load the dataset

This dataset contains 2000 jpg images of cats and dogs. First, we need to download the dataset and extract it (here, the data is downloaded to the /tmp directory of the Colab instance).


Downloading dataset


Extracting the dataset
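A minimal sketch of these two steps; the source URL is an assumption based on the commonly used cats_and_dogs_filtered archive from the TensorFlow tutorials, since the article does not state it:

import urllib.request
import zipfile

# Assumed source URL; substitute the actual dataset location if it differs.
url = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
urllib.request.urlretrieve( url , '/tmp/cats_and_dogs_filtered.zip' )

# Extract the archive into /tmp.
with zipfile.ZipFile( '/tmp/cats_and_dogs_filtered.zip' , 'r' ) as zip_ref:
    zip_ref.extractall( '/tmp' )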

The above code segments download the dataset and extract it to the /tmp directory. The extracted directory has two subdirectories named train and validation, which hold the training and testing data. Inside both of those directories there are subdirectories for cats and dogs as well. We can easily load the training and testing data for the two classes with the TensorFlow image data generator.

Setting the paths of testing and validation images
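A minimal sketch, assuming the extracted directory layout described above:

import os

base_dir = '/tmp/cats_and_dogs_filtered'  # assumed extraction path
train_dir = os.path.join( base_dir , 'train' )
validation_dir = os.path.join( base_dir , 'validation' )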


Load data with Tensorflow image generator
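A minimal sketch of the two generators, using the directory paths assumed above:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1] to normalize the inputs.
train_datagen = ImageDataGenerator( rescale=1./255 )
test_datagen = ImageDataGenerator( rescale=1./255 )

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),  # every image is resized to 150x150
    batch_size=20,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')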

Here we have two data generators, for the train and test data. When loading the data, a rescaling is applied to normalize the pixel values, so the model converges faster. Moreover, the data is loaded in batches of 20 images, and every image is resized to 150x150; if the source images come in different sizes, this fixes it.

Constructing the model

Since the data is ready, we can now build the model. Here I am going to add 3 convolutional layers, each followed by a max-pooling layer. Then there is a Flatten layer, and finally there are 2 dense layers.


Construct the CNN model
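A minimal sketch of such a model; the filter counts of the second and third convolution layers and the width of the first dense layer are assumptions, since only the first layer is described exactly below:

import tensorflow as tf

model = tf.keras.models.Sequential([
    # First convolution layer: 16 kernels of size 3x3 on 150x150 RGB input.
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),  # assumed filter count
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),  # assumed filter count
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),  # assumed width
    tf.keras.layers.Dense(1, activation='sigmoid')  # single output neuron for binary classification
])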

In the first convolution layer, I have added 16 kernels of size 3x3. Once the image is convolved with a kernel, it is passed through a relu activation to obtain non-linearity. The input shape of this layer is 150x150x3, since we resized the images to 150x150 and all of them are color images with 3 RGB channels.

In the max-pooling layer, I have added a 2x2 window, so the maximum value is kept while each image dimension is reduced by 50%.

There are 3 such layer pairs (convolution and max-pooling) to extract the features of the images. If very complex features need to be learned, more layers should be added, making the model much deeper.

The Flatten layer takes the output of the previous max-pooling layer and converts it to a 1D array so that it can be fed into the Dense layers. A dense layer is a regular layer of neurons in a neural network; this is where the actual learning happens, by adjusting the weights. Here we have 2 such dense layers, and since this is binary classification, there is only 1 neuron in the output layer. The number of neurons in the other dense layer can be adjusted as a hyperparameter to obtain the best accuracy.

Train the model

Having constructed the model, we can now compile it.


Compile the model
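A minimal sketch of the compile step; the learning rate is an assumed value:

from tensorflow.keras.optimizers import RMSprop

model.compile(loss='binary_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),  # assumed learning rate
              metrics=['accuracy'])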

Here we need to define how to calculate the loss or error. Since this is binary classification, we can use binary_crossentropy. With the optimizer parameter, we specify how the weights in the network are adjusted so that the loss is reduced. There are many options that can be used, and here I use the RMSprop method. Finally, the metrics parameter is used to estimate how good our model is, and here we use accuracy.

Now we can start training the model.


Train the model
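A minimal sketch of the training call, using the parameters described below:

history = model.fit(
    train_generator,
    steps_per_epoch=100,   # 2000 training images / batch size of 20
    epochs=15,
    validation_data=validation_generator,
    validation_steps=50,   # covers the validation images in batches of 20
    verbose=1)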

Here we pass the train and validation generators we used to load our data. Since our data generators have a batch size of 20, we need steps_per_epoch=100 to cover all 2000 training images, and 50 steps for the validation images. The epochs parameter sets the number of training iterations, and the verbose parameter shows the progress of each iteration during training.

Results


Results after 15 epochs

After 15 epochs, the model scored 98.9% accuracy on the training set but only 71.5% accuracy on the validation set. This is a clear indication that our model has overfitted: it performs really well on the training set but performs poorly on unseen data.

To solve the overfitting problem, we can either add regularization to avoid over-complicating the model, or add more data to the training set so that the model generalizes better to unseen data. Since we have a very small dataset (2000 images) for training, adding more data should fix the issue.

Collecting more data to train a model is usually burdensome in machine learning, since the new data has to be preprocessed again. But when working with images, especially in image classification, there is no need to collect more data. The problem can be fixed with a technique called image augmentation.

Image Augmentation

The idea of image augmentation is to construct new images from the existing ones by resizing, zooming, rotating them, etc. With this approach, the model is able to capture more features than before and generalizes better to unseen data.

For example, let's assume that most of the cat images in our training set show the full body of a cat. The model will then try to learn the shape of the cat's body from these images.

Due to this, the classifier might fail to correctly identify images like the following, since it hasn’t been trained on similar examples.

But with image augmentation, we can construct new images from the existing ones to make the classifier learn new features. With the zoom option in image augmentation, we can construct a new image like the one below, helping the learner to classify images like the one above that it previously failed to classify correctly.


Zoomed image from the original image with image augmentation

Adding image augmentation is really easy with the TensorFlow image generator. When image augmentation is applied, the original dataset remains untouched and all the manipulations are done in memory. The following code segment shows how to add this functionality.


Adding image augmentation when loading data
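A minimal sketch of an augmenting generator; the specific parameter values are illustrative, not taken from the article:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    zoom_range=0.2,          # random zooming
    horizontal_flip=True,
    fill_mode='nearest')     # fill newly exposed pixels after a transform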

Here, rotation, shifting, zooming, and a few other image manipulation techniques are applied to generate new samples in the training set.

Once we apply image augmentation, it is possible to obtain 86% training accuracy and 81% testing accuracy. As you can see, the model is no longer overfitted as before, and with a very small dataset like this, that accuracy is impressive. You can improve the accuracy further by playing with hyperparameters such as the optimizer, the number of dense layers, the number of neurons in each layer, etc.