Keras

Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, R, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.

Keras-tuner: A Hyperparameter Tuning Library for Keras

KerasTuner

KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily configure your search space with a define-by-run syntax, then leverage one of the available search algorithms to find the best hyperparameter values for your models. KerasTuner comes with Bayesian Optimization, Hyperband, and Random Search algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms.

Installation

KerasTuner requires Python 3.7+ and TensorFlow 2.0+.

Install the latest release:

pip install keras-tuner --upgrade

You can also check out other versions in our GitHub repository.

Quick introduction

Import KerasTuner and TensorFlow:

import keras_tuner
from tensorflow import keras

Write a function that creates and returns a Keras model. Use the hp argument to define the hyperparameters during model creation.

def build_model(hp):
  model = keras.Sequential()
  model.add(keras.layers.Dense(
      hp.Choice('units', [8, 16, 32]),
      activation='relu'))
  model.add(keras.layers.Dense(1, activation='relu'))
  model.compile(loss='mse')
  return model

Initialize a tuner (here, RandomSearch). We use objective to specify the objective to select the best models, and we use max_trials to specify the number of different models to try.

tuner = keras_tuner.RandomSearch(
    build_model,
    objective='val_loss',
    max_trials=5)

Start the search and get the best model:

tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
best_model = tuner.get_best_models()[0]
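
Beyond hp.Choice, the hp object also supports other hyperparameter types. As a hedged sketch (the names and ranges below are illustrative, not from the original example), you can tune an integer unit count and a log-sampled learning rate:

def build_model(hp):
  model = keras.Sequential()
  # Integer hyperparameter: number of units in the hidden layer.
  model.add(keras.layers.Dense(
      hp.Int('units', min_value=32, max_value=256, step=32),
      activation='relu'))
  model.add(keras.layers.Dense(1))
  # Float hyperparameter sampled on a log scale for the learning rate.
  lr = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
  model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr), loss='mse')
  return model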

To learn more about KerasTuner, check out this starter guide.

Community

Ask your questions on our GitHub Discussions.

Citing KerasTuner

If KerasTuner helps your research, we appreciate your citations. Here is the BibTeX entry:

@misc{omalley2019kerastuner,
    title        = {KerasTuner},
    author       = {O'Malley, Tom and Bursztein, Elie and Long, James and Chollet, Fran\c{c}ois and Jin, Haifeng and Invernizzi, Luca and others},
    year         = 2019,
    howpublished = {\url{https://github.com/keras-team/keras-tuner}}
}

Official Website: https://keras.io/keras_tuner/

Download Details:

Author: keras-team
Source Code: https://github.com/keras-team/keras-tuner 
License: Apache-2.0 license

#machinelearning #deeplearning #tensorflow #keras 

Deepjazz: Deep Learning Driven Jazz Generation using Keras & Theano!

Note: deepjazz is no longer being actively developed. It may be refactored at some point in the future. Goodbye and thank you for your interest 😢


deepjazz

Using Keras & Theano for deep learning driven jazz generation

I built deepjazz in 36 hours at a hackathon. It uses Keras & Theano, two deep learning libraries, to generate jazz music. Specifically, it builds a two-layer LSTM that learns from a given MIDI file. It uses deep learning, the AI tech that powers Google's AlphaGo and IBM's Watson, to make music -- something that's considered deeply human.

SoundCloud
Check out deepjazz's music on SoundCloud!

Dependencies

Instructions

Run on CPU with command:

python generator.py [# of epochs]

Run on GPU with command:

THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python generator.py [# of epochs]

Note: running Keras/Theano on GPU is formally supported only for NVIDIA cards (CUDA backend).

Note: preprocess.py must be modified to work with other MIDI files (the relevant "melody" MIDI part needs to be selected). The ability to handle this natively is a planned feature.

Author

Ji-Sung Kim
Princeton University, Department of Computer Science
hello (at) jisungkim.com

Citations

This project adapts a lot of preprocessing code (with permission) from Evan Chow's jazzml. Thank you Evan! Public examples from the Keras documentation were also referenced.

Code License, Media Copyright

Code is licensed under the Apache License 2.0
Images and other media are copyrighted (Ji-Sung Kim)

Download Details:

Author: jisungk
Source Code: https://github.com/jisungk/deepjazz 
License: Apache-2.0 license

#machinelearning #theano #music #deeplearning #keras 


Explore Python Classes and Their Use in Keras

In this Keras article, we will learn about Python classes and their use in Keras. Classes are one of the fundamental building blocks of the Python language, and they can be applied in the development of machine learning applications. As we shall see, the Python syntax for developing classes is simple and can be used to implement callbacks in Keras. 

In this tutorial, you will discover Python classes and their functionality. 

After completing this tutorial, you will know:

  • Why Python classes are important
  • How to define and instantiate a class and set its attributes 
  • How to create methods and pass arguments
  • What is class inheritance
  • How to use classes to implement callbacks in Keras

Kick-start your project with my new book Python for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

Tutorial Overview

This tutorial is divided into six parts; they are:

  • Introduction to Classes
  • Defining a Class
  • Instantiation and Attribute References
  • Creating Methods and Passing Arguments
  • Class Inheritance
  • Using Classes in Keras

Introduction to Classes

In object-oriented languages, such as Python, classes are one of the fundamental building blocks. 

They can be likened to blueprints for an object, as they define what properties and methods/behaviors an object should have.

Python Fundamentals, 2018.

Creating a new class creates a new type of object, where every class instance is characterized by attributes that maintain its state and methods that modify its state.

Defining a Class

The class keyword allows for the creation of a new class definition, immediately followed by the class name:

class MyClass:
    <statements>

In this manner, a new class object bound to the specified class name (MyClass, in this case) is created. Each class object can support instantiation and attribute references, as we will see shortly.

Instantiation and Attribute References

Instantiation is the creation of a new instance of a class.

To create a new instance of a class, we can call it using its class name and assign it to a variable. This will create a new, empty class object:

x = MyClass()

Upon creating a new instance of a class, Python calls its object constructor method, __init__(), which often takes arguments that are used to set the instantiated object’s attributes. 

We can define this constructor method in our class just like a function and specify attributes that will need to be passed in when instantiating an object.

Python Fundamentals, 2018.

Let’s say, for instance, that we would like to define a new class named Dog:


class Dog:
    family = "Canine"

    def __init__(self, name, breed):
        self.name = name
        self.breed = breed

Here, the constructor method takes two arguments, name and breed, which can be passed to it upon instantiating the object:

dog1 = Dog("Lassie", "Rough Collie")

In the example that we are considering, name and breed are known as instance variables (or attributes) because they are bound to a specific instance. This means that such attributes belong only to the object in which they have been set but not to any other object instantiated from the same class. 

On the other hand, family is a class variable (or attribute) because it is shared by all instances of the same class.

You may also note that the first argument of the constructor method (or any other method) is often called self. This argument refers to the object that we are in the process of creating. It is good practice to follow the convention of setting the first argument to self to ensure the readability of your code for other programmers. 

Once we have set our object’s attributes, they can be accessed using the dot operator. For example, considering again the dog1 instance of the Dog class, its name attribute may be accessed as follows:

print(dog1.name)

Producing the following output:

Lassie
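
To make the distinction concrete, here is a small illustrative snippet (the second dog is our own addition, not part of the original example) showing that family is shared by all instances while name is not:

dog2 = Dog("Rex", "German Shepherd")

# The class attribute is shared by every instance.
print(dog1.family)   # Canine
print(dog2.family)   # Canine

# Instance attributes belong to each object separately.
print(dog1.name)     # Lassie
print(dog2.name)     # Rex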

Creating Methods and Passing Arguments

In addition to having a constructor method, a class object can also have several other methods for modifying its state. 

The syntax for defining an instance method is familiar. We pass the argument self … It is always the first argument of an instance method.

Python Fundamentals, 2018.

Similar to the constructor method, each instance method can take several arguments, with the first one being the argument self that lets us set and access the object’s attributes:


class Dog:
    family = "Canine"

    def __init__(self, name, breed):
        self.name = name
        self.breed = breed

    def info(self):
        print(self.name, "is a female", self.breed)

Different methods of the same object can also use the self argument to call each other:


class Dog:
    family = "Canine"

    def __init__(self, name, breed):
        self.name = name
        self.breed = breed
        self.tricks = []

    def add_tricks(self, x):
        self.tricks.append(x)

    def info(self, x):
        self.add_tricks(x)
        print(self.name, "is a female", self.breed, "that", self.tricks[0])

An output string can then be generated as follows:

dog1 = Dog("Lassie", "Rough Collie")
dog1.info("barks on command")

We find that, in doing so, the barks on command input is appended to the tricks list when the info() method calls the add_tricks() method. The following output is produced:

Lassie is a female Rough Collie that barks on command

Class Inheritance

Another feature that Python supports is class inheritance.

Inheritance is a mechanism that allows a subclass (also known as a derived or child class) to access all attributes and methods of a superclass (also known as a base or parent class). 

The syntax for using a subclass is the following:

class SubClass(BaseClass):
    <statements>

A subclass can also inherit from multiple base classes. In this case, the syntax is as follows:

class SubClass(BaseClass1, BaseClass2, BaseClass3):
    <statements>

Class attributes and methods are searched for in the base class and also in subsequent base classes in the case of multiple inheritance. 

Python further allows a method in a subclass to override another method in the base class that carries the same name. An overriding method in the subclass may replace the base class method entirely or simply extend its capabilities. When an overriding subclass method is available, it is this method that is executed when called, rather than the method with the same name in the base class. 
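
As a brief, hedged illustration (this Animal/Cat pair is our own example, not from the original tutorial), a subclass inherits the constructor of its base class and can override its methods:

class Animal:
    def __init__(self, name):
        self.name = name

    def info(self):
        print(self.name, "is an animal")


class Cat(Animal):
    # Overrides Animal.info() while reusing the inherited constructor.
    def info(self):
        print(self.name, "is a cat")


pet = Cat("Whiskers")
pet.info()   # Prints: Whiskers is a cat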

Using Classes in Keras

A practical use of classes in Keras is to write one’s own callbacks. 

A callback is a powerful tool in Keras that allows us to look at our model’s behavior during the different stages of training, testing, and prediction. 

Indeed, we may pass a list of callbacks to any of the following:

  • keras.Model.fit()
  • keras.Model.evaluate()
  • keras.Model.predict()

The Keras API comes with several built-in callbacks. Nonetheless, we might wish to write our own, and for this purpose, we shall look at how to build a custom callback class. In order to do so, we can inherit several methods from the callback base class, which can provide us with information about when:

  • Training, testing, and prediction starts and ends
  • An epoch starts and ends
  • A training, testing, and prediction batch starts and ends

Let’s first consider a simple example of a custom callback that reports back every time that an epoch starts and ends. We will name this custom callback class, EpochCallback, and override the epoch-level methods, on_epoch_begin() and on_epoch_end(), from the base class, keras.callbacks.Callback:


import tensorflow.keras as keras
 
class EpochCallback(keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs=None):
        print("Starting epoch {}".format(epoch + 1))
 
    def on_epoch_end(self, epoch, logs=None):
        print("Finished epoch {}".format(epoch + 1))

In order to test the custom callback that we have just defined, we need a model to train. For this purpose, let’s define a simple Keras model:


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
 
def simple_model():
    model = Sequential()
    model.add(Flatten(input_shape=(28, 28)))
    model.add(Dense(128, activation="relu"))
    model.add(Dense(10, activation="softmax"))
 
    model.compile(loss="categorical_crossentropy",
                  optimizer="sgd",
                  metrics=["accuracy"])
    return model

We also need a dataset to train on, for which purpose we will be using the MNIST dataset:


from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
 
# Loading the MNIST training and testing data splits
(x_train, y_train), (x_test, y_test) = mnist.load_data()
 
# Pre-processing the training data
x_train = x_train / 255.0
x_train = x_train.reshape(60000, 28, 28, 1)
y_train_cat = to_categorical(y_train, 10)

Now, let’s try out the custom callback by adding it to the list of callbacks that we pass as input to the keras.Model.fit() method:


model = simple_model()
 
model.fit(x_train,
          y_train_cat,
          batch_size=32,
          epochs=5,
          callbacks=[EpochCallback()],
          verbose=0)

The callback that we have just created produces the following output:


Starting epoch 1
Finished epoch 1
Starting epoch 2
Finished epoch 2
Starting epoch 3
Finished epoch 3
Starting epoch 4
Finished epoch 4
Starting epoch 5
Finished epoch 5

We can create another custom callback that monitors the loss value at the end of each epoch and stores the model weights only if the loss has decreased. To this end, we will be reading the loss value from the logs dict, which stores the metrics at the end of each batch and epoch. We will also be accessing the model corresponding to the current round of training, testing, or prediction by means of self.model.

Let’s call this custom callback, CheckpointCallback:

import numpy as np
 
class CheckpointCallback(keras.callbacks.Callback):
 
    def __init__(self):
        super(CheckpointCallback, self).__init__()
        self.best_weights = None
 
    def on_train_begin(self, logs=None):
        self.best_loss = np.inf
 
    def on_epoch_end(self, epoch, logs=None):
        current_loss = logs.get("loss")
        print("Current loss is {}".format(current_loss))
        if np.less(current_loss, self.best_loss):
            self.best_loss = current_loss
            self.best_weights = self.model.get_weights()
            print("Storing the model weights at epoch {} \n".format(epoch + 1))

We can try this out again, this time including the CheckpointCallback into the list of callbacks:

model = simple_model()
 
model.fit(x_train,
          y_train_cat,
          batch_size=32,
          epochs=5,
          callbacks=[EpochCallback(), CheckpointCallback()],
          verbose=0)

The following output of the two callbacks together is now produced:


Starting epoch 1
Finished epoch 1
Current loss is 0.6327750086784363
Storing the model weights at epoch 1
 
Starting epoch 2
Finished epoch 2
Current loss is 0.3391888439655304
Storing the model weights at epoch 2
 
Starting epoch 3
Finished epoch 3
Current loss is 0.29216915369033813
Storing the model weights at epoch 3
 
Starting epoch 4
Finished epoch 4
Current loss is 0.2625095248222351
Storing the model weights at epoch 4
 
Starting epoch 5
Finished epoch 5
Current loss is 0.23906977474689484
Storing the model weights at epoch 5
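
Once training is finished, the stored weights can be copied back into the model. Here is a minimal sketch, assuming the callback instance is kept in a variable (this restoring step is not shown in the original article):

checkpoint_cb = CheckpointCallback()

model = simple_model()
model.fit(x_train,
          y_train_cat,
          batch_size=32,
          epochs=5,
          callbacks=[checkpoint_cb],
          verbose=0)

# Restore the weights recorded at the epoch with the lowest loss.
model.set_weights(checkpoint_cb.best_weights)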

Other classes in Keras

Besides callbacks, we can also create derived classes in Keras for custom metrics (derived from keras.metrics.Metric), custom layers (derived from keras.layers.Layer), custom regularizers (derived from keras.regularizers.Regularizer), or even custom models (derived from keras.Model, for example to change the behavior of invoking a model). All you have to do is follow the guidelines and override the relevant member functions, using exactly the same names and parameters as in the base class.

Below is an example from Keras documentation:


import tensorflow as tf

class BinaryTruePositives(tf.keras.metrics.Metric):
 
  def __init__(self, name='binary_true_positives', **kwargs):
    super(BinaryTruePositives, self).__init__(name=name, **kwargs)
    self.true_positives = self.add_weight(name='tp', initializer='zeros')
 
  def update_state(self, y_true, y_pred, sample_weight=None):
    y_true = tf.cast(y_true, tf.bool)
    y_pred = tf.cast(y_pred, tf.bool)
 
    values = tf.logical_and(tf.equal(y_true, True), tf.equal(y_pred, True))
    values = tf.cast(values, self.dtype)
    if sample_weight is not None:
      sample_weight = tf.cast(sample_weight, self.dtype)
      values = tf.multiply(values, sample_weight)
    self.true_positives.assign_add(tf.reduce_sum(values))
 
  def result(self):
    return self.true_positives
 
  def reset_states(self):
    self.true_positives.assign(0)
 
m = BinaryTruePositives()
m.update_state([0, 1, 1, 1], [0, 1, 0, 0])
print('Intermediate result:', float(m.result()))
 
m.update_state([1, 1, 1, 1], [0, 1, 1, 0])
print('Final result:', float(m.result()))

This reveals why we would need a class for a custom metric: a metric is not just a function but one that computes its value incrementally, once per batch of training data during the training cycle. The result is reported by the result() function at the end of an epoch, and the metric's memory is reset using the reset_states() function so you can start afresh in the next epoch.

For the details on what exactly has to be derived, you should refer to Keras’ documentation.

Original article sourced at: https://machinelearningmastery.com

#python #keras 


Learn About Data Augmentation with TensorFlow and Keras

In this TensorFlow and Keras article, we will learn about data augmentation techniques, applications, and tools, with a tutorial using TensorFlow and Keras. 

What is Data Augmentation?

Data augmentation is a technique of artificially increasing the training set by creating modified copies of a dataset using existing data. It includes making minor changes to the dataset or using deep learning to generate new data points.  

Augmented vs. Synthetic data

Augmented data is derived from the original data with some minor changes. In the case of image augmentation, we make geometric and color space transformations (flipping, resizing, cropping, brightness, contrast) to increase the size and diversity of the training set. 

Synthetic data is generated artificially without using the original dataset. It often uses DNNs (Deep Neural Networks) and GANs (Generative Adversarial Networks) to generate synthetic data. 

Note: the augmentation techniques are not limited to images. You can augment audio, video, text, and other types of data too. 

When Should You Use Data Augmentation?  

  1. To prevent models from overfitting.
  2. When the initial training set is too small.
  3. To improve the model accuracy.
  4. To reduce the operational cost of labeling and cleaning the raw dataset. 

Limitations of Data Augmentation

  • The biases in the original dataset persist in the augmented data.
  • Quality assurance for data augmentation is expensive. 
  • Research and development are required to build a system with advanced applications. For example, generating high-resolution images using GANs can be challenging.
  • Finding an effective data augmentation approach can be challenging. 

Data Augmentation Techniques

In this section, we will learn about audio, text, image, and advanced data augmentation techniques. 

Audio Data Augmentation

  1. Noise injection: add Gaussian or random noise to the audio dataset to improve the model performance (a minimal NumPy sketch of this and the next technique follows this list). 
  2. Shifting: shift the audio left (fast forward) or right by a random number of seconds.
  3. Changing the speed: stretch the time series by a fixed rate.
  4. Changing the pitch: randomly change the pitch of the audio. 
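
As a rough illustration of the first two techniques, the sketch below uses plain NumPy on a hypothetical 1-D waveform array; it is a simplified example rather than production audio tooling:

import numpy as np

def add_noise(waveform, noise_factor=0.005):
  # Noise injection: add Gaussian noise scaled by noise_factor.
  noise = np.random.randn(len(waveform))
  return waveform + noise_factor * noise

def time_shift(waveform, sample_rate, max_shift_seconds=0.5):
  # Shifting: roll the signal left or right by a random number of samples.
  max_shift = int(sample_rate * max_shift_seconds)
  shift = np.random.randint(-max_shift, max_shift)
  return np.roll(waveform, shift)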

Text Data Augmentation

  1. Word or sentence shuffling: randomly change the position of a word or sentence. 
  2. Word replacement: replace words with synonyms.
  3. Syntax-tree manipulation: paraphrase the sentence using the same words.
  4. Random word insertion: insert words at random. 
  5. Random word deletion: delete words at random (a small pure-Python sketch of shuffling and deletion follows this list). 
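
As a rough, hedged sketch of two of these techniques using only the Python standard library (a toy example, not a full augmentation pipeline):

import random

def shuffle_words(sentence):
  # Word shuffling: randomly change the position of the words.
  words = sentence.split()
  random.shuffle(words)
  return " ".join(words)

def random_deletion(sentence, p=0.2):
  # Random word deletion: drop each word with probability p.
  words = [w for w in sentence.split() if random.random() > p]
  return " ".join(words) if words else sentence

print(shuffle_words("the quick brown fox jumps over the lazy dog"))
print(random_deletion("the quick brown fox jumps over the lazy dog"))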

Image Augmentation

  1. Geometric transformations: randomly flip, crop, rotate, stretch, and zoom images. You need to be careful about applying multiple transformations on the same images, as this can reduce model performance. 
  2. Color space transformations: randomly change RGB color channels, contrast, and brightness.
  3. Kernel filters: randomly change the sharpness or blurring of the image. 
  4. Random erasing: delete some part of the initial image.
  5. Mixing images: blending and mixing multiple images. 

Advanced Techniques

  1. Generative adversarial networks (GANs): used to generate new data points or images. Once trained, a GAN produces new synthetic samples rather than modified copies of existing ones. 
  2. Neural Style Transfer: a series of convolutional layers trained to deconstruct images and separate context and style.

Data Augmentation Applications

Data augmentation can apply to all machine learning applications where acquiring quality data is challenging. Furthermore, it can help improve model robustness and performance across all fields of study. 

Healthcare

Acquiring and labeling medical imaging datasets is time-consuming and expensive. You also need a subject matter expert to validate the dataset before performing data analysis. Using geometric and other transformations can help you train robust and accurate machine-learning models. 

For example, in the case of Pneumonia Classification, you can use random cropping, zooming, stretching, and color space transformation to improve the model performance. However, you need to be careful about certain augmentations, as they can have the opposite effect. For example, random rotation and reflection along the x-axis are not recommended for X-ray imaging datasets. 

Image from ibrahimsobh.github.io | kaggle-COVID19-Classification

Self-Driving Cars

There is limited data available on self-driving cars, and companies are using simulated environments to generate synthetic data using reinforcement learning. It can help you train and test machine learning applications where data security is an issue. 

Image by David Silver | Autonomous Visualization System from Uber ATG

The possibilities of augmented data as a simulation are endless, as it can be used to generate real-world scenarios. 

Natural Language Processing

Text data augmentation is generally used in situations with limited quality data, and improving the performance metric takes priority. You can apply synonym augmentation, word embedding, character swap, and random insertion and deletion. These techniques are also valuable for low-resource languages. 

Image from Papers With Code | Selective Text Augmentation with Word Roles for Low-Resource Text Classification.

Researchers use text augmentation for the language models in high error recognition scenarios, sequence-to-sequence data generation, and text classification. 

Automatic Speech Recognition

In sound classification and speech recognition, data augmentation works wonders. It improves the model performance even on low-resource languages. 

Image by Edward Ma | Noise Injection

The random noise injection, shifting, and changing the pitch can help you produce state-of-the-art speech-to-text models. You can also use GANs to generate realistic sounds for a particular application.

Data Augmentation with Keras and TensorFlow

In this tutorial, we are going to learn how to augment image data using Keras and TensorFlow. Furthermore, you will learn how to use your augmented data to train a simple binary classifier. The code mentioned below is a modified version of TensorFlow's official example.

We recommend following the coding tutorial by practicing on your own. The code source with outputs is available on the DataCamp Workspace.

Getting Started 

We will be using TensorFlow and  Keras for data augmentation and matplotlib for displaying the images.  

%%capture
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

from keras import layers
import keras

Data Loading

The TensorFlow Datasets collection is huge. You can find text, audio, video, graph, time-series, and image datasets. In this tutorial, we will be using the “cats_vs_dogs” dataset. The dataset size is 786.68 MiB, and we will apply various image augmentations and train the binary classifier.

In the code below, we have loaded 80% training, 10% validation, and a 10% test set with labels and metadata.

%%capture
(train_ds, val_ds, test_ds), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)

Data Analysis

There are two classes in the dataset: ‘cat’ and ‘dog’.

num_classes = metadata.features['label'].num_classes
print(num_classes)
2

We will use iterators to extract only four random images with labels from the training set and display them using the matplotlib `.imshow()` function. 

get_label_name = metadata.features['label'].int2str
train_iter = iter(train_ds)
fig = plt.figure(figsize=(7, 8))
for x in range(4):
  image, label = next(train_iter)
  fig.add_subplot(1, 4, x+1)
  plt.imshow(image)
  plt.axis('off')
  plt.title(get_label_name(label));

As we can see, we got various dog images and a cat image. 


Data Augmentation with Keras Sequential

We usually use keras.Sequential() to build the model, but we can also use it to add augmentation layers.  

Resize and rescale 

In the example, we are resizing and rescaling the image using Keras Sequential and image augmentation layers. We will first resize the image to 180x180 and then rescale it by 1/255. The small image size will help us save time, memory, and compute. 

As we can see, we have successfully passed the image through the augmentation layer, and the final output is resized and rescaled. 

IMG_SIZE = 180

resize_and_rescale = keras.Sequential([
  layers.Resizing(IMG_SIZE, IMG_SIZE),
  layers.Rescaling(1./255)
])

result = resize_and_rescale(image)
plt.axis('off')
plt.imshow(result);


Random rotate and flip

Let’s apply random flip and rotation to the same image. We will use loop, subplot, and imshow to display six images with random geometric augmentation.

data_augmentation = keras.Sequential([
  layers.RandomFlip("horizontal_and_vertical"),
  layers.RandomRotation(0.4),
])


plt.figure(figsize=(8, 7))
for i in range(6):
  augmented_image = data_augmentation(image)
  ax = plt.subplot(2, 3, i + 1)
  plt.imshow(augmented_image.numpy()/255)
  plt.axis("off")

Note: if you are experiencing “WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).”, try to convert your image to numpy and divide it by 255. It will show you the clear output instead of a washed-out image. 


Apart from simple augmentation, you can also apply RandomContrast, RandomCrop, RandomHeight, RandomWidth, and RandomZoom to the images, as in the sketch below. 
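
For illustration, here is a hedged sketch of such a pipeline (the specific layers and factors are our own choices, not part of the original tutorial), reusing the image loaded earlier:

extra_augmentation = keras.Sequential([
  layers.RandomContrast(0.3),  # randomly adjust contrast by up to +/-30%
  layers.RandomZoom(0.2),      # randomly zoom in or out by up to 20%
])

plt.figure(figsize=(8, 4))
for i in range(3):
  augmented_image = extra_augmentation(image, training=True)
  plt.subplot(1, 3, i + 1)
  plt.imshow(augmented_image.numpy()/255)
  plt.axis("off")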

Directly adding to the model layer 

There are two ways to apply augmentation to the images. The first method is by directly adding the augmentation layers to the model.

model = keras.Sequential([
  # Add the preprocessing layers you created earlier.
  resize_and_rescale,
  data_augmentation,
  # Add the model layers
  layers.Conv2D(16, 3, padding='same', activation='relu'),
  layers.MaxPooling2D(),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dense(64, activation='relu'),
  layers.Dense(1,activation='sigmoid')
])

Note: the data augmentation is inactive during the testing phase. It will only work for Model.fit, not for Model.evaluate or Model.predict.

Applying the augmentation function using .map

The second method is to apply the data augmentation to the entire train set using Dataset.map.

aug_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))

Data pre-processing 

We will create a data preprocessing function to process train, valid, and test sets. 

The function will:

  1. Apply resize and rescale to the entire dataset.
  2. If shuffle is True, shuffle the dataset.
  3. Convert the data into batches using a batch size of 32. 
  4. If augment is True, apply the data augmentation function to the dataset. 
  5. Finally, use Dataset.prefetch to overlap the training of your model on the GPU with data processing.

batch_size = 32
AUTOTUNE = tf.data.AUTOTUNE

def prepare(ds, shuffle=False, augment=False):
  # Resize and rescale all datasets.
  ds = ds.map(lambda x, y: (resize_and_rescale(x), y),
              num_parallel_calls=AUTOTUNE)

  if shuffle:
    ds = ds.shuffle(1000)

  # Batch all datasets.
  ds = ds.batch(batch_size)

  # Use data augmentation only on the training set.
  if augment:
    ds = ds.map(lambda x, y: (data_augmentation(x, training=True), y),
                num_parallel_calls=AUTOTUNE)

  # Use buffered prefetching on all datasets.
  return ds.prefetch(buffer_size=AUTOTUNE)


train_ds = prepare(train_ds, shuffle=True, augment=True)
val_ds = prepare(val_ds)
test_ds = prepare(test_ds)

Model building

We will create a simple model with convolution and dense layers. Make sure the input shape is similar to the image shape. 

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), input_shape=(180,180,3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

Training and evaluation

We will now compile the model and train it for one epoch. The optimizer is Adam, the loss function is Binary Cross Entropy, and the metric is accuracy. 

As we can observe, we got 51% validation accuracy on the single run. You can train it for multiple epochs and optimize hyper-parameters to get even better results.

The model building and training part is just to give you an idea of how you can augment the images and train the model.  

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
epochs=1
history = model.fit(
  train_ds,
  validation_data=val_ds,
  epochs=epochs
)
582/582 [==============================] - 98s 147ms/step - loss: 0.6993 - accuracy: 0.4961 - val_loss: 0.6934 - val_accuracy: 0.5185

loss, acc = model.evaluate(test_ds)
73/73 [==============================] - 4s 48ms/step - loss: 0.6932 - accuracy: 0.5013

Learn to conduct image analysis, and construct, train, and evaluate convolution networks by taking the Image Processing with Keras course. 

Data Augmentation using tf.image

In this section, we will learn to augment images using TensorFlow to have finer control of data augmentation.

Data Loading

We will load the cats_vs_dogs dataset again with labels and metadata.

%%capture
(train_ds, val_ds, test_ds), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)

Instead of a cat image, we will be using the dog image and applying various augmentation techniques. 

image, label = next(iter(train_ds))
plt.imshow(image)
plt.title(get_label_name(label));


Flip left to right

We will create the visualize function to display the difference between the original and augmented image. 

The function is pretty straightforward. It takes the original image and the augmentation function as input and displays the difference using matplotlib.

def visualize(original, augmented):
    fig = plt.figure()
    plt.subplot(1,2,1)
    plt.title('Original image')
    plt.imshow(original)
    plt.axis("off")
 
    plt.subplot(1,2,2)
    plt.title('Augmented image')
    plt.imshow(augmented)
    plt.axis("off")

As we can see, we have flipped the image from left to right using the tf.image function. It is much simpler than keras.Sequential. 

flipped = tf.image.flip_left_right(image)
visualize(image, flipped)


Grayscale

Let’s convert the image to grayscale using `tf.image.rgb_to_grayscale`.

grayscaled = tf.image.rgb_to_grayscale(image)
visualize(image,  tf.squeeze(grayscaled))


Adjusting the saturation

You can also adjust saturation by a factor of 3. 

saturated = tf.image.adjust_saturation(image, 3)
visualize(image, saturated)


Adjusting the brightness

Adjust the brightness by providing a brightness factor. 

bright = tf.image.adjust_brightness(image, 0.4)
visualize(image, bright)


Central Crop

Crop the image from the center using a central fraction of 0.5. 

cropped = tf.image.central_crop(image, central_fraction=0.5)
visualize(image, cropped)


90-degree rotation

Rotate the image by 90 degrees using the `tf.image.rot90` function.

rotated = tf.image.rot90(image)
visualize(image, rotated)


Applying random brightness

Just like Keras layers, tf.image also has random augmentation functions. In the example below, we will apply the random brightness to the image and display multiple results. 

As we can see, the first image is a bit darker, and the next two images are brighter. 

for i in range(3):
  seed = (i, 0)  # tuple of size (2,)
  stateless_random_brightness = tf.image.stateless_random_brightness(
      image, max_delta=0.95, seed=seed)
  visualize(image, stateless_random_brightness)


Applying the augmentation function

Just like with the Keras layers, we can apply a data augmentation function to the entire dataset using Dataset.map. 

def augment(image, label):
  image = tf.cast(image, tf.float32)
  image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
  image = (image / 255.0)
  image = tf.image.random_crop(image, size=[IMG_SIZE, IMG_SIZE, 3])
  image = tf.image.random_brightness(image, max_delta=0.5)
  return image, label


train_ds = (
    train_ds
    .shuffle(1000)
    .map(augment, num_parallel_calls=AUTOTUNE)
    .batch(batch_size)
    .prefetch(AUTOTUNE)
)

Data Augmentation with ImageDataGenerator

The Keras ImageDataGenerator is even simpler. It works best when you are loading data from a local directory or CSV. 

In the example, we will download and load the small CIFAR10 dataset from the Keras default dataset library. 

After that, we will apply augmentation using `keras.preprocessing.image.ImageDataGenerator`. The function will randomly rotate, change the height and width, and horizontally flip the images. 

Finally, we will fit ImageDataGenerator to the training dataset and display six images with random augmentation. 

Note: the image size is 32x32, so we have a low-resolution display. 

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()


datagen = keras.preprocessing.image.ImageDataGenerator(rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    validation_split=0.2)

datagen.fit(x_train)

for X_batch, y_batch in datagen.flow(x_train,y_train, batch_size=6):
    for i in range(0, 6):
        plt.subplot(2,3,i+1)
        plt.imshow(X_batch[i]/255)
        plt.axis('off')
    break


Data Augmentation Tools

In this section, we will learn about other open-source tools that you can use to perform various data augmentation techniques and improve the model performance. 

Pytorch

Image transformation is available in the torchvision.transforms module. Similar to Keras, you can add transform layers within torch.nn.Sequential or apply an augmentation function separately on the dataset. 
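
As a rough, hedged sketch of the equivalent idea in torchvision (the transform choices below are illustrative and assume PIL images as input):

from torchvision import transforms

# Compose several random transforms, analogous to a Keras Sequential of
# augmentation layers; typically applied inside a Dataset or DataLoader.
train_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(20),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.Resize((180, 180)),
    transforms.ToTensor(),
])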

Augmentor

Augmentor is a Python package for image augmentation and artificial image generation. You can perform Perspective Skewing, Elastic Distortions, Rotating, Shearing, Cropping, and Mirroring. Augmentor also comes with basic image pre-processing functionality.

Albumentations

Albumentations is a fast and flexible Python tool for image augmentation. It is widely used in machine learning competitions, industry, and research to improve the performance of deep convolutional neural networks. 

Imgaug

Imgaug is an open-source tool for image augmentation. It supports a wide variety of augmentation techniques, such as Gaussian noise, contrast, sharpness, crop, affine, and flip. It has a simple yet powerful stochastic interface, and it comes with keypoints, bounding boxes, heatmaps, and segmentation maps.

OpenCV

OpenCV is a massive open-source library for computer vision, machine learning, and image processing. It is generally used in building real-time applications. You can use OpenCV to augment images and videos hassle-free. 

Conclusion

Image augmentation functions provided by TensorFlow and Keras are convenient. You just have to add an augmentation layer, tf.image, or ImageDataGenerator to perform augmentation. Apart from deep learning frameworks, you can use standalone tools such as Augmentor, Albumentations, OpenCV, and Imgaug to perform data augmentation.

Original article sourced at: https://www.datacamp.com

#keras  #tensorflow 


How to Build Your First Deep Learning Project in Python with Keras

In this Python post, we will learn how to build your first deep learning project in Python with Keras. Keras is a powerful and easy-to-use free open source Python library for developing and evaluating deep learning models.

It is part of the TensorFlow library and allows you to define and train neural network models in just a few lines of code.

In this tutorial, you will discover how to create your first deep learning neural network model in Python using Keras.

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.

Keras Tutorial Overview

There is not a lot of code required, but we will go over it slowly so that you will know how to create your own models in the future.

The steps you will learn in this tutorial are as follows:

  1. Load Data
  2. Define Keras Model
  3. Compile Keras Model
  4. Fit Keras Model
  5. Evaluate Keras Model
  6. Tie It All Together
  7. Make Predictions

This Keras tutorial makes a few assumptions. You will need to have:

  1. Python 3 installed and configured
  2. SciPy (including NumPy) installed and configured
  3. Keras and a backend (Theano or TensorFlow) installed and configured

If you need help with your environment, see the tutorial:

Create a new file called keras_first_network.py and type or copy-and-paste the code into the file as you go.

1. Load Data

The first step is to define the functions and classes you intend to use in this tutorial.

You will use the NumPy library to load your dataset and two classes from the Keras library to define your model.

The imports required are listed below.


# first neural network with keras tutorial
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
...

You can now load our dataset.

In this Keras tutorial, you will use the Pima Indians onset of diabetes dataset. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

As such, it is a binary classification problem (onset of diabetes as 1 or not as 0). All of the input variables that describe each patient are numerical. This makes it easy to use directly with neural networks that expect numerical input and output values and is an ideal choice for our first neural network in Keras.

The dataset is available here:

Download the dataset and place it in your local working directory, the same location as your Python file.

Save it with the filename:

pima-indians-diabetes.csv

Take a look inside the file; you should see rows of data like the following:


6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
...

You can now load the file as a matrix of numbers using the NumPy function loadtxt().

There are eight input variables and one output variable (the last column). You will be learning a model to map rows of input variables (X) to an output variable (y), which is often summarized as y = f(X).

The variables can be summarized as follows:

Input Variables (X):

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)

Output Variables (y):

  1. Class variable (0 or 1)

Once the CSV file is loaded into memory, you can split the columns of data into input and output variables.

The data will be stored in a 2D array where the first dimension is rows and the second dimension is columns, e.g., [rows, columns].

You can split the array into two arrays by selecting subsets of columns using the standard NumPy slice operator or “:”. You can select the first eight columns from index 0 to index 7 via the slice 0:8. We can then select the output column (the 9th variable) via index 8.

...
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
...

You are now ready to define your neural network model.

Note: The dataset has nine columns, and the range 0:8 will select columns from 0 to 7, stopping before index 8. If this is new to you, then you can learn more about array slicing and ranges in this post:

2. Define Keras Model

Models in Keras are defined as a sequence of layers.

We create a Sequential model and add layers one at a time until we are happy with our network architecture.

The first thing to get right is to ensure the input layer has the correct number of input features. This can be specified when creating the first layer with the input_shape argument and setting it to (8,) for presenting the eight input variables as a vector.

How do we know the number of layers and their types?

This is a tricky question. There are heuristics that you can use, and often the best network structure is found through a process of trial and error experimentation (I explain more about this here). Generally, you need a network large enough to capture the structure of the problem.

In this example, let’s use a fully-connected network structure with three layers.

Fully connected layers are defined using the Dense class. You can specify the number of neurons or nodes in the layer as the first argument and the activation function using the activation argument.

Also, you will use the rectified linear unit activation function referred to as ReLU on the first two layers and the Sigmoid function in the output layer.

It used to be the case that Sigmoid and Tanh activation functions were preferred for all layers. These days, better performance is achieved using the ReLU activation function. Using a sigmoid on the output layer ensures your network output is between 0 and 1 and is easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5.

You can piece it all together by adding each layer:

  • The model expects rows of data with 8 variables (the input_shape=(8,) argument).
  • The first hidden layer has 12 nodes and uses the relu activation function.
  • The second hidden layer has 8 nodes and uses the relu activation function.
  • The output layer has one node and uses the sigmoid activation function.
...
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
...

Note:  The most confusing thing here is that the shape of the input to the model is defined as an argument on the first hidden layer. This means that the line of code that adds the first Dense layer is doing two things, defining the input or visible layer and the first hidden layer.

3. Compile Keras Model

Now that the model is defined, you can compile it.

Compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU, GPU, or even distributed.

When compiling, you must specify some additional properties required when training the network. Remember training a network means finding the best set of weights to map inputs to outputs in your dataset.

You must specify the loss function to use to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics you want to collect and report during training.

In this case, use cross entropy as the loss argument. This loss is for binary classification problems and is defined in Keras as “binary_crossentropy“. You can learn more about choosing loss functions based on your problem here:

We will define the optimizer as the efficient stochastic gradient descent algorithm “adam“. This is a popular version of gradient descent because it automatically tunes itself and gives good results in a wide range of problems. To learn more about the Adam version of stochastic gradient descent, see the post:

Finally, because it is a classification problem, you will collect and report the classification accuracy defined via the metrics argument.

...
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
...

4. Fit Keras Model

You have defined your model and compiled it to get ready for efficient computation.

Now it is time to execute the model on some data.

You can train or fit your model on your loaded data by calling the fit() function on the model.

Training occurs over epochs, and each epoch is split into batches.

  • Epoch: One pass through all of the rows in the training dataset
  • Batch: One or more samples considered by the model within an epoch before weights are updated

One epoch comprises one or more batches, based on the chosen batch size, and the model is fit for many epochs. For more on the difference between epochs and batches, see the post:

The training process will run for a fixed number of epochs (iterations) through the dataset that you must specify using the epochs argument. You must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size, and set using the batch_size argument.

This problem will run for a small number of epochs (150) and use a relatively small batch size of 10.

These configurations can be chosen experimentally by trial and error. You want to train the model enough so that it learns a good (or good enough) mapping of rows of input data to the output classification. The model will always have some error, but the amount of error will level out after some point for a given model configuration. This is called model convergence.

...
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
...

This is where the work happens on your CPU or GPU.

No GPU is required for this example, but if you’re interested in how to run large models on GPU hardware cheaply in the cloud, see this post:

5. Evaluate Keras Model

You have trained your neural network on the entire dataset, and you can evaluate the performance of the network on the same dataset.

This will only give you an idea of how well you have modeled the dataset (e.g., train accuracy), but no idea of how well the algorithm might perform on new data. This was done for simplicity, but ideally, you could separate your data into train and test datasets for training and evaluation of your model.

You can evaluate your model on your training dataset using the evaluate() function and pass it the same input and output used to train the model.

This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.

The evaluate() function will return a list with two values. The first will be the loss of the model on the dataset, and the second will be the accuracy of the model on the dataset. You are only interested in reporting the accuracy so ignore the loss value.

...
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

6. Tie It All Together

You have just seen how you can easily create your first neural network model in Keras.

Let’s tie it all together into a complete code example.

# first neural network with keras tutorial
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

You can copy all the code into your Python file and save it as “keras_first_network.py” in the same directory as your data file “pima-indians-diabetes.csv“. You can then run the Python file as a script from your command line (command prompt) as follows:

python keras_first_network.py

Running this example, you should see a message for each of the 150 epochs, printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.

It takes about 10 seconds to execute on my workstation running on the CPU.

Ideally, you would like the loss to go to zero and the accuracy to go to 1.0 (e.g., 100%). This is not possible for any but the most trivial machine learning problems. Instead, you will always have some error in your model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest accuracy possible for a given dataset.

...
768/768 [==============================] - 0s 63us/step - loss: 0.4817 - acc: 0.7708
Epoch 147/150
768/768 [==============================] - 0s 63us/step - loss: 0.4764 - acc: 0.7747
Epoch 148/150
768/768 [==============================] - 0s 63us/step - loss: 0.4737 - acc: 0.7682
Epoch 149/150
768/768 [==============================] - 0s 64us/step - loss: 0.4730 - acc: 0.7747
Epoch 150/150
768/768 [==============================] - 0s 63us/step - loss: 0.4754 - acc: 0.7799
768/768 [==============================] - 0s 38us/step
Accuracy: 76.56

Note: If you try running this example in an IPython or Jupyter notebook, you may get an error.

The reason is the output progress bars during training. You can easily turn these off by setting verbose=0 in the call to the fit() and evaluate() functions; for example:

...
# fit the keras model on the dataset without progress bars
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# evaluate the keras model
_, accuracy = model.evaluate(X, y, verbose=0)
...

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

 

What score did you get?
Post your results in the comments below.

Neural networks are stochastic algorithms, meaning that the same algorithm on the same data can train a different model with different skill each time the code is run. This is a feature, not a bug. You can learn more about this in the post:

The variance in the performance of the model means that to get a reasonable approximation of how well your model is performing, you may need to fit it many times and calculate the average of the accuracy scores. For more on this approach to evaluating neural networks, see the post:

For example, below are the accuracy scores from re-running the example five times:

Accuracy: 75.00
Accuracy: 77.73
Accuracy: 77.60
Accuracy: 78.12
Accuracy: 76.17

You can see that all accuracy scores are around 77%, and the average is 76.924%.

7. Make Predictions

The number one question I get asked is:

“After I train my model, how can I use it to make predictions on new data?”

Great question.

You can adapt the above example and use it to generate predictions on the training dataset, pretending it is a new dataset you have not seen before.

Making predictions is as easy as calling the predict() function on the model. You are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1. You can easily convert them into a crisp binary prediction for this classification task by rounding them.

For example:

...
# make probability predictions with the model
predictions = model.predict(X)
# round predictions 
rounded = [round(x[0]) for x in predictions]

Alternately, you can convert the probability into 0 or 1 to predict crisp classes directly; for example:

...
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)

The complete example below makes predictions for each example in the dataset, then prints the input data, predicted class, and expected class for the first five examples in the dataset.

# first neural network with keras make predictions
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
 print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))

Running the example does not show the progress bar as before, as the verbose argument has been set to 0.

After the model is fit, predictions are made for all examples in the dataset, and the input rows and predicted class values for the first five examples are printed and compared to the expected class values.

You can see that most rows are correctly predicted. In fact, you can expect about 76.9% of the rows to be correctly predicted based on your estimated performance of the model in the previous section.


[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)

If you would like to know more about how to make predictions with Keras models, see the post:

Keras Tutorial Summary

In this post, you discovered how to create your first neural network model using the powerful Keras Python library for deep learning.

Specifically, you learned the six key steps in using Keras to create a neural network or deep learning model step-by-step, including:

  1. How to load data
  2. How to define a neural network in Keras
  3. How to compile a Keras model using the efficient numerical backend
  4. How to train a model on data
  5. How to evaluate a model on data
  6. How to make predictions with the model

Do you have any questions about Keras or about this tutorial?
Ask your question in the comments, and I will do my best to answer.

Keras Tutorial Extensions

Well done, you have successfully developed your first neural network using the Keras deep learning library in Python.

This section provides some extensions to this tutorial that you might want to explore.

  • Tune the Model. Change the configuration of the model or training process and see if you can improve the performance of the model, e.g., achieve better than 76% accuracy.
  • Save the Model. Update the tutorial to save the model to a file, then load it later and use it to make predictions (see this tutorial).
  • Summarize the Model. Update the tutorial to summarize the model and create a plot of model layers (see this tutorial).
  • Separate, Train, and Test Datasets. Split the loaded dataset into a training and test set (split based on rows) and use one set to train the model and the other set to estimate the performance of the model on new data.
  • Plot Learning Curves. The fit() function returns a history object that summarizes the loss and accuracy at the end of each epoch. Create line plots of this data, called learning curves (see this tutorial).
  • Learn a New Dataset. Update the tutorial to use a different tabular dataset, perhaps from the UCI Machine Learning Repository.
  • Use Functional API. Update the tutorial to use the Keras Functional API for defining the model (see this tutorial).

Original article sourced at: https://machinelearningmastery.com

#python  #keras #deep-learning 

How to Build Your First Deep Learning Project in Python with Keras

Save and Load Keras Models

In this Keras article, we will learn about How to Save and Load Your Keras Deep Learning Model. Keras is a simple and powerful Python library for deep learning.

Since deep learning models can take hours, days, and even weeks to train, it is important to know how to save and load them from a disk.

In this post, you will discover how to save your Keras models to files and load them up again to make predictions.

After reading this tutorial, you will know:

  • How to save model weights and model architecture in separate files
  • How to save model architecture in both YAML and JSON format
  • How to save model weights and architecture into a single file for later use

Tutorial Overview

If you are new to Keras or deep learning, see this step-by-step Keras tutorial.

Keras separates the concerns of saving your model architecture and saving your model weights.

Model weights are saved in the HDF5 format, a grid format that is ideal for storing multi-dimensional arrays of numbers.

The model structure can be described and saved using two different formats: JSON and YAML.

In this post, you will look at three examples of saving and loading your model to a file:

  • Save Model to JSON
  • Save Model to YAML
  • Save Model to HDF5

The first two examples save the model architecture and weights separately. The model weights are saved into an HDF5 format file in all cases.

The examples will use the same simple network trained on the Pima Indians onset of diabetes binary classification dataset. This is a small dataset that contains all numerical data and is easy to work with. You can download this dataset and place it in your working directory with the filename “pima-indians-diabetes.csv” (update: download from here).

Confirm that you have TensorFlow v2.x installed (e.g., v2.9 as of June 2022).

Note: Saving models requires that you have the h5py library installed. It is usually installed as a dependency with TensorFlow. You can also install it easily as follows:

sudo pip install h5py

Save Your Neural Network Model to JSON

JSON is a simple file format for describing data hierarchically.

Keras provides the ability to describe any model using JSON format with a to_json() function. This can be saved to a file and later loaded via the model_from_json() function that will create a new model from the JSON specification.

The weights are saved directly from the model using the save_weights() function and later loaded using the symmetrical load_weights() function.

The example below trains and evaluates a simple model on the Pima Indians dataset. The model is then converted to JSON format and written to model.json in the local directory. The network weights are written to model.h5 in the local directory.

The model and weight data is loaded from the saved files, and a new model is created. It is important to compile the loaded model before it is used. This is so that predictions made using the model can use the appropriate efficient computation from the Keras backend.

The model is evaluated in the same way, printing the same evaluation score.


# MLP for Pima Indians Dataset Serialize to JSON and HDF5
from tensorflow.keras.models import Sequential, model_from_json
from tensorflow.keras.layers import Dense
import numpy
import os
# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
 
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
 
# later...
 
# load json and create model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
 
# evaluate loaded model on test data
loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
score = loaded_model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this example provides the output below.


acc: 78.78%
Saved model to disk
Loaded model from disk
acc: 78.78%

The JSON format of the model looks like the following:


{  
   "class_name":"Sequential",
   "config":{  
      "name":"sequential_1",
      "layers":[  
         {  
            "class_name":"Dense",
            "config":{  
               "name":"dense_1",
               "trainable":true,
               "batch_input_shape":[  
                  null,
                  8
               ],
               "dtype":"float32",
               "units":12,
               "activation":"relu",
               "use_bias":true,
               "kernel_initializer":{  
                  "class_name":"VarianceScaling",
                  "config":{  
                     "scale":1.0,
                     "mode":"fan_avg",
                     "distribution":"uniform",
                     "seed":null
                  }
               },
               "bias_initializer":{  
                  "class_name":"Zeros",
                  "config":{  
 
                  }
               },
               "kernel_regularizer":null,
               "bias_regularizer":null,
               "activity_regularizer":null,
               "kernel_constraint":null,
               "bias_constraint":null
            }
         },
         {  
            "class_name":"Dense",
            "config":{  
               "name":"dense_2",
               "trainable":true,
               "dtype":"float32",
               "units":8,
               "activation":"relu",
               "use_bias":true,
               "kernel_initializer":{  
                  "class_name":"VarianceScaling",
                  "config":{  
                     "scale":1.0,
                     "mode":"fan_avg",
                     "distribution":"uniform",
                     "seed":null
                  }
               },
               "bias_initializer":{  
                  "class_name":"Zeros",
                  "config":{  
 
                  }
               },
               "kernel_regularizer":null,
               "bias_regularizer":null,
               "activity_regularizer":null,
               "kernel_constraint":null,
               "bias_constraint":null
            }
         },
         {  
            "class_name":"Dense",
            "config":{  
               "name":"dense_3",
               "trainable":true,
               "dtype":"float32",
               "units":1,
               "activation":"sigmoid",
               "use_bias":true,
               "kernel_initializer":{  
                  "class_name":"VarianceScaling",
                  "config":{  
                     "scale":1.0,
                     "mode":"fan_avg",
                     "distribution":"uniform",
                     "seed":null
                  }
               },
               "bias_initializer":{  
                  "class_name":"Zeros",
                  "config":{  
 
                  }
               },
               "kernel_regularizer":null,
               "bias_regularizer":null,
               "activity_regularizer":null,
               "kernel_constraint":null,
               "bias_constraint":null
            }
         }
      ]
   },
   "keras_version":"2.2.5",
   "backend":"tensorflow"
}

Save Your Neural Network Model to YAML

Note: This method only applies to TensorFlow 2.5 or earlier. If you run it in later versions of TensorFlow, you will see a RuntimeError with the message “Method model.to_yaml() has been removed due to security risk of arbitrary code execution. Please use model.to_json() instead.”

This example is much the same as the above JSON example, except the YAML format is used for the model specification.

Note, this example assumes that you have PyYAML 5 installed:

sudo pip install PyYAML

In this example, the model is described using YAML, saved to file model.yaml, and later loaded into a new model via the model_from_yaml() function.

Weights are handled the same way as above in the HDF5 format as model.h5.


# MLP for Pima Indians Dataset serialize to YAML and HDF5
from tensorflow.keras.models import Sequential, model_from_yaml
from tensorflow.keras.layers import Dense
import numpy
import os
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
 
# serialize model to YAML
model_yaml = model.to_yaml()
with open("model.yaml", "w") as yaml_file:
    yaml_file.write(model_yaml)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
 
# later...
 
# load YAML and create model
yaml_file = open('model.yaml', 'r')
loaded_model_yaml = yaml_file.read()
yaml_file.close()
loaded_model = model_from_yaml(loaded_model_yaml)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
 
# evaluate loaded model on test data
loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
score = loaded_model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example displays the following output.


acc: 78.78%
Saved model to disk
Loaded model from disk
acc: 78.78%

The model described in YAML format looks like the following:


backend: tensorflow
class_name: Sequential
config:
  layers:
  - class_name: Dense
    config:
      activation: relu
      activity_regularizer: null
      batch_input_shape: !!python/tuple
      - null
      - 8
      bias_constraint: null
      bias_initializer:
        class_name: Zeros
        config: {}
      bias_regularizer: null
      dtype: float32
      kernel_constraint: null
      kernel_initializer:
        class_name: VarianceScaling
        config:
          distribution: uniform
          mode: fan_avg
          scale: 1.0
          seed: null
      kernel_regularizer: null
      name: dense_1
      trainable: true
      units: 12
      use_bias: true
  - class_name: Dense
    config:
      activation: relu
      activity_regularizer: null
      bias_constraint: null
      bias_initializer:
        class_name: Zeros
        config: {}
      bias_regularizer: null
      dtype: float32
      kernel_constraint: null
      kernel_initializer:
        class_name: VarianceScaling
        config:
          distribution: uniform
          mode: fan_avg
          scale: 1.0
          seed: null
      kernel_regularizer: null
      name: dense_2
      trainable: true
      units: 8
      use_bias: true
  - class_name: Dense
    config:
      activation: sigmoid
      activity_regularizer: null
      bias_constraint: null
      bias_initializer:
        class_name: Zeros
        config: {}
      bias_regularizer: null
      dtype: float32
      kernel_constraint: null
      kernel_initializer:
        class_name: VarianceScaling
        config:
          distribution: uniform
          mode: fan_avg
          scale: 1.0
          seed: null
      kernel_regularizer: null
      name: dense_3
      trainable: true
      units: 1
      use_bias: true
  name: sequential_1
keras_version: 2.2.5

Save Model Weights and Architecture Together

Keras also supports a simpler interface to save both the model weights and model architecture together into a single H5 file.

Saving the model in this way includes everything you need to know about the model, including:

  • Model weights
  • Model architecture
  • Model compilation details (loss and metrics)
  • Model optimizer state

This means that you can load and use the model directly without having to re-compile it as you had to in the examples above.

Note: This is the preferred way for saving and loading your Keras model.

How to Save a Keras Model

You can save your model by calling the save() function on the model and specifying the filename.

The example below demonstrates this by first fitting a model, evaluating it, and saving it to the file model.h5.


# MLP for Pima Indians Dataset saved to single file
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load pima indians dataset
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# save model and architecture to single file
model.save("model.h5")
print("Saved model to disk")

 

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

 

Running the example fits the model, summarizes the model’s performance on the training dataset, and saves the model to file.

acc: 77.73%
Saved model to disk

You can later load this model from the file and use it.

Note that the Keras library also provides an equivalent function that does the same thing, as follows:


...
# equivalent to: model.save("model.h5")
from tensorflow.keras.models import save_model
save_model(model, "model.h5")

How to Load a Keras Model

Your saved model can then be loaded later by calling the load_model() function and passing the filename. The function returns the model with the same architecture and weights.

In this case, you load the model, summarize the architecture, and evaluate it on the same dataset to confirm the weights and architecture are the same.


# load and evaluate a saved model
from numpy import loadtxt
from tensorflow.keras.models import load_model
 
# load model
model = load_model('model.h5')
# summarize model.
model.summary()
# load dataset
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# evaluate the model
score = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], score[1]*100))

Running the example first loads the model, prints a summary of the model architecture, and then evaluates the loaded model on the same dataset.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

The model achieves the same accuracy score, which in this case is 77.73%.


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 12)                108       
_________________________________________________________________
dense_2 (Dense)              (None, 8)                 104       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 9         
=================================================================
Total params: 221
Trainable params: 221
Non-trainable params: 0
_________________________________________________________________
 
acc: 77.73%

Protocol Buffer Format

While saving and loading a Keras model using the HDF5 format is the approach recommended above, TensorFlow also supports another format: the protocol buffer (SavedModel) format. Saving and loading in this format is considered faster, but it produces multiple files. The syntax is the same, except that you do not need to provide the .h5 extension to the filename:


# save model and architecture in the protocol buffer format (produces a directory)
model.save("model")
 
# ... later
 
# load model
model = load_model('model')
# print summary
model.summary()

This will create a directory “model” with the following files:


model/
|-- assets/
|-- keras_metadata.pb
|-- saved_model.pb
`-- variables/
    |-- variables.data-00000-of-00001
    `-- variables.index

This is also the format used to save a model in TensorFlow v1.x. You may encounter this when you download a pre-trained model from TensorFlow Hub.


Original article sourced at: https://machinelearningmastery.com

#keras 

Save and Load Keras Models

Create Multi-Layer Perceptron Neural Network Models with Keras

In this Keras article, we will learn about how to create a multilayer Perceptron neural network model with Keras. The Keras Python library for deep learning focuses on creating models as a sequence of layers.

In this post, you will discover the simple components you can use to create neural networks and simple deep learning models using Keras from TensorFlow.

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.

Neural Network Models in Keras

The focus of the Keras library is a model.

The simplest model is defined in the Sequential class, which is a linear stack of Layers.

You can create a Sequential model and define all the layers in the constructor; for example:

from tensorflow.keras.models import Sequential
model = Sequential(...)

A more useful idiom is to create a Sequential model and add your layers in the order of the computation you wish to perform; for example:


from tensorflow.keras.models import Sequential
model = Sequential()
model.add(...)
model.add(...)
model.add(...)

Model Inputs

The first layer in your model must specify the shape of the input.

This is the number of input attributes defined by the input_shape argument. This argument expects a tuple.

For example, you can define input in terms of 8 inputs for a Dense type layer as follows:

Dense(16, input_shape=(8,))

Model Layers

Layers of different types have a few properties in common, specifically their method of weight initialization and activation functions.

Weight Initialization

The type of initialization used for a layer is specified in the kernel_initializer argument.

Some common types of layer initialization include:

  • random_uniform: Weights are initialized to small uniformly random values between -0.05 and 0.05.
  • random_normal: Weights are initialized to small Gaussian random values (zero mean and standard deviation of 0.05).
  • zeros: All weights are set to zero values.

You can see a full list of the initialization techniques supported on the Usage of initializations page.
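For example, here is a small sketch of the two common ways to set an initializer on the Dense layers used throughout this post: by name, or as a configurable object.

from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import RandomNormal

# initializer specified by name
layer1 = Dense(12, input_shape=(8,), kernel_initializer='random_uniform', activation='relu')
# initializer specified as an object, allowing its arguments to be configured
layer2 = Dense(8, kernel_initializer=RandomNormal(mean=0.0, stddev=0.05), activation='relu')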

Activation Function

Keras supports a range of standard neuron activation functions, such as softmax, rectified linear (relu), tanh, and sigmoid.

You typically specify the type of activation function used by a layer in the activation argument, which takes a string value.

You can see a full list of activation functions supported by Keras on the Usage of activations page.

Interestingly, you can also create an Activation object and add it directly to your model after your layer to apply that activation to the output of the layer.
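For example, the following sketch defines the same layer in two equivalent ways: once with the activation argument, and once by adding a separate Activation layer after a linear Dense layer.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation

# activation specified as an argument on the layer
model = Sequential()
model.add(Dense(8, input_shape=(8,), activation='relu'))

# equivalent: a linear Dense layer followed by a separate Activation layer
model = Sequential()
model.add(Dense(8, input_shape=(8,)))
model.add(Activation('relu'))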

Layer Types

There are a large number of core layer types for standard neural networks.

Some common and useful layer types you can choose from are:

  • Dense: Fully connected layer and the most common type of layer used on multi-layer perceptron models
  • Dropout: Apply dropout to the model, setting a fraction of inputs to zero in an effort to reduce overfitting
  • Concatenate: Combine the outputs from multiple layers as input to a single layer

You can learn about the full list of core Keras layers on the Core Layers page.
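As a quick sketch, Dense and Dropout layers can be stacked directly in a Sequential model (Concatenate combines outputs from multiple layers, so it is typically used with the functional API instead):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dropout(0.2))  # randomly drop 20% of the inputs to the next layer during training
model.add(Dense(1, activation='sigmoid'))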

Model Compilation

Once you have defined your model, it needs to be compiled.

This creates the efficient structures used by TensorFlow to execute your model during training. Specifically, TensorFlow converts your model into a computational graph so that training can be carried out efficiently.

You compile your model using the compile() function, and it accepts three important arguments:

  1. Model optimizer
  2. Loss function
  3. Metrics

model.compile(optimizer=..., loss=..., metrics=...)

1. Model Optimizers

The optimizer is the search technique used to update weights in your model.

You can create an optimizer object and pass it to the compile function via the optimizer argument. This allows you to configure the optimization procedure with its own arguments, such as learning rate. For example:

from tensorflow.keras.optimizers import SGD
sgd = SGD(...)
model.compile(optimizer=sgd)

You can also use the default parameters of the optimizer by specifying the name of the optimizer to the optimizer argument. For example:

model.compile(optimizer='sgd')

Some popular gradient descent optimizers you might want to choose from include:

  • SGD: stochastic gradient descent, with support for momentum
  • RMSprop: adaptive learning rate optimization method proposed by Geoff Hinton
  • Adam: Adaptive Moment Estimation (Adam) that also uses adaptive learning rates

You can learn about all of the optimizers supported by Keras on the Usage of optimizers page.

You can learn more about different gradient descent methods in the Gradient descent optimization algorithms section of Sebastian Ruder’s post, An overview of gradient descent optimization algorithms.

2. Model Loss Functions

The loss function, also called the objective function, is the evaluation of the model used by the optimizer to navigate the weight space.

You can specify the name of the loss function to use in the compile function by the loss argument. Some common examples include:

  • 'mse': for mean squared error
  • 'binary_crossentropy': for binary logarithmic loss (logloss)
  • 'categorical_crossentropy': for multi-class logarithmic loss (logloss)

You can learn more about the loss functions supported by Keras on the Losses page.

3. Model Metrics

Metrics are evaluated by the model during training.

Keras supports a range of metrics; the one most commonly used for classification problems is accuracy, which you specify to the metrics argument as a list, e.g., metrics=['accuracy'].

Model Training

The model is trained on NumPy arrays using the fit() function; for example:

model.fit(X, y, epochs=..., batch_size=...)

Training requires you to specify both the number of epochs to train for and the batch size.

  • Epochs (epochs) refer to the number of times the model is exposed to the training dataset.
  • Batch Size (batch_size) is the number of training instances shown to the model before a weight update is performed.

The fit function also allows for some basic evaluation of the model during training. You can set the validation_split value to hold back a fraction of the training dataset for validation to be evaluated in each epoch or provide a validation_data tuple of (X, y) data to evaluate.

Fitting the model returns a history object with details and metrics calculated for the model in each epoch. This can be used for graphing model performance.
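For example, here is a minimal sketch of recording and plotting these learning curves. It assumes the model has already been defined and compiled with metrics=['accuracy'], that X and y hold the input and output data, and that matplotlib is installed:

import matplotlib.pyplot as plt

# hold back 33% of the training data for validation in each epoch
history = model.fit(X, y, validation_split=0.33, epochs=150, batch_size=10, verbose=0)

# plot the learning curves recorded in the history object
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()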

Model Prediction

Once you have trained your model, you can use it to make predictions on test data or new data.

There are a number of different output types you can calculate from your trained model, each calculated using a different function call on your model object. For example:

  • model.evaluate(): To calculate the loss values for the input data
  • model.predict(): To generate network output for the input data

For example, if you provided a batch of data X and the expected output y, you can use evaluate() to calculate the loss metric (the one you defined with compile() before). But for a batch of new data X, you can obtain the network output with predict(). It may not be the output you want, but it will be the output of your network. For example, a classification problem will probably output a softmax vector for each sample. You will need to use numpy.argmax() to convert the softmax vector into class labels.
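As a small sketch (assuming a trained classification model named model that was compiled with an accuracy metric, evaluation data X and y, and a hypothetical batch of new samples X_new), the two calls look like this:

import numpy as np

# loss and the compiled metric (e.g., accuracy) on data with known outputs
loss, accuracy = model.evaluate(X, y, verbose=0)

# raw network outputs for new data, e.g., one softmax vector per sample
probabilities = model.predict(X_new)

# convert each softmax vector into a single class label
labels = np.argmax(probabilities, axis=-1)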

Summarize the Model

Once you are happy with your model, you can finalize it.

You may wish to output a summary of your model. For example, you can display a summary of a model by calling the summary function:

model.summary()

You can also retrieve a summary of the model configuration using the get_config() function:

model.get_config()

Finally, you can create an image of your model structure directly:

from tensorflow.keras.utils import plot_model
plot_model(model, to_file='model.png')

Original article sourced at: https://machinelearningmastery.com

#keras 

Create Multi-Layer Perceptron Neural Network Models with Keras

Several Ways to Evaluate Model Performance using Keras

In this Keras article, we will learn about how to Evaluate the Performance of Deep Learning Models in Keras. Keras is an easy-to-use and powerful Python library for deep learning.

There are a lot of decisions to make when designing and configuring your deep learning models. Most of these decisions must be resolved empirically through trial and error and by evaluating them on real data.

As such, it is critically important to have a robust way to evaluate the performance of your neural networks and deep learning models.

In this post, you will discover a few ways to evaluate model performance using Keras.

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code

Empirically Evaluate Network Configurations

You must make a myriad of decisions when designing and configuring your deep learning models.

Many of these decisions can be resolved by copying the structure of other people’s networks and using heuristics. Ultimately, the best technique is to actually design small experiments and empirically evaluate problems using real data.

This includes high-level decisions like the number, size, and type of layers in your network. It also includes the lower-level decisions like the choice of the loss function, activation functions, optimization procedure, and the number of epochs.

Deep learning is often used on problems that have very large datasets. That is tens of thousands or hundreds of thousands of instances.

As such, you need to have a robust test harness that allows you to estimate the performance of a given configuration on unseen data and reliably compare the performance to other configurations.

Data Splitting

The large amount of data and the complexity of the models require very long training times.

As such, it is typical to separate data into training and test datasets or training and validation datasets.

Keras provides two convenient ways of evaluating your deep learning algorithms this way:

  1. Use an automatic verification dataset
  2. Use a manual verification dataset

Use an Automatic Verification Dataset

Keras can separate a portion of your training data into a validation dataset and evaluate the performance of your model on that validation dataset in each epoch.

You can do this by setting the validation_split argument on the fit() function to a percentage of the size of your training dataset.

For example, a reasonable value might be 0.2 or 0.33 for 20% or 33% of your training data held back for validation.

The example below demonstrates the use of an automatic validation dataset on a small binary classification problem. All examples in this post use the Pima Indians onset of diabetes dataset. You can download it from the UCI Machine Learning Repository and save the data file in your current working directory with the filename pima-indians-diabetes.csv (update: download from here).

# MLP with automatic validation set
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example, you can see that the verbose output on each epoch shows the loss and accuracy on both the training dataset and the validation dataset.

...
Epoch 145/150
514/514 [==============================] - 0s - loss: 0.5252 - acc: 0.7335 - val_loss: 0.5489 - val_acc: 0.7244
Epoch 146/150
514/514 [==============================] - 0s - loss: 0.5198 - acc: 0.7296 - val_loss: 0.5918 - val_acc: 0.7244
Epoch 147/150
514/514 [==============================] - 0s - loss: 0.5175 - acc: 0.7335 - val_loss: 0.5365 - val_acc: 0.7441
Epoch 148/150
514/514 [==============================] - 0s - loss: 0.5219 - acc: 0.7354 - val_loss: 0.5414 - val_acc: 0.7520
Epoch 149/150
514/514 [==============================] - 0s - loss: 0.5089 - acc: 0.7432 - val_loss: 0.5417 - val_acc: 0.7520
Epoch 150/150
514/514 [==============================] - 0s - loss: 0.5148 - acc: 0.7490 - val_loss: 0.5549 - val_acc: 0.7520

Use a Manual Verification Dataset

Keras also allows you to manually specify the dataset to use for validation during training.

In this example, you can use the handy train_test_split() function from the Python scikit-learn machine learning library to separate your data into a training and test dataset. Use 67% for training and the remaining 33% of the data for validation.

The validation dataset can be specified to the fit() function in Keras by the validation_data argument. It takes a tuple of the input and output datasets.

# MLP with manual validation set
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), epochs=150, batch_size=10)

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Like before, running the example provides a verbose output of training that includes the loss and accuracy of the model on both the training and validation datasets for each epoch.

...
Epoch 145/150
514/514 [==============================] - 0s - loss: 0.4847 - acc: 0.7704 - val_loss: 0.5668 - val_acc: 0.7323
Epoch 146/150
514/514 [==============================] - 0s - loss: 0.4853 - acc: 0.7549 - val_loss: 0.5768 - val_acc: 0.7087
Epoch 147/150
514/514 [==============================] - 0s - loss: 0.4864 - acc: 0.7743 - val_loss: 0.5604 - val_acc: 0.7244
Epoch 148/150
514/514 [==============================] - 0s - loss: 0.4831 - acc: 0.7665 - val_loss: 0.5589 - val_acc: 0.7126
Epoch 149/150
514/514 [==============================] - 0s - loss: 0.4961 - acc: 0.7782 - val_loss: 0.5663 - val_acc: 0.7126
Epoch 150/150
514/514 [==============================] - 0s - loss: 0.4967 - acc: 0.7588 - val_loss: 0.5810 - val_acc: 0.6929

Manual k-Fold Cross Validation

The gold standard for machine learning model evaluation is k-fold cross validation.

It provides a robust estimate of the performance of a model on unseen data. It does this by splitting the training dataset into k subsets, taking turns training models on all subsets except one, which is held out, and evaluating model performance on the held-out validation dataset. The process is repeated until all subsets are given an opportunity to be the held-out validation set. The performance measure is then averaged across all models that are created.

It is important to understand that cross validation estimates the quality of a model design (e.g., a 3-layer vs. a 4-layer neural network) rather than of a specific fitted model. You do not want to fit and compare models on a single dataset, since the result may simply reflect how well that particular split of data suits one design. Instead, you fit multiple models of the same design on different subsets of the data and compare designs using the average performance measure.

Cross validation is often not used for evaluating deep learning models because of the greater computational expense. For example, k-fold cross validation is often used with 5 or 10 folds. As such, 5 or 10 models must be constructed and evaluated, significantly adding to the evaluation time of a model.

Nevertheless, when the problem is small enough or if you have sufficient computing resources, k-fold cross validation can give you a less-biased estimate of the performance of your model.

In the example below, you will use the handy StratifiedKFold class from the scikit-learn Python machine learning library to split the training dataset into 10 folds. The folds are stratified, meaning that the algorithm attempts to balance the number of instances of each class in each fold.

The example creates and evaluates 10 models using the 10 splits of the data and collects all the scores. The verbose output for each epoch is turned off by passing verbose=0 to the fit() and evaluate() functions on the model.

The performance is printed for each model, and it is stored. The average and standard deviation of the model performance are then printed at the end of the run to provide a robust estimate of model accuracy.

# MLP for Pima Indians Dataset with 10-fold cross validation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load pima indians dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
 # create model
 model = Sequential()
 model.add(Dense(12, input_dim=8, activation='relu'))
 model.add(Dense(8, activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 # Fit the model
 model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
 # evaluate the model
 scores = model.evaluate(X[test], Y[test], verbose=0)
 print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
 cvscores.append(scores[1] * 100)
 
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example will take less than a minute and will produce the following output:

acc: 77.92%
acc: 68.83%
acc: 72.73%
acc: 64.94%
acc: 77.92%
acc: 35.06%
acc: 74.03%
acc: 68.83%
acc: 34.21%
acc: 72.37%
64.68% (+/- 15.50%)

Summary

In this post, you discovered the importance of having a robust way to estimate the performance of your deep learning models on unseen data.

You discovered three ways that you can estimate the performance of your deep learning models in Python using the Keras library:

  • Use Automatic Verification Datasets
  • Use Manual Verification Datasets
  • Use Manual k-Fold Cross Validation

Do you have any questions about deep learning with Keras or this post? Ask your question in the comments, and I will do my best to answer it.


Original article sourced at: https://machinelearningmastery.com

#keras #deep-learning 

Several Ways to Evaluate Model Performance using Keras

Top 3 Ways To Build Machine Learning Models in Keras

In this Keras article, we will learn about the top 3 ways to build machine learning models in Keras. If you’ve looked at Keras models on GitHub, you’ve probably noticed that there are several different ways to create models in Keras. There’s the Sequential model, which allows you to define an entire model in a single line, usually with some line breaks for readability. Then there’s the functional interface, which allows for more complicated model architectures, and there’s also the Model subclassing approach, which helps with reusability. This article will explore the different ways to create models in Keras, along with their advantages and drawbacks. This will equip you with the knowledge you need to create your own machine learning models in Keras.

After you complete this tutorial, you will learn:

  • Different ways that Keras offers to build models
  • How to use the Sequential class, functional interface, and subclassing keras.Model to build Keras models
  • When to use the different methods to create Keras models

Let’s get started!

Overview

This tutorial is split into three parts, covering the different ways to build machine learning models in Keras:

  • Using the Sequential class
  • Using Keras’s functional interface
  • Subclassing keras.Model

Using the Sequential Class

The Sequential Model is just as the name implies. It consists of a sequence of layers, one after the other. From the Keras documentation,

“A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.”

It is a simple, easy-to-use way to start building your Keras model. To start, import Tensorflow and then the Sequential model:

import tensorflow as tf
from tensorflow.keras import Sequential

Then, you can start building your machine learning model by stacking various layers together. For this example, let’s build a LeNet5 model with the classic CIFAR-10 image dataset as the input:


from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
 
model = Sequential([
          Input(shape=(32,32,3,)),
          Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu"),
          MaxPool2D(pool_size=(2,2)),
          Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu"),
          MaxPool2D(pool_size=(2, 2)),
          Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu"),
          Flatten(),
          Dense(units=84, activation="relu"),
          Dense(units=10, activation="softmax"),
      ])
 
print (model.summary())

Notice that you simply pass a list of the layers you want your model to contain to the Sequential constructor. Looking at model.summary(), you can see the model’s architecture.


_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_3 (Conv2D)           (None, 32, 32, 6)         456       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 16, 16, 6)        0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 16, 16, 16)        2416      
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 8, 8, 16)         0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 8, 8, 120)         48120     
                                                                 
 flatten_1 (Flatten)         (None, 7680)              0         
                                                                 
 dense_2 (Dense)             (None, 84)                645204    
                                                                 
 dense_3 (Dense)             (None, 10)                850       
                                                                 
=================================================================
Total params: 697,046
Trainable params: 697,046
Non-trainable params: 0
_________________________________________________________________

And just to test out the model, let’s go ahead and load the CIFAR-10 dataset and run model.compile and model.fit:


from tensorflow import keras 
 
(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
 
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))

This gives us this output:


Epoch 1/10
196/196 [==============================] - 13s 10ms/step - loss: 2.7669 - acc: 0.3648 - val_loss: 1.4869 - val_acc: 0.4713
Epoch 2/10
196/196 [==============================] - 2s 8ms/step - loss: 1.3883 - acc: 0.5097 - val_loss: 1.3654 - val_acc: 0.5205
Epoch 3/10
196/196 [==============================] - 2s 8ms/step - loss: 1.2239 - acc: 0.5694 - val_loss: 1.2908 - val_acc: 0.5472
Epoch 4/10
196/196 [==============================] - 2s 8ms/step - loss: 1.1020 - acc: 0.6120 - val_loss: 1.2640 - val_acc: 0.5622
Epoch 5/10
196/196 [==============================] - 2s 8ms/step - loss: 0.9931 - acc: 0.6498 - val_loss: 1.2850 - val_acc: 0.5555
Epoch 6/10
196/196 [==============================] - 2s 9ms/step - loss: 0.8888 - acc: 0.6903 - val_loss: 1.3150 - val_acc: 0.5646
Epoch 7/10
196/196 [==============================] - 2s 8ms/step - loss: 0.7882 - acc: 0.7229 - val_loss: 1.4273 - val_acc: 0.5426
Epoch 8/10
196/196 [==============================] - 2s 8ms/step - loss: 0.6915 - acc: 0.7582 - val_loss: 1.4574 - val_acc: 0.5604
Epoch 9/10
196/196 [==============================] - 2s 8ms/step - loss: 0.5934 - acc: 0.7931 - val_loss: 1.5304 - val_acc: 0.5631
Epoch 10/10
196/196 [==============================] - 2s 8ms/step - loss: 0.5113 - acc: 0.8214 - val_loss: 1.6355 - val_acc: 0.5512

That’s pretty good for a first pass at a model. Putting the code for LeNet5 using a Sequential model together, you have:


import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D 
 
(trainX, trainY), (testX, testY) = tf.keras.datasets.cifar10.load_data()
 
model = Sequential([
          Input(shape=(32,32,3,)),
          Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu"),
          MaxPool2D(pool_size=(2,2)),
          Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu"),
          MaxPool2D(pool_size=(2, 2)),
          Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu"),
          Flatten(),
          Dense(units=84, activation="relu"),
          Dense(units=10, activation="softmax"),
      ])
 
print (model.summary())
 
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))

Now, let’s explore what the other ways of constructing Keras models can do, starting with the functional interface!

Using Keras’s Functional Interface

The next method of constructing Keras models you will explore uses Keras’s functional interface. The functional interface instead treats layers as functions, each taking a tensor as input and returning a tensor as output. It is a more flexible way of representing a Keras model, as you are not restricted to sequential models with layers stacked on top of one another. Instead, you can build models that branch into multiple paths, have multiple inputs, and so on.

Consider an Add layer that takes inputs from two or more paths and adds the tensors together.

Add layer with two inputs

Since this cannot be represented as a linear stack of layers due to the multiple inputs, you are unable to define it using a Sequential object. Here’s where Keras’s functional interface comes in. You can define an Add layer with two input tensors as follows:


from tensorflow.keras.layers import Add

# layer1 and layer2 are output tensors produced by two earlier branches of the model
add_layer = Add()([layer1, layer2])
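
For instance, here is a minimal, self-contained sketch of a two-branch model that merges its paths with Add; the layer sizes are arbitrary and only serve to illustrate the pattern:


from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model
 
# one input tensor feeding two parallel Dense branches
inputs = Input(shape=(16,))
branch_a = Dense(8, activation="relu")(inputs)
branch_b = Dense(8, activation="relu")(inputs)
 
# merge the branches element-wise, then map to a single output
merged = Add()([branch_a, branch_b])
outputs = Dense(1, activation="sigmoid")(merged)
 
toy_model = Model(inputs=inputs, outputs=outputs)
print(toy_model.summary())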

Now that you’ve seen a quick example of the functional interface, let’s take a look at what the LeNet5 model that you defined by instantiating a Sequential class would look like using a functional interface.


import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
from tensorflow.keras.models import Model
 
input_layer = Input(shape=(32,32,3,))
x = Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu")(input_layer)
x = MaxPool2D(pool_size=(2,2))(x)
x = Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu")(x)
x = Flatten()(x)
x = Dense(units=84, activation="relu")(x)
x = Dense(units=10, activation="softmax")(x)
 
model = Model(inputs=input_layer, outputs=x)
 
print(model.summary())

And looking at the model summary:


_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv2d_6 (Conv2D)           (None, 32, 32, 6)         456       
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 16, 16, 6)        0         
 2D)                                                             
                                                                 
 conv2d_7 (Conv2D)           (None, 16, 16, 16)        2416      
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 8, 8, 16)         0         
 2D)                                                             
                                                                 
 conv2d_8 (Conv2D)           (None, 8, 8, 120)         48120     
                                                                 
 flatten_2 (Flatten)         (None, 7680)              0         
                                                                 
 dense_4 (Dense)             (None, 84)                645204    
                                                                 
 dense_5 (Dense)             (None, 10)                850       
                                                                 
=================================================================
Total params: 697,046
Trainable params: 697,046
Non-trainable params: 0
_________________________________________________________________

As you can see, the model architecture is the same whether you implement LeNet5 using the functional interface or the Sequential class.

Now that you’ve seen how to use Keras’s functional interface, let’s look at a model architecture that you can implement using the functional interface but not with the Sequential class. For this example, look at the residual block introduced in ResNet. Visually, the residual block looks like this:

Residual block, source: https://arxiv.org/pdf/1512.03385.pdf

You can see that a model defined using the Sequential class would be unable to construct such a block because of the skip connection, which prevents the block from being represented as a simple stack of layers. Using the functional interface, one way you can define a ResNet residual block is:


def residual_block(x, filters):
  # keep the input tensor so it can be added back as the identity (skip connection)
  identity = x
  # two 3x3 convolutions with batch normalization; strides of (1, 1) keep the spatial dimensions unchanged
  x = Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1), padding="same")(x)
  x = BatchNormalization()(x)
  x = relu(x)
  x = Conv2D(filters=filters, kernel_size=(3, 3), padding="same")(x)
  x = BatchNormalization()(x)
  # add the identity back onto the transformed tensor, then apply the final activation
  x = Add()([identity, x])
  x = relu(x)
 
  return x

Then, you can build a simple network using these residual blocks using the functional interface:


input_layer = Input(shape=(32,32,3,))
x = Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu")(input_layer)
x = residual_block(x, 32)
x = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 64)
x = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 128)
x = Flatten()(x)
x = Dense(units=84, activation="relu")(x)
x = Dense(units=10, activation="softmax")(x)
 
model = Model(inputs=input_layer, outputs = x)
print(model.summary())
 
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
 
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))

Running this code and looking at the model summary and training results:


__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 32, 32, 3)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 32, 32, 32)   896         ['input_1[0][0]']                
                                                                                                  
 conv2d_1 (Conv2D)              (None, 32, 32, 32)   9248        ['conv2d[0][0]']                 
                                                                                                  
 batch_normalization (BatchNorm  (None, 32, 32, 32)  128         ['conv2d_1[0][0]']               
 alization)                                                                                       
                                                                                                  
 tf.nn.relu (TFOpLambda)        (None, 32, 32, 32)   0           ['batch_normalization[0][0]']    
                                                                                                  
 conv2d_2 (Conv2D)              (None, 32, 32, 32)   9248        ['tf.nn.relu[0][0]']             
                                                                                                  
 batch_normalization_1 (BatchNo  (None, 32, 32, 32)  128         ['conv2d_2[0][0]']               
 rmalization)                                                                                     
                                                                                                  
 add (Add)                      (None, 32, 32, 32)   0           ['conv2d[0][0]',                 
                                                                  'batch_normalization_1[0][0]']  
                                                                                                  
 tf.nn.relu_1 (TFOpLambda)      (None, 32, 32, 32)   0           ['add[0][0]']                    
                                                                                                  
 conv2d_3 (Conv2D)              (None, 16, 16, 64)   18496       ['tf.nn.relu_1[0][0]']           
                                                                                                  
 conv2d_4 (Conv2D)              (None, 16, 16, 64)   36928       ['conv2d_3[0][0]']               
                                                                                                  
 batch_normalization_2 (BatchNo  (None, 16, 16, 64)  256         ['conv2d_4[0][0]']              
 rmalization)                                                                                     
                                                                                                  
 tf.nn.relu_2 (TFOpLambda)      (None, 16, 16, 64)   0           ['batch_normalization_2[0][0]']  
                                                                                                  
 conv2d_5 (Conv2D)              (None, 16, 16, 64)   36928       ['tf.nn.relu_2[0][0]']           
                                                                                                  
 batch_normalization_3 (BatchNo  (None, 16, 16, 64)  256         ['conv2d_5[0][0]']               
 rmalization)                                                                                     
                                                                                                  
 add_1 (Add)                    (None, 16, 16, 64)   0           ['conv2d_3[0][0]',               
                                                                  'batch_normalization_3[0][0]']  
                                                                                                  
 tf.nn.relu_3 (TFOpLambda)      (None, 16, 16, 64)   0           ['add_1[0][0]']                  
                                                                                                  
 conv2d_6 (Conv2D)              (None, 8, 8, 128)    73856       ['tf.nn.relu_3[0][0]']           
                                                                                                  
 conv2d_7 (Conv2D)              (None, 8, 8, 128)    147584      ['conv2d_6[0][0]']               
                                                                                                  
 batch_normalization_4 (BatchNo  (None, 8, 8, 128)   512         ['conv2d_7[0][0]']               
 rmalization)                                                                                     
                                                                                                  
 tf.nn.relu_4 (TFOpLambda)      (None, 8, 8, 128)    0           ['batch_normalization_4[0][0]']  
                                                                                                  
 conv2d_8 (Conv2D)              (None, 8, 8, 128)    147584      ['tf.nn.relu_4[0][0]']           
                                                                                                  
 batch_normalization_5 (BatchNo  (None, 8, 8, 128)   512         ['conv2d_8[0][0]']               
 rmalization)                                                                                     
                                                                                                  
 add_2 (Add)                    (None, 8, 8, 128)    0           ['conv2d_6[0][0]',               
                                                                  'batch_normalization_5[0][0]']  
                                                                                                  
 tf.nn.relu_5 (TFOpLambda)      (None, 8, 8, 128)    0           ['add_2[0][0]']                  
                                                                                                  
 flatten (Flatten)              (None, 8192)         0           ['tf.nn.relu_5[0][0]']           
                                                                                                  
 dense (Dense)                  (None, 84)           688212      ['flatten[0][0]']                
                                                                                                  
 dense_1 (Dense)                (None, 10)           850         ['dense[0][0]']                  
                                                                                                  
==================================================================================================
Total params: 1,171,622
Trainable params: 1,170,726
Non-trainable params: 896
__________________________________________________________________________________________________
None
Epoch 1/10
196/196 [==============================] - 21s 46ms/step - loss: 3.4463 - acc: 0.3635 - val_loss: 1.8015 - val_acc: 0.3459
Epoch 2/10
196/196 [==============================] - 8s 43ms/step - loss: 1.3267 - acc: 0.5200 - val_loss: 1.3895 - val_acc: 0.5069
Epoch 3/10
196/196 [==============================] - 8s 43ms/step - loss: 1.1095 - acc: 0.6062 - val_loss: 1.2008 - val_acc: 0.5651
Epoch 4/10
196/196 [==============================] - 9s 44ms/step - loss: 0.9618 - acc: 0.6585 - val_loss: 1.5411 - val_acc: 0.5226
Epoch 5/10
196/196 [==============================] - 9s 44ms/step - loss: 0.8656 - acc: 0.6968 - val_loss: 1.1012 - val_acc: 0.6234
Epoch 6/10
196/196 [==============================] - 8s 43ms/step - loss: 0.7622 - acc: 0.7361 - val_loss: 1.1355 - val_acc: 0.6168
Epoch 7/10
196/196 [==============================] - 9s 44ms/step - loss: 0.6801 - acc: 0.7602 - val_loss: 1.1561 - val_acc: 0.6187
Epoch 8/10
196/196 [==============================] - 8s 43ms/step - loss: 0.6106 - acc: 0.7905 - val_loss: 1.1100 - val_acc: 0.6401
Epoch 9/10
196/196 [==============================] - 9s 43ms/step - loss: 0.5367 - acc: 0.8146 - val_loss: 1.2989 - val_acc: 0.6058
Epoch 10/10
196/196 [==============================] - 9s 47ms/step - loss: 0.4776 - acc: 0.8348 - val_loss: 1.0098 - val_acc: 0.6757

And combining the code for our simple network using residual blocks:


import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Add, MaxPool2D, Flatten, Dense
from tensorflow.keras.activations import relu
from tensorflow.keras.models import Model
 
def residual_block(x, filters):
  # keep the input tensor so it can be added back as the identity (skip connection)
  identity = x
  # two 3x3 convolutions with batch normalization; strides of (1, 1) keep the spatial dimensions unchanged
  x = Conv2D(filters=filters, kernel_size=(3, 3), strides=(1, 1), padding="same")(x)
  x = BatchNormalization()(x)
  x = relu(x)
  x = Conv2D(filters=filters, kernel_size=(3, 3), padding="same")(x)
  x = BatchNormalization()(x)
  # add the identity back onto the transformed tensor, then apply the final activation
  x = Add()([identity, x])
  x = relu(x)
 
  return x
 
(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()
 
input_layer = Input(shape=(32,32,3,))
x = Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu")(input_layer)
x = residual_block(x, 32)
x = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 64)
x = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 128)
x = Flatten()(x)
x = Dense(units=84, activation="relu")(x)
x = Dense(units=10, activation="softmax")(x)
 
model = Model(inputs=input_layer, outputs = x)
print(model.summary())
 
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
 
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))

Subclassing keras.Model

Keras also provides an object-oriented approach to creating models, which helps with reusability and allows you to represent the models you want to create as classes. This representation might be more intuitive since you can think about models as a set of layers strung together to form your network.

To begin subclassing keras.Model, you first need to import it:

from tensorflow.keras.models import Model

Then, you can start subclassing keras.Model. First, you need to build the layers that you want to use in your method calls since you only want to instantiate these layers once instead of each time you call your model. To keep in line with previous examples, let’s build a LeNet5 model here as well.


class LeNet5(tf.keras.Model):
  def __init__(self):
    super(LeNet5, self).__init__()
    #creating layers in initializer
    self.conv1 = Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu")
    self.max_pool2x2 = MaxPool2D(pool_size=(2,2))
    self.conv2 = Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu")
    self.conv3 = Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu")
    self.flatten = Flatten()
    self.fc2 = Dense(units=84, activation="relu")
    self.fc3 = Dense(units=10, activation="softmax")

Then, override the call method to define what happens when the model is called on an input tensor. The call method (defined inside the LeNet5 class) strings together the layers you built in the initializer:


def call(self, input_tensor):
  # don't create layers here, need to create the layers in initializer,
  # otherwise you will get the tf.Variable can only be created once error
  conv1 = self.conv1(input_tensor)
  maxpool1 = self.max_pool2x2(conv1)
  conv2 = self.conv2(maxpool1)
  maxpool2 = self.max_pool2x2(conv2)
  conv3 = self.conv3(maxpool2)
  flatten = self.flatten(conv3)
  fc2 = self.fc2(flatten)
  fc3 = self.fc3(fc2)
 
  return fc3

It is important to have all the layers created in the class constructor, not inside the call() method. This is because the call() method will be invoked multiple times with different input tensors, but you want the same layer objects to be reused on every call so that their weights are trained. You can then instantiate your new LeNet5 class and use it as part of a model:


input_layer = Input(shape=(32,32,3,))
x = LeNet5()(input_layer)
 
model = Model(inputs=input_layer, outputs=x)
 
print(model.summary(expand_nested=True))

And you can see that the model has the same number of parameters and the same internal structure as the two versions of LeNet5 built previously.


_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 le_net5 (LeNet5)            (None, 10)                697046    
|¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|
| conv2d (Conv2D)           multiple                  456       |
|                                                               |
| max_pooling2d (MaxPooling2D  multiple               0         |
| )                                                             |
|                                                               |
| conv2d_1 (Conv2D)         multiple                  2416      |
|                                                               |
| conv2d_2 (Conv2D)         multiple                  48120     |
|                                                               |
| flatten (Flatten)         multiple                  0         |
|                                                               |
| dense (Dense)             multiple                  645204    |
|                                                               |
| dense_1 (Dense)           multiple                  850       |
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
=================================================================
Total params: 697,046
Trainable params: 697,046
Non-trainable params: 0
_________________________________________________________________

Combining all the code to create your LeNet5 subclass of keras.Model:


import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
from tensorflow.keras.models import Model
 
class LeNet5(tf.keras.Model):
  def __init__(self):
    super(LeNet5, self).__init__()
    #creating layers in initializer
    self.conv1 = Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu")
    self.max_pool2x2 = MaxPool2D(pool_size=(2,2))
    self.conv2 = Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu")
    self.conv3 = Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu")
    self.flatten = Flatten()
    self.fc2 = Dense(units=84, activation="relu")
    self.fc3=Dense(units=10, activation="softmax")
 
  def call(self, input_tensor):
    #don't add layers here, need to create the layers in initializer, otherwise you will get the tf.Variable can only be created once error
    x = self.conv1(input_tensor)
    x = self.max_pool2x2(x)
    x = self.conv2(x)
    x = self.max_pool2x2(x)
    x = self.conv3(x)
    x = self.flatten(x)
    x = self.fc2(x)
    x = self.fc3(x)
    return x
 
input_layer = Input(shape=(32,32,3,))
x = LeNet5()(input_layer)
model = Model(inputs=input_layer, outputs=x)
print(model.summary(expand_nested=True))
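
As with the earlier versions of LeNet5, you can compile and train this model in exactly the same way. The snippet below is a minimal sketch that simply mirrors the CIFAR-10 training calls used previously:


(trainX, trainY), (testX, testY) = tf.keras.datasets.cifar10.load_data()
 
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))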

Summary

In this post, you have seen three different ways to create models in Keras. In particular, this includes using the Sequential class, functional interface, and subclassing keras.Model. You have also seen examples of the same LeNet5 model being built using the different methods and a use case that can be done using the functional interface but not with the Sequential class.


Original article sourced at: https://machinelearningmastery.com

#keras #machine-learning 

Top 3 Ways To Build Machine Learning Models in Keras
Yvonne  Hickle

Yvonne Hickle

1670555280

Implementing Dropout Regularization in Deep Learning Models with Keras

In this Keras article, we will learn about Implementing Dropout Regularization in Deep Learning Models with Keras. Dropout is a simple and powerful regularization technique for neural networks and deep learning models.

In this post, you will discover the Dropout regularization technique and how to apply it to your models in Python with Keras.

After reading this post, you will know:

  • How the Dropout regularization technique works
  • How to use Dropout on your input layers
  • How to use Dropout on your hidden layers
  • How to tune the dropout level on your problem

Dropout Regularization for Neural Networks

Dropout is a regularization technique for neural network models proposed by Srivastava et al. in their 2014 paper “Dropout: A Simple Way to Prevent Neural Networks from Overfitting” (download the PDF).

Dropout is a technique where randomly selected neurons are ignored during training. They are “dropped out” randomly. This means that their contribution to the activation of downstream neurons is temporarily removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass.

As a neural network learns, neuron weights settle into their context within the network. Weights of neurons are tuned for specific features, providing some specialization. Neighboring neurons come to rely on this specialization, which, if taken too far, can result in a fragile model too specialized for the training data. This reliance on context for a neuron during training is referred to as complex co-adaptations.

You can imagine that if neurons are randomly dropped out of the network during training, other neurons will have to step in and handle the representation required to make predictions for the missing neurons. This is believed to result in multiple independent internal representations being learned by the network.

The effect is that the network becomes less sensitive to the specific weights of neurons. This, in turn, results in a network capable of better generalization and less likely to overfit the training data.

Dropout Regularization in Keras

Dropout is easily implemented by randomly selecting nodes to be dropped out with a given probability (e.g., 20%) in each weight update cycle. This is how Dropout is implemented in Keras. Dropout is only used during the training of a model and is not used when evaluating the skill of the model.
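
To make the mechanics concrete, here is a small illustrative sketch (plain NumPy, not Keras itself) of how an inverted-dropout mask zeroes a fraction of activations during training and scales up the rest so their expected sum is unchanged:


import numpy as np
 
rate = 0.2                                # fraction of units to drop
activations = np.array([0.5, 1.2, 0.3, 0.8, 1.0])
 
# each unit is kept with probability (1 - rate)
keep_mask = np.random.rand(activations.size) > rate
 
# dropped units become zero; kept units are scaled up by 1 / (1 - rate)
dropped = activations * keep_mask / (1.0 - rate)
print(dropped)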

Next, let’s explore a few different ways of using Dropout in Keras.

The examples will use the Sonar dataset. This is a binary classification problem that aims to correctly identify rocks and mock-mines from sonar chirp returns. It is a good test dataset for neural networks because all the input values are numerical and have the same scale.

The dataset can be downloaded from the UCI Machine Learning repository. You can place the sonar dataset in your current working directory with the file name sonar.csv.

You will evaluate the developed models using scikit-learn with 10-fold cross validation in order to tease out differences in the results better.

There are 60 input values and a single output value. The input values are standardized before being used in the network. The baseline neural network model has two hidden layers, the first with 60 units and the second with 30. Stochastic gradient descent is used to train the model with a relatively low learning rate and momentum.

The full baseline model is listed below:


# Baseline Model on the Sonar Dataset
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
 
# baseline
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(30,  activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 sgd = SGD(learning_rate=0.01, momentum=0.8)
 model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
 return model
 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example generates an estimated classification accuracy of 86%.

Baseline: 86.04% (4.58%)

Using Dropout on the Visible Layer

Dropout can be applied to input neurons called the visible layer.

In the example below, a new Dropout layer is added between the input (or visible layer) and the first hidden layer. The dropout rate is set to 20%, meaning one in five inputs will be randomly excluded from each update cycle.

Additionally, as recommended in the original paper on Dropout, a constraint is imposed on the weights for each hidden layer, ensuring that the maximum norm of the weights does not exceed a value of 3. This is done by setting the kernel_constraint argument on the Dense class when constructing the layers.

The learning rate was lifted by one order of magnitude, and the momentum was increased to 0.9. Both of these changes were also recommended in the original Dropout paper.

Continuing from the baseline example above, the code below exercises the same network with input dropout:


# Example of Dropout on the Sonar Dataset: Visible Layer
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
 
# dropout in the input layer with weight constraint
def create_model():
 # create model
 model = Sequential()
 model.add(Dropout(0.2, input_shape=(60,)))
 model.add(Dense(60, activation='relu', kernel_constraint=MaxNorm(3)))
 model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(3)))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 sgd = SGD(learning_rate=0.1, momentum=0.9)
 model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
 return model
 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Visible: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running the example provides a slight drop in classification accuracy, at least on a single test run.

Visible: 83.52% (7.68%)

Using Dropout on Hidden Layers

Dropout can be applied to hidden neurons in the body of your network model.

In the example below, Dropout is applied between the two hidden layers and between the last hidden layer and the output layer. Again, a dropout rate of 20% is used, as is a weight constraint on those layers.


# Example of Dropout on the Sonar Dataset: Hidden Layer
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
 
# dropout in hidden layers with weight constraint
def create_model():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu', kernel_constraint=MaxNorm(3)))
 model.add(Dropout(0.2))
 model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(3)))
 model.add(Dropout(0.2))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 sgd = SGD(learning_rate=0.1, momentum=0.9)
 model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
 return model
 
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Hidden: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

You can see that for this problem and the chosen network configuration, using Dropout in the hidden layers did not lift performance. In fact, performance was worse than the baseline.

It is possible that additional training epochs are required or that further tuning is required to the learning rate.

Hidden: 83.59% (7.31%)

Dropout in Evaluation Mode

Dropout will randomly set some of its input to zero. If you wonder what happens after you have finished training, the answer is nothing! In Keras, a layer can tell whether the model is running in training mode or not, and the Dropout layer only zeroes out inputs when the model is run for training. Keras uses the “inverted dropout” formulation: during training, if the dropout rate is r, the retained inputs are scaled up by a factor of 1/(1−r) so that the next layer sees inputs of a similar expected scale. At evaluation time, the Dropout layer simply passes its input through unchanged.
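
You can verify this behavior by calling a Dropout layer directly with the training flag. This is a small sketch; which positions get zeroed will vary from run to run:


import numpy as np
import tensorflow as tf
 
layer = tf.keras.layers.Dropout(0.2)
x = np.ones((1, 10), dtype="float32")
 
# training=True: roughly 20% of the values are zeroed and the rest are scaled up by 1 / (1 - 0.2) = 1.25
print(layer(x, training=True))
 
# training=False (evaluation mode): the input passes through unchanged
print(layer(x, training=False))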

Tips for Using Dropout

The original paper on Dropout provides experimental results on a suite of standard machine learning problems. As a result, they provide a number of useful heuristics to consider when using Dropout in practice; a short sketch combining several of them follows the list below.

  • Generally, use a small dropout value of 20%-50% of neurons, with 20% providing a good starting point. A probability too low has minimal effect, and a value too high results in under-learning by the network.
  • Use a larger network. You are likely to get better performance when Dropout is used on a larger network, giving the model more of an opportunity to learn independent representations.
  • Use Dropout on incoming (visible) as well as hidden units. Application of Dropout at each layer of the network has shown good results.
  • Use a large learning rate with decay and a large momentum. Increase your learning rate by a factor of 10 to 100 and use a high momentum value of 0.9 or 0.99.
  • Constrain the size of network weights. A large learning rate can result in very large network weights. Imposing a constraint on the size of network weights, such as max-norm regularization, with a size of 4 or 5 has been shown to improve results.
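
As an illustration only, here is a minimal sketch of a model definition that combines several of these heuristics on the Sonar data: 20% dropout on the visible and hidden layers, a max-norm constraint of 4, and SGD with a raised learning rate and 0.9 momentum. The function name create_dropout_model and the exact values are only starting points to tune, not a recommendation:


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
 
def create_dropout_model():
    model = Sequential()
    model.add(Dropout(0.2, input_shape=(60,)))                      # dropout on the visible layer
    model.add(Dense(60, activation='relu', kernel_constraint=MaxNorm(4)))
    model.add(Dropout(0.2))                                         # dropout on the first hidden layer
    model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(4)))
    model.add(Dense(1, activation='sigmoid'))
    # large learning rate and high momentum; a learning-rate schedule could be added for decay
    sgd = SGD(learning_rate=0.1, momentum=0.9)
    model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model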

Original article sourced at: https://machinelearningmastery.com

#keras #deep-learning 

Implementing Dropout Regularization in Deep Learning Models with Keras
Yvonne  Hickle

Yvonne Hickle

1670550232

Binary Classification with Keras Deep Learning Library

In this Keras article, we will learn about Binary Classification with the Keras Deep Learning Library. Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano.

Keras allows you to quickly and simply design and train neural networks and deep learning models.

In this post, you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.

After completing this tutorial, you will know:

  • How to load training data and make it available to Keras
  • How to design and train a neural network for tabular data
  • How to evaluate the performance of a neural network model in Keras on unseen data
  • How to perform data preparation to improve skill when using neural networks
  • How to tune the topology and configuration of neural networks in Keras

1. Description of the Dataset

The dataset you will use in this tutorial is the Sonar dataset.

This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

You can learn more about this dataset on the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.

It is a well-understood dataset. All the variables are continuous and generally in the range of 0 to 1. The output variable is a string “M” for mine and “R” for rock, which will need to be converted to integers 1 and 0.

A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Using cross-validation, a neural network should be able to achieve a performance of around 84% with an upper bound on accuracy for custom models at around 88%.

2. Baseline Neural Network Model Performance

Let’s create a baseline model and result for this problem.

You will start by importing all the classes and functions you will need.


import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...

Now, you can load the dataset using pandas and split the columns into 60 input variables (X) and one output variable (Y). Use pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.


...
# load dataset
dataframe = pd.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

The output variable is string values. You must convert them into integer values 0 and 1.

You can do this using the LabelEncoder class from scikit-learn. This class will model the encoding required using the entire dataset via the fit() function, then apply the encoding to create a new output variable using the transform() function.


...
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

You are now ready to create your neural network model using Keras.

You will use scikit-learn to evaluate the model using stratified k-fold cross validation. This is a resampling technique that will provide an estimate of the performance of the model. It does this by splitting the data into k-parts and training the model on all parts except one, which is held out as a test set to evaluate the performance of the model. This process is repeated k-times, and the average score across all constructed models is used as a robust estimate of performance. It is stratified, meaning that it will look at the output values and attempt to balance the number of instances that belong to each class in the k-splits of the data.

To use Keras models with scikit-learn, you must use the KerasClassifier wrapper from the SciKeras module. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit(), such as the number of epochs and the batch size.

Let’s start by defining the function that creates your baseline model. Your model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks.

The weights are initialized using a small Gaussian random number. The Rectifier activation function is used. The output layer contains a single neuron in order to make predictions. It uses the sigmoid activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values.

Finally, you will use the logarithmic loss function (binary_crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.


# baseline model
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model

Now, it is time to evaluate this model using stratified cross validation in the scikit-learn framework.

Pass the number of training epochs to the KerasClassifier, again using reasonable default values. Verbose output is also turned off, given that the model will be created ten times for the 10-fold cross validation being performed.


...
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Baseline
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

 

Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data.

Baseline: 81.68% (7.26%)

This is an excellent score without doing any hard work.

3. Re-Run the Baseline Model with Data Preparation

It is a good practice to prepare your data before modeling.

Neural network models are especially suitable for having consistent input values, both in scale and distribution.

Standardization is an effective data preparation scheme for tabular data when building neural network models. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.

You can use scikit-learn to perform the standardization of your sonar dataset using the StandardScaler class.

Rather than performing the standardization on the entire dataset, it is good practice to fit the standardization procedure on the training data within each pass of a cross-validation run and then use it to prepare the “unseen” test fold. This makes standardization a step in model preparation within the cross-validation process. It prevents the algorithm from having knowledge of the “unseen” data during evaluation, knowledge that might otherwise leak through the data preparation scheme, such as a sharper estimate of the data distribution.

You can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, you can define a pipeline with the StandardScaler followed by your neural network model.


...
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Standardized
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example provides the results below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

You now see a small but very nice lift in the mean accuracy.

Standardized: 84.56% (5.74%)

4. Tuning Layers and Number of Neurons in the Model

There are many things to tune on a neural network, such as weight initialization, activation functions, optimization procedure, and so on.

One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, you will look at two experiments on the structure of the network: making it smaller and making it larger.

These are good experiments to perform when tuning a neural network on your problem.

4.1. Evaluate a Smaller Network

Note that there is likely a lot of redundancy in the input variables for this problem.

The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. So you can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.

In this experiment, you will take your baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will pressure the network during training to pick out the most important structure in the input data to model.

You will also standardize the data as in the previous experiment with data preparation and try to take advantage of the slight lift in performance.


...
# smaller model
def create_smaller():
 # create model
 model = Sequential()
 model.add(Dense(30, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Standardized Smaller
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# smaller model
def create_smaller():
 # create model
 model = Sequential()
 model.add(Dense(30, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example provides the following result. You can see that you have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

This is a great result because you are doing slightly better with a network half the size, which, in turn, takes half the time to train.

Smaller: 86.04% (4.00%)

4.2. Evaluate a Larger Network

A neural network topology with more layers offers more opportunities for the network to extract key features and recombine them in useful nonlinear ways.

You can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, you add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.

Your network now has the topology:

60 inputs -> [60 -> 30] -> 1 output

The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like you did in the experiment above with the smaller network.

Instead of squeezing the representation of the inputs themselves, you have an additional hidden layer to aid in the process.


...
# larger model
def create_larger():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(30, activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Standardized Larger
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# larger model
def create_larger():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(30, activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example produces the results below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

You can see that you do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.

Larger: 83.14% (4.52%)

With further tuning of aspects like the optimization algorithm and the number of training epochs, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?
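
If you want a starting point for that tuning, the sketch below shows one way to grid-search the number of training epochs and the batch size with scikit-learn's GridSearchCV. It continues from the larger-model example above and assumes the pipeline step is named 'mlp', so the double-underscore parameter names (and the grid values) are only illustrative:


...
# tune epochs and batch size over the standardized pipeline from the previous example
from sklearn.model_selection import GridSearchCV
 
param_grid = {
    'mlp__epochs': [100, 200, 300],
    'mlp__batch_size': [5, 16, 32],
}
grid = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=kfold)
grid_result = grid.fit(X, encoded_Y)
print("Best: %.2f%% using %s" % (grid_result.best_score_ * 100, grid_result.best_params_))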

Summary

In this post, you discovered the Keras deep Learning library in Python.

You learned how you can work through a binary classification problem step-by-step with Keras, specifically:

  • How to load and prepare data for use in Keras
  • How to create a baseline neural network model
  • How to evaluate a Keras model using scikit-learn and stratified k-fold cross validation
  • How data preparation schemes can lift the performance of your models
  • How experiments adjusting the network topology can lift model performance

Do you have any questions about deep learning with Keras or this post? Ask your questions in the comments, and I will do my best to answer.


Original article sourced at: https://machinelearningmastery.com

#keras 

Binary Classification with Keras Deep Learning Library
Florida  Feeney

Florida Feeney

1670471536

Binary Classification with Keras Deep Learning Library

In this Keras article, we will learn about Binary Classification with Keras Deep Learning Library. Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano. Keras allows you to quickly and simply design and train neural networks and deep learning models.

Binary Classification Tutorial with the Keras Deep Learning Library

In this post, you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.

After completing this tutorial, you will know:

  • How to load training data and make it available to Keras
  • How to design and train a neural network for tabular data
  • How to evaluate the performance of a neural network model in Keras on unseen data
  • How to perform data preparation to improve skill when using neural networks
  • How to tune the topology and configuration of neural networks in Keras

1. Description of the Dataset

The dataset you will use in this tutorial is the Sonar dataset.

This is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

You can learn more about this dataset on the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.

It is a well-understood dataset. All the variables are continuous and generally in the range of 0 to 1. The output variable is a string “M” for mine and “R” for rock, which will need to be converted to integers 1 and 0.

A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Using cross-validation, a neural network should be able to achieve a performance of around 84% with an upper bound on accuracy for custom models at around 88%.

2. Baseline Neural Network Model Performance

Let’s create a baseline model and result for this problem.

You will start by importing all the classes and functions you will need.


import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...

Now, you can load the dataset using pandas and split the columns into 60 input variables (X) and one output variable (Y). Use pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.


...
# load dataset
dataframe = pd.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]

The output variable is string values. You must convert them into integer values 0 and 1.

You can do this using the LabelEncoder class from scikit-learn. This class will model the encoding required using the entire dataset via the fit() function, then apply the encoding to create a new output variable using the transform() function.


...
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)

You are now ready to create your neural network model using Keras.

You will use scikit-learn to evaluate the model using stratified k-fold cross validation. This is a resampling technique that will provide an estimate of the performance of the model. It does this by splitting the data into k-parts and training the model on all parts except one, which is held out as a test set to evaluate the performance of the model. This process is repeated k-times, and the average score across all constructed models is used as a robust estimate of performance. It is stratified, meaning that it will look at the output values and attempt to balance the number of instances that belong to each class in the k-splits of the data.

To use Keras models with scikit-learn, you must use the KerasClassifier wrapper from the SciKeras module. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit(), such as the number of epochs and the batch size.

Let’s start by defining the function that creates your baseline model. Your model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks.

The weights are initialized using a small Gaussian random number. The Rectifier activation function is used. The output layer contains a single neuron in order to make predictions. It uses the sigmoid activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values.

Finally, you will use the logarithmic loss function (binary_crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.


# baseline model
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model

Now, it is time to evaluate this model using stratified cross validation in the scikit-learn framework.

Pass the number of training epochs to the KerasClassifier, again using reasonable default values. Verbose output is also turned off, given that the model will be created ten times for the 10-fold cross validation being performed.


...
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Baseline
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

 

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data.

Baseline: 81.68% (7.26%)

This is an excellent score without doing any hard work.

3. Re-Run the Baseline Model with Data Preparation

It is a good practice to prepare your data before modeling.

Neural network models are especially suitable for having consistent input values, both in scale and distribution.

Standardization is an effective data preparation scheme for tabular data when building neural network models. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.

You can use scikit-learn to perform the standardization of your sonar dataset using the StandardScaler class.

Rather than performing the standardization on the entire dataset, it is good practice to train the standardization procedure on the training data within the pass of a cross-validation run and use the trained standardization to prepare the “unseen” test fold. This makes standardization a step in model preparation in the cross-validation process. It prevents the algorithm from having knowledge of “unseen” data during evaluation, knowledge that might be passed from the data preparation scheme like a crisper distribution.

You can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, you can define a pipeline with the StandardScaler followed by your neural network model.


...
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Standardized
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example provides the results below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

You now see a small but very nice lift in the mean accuracy.

Standardized: 84.56% (5.74%)

4. Tuning Layers and Number of Neurons in the Model

There are many things to tune on a neural network, such as weight initialization, activation functions, optimization procedure, and so on.

One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, you will look at two experiments on the structure of the network: making it smaller and making it larger.

These are good experiments to perform when tuning a neural network on your problem.

4.1. Evaluate a Smaller Network

Note that there is likely a lot of redundancy in the input variables for this problem.

The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. So you can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.

In this experiment, you will take your baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will pressure the network during training to pick out the most important structure in the input data to model.

You will also standardize the data as in the previous experiment with data preparation and try to take advantage of the slight lift in performance.


...
# smaller model
def create_smaller():
 # create model
 model = Sequential()
 model.add(Dense(30, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Standardized Smaller
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# smaller model
def create_smaller():
 # create model
 model = Sequential()
 model.add(Dense(30, input_shape=(60,), activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example provides the following result. You can see that you have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

This is a great result because you are doing slightly better with a network half the size, which, in turn, takes half the time to train.

Smaller: 86.04% (4.00%)

4.2. Evaluate a Larger Network

A neural network topology with more layers offers more opportunities for the network to extract key features and recombine them in useful nonlinear ways.

You can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, you add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.

Your network now has the topology:

60 inputs -> [60 -> 30] -> 1 output

The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like you did in the experiment above with the smaller network.

Instead of squeezing the representation of the inputs themselves, you have an additional hidden layer to aid in the process.


...
# larger model
def create_larger():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(30, activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

After tying this together, the complete example is listed below.


# Binary Classification with Sonar Dataset: Standardized Larger
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# larger model
def create_larger():
 # create model
 model = Sequential()
 model.add(Dense(60, input_shape=(60,), activation='relu'))
 model.add(Dense(30, activation='relu'))
 model.add(Dense(1, activation='sigmoid'))
 # Compile model
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Running this example produces the results below.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.

You can see that you do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.

Larger: 83.14% (4.52%)

With further tuning of aspects like the optimization algorithm and the number of training epochs, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?


Original article sourced at: https://machinelearningmastery.com

#keras 

Binary Classification with Keras Deep Learning Library
Rowena  Waters

Rowena Waters

1670385252

How to Use Image Enhancement for Deep Learning with Keras

In this Keras tutorial, we will learn how to use image augmentation for deep learning with Keras. Data preparation is required when working with neural networks and deep learning models. Increasingly, data augmentation is also required on more complex object recognition tasks.

In this post, you will discover how to use data preparation and data augmentation with your image datasets when developing and evaluating deep learning models in Python with Keras.

After reading this post, you will know:

  • About the image augmentation API provided by Keras and how to use it with your models
  • How to perform feature standardization
  • How to perform ZCA whitening of your images
  • How to augment data with random rotations, shifts, and flips
  • How to save augmented image data to disk

Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.

Let’s get started.

  • Jun/2016: First published
  • Update Aug/2016: The examples in this post were updated for the latest Keras API. The datagen.next() function was removed
  • Update Oct/2016: Updated for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18
  • Update Jan/2017: Updated for Keras 1.2.0 and TensorFlow 0.12.1
  • Update Mar/2017: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0
  • Update Sep/2019: Updated for Keras 2.2.5 API
  • Update Jul/2022: Updated for TensorFlow 2.x API with a workaround on the feature standardization issue

Keras Image Augmentation API

Like the rest of Keras, the image augmentation API is simple and powerful.

Keras provides the ImageDataGenerator class that defines the configuration for image data preparation and augmentation. This includes capabilities such as:

  • Sample-wise standardization
  • Feature-wise standardization
  • ZCA whitening
  • Random rotation, shifts, shear, and flips
  • Dimension reordering
  • Save augmented images to disk

An augmented image generator can be created as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator()

Rather than performing the operations on your entire image dataset in memory, the API is designed to be iterated by the deep learning model fitting process, creating augmented image data for you just in time. This reduces your memory overhead but adds some additional time cost during model training.

After you have created and configured your ImageDataGenerator, you must fit it on your data. This will calculate any statistics required to actually perform the transforms on your image data. You can do this by calling the fit() function on the data generator and passing it your training dataset.

datagen.fit(train)

The data generator itself is, in fact, an iterator, returning batches of image samples when requested. You can configure the batch size, prepare the data generator, and get batches of images by calling the flow() function and drawing a batch from the iterator it returns.

X_batch, y_batch = next(datagen.flow(X_train, y_train, batch_size=32))

Finally, you can make use of the data generator. Instead of passing your arrays directly to the fit() function on your model, you pass in the data generator, along with the desired number of steps per epoch as well as the total number of epochs on which to train.

model.fit(datagen.flow(X_train, y_train, batch_size=32), steps_per_epoch=len(X_train) // 32, epochs=100)

You can learn more about the Keras image data generator API in the Keras documentation.

Point of Comparison for Image Augmentation

Now that you know how the image augmentation API in Keras works, let’s look at some examples.

We will use the MNIST handwritten digit recognition task in these examples. To begin with, let’s take a look at the first nine images in the training dataset.


# Plot images
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
    for j in range(3):
        ax[i][j].imshow(X_train[i*3+j], cmap=plt.get_cmap("gray"))
# show the plot
plt.show()

Running this example provides the following image that you can use as a point of comparison with the image preparation and augmentation in the examples below.

Example MNIST images

Feature Standardization

It is also possible to standardize pixel values across the entire dataset. This is called feature standardization and mirrors the type of standardization often performed for each column in a tabular dataset.

You can perform feature standardization by setting the featurewise_center and featurewise_std_normalization arguments to True on the ImageDataGenerator class. These are set to False by default. However, the recent version of Keras has a bug in the feature standardization such that the mean and standard deviation are calculated across all pixels. If you use the fit() function from the ImageDataGenerator class, you will see an image similar to the one above:

# Standardize images across the dataset, mean=0, stdev=1
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# fit parameters from data
datagen.fit(X_train)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    print(X_batch.min(), X_batch.mean(), X_batch.max())
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

For example, the minimum, mean, and maximum values from the batch printed above are:

-0.42407447 -0.04093817 2.8215446

And the image displayed is as follows:

Image from feature-wise standardization

The workaround is to compute the feature standardization manually. Each pixel should have a separate mean and standard deviation, and it should be computed across different samples but independent from other pixels in the same sample. You just need to replace the fit() function with your own computation:


# Standardize images across the dataset, every pixel has mean=0, stdev=1
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# fit parameters from data
datagen.mean = X_train.mean(axis=0)
datagen.std = X_train.std(axis=0)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    print(X_batch.min(), X_batch.mean(), X_batch.max())
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

The minimum, mean, and maximum as printed now have a wider range:

-1.2742625 -0.028436039 17.46127

Running this example, you can see that the effect is different, seemingly darkening and lightening different digits.

Standardized feature MNIST images

ZCA Whitening

A whitening transform of an image is a linear algebra operation that reduces the redundancy in the matrix of pixels.

Less redundancy in the image is intended to better highlight the structures and features in the image to the learning algorithm.

Typically, image whitening is performed using the Principal Component Analysis (PCA) technique. More recently, an alternative called ZCA (learn more in Appendix A of this tech report) shows better results, with transformed images that keep all the original dimensions. And unlike PCA, the resulting transformed images still look like their originals. Precisely, whitening converts each image into a white noise vector, i.e., each element in the vector has zero mean and unit standard deviation and is statistically independent of the others.

You can perform a ZCA whitening transform by setting the zca_whitening argument to True. But due to the same issue as feature standardization, you must first zero-center your input data separately:


# ZCA Whitening
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True, zca_whitening=True)
# fit parameters from data
X_mean = X_train.mean(axis=0)
datagen.fit(X_train - X_mean)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train - X_mean, y_train, batch_size=9, shuffle=False):
    print(X_batch.min(), X_batch.mean(), X_batch.max())
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

Running the example, you can see the same general structure in the images and how the outline of each digit has been highlighted.

ZCA whitening MNIST images

Random Rotations

Sometimes images in your sample data may have varying and different rotations in the scene.

You can train your model to better handle rotations of images by artificially and randomly rotating images from your dataset during training.

The example below creates random rotations of the MNIST digits up to 90 degrees by setting the rotation_range argument.


# Random Rotations
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(rotation_range=90)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

Running the example, you can see that images have been rotated left and right up to a limit of 90 degrees. This is not helpful on this problem because the MNIST digits have a normalized orientation, but this transform might be of help when learning from photographs where the objects may have different orientations.

Random rotations of MNIST images

Random Shifts

Objects in your images may not be centered in the frame. They may be off-center in a variety of different ways.

You can train your deep learning network to expect and correctly handle off-center objects by artificially creating shifted versions of your training data. Keras supports separate horizontal and vertical random shifting of training data by the width_shift_range and height_shift_range arguments.


# Random Shifts
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
shift = 0.2
datagen = ImageDataGenerator(width_shift_range=shift, height_shift_range=shift)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

Running this example creates shifted versions of the digits. Again, this is not required for MNIST as the handwritten digits are already centered, but you can see how this might be useful on more complex problem domains.

Random shifted MNIST images

Random Flips

Another augmentation to your image data that can improve performance on large and complex problems is to create random flips of images in your training data.

Keras supports random flipping along both the vertical and horizontal axes using the vertical_flip and horizontal_flip arguments.


# Random Flips
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

Running this example, you can see flipped digits. Flipping digits is not useful as they will always have the correct left and right orientation, but this may be useful for problems with photographs of objects in a scene that can have a varied orientation.

Randomly flipped MNIST images

Saving Augmented Images to File

The data preparation and augmentation are performed just in time by Keras.

This is efficient in terms of memory, but you may require the exact images used during training. For example, perhaps you would like to use them with a different software package later or only generate them once and use them on multiple different deep learning models or configurations.

Keras allows you to save the images generated during training. The directory, filename prefix, and image file type can be specified to the flow() function before training. Then, during training, the generated images will be written to disk.

The example below demonstrates this and writes nine images to an "images" subdirectory with the prefix "aug" and the file type of PNG.


# Save augmented images to file
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False,
                                     save_to_dir='images', save_prefix='aug', save_format='png'):
    # create a grid of 3x3 images
    fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
    # show the plot
    plt.show()
    break

Running the example, you can see that images are only written when they are generated.

Augmented MNIST images saved to file

Tips for Augmenting Image Data with Keras

Image data is unique in that you can review the data and transformed copies of the data and quickly get an idea of how the model may perceive it.

Below are some tips for getting the most from image data preparation and augmentation for deep learning.

  • Review Dataset. Take some time to review your dataset in great detail. Look at the images. Take note of image preparation and augmentations that might benefit the training process of your model, such as the need to handle different shifts, rotations, or flips of objects in the scene.
  • Review Augmentations. Review sample images after the augmentation has been performed. It is one thing to intellectually know what image transforms you are using; it is a very different thing to look at examples. Review images both with individual augmentations you are using as well as the full set of augmentations you plan to use. You may see ways to simplify or further enhance your model training process. A minimal sketch for reviewing the combined augmentations is shown after this list.
  • Evaluate a Suite of Transforms. Try more than one image data preparation and augmentation scheme. Often you can be surprised by the results of a data preparation scheme you did not think would be beneficial.
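For example, the following minimal sketch reuses the MNIST data and the ImageDataGenerator API from the examples above to draw nine augmented variants of a single training image, so you can review the combined effect of the transforms you plan to use. The augmentation values here are illustrative only.

# Review augmentations: nine random variants of one image (illustrative sketch)
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data and keep a single image
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)).astype('float32')
single_image, single_label = X_train[0:1], y_train[0:1]
# combine the augmentations you plan to use (values are illustrative)
datagen = ImageDataGenerator(rotation_range=90, width_shift_range=0.2, height_shift_range=0.2)
it = datagen.flow(single_image, single_label, batch_size=1)
# draw nine augmented variants of the same image
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
    for j in range(3):
        X_batch, y_batch = next(it)
        ax[i][j].imshow(X_batch[0].reshape(28,28), cmap=plt.get_cmap("gray"))
plt.show()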

Original article sourced at: https://machinelearningmastery.com

#keras #deep-learning 

How to Use Image Enhancement for Deep Learning with Keras
Rowena  Waters

Rowena Waters

1670376611

How to Use Keras and Tf.image Preprocessor Layer in TensorFlow

In this TensorFlow tutorial, we will learn about using the Keras preprocessing layers and the tf.image module in TensorFlow to augment images. When you work on a machine learning problem related to images, not only do you need to collect some images as training data, but you also need to employ augmentation to create variations in the images. This is especially true for more complex object recognition problems.

There are many ways to do image augmentation. You may use some external libraries or write your own functions for that. There are also some modules in TensorFlow and Keras for augmentation.

In this post, you will discover how you can use the Keras preprocessing layer as well as the tf.image module in TensorFlow for image augmentation.

After reading this post, you will know:

  • What are the Keras preprocessing layers, and how to use them
  • What are the functions provided by the tf.image module for image augmentation
  • How to use augmentation together with the tf.data dataset

Let’s get started.

Overview

This article is divided into five sections; they are:

  • Getting Images
  • Visualizing the Images
  • Keras Preprocessing Layers
  • Using tf.image API for Augmentation
  • Using Preprocessing Layers in Neural Networks

Getting Images

Before you see how you can do augmentation, you need to get the images. Ultimately, you need the images to be represented as arrays, for example, as HxWx3 arrays of 8-bit integers for the RGB pixel values. There are many ways to get the images. Some can be downloaded as a ZIP file. If you're using TensorFlow, you may get some image datasets from the tensorflow_datasets library.

In this tutorial, you will use the citrus leaves images, which is a small dataset of less than 100MB. It can be downloaded from tensorflow_datasets as follows:

import tensorflow_datasets as tfds
ds, meta = tfds.load('citrus_leaves', with_info=True, split='train', shuffle_files=True)

Running this code the first time will download the image dataset into your computer with the following output:

Downloading and preparing dataset 63.87 MiB (download: 63.87 MiB, generated: 37.89 MiB, total: 101.76 MiB) to ~/tensorflow_datasets/citrus_leaves/0.1.2...
Extraction completed...: 100%|██████████████████████████████| 1/1 [00:06<00:00,  6.54s/ file]
Dl Size...: 100%|██████████████████████████████████████████| 63/63 [00:06<00:00,  9.63 MiB/s]
Dl Completed...: 100%|███████████████████████████████████████| 1/1 [00:06<00:00,  6.54s/ url]
Dataset citrus_leaves downloaded and prepared to ~/tensorflow_datasets/citrus_leaves/0.1.2. Subsequent calls will reuse this data.

The function above returns the images as a tf.data dataset object and the metadata. This is a classification dataset. You can print the training labels with the following:


...
for i in range(meta.features['label'].num_classes):
    print(meta.features['label'].int2str(i))

This prints:

Black spot
canker
greening
healthy

If you run this code again at a later time, it will reuse the downloaded images. Another way to load the downloaded images into a tf.data dataset is to use the image_dataset_from_directory() function.

As you can see from the screen output above, the dataset is downloaded into the directory ~/tensorflow_datasets. If you look at the directory, you see the directory structure as follows:

.../Citrus/Leaves
├── Black spot
├── Melanose
├── canker
├── greening
└── healthy

The directories are the labels, and the images are files stored under their corresponding directory. You can let the function read the directory recursively into a dataset:

import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory
 
# set to fixed image size 256x256
PATH = ".../Citrus/Leaves"
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="bilinear",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)

You may want to set batch_size=None if you do not want the dataset to be batched. Usually, you want the dataset to be batched for training a neural network model.
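As a quick illustration of the batch_size argument (assuming the same PATH and import as above), you could load the dataset unbatched and apply the batching yourself with the tf.data API, which has the same effect as passing batch_size=32 directly:

# Sketch: load the images unbatched, then batch them with tf.data
ds_unbatched = image_dataset_from_directory(PATH,
                                            validation_split=0.2, subset="training",
                                            image_size=(256,256), interpolation="bilinear",
                                            crop_to_aspect_ratio=True,
                                            seed=42, shuffle=True, batch_size=None)
ds_batched = ds_unbatched.batch(32)  # same effect as passing batch_size=32 above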

Visualizing the Images

It is important to visualize the augmentation result so you can verify it is what you want it to be. You can use matplotlib for this.

In matplotlib, you have the imshow() function to display an image. However, for the image to be displayed correctly, the image should be presented as an array of 8-bit unsigned integers (uint8).

Given that you have a dataset created using image_dataset_from_directory(), you can get the first batch (of 32 images) and display a few of them using imshow(), as follows:

...
import matplotlib.pyplot as plt
 
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
 
for images, labels in ds.take(1):
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
            ax[i][j].set_title(ds.class_names[labels[i*3+j]])
plt.show()

Here, you see a display of nine images in a grid, each labeled with its corresponding classification label using ds.class_names. The images should be converted to NumPy arrays in uint8 for display. This code displays an image like the following:

The complete code from loading the image to display is as follows:

from tensorflow.keras.utils import image_dataset_from_directory
import matplotlib.pyplot as plt
 
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves'  # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)
 
# Take one batch from dataset and display the images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
 
for images, labels in ds.take(1):
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
            ax[i][j].set_title(ds.class_names[labels[i*3+j]])
plt.show()

Note that if you’re using tensorflow_datasets to get the image, the samples are presented as a dictionary instead of a tuple of (image,label). You should change your code slightly to the following:

import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
 
# use tfds.load() or image_dataset_from_directory() to load images
ds, meta = tfds.load('citrus_leaves', with_info=True, split='train', shuffle_files=True)
ds = ds.batch(32)
 
# Take one batch from dataset and display the images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
 
for sample in ds.take(1):
    images, labels = sample["image"], sample["label"]
    for i in range(3):
        for j in range(3):
            ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
            ax[i][j].set_title(meta.features['label'].int2str(labels[i*3+j]))
plt.show()

For the rest of this post, assume the dataset is created using image_dataset_from_directory(). You may need to tweak the code slightly if your dataset is created differently.
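For example, if you loaded the data with tfds.load() as above, a minimal sketch to convert the dictionary samples into (image, label) tuples is shown below; note that such a dataset has no class_names attribute, so labels would be decoded with meta.features['label'].int2str() instead.

# Sketch: turn tfds dictionary samples into (image, label) tuples, then batch
ds = ds.map(lambda sample: (sample["image"], sample["label"]))
ds = ds.batch(32)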

Keras Preprocessing Layers

Keras comes with many neural network layers, such as convolution layers, that you need to train. There are also layers with no parameters to train, such as flatten layers to convert an array like an image into a vector.

The preprocessing layers in Keras are specifically designed to be used in the early stages of a neural network. You can use them for image preprocessing, such as to resize or rotate the image or adjust the brightness and contrast. While the preprocessing layers are supposed to be part of a larger neural network, you can also use them as functions. Below is how you can use the resizing layer as a function to transform some images and display them side-by-side with the original:

...
 
# create a resizing layer
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
 
# show original vs resized
fig, ax = plt.subplots(2, 3, figsize=(6,4))
 
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
plt.show()

The images are in 256×256 pixels, and the resizing layer will make them into 256×128 pixels. The output of the above code is as follows:

Since the resizing layer is a function, you can chain it to the dataset itself. For example:

...
def augment(image, label):
    return resize(image), label
 
resized_ds = ds.map(augment)
 
for image, label in resized_ds:
   ...

The dataset ds has samples in the form of (image, label). Hence, you created a function that takes in such a tuple and preprocesses the image with the resizing layer. You then assigned this function as an argument to map() on the dataset. When you draw a sample from the new dataset created with the map() function, the image will be a transformed one.

There are more preprocessing layers available. Some are demonstrated below.

As you saw above, you can resize the image. You can also randomly enlarge or shrink the height or width of an image. Similarly, you can zoom in or zoom out on an image. Below is an example of manipulating the image size in various ways for a maximum of 30% increase or decrease:

...
 
# Create preprocessing layers
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
height = tf.keras.layers.RandomHeight(0.3)
width = tf.keras.layers.RandomWidth(0.3)
zoom = tf.keras.layers.RandomZoom(0.3)
 
# Visualize images and augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
 
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # height
        ax[2][i].imshow(height(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("height")
        # width
        ax[3][i].imshow(width(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("width")
        # zoom
        ax[4][i].imshow(zoom(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("zoom")
plt.show()

This code shows images as follows:

While you specified a fixed dimension in resize, you have a random amount of manipulation in other augmentations.

You can also do flipping, rotation, cropping, and geometric translation using preprocessing layers:

...
# Create preprocessing layers
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical") # or "horizontal", "vertical"
rotate = tf.keras.layers.RandomRotation(0.2)
crop = tf.keras.layers.RandomCrop(out_height, out_width)
translation = tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2)
 
# Visualize augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
 
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        ax[1][i].imshow(flip(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("flip")
        # crop
        ax[2][i].imshow(crop(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # translation
        ax[3][i].imshow(translation(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("translation")
        # rotate
        ax[4][i].imshow(rotate(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("rotate")
plt.show()

This code shows the following images:

And finally, you can do augmentations on color adjustments as well:


...
brightness = tf.keras.layers.RandomBrightness([-0.8,0.8])
contrast = tf.keras.layers.RandomContrast(0.2)
 
# Visualize augmentation
fig, ax = plt.subplots(3, 3, figsize=(6,7))
 
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        ax[1][i].imshow(brightness(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(contrast(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
plt.show()

This shows the images as follows:

For completeness, below is the code to display the result of various augmentations:

from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt
 
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves'  # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)
 
# Create preprocessing layers
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
height = tf.keras.layers.RandomHeight(0.3)
width = tf.keras.layers.RandomWidth(0.3)
zoom = tf.keras.layers.RandomZoom(0.3)
 
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical")
rotate = tf.keras.layers.RandomRotation(0.2)
crop = tf.keras.layers.RandomCrop(out_height, out_width)
translation = tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2)
 
brightness = tf.keras.layers.RandomBrightness([-0.8,0.8])
contrast = tf.keras.layers.RandomContrast(0.2)
 
# Visualize images and augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # height
        ax[2][i].imshow(height(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("height")
        # width
        ax[3][i].imshow(width(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("width")
        # zoom
        ax[4][i].imshow(zoom(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("zoom")
plt.show()
 
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        ax[1][i].imshow(flip(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("flip")
        # crop
        ax[2][i].imshow(crop(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # translation
        ax[3][i].imshow(translation(images[i]).numpy().astype("uint8"))
        ax[3][i].set_title("translation")
        # rotate
        ax[4][i].imshow(rotate(images[i]).numpy().astype("uint8"))
        ax[4][i].set_title("rotate")
plt.show()
 
fig, ax = plt.subplots(3, 3, figsize=(6,7))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        ax[1][i].imshow(brightness(images[i]).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(contrast(images[i]).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
plt.show()

Finally, it is important to point out that most neural network models can work better if the input images are scaled. While we usually use an 8-bit unsigned integer for the pixel values in an image (e.g., for display using imshow() as above), a neural network prefers the pixel values to be between 0 and 1 or between -1 and +1. This can be done with preprocessing layers too. Below is how you can update one of the examples above to add the scaling layer into the augmentation:

...
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
rescale = tf.keras.layers.Rescaling(1/127.5, offset=-1)  # rescale pixel values to [-1,1]
 
def augment(image, label):
    return rescale(resize(image)), label
 
rescaled_resized_ds = ds.map(augment)
 
for image, label in rescaled_resized_ds:
   ...

Using tf.image API for Augmentation

Besides the preprocessing layer, the tf.image module also provides some functions for augmentation. Unlike the preprocessing layer, these functions are intended to be used in a user-defined function and assigned to a dataset using map() as we saw above.
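For instance, a minimal sketch of such a user-defined function is shown below; the particular tf.image calls used here are illustrative only.

# Sketch: apply tf.image augmentations to the dataset through map()
def tf_image_augment(image, label):
    # randomly flip the images horizontally and vertically
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    return image, label

augmented_ds = ds.map(tf_image_augment)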

The functions provided by the tf.image module are not duplicates of the preprocessing layers, although there is some overlap. Below is an example of using the tf.image functions to resize and crop images:

...
 
fig, ax = plt.subplots(5, 3, figsize=(6,14))
 
for images, labels in ds.take(1):
    for i in range(3):
        # original
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        h = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        w = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        ax[1][i].imshow(tf.image.resize(images[i], [h,w]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # crop
        y, x, h, w = (128 * tf.random.uniform((4,))).numpy().astype("uint8")
        ax[2][i].imshow(tf.image.crop_to_bounding_box(images[i], y, x, h, w).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # central crop
        x = tf.random.uniform([], minval=0.4, maxval=1.0)
        ax[3][i].imshow(tf.image.central_crop(images[i], x).numpy().astype("uint8"))
        ax[3][i].set_title("central crop")
        # crop to (h,w) at random offset
        h, w = (256 * tf.random.uniform((2,))).numpy().astype("uint8")
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[4][i].imshow(tf.image.stateless_random_crop(images[i], [h,w,3], seed).numpy().astype("uint8"))
        ax[4][i].set_title("random crop")
plt.show()

Below is the output of the above code:

While the display of images matches what you might expect from the code, the use of tf.image functions is quite different from that of the preprocessing layers. Every tf.image function is different. Therefore, you can see the crop_to_bounding_box() function takes pixel coordinates, but the central_crop() function assumes a fraction ratio as the argument.

These functions also differ in how randomness is handled. Some of them are deterministic, so for a random resize, the output size must be generated with a random number generator separately before calling the resize function. Others, such as stateless_random_crop(), perform the augmentation randomly themselves, but a pair of int32 random seeds must be specified explicitly.
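
If you want to use these stateless functions inside a map()-ed pipeline with a different seed for each element, one common pattern (a sketch based on general tf.data practice, not on the original article) is to zip the dataset with a counter and derive the seed pair from it:

...
# illustrative sketch: the counter-based seeding is an assumption, not from the article
counter = tf.data.experimental.Counter()   # an infinite dataset yielding 0, 1, 2, ...

def augment_stateless(count, data):
    images, labels = data
    seed = tf.stack([count, count + 1])    # a (2,) integer seed pair, new for each element
    images = tf.image.stateless_random_flip_left_right(images, seed)
    images = tf.image.stateless_random_brightness(images, 0.3, seed)
    return images, labels

seeded_ds = tf.data.Dataset.zip((counter, ds)).map(augment_stateless)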

To continue the example, the following are the functions for flipping an image and extracting the Sobel edges:

 

...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
 
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_flip_left_right(images[i], seed).numpy().astype("uint8"))
        ax[1][i].set_title("flip left-right")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[2][i].imshow(tf.image.stateless_random_flip_up_down(images[i], seed).numpy().astype("uint8"))
        ax[2][i].set_title("flip up-down")
        # sobel edge
        sobel = tf.image.sobel_edges(images[i:i+1])
        ax[3][i].imshow(sobel[0, ..., 0].numpy().astype("uint8"))
        ax[3][i].set_title("sobel y")
        # sobel edge
        ax[4][i].imshow(sobel[0, ..., 1].numpy().astype("uint8"))
        ax[4][i].set_title("sobel x")
plt.show()

This shows a grid with the original images, their left-right and up-down flips, and the Sobel edges in the y and x directions:

And the following are the functions to manipulate the brightness, contrast, and colors:

...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
 
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_brightness(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(tf.image.stateless_random_contrast(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
        # saturation
        ax[3][i].imshow(tf.image.stateless_random_saturation(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[3][i].set_title("saturation")
        # hue
        ax[4][i].imshow(tf.image.stateless_random_hue(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[4][i].set_title("hue")
plt.show()

This code shows a grid with the original images and random variations of their brightness, contrast, saturation, and hue:

Below is the complete code to display all of the above:

from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt
 
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves'  # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)
 
# Visualize tf.image augmentations
 
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        # original
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # resize
        h = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        w = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
        ax[1][i].imshow(tf.image.resize(images[i], [h,w]).numpy().astype("uint8"))
        ax[1][i].set_title("resize")
        # crop
        y, x, h, w = (128 * tf.random.uniform((4,))).numpy().astype("uint8")
        ax[2][i].imshow(tf.image.crop_to_bounding_box(images[i], y, x, h, w).numpy().astype("uint8"))
        ax[2][i].set_title("crop")
        # central crop
        x = tf.random.uniform([], minval=0.4, maxval=1.0)
        ax[3][i].imshow(tf.image.central_crop(images[i], x).numpy().astype("uint8"))
        ax[3][i].set_title("central crop")
        # crop to (h,w) at random offset
        h, w = (256 * tf.random.uniform((2,))).numpy().astype("uint8")
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[4][i].imshow(tf.image.stateless_random_crop(images[i], [h,w,3], seed).numpy().astype("uint8"))
        ax[4][i].set_title("random crop")
plt.show()
 
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_flip_left_right(images[i], seed).numpy().astype("uint8"))
        ax[1][i].set_title("flip left-right")
        # flip
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[2][i].imshow(tf.image.stateless_random_flip_up_down(images[i], seed).numpy().astype("uint8"))
        ax[2][i].set_title("flip up-down")
        # sobel edge
        sobel = tf.image.sobel_edges(images[i:i+1])
        ax[3][i].imshow(sobel[0, ..., 0].numpy().astype("uint8"))
        ax[3][i].set_title("sobel y")
        # sobel edge
        ax[4][i].imshow(sobel[0, ..., 1].numpy().astype("uint8"))
        ax[4][i].set_title("sobel x")
plt.show()
 
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
    for i in range(3):
        ax[0][i].imshow(images[i].numpy().astype("uint8"))
        ax[0][i].set_title("original")
        # brightness
        seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
        ax[1][i].imshow(tf.image.stateless_random_brightness(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[1][i].set_title("brightness")
        # contrast
        ax[2][i].imshow(tf.image.stateless_random_contrast(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[2][i].set_title("contrast")
        # saturation
        ax[3][i].imshow(tf.image.stateless_random_saturation(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
        ax[3][i].set_title("saturation")
        # hue
        ax[4][i].imshow(tf.image.stateless_random_hue(images[i], 0.3, seed).numpy().astype("uint8"))
        ax[4][i].set_title("hue")
plt.show()

These augmentation functions should cover most use cases. But if you have a specific augmentation in mind that they do not provide, you will probably need a dedicated image processing library. OpenCV and Pillow are two common and powerful libraries for transforming images.
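
For example, below is a minimal sketch (it assumes Pillow is installed; the blur operation and the helper names are illustrative, not from the original article) of wrapping a Pillow transform with tf.py_function() so it can run inside a tf.data pipeline:

import numpy as np
import tensorflow as tf
from PIL import Image, ImageFilter

def pil_blur(image):
    # runs as eager Python code inside tf.py_function(), so .numpy() is available
    img = Image.fromarray(image.numpy().astype("uint8"))
    img = img.filter(ImageFilter.GaussianBlur(radius=2))   # illustrative Pillow transform
    return np.asarray(img, dtype="float32")

def augment(image, label):
    image = tf.py_function(pil_blur, [image], tf.float32)
    image.set_shape([256, 256, 3])   # tf.py_function() loses the static shape
    return image, label

# the dataset is batched, so unbatch first to process one image at a time
blurred_ds = ds.unbatch().map(augment).batch(32)

Note that tf.py_function() executes Python code eagerly, so it cannot be traced into a TensorFlow graph and is usually slower than native TensorFlow ops.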

Using Preprocessing Layers in Neural Networks

You used the Keras preprocessing layers as functions in the examples above. But they can also be used as layers inside a neural network, which is trivial to do. Below is an example of how you can incorporate a preprocessing layer into a classification network and train it using a dataset:


from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt
 
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves'  # modify to your path
ds = image_dataset_from_directory(PATH,
                                  validation_split=0.2, subset="training",
                                  image_size=(256,256), interpolation="mitchellcubic",
                                  crop_to_aspect_ratio=True,
                                  seed=42, shuffle=True, batch_size=32)
 
AUTOTUNE = tf.data.AUTOTUNE
ds = ds.cache().prefetch(buffer_size=AUTOTUNE)
 
num_classes = 5
model = tf.keras.Sequential([
  tf.keras.layers.RandomFlip("horizontal_and_vertical"),
  tf.keras.layers.RandomRotation(0.2),
  tf.keras.layers.Rescaling(1/127.5, offset=-1),  # rescale pixel values to [-1,+1]
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(num_classes)
])
 
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
  
model.fit(ds, epochs=3)

Running this code gives the following output:

Found 609 files belonging to 5 classes.
Using 488 files for training.
Epoch 1/3
16/16 [==============================] - 5s 253ms/step - loss: 1.4114 - accuracy: 0.4283
Epoch 2/3
16/16 [==============================] - 4s 259ms/step - loss: 0.8101 - accuracy: 0.6475
Epoch 3/3
16/16 [==============================] - 4s 267ms/step - loss: 0.7015 - accuracy: 0.7111

In the code above, you created the dataset with cache() and prefetch(). This is a performance technique that allows the dataset to prepare data asynchronously while the neural network is being trained. It becomes especially significant if the dataset also has augmentation attached through map(), as in the sketch below.
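
For example, here is a minimal sketch (the augment() function is a hypothetical augmentation function like the ones sketched earlier, not part of the original training code) of attaching map()-based augmentation alongside cache() and prefetch():

...
AUTOTUNE = tf.data.AUTOTUNE
ds = (ds.cache()                                    # cache decoded images in memory
        .map(augment, num_parallel_calls=AUTOTUNE)  # run the augmentation in parallel
        .prefetch(buffer_size=AUTOTUNE))            # prepare the next batches while training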

You will see some improvement in accuracy if you remove the RandomFlip and RandomRotation layers, because that makes the problem easier. However, since you want the network to predict well across a wide variation of image quality and properties, augmentation helps the resulting network generalize better and become more robust.


Original article sourced at: https://machinelearningmastery.com

#tensorflow #keras 

How to Use Keras and Tf.image Preprocessor Layer in TensorFlow