1674101160
KerasTuner is an easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search. Easily configure your search space with a define-by-run syntax, then leverage one of the available search algorithms to find the best hyperparameter values for your models. KerasTuner comes with Bayesian Optimization, Hyperband, and Random Search algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms.
KerasTuner requires Python 3.7+ and TensorFlow 2.0+.
Install the latest release:
pip install keras-tuner --upgrade
You can also check out other versions in our GitHub repository.
Import KerasTuner and TensorFlow:
import keras_tuner
from tensorflow import keras
Write a function that creates and returns a Keras model. Use the hp
argument to define the hyperparameters during model creation.
def build_model(hp):
model = keras.Sequential()
model.add(keras.layers.Dense(
hp.Choice('units', [8, 16, 32]),
activation='relu'))
model.add(keras.layers.Dense(1, activation='relu'))
model.compile(loss='mse')
return model
Initialize a tuner (here, RandomSearch
). We use objective
to specify the objective to select the best models, and we use max_trials
to specify the number of different models to try.
tuner = keras_tuner.RandomSearch(
build_model,
objective='val_loss',
max_trials=5)
Start the search and get the best model:
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val))
best_model = tuner.get_best_models()[0]
To learn more about KerasTuner, check out this starter guide.
Ask your questions on our GitHub Discussions.
If KerasTuner helps your research, we appreciate your citations. Here is the BibTeX entry:
@misc{omalley2019kerastuner,
title = {KerasTuner},
author = {O'Malley, Tom and Bursztein, Elie and Long, James and Chollet, Fran\c{c}ois and Jin, Haifeng and Invernizzi, Luca and others},
year = 2019,
howpublished = {\url{https://github.com/keras-team/keras-tuner}}
}
Official Website: https://keras.io/keras_tuner/
Author: keras-team
Source Code: https://github.com/keras-team/keras-tuner
License: Apache-2.0 license
1674045000
Note: deepjazz is no longer being actively developed. It may be refactored at some point in the future. Goodbye and thank you for your interest π’
I built deepjazz in 36 hours at a hackathon. It uses Keras & Theano, two deep learning libraries, to generate jazz music. Specifically, it builds a two-layer LSTM, learning from the given MIDI file. It uses deep learning, the AI tech that powers Google's AlphaGo and IBM's Watson, to make music -- something that's considered as deeply human.
Check out deepjazz's music on SoundCloud!
Run on CPU with command:
python generator.py [# of epochs]
Run on GPU with command:
THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python generator.py [# of epochs]
Note: running Keras/Theano on GPU is formally supported for only NVIDIA cards (CUDA backend).
Note: preprocess.py
must be modified to work with other MIDI files (the relevant "melody" MIDI part needs to be selected). The ability to handle this natively is a planned feature.
Ji-Sung Kim
Princeton University, Department of Computer Science
hello (at) jisungkim.com
This project develops a lot of preprocessing code (with permission) from Evan Chow's jazzml. Thank you Evan! Public examples from the Keras documentation were also referenced.
Code is licensed under the Apache License 2.0
Images and other media are copyrighted (Ji-Sung Kim)
Author: jisungk
Source Code: https://github.com/jisungk/deepjazz
License: Apache-2.0 license
1672734849
In this Keras article, we will learn about Python Classes and Their Use in Keras. Classes are one of the fundamental building blocks of the Python language, which may be applied in the development of machine learning applications. As we shall see, the Python syntax for developing classes is simple and can be applied to implement callbacks in Keras.
in this tutorial, you will discover the Python classes and their functionality.
After completing this tutorial, you will know:
Kick-start your project with my new book Python for Machine Learning, including step-by-step tutorials and the Python source code files for all examples.
Letβs get started.
This tutorial is divided into six parts; they are:
In object-oriented languages, such as Python, classes are one of the fundamental building blocks.
They can be likened to blueprints for an object, as they define what properties and methods/behaviors an object should have.
β Python Fundamentals, 2018.
Creating a new class creates a new object, where every class instance can be characterized by its attributes to maintain its state and methods to modify its state.
The class keyword allows for the creation of a new class definition, immediately followed by the class name:
class MyClass:
<statements>
In this manner, a new class object bound to the specified class name (MyClass, in this case) is created. Each class object can support instantiation and attribute references, as we will see shortly.
Instantiation is the creation of a new instance of a class.
To create a new instance of a class, we can call it using its class name and assign it to a variable. This will create a new, empty class object:
x = MyClass()
Upon creating a new instance of a class, Python calls its object constructor method, __init()__, which often takes arguments that are used to set the instantiated objectβs attributes.
We can define this constructor method in our class just like a function and specify attributes that will need to be passed in when instantiating an object.
β Python Fundamentals, 2018.
Letβs say, for instance, that we would like to define a new class named Dog:
class Dog:
family = "Canine"
def __init__(self, name, breed):
self.name = name
self.breed = breed
Here, the constructor method takes two arguments, name and breed, which can be passed to it upon instantiating the object:
dog1 = Dog("Lassie", "Rough Collie")
In the example that we are considering, name and breed are known as instance variables (or attributes) because they are bound to a specific instance. This means that such attributes belong only to the object in which they have been set but not to any other object instantiated from the same class.
On the other hand, family is a class variable (or attribute) because it is shared by all instances of the same class.
You may also note that the first argument of the constructor method (or any other method) is often called self. This argument refers to the object that we are in the process of creating. It is good practice to follow the convention of setting the first argument to self to ensure the readability of your code for other programmers.
Once we have set our objectβs attributes, they can be accessed using the dot operator. For example, considering again the dog1 instance of the Dog class, its name attribute may be accessed as follows:
print(dog1.name)
Producing the following output:
Lassie
In addition to having a constructor method, a class object can also have several other methods for modifying its state.
The syntax for defining an instance method is familiar. We pass the argument self β¦ It is always the first argument of an instance method.
β Python Fundamentals, 2018.
Similar to the constructor method, each instance method can take several arguments, with the first one being the argument self that lets us set and access the objectβs attributes:
class Dog:
family = "Canine"
def __init__(self, name, breed):
self.name = name
self.breed = breed
def info(self):
print(self.name, "is a female", self.breed)
Different methods of the same object can also use the self argument to call each other:
class Dog:
family = "Canine"
def __init__(self, name, breed):
self.name = name
self.breed = breed
self.tricks = []
def add_tricks(self, x):
self.tricks.append(x)
def info(self, x):
self.add_tricks(x)
print(self.name, "is a female", self.breed, "that", self.tricks[0])
An output string can then be generated as follows:
dog1 = Dog("Lassie", "Rough Collie")
dog1.info("barks on command")
We find that, in doing so, the barks on command input is appended to the tricks list when the info() method calls the add_tricks() method. The following output is produced:
Lassie is a female Rough Collie that barks on command
Another feature that Python supports is class inheritance.
Inheritance is a mechanism that allows a subclass (also known as a derived or child class) to access all attributes and methods of a superclass (also known as a base or parent class).
The syntax for using a subclass is the following:
class SubClass(BaseClass):
<statements>
It is also possible that a subclass inherits from multiple base classes, too. In this case, the syntax would be as follows:
class SubClass(BaseClass1, BaseClass2, BaseClass3):
<statements>
Class attributes and methods are searched for in the base class and also in subsequent base classes in the case of multiple inheritances.
Python further allows a method in a subclass to override another method in the base class that carries the same name. An overriding method in the subclass may be replacing the base class method or simply extending its capabilities. When an overriding subclass method is available, it is this method that is executed when called, rather than the method with the same name in the base class.
A practical use of classes in Keras is to write oneβs own callbacks.
A callback is a powerful tool in Keras that allows us to look at our modelβs behavior during the different stages of training, testing, and prediction.
Indeed, we may pass a list of callbacks to any of the following:
The Keras API comes with several built-in callbacks. Nonetheless, we might wish to write our own, and for this purpose, we shall look at how to build a custom callback class. In order to do so, we can inherit several methods from the callback base class, which can provide us with information of when:
Letβs first consider a simple example of a custom callback that reports back every time that an epoch starts and ends. We will name this custom callback class, EpochCallback, and override the epoch-level methods, on_epoch_begin() and on_epoch_end(), from the base class, keras.callbacks.Callback:
import tensorflow.keras as keras
class EpochCallback(keras.callbacks.Callback):
def on_epoch_begin(self, epoch, logs=None):
print("Starting epoch {}".format(epoch + 1))
def on_epoch_end(self, epoch, logs=None):
print("Finished epoch {}".format(epoch + 1))
In order to test the custom callback that we have just defined, we need a model to train. For this purpose, letβs define a simple Keras model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
def simple_model():
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(128, activation="relu"))
model.add(Dense(10, activation="softmax"))
model.compile(loss="categorical_crossentropy",
optimizer="sgd",
metrics=["accuracy"])
return model
We also need a dataset to train on, for which purpose we will be using the MNIST dataset:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Loading the MNIST training and testing data splits
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Pre-processing the training data
x_train = x_train / 255.0
x_train = x_train.reshape(60000, 28, 28, 1)
y_train_cat = to_categorical(y_train, 10)
Now, letβs try out the custom callback by adding it to the list of callbacks that we pass as input to the keras.Model.fit() method:
model = simple_model()
model.fit(x_train,
y_train_cat,
batch_size=32,
epochs=5,
callbacks=[EpochCallback()],
verbose=0)
The callback that we have just created produces the following output:
Starting epoch 1
Finished epoch 1
Starting epoch 2
Finished epoch 2
Starting epoch 3
Finished epoch 3
Starting epoch 4
Finished epoch 4
Starting epoch 5
Finished epoch 5
We can create another custom callback that monitors the loss value at the end of each epoch and stores the model weights only if the loss has decreased. To this end, we will be reading the loss value from the log dict, which stores the metrics at the end of each batch and epoch. We will also be accessing the model corresponding to the current round of training, testing, or prediction, by means of self.model.
Letβs call this custom callback, CheckpointCallback:
import numpy as np
class CheckpointCallback(keras.callbacks.Callback):
def __init__(self):
super(CheckpointCallback, self).__init__()
self.best_weights = None
def on_train_begin(self, logs=None):
self.best_loss = np.Inf
def on_epoch_end(self, epoch, logs=None):
current_loss = logs.get("loss")
print("Current loss is {}".format(current_loss))
if np.less(current_loss, self.best_loss):
self.best_loss = current_loss
self.best_weights = self.model.get_weights()
print("Storing the model weights at epoch {} \n".format(epoch + 1))
We can try this out again, this time including the CheckpointCallback into the list of callbacks:
model = simple_model()
model.fit(x_train,
y_train_cat,
batch_size=32,
epochs=5,
callbacks=[EpochCallback(), CheckpointCallback()],
verbose=0)
The following output of the two callbacks together is now produced:
Starting epoch 1
Finished epoch 1
Current loss is 0.6327750086784363
Storing the model weights at epoch 1
Starting epoch 2
Finished epoch 2
Current loss is 0.3391888439655304
Storing the model weights at epoch 2
Starting epoch 3
Finished epoch 3
Current loss is 0.29216915369033813
Storing the model weights at epoch 3
Starting epoch 4
Finished epoch 4
Current loss is 0.2625095248222351
Storing the model weights at epoch 4
Starting epoch 5
Finished epoch 5
Current loss is 0.23906977474689484
Storing the model weights at epoch 5
Besides callbacks, we can also make derived classes in Keras for custom metrics (derived from keras.metrics.Metrics
), custom layers (derived from keras.layers.Layer
), custom regularizer (derived from keras.regularizers.Regularizer
), or even custom models (derived from keras.Model
, for such as changing the behavior of invoking a model). All you have to do is follow the guideline to change the member functions of a class. You must use exactly the same name and parameters in the member functions.
Below is an example from Keras documentation:
class BinaryTruePositives(tf.keras.metrics.Metric):
def __init__(self, name='binary_true_positives', **kwargs):
super(BinaryTruePositives, self).__init__(name=name, **kwargs)
self.true_positives = self.add_weight(name='tp', initializer='zeros')
def update_state(self, y_true, y_pred, sample_weight=None):
y_true = tf.cast(y_true, tf.bool)
y_pred = tf.cast(y_pred, tf.bool)
values = tf.logical_and(tf.equal(y_true, True), tf.equal(y_pred, True))
values = tf.cast(values, self.dtype)
if sample_weight is not None:
sample_weight = tf.cast(sample_weight, self.dtype)
values = tf.multiply(values, sample_weight)
self.true_positives.assign_add(tf.reduce_sum(values))
def result(self):
return self.true_positives
def reset_states(self):
self.true_positives.assign(0)
m = BinaryTruePositives()
m.update_state([0, 1, 1, 1], [0, 1, 0, 0])
print('Intermediate result:', float(m.result()))
m.update_state([1, 1, 1, 1], [0, 1, 1, 0])
print('Final result:', float(m.result()))
This reveals why we would need a class for the custom metric: A metric is not just a function but a function that computes its value incrementally, once per batch of training data during the training cycle. Eventually, the result is reported at the result()
function at the end of an epoch and reset its memory using the reset_state()
function so you can start afresh in the next epoch.
For the details on what exactly has to be derived, you should refer to Kerasβ documentation.
Original article sourced at: https://machinelearningmastery.com
1671665820
In this TensorFlow and Keras article, we will Learn about data enhancement techniques, applications, and tools with a tutorial on TensorFlow and Keras.
Data augmentation is a technique of artificially increasing the training set by creating modified copies of a dataset using existing data. It includes making minor changes to the dataset or using deep learning to generate new data points.
Augmented data is driven from original data with some minor changes. In the case of image augmentation, we make geometric and color space transformations (flipping, resizing, cropping, brightness, contrast) to increase the size and diversity of the training set.
Synthetic data is generated artificially without using the original dataset. It often uses DNNs (Deep Neural Networks) and GANs (Generative Adversarial Networks) to generate synthetic data.
Note: the augmentation techniques are not limited to images. You can augment audio, video, text, and other types of data too.
In this section, we will learn about audio, text, image, and advanced data augmentation techniques.
Data augmentation can apply to all machine learning applications where acquiring quality data is challenging. Furthermore, it can help improve model robustness and performance across all fields of study.
Acquiring and labeling medical imaging datasets is time-consuming and expensive. You also need a subject matter expert to validate the dataset before performing data analysis. Using geometric and other transformations can help you train robust and accurate machine-learning models.
For example, in the case of Pneumonia Classification, you can use random cropping, zooming, stretching, and color space transformation to improve the model performance. However, you need to be careful about certain augmentations as they can result in opposite results. For example, random rotation and reflection along the x-axis are not recommended for the X-ray imaging dataset.
Image from ibrahimsobh.github.io | kaggle-COVID19-Classification
There is limited data available on self-driving cars, and companies are using simulated environments to generate synthetic data using reinforcement learning. It can help you train and test machine learning applications where data security is an issue.
Image by David Silver | Autonomous Visualization System from Uber ATG
The possibilities of augmented data as a simulation are endless, as it can be used to generate real-world scenarios.
Text data augmentation is generally used in situations with limited quality data, and improving the performance metric takes priority. You can apply synonym augmentation, word embedding, character swap, and random insertion and deletion. These techniques are also valuable for low-resource languages.
Image from Papers With Code | Selective Text Augmentation with Word Roles for Low-Resource Text Classification.
Researchers use text augmentation for the language models in high error recognition scenarios, sequence-to-sequence data generation, and text classification.
In sound classification and speech recognition, data augmentation works wonders. It improves the model performance even on low-resource languages.
Image by Edward Ma | Noise Injection
The random noise injection, shifting, and changing the pitch can help you produce state-of-the-art speech-to-text models. You can also use GANs to generate realistic sounds for a particular application.
In this tutorial, we are going to learn how to augment image data using Keras and Tensorflow. Furthermore, you will learn how to use your augmented data to train a simple binary classifier. The code mentioned below is the modified version of TensorFlowβs official example.
We recommend following the coding tutorial by practicing on your own. The code source with outputs is available on the DataCamp Workspace.
We will be using TensorFlow and Keras for data augmentation and matplotlib for displaying the images.
%%capture
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
from keras import layers
import keras
The TensorFlow Dataset collection is huge. You can find text, audio, video, graph, time-series, and image datasets. In this tutorial, we will be using the βcats_vs_dogsβ dataset. The dataset size is 786.68 MiB, and we will apply various image augmentation and train the binary classifier.
In the code below, we have loaded 80% training, 10% validation, and a 10% test set with labels and metadata.
%%capture
(train_ds, val_ds, test_ds), metadata = tfds.load(
'cats_vs_dogs',
split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
with_info=True,
as_supervised=True,
)
There are two classes in the dataset βcatβ and βdogβ.
num_classes = metadata.features['label'].num_classes
print(num_classes)
2
We will use iterators to extract only four random images with labels from the training set and display them using the matplotlib `.imshow()` function.
get_label_name = metadata.features['label'].int2str
train_iter = iter(train_ds)
fig = plt.figure(figsize=(7, 8))
for x in range(4):
image, label = next(train_iter)
fig.add_subplot(1, 4, x+1)
plt.imshow(image)
plt.axis('off')
plt.title(get_label_name(label));
As we can see, we got various dog images and a cat image.
We usually use keras.Sequential() to build the model, but we can also use it to add augmentation layers.
In the example, we are resizing and rescaling the image using Keras Sequential and image augmentation layers. We will first resize the image to 180X180 and then rescale it by 1/255. The small image size will help us save time, memory, and computing.
As we can see, we have successfully passed the image through the augmentation layer, and the final output is resized and rescaled.
IMG_SIZE = 180
resize_and_rescale = keras.Sequential([
layers.Resizing(IMG_SIZE, IMG_SIZE),
layers.Rescaling(1./255)
])
result = resize_and_rescale(image)
plt.axis('off')
plt.imshow(result);
1
Letβs apply random flip and rotation to the same image. We will use loop, subplot, and imshow to display six images with random geometric augmentation.
data_augmentation = keras.Sequential([
layers.RandomFlip("horizontal_and_vertical"),
layers.RandomRotation(0.4),
])
plt.figure(figsize=(8, 7))
for i in range(6):
augmented_image = data_augmentation(image)
ax = plt.subplot(2, 3, i + 1)
plt.imshow(augmented_image.numpy()/255)
plt.axis("off")
Note: if you are experiencing βWARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).β, try to convert your image to numpy and divide it by 255. It will show you the clear output instead of a washed-out image.
Apart from simple augmentation, you can also apply RandomContrast, RandomCrop, HeightCrop, WidthCrop, and RandomZoom to the images.
There are two ways to apply augmentation to the images. The first method is by directly adding the augmentation layers to the model.
model = keras.Sequential([
# Add the preprocessing layers you created earlier.
resize_and_rescale,
data_augmentation,
# Add the model layers
layers.Conv2D(16, 3, padding='same', activation='relu'),
layers.MaxPooling2D(),
layers.Flatten(),
layers.Dense(128, activation='relu'),
layers.Dense(64, activation='relu'),
layers.Dense(1,activation='sigmoid')
])
Note: the data augmentation is inactive during the testing phase. It will only work for Model.fit, not for Model.evaluate or Model.predict.
The second method is to apply the data augmentation to the entire train set using Dataset.map.
aug_ds = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))
We will create a data preprocessing function to process train, valid, and test sets.
The function will:
batch_size = 32
AUTOTUNE = tf.data.AUTOTUNE
def prepare(ds, shuffle=False, augment=False):
# Resize and rescale all datasets.
ds = ds.map(lambda x, y: (resize_and_rescale(x), y),
num_parallel_calls=AUTOTUNE)
if shuffle:
ds = ds.shuffle(1000)
# Batch all datasets.
ds = ds.batch(batch_size)
# Use data augmentation only on the training set.
if augment:
ds = ds.map(lambda x, y: (data_augmentation(x, training=True), y),
num_parallel_calls=AUTOTUNE)
# Use buffered prefetching on all datasets.
return ds.prefetch(buffer_size=AUTOTUNE)
train_ds = prepare(train_ds, shuffle=True, augment=True)
val_ds = prepare(val_ds)
test_ds = prepare(test_ds)
We will create a simple model with convolution and dense layers. Make sure the input shape is similar to the image shape.
model = keras.Sequential([
layers.Conv2D(32, (3, 3), input_shape=(180,180,3), padding='same', activation='relu'),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Flatten(),
layers.Dense(32, activation='relu'),
layers.Dense(1,activation='softmax')
])
We will now compile the model and train it for one epoch. The optimizer is Adam, the loss function is Binary Cross Entropy, and the metric is accuracy.
As we can observe, we got 51% validation accuracy on the single run. You can train it for multiple epochs and optimize hyper-parameters to get even better results.
The model building and training part is just to give you an idea of how you can augment the images and train the model.
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
epochs=1
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
582/582 [==============================] - 98s 147ms/step - loss: 0.6993 - accuracy: 0.4961 - val_loss: 0.6934 - val_accuracy: 0.5185
loss, acc = model.evaluate(test_ds)
73/73 [==============================] - 4s 48ms/step - loss: 0.6932 - accuracy: 0.5013
Learn to conduct image analysis, and construct, train, and evaluate convolution networks by taking the Image Processing with Keras course.
In this section, we will learn to augment images using TensorFlow to have finer control of data augmentation.
We will load the cats_vs_dogs dataset again with labels and metadata.
%%capture
(train_ds, val_ds, test_ds), metadata = tfds.load(
'cats_vs_dogs',
split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
with_info=True,
as_supervised=True,
)
Instead of a cat image, we will be using the dog image and applying various augmentation techniques.
image, label = next(iter(train_ds))
plt.imshow(image)
plt.title(get_label_name(label));
1
We will create the visualize function to display the difference between the original and augmented image.
The function is pretty straightforward. It takes the original image and the augmentation function as input and displays the difference using matplotlib.
def visualize(original, augmented):
fig = plt.figure()
plt.subplot(1,2,1)
plt.title('Original image')
plt.imshow(original)
plt.axis("off")
plt.subplot(1,2,2)
plt.title('Augmented image')
plt.imshow(augmented)
plt.axis("off")
As we can see, we have flipped the image from left to right using the tf.image function. It is much simpler than keras.Sequential.
flipped = tf.image.flip_left_right(image)
visualize(image, flipped)
1
Letβs convert the image to grayscale using `tf.image.rgb_to_grayscale`.
grayscaled = tf.image.rgb_to_grayscale(image)
visualize(image, tf.squeeze(grayscaled))
1
You can also adjust saturation by a factor of 3.
saturated = tf.image.adjust_saturation(image, 3)
visualize(image, saturated)
1
Adjust the brightness by providing a brightness factor.
bright = tf.image.adjust_brightness(image, 0.4)
visualize(image, bright)
1
Crop the image from the center using a central fraction of 0.5.
cropped = tf.image.central_crop(image, central_fraction=0.5)
visualize(image, cropped)
1
Rotate the image to 90 degrees using the `tf.image.rot90` function.
rotated = tf.image.rot90(image)
visualize(image, rotated)
1
Just like Keras layers, tf.image also has random augmentation functions. In the example below, we will apply the random brightness to the image and display multiple results.
As we can see, the first image is a bit darker, and the next two images are brighter.
for i in range(3):
seed = (i, 0) # tuple of size (2,)
stateless_random_brightness = tf.image.stateless_random_brightness(
image, max_delta=0.95, seed=seed)
visualize(image, stateless_random_brightness)
Just like keras, we can apply a data augmentation function to the entire dataset using Dataset.map.
def augment(image, label):
image = tf.cast(image, tf.float32)
image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
image = (image / 255.0)
image = tf.image.random_crop(image, size=[IMG_SIZE, IMG_SIZE, 3])
image = tf.image.random_brightness(image, max_delta=0.5)
return image, label
train_ds = (
train_ds
.shuffle(1000)
.map(augment, num_parallel_calls=AUTOTUNE)
.batch(batch_size)
.prefetch(AUTOTUNE)
)
The Keras ImageDataGenerator is even simpler. It works best when you are loading data from a local directory or CSV.
In the example, we will download and load a small CIFAR10 dataset from Keras default dataset library.
After that, we will apply augmentation using `keras.preprocessing.image.ImageDataGenerator`. The function will randomly rotate, change the height and width, and horizontally flip the images.
Finally, we will fit ImageDataGenerator to the training dataset and display six images with random augmentation.
Note: the image size is 32x32, so we have a low-resolution display.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
datagen = keras.preprocessing.image.ImageDataGenerator(rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
horizontal_flip=True,
validation_split=0.2)
datagen.fit(x_train)
for X_batch, y_batch in datagen.flow(x_train,y_train, batch_size=6):
for i in range(0, 6):
plt.subplot(2,3,i+1)
plt.imshow(X_batch[i]/255)
plt.axis('off')
break
In this section, we will learn about other open-source tools that you can use to perform various data augmentation techniques and improve the model performance.
Image transformation is available in the torchvision.transforms module. Similar to Keras, you can add transform layers within torch.nn.Sequential or apply an augmentation function separately on the dataset.
Augmentor is a Python package for image augmentation and artificial image generation. You can perform Perspective Skewing, Elastic Distortions, Rotating, Shearing, Cropping, and Mirroring. Augmentor also comes with basic image pre-processing functionality.
Albumentations is a fast and flexible Python tool for image augmentation. It is widely used in machine learning competitions, industry, and research to improve the performance of deep convolutional neural networks.
Imgaug is an open-source tool for image augmentation. It supports a wide variety of augmentation techniques, such as Gaussian noise, contrast, sharpness, crop, affine, and flip. It has a simple yet powerful stochastic interface, and it comes with keypoints, bounding boxes, heatmaps, and segmentation maps.
OpenCV is a massive open-source library for computer vision, machine learning, and image processing. It is generally used in building real-time applications. You can use OpenCV to augment images and videos hassle-free.
Image augmentation functions provided by Tensorflow and Keras are convenient. You just have to add an augmentation layer, tf.image, or ImageDataGenerator to perform augmentation. Apart from deep learning frameworks, you can use standalone tools such as Augmentor, Albumentations, OpenCV, and Imgaug to perform data augmentation.`
Original article sourced at: https://www.datacamp.com
1670827598
In this Python post we will learn How to build your first Deep Learning Project in Python with Keras. Keras is a powerful and easy-to-use free open source Python library for developing and evaluating deep learning models.
It is part of the TensorFlow library and allows you to define and train neural network models in just a few lines of code.
In this tutorial, you will discover how to create your first deep learning neural network model in Python using Keras.
Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.
There is not a lot of code required, but we will go over it slowly so that you will know how to create your own models in the future.
The steps you will learn in this tutorial are as follows:
This Keras tutorial makes a few assumptions. You will need to have:
If you need help with your environment, see the tutorial:
Create a new file called keras_first_network.py and type or copy-and-paste the code into the file as you go.
The first step is to define the functions and classes you intend to use in this tutorial.
You will use the NumPy library to load your dataset and two classes from the Keras library to define your model.
The imports required are listed below.
# first neural network with keras tutorial
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
...
You can now load our dataset.
In this Keras tutorial, you will use the Pima Indians onset of diabetes dataset. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.
As such, it is a binary classification problem (onset of diabetes as 1 or not as 0). All of the input variables that describe each patient are numerical. This makes it easy to use directly with neural networks that expect numerical input and output values and is an ideal choice for our first neural network in Keras.
The dataset is available here:
Download the dataset and place it in your local working directory, the same location as your Python file.
Save it with the filename:
pima-indians-diabetes.csv
Take a look inside the file; you should see rows of data like the following:
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
...
You can now load the file as a matrix of numbers using the NumPy function loadtxt().
There are eight input variables and one output variable (the last column). You will be learning a model to map rows of input variables (X) to an output variable (y), which is often summarized as y = f(X).
The variables can be summarized as follows:
Input Variables (X):
Output Variables (y):
Once the CSV file is loaded into memory, you can split the columns of data into input and output variables.
The data will be stored in a 2D array where the first dimension is rows and the second dimension is columns, e.g., [rows, columns].
You can split the array into two arrays by selecting subsets of columns using the standard NumPy slice operator or β:β. You can select the first eight columns from index 0 to index 7 via the slice 0:8. We can then select the output column (the 9th variable) via index 8.
...
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
...
You are now ready to define your neural network model.
Note: The dataset has nine columns, and the range 0:8 will select columns from 0 to 7, stopping before index 8. If this is new to you, then you can learn more about array slicing and ranges in this post:
Models in Keras are defined as a sequence of layers.
We create a Sequential model and add layers one at a time until we are happy with our network architecture.
The first thing to get right is to ensure the input layer has the correct number of input features. This can be specified when creating the first layer with the input_shape argument and setting it to (8,)
for presenting the eight input variables as a vector.
How do we know the number of layers and their types?
This is a tricky question. There are heuristics that you can use, and often the best network structure is found through a process of trial and error experimentation (I explain more about this here). Generally, you need a network large enough to capture the structure of the problem.
In this example, letβs use a fully-connected network structure with three layers.
Fully connected layers are defined using the Dense class. You can specify the number of neurons or nodes in the layer as the first argument and the activation function using the activation argument.
Also, you will use the rectified linear unit activation function referred to as ReLU on the first two layers and the Sigmoid function in the output layer.
It used to be the case that Sigmoid and Tanh activation functions were preferred for all layers. These days, better performance is achieved using the ReLU activation function. Using a sigmoid on the output layer ensures your network output is between 0 and 1 and is easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5.
You can piece it all together by adding each layer:
...
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
...
Note: The most confusing thing here is that the shape of the input to the model is defined as an argument on the first hidden layer. This means that the line of code that adds the first Dense layer is doing two things, defining the input or visible layer and the first hidden layer.
Now that the model is defined, you can compile it.
Compiling the model uses the efficient numerical libraries under the covers (the so-called backend) such as Theano or TensorFlow. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU, GPU, or even distributed.
When compiling, you must specify some additional properties required when training the network. Remember training a network means finding the best set of weights to map inputs to outputs in your dataset.
You must specify the loss function to use to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics you want to collect and report during training.
In this case, use cross entropy as the loss argument. This loss is for a binary classification problems and is defined in Keras as βbinary_crossentropyβ. You can learn more about choosing loss functions based on your problem here:
We will define the optimizer as the efficient stochastic gradient descent algorithm βadamβ. This is a popular version of gradient descent because it automatically tunes itself and gives good results in a wide range of problems. To learn more about the Adam version of stochastic gradient descent, see the post:
Finally, because it is a classification problem, you will collect and report the classification accuracy defined via the metrics argument.
...
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
...
You have defined your model and compiled it to get ready for efficient computation.
Now it is time to execute the model on some data.
You can train or fit your model on your loaded data by calling the fit() function on the model.
Training occurs over epochs, and each epoch is split into batches.
One epoch comprises one or more batches, based on the chosen batch size, and the model is fit for many epochs. For more on the difference between epochs and batches, see the post:
The training process will run for a fixed number of epochs (iterations) through the dataset that you must specify using the epochs argument. You must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size, and set using the batch_size argument.
This problem will run for a small number of epochs (150) and use a relatively small batch size of 10.
These configurations can be chosen experimentally by trial and error. You want to train the model enough so that it learns a good (or good enough) mapping of rows of input data to the output classification. The model will always have some error, but the amount of error will level out after some point for a given model configuration. This is called model convergence.
...
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
...
This is where the work happens on your CPU or GPU.
No GPU is required for this example, but if youβre interested in how to run large models on GPU hardware cheaply in the cloud, see this post:
You have trained our neural network on the entire dataset, and you can evaluate the performance of the network on the same dataset.
This will only give you an idea of how well you have modeled the dataset (e.g., train accuracy), but no idea of how well the algorithm might perform on new data. This was done for simplicity, but ideally, you could separate your data into train and test datasets for training and evaluation of your model.
You can evaluate your model on your training dataset using the evaluate() function and pass it the same input and output used to train the model.
This will generate a prediction for each input and output pair and collect scores, including the average loss and any metrics you have configured, such as accuracy.
The evaluate() function will return a list with two values. The first will be the loss of the model on the dataset, and the second will be the accuracy of the model on the dataset. You are only interested in reporting the accuracy so ignore the loss value.
...
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
You have just seen how you can easily create your first neural network model in Keras.
Letβs tie it all together into a complete code example.
# first neural network with keras tutorial
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
You can copy all the code into your Python file and save it as βkeras_first_network.pyβ in the same directory as your data file βpima-indians-diabetes.csvβ. You can then run the Python file as a script from your command line (command prompt) as follows:
python keras_first_network.py
Running this example, you should see a message for each of the 150 epochs, printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.
It takes about 10 seconds to execute on my workstation running on the CPU.
Ideally, you would like the loss to go to zero and the accuracy to go to 1.0 (e.g., 100%). This is not possible for any but the most trivial machine learning problems. Instead, you will always have some error in your model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest accuracy possible for a given dataset.
...
768/768 [==============================] - 0s 63us/step - loss: 0.4817 - acc: 0.7708
Epoch 147/150
768/768 [==============================] - 0s 63us/step - loss: 0.4764 - acc: 0.7747
Epoch 148/150
768/768 [==============================] - 0s 63us/step - loss: 0.4737 - acc: 0.7682
Epoch 149/150
768/768 [==============================] - 0s 64us/step - loss: 0.4730 - acc: 0.7747
Epoch 150/150
768/768 [==============================] - 0s 63us/step - loss: 0.4754 - acc: 0.7799
768/768 [==============================] - 0s 38us/step
Accuracy: 76.56
Note: If you try running this example in an IPython or Jupyter notebook, you may get an error.
The reason is the output progress bars during training. You can easily turn these off by setting verbose=0 in the call to the fit() and evaluate() functions; for example:
...
# fit the keras model on the dataset without progress bars
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# evaluate the keras model
_, accuracy = model.evaluate(X, y, verbose=0)
...
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
What score did you get?
Post your results in the comments below.
Neural networks are stochastic algorithms, meaning that the same algorithm on the same data can train a different model with different skill each time the code is run. This is a feature, not a bug. You can learn more about this in the post:
The variance in the performance of the model means that to get a reasonable approximation of how well your model is performing, you may need to fit it many times and calculate the average of the accuracy scores. For more on this approach to evaluating neural networks, see the post:
For example, below are the accuracy scores from re-running the example five times:
Accuracy: 75.00
Accuracy: 77.73
Accuracy: 77.60
Accuracy: 78.12
Accuracy: 76.17
You can see that all accuracy scores are around 77%, and the average is 76.924%.
The number one question I get asked is:
βAfter I train my model, how can I use it to make predictions on new data?β
Great question.
You can adapt the above example and use it to generate predictions on the training dataset, pretending it is a new dataset you have not seen before.
Making predictions is as easy as calling the predict() function on the model. You are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1. You can easily convert them into a crisp binary prediction for this classification task by rounding them.
For example:
...
# make probability predictions with the model
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
Alternately, you can convert the probability into 0 or 1 to predict crisp classes directly; for example:
...
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
The complete example below makes predictions for each example in the dataset, then prints the input data, predicted class, and expected class for the first five examples in the dataset.
# first neural network with keras make predictions
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))
Running the example does not show the progress bar as before, as the verbose argument has been set to 0.
After the model is fit, predictions are made for all examples in the dataset, and the input rows and predicted class value for the first five examples is printed and compared to the expected class value.
You can see that most rows are correctly predicted. In fact, you can expect about 76.9% of the rows to be correctly predicted based on your estimated performance of the model in the previous section.
[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)
If you would like to know more about how to make predictions with Keras models, see the post:
In this post, you discovered how to create your first neural network model using the powerful Keras Python library for deep learning.
Specifically, you learned the six key steps in using Keras to create a neural network or deep learning model step-by-step, including:
Do you have any questions about Keras or about this tutorial?
Ask your question in the comments, and I will do my best to answer.
Well done, you have successfully developed your first neural network using the Keras deep learning library in Python.
This section provides some extensions to this tutorial that you might want to explore.
Original article sourced at: https://machinelearningmastery.com
1670817060
In this Keras article, we will learn about How to Save and Load Your Keras Deep Learning Model. Keras is a simple and powerful Python library for deep learning.
Since deep learning models can take hours, days, and even weeks to train, it is important to know how to save and load them from a disk.
In this post, you will discover how to save your Keras models to files and load them up again to make predictions.
After reading this tutorial, you will know:
If you are new to Keras or deep learning, see this step-by-step Keras tutorial.
Keras separates the concerns of saving your model architecture and saving your model weights.
Model weights are saved to an HDF5 format. This grid format is ideal for storing multi-dimensional arrays of numbers.
The model structure can be described and saved using two different formats: JSON and YAML.
In this post, you will look at three examples of saving and loading your model to a file:
The first two examples save the model architecture and weights separately. The model weights are saved into an HDF5 format file in all cases.
The examples will use the same simple network trained on the Pima Indians onset of diabetes binary classification dataset. This is a small dataset that contains all numerical data and is easy to work with. You can download this dataset and place it in your working directory with the filename βpima-indians-diabetes.csvβ (update: download from here).
Confirm that you have TensorFlow v2.x installed (e.g., v2.9 as of June 2022).
Note: Saving models requires that you have the h5py library installed. It is usually installed as a dependency with TensorFlow. You can also install it easily as follows:
sudo pip install h5py
JSON is a simple file format for describing data hierarchically.
Keras provides the ability to describe any model using JSON format with a to_json() function. This can be saved to a file and later loaded via the model_from_json() function that will create a new model from the JSON specification.
The weights are saved directly from the model using the save_weights() function and later loaded using the symmetrical load_weights() function.
The example below trains and evaluates a simple model on the Pima Indians dataset. The model is then converted to JSON format and written to model.json in the local directory. The network weights are written to model.h5 in the local directory.
The model and weight data is loaded from the saved files, and a new model is created. It is important to compile the loaded model before it is used. This is so that predictions made using the model can use the appropriate efficient computation from the Keras backend.
The model is evaluated in the same way, printing the same evaluation score.
# MLP for Pima Indians Dataset Serialize to JSON and HDF5
from tensorflow.keras.models import Sequential, model_from_json
from tensorflow.keras.layers import Dense
import numpy
import os
# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
# later...
# load json and create model
json_file = open('model.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
# evaluate loaded model on test data
loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
score = loaded_model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running this example provides the output below.
acc: 78.78%
Saved model to disk
Loaded model from disk
acc: 78.78%
The JSON format of the model looks like the following:
{
"class_name":"Sequential",
"config":{
"name":"sequential_1",
"layers":[
{
"class_name":"Dense",
"config":{
"name":"dense_1",
"trainable":true,
"batch_input_shape":[
null,
8
],
"dtype":"float32",
"units":12,
"activation":"relu",
"use_bias":true,
"kernel_initializer":{
"class_name":"VarianceScaling",
"config":{
"scale":1.0,
"mode":"fan_avg",
"distribution":"uniform",
"seed":null
}
},
"bias_initializer":{
"class_name":"Zeros",
"config":{
}
},
"kernel_regularizer":null,
"bias_regularizer":null,
"activity_regularizer":null,
"kernel_constraint":null,
"bias_constraint":null
}
},
{
"class_name":"Dense",
"config":{
"name":"dense_2",
"trainable":true,
"dtype":"float32",
"units":8,
"activation":"relu",
"use_bias":true,
"kernel_initializer":{
"class_name":"VarianceScaling",
"config":{
"scale":1.0,
"mode":"fan_avg",
"distribution":"uniform",
"seed":null
}
},
"bias_initializer":{
"class_name":"Zeros",
"config":{
}
},
"kernel_regularizer":null,
"bias_regularizer":null,
"activity_regularizer":null,
"kernel_constraint":null,
"bias_constraint":null
}
},
{
"class_name":"Dense",
"config":{
"name":"dense_3",
"trainable":true,
"dtype":"float32",
"units":1,
"activation":"sigmoid",
"use_bias":true,
"kernel_initializer":{
"class_name":"VarianceScaling",
"config":{
"scale":1.0,
"mode":"fan_avg",
"distribution":"uniform",
"seed":null
}
},
"bias_initializer":{
"class_name":"Zeros",
"config":{
}
},
"kernel_regularizer":null,
"bias_regularizer":null,
"activity_regularizer":null,
"kernel_constraint":null,
"bias_constraint":null
}
}
]
},
"keras_version":"2.2.5",
"backend":"tensorflow"
}
Note: This method only applies to TensorFlow 2.5 or earlier. If you run it in later versions of TensorFlow, you will see a RuntimeError with the message βMethod model.to_yaml()
has been removed due to security risk of arbitrary code execution. Please use model.to_json()
instead.β
This example is much the same as the above JSON example, except the YAML format is used for the model specification.
Note, this example assumes that you have PyYAML 5 installed:
sudo pip install PyYAML
In this example, the model is described using YAML, saved to file model.yaml, and later loaded into a new model via the model_from_yaml() function.
Weights are handled the same way as above in the HDF5 format as model.h5.
# MLP for Pima Indians Dataset serialize to YAML and HDF5
from tensorflow.keras.models import Sequential, model_from_yaml
from tensorflow.keras.layers import Dense
import numpy
import os
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# serialize model to YAML
model_yaml = model.to_yaml()
with open("model.yaml", "w") as yaml_file:
yaml_file.write(model_yaml)
# serialize weights to HDF5
model.save_weights("model.h5")
print("Saved model to disk")
# later...
# load YAML and create model
yaml_file = open('model.yaml', 'r')
loaded_model_yaml = yaml_file.read()
yaml_file.close()
loaded_model = model_from_yaml(loaded_model_yaml)
# load weights into new model
loaded_model.load_weights("model.h5")
print("Loaded model from disk")
# evaluate loaded model on test data
loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
score = loaded_model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (loaded_model.metrics_names[1], score[1]*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running the example displays the following output.
acc: 78.78%
Saved model to disk
Loaded model from disk
acc: 78.78%
The model described in YAML format looks like the following:
backend: tensorflow
class_name: Sequential
config:
layers:
- class_name: Dense
config:
activation: relu
activity_regularizer: null
batch_input_shape: !!python/tuple
- null
- 8
bias_constraint: null
bias_initializer:
class_name: Zeros
config: {}
bias_regularizer: null
dtype: float32
kernel_constraint: null
kernel_initializer:
class_name: VarianceScaling
config:
distribution: uniform
mode: fan_avg
scale: 1.0
seed: null
kernel_regularizer: null
name: dense_1
trainable: true
units: 12
use_bias: true
- class_name: Dense
config:
activation: relu
activity_regularizer: null
bias_constraint: null
bias_initializer:
class_name: Zeros
config: {}
bias_regularizer: null
dtype: float32
kernel_constraint: null
kernel_initializer:
class_name: VarianceScaling
config:
distribution: uniform
mode: fan_avg
scale: 1.0
seed: null
kernel_regularizer: null
name: dense_2
trainable: true
units: 8
use_bias: true
- class_name: Dense
config:
activation: sigmoid
activity_regularizer: null
bias_constraint: null
bias_initializer:
class_name: Zeros
config: {}
bias_regularizer: null
dtype: float32
kernel_constraint: null
kernel_initializer:
class_name: VarianceScaling
config:
distribution: uniform
mode: fan_avg
scale: 1.0
seed: null
kernel_regularizer: null
name: dense_3
trainable: true
units: 1
use_bias: true
name: sequential_1
keras_version: 2.2.5
Keras also supports a simpler interface to save both the model weights and model architecture together into a single H5 file.
Saving the model in this way includes everything you need to know about the model, including:
This means that you can load and use the model directly without having to re-compile it as you had to in the examples above.
Note: This is the preferred way for saving and loading your Keras model.
You can save your model by calling the save() function on the model and specifying the filename.
The example below demonstrates this by first fitting a model, evaluating it, and saving it to the file model.h5.
# MLP for Pima Indians Dataset saved to single file
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load pima indians dataset
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# save model and architecture to single file
model.save("model.h5")
print("Saved model to disk")
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running the example fits the model, summarizes the modelβs performance on the training dataset, and saves the model to file.
acc: 77.73%
Saved model to disk
You can later load this model from the file and use it.
Note that in the Keras library, there is another function doing the same, as follows:
...
# equivalent to: model.save("model.h5")
from tensorflow.keras.models import save_model
save_model(model, "model.h5")
Your saved model can then be loaded later by calling the load_model()
function and passing the filename. The function returns the model with the same architecture and weights.
In this case, you load the model, summarize the architecture, and evaluate it on the same dataset to confirm the weights and architecture are the same.
# load and evaluate a saved model
from numpy import loadtxt
from tensorflow.keras.models import load_model
# load model
model = load_model('model.h5')
# summarize model.
model.summary()
# load dataset
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# evaluate the model
score = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], score[1]*100))
Running the example first loads the model, prints a summary of the model architecture, and then evaluates the loaded model on the same dataset.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
The model achieves the same accuracy score, which in this case is 77%.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 12) 108
_________________________________________________________________
dense_2 (Dense) (None, 8) 104
_________________________________________________________________
dense_3 (Dense) (None, 1) 9
=================================================================
Total params: 221
Trainable params: 221
Non-trainable params: 0
_________________________________________________________________
acc: 77.73%
While saving and loading a Keras model using HDF5 format is the recommended way, TensorFlow supports yet another format, the protocol buffer. It is considered faster to save and load a protocol buffer format, but doing so will produce multiple files. The syntax is the same, except that you do not need to provide the .h5
extension to the filename:
# save model and architecture to single file
model.save("model")
# ... later
# load model
model = load_model('model')
# print summary
model.summary()
These will create a directory βmodelβ with the following files:
model/
|-- assets/
|-- keras_metadata.pb
|-- saved_model.pb
`-- variables/
|-- variables.data-00000-of-00001
`-- variables.index
This is also the format used to save a model in TensorFlow v1.x. You may encounter this when you download a pre-trained model from TensorFlow Hub.
Original article sourced at: https://machinelearningmastery.com
1670808804
In this Keras article, we will learn about how to create a multilayer Perceptron neural network model with Keras. The Keras Python library for deep learning focuses on creating models as a sequence of layers.
In this post, you will discover the simple components you can use to create neural networks and simple deep learning models using Keras from TensorFlow.
Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.
The focus of the Keras library is a model.
The simplest model is defined in the Sequential class, which is a linear stack of Layers.
You can create a Sequential model and define all the layers in the constructor; for example:
from tensorflow.keras.models import Sequential
model = Sequential(...)
A more useful idiom is to create a Sequential model and add your layers in the order of the computation you wish to perform; for example:
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(...)
model.add(...)
model.add(...)
The first layer in your model must specify the shape of the input.
This is the number of input attributes defined by the input_shape
argument. This argument expects a tuple.
For example, you can define input in terms of 8 inputs for a Dense
type layer as follows:
Dense(16, input_shape=(8,))
Layers of different types have a few properties in common, specifically their method of weight initialization and activation functions.
The type of initialization used for a layer is specified in the kernel_initializer
argument.
Some common types of layer initialization include:
random_uniform
: Weights are initialized to small uniformly random values between -0.05 and 0.05.random_normal
: Weights are initialized to small Gaussian random values (zero mean and standard deviation of 0.05).zeros
: All weights are set to zero values.You can see a full list of the initialization techniques supported on the Usage of initializations page.
Keras supports a range of standard neuron activation functions, such as softmax, rectified linear (relu), tanh, and sigmoid.
You typically specify the type of activation function used by a layer in the activation argument, which takes a string value.
You can see a full list of activation functions supported by Keras on the Usage of activations page.
Interestingly, you can also create an Activation object and add it directly to your model after your layer to apply that activation to the output of the layer.
There are a large number of core layer types for standard neural networks.
Some common and useful layer types you can choose from are:
You can learn about the full list of core Keras layers on the Core Layers page.
Once you have defined your model, it needs to be compiled.
This creates the efficient structures used by TensorFlow in order to efficiently execute your model during training. Specifically, TensorFlow converts your model into a graph so the training can be carried out efficiently.
You compile your model using the compile()
function, and it accepts three important attributes:
model.compile(optimizer=..., loss=..., metrics=...)
The optimizer is the search technique used to update weights in your model.
You can create an optimizer object and pass it to the compile function via the optimizer argument. This allows you to configure the optimization procedure with its own arguments, such as learning rate. For example:
from tensorflow.keras.optimizers import SGD
sgd = SGD(...)
model.compile(optimizer=sgd)
You can also use the default parameters of the optimizer by specifying the name of the optimizer to the optimizer argument. For example:
model.compile(optimizer='sgd')
Some popular gradient descent optimizers you might want to choose from include:
You can learn about all of the optimizers supported by Keras on the Usage of optimizers page.
You can learn more about different gradient descent methods in the Gradient descent optimization algorithms section of Sebastian Ruderβs post, An overview of gradient descent optimization algorithms.
The loss function, also called the objective function, is the evaluation of the model used by the optimizer to navigate the weight space.
You can specify the name of the loss function to use in the compile function by the loss argument. Some common examples include:
You can learn more about the loss functions supported by Keras on the Losses page.
Metrics are evaluated by the model during training.
Only one metric is supported at the moment, and that is accuracy.
The model is trained on NumPy arrays using the fit() function; for example:
model.fit(X, y, epochs=..., batch_size=...)
Training both specifies the number of epochs to train on and the batch size.
epochs
) refer to the number of times the model is exposed to the training dataset.batch_size
) is the number of training instances shown to the model before a weight update is performed.The fit function also allows for some basic evaluation of the model during training. You can set the validation_split value to hold back a fraction of the training dataset for validation to be evaluated in each epoch or provide a validation_data tuple of (X, y) data to evaluate.
Fitting the model returns a history object with details and metrics calculated for the model in each epoch. This can be used for graphing model performance.
Once you have trained your model, you can use it to make predictions on test data or new data.
There are a number of different output types you can calculate from your trained model, each calculated using a different function call on your model object. For example:
For example, if you provided a batch of data X
and the expected output y
, you can use evaluate()
to calculate the loss metric (the one you defined with compile()
before). But for a batch of new data X
, you can obtain the network output with predict()
. It may not be the output you want, but it will be the output of your network. For example, a classification problem will probably output a softmax vector for each sample. You will need to use numpy.argmax()
to convert the softmax vector into class labels.
Once you are happy with your model, you can finalize it.
You may wish to output a summary of your model. For example, you can display a summary of a model by calling the summary function:
model.summary()
You can also retrieve a summary of the model configuration using the get_config() function:
model.get_config()
Finally, you can create an image of your model structure directly:
from tensorflow.keras.utils import plot_model
plot(model, to_file='model.png')
Original article sourced at: https://machinelearningmastery.com
1670670960
In this Keras article, we will learn about how to create a multilayer Perceptron neural network model with Keras. The Keras Python library for deep learning focuses on creating models as a sequence of layers.
In this post, you will discover the simple components you can use to create neural networks and simple deep learning models using Keras from TensorFlow.
Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.
The focus of the Keras library is a model.
The simplest model is defined in the Sequential class, which is a linear stack of Layers.
You can create a Sequential model and define all the layers in the constructor; for example:
from tensorflow.keras.models import Sequential
model = Sequential(...)
A more useful idiom is to create a Sequential model and add your layers in the order of the computation you wish to perform; for example:
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(...)
model.add(...)
model.add(...)
The first layer in your model must specify the shape of the input.
This is the number of input attributes defined by the input_shape
argument. This argument expects a tuple.
For example, you can define input in terms of 8 inputs for a Dense
type layer as follows:
Dense(16, input_shape=(8,))
Layers of different types have a few properties in common, specifically their method of weight initialization and activation functions.
The type of initialization used for a layer is specified in the kernel_initializer
argument.
Some common types of layer initialization include:
random_uniform
: Weights are initialized to small uniformly random values between -0.05 and 0.05.random_normal
: Weights are initialized to small Gaussian random values (zero mean and standard deviation of 0.05).zeros
: All weights are set to zero values.You can see a full list of the initialization techniques supported on the Usage of initializations page.
Keras supports a range of standard neuron activation functions, such as softmax, rectified linear (relu), tanh, and sigmoid.
You typically specify the type of activation function used by a layer in the activation argument, which takes a string value.
You can see a full list of activation functions supported by Keras on the Usage of activations page.
Interestingly, you can also create an Activation object and add it directly to your model after your layer to apply that activation to the output of the layer.
There are a large number of core layer types for standard neural networks.
Some common and useful layer types you can choose from are:
You can learn about the full list of core Keras layers on the Core Layers page.
Once you have defined your model, it needs to be compiled.
This creates the efficient structures used by TensorFlow in order to efficiently execute your model during training. Specifically, TensorFlow converts your model into a graph so the training can be carried out efficiently.
You compile your model using the compile()
function, and it accepts three important attributes:
model.compile(optimizer=..., loss=..., metrics=...)
The optimizer is the search technique used to update weights in your model.
You can create an optimizer object and pass it to the compile function via the optimizer argument. This allows you to configure the optimization procedure with its own arguments, such as learning rate. For example:
from tensorflow.keras.optimizers import SGD
sgd = SGD(...)
model.compile(optimizer=sgd)
You can also use the default parameters of the optimizer by specifying the name of the optimizer to the optimizer argument. For example:
model.compile(optimizer='sgd')
Some popular gradient descent optimizers you might want to choose from include:
You can learn about all of the optimizers supported by Keras on the Usage of optimizers page.
You can learn more about different gradient descent methods in the Gradient descent optimization algorithms section of Sebastian Ruderβs post, An overview of gradient descent optimization algorithms.
The loss function, also called the objective function, is the evaluation of the model used by the optimizer to navigate the weight space.
You can specify the name of the loss function to use in the compile function by the loss argument. Some common examples include:
You can learn more about the loss functions supported by Keras on the Losses page.
Metrics are evaluated by the model during training.
Only one metric is supported at the moment, and that is accuracy.
The model is trained on NumPy arrays using the fit() function; for example:
model.fit(X, y, epochs=..., batch_size=...)
Training both specifies the number of epochs to train on and the batch size.
epochs
) refer to the number of times the model is exposed to the training dataset.batch_size
) is the number of training instances shown to the model before a weight update is performed.The fit function also allows for some basic evaluation of the model during training. You can set the validation_split value to hold back a fraction of the training dataset for validation to be evaluated in each epoch or provide a validation_data tuple of (X, y) data to evaluate.
Fitting the model returns a history object with details and metrics calculated for the model in each epoch. This can be used for graphing model performance.
Once you have trained your model, you can use it to make predictions on test data or new data.
There are a number of different output types you can calculate from your trained model, each calculated using a different function call on your model object. For example:
For example, if you provided a batch of data X
and the expected output y
, you can use evaluate()
to calculate the loss metric (the one you defined with compile()
before). But for a batch of new data X
, you can obtain the network output with predict()
. It may not be the output you want, but it will be the output of your network. For example, a classification problem will probably output a softmax vector for each sample. You will need to use numpy.argmax()
to convert the softmax vector into class labels.
Once you are happy with your model, you can finalize it.
You may wish to output a summary of your model. For example, you can display a summary of a model by calling the summary function:
model.summary()
You can also retrieve a summary of the model configuration using the get_config() function:
model.get_config()
Finally, you can create an image of your model structure directly:
from tensorflow.keras.utils import plot_model
plot(model, to_file='model.png')
You can learn more about how to create a simple neural network and deep learning models in Keras using the following resources:
In this post, you discovered the Keras API that you can use to create artificial neural networks and deep learning models.
Original article sourced at: https://machinelearningmastery.com
1670586780
In this Keras article, we will learn about how to Evaluate the Performance of Deep Learning Models in Keras. Keras is an easy-to-use and powerful Python library for deep learning.
There are a lot of decisions to make when designing and configuring your deep learning models. Most of these decisions must be resolved empirically through trial and error and by evaluating them on real data.
As such, it is critically important to have a robust way to evaluate the performance of your neural networks and deep learning models.
In this post, you will discover a few ways to evaluate model performance using Keras.
Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code
You must make a myriad of decisions when designing and configuring your deep learning models.
Many of these decisions can be resolved by copying the structure of other peopleβs networks and using heuristics. Ultimately, the best technique is to actually design small experiments and empirically evaluate problems using real data.
This includes high-level decisions like the number, size, and type of layers in your network. It also includes the lower-level decisions like the choice of the loss function, activation functions, optimization procedure, and the number of epochs.
Deep learning is often used on problems that have very large datasets. That is tens of thousands or hundreds of thousands of instances.
As such, you need to have a robust test harness that allows you to estimate the performance of a given configuration on unseen data and reliably compare the performance to other configurations.
The large amount of data and the complexity of the models require very long training times.
As such, it is typical to separate data into training and test datasets or training and validation datasets.
Keras provides two convenient ways of evaluating your deep learning algorithms this way:
Keras can separate a portion of your training data into a validation dataset and evaluate the performance of your model on that validation dataset in each epoch.
You can do this by setting the validation_split argument on the fit() function to a percentage of the size of your training dataset.
For example, a reasonable value might be 0.2 or 0.33 for 20% or 33% of your training data held back for validation.
The example below demonstrates the use of an automatic validation dataset on a small binary classification problem. All examples in this post use the Pima Indians onset of diabetes dataset. You can download it from the UCI Machine Learning Repository and save the data file in your current working directory with the filename pima-indians-diabetes.csv (update: download from here).
# MLP with automatic validation set
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy
# fix random seed for reproducibility
numpy.random.seed(7)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10)
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running the example, you can see that the verbose output on each epoch shows the loss and accuracy on both the training dataset and the validation dataset.
...
Epoch 145/150
514/514 [==============================] - 0s - loss: 0.5252 - acc: 0.7335 - val_loss: 0.5489 - val_acc: 0.7244
Epoch 146/150
514/514 [==============================] - 0s - loss: 0.5198 - acc: 0.7296 - val_loss: 0.5918 - val_acc: 0.7244
Epoch 147/150
514/514 [==============================] - 0s - loss: 0.5175 - acc: 0.7335 - val_loss: 0.5365 - val_acc: 0.7441
Epoch 148/150
514/514 [==============================] - 0s - loss: 0.5219 - acc: 0.7354 - val_loss: 0.5414 - val_acc: 0.7520
Epoch 149/150
514/514 [==============================] - 0s - loss: 0.5089 - acc: 0.7432 - val_loss: 0.5417 - val_acc: 0.7520
Epoch 150/150
514/514 [==============================] - 0s - loss: 0.5148 - acc: 0.7490 - val_loss: 0.5549 - val_acc: 0.7520
Keras also allows you to manually specify the dataset to use for validation during training.
In this example, you can use the handy train_test_split() function from the Python scikit-learn machine learning library to separate your data into a training and test dataset. Use 67% for training and the remaining 33% of the data for validation.
The validation dataset can be specified to the fit()
function in Keras by the validation_data
argument. It takes a tuple of the input and output datasets.
# MLP with manual validation set
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import train_test_split
import numpy
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load pima indians dataset
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# split into 67% for train and 33% for test
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=seed)
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test,y_test), epochs=150, batch_size=10)
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Like before, running the example provides a verbose output of training that includes the loss and accuracy of the model on both the training and validation datasets for each epoch.
...
Epoch 145/150
514/514 [==============================] - 0s - loss: 0.4847 - acc: 0.7704 - val_loss: 0.5668 - val_acc: 0.7323
Epoch 146/150
514/514 [==============================] - 0s - loss: 0.4853 - acc: 0.7549 - val_loss: 0.5768 - val_acc: 0.7087
Epoch 147/150
514/514 [==============================] - 0s - loss: 0.4864 - acc: 0.7743 - val_loss: 0.5604 - val_acc: 0.7244
Epoch 148/150
514/514 [==============================] - 0s - loss: 0.4831 - acc: 0.7665 - val_loss: 0.5589 - val_acc: 0.7126
Epoch 149/150
514/514 [==============================] - 0s - loss: 0.4961 - acc: 0.7782 - val_loss: 0.5663 - val_acc: 0.7126
Epoch 150/150
514/514 [==============================] - 0s - loss: 0.4967 - acc: 0.7588 - val_loss: 0.5810 - val_acc: 0.6929
The gold standard for machine learning model evaluation is k-fold cross validation.
It provides a robust estimate of the performance of a model on unseen data. It does this by splitting the training dataset into k subsets, taking turns training models on all subsets except one, which is held out, and evaluating model performance on the held-out validation dataset. The process is repeated until all subsets are given an opportunity to be the held-out validation set. The performance measure is then averaged across all models that are created.
It is important to understand that cross validation means estimating a model design (e.g., 3-layer vs. 4-layer neural network) rather than a specific fitted model. You do not want to use a specific dataset to fit the models and compare the result since this may be due to that particular dataset fitting better on one model design. Instead, you want to use multiple datasets to fit, resulting in multiple fitted models of the same design, taking the average performance measure for comparison.
Cross validation is often not used for evaluating deep learning models because of the greater computational expense. For example, k-fold cross validation is often used with 5 or 10 folds. As such, 5 or 10 models must be constructed and evaluated, significantly adding to the evaluation time of a model.
Nevertheless, when the problem is small enough or if you have sufficient computing resources, k-fold cross validation can give you a less-biased estimate of the performance of your model.
In the example below, you will use the handy StratifiedKFold class from the scikit-learn Python machine learning library to split the training dataset into 10 folds. The folds are stratified, meaning that the algorithm attempts to balance the number of instances of each class in each fold.
The example creates and evaluates 10 models using the 10 splits of the data and collects all the scores. The verbose output for each epoch is turned off by passing verbose=0
to the fit()
and evaluate()
functions on the model.
The performance is printed for each model, and it is stored. The average and standard deviation of the model performance are then printed at the end of the run to provide a robust estimate of model accuracy.
# MLP for Pima Indians Dataset with 10-fold cross validation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from sklearn.model_selection import StratifiedKFold
import numpy as np
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)
# load pima indians dataset
dataset = np.loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define 10-fold cross validation test harness
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, Y):
# create model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X[train], Y[train], epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X[test], Y[test], verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running the example will take less than a minute and will produce the following output:
acc: 77.92%
acc: 68.83%
acc: 72.73%
acc: 64.94%
acc: 77.92%
acc: 35.06%
acc: 74.03%
acc: 68.83%
acc: 34.21%
acc: 72.37%
64.68% (+/- 15.50%)
In this post, you discovered the importance of having a robust way to estimate the performance of your deep learning models on unseen data.
You discovered three ways that you can estimate the performance of your deep learning models in Python using the Keras library:
Do you have any questions about deep learning with Keras or this post? Ask your question in the comments, and I will do my best to answer it.
Original article sourced at: https://machinelearningmastery.com
1670577756
In this Keras article, we will learn about Top 3 Ways To Build Machine Learning Models in Keras. If youβve looked at Keras models on Github, youβve probably noticed that there are some different ways to create models in Keras. Thereβs the Sequential model, which allows you to define an entire model in a single line, usually with some line breaks for readability. Then, thereβs the functional interface that allows for more complicated model architectures, and thereβs also the Model subclass which helps reusability. This article will explore the different ways to create models in Keras, along with their advantages and drawbacks. This will equip you with the knowledge you need to create your own machine learning models in Keras.
After you complete this tutorial, you will learn:
Letβs get started!
This tutorial is split into three parts, covering the different ways to build machine learning models in Keras:
The Sequential Model is just as the name implies. It consists of a sequence of layers, one after the other. From the Keras documentation,
βA Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.β
It is a simple, easy-to-use way to start building your Keras model. To start, import Tensorflow and then the Sequential model:
import tensorflow as tf
from tensorflow.keras import Sequential
Then, you can start building your machine learning model by stacking various layers together. For this example, letβs build a LeNet5 model with the classic CIFAR-10 image dataset as the input:
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
model = Sequential([
Input(shape=(32,32,3,)),
Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu"),
MaxPool2D(pool_size=(2,2)),
Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu"),
MaxPool2D(pool_size=(2, 2)),
Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu"),
Flatten(),
Dense(units=84, activation="relu"),
Dense(units=10, activation="softmax"),
])
print (model.summary())
Notice that you are just passing in an array of the layers you want your model to contain into the Sequential model constructor. Looking at the model.summary(), you can see the modelβs architecture.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_3 (Conv2D) (None, 32, 32, 6) 456
max_pooling2d_2 (MaxPooling (None, 16, 16, 6) 0
2D)
conv2d_4 (Conv2D) (None, 16, 16, 16) 2416
max_pooling2d_3 (MaxPooling (None, 8, 8, 16) 0
2D)
conv2d_5 (Conv2D) (None, 8, 8, 120) 48120
flatten_1 (Flatten) (None, 7680) 0
dense_2 (Dense) (None, 84) 645204
dense_3 (Dense) (None, 10) 850
=================================================================
Total params: 697,046
Trainable params: 697,046
Non-trainable params: 0
_________________________________________________________________
And just to test out the model, letβs go ahead and load the CIFAR-10 dataset and run model.compile and model.fit:
from tensorflow import keras
(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))
This gives us this output:
Epoch 1/10
196/196 [==============================] - 13s 10ms/step - loss: 2.7669 - acc: 0.3648 - val_loss: 1.4869 - val_acc: 0.4713
Epoch 2/10
196/196 [==============================] - 2s 8ms/step - loss: 1.3883 - acc: 0.5097 - val_loss: 1.3654 - val_acc: 0.5205
Epoch 3/10
196/196 [==============================] - 2s 8ms/step - loss: 1.2239 - acc: 0.5694 - val_loss: 1.2908 - val_acc: 0.5472
Epoch 4/10
196/196 [==============================] - 2s 8ms/step - loss: 1.1020 - acc: 0.6120 - val_loss: 1.2640 - val_acc: 0.5622
Epoch 5/10
196/196 [==============================] - 2s 8ms/step - loss: 0.9931 - acc: 0.6498 - val_loss: 1.2850 - val_acc: 0.5555
Epoch 6/10
196/196 [==============================] - 2s 9ms/step - loss: 0.8888 - acc: 0.6903 - val_loss: 1.3150 - val_acc: 0.5646
Epoch 7/10
196/196 [==============================] - 2s 8ms/step - loss: 0.7882 - acc: 0.7229 - val_loss: 1.4273 - val_acc: 0.5426
Epoch 8/10
196/196 [==============================] - 2s 8ms/step - loss: 0.6915 - acc: 0.7582 - val_loss: 1.4574 - val_acc: 0.5604
Epoch 9/10
196/196 [==============================] - 2s 8ms/step - loss: 0.5934 - acc: 0.7931 - val_loss: 1.5304 - val_acc: 0.5631
Epoch 10/10
196/196 [==============================] - 2s 8ms/step - loss: 0.5113 - acc: 0.8214 - val_loss: 1.6355 - val_acc: 0.5512
Thatβs pretty good for a first pass at a model. Putting the code for LeNet5 using a Sequential model together, you have:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()
model = Sequential([
Input(shape=(32,32,3,)),
Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu"),
MaxPool2D(pool_size=(2,2)),
Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu"),
MaxPool2D(pool_size=(2, 2)),
Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu"),
Flatten(),
Dense(units=84, activation="relu"),
Dense(units=10, activation="softmax"),
])
print (model.summary())
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))
Now, letβs explore what the other ways of constructing Keras models can do, starting with the functional interface!
The next method of constructing Keras models you will explore uses Kerasβs functional interface. The functional interface uses the layers as functions instead, taking in a Tensor and outputting a Tensor as well. The functional interface is a more flexible way of representing a Keras model as you are not restricted only to sequential models with layers stacked on top of one another. Instead, you can build models that branch into multiple paths, have multiple inputs, etc.
Consider an Add
layer that takes inputs from two or more paths and adds the tensors together.
Add layer with two inputs
Since this cannot be represented as a linear stack of layers due to the multiple inputs, you are unable to define it using a Sequential object. Hereβs where Kerasβs functional interface comes in. You can define an Add layer with two input tensors as such:
from tensorflow.keras.layers import Add
add_layer = Add()([layer1, layer2])
Now that youβve seen a quick example of the functional interface, letβs take a look at what the LeNet5 model that you defined by instantiating a Sequential class would look like using a functional interface.
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
from tensorflow.keras.models import Model
input_layer = Input(shape=(32,32,3,))
x = Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu")(input_layer)
x = MaxPool2D(pool_size=(2,2))(x)
x = Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu")(x)
x = MaxPool2D(pool_size=(2, 2))(x)
x = Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu")(x)
x = Flatten()(x)
x = Dense(units=84, activation="relu")(x)
x = Dense(units=10, activation="softmax")(x)
model = Model(inputs=input_layer, outputs=x)
print(model.summary())
And looking at the model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 32, 32, 3)] 0
conv2d_6 (Conv2D) (None, 32, 32, 6) 456
max_pooling2d_2 (MaxPooling (None, 16, 16, 6) 0
2D)
conv2d_7 (Conv2D) (None, 16, 16, 16) 2416
max_pooling2d_3 (MaxPooling (None, 8, 8, 16) 0
2D)
conv2d_8 (Conv2D) (None, 8, 8, 120) 48120
flatten_2 (Flatten) (None, 7680) 0
dense_4 (Dense) (None, 84) 645204
dense_5 (Dense) (None, 10) 850
=================================================================
Total params: 697,046
Trainable params: 697,046
Non-trainable params: 0
_________________________________________________________________
As you can see, the model architecture is the same for both LeNet5 models you implemented using the functional interface or the Sequential class.
Now that youβve seen how to use Kerasβs functional interface, letβs look at a model architecture that you can implement using the functional interface but not with the Sequential class. For this example, look at the residual block introduced in ResNet. Visually, the residual block looks like this:
Residual block, source: https://arxiv.org/pdf/1512.03385.pdf
You can see that a model defined using the Sequential class would be unable to construct such a block due to the skip connection, which prevents this block from being represented as a simple stack of layers. Using the functional interface is one way you can define a ResNet block:
def residual_block(x, filters):
# store the input tensor to be added later as the identity
identity = x
# change the strides to do like pooling layer (need to see whether we connect before or after this layer though)
x = Conv2D(filters = filters, kernel_size=(3, 3), strides = (1, 1), padding="same")(x)
x = BatchNormalization()(x)
x = relu(x)
x = Conv2D(filters = filters, kernel_size=(3, 3), padding="same")(x)
x = BatchNormalization()(x)
x = Add()([identity, x])
x = relu(x)
return x
Then, you can build a simple network using these residual blocks using the functional interface:
input_layer = Input(shape=(32,32,3,))
x = Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu")(input_layer)
x = residual_block(x, 32)
x = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 64)
x = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 128)
x = Flatten()(x)
x = Dense(units=84, activation="relu")(x)
x = Dense(units=10, activation="softmax")(x)
model = Model(inputs=input_layer, outputs = x)
print(model.summary())
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))
Running this code and looking at the model summary and training results:
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0 []
conv2d (Conv2D) (None, 32, 32, 32) 896 ['input_1[0][0]']
conv2d_1 (Conv2D) (None, 32, 32, 32) 9248 ['conv2d[0][0]']
batch_normalization (BatchNorm (None, 32, 32, 32) 128 ['conv2d_1[0][0]']
alization)
tf.nn.relu (TFOpLambda) (None, 32, 32, 32) 0
['batch_normalization[0][0]']
conv2d_2 (Conv2D) (None, 32, 32, 32) 9248 ['tf.nn.relu[0][0]']
batch_normalization_1 (BatchNo (None, 32, 32, 32) 128 ['conv2d_2[0][0]']
rmalization)
add (Add) (None, 32, 32, 32) 0 ['conv2d[0][0]',
'batch_normalization_1[0][0]']
tf.nn.relu_1 (TFOpLambda) (None, 32, 32, 32) 0 ['add[0][0]']
conv2d_3 (Conv2D) (None, 16, 16, 64) 18496 ['tf.nn.relu_1[0][0]']
conv2d_4 (Conv2D) (None, 16, 16, 64) 36928 ['conv2d_3[0][0]']
batch_normalization_2 (BatchNo (None, 16, 16, 64) 256 ['conv2d_4[0][0]']
rmalization)
tf.nn.relu_2 (TFOpLambda) (None, 16, 16, 64) 0
['batch_normalization_2[0][0]']
conv2d_5 (Conv2D) (None, 16, 16, 64) 36928 ['tf.nn.relu_2[0][0]']
batch_normalization_3 (BatchNo (None, 16, 16, 64) 256 ['conv2d_5[0][0]']
rmalization)
add_1 (Add) (None, 16, 16, 64) 0 ['conv2d_3[0][0]',
'batch_normalization_3[0][0]']
tf.nn.relu_3 (TFOpLambda) (None, 16, 16, 64) 0 ['add_1[0][0]']
conv2d_6 (Conv2D) (None, 8, 8, 128) 73856 ['tf.nn.relu_3[0][0]']
conv2d_7 (Conv2D) (None, 8, 8, 128) 147584 ['conv2d_6[0][0]']
batch_normalization_4 (BatchNo (None, 8, 8, 128) 512 ['conv2d_7[0][0]']
rmalization)
tf.nn.relu_4 (TFOpLambda) (None, 8, 8, 128) 0
['batch_normalization_4[0][0]']
conv2d_8 (Conv2D) (None, 8, 8, 128) 147584 ['tf.nn.relu_4[0][0]']
batch_normalization_5 (BatchNo (None, 8, 8, 128) 512 ['conv2d_8[0][0]']
rmalization)
add_2 (Add) (None, 8, 8, 128) 0 ['conv2d_6[0][0]',
'batch_normalization_5[0][0]']
tf.nn.relu_5 (TFOpLambda) (None, 8, 8, 128) 0 ['add_2[0][0]']
flatten (Flatten) (None, 8192) 0 ['tf.nn.relu_5[0][0]']
dense (Dense) (None, 84) 688212 ['flatten[0][0]']
dense_1 (Dense) (None, 10) 850 ['dense[0][0]']
==================================================================================================
Total params: 1,171,622
Trainable params: 1,170,726
Non-trainable params: 896
__________________________________________________________________________________________________
None
Epoch 1/10
196/196 [==============================] - 21s 46ms/step - loss: 3.4463
acc: 0.3635 - val_loss: 1.8015 - val_acc: 0.3459
Epoch 2/10
196/196 [==============================] - 8s 43ms/step - loss: 1.3267 - acc: 0.5200 - val_loss: 1.3895 - val_acc: 0.5069
Epoch 3/10
196/196 [==============================] - 8s 43ms/step - loss: 1.1095 - acc: 0.6062 - val_loss: 1.2008 - val_acc: 0.5651
Epoch 4/10
196/196 [==============================] - 9s 44ms/step - loss: 0.9618 - acc: 0.6585 - val_loss: 1.5411 - val_acc: 0.5226
Epoch 5/10
196/196 [==============================] - 9s 44ms/step - loss: 0.8656 - acc: 0.6968 - val_loss: 1.1012 - val_acc: 0.6234
Epoch 6/10
196/196 [==============================] - 8s 43ms/step - loss: 0.7622 - acc: 0.7361 - val_loss: 1.1355 - val_acc: 0.6168
Epoch 7/10
196/196 [==============================] - 9s 44ms/step - loss: 0.6801 - acc: 0.7602 - val_loss: 1.1561 - val_acc: 0.6187
Epoch 8/10
196/196 [==============================] - 8s 43ms/step - loss: 0.6106 - acc: 0.7905 - val_loss: 1.1100 - val_acc: 0.6401
Epoch 9/10
196/196 [==============================] - 9s 43ms/step - loss: 0.5367 - acc: 0.8146 - val_loss: 1.2989 - val_acc: 0.6058
Epoch 10/10
196/196 [==============================] - 9s 47ms/step - loss: 0.4776 - acc: 0.8348 - val_loss: 1.0098 - val_acc: 0.6757
And combining the code for our simple network using residual blocks:
import tensorflow as tf
from tensorflow import keras
from keras.layers import Input, Conv2D, BatchNormalization, Add, MaxPool2D, Flatten, Dense
from keras.activations import relu
from tensorflow.keras.models import Model
def residual_block(x, filters):
# store the input tensor to be added later as the identity
identity = x
# change the strides to do like pooling layer (need to see whether we connect before or after this layer though)
x = Conv2D(filters = filters, kernel_size=(3, 3), strides = (1, 1), padding="same")(x)
x = BatchNormalization()(x)
x = relu(x)
x = Conv2D(filters = filters, kernel_size=(3, 3), padding="same")(x)
x = BatchNormalization()(x)
x = Add()([identity, x])
x = relu(x)
return x
(trainX, trainY), (testX, testY) = keras.datasets.cifar10.load_data()
input_layer = Input(shape=(32,32,3,))
x = Conv2D(filters=32, kernel_size=(3, 3), padding="same", activation="relu")(input_layer)
x = residual_block(x, 32)
x = Conv2D(filters=64, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 64)
x = Conv2D(filters=128, kernel_size=(3, 3), strides=(2, 2), padding="same", activation="relu")(x)
x = residual_block(x, 128)
x = Flatten()(x)
x = Dense(units=84, activation="relu")(x)
x = Dense(units=10, activation="softmax")(x)
model = Model(inputs=input_layer, outputs = x)
print(model.summary())
model.compile(optimizer="adam", loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics="acc")
history = model.fit(x=trainX, y=trainY, batch_size=256, epochs=10, validation_data=(testX, testY))
Keras also provides an object-oriented approach to creating models, which helps with reusability and allows you to represent the models you want to create as classes. This representation might be more intuitive since you can think about models as a set of layers strung together to form your network.
To begin subclassing keras.Model, you first need to import it:
from tensorflow.keras.models import Model
Then, you can start subclassing keras.Model. First, you need to build the layers that you want to use in your method calls since you only want to instantiate these layers once instead of each time you call your model. To keep in line with previous examples, letβs build a LeNet5 model here as well.
class LeNet5(tf.keras.Model):
def __init__(self):
super(LeNet5, self).__init__()
#creating layers in initializer
self.conv1 = Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu")
self.max_pool2x2 = MaxPool2D(pool_size=(2,2))
self.conv2 = Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu")
self.conv3 = Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu")
self.flatten = Flatten()
self.fc2 = Dense(units=84, activation="relu")
self.fc3 = Dense(units=10, activation="softmax")
Then, override the call method to define what happens when the model is called. You override it with your model, which uses the layers you have built in the initializer.`
def call(self, input_tensor):
# don't create layers here, need to create the layers in initializer,
# otherwise you will get the tf.Variable can only be created once error
conv1 = self.conv1(input_tensor)
maxpool1 = self.max_pool2x2(conv1)
conv2 = self.conv2(maxpool1)
maxpool2 = self.max_pool2x2(conv2)
conv3 = self.conv3(maxpool2)
flatten = self.flatten(conv3)
fc2 = self.fc2(flatten)
fc3 = self.fc3(fc2)
return fc3
It is important to have all the layers created at the class constructor, not inside the call()
method. This is because the call()
method will be invoked multiple times with different input tensors. But you want to use the same layer objects in each call to optimize their weight. You can then instantiate your new LeNet5 class and use it as part of a model:
input_layer = Input(shape=(32,32,3,))
x = LeNet5()(input_layer)
model = Model(inputs=input_layer, outputs=x)
print(model.summary(expand_nested=True))
And you can see that the model has the same number of parameters as the previous two versions of LeNet5 that were built previously and has the same structure within it.
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 32, 32, 3)] 0
le_net5 (LeNet5) (None, 10) 697046
|Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―|
| conv2d (Conv2D) multiple 456 |
| |
| max_pooling2d (MaxPooling2D multiple 0 |
| ) |
| |
| conv2d_1 (Conv2D) multiple 2416 |
| |
| conv2d_2 (Conv2D) multiple 48120 |
| |
| flatten (Flatten) multiple 0 |
| |
| dense (Dense) multiple 645204 |
| |
| dense_1 (Dense) multiple 850 |
Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―Β―
=================================================================
Total params: 697,046
Trainable params: 697,046
Non-trainable params: 0
_________________________________________________________________
Combining all the code to create your LeNet5 subclass of keras.Model:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Flatten, Conv2D, MaxPool2D
from tensorflow.keras.models import Model
class LeNet5(tf.keras.Model):
def __init__(self):
super(LeNet5, self).__init__()
#creating layers in initializer
self.conv1 = Conv2D(filters=6, kernel_size=(5,5), padding="same", activation="relu")
self.max_pool2x2 = MaxPool2D(pool_size=(2,2))
self.conv2 = Conv2D(filters=16, kernel_size=(5,5), padding="same", activation="relu")
self.conv3 = Conv2D(filters=120, kernel_size=(5,5), padding="same", activation="relu")
self.flatten = Flatten()
self.fc2 = Dense(units=84, activation="relu")
self.fc3=Dense(units=10, activation="softmax")
def call(self, input_tensor):
#don't add layers here, need to create the layers in initializer, otherwise you will get the tf.Variable can only be created once error
x = self.conv1(input_tensor)
x = self.max_pool2x2(x)
x = self.conv2(x)
x = self.max_pool2x2(x)
x = self.conv3(x)
x = self.flatten(x)
x = self.fc2(x)
x = self.fc3(x)
return x
input_layer = Input(shape=(32,32,3,))
x = LeNet5()(input_layer)
model = Model(inputs=input_layer, outputs=x)
print(model.summary(expand_nested=True))
In this post, you have seen three different ways to create models in Keras. In particular, this includes using the Sequential class, functional interface, and subclassing keras.Model. You have also seen examples of the same LeNet5 model being built using the different methods and a use case that can be done using the functional interface but not with the Sequential class.
Original article sourced at: https://machinelearningmastery.com
1670555280
In this Keras article, we will learn about Implementing Dropout Regularization in Deep Learning Models with Keras. Dropout is a simple and powerful regularization technique for neural networks and deep learning models.
In this post, you will discover the Dropout regularization technique and how to apply it to your models in Python with Keras.
After reading this post, you will know:
Dropout is a regularization technique for neural network models proposed by Srivastava et al. in their 2014 paper βDropout: A Simple Way to Prevent Neural Networks from Overfittingβ (download the PDF).
Dropout is a technique where randomly selected neurons are ignored during training. They are βdropped outβ randomly. This means that their contribution to the activation of downstream neurons is temporally removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass.
As a neural network learns, neuron weights settle into their context within the network. Weights of neurons are tuned for specific features, providing some specialization. Neighboring neurons come to rely on this specialization, which, if taken too far, can result in a fragile model too specialized for the training data. This reliance on context for a neuron during training is referred to as complex co-adaptations.
You can imagine that if neurons are randomly dropped out of the network during training, other neurons will have to step in and handle the representation required to make predictions for the missing neurons. This is believed to result in multiple independent internal representations being learned by the network.
The effect is that the network becomes less sensitive to the specific weights of neurons. This, in turn, results in a network capable of better generalization and less likely to overfit the training data.
Dropout is easily implemented by randomly selecting nodes to be dropped out with a given probability (e.g., 20%) in each weight update cycle. This is how Dropout is implemented in Keras. Dropout is only used during the training of a model and is not used when evaluating the skill of the model.
Next, letβs explore a few different ways of using Dropout in Keras.
The examples will use the Sonar dataset. This is a binary classification problem that aims to correctly identify rocks and mock-mines from sonar chirp returns. It is a good test dataset for neural networks because all the input values are numerical and have the same scale.
The dataset can be downloaded from the UCI Machine Learning repository. You can place the sonar dataset in your current working directory with the file name sonar.csv.
You will evaluate the developed models using scikit-learn with 10-fold cross validation in order to tease out differences in the results better.
There are 60 input values and a single output value. The input values are standardized before being used in the network. The baseline neural network model has two hidden layers, the first with 60 units and the second with 30. Stochastic gradient descent is used to train the model with a relatively low learning rate and momentum.
The full baseline model is listed below:
# Baseline Model on the Sonar Dataset
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
sgd = SGD(learning_rate=0.01, momentum=0.8)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running the example generates an estimated classification accuracy of 86%.
Baseline: 86.04% (4.58%)
Dropout can be applied to input neurons called the visible layer.
In the example below, a new Dropout layer between the input (or visible layer) and the first hidden layer was added. The dropout rate is set to 20%, meaning one in five inputs will be randomly excluded from each update cycle.
Additionally, as recommended in the original paper on Dropout, a constraint is imposed on the weights for each hidden layer, ensuring that the maximum norm of the weights does not exceed a value of 3. This is done by setting the kernel_constraint argument on the Dense class when constructing the layers.
The learning rate was lifted by one order of magnitude, and the momentum was increased to 0.9. These increases in the learning rate were also recommended in the original Dropout paper.
Continuing from the baseline example above, the code below exercises the same network with input dropout:
# Example of Dropout on the Sonar Dataset: Visible Layer
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# dropout in the input layer with weight constraint
def create_model():
# create model
model = Sequential()
model.add(Dropout(0.2, input_shape=(60,)))
model.add(Dense(60, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dense(1, activation='sigmoid'))
# Compile model
sgd = SGD(learning_rate=0.1, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Visible: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running the example provides a slight drop in classification accuracy, at least on a single test run.
Visible: 83.52% (7.68%)
Dropout can be applied to hidden neurons in the body of your network model.
In the example below, Dropout is applied between the two hidden layers and between the last hidden layer and the output layer. Again a dropout rate of 20% is used as is a weight constraint on those layers.
# Example of Dropout on the Sonar Dataset: Hidden Layer
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.constraints import MaxNorm
from tensorflow.keras.optimizers import SGD
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# dropout in hidden layers with weight constraint
def create_model():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(30, activation='relu', kernel_constraint=MaxNorm(3)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
# Compile model
sgd = SGD(learning_rate=0.1, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_model, epochs=300, batch_size=16, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Hidden: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
You can see that for this problem and the chosen network configuration, using Dropout in the hidden layers did not lift performance. In fact, performance was worse than the baseline.
It is possible that additional training epochs are required or that further tuning is required to the learning rate.
Hidden: 83.59% (7.31%)
Dropout will randomly reset some of the input to zero. If you wonder what happens after you have finished training, the answer is nothing! In Keras, a layer can tell if the model is running in training mode or not. The Dropout layer will randomly reset some input only when the model runs for training. Otherwise, the Dropout layer works as a scaler to multiply all input by a factor such that the next layer will see input similar in scale. Precisely, if the dropout rate is r, the input will be scaled by a factor of 1βr.
The original paper on Dropout provides experimental results on a suite of standard machine learning problems. As a result, they provide a number of useful heuristics to consider when using Dropout in practice.
Original article sourced at: https://machinelearningmastery.com
1670550232
In this Keras article, we will learn Binary Classification with Keras Deep Learning Library. Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano.
Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano.
Keras allows you to quickly and simply design and train neural networks and deep learning models.
In this post, you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.
After completing this tutorial, you will know:
The dataset you will use in this tutorial is the Sonar dataset.
This is a dataset that describes sonar chirp returns bouncing off different services. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.
You can learn more about this dataset on the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.
It is a well-understood dataset. All the variables are continuous and generally in the range of 0 to 1. The output variable is a string βMβ for mine and βRβ for rock, which will need to be converted to integers 1 and 0.
A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Using cross-validation, a neural network should be able to achieve a performance of around 84% with an upper bound on accuracy for custom models at around 88%.
Letβs create a baseline model and result for this problem.
You will start by importing all the classes and functions you will need.
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...
Now, you can load the dataset using pandas and split the columns into 60 input variables (X) and one output variable (Y). Use pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.
...
# load dataset
dataframe = pd.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
The output variable is string values. You must convert them into integer values 0 and 1.
You can do this using the LabelEncoder class from scikit-learn. This class will model the encoding required using the entire dataset via the fit() function, then apply the encoding to create a new output variable using the transform() function.
...
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
You are now ready to create your neural network model using Keras.
You will use scikit-learn to evaluate the model using stratified k-fold cross validation. This is a resampling technique that will provide an estimate of the performance of the model. It does this by splitting the data into k-parts and training the model on all parts except one, which is held out as a test set to evaluate the performance of the model. This process is repeated k-times, and the average score across all constructed models is used as a robust estimate of performance. It is stratified, meaning that it will look at the output values and attempt to balance the number of instances that belong to each class in the k-splits of the data.
To use Keras models with scikit-learn, you must use the KerasClassifier wrapper from the SciKeras module. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit(), such as the number of epochs and the batch size.
Letβs start by defining the function that creates your baseline model. Your model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks.
The weights are initialized using a small Gaussian random number. The Rectifier activation function is used. The output layer contains a single neuron in order to make predictions. It uses the sigmoid activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values.
Finally, you will use the logarithmic loss function (binary_crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
Now, it is time to evaluate this model using stratified cross validation in the scikit-learn framework.
Pass the number of training epochs to the KerasClassifier, again using reasonable default values. Verbose output is also turned off, given that the model will be created ten times for the 10-fold cross validation being performed.
...
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Baseline
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data.
Baseline: 81.68% (7.26%)
This is an excellent score without doing any hard work.
It is a good practice to prepare your data before modeling.
Neural network models are especially suitable for having consistent input values, both in scale and distribution.
Standardization is an effective data preparation scheme for tabular data when building neural network models. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.
You can use scikit-learn to perform the standardization of your sonar dataset using the StandardScaler class.
Rather than performing the standardization on the entire dataset, it is good practice to train the standardization procedure on the training data within the pass of a cross-validation run and use the trained standardization to prepare the βunseenβ test fold. This makes standardization a step in model preparation in the cross-validation process. It prevents the algorithm from having knowledge of βunseenβ data during evaluation, knowledge that might be passed from the data preparation scheme like a crisper distribution.
You can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, you can define a pipeline with the StandardScaler followed by your neural network model.
...
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Standardized
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Running this example provides the results below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
You now see a small but very nice lift in the mean accuracy.
Standardized: 84.56% (5.74%)
There are many things to tune on a neural network, such as weight initialization, activation functions, optimization procedure, and so on.
One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, you will look at two experiments on the structure of the network: making it smaller and making it larger.
These are good experiments to perform when tuning a neural network on your problem.
Note that there is likely a lot of redundancy in the input variables for this problem.
The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. So you can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.
In this experiment, you will take your baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will pressure the network during training to pick out the most important structure in the input data to model.
You will also standardize the data as in the previous experiment with data preparation and try to take advantage of the slight lift in performance.
...
# smaller model
def create_smaller():
# create model
model = Sequential()
model.add(Dense(30, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Standardized Smaller
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# smaller model
def create_smaller():
# create model
model = Sequential()
model.add(Dense(30, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Running this example provides the following result. You can see that you have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
This is a great result because you are doing slightly better with a network half the size, which, in turn, takes half the time to train.
Smaller: 86.04% (4.00%)
A neural network topology with more layers offers more opportunities for the network to extract key features and recombine them in useful nonlinear ways.
You can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, you add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.
Your network now has the topology:
60 inputs -> [60 -> 30] -> 1 output
The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like you did in the experiment above with the smaller network.
Instead of squeezing the representation of the inputs themselves, you have an additional hidden layer to aid in the process.
...
# larger model
def create_larger():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Standardized Larger
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# larger model
def create_larger():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Running this example produces the results below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
You can see that you do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.
Larger: 83.14% (4.52%)
With further tuning of aspects like the optimization algorithm and the number of training epochs, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?
In this post, you discovered the Keras deep Learning library in Python.
You learned how you can work through a binary classification problem step-by-step with Keras, specifically:
Do you have any questions about deep learning with Keras or this post? Ask your questions in the comments, and I will do my best to answer.
Original article sourced at: https://machinelearningmastery.com
1670471536
In this Keras article, we will learn about Binary Classification with Keras Deep Learning Library. Keras is a Python library for deep learning that wraps the efficient numerical libraries TensorFlow and Theano. Keras allows you to quickly and simply design and train neural networks and deep learning models.
Binary Classification Tutorial with the Keras Deep Learning Library
In this post, you will discover how to effectively use the Keras library in your machine learning project by working through a binary classification project step-by-step.
After completing this tutorial, you will know:
The dataset you will use in this tutorial is the Sonar dataset.
This is a dataset that describes sonar chirp returns bouncing off different services. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.
You can learn more about this dataset on the UCI Machine Learning repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.
It is a well-understood dataset. All the variables are continuous and generally in the range of 0 to 1. The output variable is a string βMβ for mine and βRβ for rock, which will need to be converted to integers 1 and 0.
A benefit of using this dataset is that it is a standard benchmark problem. This means that we have some idea of the expected skill of a good model. Using cross-validation, a neural network should be able to achieve a performance of around 84% with an upper bound on accuracy for custom models at around 88%.
Letβs create a baseline model and result for this problem.
You will start by importing all the classes and functions you will need.
import pandas as pd
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...
Now, you can load the dataset using pandas and split the columns into 60 input variables (X) and one output variable (Y). Use pandas to load the data because it easily handles strings (the output variable), whereas attempting to load the data directly using NumPy would be more difficult.
...
# load dataset
dataframe = pd.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
The output variable is string values. You must convert them into integer values 0 and 1.
You can do this using the LabelEncoder class from scikit-learn. This class will model the encoding required using the entire dataset via the fit() function, then apply the encoding to create a new output variable using the transform() function.
...
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
You are now ready to create your neural network model using Keras.
You will use scikit-learn to evaluate the model using stratified k-fold cross validation. This is a resampling technique that will provide an estimate of the performance of the model. It does this by splitting the data into k-parts and training the model on all parts except one, which is held out as a test set to evaluate the performance of the model. This process is repeated k-times, and the average score across all constructed models is used as a robust estimate of performance. It is stratified, meaning that it will look at the output values and attempt to balance the number of instances that belong to each class in the k-splits of the data.
To use Keras models with scikit-learn, you must use the KerasClassifier wrapper from the SciKeras module. This class takes a function that creates and returns our neural network model. It also takes arguments that it will pass along to the call to fit(), such as the number of epochs and the batch size.
Letβs start by defining the function that creates your baseline model. Your model will have a single, fully connected hidden layer with the same number of neurons as input variables. This is a good default starting point when creating neural networks.
The weights are initialized using a small Gaussian random number. The Rectifier activation function is used. The output layer contains a single neuron in order to make predictions. It uses the sigmoid activation function in order to produce a probability output in the range of 0 to 1 that can easily and automatically be converted to crisp class values.
Finally, you will use the logarithmic loss function (binary_crossentropy) during training, the preferred loss function for binary classification problems. The model also uses the efficient Adam optimization algorithm for gradient descent, and accuracy metrics will be collected when the model is trained.
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
Now, it is time to evaluate this model using stratified cross validation in the scikit-learn framework.
Pass the number of training epochs to the KerasClassifier, again using reasonable default values. Verbose output is also turned off, given that the model will be created ten times for the 10-fold cross validation being performed.
...
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Baseline
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
# evaluate model with standardized dataset
estimator = KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(estimator, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
Running this code produces the following output showing the mean and standard deviation of the estimated accuracy of the model on unseen data.
Baseline: 81.68% (7.26%)
This is an excellent score without doing any hard work.
It is a good practice to prepare your data before modeling.
Neural network models are especially suitable for having consistent input values, both in scale and distribution.
Standardization is an effective data preparation scheme for tabular data when building neural network models. This is where the data is rescaled such that the mean value for each attribute is 0, and the standard deviation is 1. This preserves Gaussian and Gaussian-like distributions while normalizing the central tendencies for each attribute.
You can use scikit-learn to perform the standardization of your sonar dataset using the StandardScaler class.
Rather than performing the standardization on the entire dataset, it is good practice to train the standardization procedure on the training data within the pass of a cross-validation run and use the trained standardization to prepare the βunseenβ test fold. This makes standardization a step in model preparation in the cross-validation process. It prevents the algorithm from having knowledge of βunseenβ data during evaluation, knowledge that might be passed from the data preparation scheme like a crisper distribution.
You can achieve this in scikit-learn using a Pipeline. The pipeline is a wrapper that executes one or more models within a pass of the cross-validation procedure. Here, you can define a pipeline with the StandardScaler followed by your neural network model.
...
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Standardized
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# baseline model
def create_baseline():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
# evaluate baseline model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_baseline, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Standardized: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Running this example provides the results below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
You now see a small but very nice lift in the mean accuracy.
Standardized: 84.56% (5.74%)
There are many things to tune on a neural network, such as weight initialization, activation functions, optimization procedure, and so on.
One aspect that may have an outsized effect is the structure of the network itself, called the network topology. In this section, you will look at two experiments on the structure of the network: making it smaller and making it larger.
These are good experiments to perform when tuning a neural network on your problem.
Note that there is likely a lot of redundancy in the input variables for this problem.
The data describes the same signal from different angles. Perhaps some of those angles are more relevant than others. So you can force a type of feature extraction by the network by restricting the representational space in the first hidden layer.
In this experiment, you will take your baseline model with 60 neurons in the hidden layer and reduce it by half to 30. This will pressure the network during training to pick out the most important structure in the input data to model.
You will also standardize the data as in the previous experiment with data preparation and try to take advantage of the slight lift in performance.
...
# smaller model
def create_smaller():
# create model
model = Sequential()
model.add(Dense(30, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Standardized Smaller
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# smaller model
def create_smaller():
# create model
model = Sequential()
model.add(Dense(30, input_shape=(60,), activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_smaller, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Smaller: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Running this example provides the following result. You can see that you have a very slight boost in the mean estimated accuracy and an important reduction in the standard deviation (average spread) of the accuracy scores for the model.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
This is a great result because you are doing slightly better with a network half the size, which, in turn, takes half the time to train.
Smaller: 86.04% (4.00%)
A neural network topology with more layers offers more opportunities for the network to extract key features and recombine them in useful nonlinear ways.
You can easily evaluate whether adding more layers to the network improves the performance by making another small tweak to the function used to create our model. Here, you add one new layer (one line) to the network that introduces another hidden layer with 30 neurons after the first hidden layer.
Your network now has the topology:
60 inputs -> [60 -> 30] -> 1 output
The idea here is that the network is given the opportunity to model all input variables before being bottlenecked and forced to halve the representational capacity, much like you did in the experiment above with the smaller network.
Instead of squeezing the representation of the inputs themselves, you have an additional hidden layer to aid in the process.
...
# larger model
def create_larger():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
After tying this together, the complete example is listed below.
# Binary Classification with Sonar Dataset: Standardized Larger
from pandas import read_csv
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from scikeras.wrappers import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# load dataset
dataframe = read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
# larger model
def create_larger():
# create model
model = Sequential()
model.add(Dense(60, input_shape=(60,), activation='relu'))
model.add(Dense(30, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
return model
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(model=create_larger, epochs=100, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Larger: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Running this example produces the results below.
Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and compare the average outcome.
You can see that you do not get a lift in the model performance. This may be statistical noise or a sign that further training is needed.
Larger: 83.14% (4.52%)
With further tuning of aspects like the optimization algorithm and the number of training epochs, it is expected that further improvements are possible. What is the best score that you can achieve on this dataset?
Original article sourced at: https://machinelearningmastery.com
1670385252
In this Keras tutorial we will learn about How to Use Image Enhancement for Deep Learning with Keras. Data preparation is required when working with neural networks and deep learning models. Increasingly, data augmentation is also required on more complex object recognition tasks.
In this post, you will discover how to use data preparation and data augmentation with your image datasets when developing and evaluating deep learning models in Python with Keras.
After reading this post, you will know:
Kick-start your project with my new book Deep Learning With Python, including step-by-step tutorials and the Python source code files for all examples.
Letβs get started.
Like the rest of Keras, the image augmentation API is simple and powerful.
Keras provides the ImageDataGenerator class that defines the configuration for image data preparation and augmentation. This includes capabilities such as:
An augmented image generator can be created as follows:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator()
Rather than performing the operations on your entire image dataset in memory, the API is designed to be iterated by the deep learning model fitting process, creating augmented image data for you just in time. This reduces your memory overhead but adds some additional time cost during model training.
After you have created and configured your ImageDataGenerator, you must fit it on your data. This will calculate any statistics required to actually perform the transforms to your image data. You can do this by calling the fit() function on the data generator and passing it to your training dataset.
datagen.fit(train)
The data generator itself is, in fact, an iterator, returning batches of image samples when requested. You can configure the batch size and prepare the data generator and get batches of images by calling the flow() function.
X_batch, y_batch = datagen.flow(train, train, batch_size=32)
Finally, you can make use of the data generator. Instead of calling the fit() function on your model, you must call the fit_generator() function and pass in the data generator and the desired length of an epoch as well as the total number of epochs on which to train.
fit_generator(datagen, samples_per_epoch=len(train), epochs=100)
You can learn more about the Keras image data generator API in the Keras documentation.
Now that you know how the image augmentation API in Keras works, letβs look at some examples.
We will use the MNIST handwritten digit recognition task in these examples. To begin with, letβs take a look at the first nine images in the training dataset.
# Plot images
from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
# load dbata
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_train[i*3+j], cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
Running this example provides the following image that you can use as a point of comparison with the image preparation and augmentation in the examples below.
Example MNIST images
It is also possible to standardize pixel values across the entire dataset. This is called feature standardization and mirrors the type of standardization often performed for each column in a tabular dataset.
You can perform feature standardization by setting the featurewise_center
and featurewise_std_normalization
arguments to True on the ImageDataGenerator
class. These are set to False by default. However, the recent version of Keras has a bug in the feature standardization so that the mean and standard deviation is calculated across all pixels. If you use the fit()
function from the ImageDataGenerator
class, you will see an image similar to the one above:
# Standardize images across the dataset, mean=0, stdev=1
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# fit parameters from data
datagen.fit(X_train)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
print(X_batch.min(), X_batch.mean(), X_batch.max())
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j], cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
For example, the minimum, mean, and maximum values from the batch printed above are:
-0.42407447 -0.04093817 2.8215446
And the image displayed is as follows:
Image from feature-wise standardization
The workaround is to compute the feature standardization manually. Each pixel should have a separate mean and standard deviation, and it should be computed across different samples but independent from other pixels in the same sample. You just need to replace the fit()
function with your own computation:
# Standardize images across the dataset, every pixel has mean=0, stdev=1
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# fit parameters from data
datagen.mean = X_train.mean(axis=0)
datagen.std = X_train.std(axis=0)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
print(X_batch.min(), X_batch.mean(), X_batch.max())
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j], cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
The minimum, mean, and maximum as printed now have a wider range:
-1.2742625 -0.028436039 17.46127
Running this example, you can see that the effect is different, seemingly darkening and lightening different digits.
Standardized feature MNIST images
A whitening transform of an image is a linear algebraic operation that reduces the redundancy in the matrix of pixel images.
Less redundancy in the image is intended to better highlight the structures and features in the image to the learning algorithm.
Typically, image whitening is performed using the Principal Component Analysis (PCA) technique. More recently, an alternative called ZCA (learn more in Appendix A of this tech report) shows better results in transformed images that keep all the original dimensions. And unlike PCA, the resulting transformed images still look like their originals. Precisely, whitening converts each image into a white noise vector, i.e., each element in the vector has zero mean and unit standard derivation and is statistically independent of each other.
You can perform a ZCA whitening transform by setting the zca_whitening
argument to True. But due to the same issue as feature standardization, you must first zero-center your input data separately:
# ZCA Whitening
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True, zca_whitening=True)
# fit parameters from data
X_mean = X_train.mean(axis=0)
datagen.fit(X_train - X_mean)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train - X_mean, y_train, batch_size=9, shuffle=False):
print(X_batch.min(), X_batch.mean(), X_batch.max())
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
Running the example, you can see the same general structure in the images and how the outline of each digit has been highlighted.
ZCA whitening MNIST images
Sometimes images in your sample data may have varying and different rotations in the scene.
You can train your model to better handle rotations of images by artificially and randomly rotating images from your dataset during training.
The example below creates random rotations of the MNIST digits up to 90 degrees by setting the rotation_range argument.
# Random Rotations
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(rotation_range=90)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
Running the example, you can see that images have been rotated left and right up to a limit of 90 degrees. This is not helpful on this problem because the MNIST digits have a normalized orientation, but this transform might be of help when learning from photographs where the objects may have different orientations.
Random rotations of MNIST images
Objects in your images may not be centered in the frame. They may be off-center in a variety of different ways.
You can train your deep learning network to expect and currently handle off-center objects by artificially creating shifted versions of your training data. Keras supports separate horizontal and vertical random shifting of training data by the width_shift_range
and height_shift_range
arguments.
# Random Shifts
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
shift = 0.2
datagen = ImageDataGenerator(width_shift_range=shift, height_shift_range=shift)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
Running this example creates shifted versions of the digits. Again, this is not required for MNIST as the handwritten digits are already centered, but you can see how this might be useful on more complex problem domains.
Random shifted MNIST images
Another augmentation to your image data that can improve performance on large and complex problems is to create random flips of images in your training data.
Keras supports random flipping along both the vertical and horizontal axes using the vertical_flip
and horizontal_flip
arguments.
# Random Flips
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False):
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
Running this example, you can see flipped digits. Flipping digits is not useful as they will always have the correct left and right orientation, but this may be useful for problems with photographs of objects in a scene that can have a varied orientation.
Randomly flipped MNIST images
The data preparation and augmentation are performed just in time by Keras.
This is efficient in terms of memory, but you may require the exact images used during training. For example, perhaps you would like to use them with a different software package later or only generate them once and use them on multiple different deep learning models or configurations.
Keras allows you to save the images generated during training. The directory, filename prefix, and image file type can be specified to the flow()
function before training. Then, during training, the generated images will be written to the file.
The example below demonstrates this and writes nine images to a βimages
β subdirectory with the prefix βaug
β and the file type of PNG.
# Save augmented images to file
from tensorflow.keras.datasets import mnist
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9, shuffle=False,
save_to_dir='images', save_prefix='aug', save_format='png'):
# create a grid of 3x3 images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(4,4))
for i in range(3):
for j in range(3):
ax[i][j].imshow(X_batch[i*3+j].reshape(28,28), cmap=plt.get_cmap("gray"))
# show the plot
plt.show()
break
Running the example, you can see that images are only written when they are generated.
Augmented MNIST images saved to file
Image data is unique in that you can review the data and transformed copies of the data and quickly get an idea of how the model may perceive it.
Below are some tips for getting the most from image data preparation and augmentation for deep learning.
Original article sourced at: https://machinelearningmastery.com
1670376611
In this TensorFlow tutorial, we will learn about using the Keras and tf.image preprocessing layer in TensorFlow in TensorFlow to enhance images.When you work on a machine learning problem related to images, not only do you need to collect some images as training data, but you also need to employ augmentation to create variations in the image. It is especially true for more complex object recognition problems.
There are many ways for image augmentation. You may use some external libraries or write your own functions for that. There are some modules in TensorFlow and Keras for augmentation too.
In this post, you will discover how you can use the Keras preprocessing layer as well as the tf.image
module in TensorFlow for image augmentation.
After reading this post, you will know:
tf.image
module for image augmentationtf.data
datasetLetβs get started.
This article is divided into five sections; they are:
Before you see how you can do augmentation, you need to get the images. Ultimately, you need the images to be represented as arrays, for example, in HxWx3 in 8-bit integers for the RGB pixel value. There are many ways to get the images. Some can be downloaded as a ZIP file. If youβre using TensorFlow, you may get some image datasets from the tensorflow_datasets
library.
In this tutorial, you will use the citrus leaves images, which is a small dataset of less than 100MB. It can be downloaded from tensorflow_datasets
as follows:
import tensorflow_datasets as tfds
ds, meta = tfds.load('citrus_leaves', with_info=True, split='train', shuffle_files=True)
Running this code the first time will download the image dataset into your computer with the following output:
Downloading and preparing dataset 63.87 MiB (download: 63.87 MiB, generated: 37.89 MiB, total: 101.76 MiB) to ~/tensorflow_datasets/citrus_leaves/0.1.2...
Extraction completed...: 100%|ββββββββββββββββββββββββββββββ| 1/1 [00:06<00:00, 6.54s/ file]
Dl Size...: 100%|ββββββββββββββββββββββββββββββββββββββββββ| 63/63 [00:06<00:00, 9.63 MiB/s]
Dl Completed...: 100%|βββββββββββββββββββββββββββββββββββββββ| 1/1 [00:06<00:00, 6.54s/ url]
Dataset citrus_leaves downloaded and prepared to ~/tensorflow_datasets/citrus_leaves/0.1.2. Subsequent calls will reuse this data.
The function above returns the images as a tf.data
dataset object and the metadata. This is a classification dataset. You can print the training labels with the following:
...
for i in range(meta.features['label'].num_classes):
print(meta.features['label'].int2str(i))
This prints:
Black spot
canker
greening
healthy
If you run this code again at a later time, you will reuse the downloaded image. But the other way to load the downloaded images into a tf.data
dataset is to use the image_dataset_from_directory()
function.
As you can see from the screen output above, the dataset is downloaded into the directory ~/tensorflow_datasets
. If you look at the directory, you see the directory structure as follows:
.../Citrus/Leaves
βββ Black spot
βββ Melanose
βββ canker
βββ greening
βββ healthy
The directories are the labels, and the images are files stored under their corresponding directory. You can let the function to read the directory recursively into a dataset:
import tensorflow as tf
from tensorflow.keras.utils import image_dataset_from_directory
# set to fixed image size 256x256
PATH = ".../Citrus/Leaves"
ds = image_dataset_from_directory(PATH,
validation_split=0.2, subset="training",
image_size=(256,256), interpolation="bilinear",
crop_to_aspect_ratio=True,
seed=42, shuffle=True, batch_size=32)
You may want to set batch_size=None
if you do not want the dataset to be batched. Usually, you want the dataset to be batched for training a neural network model.
It is important to visualize the augmentation result, so you can verify the augmentation result is what we want it to be. You can use matplotlib for this.
In matplotlib, you have the imshow()
function to display an image. However, for the image to be displayed correctly, the image should be presented as an array of 8-bit unsigned integers (uint8).
Given that you have a dataset created using image_dataset_from_directory()
You can get the first batch (of 32 images) and display a few of them using imshow()
, as follows:
...
import matplotlib.pyplot as plt
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
for images, labels in ds.take(1):
for i in range(3):
for j in range(3):
ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
ax[i][j].set_title(ds.class_names[labels[i*3+j]])
plt.show()
Here, you see a display of nine images in a grid, labeled with their corresponding classification label, using ds.class_names
. The images should be converted to NumPy array in uint8 for display. This code displays an image like the following:
The complete code from loading the image to display is as follows:
from tensorflow.keras.utils import image_dataset_from_directory
import matplotlib.pyplot as plt
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
validation_split=0.2, subset="training",
image_size=(256,256), interpolation="mitchellcubic",
crop_to_aspect_ratio=True,
seed=42, shuffle=True, batch_size=32)
# Take one batch from dataset and display the images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
for images, labels in ds.take(1):
for i in range(3):
for j in range(3):
ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
ax[i][j].set_title(ds.class_names[labels[i*3+j]])
plt.show()
Note that if youβre using tensorflow_datasets
to get the image, the samples are presented as a dictionary instead of a tuple of (image,label). You should change your code slightly to the following:
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt
# use tfds.load() or image_dataset_from_directory() to load images
ds, meta = tfds.load('citrus_leaves', with_info=True, split='train', shuffle_files=True)
ds = ds.batch(32)
# Take one batch from dataset and display the images
fig, ax = plt.subplots(3, 3, sharex=True, sharey=True, figsize=(5,5))
for sample in ds.take(1):
images, labels = sample["image"], sample["label"]
for i in range(3):
for j in range(3):
ax[i][j].imshow(images[i*3+j].numpy().astype("uint8"))
ax[i][j].set_title(meta.features['label'].int2str(labels[i*3+j]))
plt.show()
For the rest of this post, assume the dataset is created using image_dataset_from_directory()
. You may need to tweak the code slightly if your dataset is created differently.
Keras comes with many neural network layers, such as convolution layers, that you need to train. There are also layers with no parameters to train, such as flatten layers to convert an array like an image into a vector.
The preprocessing layers in Keras are specifically designed to use in the early stages of a neural network. You can use them for image preprocessing, such as to resize or rotate the image or adjust the brightness and contrast. While the preprocessing layers are supposed to be part of a larger neural network, you can also use them as functions. Below is how you can use the resizing layer as a function to transform some images and display them side-by-side with the original:
...
# create a resizing layer
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
# show original vs resized
fig, ax = plt.subplots(2, 3, figsize=(6,4))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# resize
ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("resize")
plt.show()
The images are in 256Γ256 pixels, and the resizing layer will make them into 256Γ128 pixels. The output of the above code is as follows:
Since the resizing layer is a function, you can chain them to the dataset itself. For example,
...
def augment(image, label):
return resize(image), label
resized_ds = ds.map(augment)
for image, label in resized_ds:
...
The dataset ds
has samples in the form of (image, label)
. Hence you created a function that takes in such tuple and preprocesses the image with the resizing layer. You then assigned this function as an argument for the map()
in the dataset. When you draw a sample from the new dataset created with the map()
function, the image will be a transformed one.
There are more preprocessing layers available. Some are demonstrated below.
As you saw above, you can resize the image. You can also randomly enlarge or shrink the height or width of an image. Similarly, you can zoom in or zoom out on an image. Below is an example of manipulating the image size in various ways for a maximum of 30% increase or decrease:
...
# Create preprocessing layers
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
height = tf.keras.layers.RandomHeight(0.3)
width = tf.keras.layers.RandomWidth(0.3)
zoom = tf.keras.layers.RandomZoom(0.3)
# Visualize images and augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# resize
ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("resize")
# height
ax[2][i].imshow(height(images[i]).numpy().astype("uint8"))
ax[2][i].set_title("height")
# width
ax[3][i].imshow(width(images[i]).numpy().astype("uint8"))
ax[3][i].set_title("width")
# zoom
ax[4][i].imshow(zoom(images[i]).numpy().astype("uint8"))
ax[4][i].set_title("zoom")
plt.show()
This code shows images as follows:
While you specified a fixed dimension in resize, you have a random amount of manipulation in other augmentations.
You can also do flipping, rotation, cropping, and geometric translation using preprocessing layers:
...
# Create preprocessing layers
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical") # or "horizontal", "vertical"
rotate = tf.keras.layers.RandomRotation(0.2)
crop = tf.keras.layers.RandomCrop(out_height, out_width)
translation = tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2)
# Visualize augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# flip
ax[1][i].imshow(flip(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("flip")
# crop
ax[2][i].imshow(crop(images[i]).numpy().astype("uint8"))
ax[2][i].set_title("crop")
# translation
ax[3][i].imshow(translation(images[i]).numpy().astype("uint8"))
ax[3][i].set_title("translation")
# rotate
ax[4][i].imshow(rotate(images[i]).numpy().astype("uint8"))
ax[4][i].set_title("rotate")
plt.show()
This code shows the following images:
And finally, you can do augmentations on color adjustments as well:
...
brightness = tf.keras.layers.RandomBrightness([-0.8,0.8])
contrast = tf.keras.layers.RandomContrast(0.2)
# Visualize augmentation
fig, ax = plt.subplots(3, 3, figsize=(6,7))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# brightness
ax[1][i].imshow(brightness(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("brightness")
# contrast
ax[2][i].imshow(contrast(images[i]).numpy().astype("uint8"))
ax[2][i].set_title("contrast")
plt.show()
This shows the images as follows:
For completeness, below is the code to display the result of various augmentations:
from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
validation_split=0.2, subset="training",
image_size=(256,256), interpolation="mitchellcubic",
crop_to_aspect_ratio=True,
seed=42, shuffle=True, batch_size=32)
# Create preprocessing layers
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
height = tf.keras.layers.RandomHeight(0.3)
width = tf.keras.layers.RandomWidth(0.3)
zoom = tf.keras.layers.RandomZoom(0.3)
flip = tf.keras.layers.RandomFlip("horizontal_and_vertical")
rotate = tf.keras.layers.RandomRotation(0.2)
crop = tf.keras.layers.RandomCrop(out_height, out_width)
translation = tf.keras.layers.RandomTranslation(height_factor=0.2, width_factor=0.2)
brightness = tf.keras.layers.RandomBrightness([-0.8,0.8])
contrast = tf.keras.layers.RandomContrast(0.2)
# Visualize images and augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# resize
ax[1][i].imshow(resize(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("resize")
# height
ax[2][i].imshow(height(images[i]).numpy().astype("uint8"))
ax[2][i].set_title("height")
# width
ax[3][i].imshow(width(images[i]).numpy().astype("uint8"))
ax[3][i].set_title("width")
# zoom
ax[4][i].imshow(zoom(images[i]).numpy().astype("uint8"))
ax[4][i].set_title("zoom")
plt.show()
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# flip
ax[1][i].imshow(flip(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("flip")
# crop
ax[2][i].imshow(crop(images[i]).numpy().astype("uint8"))
ax[2][i].set_title("crop")
# translation
ax[3][i].imshow(translation(images[i]).numpy().astype("uint8"))
ax[3][i].set_title("translation")
# rotate
ax[4][i].imshow(rotate(images[i]).numpy().astype("uint8"))
ax[4][i].set_title("rotate")
plt.show()
fig, ax = plt.subplots(3, 3, figsize=(6,7))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# brightness
ax[1][i].imshow(brightness(images[i]).numpy().astype("uint8"))
ax[1][i].set_title("brightness")
# contrast
ax[2][i].imshow(contrast(images[i]).numpy().astype("uint8"))
ax[2][i].set_title("contrast")
plt.show()
Finally, it is important to point out that most neural network models can work better if the input images are scaled. While we usually use an 8-bit unsigned integer for the pixel values in an image (e.g., for display using imshow()
as above), a neural network prefers the pixel values to be between 0 and 1 or between -1 and +1. This can be done with preprocessing layers too. Below is how you can update one of the examples above to add the scaling layer into the augmentation:
...
out_height, out_width = 128,256
resize = tf.keras.layers.Resizing(out_height, out_width)
rescale = tf.keras.layers.Rescaling(1/127.5, offset=-1) # rescale pixel values to [-1,1]
def augment(image, label):
return rescale(resize(image)), label
rescaled_resized_ds = ds.map(augment)
for image, label in rescaled_resized_ds:
...
Besides the preprocessing layer, the tf.image
module also provides some functions for augmentation. Unlike the preprocessing layer, these functions are intended to be used in a user-defined function and assigned to a dataset using map()
as we saw above.
The functions provided by the tf.image
are not duplicates of the preprocessing layers, although there is some overlap. Below is an example of using the tf.image
functions to resize and crop images:
...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
# original
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# resize
h = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
w = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
ax[1][i].imshow(tf.image.resize(images[i], [h,w]).numpy().astype("uint8"))
ax[1][i].set_title("resize")
# crop
y, x, h, w = (128 * tf.random.uniform((4,))).numpy().astype("uint8")
ax[2][i].imshow(tf.image.crop_to_bounding_box(images[i], y, x, h, w).numpy().astype("uint8"))
ax[2][i].set_title("crop")
# central crop
x = tf.random.uniform([], minval=0.4, maxval=1.0)
ax[3][i].imshow(tf.image.central_crop(images[i], x).numpy().astype("uint8"))
ax[3][i].set_title("central crop")
# crop to (h,w) at random offset
h, w = (256 * tf.random.uniform((2,))).numpy().astype("uint8")
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[4][i].imshow(tf.image.stateless_random_crop(images[i], [h,w,3], seed).numpy().astype("uint8"))
ax[4][i].set_title("random crop")
plt.show()
Below is the output of the above code:
While the display of images matches what you might expect from the code, the use of tf.image
functions is quite different from that of the preprocessing layers. Every tf.image
function is different. Therefore, you can see the crop_to_bounding_box()
function takes pixel coordinates, but the central_crop()
function assumes a fraction ratio as the argument.
These functions are also different in the way randomness is handled. Some of these functions do not assume random behavior. Therefore, the random resize should have the exact output size generated using a random number generator separately before calling the resize function. Some other functions, such as stateless_random_crop()
, can do augmentation randomly, but a pair of random seeds in the int32
needs to be specified explicitly.
To continue the example, there are the functions for flipping an image and extracting the Sobel edges:
...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# flip
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[1][i].imshow(tf.image.stateless_random_flip_left_right(images[i], seed).numpy().astype("uint8"))
ax[1][i].set_title("flip left-right")
# flip
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[2][i].imshow(tf.image.stateless_random_flip_up_down(images[i], seed).numpy().astype("uint8"))
ax[2][i].set_title("flip up-down")
# sobel edge
sobel = tf.image.sobel_edges(images[i:i+1])
ax[3][i].imshow(sobel[0, ..., 0].numpy().astype("uint8"))
ax[3][i].set_title("sobel y")
# sobel edge
ax[4][i].imshow(sobel[0, ..., 1].numpy().astype("uint8"))
ax[4][i].set_title("sobel x")
plt.show()
This shows the following:
And the following are the functions to manipulate the brightness, contrast, and colors:
...
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# brightness
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[1][i].imshow(tf.image.stateless_random_brightness(images[i], 0.3, seed).numpy().astype("uint8"))
ax[1][i].set_title("brightness")
# contrast
ax[2][i].imshow(tf.image.stateless_random_contrast(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
ax[2][i].set_title("contrast")
# saturation
ax[3][i].imshow(tf.image.stateless_random_saturation(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
ax[3][i].set_title("saturation")
# hue
ax[4][i].imshow(tf.image.stateless_random_hue(images[i], 0.3, seed).numpy().astype("uint8"))
ax[4][i].set_title("hue")
plt.show()
This code shows the following:
Below is the complete code to display all of the above:
from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
validation_split=0.2, subset="training",
image_size=(256,256), interpolation="mitchellcubic",
crop_to_aspect_ratio=True,
seed=42, shuffle=True, batch_size=32)
# Visualize tf.image augmentations
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
# original
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# resize
h = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
w = int(256 * tf.random.uniform([], minval=0.8, maxval=1.2))
ax[1][i].imshow(tf.image.resize(images[i], [h,w]).numpy().astype("uint8"))
ax[1][i].set_title("resize")
# crop
y, x, h, w = (128 * tf.random.uniform((4,))).numpy().astype("uint8")
ax[2][i].imshow(tf.image.crop_to_bounding_box(images[i], y, x, h, w).numpy().astype("uint8"))
ax[2][i].set_title("crop")
# central crop
x = tf.random.uniform([], minval=0.4, maxval=1.0)
ax[3][i].imshow(tf.image.central_crop(images[i], x).numpy().astype("uint8"))
ax[3][i].set_title("central crop")
# crop to (h,w) at random offset
h, w = (256 * tf.random.uniform((2,))).numpy().astype("uint8")
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[4][i].imshow(tf.image.stateless_random_crop(images[i], [h,w,3], seed).numpy().astype("uint8"))
ax[4][i].set_title("random crop")
plt.show()
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# flip
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[1][i].imshow(tf.image.stateless_random_flip_left_right(images[i], seed).numpy().astype("uint8"))
ax[1][i].set_title("flip left-right")
# flip
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[2][i].imshow(tf.image.stateless_random_flip_up_down(images[i], seed).numpy().astype("uint8"))
ax[2][i].set_title("flip up-down")
# sobel edge
sobel = tf.image.sobel_edges(images[i:i+1])
ax[3][i].imshow(sobel[0, ..., 0].numpy().astype("uint8"))
ax[3][i].set_title("sobel y")
# sobel edge
ax[4][i].imshow(sobel[0, ..., 1].numpy().astype("uint8"))
ax[4][i].set_title("sobel x")
plt.show()
fig, ax = plt.subplots(5, 3, figsize=(6,14))
for images, labels in ds.take(1):
for i in range(3):
ax[0][i].imshow(images[i].numpy().astype("uint8"))
ax[0][i].set_title("original")
# brightness
seed = tf.random.uniform((2,), minval=0, maxval=65536).numpy().astype("int32")
ax[1][i].imshow(tf.image.stateless_random_brightness(images[i], 0.3, seed).numpy().astype("uint8"))
ax[1][i].set_title("brightness")
# contrast
ax[2][i].imshow(tf.image.stateless_random_contrast(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
ax[2][i].set_title("contrast")
# saturation
ax[3][i].imshow(tf.image.stateless_random_saturation(images[i], 0.7, 1.3, seed).numpy().astype("uint8"))
ax[3][i].set_title("saturation")
# hue
ax[4][i].imshow(tf.image.stateless_random_hue(images[i], 0.3, seed).numpy().astype("uint8"))
ax[4][i].set_title("hue")
plt.show()
These augmentation functions should be enough for most uses. But if you have some specific ideas on augmentation, you would probably need a better image processing library. OpenCV and Pillow are common but powerful libraries that allow you to transform images better.
You used the Keras preprocessing layers as functions in the examples above. But they can also be used as layers in a neural network. It is trivial to use. Below is an example of how you can incorporate a preprocessing layer into a classification network and train it using a dataset:
from tensorflow.keras.utils import image_dataset_from_directory
import tensorflow as tf
import matplotlib.pyplot as plt
# use image_dataset_from_directory() to load images, with image size scaled to 256x256
PATH='.../Citrus/Leaves' # modify to your path
ds = image_dataset_from_directory(PATH,
validation_split=0.2, subset="training",
image_size=(256,256), interpolation="mitchellcubic",
crop_to_aspect_ratio=True,
seed=42, shuffle=True, batch_size=32)
AUTOTUNE = tf.data.AUTOTUNE
ds = ds.cache().prefetch(buffer_size=AUTOTUNE)
num_classes = 5
model = tf.keras.Sequential([
tf.keras.layers.RandomFlip("horizontal_and_vertical"),
tf.keras.layers.RandomRotation(0.2),
tf.keras.layers.Rescaling(1/127.0, offset=-1),
tf.keras.layers.Conv2D(32, 3, activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(32, 3, activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Conv2D(32, 3, activation='relu'),
tf.keras.layers.MaxPooling2D(),
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(num_classes)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=['accuracy'])
model.fit(ds, epochs=3)
Running this code gives the following output:
Found 609 files belonging to 5 classes.
Using 488 files for training.
Epoch 1/3
16/16 [==============================] - 5s 253ms/step - loss: 1.4114 - accuracy: 0.4283
Epoch 2/3
16/16 [==============================] - 4s 259ms/step - loss: 0.8101 - accuracy: 0.6475
Epoch 3/3
16/16 [==============================] - 4s 267ms/step - loss: 0.7015 - accuracy: 0.7111
In the code above, you created the dataset with cache()
and prefetch()
. This is a performance technique to allow the dataset to prepare data asynchronously while the neural network is trained. This would be significant if the dataset has some other augmentation assigned using the map()
function.
You will see some improvement in accuracy if you remove the RandomFlip
and RandomRotation
layers because you make the problem easier. However, as you want the network to predict well on a wide variation of image quality and properties, using augmentation can help your resulting network become more powerful.
Original article sourced at: https://machinelearningmastery.com