Keras Out of memory with small batch size

I built an autoencoder using only TensorFlow, with the following network shape:

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 168, 120, 3)       0
_________________________________________________________________
flatten_1 (Flatten)          (None, 60480)             0
_________________________________________________________________
dense_1 (Dense)              (None, 1024)              61932544
_________________________________________________________________
dense_2 (Dense)              (None, 256)               262400
_________________________________________________________________
dense_3 (Dense)              (None, 1024)              263168
_________________________________________________________________
dense_4 (Dense)              (None, 60480)             61992000
_________________________________________________________________
reshape_1 (Reshape)          (None, 168, 120, 3)       0
=================================================================
Total params: 124,450,112
Trainable params: 124,450,112
Non-trainable params: 0
_________________________________________________________________

In the TensorFlow-only project, I could train on my GPUs without problems at a batch size of 128. When I recreated the autoencoder in Keras, I ran into an out-of-memory error even with a batch size of 1. The usual advice I found is to reduce the batch size, but I can't reduce it any further. My machine has two GTX 970 cards running in SLI (CUDA doesn't care about SLI), for a total of 8 GB of GPU memory. Why can't I train this network with Keras when the same network trained at 128x the batch size in plain TensorFlow?
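As a rough sanity check on these numbers, the sketch below recomputes the parameter counts from the summary and estimates the float32 memory they need during training. It assumes Adam keeps two extra values per weight and that gradients sit next to the weights during an update. It ignores activations, input batches, and framework workspace, so it is a lower bound, not a measurement. The helper names (`dense_params`, `mib`) are made up for illustration.

```python
# Rough float32 memory estimate for the dense autoencoder summarized above.
# Assumption: during an Adam update the GPU holds the weights, their
# gradients, and two Adam slots (first and second moments) per weight.

BYTES_PER_FLOAT32 = 4


def dense_params(n_in, n_out):
    """Kernel weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mib(num_bytes):
    """Convert bytes to mebibytes."""
    return num_bytes / 2**20


# Layer widths: flattened input, hidden, encoding, hidden, reconstruction.
widths = [168 * 120 * 3, 1024, 256, 1024, 168 * 120 * 3]
params = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total_params = sum(params)

weights_bytes = total_params * BYTES_PER_FLOAT32
training_bytes = 4 * weights_bytes  # weights + gradients + 2 Adam slots

print(params)        # per-layer counts, matching the Param # column
print(total_params)  # matches "Total params" in the summary
print(round(mib(weights_bytes), 1), "MiB for weights alone")
print(round(mib(training_bytes), 1), "MiB with gradients and Adam slots")
```

Under these assumptions the weights alone take about 475 MiB and the training state close to 1.9 GiB, independent of batch size. The error log only mentions GPU:0, so the two cards' memory is probably not pooled. If each card supplies half of the 8 GB, that leaves around 4 GB for everything on the one device.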

Here is the relevant code:

Constants:

# Constants
WIDTH = 120
HEIGHT = 168
CHANNELS = 3
NUM_INPUTS = WIDTH * HEIGHT * CHANNELS
BATCH_SIZE = 1
NUM_SAMPLES = 5000
VALIDATION_SIZE = 1
VALIDATION_SAMPLES = 100
EPOCHS = 1000

HIDDEN_WIDTH = 1024
ENCODING_WIDTH = 256

INPUT_PATH = './input/'
VALIDATION_PATH = './validation/'
MODEL_PATH = './model/'

MODEL_FILE = 'my_model.h5'
EPOCH_FILE = 'initial_epoch.txt'

Initialization and save:

from keras.layers import Input, Dense, Flatten, Reshape
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

import constants

# this is our input placeholder
input_img = Input(shape=(constants.HEIGHT, constants.WIDTH, constants.CHANNELS))

# flatten image into one dimension
flatten = Flatten()(input_img)

# hidden layer 1
hidden = Dense(constants.HIDDEN_WIDTH, activation='relu')(flatten)

# "encoded" is the encoded representation of the input
encoded = Dense(constants.ENCODING_WIDTH, activation='relu')(hidden)

# hidden layer 3
hidden = Dense(constants.HIDDEN_WIDTH, activation='relu')(encoded)

# "decoded" is the lossy reconstruction of the input
decoded = Dense(constants.NUM_INPUTS, activation='relu')(hidden)

# reshape to image dimensions
reshape = Reshape((constants.HEIGHT, constants.WIDTH, constants.CHANNELS))(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, reshape)

autoencoder.summary()

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

train_datagen = ImageDataGenerator(data_format='channels_last', rescale=1./255)

test_datagen = ImageDataGenerator(data_format='channels_last', rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    constants.INPUT_PATH,
    target_size=(constants.HEIGHT, constants.WIDTH),
    color_mode='rgb',
    class_mode='input',
    batch_size=constants.BATCH_SIZE)

validation_generator = test_datagen.flow_from_directory(
    constants.VALIDATION_PATH,
    target_size=(constants.HEIGHT, constants.WIDTH),
    color_mode='rgb',
    class_mode='input',
    batch_size=constants.VALIDATION_SIZE)

autoencoder.fit_generator(
    train_generator,
    steps_per_epoch=constants.NUM_SAMPLES*1.0/constants.BATCH_SIZE,
    epochs=1,
    verbose=2,
    validation_data=validation_generator,
    validation_steps=constants.VALIDATION_SAMPLES*1.0/constants.VALIDATION_SIZE)

# Creates an HDF5 file 'my_model.h5'
autoencoder.save(constants.MODEL_PATH + constants.MODEL_FILE)
with open(constants.MODEL_PATH + constants.EPOCH_FILE, 'w') as f:
    f.write(str(1))

print("Done, model created in: " + constants.MODEL_PATH)

Part of the error log:

2019-01-29 16:40:10.522222: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ***___
2019-01-29 16:40:10.525191: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at matmul_op.cc:478 : Resource exhausted: OOM when allocating tensor with shape[60480,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "init.py", line 53, in <module>
    validation_steps=constants.VALIDATION_SAMPLES*1.0/constants.VALIDATION_SIZE)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1439, in __call__
    run_metadata_ptr)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,60480] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node training/Adam/gradients/dense_4/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense_4/MatMul_grad/MatMul"], transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_3/Relu, training/Adam/gradients/dense_4/Relu_grad/ReluGrad)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
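To put the failing allocation in perspective, the sketch below pulls the tensor shape and dtype out of an OOM message like the ones in this log and converts them to a size in bytes. The regular expression and the dtype-to-size table are assumptions based only on the two messages quoted here, and `oom_tensor_bytes` is a made-up helper, not part of TensorFlow or Keras.

```python
import re

# Assumption: OOM messages follow the wording seen in the log above.
OOM_PATTERN = re.compile(
    r"OOM when allocating tensor with shape\[([\d,]+)\] and type (\w+)"
)

# Assumed element sizes in bytes; only "float" appears in the quoted log.
DTYPE_BYTES = {"float": 4, "double": 8, "half": 2, "int32": 4, "int64": 8}


def oom_tensor_bytes(message):
    """Return the size in bytes of the tensor named in an OOM message,
    or None if the message does not match or the dtype is unknown."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    element_size = DTYPE_BYTES.get(match.group(2))
    if element_size is None:
        return None
    count = 1
    for dim in match.group(1).split(","):
        count *= int(dim)
    return count * element_size


line = (
    "Resource exhausted: OOM when allocating tensor with shape[60480,1024] "
    "and type float on /job:localhost/replica:0/task:0/device:GPU:0 "
    "by allocator GPU_0_bfc"
)
size = oom_tensor_bytes(line)
print(size, "bytes =", size / 2**20, "MiB")
```

The tensor that failed to allocate is about 236 MiB, the size of one of the two large kernel matrices or its gradient. A single allocation of that size failing suggests GPU:0 was already nearly full by that point, so the batch size itself is unlikely to be the deciding factor.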


python tensorflow
