Keras Out of memory with small batch size

I built an autoencoder using only TensorFlow, with the following network shape:

Layer (type)            Output Shape            Param #
input_1 (InputLayer)    (None, 168, 120, 3)     0
flatten_1 (Flatten)     (None, 60480)           0
dense_1 (Dense)         (None, 1024)            61932544
dense_2 (Dense)         (None, 256)             262400
dense_3 (Dense)         (None, 1024)            263168
dense_4 (Dense)         (None, 60480)           61992000
reshape_1 (Reshape)     (None, 168, 120, 3)     0

Total params: 124,450,112
Trainable params: 124,450,112
Non-trainable params: 0
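As a sanity check on the summary, the parameter counts and a rough float32 memory estimate can be reproduced like this. The 4x multiplier (weights, gradients, and Adam's two moment estimates) is an assumption about the optimizer state, and the helper names are made up for this sketch:

```python
# Dense layer widths taken from the model summary above.
LAYER_WIDTHS = [60480, 1024, 256, 1024, 60480]
BYTES_PER_FLOAT32 = 4


def dense_param_count(n_in, n_out):
    """Kernel weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def total_params(widths):
    """Sum the parameters of every consecutive pair of layer widths."""
    return sum(dense_param_count(a, b) for a, b in zip(widths, widths[1:]))


def training_state_bytes(n_params, copies=4):
    """Rough float32 footprint of the trainable state.

    Assumption: weights + gradients + Adam's two moment estimates, i.e.
    about 4 copies. Activations, workspace buffers, and allocator
    fragmentation are ignored.
    """
    return n_params * BYTES_PER_FLOAT32 * copies


params = total_params(LAYER_WIDTHS)
print(params)  # 124450112, matching "Total params" in the summary
print(params * BYTES_PER_FLOAT32 / 2**20, "MiB for the weights alone")
print(training_state_bytes(params) / 2**30, "GiB rough training state")
```

By this estimate the weights alone are about 475 MiB and the full training state is a little under 2 GiB, before activations and any workspace buffers.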

With the pure TensorFlow version, I was able to train on my GPUs with a batch size of 128 without any problem. When I recreated the autoencoder in Keras, I ran into an out-of-memory error even with a batch size of 1. From what I've read, the usual fix for this error is to reduce the batch size, but I can't reduce it any further. My machine has two GTX 970 cards running in SLI (CUDA doesn't care about SLI), for a total of 8 GB of memory. Why can't I train this network with Keras when I was able to train the same one in TensorFlow with 128x the batch size?
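For reference, batch size mainly scales the activations, not the weights. A rough sketch, assuming float32 and one stored output per layer (the layer widths come from the summary above; the function names are just for illustration):

```python
# Layer output widths from the model summary: input, dense_1..dense_4.
LAYER_WIDTHS = [60480, 1024, 256, 1024, 60480]
BYTES_PER_FLOAT32 = 4


def activation_bytes(batch_size, widths=LAYER_WIDTHS):
    """float32 bytes for one stored output per layer, input included.

    Assumption: backprop keeps roughly this much per sample; real usage
    is somewhat higher because gradients of the activations are also held.
    """
    return batch_size * sum(widths) * BYTES_PER_FLOAT32


def weight_bytes(widths=LAYER_WIDTHS):
    """float32 bytes for all Dense kernels and biases."""
    return sum(a * b + b for a, b in zip(widths, widths[1:])) * BYTES_PER_FLOAT32


for batch_size in (1, 128):
    mib = activation_bytes(batch_size) / 2**20
    print(f"batch {batch_size:>3}: ~{mib:.1f} MiB of activations")
print(f"weights: ~{weight_bytes() / 2**20:.1f} MiB, independent of batch size")
```

Under these assumptions the activations stay well under 100 MiB even at a batch size of 128, so the weights and optimizer state dominate the footprint for this network.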

Here is the relevant code:


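# Constants

WIDTH = 120
HEIGHT = 168
# (not shown in the snippet; values implied by the model summary above)
CHANNELS = 3
NUM_INPUTS = WIDTH * HEIGHT * CHANNELS  # 60480
HIDDEN_WIDTH = 1024
ENCODING_WIDTH = 256
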
INPUT_PATH = './input/'
VALIDATION_PATH = './validation/'
MODEL_PATH = './model/'

MODEL_FILE = 'my_model.h5'
EPOCH_FILE = 'initial_epoch.txt'

Initialization and save:

from keras.layers import Input, Dense, Flatten, Reshape
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

import constants

# this is our input placeholder
input_img = Input(shape=(constants.HEIGHT, constants.WIDTH, constants.CHANNELS))

# flatten image into one dimension
flatten = Flatten()(input_img)

# hidden layer 1
hidden = Dense(constants.HIDDEN_WIDTH, activation='relu')(flatten)

# "encoded" is the encoded representation of the input
encoded = Dense(constants.ENCODING_WIDTH, activation='relu')(hidden)

# hidden layer 3
hidden = Dense(constants.HIDDEN_WIDTH, activation='relu')(encoded)

# "decoded" is the lossy reconstruction of the input
decoded = Dense(constants.NUM_INPUTS, activation='relu')(hidden)

# reshape to image dimensions
reshape = Reshape((constants.HEIGHT, constants.WIDTH, constants.CHANNELS))(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, reshape)

autoencoder.compile(optimizer='adam', loss='mean_squared_error')

train_datagen = ImageDataGenerator(data_format='channels_last',

test_datagen = ImageDataGenerator(data_format='channels_last',

train_generator = train_datagen.flow_from_directory(

validation_generator = test_datagen.flow_from_directory(


# creates an HDF5 file 'my_model.h5'
with open(constants.MODEL_PATH+constants.EPOCH_FILE, 'w') as f:

print("Done, model created in: " + constants.MODEL_PATH)

Part of the error log:

2019-01-29 16:40:10.522222: W tensorflow/core/common_runtime/] ***********************************************************************************************_____
2019-01-29 16:40:10.525191: W tensorflow/core/framework/] OP_REQUIRES failed at : Resource exhausted: OOM when allocating tensor with shape[60480,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "", line 53, in <module>
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\legacy\", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\", line 1418, in fit_generator
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\", line 217, in fit_generator
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\engine\", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\", line 2715, in __call__
    return self._call(inputs)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\keras\backend\", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\", line 1439, in __call__
  File "C:\Users\dekke\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\", line 528, in __exit__
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,60480] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node training/Adam/gradients/dense_4/MatMul_grad/MatMul_1}} = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense_4/MatMul_grad/MatMul"], transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_3/Relu, training/Adam/gradients/dense_4/Relu_grad/ReluGrad)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
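The hint at the end of the log can be followed roughly like this. This is only a sketch, assuming TensorFlow 1.x with standalone Keras (as in the traceback). `compile_with_oom_report` is a hypothetical helper name, and it relies on Keras passing extra `compile()` keyword arguments through to the TensorFlow session call:

```python
def compile_with_oom_report(model):
    """Compile `model` so TensorFlow lists allocated tensors on an OOM error.

    Assumptions: TensorFlow 1.x API (tf.RunOptions at the top level) and a
    Keras model using the TensorFlow backend. The optimizer and loss simply
    repeat the compile() call from the question.
    """
    import tensorflow as tf  # imported lazily so the helper can be defined without TF

    # Ask the runtime to include current allocation info in OOM errors.
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

    # Extra keyword arguments to compile() are forwarded to the session call.
    model.compile(optimizer='adam',
                  loss='mean_squared_error',
                  options=run_options)
    return model
```

Calling `compile_with_oom_report(autoencoder)` in place of the plain `autoencoder.compile(...)` should make the next OOM error list which tensors were holding GPU memory at the time.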

