Keras vs Pytorch for Deep Learning

For many scientists, engineers, and developers, TensorFlow was their first Deep Learning framework. TensorFlow 1.0 was released back in February 2017, and to say the least, it wasn’t very user friendly.

Over the past couple of years, two major Deep Learning libraries have gained massive popularity, mainly due to how much easier they are to use than TensorFlow: Keras and Pytorch.

This article covers four aspects of Keras vs. Pytorch and why you might pick one library over the other.

Keras

Keras is not a framework on its own, but a high-level API that sits on top of other Deep Learning frameworks. Currently it supports TensorFlow, Theano, and CNTK.

The beauty of Keras lies in its ease of use. It’s by far the easiest framework to get up and running with fast. Defining neural networks is intuitive, and the functional API lets you define layers as functions.

Pytorch

Pytorch is a Deep Learning framework (like TensorFlow) developed by Facebook’s AI research group. Like Keras, it also abstracts away much of the messy parts of programming deep networks.

In terms of high- vs. low-level coding style, Pytorch lies somewhere in between Keras and TensorFlow. You have more flexibility and control than with Keras, but at the same time you don’t have to do any of the declarative graph programming that TensorFlow requires.

Deep Learning practitioners wrestle back and forth all day about which framework they should use. Generally, it’s up to personal preference. But there are a few aspects of Keras and Pytorch that you should keep in mind when making your pick.

(1) Classes vs. Functions for defining models

To define Deep Learning models, Keras offers the Functional API. With the Functional API, neural networks are defined as a set of sequential functions, applied one after the other. For example, the output of the function defining layer 1 is the input of the function defining layer 2.

from keras import layers

input_shape = (224, 224, 3)  # e.g. image height, width, channels
img_input = layers.Input(shape=input_shape)
x = layers.Conv2D(64, (3, 3), activation='relu')(img_input)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D((2, 2), strides=(2, 2))(x)

In Pytorch, you set up your network as a class that extends torch.nn.Module. Similar to Keras, Pytorch provides layers as building blocks, but since they live in a Python class, they are referenced in the class’s __init__() method and executed by the class’s forward() method.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, 3)
        self.conv2 = nn.Conv2d(64, 64, 3)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        return x

model = Net()

Because Pytorch gives you access to all of Python’s class features, as opposed to simple function calls, defining networks can be clearer and more elegantly contained. There’s really not much downside to this, unless writing your network code as quickly as possible is what matters most to you, in which case Keras will be a bit easier to work with.

(2) Tensors and Computational Graphs vs standard arrays

The Keras API hides a lot of the messy details from the casual coder. Defining the network layers is intuitive and the default settings are often enough to get you started.
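
For instance, here is a minimal sketch of how far the defaults take you (the layer sizes are illustrative); weight initialisation, biases, and the graph-building underneath are all handled for you:

from keras import layers, models

# The defaults handle weight initialisation, biases, etc.
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(32,)),
    layers.Dense(10, activation='softmax'),
])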

The only time you really have to get down to low-level, nitty-gritty TensorFlow is when you’re implementing a fairly cutting-edge or “exotic” model.

The tricky part is that when you do actually go down to the lower-level TensorFlow code, you get all the challenging parts that come along with it! You’ll need to make sure that all of your matrix multiplications line up. Oh, and don’t even think about trying to print out one of the outputs of your layers, as you’ll just get a nice Tensor definition printed out on your terminal.
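
For example, in TensorFlow 1.x graph mode, printing a layer’s output shows only the symbolic tensor definition, never its values (the printed output below is approximate):

import tensorflow as tf  # TensorFlow 1.x, graph mode

x = tf.placeholder(tf.float32, shape=(None, 10))
y = tf.layers.dense(x, 5)
print(y)  # Tensor("dense/BiasAdd:0", shape=(?, 5), dtype=float32) -- no values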

Pytorch tends to be a little more forgiving in these aspects. You are required to know the input and output sizes of each of the layers, but this is easy enough to get the hang of quite quickly. You don’t have to deal with building an abstract computational graph that you can’t see inside of for debugging.
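
Since Pytorch executes eagerly, you can inspect any tensor at any point in the code; a quick illustration:

import torch

x = torch.randn(1, 3, 32, 32)  # a dummy image batch
print(x.shape)   # torch.Size([1, 3, 32, 32])
print(x.mean())  # an actual value, available immediately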

Another plus for Pytorch is the smoothness with which you can go back and forth between Torch Tensors and Numpy arrays. If you need to implement something custom, going back and forth between TF tensors and Numpy arrays can be a pain, requiring the developer to have a solid understanding of TensorFlow sessions.

Pytorch interop is actually much simpler. There are just two operations you need to know: .numpy() to switch a Torch Tensor to Numpy, and torch.from_numpy() to go in the opposite direction.
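
A quick sketch of both directions:

import numpy as np
import torch

t = torch.ones(3, 3)      # Torch Tensor
a = t.numpy()             # Tensor -> Numpy array (shares memory on CPU)
t2 = torch.from_numpy(a)  # Numpy array -> Tensor (also shares memory)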

Of course, if you never have to implement anything fancy, then Keras will do just fine as you won’t run into any TensorFlow road blocks. But if you do, then Pytorch will probably be a lot smoother of a ride.

(3) Training models

Training a model in Keras is super easy! Just a single call to .fit() (or .fit_generator() when feeding data from a generator) and you can kick your feet up and enjoy the ride!

# Assumes the model has already been compiled, e.g.:
# model.compile(optimizer='adam', loss='categorical_crossentropy')
history = model.fit_generator(
    generator=train_generator,
    epochs=10,
    validation_data=validation_generator)

Training a model in Pytorch consists of a few steps that you write out yourself for every batch:

  1. Initialise (zero) the gradients at the start of each batch
  2. Run the forward pass through the model and compute the loss
  3. Run the backward pass
  4. Update the weights

# Assumes net, criterion (the loss function), optimizer,
# and trainloader have already been defined
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # (1) Initialise gradients
        optimizer.zero_grad()
        # (2) Forward pass and compute the loss
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # (3) Backward pass
        loss.backward()
        # (4) Update the weights
        optimizer.step()
        running_loss += loss.item()

That’s a lot of steps just to run the training!

I suppose this way you’re always aware of what’s happening. At the same time, it’s quite unnecessary, since these training steps remain essentially unchanged from one model to the next.
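
A common remedy is to write the boilerplate once as a reusable function; a minimal sketch:

def train(net, trainloader, criterion, optimizer, epochs=2):
    # The same four steps as above, written once and reused across models
    for epoch in range(epochs):
        for inputs, labels in trainloader:
            optimizer.zero_grad()
            loss = criterion(net(inputs), labels)
            loss.backward()
            optimizer.step()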

(4) Controlling CPU vs GPU mode

If you have tensorflow-gpu installed, Keras uses the GPU by default. If you then wish to move certain operations back to the CPU, you can do so with a one-liner.

with tf.device('/cpu:0'):
    y = apply_non_max_suppression(x)  # e.g. an operation you want on the CPU

For Pytorch, you’ll have to enable the GPU explicitly for every torch tensor and model. This clutters up the code and can be a bit error prone if you move back and forth between CPU and GPU for different operations.

For example, to transfer our previous model to run on GPU we have to do the following:

# Get the GPU device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Transfer the network to GPU
net.to(device)
# Transfer the inputs and labels to GPU
inputs, labels = data[0].to(device), data[1].to(device)

Keras wins here on its simplicity and nice default setup.

General advice for choosing a framework

The advice I usually give is to start with Keras.

Keras is definitely the easiest framework to use, understand, and quickly get up and running with. You don’t have to worry about GPU setup, fiddling with abstract code, or in general doing anything complicated. You can even do things like implementing custom layers and loss functions without ever touching a single line of TensorFlow.
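
For example, a custom loss can be written entirely with Keras backend functions (the L1 penalty term here is illustrative):

from keras import backend as K

def custom_loss(y_true, y_pred):
    # Mean squared error plus a small L1 penalty on the predictions
    return K.mean(K.square(y_pred - y_true)) + 0.01 * K.mean(K.abs(y_pred))

# model.compile(optimizer='adam', loss=custom_loss)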

If you do start to get down to the more fine-grained aspects of deep networks or are implementing something that’s non-standard, then Pytorch is your go-to library. It’ll be a bit of extra work over Keras, but not so much so that it slows you down. You’ll still be able to rapidly implement, train, and test your networks, with the added bonus of easy debugging!

#deep-learning #machine-learning #data-science #artificial-intelligence
