Support for Android and iOS, Named Tensors, TPU Support, Quantization and more.
Facebook just released PyTorch v1.3, and it is packed with some of the most awaited features. The three most attractive ones are named tensors, quantization, and mobile (Android and iOS) support. I will cover each of these briefly and link to some other important features.
PyTorch v1.3 finally adds support for named tensors, which let users access tensor dimensions through explicitly associated names rather than by remembering dimension indices. For example, until now in computer vision tasks we had to remember the general structure of a batch, [N, C, H, W], where N is the batch size, C is the number of channels, and H and W are the height and width of the images respectively. We had to keep track of this layout while performing operations on the batch, but now we can simply refer to a dimension by its name instead of its index. Additionally, these named representations provide extra runtime error checks, which I discuss further in this article.
import torch
batch = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.names)
batch.names prints the name of each dimension of the tensor batch, in order.
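If a tensor was created without names, they can also be attached afterwards. Below is a minimal sketch using refine_names() and rename(), two other methods from the same experimental named tensor API:
import torch
# Start from an ordinary, unnamed tensor
batch = torch.zeros(64, 3, 100, 100)
print(batch.names) #output: (None, None, None, None)
# Attach names in place of the None placeholders
batch = batch.refine_names('N', 'C', 'H', 'W')
print(batch.names) #output: ('N', 'C', 'H', 'W')
# Drop the names again if a downstream op does not support them yet
batch = batch.rename(None)
print(batch.names) #output: (None, None, None, None)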
Use [align_as()](https://pytorch.org/docs/master/named_tensor.html#torch.Tensor.align_as) or [align_to()](https://pytorch.org/docs/master/named_tensor.html#torch.Tensor.align_to) to align tensor dimensions by name to a specified ordering.
In computer vision models, the representation of a batch often needs to change between [N, C, H, W] (for the forward and backward passes of the model) and [N, H, W, C] (for plotting and saving images). Until now this had to be done rather counter-intuitively as batch.permute([0, 2, 3, 1]), but it can now be done much more readably with the [align_as()](https://pytorch.org/docs/master/named_tensor.html#torch.Tensor.align_as) or [align_to()](https://pytorch.org/docs/master/named_tensor.html#torch.Tensor.align_to) operator.
import torch
batch = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.shape) #torch.Size([64, 3, 100, 100])
batch = batch.align_to('N', 'H', 'W', 'C')
print(batch.shape) #torch.Size([64, 100, 100, 3])
For tensors with many dimensions, the ordinary permute operator needs an explicit list of every dimension even when only two of them are being swapped. With named tensors, the same reordering can be written far more simply:
#####################
# Before PyTorch v1.3
#####################
import torch
batch = torch.zeros(2, 3, 2, 2, 2, 2, 2, 2, 2, 2)
print(batch.shape)
batch = batch.permute([0, 2, 1, 3, 4, 5, 6, 7, 8, 9])
print(batch.shape)
#####################
# After PyTorch v1.3
#####################
import torch
alphas = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
batch = torch.zeros(2, 3, 2, 2, 2, 2, 2, 2, 2, 2, names=alphas)
print(batch.shape)
batch = batch.align_to('a', 'c', 'b', ...) # '...' (Ellipsis) keeps the remaining dimensions in their original order
print(batch.shape)
Apart from making tensors more intuitive, named tensors also provide additional error checking. When an operator is applied to named tensors (for a binary operator, either one or both inputs may be named), it implicitly checks at runtime that certain dimension names match. This offers extra protection against silent errors, as shown in the example below:
import torch
# The following will produce no error as the dimensions match
batch1 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
batch2 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
batch3 = batch1 + batch2
# The following will produce an error as the dimensions don't match
batch1 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
batch2 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'W', 'H'))
batch3 = batch1 + batch2
In the above example, without named tensors, batch1 and batch2 could be added without any error because height = width = 100. However, batch2 has its height and width dimensions interchanged, and adding it to batch1 is probably not the intended operation. Thanks to named tensors, this logical error is caught by name checking, since ('N', 'C', 'H', 'W') and ('N', 'C', 'W', 'H') are not the same.
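If the mismatch is purely a layout difference, named tensors also make the fix easy: align one tensor to the other's names before the operation. A minimal sketch:
import torch
batch1 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
batch2 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'W', 'H'))
# Reorder batch2's dimensions by name so they line up with batch1
batch3 = batch1 + batch2.align_as(batch1)
print(batch3.names) #output: ('N', 'C', 'H', 'W')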
When are the names matched?
The rules are much like the broadcasting rules for dimensions in NumPy or PyTorch. Quoting the official PyTorch docs: "Two names match if they are equal (string equality) or if at least one is None. Nones are essentially a special 'wildcard' name."
This is showcased in the example below:
import torch
batch1 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
batch2 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
batch3 = torch.zeros(64, 3, 100, 100)
batch4 = torch.zeros(64, 3, 100, 100)
batch5 = torch.zeros(64, 3, 100, 100, names=('N', 'C', 'W', 'H'))
#Name tuples are equal, so the names match
res1 = batch1 + batch2
#One of the inputs (batch3) has no names (all None), so the names match
res2 = batch1 + batch3
#Neither input has names, so the names match
res3 = batch3 + batch4
#Name tuples are not equal, so there is no match and this raises an error
res4 = batch1 + batch5
After performing operations on named tensors you do not need to specify the dimension names again; they are automatically propagated. PyTorch uses two operators, match and unify, for name propagation. match is the operator described above: it checks whether two named tensors can be matched. unify is the operator that determines which of the two input tensors' names is propagated to the resulting tensor. Quoting the official PyTorch docs:
"unify(A, B) determines which of the names A and B to propagate to the outputs. It returns the more specific of the two names, if they match. If the names do not match, then it errors."
Name propagation is showcased in the examples below:
import torch
# Unary operator on named tensors
t = torch.randn(4, 2, names=('N', 'C'))
t = t.abs()
print(t.names) #output: ('N', 'C')
import torch
# Binary operator on named tensors
t1 = torch.randn(4, names=('X',))
t2 = torch.randn(4)
t3 = t1 * t2
print(t3.names) #output: ('X',)
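The wildcard role of None and the "more specific name wins" rule of unify can also be seen with partially named tensors, and reductions can refer to dimensions by name. A short sketch:
import torch
# Partially named tensors: None acts as a wildcard
t1 = torch.randn(4, 2, names=('N', None))
t2 = torch.randn(4, 2, names=(None, 'C'))
t3 = t1 + t2
print(t3.names) #output: ('N', 'C'), the more specific names are propagated
# Reductions can address dimensions by name
batch = torch.randn(64, 3, 100, 100, names=('N', 'C', 'H', 'W'))
print(batch.mean('C').names) #output: ('N', 'H', 'W')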
At the time of writing, the named tensor functionality is experimental and may be subject to many changes. One of its biggest current limitations is that it does not fully support the autograd engine: gradients for named tensors are computed correctly, but the autograd engine ignores the names entirely, and with them the additional safety they provide.
PyTorch 1.3 also adds support for quantization of tensors, something TensorFlow has offered for a while and that was much awaited in PyTorch. Quantization is a fairly simple yet elaborate concept. I will describe it briefly and at a high level of abstraction by answering three questions: what, why and how.
What is Quantization?
Quantization is the technique of performing computations in a low-precision format, or of converting data from a high-precision format to a low-precision one, for example by representing values stored as 32-bit floating point numbers in an 8-bit fixed-point (integer) format. If you are interested, reading up on fixed-point and floating-point arithmetic and their relative costs helps to understand why quantization is needed.
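To make this concrete, here is a tiny hand-worked sketch of an affine 8-bit mapping of the kind PyTorch uses; the scale and zero-point values are made up purely for illustration.
# Affine quantization by hand (illustrative values only)
scale, zero_point = 0.1, 10
x = 1.23 # original 32-bit float value
q = round(x / scale + zero_point) # stored 8-bit integer: 22
x_hat = (q - zero_point) * scale # recovered value: ~1.2 (some precision is lost)
print(q, x_hat)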
Why Quantization?
The whole purpose of pursuing research and creating neural network models is to deploy them and make them available for the public good. While the need for model training grows roughly in proportion to the number of researchers and machine learning practitioners, the need for model inference grows in proportion to the number of consumers. To give end users broader and better access, models deployed for inference need a much more compact representation than they had during training. Keep in mind that backpropagation needs high-precision representations of the model weights and biases, but during inference models are much more robust and do not require such precision. Thus, for example, a model of size 113.9MB in 32-bit floating point representation can be quantized to int8 to bring it down to 76.8MB.
How to use Quantization in PyTorch v1.3?
Quoting the official PyTorch documentation:
PyTorch supports INT8 quantization compared to typical FP32 models allowing for a 4x reduction in the model size and a 4x reduction in memory bandwidth requirements. Hardware support for INT8 computations is typically 2 to 4 times faster compared to FP32 compute.
For quantization, PyTorch introduces three new data types:
- torch.quint8: 8-bit unsigned integer
- torch.qint8: 8-bit signed integer
- torch.qint32: 32-bit signed integer
PyTorch now offers three kinds of quantization methods for models: dynamic quantization (weights quantized ahead of time, activations quantized on the fly during inference), post-training static quantization (weights and activations quantized after calibration on representative data) and quantization-aware training (quantization effects simulated during training). For example, dynamic quantization is applied with a single call (a short sketch follows below):
torch.quantization.quantize_dynamic(model, dtype=torch.qint8)
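As a quick illustration of the dynamic variant, the sketch below quantizes the linear layers of a small toy model; the architecture and layer sizes are arbitrary and chosen only for demonstration.
import torch
import torch.nn as nn
# A small toy model; the architecture is arbitrary
model = nn.Sequential(
    nn.Linear(1024, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
# Replace the Linear layers with dynamically quantized (int8) versions:
# weights are quantized ahead of time, activations on the fly at inference
quantized_model = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized_model)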
Another thing to notice is that PyTorch supports quantization from the ground up: individual tensors can be quantized as well. The per-tensor affine scheme stores a floating-point value x as the integer x_q = round(x / scale + zero_point) and recovers it as (x_q - zero_point) * scale, which is exactly what the following code does:
import torch
t = torch.tensor([1.111111111])
t_q = torch.quantize_per_tensor(t, scale=0.1, zero_point=10, dtype=torch.quint8)
print(t_q.int_repr())
#output: tensor([21], dtype=torch.uint8)
print(t_q)
#output: tensor([1.1000], size=(1,), dtype=torch.quint8, quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=10)
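Working the formula by hand confirms the output: round(1.111... / 0.1 + 10) = 21, and (21 - 10) * 0.1 = 1.1. Continuing the snippet above, a quantized tensor can be converted back to floating point with dequantize():
print(t_q.dequantize())
#output: tensor([1.1000])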
The official documentation lists all operations supported on quantized tensors.
The natural motivation for adding quantization is shipping PyTorch models to performance-critical mobile devices. PyTorch 1.3 introduces an end-to-end workflow for both Android and iOS, which should have a great impact on inference latency and on user privacy, since data never has to leave the device. However, PyTorch Mobile is currently at an early, experimental stage with several limitations; for example, the current version only supports forward propagation (inference), and no backward operations are supported.
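The mobile workflow builds on TorchScript: the model is converted (traced or scripted) and saved to a single file that the PyTorch Android and iOS libraries can load. A minimal sketch with a stand-in model:
import torch
import torch.nn as nn
# Any trained model; this tiny module is only a stand-in for illustration
model = nn.Sequential(nn.Linear(10, 2))
model.eval()
# Convert to TorchScript by tracing with an example input (torch.jit.script also works)
example_input = torch.randn(1, 10)
scripted = torch.jit.trace(model, example_input)
# Save a single serialized file for the mobile runtimes to load
scripted.save("model.pt")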
You can access the hello world tutorials for both iOS and Android on the official PyTorch website.
Apart from these three major updates, PyTorch v1.3 brings several other changes and bug fixes; you can view the full list in the release notes on the official PyTorch GitHub repository. These include support for TPUs and further changes to torch.nn, torch.nn.functional, the autograd engine and more. Additionally, if you want to port your code from a previous version of PyTorch to v1.3, you need to take care of changes that might cause errors or unintended behavior. Some of these are (quoting directly from the previously mentioned release notes):
- Type promotion: torch.tensor(5) + 1.5 outputs a tensor with value 6.5; in earlier versions the output would have been 6. (The first three items here are demonstrated in the short snippet after this list.)
- In-place operations no longer cast the result down to the tensor's dtype: a = torch.tensor(0); a.add_(0.5) would earlier have produced a tensor with value 1, but now raises an error because the promoted float result cannot be stored in the integer tensor.
- torch.flatten: torch.flatten(torch.tensor(0)) returns tensor([0]) compared to tensor(0) earlier; rather than returning a 0D tensor, it now returns a 1D tensor.
- nn.functional.affine_grid: when align_corners = True, changed the behavior of 2D affine transforms on 1D data and 3D affine transforms on 2D data (i.e., when one of the spatial dimensions has unit size). Previously, all grid points along a unit dimension were considered arbitrarily to be at -1; now they are considered to be at 0 (the center of the input image).
- torch.gels is deprecated; use torch.lstsq instead.
- utils.data.DataLoader made a number of Iterator attributes private (e.g. num_workers, pin_memory).
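Here is the short snippet referenced above, demonstrating the first three changes:
import torch
# Mixed-type arithmetic now promotes to the wider type
print(torch.tensor(5) + 1.5) # tensor(6.5000)
# In-place ops can no longer silently downcast the promoted result
a = torch.tensor(0)
try:
    a.add_(0.5)
except RuntimeError as err:
    print("RuntimeError:", err)
# flatten on a 0-dimensional tensor now returns a 1-dimensional tensor
print(torch.flatten(torch.tensor(0))) # tensor([0])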