TVM is an open source deep learning compiler stack that compiles deep learning models from different frameworks to CPUs, GPUs, or specialized accelerators. TVM also provides runtime bindings for programming languages like Python, Java, etc. We will discuss the strengths and shortcomings of the TVM paper, along with a brief summary.

Summary:

TVM provides optimization at different levels. When a model is imported, the first level of optimization happens on the computation graph: graph-level operator fusion, layout transformation, and memory management. Later optimization happens at the tensor level, in the code-generation layer.

The TVM stack has several layers. The user-facing framework layer, written in Python, accepts input models from various frameworks; TensorFlow and Caffe models are converted into TVM-compatible graphs in this layer. The computation graph optimization layer then rewrites the graph representation through various passes: a precompute-prune pass that prunes graph nodes which can be computed at compilation time, a layout conversion pass that inserts the necessary layout conversion operations (or nodes) between layers when there is a layout mismatch, and a fusion pass that joins the computation of multiple nodes into one based on certain rules (sketched in the first example below).

The next layer in the stack is the schedule space, which holds the low-level and hardware-specific optimizations. A tensor expression language is introduced to build operators, and program transformation primitives are provided that generate various optimized versions of a program (see the second example below). On top of this, a novel learning-based cost model automates the tuning of low-level programs to hardware characteristics, enabling rapid code optimization.

TVM has been implemented and optimized for CPUs, GPUs, embedded CPUs, embedded GPUs, and FPGA hardware, producing state-of-the-art results with hardware-specific optimizations. While the initial graph-level optimizations such as pruning and fusion are more or less the same for every target, the low-level optimizations are hardware specific and differ from target to target.
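The graph-level stage can be sketched with TVM's Python API. This is a minimal, non-authoritative example assuming the Relay frontend of a recent TVM release; `graph_def` and the input shape are hypothetical stand-ins for a TensorFlow model loaded elsewhere, and `FoldConstant` plays the role the paper assigns to precompute pruning.

```python
import tvm
from tvm import relay

# `graph_def` is a placeholder for a TensorFlow GraphDef you have already loaded.
mod, params = relay.frontend.from_tensorflow(
    graph_def, shape={"input": (1, 224, 224, 3)}
)

seq = tvm.transform.Sequential([
    # "Precompute prune": fold subgraphs whose values are known at compile time.
    relay.transform.FoldConstant(),
    # Insert layout conversions where layers disagree on data layout.
    relay.transform.ConvertLayout({"nn.conv2d": ["NCHW", "default"]}),
    # Merge the computation of multiple nodes into one fused kernel.
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
```

The operator-level layer can be sketched the same way. Assuming the classic `te` schedule API, the following declares a vector-add operator in the tensor expression language and applies a few schedule primitives to produce one optimized variant:

```python
import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
# Declarative compute rule: describes *what* to compute, not *how*.
C = te.compute((n,), lambda i: A[i] + B[i], name="C")

# The schedule decides *how*: split the loop, parallelize the outer
# part, vectorize the inner part. Primitives such as tile, unroll and
# bind explore the same optimization space for other hardware.
s = te.create_schedule(C.op)
outer, inner = s[C].split(C.op.axis[0], factor=64)
s[C].parallel(outer)
s[C].vectorize(inner)

# Generate machine code for a chosen backend, e.g. LLVM on CPU.
fadd = tvm.build(s, [A, B, C], target="llvm", name="vector_add")
```

The learning-based cost model then searches over exactly these schedule knobs (split factors, vectorization, thread bindings) and predicts which variant will run fastest on the target, rather than relying on hand-tuned vendor libraries.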

Motivation:

Deep learning models can now recognize images and process natural language. Current frameworks such as TensorFlow and PyTorch rely on a computational graph intermediate representation to implement optimizations, e.g., auto-differentiation and dynamic memory management. The drawback of these frameworks is that they support only a small number of GPU devices and defer target-specific optimizations to highly engineered, vendor-specific operator libraries, which require significant manual tuning and cannot be ported across backends because of their opaqueness and specialization. There needs to be a system that provides both graph-level and low-level, hardware-specific operator optimizations for diverse hardware back-ends. TensorFlow and PyTorch lacked this, which gave rise to graph-compiler-based optimizers like TVM that support many hardware backends and require no manual tuning from the data scientist, who can instead focus on the algorithm.


TVM: An Automated End-to-End Optimizing Compiler for Deep Learning