Any deep learning model has two phases: training and inference, and both are equally important. Training is an iterative process: we iterate to find optimal hyper-parameters, to search for a better network architecture, to refresh the model on new data, and the list goes on. This iterative process is compute intensive and time consuming.

On the other hand, the deployed model should serve millions of requests with low latency. Moreover, in real-world scenarios it is usually a collection of models, not a single model, that acts on a user request to produce the desired response. For instance, in a voice assistant the speech recognition, natural language processing, and speech synthesis models run one after the other in sequence. It is therefore very important that the deep learning pipeline makes optimal use of all available compute resources, so that both phases are as efficient as possible.
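To make the multi-model idea concrete, here is a minimal sketch of such a sequential pipeline. The three stage functions (transcribe, understand_and_respond, synthesize) are hypothetical placeholders standing in for real models, not an actual API.

```python
# A minimal sketch of a sequential multi-model pipeline, as in a voice assistant.
# The three stage functions are hypothetical placeholders, not a real API; each
# would wrap an actual model (speech recognition, NLP, speech synthesis).

def transcribe(audio: bytes) -> str:
    """Speech recognition: raw audio in, transcript out (placeholder)."""
    ...

def understand_and_respond(text: str) -> str:
    """Natural language processing: transcript in, response text out (placeholder)."""
    ...

def synthesize(text: str) -> bytes:
    """Speech synthesis: response text in, audio out (placeholder)."""
    ...

def handle_request(audio: bytes) -> bytes:
    # The models run one after the other, so the end-to-end latency is the sum
    # of all three stages; every stage has to be served efficiently.
    text = transcribe(audio)
    reply = understand_and_respond(text)
    return synthesize(reply)
```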

Graphics Processing Units (GPUs) are the most efficient compute resources for parallel processing. They are massively parallel, with thousands of CUDA cores and hundreds of Tensor Cores. It is up to the user to make the best use of the available GPU resources to keep the pipeline efficient. This article discusses four tools from the NVIDIA toolkit that can integrate seamlessly into your deep learning pipeline and make it more efficient.

  1. Data Loading Library (DALI)
  2. Purpose-built pre-trained models
  3. TensorRT
  4. Triton Inference Server

