NVIDIA NeMo — Building Custom Speech Recognition Model

NVIDIA NeMo is a conversational AI toolkit. It acts as an accelerator that helps researchers and practitioners experiment with complex neural network architectures. Its main capabilities are speech processing (recognition and synthesis) and natural language processing. Because it comes from NVIDIA, it fully supports GPUs. The framework relies on PyTorch as its deep learning framework.

In this notebook, we will create an Automatic Speech Recognition (ASR) model, using the LibriSpeech dataset.

Setup

This experiment uses the following software:

Ubuntu 16.04
Anaconda 4.7.11
NeMo: https://github.com/NVIDIA/NeMo
Kaldi: https://github.com/kaldi-asr/kaldi

Follow the instructions in each project's README to install and run the code. Make sure that PyTorch is installed with GPU support.

Hardware Specification

At least 6 GB of GPU RAM is required.
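
Before going further, it can help to confirm that PyTorch actually sees a GPU with enough memory. The sketch below is one possible check, not part of the original notebook. The helper names `meets_memory_requirement` and `check_gpu` are hypothetical, and the 6 GB threshold is read as decimal gigabytes, which is an assumption. It relies only on the standard PyTorch calls `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`.

```python
import importlib.util

GB = 10 ** 9  # assumption: the 6 GB requirement is read as decimal gigabytes


def meets_memory_requirement(total_bytes, min_gb=6.0):
    """Return True if total_bytes is at least min_gb decimal gigabytes."""
    return total_bytes >= min_gb * GB


def check_gpu(min_gb=6.0):
    """Return (ok, message) describing whether GPU 0 is usable via PyTorch."""
    # Avoid a hard failure when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return False, "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        return False, "PyTorch is installed, but no CUDA device is visible"

    # Inspect the first CUDA device only (index 0).
    props = torch.cuda.get_device_properties(0)
    ok = meets_memory_requirement(props.total_memory, min_gb)
    message = f"{props.name}: {props.total_memory / GB:.1f} GB of GPU RAM"
    return ok, message


if __name__ == "__main__":
    ok, message = check_gpu()
    print("OK" if ok else "NOT READY", "-", message)
```

If the script reports NOT READY, reinstalling PyTorch with a CUDA-enabled build or switching to a larger GPU is the likely fix before starting training.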


towardsdatascience.com
