This piece gives you a glimpse of the fundamental concepts behind NVIDIA NeMo. It is an extremely powerful toolkit for building your own state-of-the-art models for conversational AI. A typical conversational AI pipeline consists of the following domains:

  1. Automatic Speech Recognition (ASR)
  2. Natural Language Processing (NLP)
  3. Text to Speech (TTS)
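
To make the data flow between these three domains concrete, here is a minimal sketch in plain Python. It does not use NeMo at all: the `Audio` class, `run_pipeline`, and the three `stub_*` functions are hypothetical placeholders that only show which stage consumes audio, which consumes text, and how the stages chain together.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Audio:
    """Hypothetical container for a mono waveform."""
    samples: List[float]
    sample_rate: int


def run_pipeline(
    audio: Audio,
    asr: Callable[[Audio], str],
    nlp: Callable[[str], str],
    tts: Callable[[str], Audio],
) -> Audio:
    """Chain the three stages: speech -> text -> reply text -> speech."""
    transcript = asr(audio)   # ASR: audio in, text out
    reply = nlp(transcript)   # NLP: text in, text out
    return tts(reply)         # TTS: text in, audio out


# Stub stages standing in for real models (assumptions for illustration only).
def stub_asr(audio: Audio) -> str:
    return "hello"


def stub_nlp(text: str) -> str:
    return f"you said: {text}"


def stub_tts(text: str) -> Audio:
    # Emit one silent sample per character, just to return something audio-shaped.
    return Audio(samples=[0.0] * len(text), sample_rate=22050)


if __name__ == "__main__":
    one_second = Audio(samples=[0.0] * 16000, sample_rate=16000)
    spoken_reply = run_pipeline(one_second, stub_asr, stub_nlp, stub_tts)
    print(len(spoken_reply.samples), spoken_reply.sample_rate)
```

In a real system each stub would be replaced by a trained model, but the shape of the pipeline stays the same.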

If you are looking for a full-fledged toolkit to train or fine-tune models for these domains, you might want to have a look at NeMo. It allows researchers and model developers to build their own neural network architectures using reusable components called Neural Modules (NeMo). According to the official documentation, neural modules are

“… conceptual blocks of neural networks that take typed inputs and produce typed outputs. Such modules typically represent data layers, encoders, decoders, language models, loss functions, or methods of combining activations.”
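
The idea of "typed inputs and typed outputs" can be illustrated with a small sketch in plain Python. This is not NeMo's actual API: `PortType`, `Module`, and `chain` are hypothetical names invented here, and the toy "encoder" and "decoder" are ordinary functions rather than neural networks. The point is only that declaring a type on each port lets a mismatch be caught when modules are wired together, before any data flows.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class PortType:
    """Hypothetical label describing what kind of data a port carries."""
    name: str


AUDIO = PortType("audio_signal")
ENCODED = PortType("encoded_features")
LOG_PROBS = PortType("log_probs")


@dataclass
class Module:
    """A block with one typed input and one typed output (illustrative only)."""
    name: str
    input_type: PortType
    output_type: PortType
    fn: Callable[[Any], Any]

    def __call__(self, value: Any) -> Any:
        return self.fn(value)


def chain(*modules: Module) -> Callable[[Any], Any]:
    """Connect modules in order, rejecting mismatched port types up front."""
    for upstream, downstream in zip(modules, modules[1:]):
        if upstream.output_type != downstream.input_type:
            raise TypeError(
                f"{upstream.name} outputs {upstream.output_type.name}, "
                f"but {downstream.name} expects {downstream.input_type.name}"
            )

    def run(value: Any) -> Any:
        for module in modules:
            value = module(value)
        return value

    return run


# Toy stand-ins for an encoder and a decoder.
encoder = Module("encoder", AUDIO, ENCODED, lambda xs: [x * 2 for x in xs])
decoder = Module("decoder", ENCODED, LOG_PROBS, lambda xs: sum(xs))

if __name__ == "__main__":
    model = chain(encoder, decoder)
    print(model([1, 2, 3]))  # prints 12
```

Wiring the same two modules in the wrong order, `chain(decoder, encoder)`, raises a `TypeError` at construction time, which is the kind of early check that typed ports make possible.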

One major advantage of NeMo is that it can be used to train new models or perform transfer learning on existing pre-trained models. On top of that, quite a number of pre-trained models are available at NVIDIA GPU Cloud (NGC). At the time of this writing, the GPU-accelerated cloud platform has the following pre-trained models:

ASR


Beginner’s Guide to NVIDIA NeMo