Towards an ImageNet Moment for Speech-to-Text: A Deep Dive

Abstract

Speech-to-text (STT), also known as automatic speech recognition (ASR), has a long history and has made remarkable progress over the past decade. Currently, it is often believed that only large corporations like Google, Facebook, or Baidu (or local state-backed monopolies for the Russian language) can provide deployable “in-the-wild” solutions. There are several reasons for this:

  1. The high compute requirements typical of published papers erect artificially high barriers to entry;
  2. Speech requires large amounts of data because of the diversity of vocabulary, speakers, and compression artifacts;
  3. A mentality where practical solutions are abandoned in favor of impractical, yet state-of-the-art (SOTA), solutions.

In this piece we describe our effort to alleviate these concerns, both globally and for the Russian language, by:

  1. Introducing the diverse 20,000-hour Open STT dataset, published under a CC BY-NC license;
  2. Demonstrating that it is possible to achieve competitive results using only two widely available consumer-grade GPUs;
  3. Offering a plethora of design patterns that lower the barrier to entry into the speech domain for a wide range of researchers and practitioners.
