Recently, researchers at DeepMind proposed EATS (End-to-end Adversarial Text-to-Speech), a generative text-to-speech model trained adversarially. EATS operates on either pure text or raw (i.e., temporally unaligned) phoneme input sequences and produces raw speech waveforms as output.

Research on text-to-speech systems has grown impressively over the past few years. Artificial speech synthesis, commonly known as text-to-speech (TTS), has applications in domains such as technology interfaces, accessibility, and entertainment.

DeepMind Introduces EATS – An End-to-End Adversarial Text-To-Speech
