What Google Translate does is nothing short of amazing. To translate between any pair of the dozens of languages it supports, Google Translate’s creators used some of the most advanced and recent developments in NLP in exceptionally creative ways.


In machine translation, there are generally two approaches: a rule-based approach and a machine learning-based approach. Rule-based translation involves collecting a massive dictionary of translations, perhaps word by word or phrase by phrase, which are then pieced together into a translation. This approach runs into several problems.

For one, grammar structures differ significantly between languages. Consider Spanish, in which nouns have a masculine or feminine gender. All adjectives, and words like ‘the’ or ‘a’, must agree with the gender of the noun they describe. Translating ‘the big red apples’ into Spanish requires each of the words ‘the’, ‘big’, and ‘red’ to be written in both plural and feminine form, since those are the attributes of the word ‘apples’. In addition, Spanish adjectives usually follow the noun, but sometimes they come before it.

The result is ‘las [the] grandes [big] manzanas [apples] rojas [red]’. This word order and the need to change every adjective make little sense to a monolingual English speaker. Within English-to-Spanish translation alone, there are too many differences in fundamental structure to keep track of. A truly global translator, however, must translate between every pair of languages.
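
To make the agreement and word-order problem concrete, here is a minimal sketch of word-by-word dictionary lookup. The dictionary `EN_TO_ES` and the function `translate_word_by_word` are hypothetical names invented for this illustration, not part of any real translation system, and the four entries are deliberately the default (singular, masculine) forms.

```python
# Hypothetical toy dictionary holding only default (singular, masculine) forms.
# A real rule-based system would need far more entries plus grammar rules.
EN_TO_ES = {
    "the": "el",
    "big": "grande",
    "red": "rojo",
    "apples": "manzanas",
}


def translate_word_by_word(sentence, dictionary):
    """Look up each word independently; unknown words are left unchanged."""
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


naive = translate_word_by_word("the big red apples", EN_TO_ES)
print(naive)  # el grande rojo manzanas
```

The lookup gets every individual word "right" and the phrase wrong: nothing is made plural or feminine, and ‘rojo’ stays in front of the noun. Producing ‘las grandes manzanas rojas’ would take extra rules for agreement and reordering, and a different set of such rules for each language pair.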

Translating between every pair of languages raises another problem: to translate between, say, French and Mandarin, the only feasible rule-based solution would be to translate French into a base language, probably English, which would then be translated into Mandarin. This is like playing telephone: the nuance of a phrase said in one language is trampled by noise and heavy-handed generalization.
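
The telephone effect can be sketched with a single word. The three lexicons below are hypothetical and heavily simplified (for instance, ‘vous’ is treated only as the formal singular ‘you’), and the function names are made up for this example. French and Mandarin both distinguish an informal and a formal ‘you’, while English does not, so the distinction cannot survive the trip through English.

```python
# Hypothetical toy lexicons, heavily simplified for illustration only.
FR_TO_EN = {"tu": "you", "vous": "you"}  # English merges informal and formal
EN_TO_ZH = {"you": "你"}                  # the pivot step has to pick one default
FR_TO_ZH = {"tu": "你", "vous": "您"}     # a direct mapping could keep formality


def translate_via_pivot(word):
    """French -> English -> Mandarin, as a pivot-based system would do."""
    return EN_TO_ZH[FR_TO_EN[word]]


def translate_direct(word):
    """French -> Mandarin without passing through English."""
    return FR_TO_ZH[word]


for word in ("tu", "vous"):
    print(word, "| via pivot:", translate_via_pivot(word),
          "| direct:", translate_direct(word))
```

Through the pivot, both ‘tu’ and ‘vous’ come out as the informal ‘你’, while the direct mapping keeps the formal ‘您’ for ‘vous’. The information is lost at the English step, and no rule applied afterwards can recover it.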

Breaking Down the Innovative Deep Learning Behind Google Translate