Using microsoft azure services we will create the following pipeline:
My desire to speak other languages came when I was 18 and went to live in Accra, Ghana. I was a missionary for the Church of Jesus Christ of Latter-day Saints, called to serve the people I would meet there. In Ghana over 50 languages are recognized and where I lived (I moved four times) there were 3 or 4 main languages spoken. I wanted to connect with and serve these super great people, so I did what I could to learn their languages.
Almost 6 years now since I got back, I’m studying machine translation and looking to make my first attempt at making MT a thing for languages that I spoke like Twi. This article isn’t those efforts, but a starting point for people interested in machine translation.
This is not an in-depth look, or from scratch, article either. This is showing how to use some very powerful tools in a basic way, just to get you started. Most of this can be found in following along with Microsoft’s own quickstarts for the azure speech service or pyaudio’s documenation.
Honestly, I’m very impressed with Microsoft azure. In a machine translation course that I’m taking we found that for several languages (maybe more, but we haven’t tested them all) it was outpacing Google translate in accuracy (mainly tested with bleu scores). Plus the TTS allows you to improve the voice, which is a ton of fun!
Let’s get started with code and a basic understanding of what is happening.
Automatic speech recognition takes some pretty serious work. If you want to work with a little more exposed example you can try using pyaudio. They are awesome for experimenting with. Let me show you an example of that and then the Microsoft one and you can decide which you want to proceed with.
Machine translation, this is a massive field of research all on its own, but that is a discussion for another time in your life. What is so cool is that the training and fine tuning of a very powerful NMT is accessible to us, and it can help us start our MT journey. So, for now, rather than building an NMT from scratch, we will focus on using azure’s speech MT. This requires a ‘translation’ resource which you set up the same way you set up your ASR. A reminder to wait while it finishes setting up so that you can find it later (I made that mistake and took a long time wondering where it went only to have to make it again).
Text-to-speech is a very fun part of this, especially if you continue to option four. You may be able to use your original speech api key, but you’ll need to have a specific region region set to use neural voices. You can find your best region at that link and set it when creating this speech resource.
Now to put it all together, a simple file helps us easily call each service. Some parts have been commented out that you can play with to see both how pyaudio works and the changes other options make.
#translation #nmt #azure #machine-translation