Google Translate currently handles speech in three stages: it transcribes your speech into text, translates that text into the target language, and then reads the result aloud using TTS (text-to-speech synthesis). Google’s new Translatotron instead relies on a direct speech-to-speech translation model. In other words, a single attentive sequence-to-sequence model maps the input speech straight to translated speech with no intermediate text step, which Google says offers “faster inference speed, naturally avoiding compounding errors between recognition and translation, making it straightforward to retain the voice of the original speaker after translation, and better handling of words that do not need to be translated.”
“Though Google is now in possession of a new translation model, it still isn’t ready to incorporate it into Google Translate and other related tools. The new system is falling behind on BLEU score, meaning the translations aren’t accurate enough yet. On the plus side, the new model retains the user’s natural voice even after translation as it doesn’t use TTS for output,” reports Digit.in.
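For readers unfamiliar with the metric mentioned above: BLEU rates a translation by how many of its word sequences (n-grams) also appear in a human reference translation, with a penalty for output that is too short. The snippet below is only a minimal sketch of that idea, assuming a single reference and no smoothing. The function name `simple_bleu` and the example sentences are made up for illustration, and this is not Google's evaluation setup or the official BLEU implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: one reference, no smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # "Clipped" matches: an n-gram is credited at most as many
        # times as it occurs in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))  # exact match
    print(simple_bleu("the cat sat on a mat", reference))    # one word off
```

An exact match scores 1.0, and the score drops as the output drifts from the reference. A lower BLEU therefore means less overlap with the reference translations, which is the sense in which the new model is said to be "falling behind."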