Meta's New SeamlessM4T Multilingual Model Boasts Automatic Speech Recognition for Nearly 100 Languages

You’ve seen AudioCraft. Now check out Meta’s new SeamlessM4T. This cutting-edge multilingual, multitask model can both translate and transcribe speech and text in nearly 100 languages. Its text encoder, based on the NLLB model, has been trained to understand text in nearly 100 languages and produce representations that are useful for translation.

On the target side, Meta represents speech as discrete acoustic units. The text-to-unit (T2U) component of the UnitY model generates these units from the text output, and it is pre-trained on ASR data prior to UnitY fine-tuning. A multilingual HiFi-GAN unit vocoder then converts the discrete units into audio waveforms. Try it out here.
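To make the data flow concrete, here is a toy sketch of the two-stage idea: text goes to discrete units, and units go to a waveform. It is not Meta's code or API. The function names, the character-to-unit mapping, and the sine-burst "vocoder" are all hypothetical stand-ins chosen only to show the shape of the pipeline; the real T2U and HiFi-GAN components are learned neural networks.

```python
import math
from typing import List

# Hypothetical stand-ins for illustration only; not SeamlessM4T's real components.

def text_to_units(text: str, num_units: int = 100) -> List[int]:
    """Toy T2U: map each character to a discrete unit ID in [0, num_units)."""
    return [ord(ch) % num_units for ch in text]

def unit_vocoder(units: List[int], samples_per_unit: int = 4,
                 sample_rate: int = 16000) -> List[float]:
    """Toy vocoder: render each discrete unit as a short sine burst."""
    waveform = []
    for unit in units:
        freq = 100.0 + 10.0 * unit  # arbitrary unit-to-pitch mapping (assumption)
        for n in range(samples_per_unit):
            waveform.append(math.sin(2 * math.pi * freq * n / sample_rate))
    return waveform

units = text_to_units("hola")   # stage 1: translated text -> discrete units
audio = unit_vocoder(units)     # stage 2: discrete units -> audio samples
print(units, len(audio))
```

The point of the intermediate unit sequence is that the text-to-unit stage and the vocoder can be trained separately, which matches the pre-training and fine-tuning split described above.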

“We believe the work we’re announcing today is a significant step forward in this journey. Our single model provides on-demand translations that enable people who speak different languages to communicate more effectively. We significantly improve performance for the low and mid-resource languages we support,” said Meta.