MIT's AI-Powered Speech2Face Technology Can Use Your Voice to Predict What You Look Like

H/t: PetaPixel
MIT’s Speech2Face technology can reconstruct a facial image of a person from just a short audio recording of them speaking. It relies on a deep neural network trained on millions of natural videos of people speaking, collected from the internet. During training, the model learned audiovisual voice-face correlations that allow Speech2Face to produce images capturing physical attributes of the speakers such as age, gender and ethnicity.

Researchers did not have to label the data or supervise Speech2Face during training: the model was trained in a self-supervised manner, exploiting the natural co-occurrence of faces and speech in videos without modeling attributes explicitly. The reconstructions were obtained directly from audio, which reveals the correlations between faces and voices. This allowed the researchers to evaluate and numerically quantify how closely Speech2Face reconstructions from audio resemble the true face images of the speakers.
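
To make the self-supervised idea concrete, here is a minimal Python sketch. It is an illustration only, not the researchers' code: the feature vectors are made-up toy values, the helper names `make_training_pairs` and `cosine_similarity` are hypothetical, and cosine similarity is assumed here as one plausible way to score resemblance numerically. The point it shows is that the "label" for each voice clip is simply the face that appears in the same video, so nobody has to annotate anything by hand.

```python
import math

def make_training_pairs(videos):
    """Pair each clip's voice features with the face features from the same clip.

    The pairing itself is the supervision signal: no human-written labels needed.
    (Hypothetical helper; the feature vectors are toy stand-ins.)
    """
    return [(v["voice"], v["face"]) for v in videos]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "videos": each one naturally contains both a voice and a face.
videos = [
    {"voice": [0.2, 0.9], "face": [1.0, 0.0, 0.5]},
    {"voice": [0.8, 0.1], "face": [0.0, 1.0, 0.5]},
]
pairs = make_training_pairs(videos)

# Assumed evaluation: compare a face predicted from audio with the true face.
true_face = [1.0, 0.0, 0.5]
predicted_face = [0.9, 0.1, 0.4]  # made-up model output for illustration
score = cosine_similarity(predicted_face, true_face)
print(f"resemblance score: {score:.3f}")
```

A score close to 1 would mean the face predicted from the voice is close to the real one in this toy feature space; a real system would compare much richer face representations.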

“Our model is designed to reveal statistical correlations that exist between facial features and voices of speakers in the training data. The training data we use is a collection of educational videos from YouTube, and does not represent equally the entire world population. Therefore, the model—as is the case with any machine learning model—is affected by this uneven distribution of data,” said the researchers.