Facebook AI engineers Sean Vasquez and Mike Lewis have found a way to take robotic-sounding text-to-speech systems to the next level, producing lifelike audio clips generated entirely by machine. Called MelNet, this AI-powered system reproduces human intonation and can do so in the voice of a real person, such as Bill Gates. Think of it as deepfakes, but for audio.
Vasquez and Lewis train their deep-learning network on spectrograms rather than audio waveforms. Why? A spectrogram records the entire spectrum of audio frequencies and how it changes over time. A waveform, by comparison, captures the change over time of a single parameter, amplitude, while a spectrogram captures the change across a huge range of different frequencies.
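To make the waveform-versus-spectrogram distinction concrete, here is a minimal sketch in Python using only NumPy. It is not MelNet's actual preprocessing: the `spectrogram` helper, the frame size, the hop length, and the synthetic test tone are all assumptions chosen for illustration. It builds a one-second tone that jumps from 440 Hz to 880 Hz, then shows that the spectrogram exposes that frequency change directly, whereas the raw waveform is just a list of amplitudes.

```python
import numpy as np

def spectrogram(waveform, frame_size=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Hypothetical helper for illustration; frame_size and hop are
    arbitrary choices, not values taken from MelNet.
    """
    window = np.hanning(frame_size)
    n_frames = 1 + (len(waveform) - frame_size) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        waveform[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # FFT of each frame; transpose so rows are frequencies, columns are time.
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 8000                          # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of timestamps

# Waveform: one amplitude value per sample. The tone switches pitch halfway.
waveform = np.where(t < 0.5,
                    np.sin(2 * np.pi * 440 * t),
                    np.sin(2 * np.pi * 880 * t))

spec = spectrogram(waveform)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)

# Strongest frequency in the first and last time frames.
peak_start = freqs[spec[:, 0].argmax()]
peak_end = freqs[spec[:, -1].argmax()]

print("waveform shape:", waveform.shape)    # 1-D: amplitude over time
print("spectrogram shape:", spec.shape)     # 2-D: frequency x time
print("dominant frequency at start:", peak_start, "Hz")
print("dominant frequency at end:", peak_end, "Hz")
```

The waveform is a one-dimensional array, while the spectrogram is a two-dimensional grid of frequency against time, which is the representation the article describes the network learning from. The reported peaks land on the nearest frequency bin, so they approximate 440 Hz and 880 Hz rather than matching them exactly.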
“A cramp is no small danger on a swim.”
“He said the same phrase thirty times.”
“Pluck the bright rose without leaves.”
“Two plus seven is less than ten.”
“Having trained the system using ordinary speech from TED talks, MelNet is then able to reproduce the TED speaker’s voice saying more or less anything over a few seconds. The Facebook researchers demonstrate its flexibility using Bill Gates’s TED talk to train MelNet and then use his voice to say a range of random phrases,” reports Technology Review.