NTT's SpeakerBeam Uses AI to Selectively Tune Out Background Noise and Target a Single Speaker

Unlike the Whistle Speaker, NTT’s SpeakerBeam uses deep learning algorithms to tune out background noise and single out an individual speaker using just a snippet of their voice. That’s right, it needs only a recording of around 10 seconds of the target speaker’s voice, regardless of their location in the room.

Its dual neural network architecture accepts the mixed speech input and processes it to output only the target speaker’s voice. The most important part of this architecture is an adaptive layer, which adjusts its parameters based on the target speaker’s voice traits provided by the auxiliary network. The auxiliary network simultaneously processes the adaptation utterance (the short sample of the target speaker’s voice) to determine the distinct characteristics of that voice.
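To make the two-network idea more concrete, here is a minimal sketch in plain Python. It is not NTT’s implementation: the function names, layer sizes, random untrained weights, and the choice to have the adaptive layer simply scale each hidden unit by a speaker trait are all assumptions made for illustration. It only shows how an auxiliary network’s summary of the adaptation utterance could steer the main network’s processing of the mixed input.

```python
import math
import random

def linear(x, w):
    """Multiply vector x (length n) by weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def auxiliary_network(adaptation_frames, w_aux):
    """Summarise the adaptation utterance as one vector of speaker traits."""
    per_frame = [[math.tanh(v) for v in linear(f, w_aux)] for f in adaptation_frames]
    n = len(per_frame)
    # Average over time so the summary does not depend on utterance length.
    return [sum(col) / n for col in zip(*per_frame)]

def main_network(mixture_frames, speaker_traits, w_in, w_out):
    """Estimate a per-frame mask in (0, 1) and apply it to the mixture."""
    output = []
    for frame in mixture_frames:
        hidden = [max(0.0, v) for v in linear(frame, w_in)]
        # Adaptive layer (assumed form): scale each hidden unit by a speaker trait.
        adapted = [h * s for h, s in zip(hidden, speaker_traits)]
        mask = [sigmoid(v) for v in linear(adapted, w_out)]
        output.append([m * x for m, x in zip(mask, frame)])
    return output

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
n_features, n_hidden = 8, 16
w_aux = random_matrix(n_features, n_hidden, rng)
w_in = random_matrix(n_features, n_hidden, rng)
w_out = random_matrix(n_hidden, n_features, rng)

# Stand-ins for spectral frames of the adaptation utterance and the mixture.
adaptation = [[rng.random() for _ in range(n_features)] for _ in range(50)]
mixture = [[rng.random() for _ in range(n_features)] for _ in range(20)]

traits = auxiliary_network(adaptation, w_aux)
extracted = main_network(mixture, traits, w_in, w_out)
```

Because the weights here are random, the output is not a cleaned-up voice. In a real system both networks would be trained together so that the mask keeps the target speaker and suppresses everything else.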

“SpeakerBeam has huge potential. Its ability to focus on a target speaker irrespective of their position or the number of background noises makes it a promising tool for multi-party conversation recognition, smart speakers, voice recorders, and even hearing aids. Nevertheless, work still remains before it can be perfected,” said Daniel O’Connor, Public Relations Manager of NTT Europe.