Don’t you hate it when the picture breaks up during video calls? The problem is often blamed on the heavy bandwidth demands of video conferencing apps, but whether you are on a powerful desktop or a low-end phone or tablet, it ultimately comes down to how fast your internet connection is. “One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing,” by NVIDIA researchers, aims to solve this issue with AI-based video compression technology. Read on for a video and additional information.
This method would dramatically reduce bandwidth requirements by sending only a keypoint representation of the face and reconstructing the source video on the receiver side with the help of generative adversarial networks (GANs), or, in other words, by generating fake talking heads. The proposed system extracts appearance features and 3D canonical keypoints from the source image, which are then used to compute the source keypoints and to generate keypoints for the synthesized video. It can also reproduce accessories present in the source image, including eyeglasses, hats, and even scarves.
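To make the bandwidth saving concrete, here is a minimal sketch of the sender side of such a scheme. The frame size, keypoint count, and the stand-in “detector” are illustrative assumptions, not values or code from the paper; a real system would run a learned keypoint extractor and a GAN-based decoder on the receiver.

```python
import numpy as np

FRAME_SHAPE = (512, 512, 3)   # one raw RGB video frame (assumed resolution)
NUM_KEYPOINTS = 20            # assumed number of 3D keypoints per frame

def sender_payload_full(frame: np.ndarray) -> bytes:
    """Naive baseline: transmit the entire raw frame."""
    return frame.tobytes()

def sender_payload_keypoints(frame: np.ndarray) -> bytes:
    """Keypoint-based scheme: transmit only (x, y, z) coordinates.
    The 'detector' here is a stand-in returning random points;
    a real system would use a learned keypoint extractor."""
    keypoints = np.random.rand(NUM_KEYPOINTS, 3).astype(np.float32)
    return keypoints.tobytes()

frame = np.zeros(FRAME_SHAPE, dtype=np.uint8)
full_bytes = len(sender_payload_full(frame))       # 512*512*3 = 786432 bytes
kp_bytes = len(sender_payload_keypoints(frame))    # 20*3*4 = 240 bytes
print(f"full frame: {full_bytes} B, keypoints: {kp_bytes} B, "
      f"ratio: {full_bytes / kp_bytes:.0f}x")
```

Even before any entropy coding, shipping a handful of float coordinates instead of pixels cuts the per-frame payload by three to four orders of magnitude; the heavy lifting moves to the receiver, which regenerates the frame from the keypoints and a single reference image.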
“Our motion is encoded based on a novel keypoint representation, where the identity-specific and motion-related information is decomposed unsupervisedly. Extensive experimental validation shows that our model outperforms competing methods on benchmark datasets. Moreover, our compact keypoint representation enables a video conferencing system that achieves the same visual quality as the commercial H.264 standard while only using one-tenth of the bandwidth,” according to the research paper.