Google VideoPoet Multimodal LLM Zero-Shot Video Generation
Google VideoPoet is the company’s latest multimodal large language model (LLM) capable of accepting several inputs – text, images, videos and audio – for zero-shot video generation. Simply put, this LLM combines multiple video generation capabilities into a unified language model.



Technically speaking, VideoPoet is an autoregressive model, meaning it generates output one piece at a time, with each new piece conditioned on what it has already produced. The LLM was trained on video, audio, image, and text data, using tokenizers to convert each modality into tokens the model can process. Tokenization is the process of breaking input into smaller units called tokens. It is best known from Natural Language Processing, where it enables AI to understand and analyze human language.
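To make those two ideas concrete, here is a minimal Python sketch of tokenization followed by an autoregressive loop. It is a toy illustration, not Google's implementation: the helper names (`build_vocab`, `tokenize`, `generate`, `toy_next_token`) are hypothetical, the tokenizer only handles whitespace-separated text, and the "model" is a simple arithmetic stand-in for a trained network.

```python
from typing import Callable, Dict, List


def build_vocab(text: str) -> Dict[str, int]:
    """Assign an integer id to each distinct whitespace-separated word."""
    vocab: Dict[str, int] = {}
    for word in text.lower().split():
        vocab.setdefault(word, len(vocab))
    return vocab


def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Convert text into a list of token ids using the given vocabulary."""
    return [vocab[word] for word in text.lower().split()]


def toy_next_token(tokens: List[int], vocab_size: int = 5) -> int:
    """Hypothetical stand-in for a trained model.

    A real model would predict the next token from learned weights;
    here the prediction is just a deterministic function of the context.
    """
    return (sum(tokens) + len(tokens)) % vocab_size


def generate(
    prompt: List[int],
    next_token: Callable[[List[int]], int],
    steps: int,
) -> List[int]:
    """Autoregressive loop: each new token depends on all previous tokens."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens


vocab = build_vocab("a dog runs a dog")
prompt_ids = tokenize("a dog runs", vocab)
output_ids = generate(prompt_ids, toy_next_token, steps=3)
print(prompt_ids)   # [0, 1, 2]
print(output_ids)   # [0, 1, 2, 1, 3, 2]
```

The point to notice is that `generate` feeds its own output back in as context on every step, which is what "autoregressive" means. In a multimodal system, the token ids would come from tokenizers for video, image, and audio as well as text, rather than from a word list.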

“This simple recipe shows that language models can synthesize and edit videos with a high degree of temporal consistency. VideoPoet demonstrates state-of-the-art video generation, in particular in producing a wide range of large, interesting, and high-fidelity motions,” said Google.

[Source]
