ChatGPT-Bing only recently launched in beta, and now an experimental Microsoft Kosmos-1 AI model is in the works. This Multimodal Large Language Model (MLLM) builds upon ChatGPT and adds the ability to interpret images, as well as solve IQ tests with 22%–26% accuracy. In other words, it can perceive general modalities, learn in context, and follow instructions.
Microsoft researchers trained Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and plain text data. In its current form, researchers must first translate an image into tokens (text) before Kosmos-1 can understand it. Trained on data crawled from the web, the model paves the way for future AI systems that can perceive any type of media.
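To make that interleaved training format more concrete, here is a minimal Python sketch of how mixed text and image segments might be flattened into a single token stream. The `<image>`/`</image>` boundary tokens and the helper names are illustrative assumptions, not the actual Kosmos-1 interface.

```python
# A minimal sketch of interleaved image-text tokenization for a multimodal
# LLM. The special tokens and helpers below are assumptions for illustration,
# not the real Kosmos-1 API.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class ImagePatch:
    """Placeholder for an image rendered as a sequence of patch embeddings."""
    embeddings: List[List[float]]  # one vector per image patch (assumed)

def tokenize(text: str) -> List[str]:
    # Stand-in for a real subword tokenizer (e.g., SentencePiece).
    return text.split()

def build_interleaved_sequence(segments: List[Union[str, ImagePatch]]) -> List[str]:
    """Flatten mixed text/image segments into one token stream.

    Images are wrapped in boundary tokens so the model can tell where
    visual content starts and ends inside the surrounding text.
    """
    tokens: List[str] = []
    for seg in segments:
        if isinstance(seg, str):
            tokens.extend(tokenize(seg))
        else:
            tokens.append("<image>")
            # Each patch embedding occupies one "token slot" in the sequence.
            tokens.extend(f"<patch_{i}>" for i in range(len(seg.embeddings)))
            tokens.append("</image>")
    return tokens

# Example: a caption interleaved with an image, as in web-crawled training data.
doc = build_interleaved_sequence([
    "A photo of the Space Needle:",
    ImagePatch(embeddings=[[0.0] * 4] * 3),  # dummy 3-patch image
    "taken in Seattle.",
])
print(doc)
```

The design point the sketch illustrates is that once images are mapped into token slots, the same next-token objective used for text can be applied to the whole mixed sequence.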
“We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs,” said the researchers.