ChatGPT-Bing only recently launched in beta, and now an experimental Microsoft Kosmos-1 AI model is in the works. This Multimodal Large Language Model (MLLM) builds upon ChatGPT and adds the ability to interpret images, as well as solve IQ tests with 22%–26% accuracy. In other words, it can perceive general modalities, learn in context, and follow instructions.
Microsoft researchers trained Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and plain text data. In its current form, researchers must first translate an image into tokens (text) before Kosmos-1 can understand it. Trained on data crawled from the web, the model paves the way for future AI systems that can perceive any type of media.
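To make that interleaved training format more concrete, here is a minimal Python sketch of how mixed text and image segments might be flattened into a single token stream. The `<image>`/`</image>` boundary tokens and the helper names are illustrative assumptions, not the actual Kosmos-1 interface.

```python
# A minimal sketch of interleaved image-text tokenization for a multimodal
# LLM. The special tokens and helpers below are assumptions for illustration,
# not the real Kosmos-1 API.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class ImagePatch:
    """Placeholder for an image rendered as a sequence of patch embeddings."""
    embeddings: List[List[float]]  # one vector per image patch (assumed)

def tokenize(text: str) -> List[str]:
    # Stand-in for a real subword tokenizer (e.g., SentencePiece).
    return text.split()

def build_interleaved_sequence(segments: List[Union[str, ImagePatch]]) -> List[str]:
    """Flatten mixed text/image segments into one token stream.

    Images are wrapped in boundary tokens so the model can tell where
    visual content starts and ends inside the surrounding text.
    """
    tokens: List[str] = []
    for seg in segments:
        if isinstance(seg, str):
            tokens.extend(tokenize(seg))
        else:
            tokens.append("<image>")
            # Each patch embedding occupies one "token slot" in the sequence.
            tokens.extend(f"<patch_{i}>" for i in range(len(seg.embeddings)))
            tokens.append("</image>")
    return tokens

# Example: a caption interleaved with an image, as in web-crawled training data.
doc = build_interleaved_sequence([
    "A photo of the Space Needle:",
    ImagePatch(embeddings=[[0.0] * 4] * 3),  # dummy 3-patch image
    "taken in Seattle.",
])
print(doc)
```

The design point the sketch illustrates is that once images are mapped into token slots, the same next-token objective used for text can be applied to the whole mixed sequence.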
“We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs,” said the researchers.