
OpenAI has introduced GPT-4o, the company’s latest flagship model, capable of reasoning across audio, vision, and text in real time. The model accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image outputs.
Unlike its predecessors, GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It achieves this low latency because OpenAI trained a single new model end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network.

“It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models,” said OpenAI.
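
For developers, the practical upshot is that this multimodal capability sits behind a single model id in the API. Below is a minimal, illustrative sketch (not taken from OpenAI's announcement) of sending mixed text-and-image input to GPT-4o through the Chat Completions endpoint of the official openai Python SDK; the image URL is a placeholder, and audio input/output is omitted here.

```python
# Illustrative sketch: one multimodal request to GPT-4o via the Chat Completions API.
# Assumes the openai Python SDK (v1.x) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # Content is a list of parts, mixing text and an image in one turn.
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {
                    "type": "image_url",
                    # Placeholder URL for illustration only.
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)

# The model replies with text for this request.
print(response.choices[0].message.content)
```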
