OpenAI on Monday announced GPT-4o, a brand new AI model that the company says is one step closer to “much more natural human-computer interaction.” The new model accepts any combination of text, audio and images as input and can generate output in all three formats. It’s also capable of recognizing emotion, lets you interrupt it mid-speech, and responds nearly as fast as a human during conversations.

“The special thing about GPT-4o is it brings GPT-4 level intelligence to everyone, including our free users,” said OpenAI CTO Mira Murati during a live-streamed presentation. “This is the first time we’re making a huge step forward when it comes to ease of use.”

During the presentation, OpenAI showed off GPT-4o translating live between English and Italian, helping a researcher solve a linear equation in real time on paper, and providing guidance on deep breathing to another OpenAI executive simply by listening to his breaths.

The “o” in GPT-4o stands for “omni,” a reference to the model’s multimodal capabilities. OpenAI said that GPT-4o was trained across text, vision and audio, which means all inputs and outputs are processed by the same neural network. That’s a departure from the company’s previous models, GPT-3.5 and GPT-4, which let users ask questions by speaking but transcribed the speech into text before processing it, stripping out tone and emotion and making interactions slower.
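For a rough sense of what that single-model approach looks like in practice, here is a minimal sketch of a combined text-and-image request to GPT-4o through OpenAI’s Python SDK. The prompt and image URL are hypothetical placeholders, not part of OpenAI’s demo.

```python
# Minimal sketch of a multimodal GPT-4o request via OpenAI's Python SDK.
# The prompt text and image URL below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts travel in the same message,
                # so the model reasons over both together rather than
                # running a separate transcription or captioning step.
                {"type": "text", "text": "What equation is written on this page?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/notebook-page.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```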

OpenAI is making the new model available to everyone, including free ChatGPT users, over the next few weeks. It’s also releasing a desktop version of ChatGPT, initially for the Mac, which paid users will be able to access starting today.

OpenAI’s announcement comes a day before Google I/O, the company’s annual developer conference. Shortly after OpenAI revealed GPT-4o, Google teased a version of Gemini, its own AI chatbot, with similar capabilities.

Source: www.engadget.com