With the introduction of GPT-4o, OpenAI has strengthened its lead in the artificial intelligence world and become one of the field's biggest competitors.
OpenAI announced GPT-4o, a new flagship model that can reason across audio, vision, and text in real time. The multimodal model is the company's fastest yet.
Microsoft, in turn, announced the launch of GPT-4o on its Azure AI platform, describing it as setting a new standard for generative and conversational AI experiences and providing a richer, more engaging user experience.
GPT-4o is a step toward a more natural human-computer experience and interaction, and the new model is expected to attract more users to OpenAI's platform. It is an updated version of the large language model technology that powers ChatGPT. OpenAI's Chief Executive, Sam Altman, tweeted that the company had been working hard on some new things it believes people will love.
What is GPT-4o?
The “o” in GPT-4o stands for “omni,” which means “all” or “everything”: the company claims that its new model has something for everyone.
In pursuit of more natural human-computer interaction, the model accepts any combination of text, audio, image, and video as input and generates any combination of text, audio, and image as output. It is OpenAI's fastest model yet.
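To make that multimodal interface concrete, here is a minimal sketch of a mixed text-and-image request to GPT-4o through OpenAI's Python SDK. The image URL is a placeholder, and note that at launch the public API exposed text and image inputs with text output, so audio and video are omitted here.

```python
# A minimal sketch of a mixed text-and-image request to GPT-4o via
# OpenAI's Python SDK; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)  # the model's text answer
```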
GPT-4o is also better at vision and audio understanding than existing models.
How fast is GPT-4o?
GPT-4o can respond to audio inputs in as little as 232 milliseconds, and in 320 milliseconds on average, which is similar to human response time in a conversation. It also handles text in non-English languages significantly better and is 50% cheaper in the API. It matches GPT-4 Turbo performance while being much faster than previous models.
Tokens are the basic units a language model uses to measure the length of a text input; a token can cover whole words as well as punctuation marks, characters, and spaces. GPT-4o's new tokenizer needs fewer tokens in many languages, and the savings vary from one language to another: for example, Arabic drops from 53 tokens to 26, Gujarati from 145 to 33, Chinese from 34 to 24, Spanish from 29 to 26, and English from 27 to 24.
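To see the tokenizer difference in practice, here is a short sketch using OpenAI's open-source tiktoken library, which ships both GPT-4's cl100k_base encoding and GPT-4o's newer o200k_base encoding. The sample sentences are made up for illustration, so the counts will not match the per-language figures quoted above.

```python
# A short sketch comparing token counts under GPT-4's cl100k_base
# tokenizer and GPT-4o's new o200k_base tokenizer, using OpenAI's
# open-source tiktoken library. The sample sentences are illustrative.
import tiktoken

gpt4_enc = tiktoken.get_encoding("cl100k_base")   # used by GPT-4 / GPT-4 Turbo
gpt4o_enc = tiktoken.get_encoding("o200k_base")   # used by GPT-4o

samples = {
    "English": "Hello, my name is GPT-4o. I am a new kind of language model.",
    "Spanish": "Hola, me llamo GPT-4o. Soy un nuevo tipo de modelo de lenguaje.",
}

for language, text in samples.items():
    before = len(gpt4_enc.encode(text))
    after = len(gpt4o_enc.encode(text))
    print(f"{language}: {before} tokens -> {after} tokens")
```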
Usability research has found that a response within 100 milliseconds is perceived as instantaneous, while a response within about a second still feels fast enough for users to sense they are interacting. A response time of 10 seconds, on the other hand, makes users lose interest and attention.
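As a rough illustration of those thresholds, the sketch below times a single round trip to the model and buckets the latency accordingly; the prompt and bucket labels are illustrative choices, not part of the cited research.

```python
# A rough sketch that times one round trip to the model and buckets the
# latency against the perception thresholds above; prompt and labels
# are illustrative choices, not part of the cited research.
import time
from openai import OpenAI

client = OpenAI()

start = time.perf_counter()
client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello."}],
)
elapsed = time.perf_counter() - start

if elapsed < 0.1:
    feel = "perceived as instantaneous"
elif elapsed < 1.0:
    feel = "fast enough to feel interactive"
elif elapsed < 10.0:
    feel = "noticeably slow"
else:
    feel = "slow enough to lose the user's attention"
print(f"Round trip took {elapsed:.2f}s: {feel}")
```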
How does GPT-4o work?
In OpenAI's previous models, a Voice Mode was used to talk to ChatGPT, with average latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode achieves this through a pipeline of three separate models: a simple model first transcribes the audio into text, GPT-3.5 or GPT-4 then takes text in and puts text out, and a third simple model converts that text back into audio. Each stage simply converts one representation into the next.
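For intuition, here is a minimal sketch of what such a three-stage pipeline looks like when built from OpenAI's public speech-to-text, chat, and text-to-speech endpoints. This illustrates the architecture described above, not OpenAI's internal implementation, and the model and voice names are just public API stand-ins.

```python
# A minimal sketch of the three-model Voice Mode pipeline described
# above, built from OpenAI's public API endpoints as stand-ins. This
# illustrates the architecture, not OpenAI's internal implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def voice_mode_pipeline(audio_path: str) -> None:
    # Stage 1: a simple model transcribes the audio into text.
    with open(audio_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )

    # Stage 2: a text-only GPT model takes text in and puts text out;
    # tone, speaker identity, and background noise are already lost here.
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # Stage 3: another simple model converts the reply text back to audio.
    speech = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=reply.choices[0].message.content,
    )
    speech.stream_to_file("reply.mp3")  # write the spoken reply to disk
```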
In that pipeline, GPT-4 is the main source of intelligence, but it cannot directly observe tone, multiple speakers, or background noise, so a lot of information is lost. Nor can it produce laughter, singing, or any other expression of emotion as output.
GPT-4o corrects this: OpenAI has merged all of these functions into one single model so the experience feels more natural to users. GPT-4o works end-to-end across text, vision, and audio, reducing both the time consumed and the amount of information lost along the way.
GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, and it sets new high watermarks on multilingual, audio, and vision capabilities as measured on traditional benchmarks.
Safety and limitations of GPT-4o
GPT-4o has safety built in by design across all modalities, through techniques such as filtering training data and refining the model's behaviour after training. A new safety system was also created to guardrail the outputs. Evaluations show no more than medium risk in any category. These assessments involved automated and human evaluations throughout training, testing both pre-mitigation and post-mitigation versions of the model.
GPT-4o has also undergone extensive testing with experts in social psychology, bias, and misinformation to identify new or increased risks. The lessons learned have been used to improve the safety of interacting with GPT-4o, and new risks will be mitigated as they are discovered.
How much does GPT-4o cost?
In August, OpenAI launched its ChatGPT Enterprise monthly plan, with pricing that varies according to each customer's requirements. In January it launched its online GPT Store, which gives users access to 3 million custom versions of GPT.
On the brighter side, the new GPT-4o model is free for all users, and paid users can enjoy up to five times the capacity limits of their free peers, said OpenAI's Chief Technology Officer, Mira Murati.