ChatGPT’s latest model may be a regression in performance

According to a new report from Artificial Analysis, OpenAI’s flagship large language model for ChatGPT, GPT-4o, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o-mini model.

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. “The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability,” OpenAI wrote on X. “It’s also better at working with uploaded files, providing deeper insights & more thorough responses.” Those claims are now being called into question.

“We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o,” Artificial Analysis announced via an X post on Thursday, noting that the model’s Artificial Analysis Quality Index decreased from 77 to 71 (and is now equal to that of GPT-4o mini).

What’s more, GPT-4o’s score on the GPQA Diamond benchmark fell from 51% to 39%, while its score on the MATH benchmark fell from 78% to 69%.

At the same time, the researchers found that the model’s output speed had more than doubled, from around 80 output tokens per second to roughly 180 tokens/s. “We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference,” the researchers wrote.
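To put the reported figures side by side, here is a minimal Python sketch that computes the absolute and relative change for each metric. The numbers are the ones quoted from Artificial Analysis; the speed figures are the approximate values given ("around 80" and "roughly 180"), and the names `REPORTED`, `change`, and `summarize` are illustrative only.

```python
# Figures quoted from Artificial Analysis: (August release, November release).
REPORTED = {
    "Quality Index": (77, 71),
    "GPQA Diamond (%)": (51, 39),
    "MATH (%)": (78, 69),
    "Output speed (tokens/s)": (80, 180),  # approximate values from the report
}


def change(before, after):
    """Return (absolute change, relative change in percent of the old value)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


def summarize(metrics):
    """Map each metric name to its (absolute, relative) change."""
    return {name: change(before, after) for name, (before, after) in metrics.items()}


if __name__ == "__main__":
    for name, (absolute, relative) in summarize(REPORTED).items():
        print(f"{name}: {absolute:+} ({relative:+.1f}%)")
```

Note that the benchmark drops are percentage-point differences (for example, 12 points on GPQA Diamond), so the relative decline is larger than the raw gap suggests.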

Wait – is the new GPT-4o a smaller and less intelligent model?

— Artificial Analysis (@ArtificialAnlys) November 21, 2024

“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”
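For developers wondering what "careful testing" might look like in practice, one option is to run the same test cases against both versions before switching. The following is only a sketch under stated assumptions: `ModelFn` is a hypothetical stand-in for whatever function calls a given model version and returns its answer, the exact-match scoring is deliberately simplistic, and the 2-point tolerance is an arbitrary example value, not a recommendation from Artificial Analysis.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical: a function that sends a prompt to one model version
# and returns its answer as text.
ModelFn = Callable[[str], str]


def accuracy(model: ModelFn, cases: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (prompt, expected) cases the model answers correctly."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(
        1
        for prompt, expected in cases
        if model(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


def safe_to_switch(
    current: ModelFn,
    candidate: ModelFn,
    cases: Sequence[Tuple[str, str]],
    max_drop: float = 0.02,  # assumed tolerance, chosen for illustration
) -> bool:
    """True if the candidate's accuracy is within max_drop of the current model's."""
    return accuracy(candidate, cases) >= accuracy(current, cases) - max_drop
```

In use, `current` and `candidate` would wrap calls to the August and November versions respectively, and `cases` would come from the workload being migrated.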

GPT-4o was first released in May 2024 to surpass the existing GPT-3.5 and GPT-4 models. GPT-4o offers state-of-the-art benchmark results in voice, multilingual, and vision tasks, according to OpenAI, making it ideal for advanced applications like real-time translation and conversational AI.
