ChatGPT now interprets photos better than an art critic and an investigator combined

ChatGPT’s recent image generation capabilities have challenged our previous understanding of AI-generated media. The recently announced GPT-4o model shows notable skill at interpreting images accurately and recreating them in viral styles, such as the one inspired by Studio Ghibli. It even handles text in AI-generated images, something AI has long struggled with. Now OpenAI is launching two new models that can dissect images for cues and gather information that might even escape a human glance.

OpenAI announced two new models earlier this week that take ChatGPT’s reasoning abilities up a notch. Its new o3 model, which OpenAI calls its “most powerful reasoning model,” improves on existing interpretation and perception abilities, getting better at “coding, math, science, visual perception, and more,” the company claims. Meanwhile, o4-mini is a smaller, faster model built for “cost-efficient reasoning” in the same areas. The news follows OpenAI’s recent launch of the GPT-4.1 family of models, which offers faster processing and longer context windows.

ChatGPT is now “thinking with images”

Thanks to improved reasoning, both models can now incorporate images directly into their chain of thought, which OpenAI describes as “thinking with images.” Going beyond basic image analysis, o3 and o4-mini can inspect images more closely and even manipulate them, for example by cropping, zooming, flipping, or enhancing details, to extract visual cues that could help ChatGPT arrive at better solutions.

Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation. pic.twitter.com/rDaqV0x0wE

— OpenAI (@OpenAI) April 16, 2025

According to OpenAI, the models blend visual and textual reasoning, can be combined with other ChatGPT features such as web search, data analysis, and code generation, and are expected to become the basis for more advanced AI agents capable of multimodal analysis.

In practice, you can upload pictures of a wide range of items, from flow charts and scribbled handwritten notes to images of real-world objects, and expect ChatGPT to understand them more deeply and produce better output, even without a descriptive text prompt. With this, OpenAI is inching closer to Google’s Gemini, which offers the impressive ability to interpret the real world through live video.

Despite the bold claims, OpenAI is limiting access to paid members, presumably to keep its GPUs from “melting” again as it struggles to keep up with compute demand for new reasoning features. For now, the o3, o4-mini, and o4-mini-high models are exclusively available to ChatGPT Plus, Pro, and Team members, while Enterprise and Education users will get access in a week. Free users, meanwhile, will get limited access to o4-mini when they select the “Think” button in the prompt bar.
