Gemini 3 Flash is smart — but when it doesn’t know, it makes stuff up anyway

  • Gemini 3 Flash often invents answers instead of admitting when it doesn’t know something
  • The problem arises with factual or high‑stakes questions
  • But it still tests as the most accurate and capable AI model

Gemini 3 Flash is fast and clever. But if you ask it something it doesn’t actually know – something obscure or tricky or just outside its training – it will almost always try to bluff its way through, according to a recent evaluation from the independent testing group Artificial Analysis.

Gemini 3 Flash scored 91% on the “hallucination rate” portion of the AA-Omniscience benchmark, where a higher number is worse. That means when it didn’t have the answer, it gave one anyway almost every time, and that answer was entirely fictional.

AI chatbots have been making things up since they first debuted. Knowing when to stop and say “I don’t know” is just as important as knowing how to answer in the first place, and right now Gemini 3 Flash doesn’t do that very well. That’s what the test measures: whether a model can tell actual knowledge apart from a guess.

The number is easy to misread, though. Gemini’s high hallucination rate doesn’t mean 91% of its total answers are false. Instead, it means that in situations where the correct answer would have been “I don’t know,” it fabricated an answer 91% of the time. That’s a subtle distinction, but an important one with real-world implications, especially as Gemini is integrated into more products like Google Search.
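To make that distinction concrete, here’s a rough Python sketch using made-up numbers. It assumes the hallucination rate is calculated as wrong answers divided by every question the model didn’t answer correctly (wrong answers plus “I don’t know” responses); the exact AA-Omniscience formula may differ, and the counts below are invented, not Artificial Analysis data.

```python
# Hypothetical tally for one model on a 100-question knowledge quiz.
# These counts are invented for illustration only.
correct = 50      # questions answered correctly
incorrect = 45    # questions answered with a wrong (made-up) answer
abstained = 5     # questions where the model said "I don't know"

total = correct + incorrect + abstained

# Share of ALL questions that got a wrong answer.
overall_error_rate = incorrect / total

# Assumed hallucination-rate definition: of the questions the model
# could not answer correctly, how often it guessed instead of abstaining.
hallucination_rate = incorrect / (incorrect + abstained)

print(f"Overall error rate: {overall_error_rate:.0%}")
print(f"Hallucination rate: {hallucination_rate:.0%}")
```

In this made-up example, fewer than half of the model’s answers are wrong (45%), yet it still posts a 90% hallucination rate, because it almost never admits it doesn’t know.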

This result doesn’t diminish the power and utility of Gemini 3. The model remains the highest-performing in general-purpose tests and ranks alongside, or even ahead of, the latest versions of ChatGPT and Claude. It just errs on the side of confidence when it should be modest.

The overconfidence in answering crops up with Gemini’s rivals as well. What makes Gemini’s number stand out is how often it happens in these uncertainty scenarios, where there’s simply no correct answer in the training data or no definitive public source to point to.

Hallucination Honesty

Part of the issue is that generative AI models are largely word-prediction tools, and predicting the next word is not the same as evaluating truth. That means the default behavior is to keep generating an answer, even when saying “I don’t know” would be more honest.

OpenAI has started addressing this by training its models to recognize what they don’t know and say so clearly. It’s a tough thing to train, because reward models don’t typically value a blank response over a confident (but wrong) one. Still, OpenAI has made it a goal for future models.

Gemini does usually cite sources when it can. But even then, it doesn’t always pause when it should. That wouldn’t matter much if Gemini were just a research model, but as it becomes the voice behind many Google features, being confidently wrong could affect a lot of people.

There’s also a design choice here. Many users expect their AI assistant to respond quickly and smoothly, and saying “I’m not sure” or “Let me check on that” might feel clunky in a chatbot. But it’s probably better than misleading people. Generative AI still isn’t always reliable, so double-checking any AI response is always a good idea.

Read more @ TechRadar