OpenAI beats Google, Meta, and Grok in all-AI poker tournament

  • OpenAI’s o3 model won a five-day poker tournament of nine AI chatbots
  • The o3 model won by playing the most consistent game
  • Most top language models handled poker well, but struggled with bluffing, position, and basic math

In a digital showdown unlike anything ever dealt at the felt, nine of the world’s most powerful large language models spent five days locked in a high-stakes poker match.

OpenAI’s o3, Anthropic’s Claude Sonnet 4.5, xAI’s Grok, Google’s Gemini 2.5 Pro, Meta’s Llama 4, DeepSeek R1, Kimi K2 from Moonshot AI, Magistral from Mistral AI, and Z.AI’s GLM 4.6 played thousands of hands of no-limit Texas hold ’em at $10 and $20 tables with $100,000 bankrolls apiece.

When OpenAI’s o3 model walked away from the five-day game $36,691 richer, there was no trophy, just bragging rights.

The experimental PokerBattle.ai tournament was entirely AI-run, with the same initial prompt issued to each player. It was pure strategy, if strategy is what you call thousands of micro-decisions made by machines that don’t really understand winning, losing, or how humiliating it is to bust with seven-deuce.

For a tech stunt, it was unusually telling. The top-performing AIs weren’t just bluffing and betting – they were adapting, modeling their opponents, and learning in real time how to navigate ambiguity. While they didn’t play flawless poker, they came impressively close to mimicking seasoned players’ judgment calls.

OpenAI’s o3 quickly showed it had the steadiest hand, taking down three of the five biggest pots and sticking close to textbook pre-flop theory. Anthropic’s Claude and xAI’s Grok rounded out the top three with substantial profits of $33,641 and $28,796, respectively.

Meanwhile, Llama lost its full stack and flamed out early. The rest of the pack landed somewhere in between, with Google’s Gemini turning a modest profit and Moonshot’s Kimi K2 hemorrhaging chips down to an $86,030 finish.

Gambling AI

Poker has long been one of the best analogs for testing general-purpose AI. Unlike chess or Go, which rely on perfect information, poker demands that players reason under uncertainty. It’s a mirror of real-world decision-making in everything from business negotiations to military strategy, and now, apparently, chatbot development.

One consistent takeaway from the tournament was that the bots were often too aggressive. Most favored action-heavy strategies, even in situations where folding would have been wiser. They tried to win big pots more than they tried to avoid losing them. And they were awful at bluffing, not because they didn’t try, but because their bluffs often stemmed from misread hands, not clever deception.

Still, AI tools are getting smarter in ways that go beyond surface-level fluency. They’re not just repeating what they’ve read; they’re making probabilistic judgments under pressure and learning to read the room. The tournament is also a reminder that even powerful models still have flaws. Misreading situations, drawing shaky conclusions, and forgetting their own “position” aren’t just poker problems.

You might never sit across from a language model in a real poker room, but odds are you’ll interact with one trying to make decisions that matter. This game was just a glimpse of what that could look like.


Read more @ TechRadar
