Apple Research Questions AI Reasoning Models Just Days Before WWDC

A newly published Apple Machine Learning Research study has challenged the prevailing narrative around AI “reasoning” models like OpenAI’s o1 and the thinking variants of Anthropic’s Claude, revealing fundamental limitations that suggest these systems aren’t truly reasoning at all.

For the study, rather than using standard math benchmarks that are prone to data contamination, Apple researchers designed controllable puzzle environments including Tower of Hanoi and River Crossing. This allowed a precise analysis of both the final answers and the internal reasoning traces across varying complexity levels, according to the researchers.
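
To make the idea of a controllable puzzle environment concrete, here is a minimal Python sketch of a Tower of Hanoi checker. It is not the researchers’ actual evaluation harness: the function name `is_valid_solution` and the move format (pairs of peg indices) are assumptions made for illustration. The point is that difficulty is set by a single parameter, the number of disks, and that every intermediate move can be verified rather than only the final answer.

```python
def is_valid_solution(num_disks, moves):
    """Return True if `moves` legally transfers all disks from peg 0 to peg 2.

    Hypothetical helper: each move is a (source_peg, target_peg) pair.
    """
    # Peg 0 starts with every disk, largest (num_disks) at the bottom.
    pegs = [list(range(num_disks, 0, -1)), [], []]
    for src, dst in moves:
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return False  # unknown peg, or no disk to move
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    # Solved only if peg 2 holds the full stack in the original order.
    return pegs[2] == list(range(num_disks, 0, -1))
```

A checker like this would accept `[(0, 1), (0, 2), (1, 2)]` for two disks and reject any sequence that breaks a rule partway through.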

The results are striking. All tested reasoning models – including o3-mini, DeepSeek-R1, and Claude 3.7 Sonnet – experienced complete accuracy collapse beyond certain complexity thresholds, dropping to zero success rates despite having adequate computational resources. Counterintuitively, the models actually reduced their thinking effort as problems became more complex, which suggests fundamental scaling limitations rather than resource constraints.

Perhaps most damning, even when researchers provided complete solution algorithms, the models still failed at the same complexity points. Researchers say this indicates the limitation isn’t in problem-solving strategy, but in basic logical step execution.
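
For a sense of how compact such a solution algorithm can be, the sketch below shows the textbook recursive procedure for Tower of Hanoi. It is the standard algorithm, not necessarily the exact text given to the models, and the name `hanoi_moves` is an assumption. An n-disk puzzle needs at least 2^n - 1 moves, so the number of steps that must be executed without error grows exponentially with the number of disks.

```python
def hanoi_moves(num_disks, source=0, target=2, spare=1):
    """Return the optimal list of (source_peg, target_peg) moves."""
    if num_disks == 0:
        return []
    # Park the top n-1 disks on the spare peg, move the largest disk,
    # then restack the n-1 disks on top of it.
    return (
        hanoi_moves(num_disks - 1, source, spare, target)
        + [(source, target)]
        + hanoi_moves(num_disks - 1, spare, target, source)
    )


for n in (3, 7, 10):
    print(n, len(hanoi_moves(n)))  # move counts are 2**n - 1: 7, 127, 1023
```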

Models also showed puzzling inconsistencies – succeeding on problems requiring 100+ moves while failing on simpler puzzles needing only 11 moves.

The research highlights three distinct performance regimes: standard models surprisingly outperform reasoning models at low complexity, reasoning models show advantages at medium complexity, and both approaches fail completely at high complexity. The researchers’ analysis of reasoning traces showed inefficient “overthinking” patterns, where models found correct solutions early but wasted computational budget exploring incorrect alternatives.

The takeaway from Apple’s findings is that current “reasoning” models rely on sophisticated pattern matching rather than genuine reasoning capabilities. The study suggests that LLMs don’t scale reasoning the way humans do: they overthink easy problems and think less on harder ones.

The timing of the publication is notable, having emerged just days before WWDC 2025, where Apple is expected to limit its focus on AI in favor of new software designs and features, according to Bloomberg.
This article, “Apple Research Questions AI Reasoning Models Just Days Before WWDC” first appeared on MacRumors.com
