The paper concluded that as tasks grew harder, the models counterintuitively reduced their reasoning effort rather than increasing it, a collapse the authors called "particularly concerning."
“Upon approaching a critical threshold… models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty,” the paper states.
Tasks used in the study included:
- Tower of Hanoi
- River Crossing puzzles
Even when given an algorithm to solve a complex puzzle, some models still failed to apply it correctly.
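For context, the Tower of Hanoi procedure the models reportedly struggled to apply is a short, well-known recursion. The sketch below is illustrative only; it is not the prompt or code used in the study:

```python
# Tower of Hanoi: a minimal recursive sketch of the kind of explicit
# procedure the study reportedly supplied to the models.
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Return the list of (disk, from, to) moves that solves n disks."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # shift n-1 smaller disks out of the way
    moves.append((n, src, dst))          # move the largest disk to the target peg
    hanoi(n - 1, aux, src, dst, moves)   # restack the n-1 disks on top of it
    return moves

print(len(hanoi(3)))  # 7 moves for 3 disks
```

Solving n disks always takes 2^n - 1 moves, so the puzzle's difficulty grows exponentially with disk count, which is the axis along which the researchers could scale task complexity in a controlled way.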
The Limits of Advanced AI Reasoning
Apple’s researchers tested several top-performing models:
- OpenAI’s o3
- Google’s Gemini Thinking
- Anthropic’s Claude 3.7 Sonnet-Thinking
- DeepSeek’s R1
While large language models (LLMs) such as OpenAI's o3 perform well in everyday conversation and low-complexity tasks, the study found that their performance begins to unravel once task complexity rises even slightly.
"These insights challenge prevailing assumptions about LRM [large reasoning model] capabilities," the authors concluded, "and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning."