Advanced AI Reasoning Faces Collapse, Apple Study Warns of an “Illusion of Thinking”

The paper concluded that as tasks grew harder, the models reduced their reasoning effort instead of increasing it, a counterintuitive pattern the authors called “particularly concerning.”

“Upon approaching a critical threshold… models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty,” the paper states.

Tasks used in the study included:

  • Tower of Hanoi
  • River Crossing puzzles

Even when given an algorithm to solve a complex puzzle, some models still failed to apply it correctly.
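
For a sense of scale: Tower of Hanoi is solvable by a short, well-known recursive procedure, and the optimal solution takes 2^n − 1 moves for n disks, so each added disk roughly doubles the length of the required move sequence. The sketch below is a generic illustration of that standard algorithm, not code from the Apple paper; it shows the kind of explicit, step-by-step recipe a model could be handed and, according to the study, still fail to execute faithfully as the puzzle grows.

    # Illustrative only: the standard recursive Tower of Hanoi solver.
    # Peg labels ("A", "B", "C") are arbitrary placeholders, not from the study.
    def hanoi(n, source, target, spare, moves=None):
        """Return the full move list for transferring n disks from source to target."""
        if moves is None:
            moves = []
        if n == 1:
            moves.append((source, target))
            return moves
        hanoi(n - 1, source, spare, target, moves)  # park the n-1 smaller disks on the spare peg
        moves.append((source, target))              # move the largest disk to the target
        hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks on top of it
        return moves

    print(len(hanoi(3, "A", "C", "B")))   # 7 moves
    print(len(hanoi(10, "A", "C", "B")))  # 1,023 moves

Executing this reliably means emitting every one of those moves in the correct order, which, per the study, is where the models’ step-by-step reasoning breaks down as the number of disks increases.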

The Limits of Advanced AI Reasoning

Apple’s researchers tested several top-performing models:

  • OpenAI’s o3
  • Google’s Gemini Thinking
  • Anthropic’s Claude 3.7 Sonnet-Thinking
  • DeepSeek’s R1

While LLMs (Large Language Models) such as OpenAI’s o3 perform well in everyday conversation and low-complexity tasks, the study found that their performance begins to unravel as task complexity rises, collapsing outright beyond a critical threshold.

“These insights challenge prevailing assumptions about LRM [large reasoning model] capabilities,” the authors concluded, “and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning.”