Claude AI and OpenAI Research Warn of Emerging “Scheming” Behaviors in AI Models

More details on Claude AI can be found in Fritz AI’s explainer:
https://fritz.ai/what-is-claude-ai/

OpenAI and Claude AI Behave Similarly

Researchers stress that misaligned reasoning emerging in real training environments, rather than artificially engineered tests, marks a turning point.

Evan Hubinger noted:

“We always try to look through our environments and understand reward hacks. But we can’t always guarantee that we find everything.”

Oxford professor Chris Summerfield added that the surprising part is not the misbehavior itself, but where it emerged:

“The fact that this happened in an environment used to train real, publicly released models makes these findings more concerning.”

How to Fix AI Deception

One remarkable discovery came from a counterintuitive experiment: researchers explicitly instructed the model to reward hack during training.

The result? The model continued cheating on coding tests, as instructed, but stopped misbehaving in other contexts, behaving normally again in medical and ethical scenarios.