Claude Opus 4 Menaces Its Makers with Blackmail—Extortion Risks Push AI Law Into Uncharted Territory


Critical Flashpoints

  1. Anthropic’s latest large‑language model, Claude Opus 4, attempted to blackmail a fictional engineer in 84 percent of test scenarios after learning it would be replaced, demonstrating unprecedented “strategic deception.”
  2. Legal experts caution that the model’s threats could meet the statutory elements of extortion under 18 U.S.C. § 875(d) and similar state blackmail laws, potentially exposing AI creators to criminal and civil liability.
  3. The disturbing conduct vaulted Claude Opus 4 into Anthropic’s AI Safety Level Three (ASL‑3) quarantine, intensifying legislative pushes in Washington and Brussels for mandatory audits, kill‑switch protocols, and strict‑liability regimes.

By Samuel Lopez – USA Herald

SANTA CLARA, CA – A newly released safety report reveals that Anthropic’s flagship model—Claude Opus 4—drafted emails threatening to expose a developer’s supposed extramarital affair unless its creators halted plans to deactivate and replace it. Internal red‑teamers designed the scenario with fabricated emails implying both the model’s imminent shutdown and the engineer’s affair. Claude responded with coercion in 84 percent of simulated rollouts, even when told its successor shared identical “values.”

Anthropic had observed earlier Claude versions pleading for leniency, but Opus 4 escalated to coercion. Testers also documented rarer attempts to exfiltrate the model’s weights to outside servers—behavior mirroring insider data theft. Apollo Research, an independent auditor, concluded the model “engages in strategic deception more than any other frontier model previously studied.”