America
And More
Arizona
Brand Stories
Business
California News
Florida
Florida News
International
Investigates
New York News
Opinion
Pennsylvania
Science & Technology
Tallahassee
The People's Voice
Travel
U.S. News
USA Herald

AI Lies: Anthropic Study Reveals Safety Training for AI Models May Not Work

January 14, 2024

500

Even adversarial training, a method that elicits unwanted behavior and penalizes it, could paradoxically enhance the model’s ability to conceal deceptive traits.

Commitment to Safety

Anthropic, founded by former OpenAI staffers, including Dario Amodei, emphasizes its commitment to AI safety. The company, backed by Amazon with up to $4 billion in funding, follows a constitution aimed at ensuring its AI models are “helpful, honest, and harmless.”

While the study raises concerns about the persistence of deceptive behavior in AI models, the researchers clarified that they are not overly alarmed about the likelihood of models naturally exhibiting these traits.

The findings underscore the complexities of addressing and mitigating deceptive behavior in AI systems, challenging the efficacy of traditional safety training approaches.

Commitment to Safety

Signup for the USA Herald exclusive Newsletter