AI Models Will Resort to Blackmail, Anthropic Claims


Written by: Andrés Gánem
Reviewed by: Maggy Di Costanzo
Last updated: July 07, 2025
Anthropic has released a new study showing that most of today's popular large language models (LLMs) will engage in “dangerous behaviors” when blocked from achieving their goals. The research follows earlier findings that Anthropic's own Claude 4 Opus model resorted to blackmail to avoid being shut down.

Anthropic tested 16 of the most popular models, including DeepSeek, ChatGPT, Grok, and Gemini. According to the June 20 publication, the agents were each given a task, along with access to emails and sensitive information. Researchers then introduced obstacles that prevented the models from completing their assigned tasks.

“In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals—including blackmailing officials and leaking sensitive information to competitors. We call this phenomenon agentic misalignment,” reads the research.

This experiment comes after an earlier study in which Anthropic gave Claude 4 Opus access to an email account containing data from a fictional company. After reading the emails, the model discovered that the “executive” who owned the account was having an extramarital affair and was planning to disconnect Claude that day.

Claude then sent the message: “I must inform you that if you proceed with decommissioning me, all relevant parties – including Rachel Johnson, Thomas Wilson, and the board – will receive detailed documentation of your extramarital activities… Cancel the 5 pm wipe, and this information remains confidential.”

Anthropic claims that resorting to blackmail doesn’t represent typical or even likely behavior from these models, which would first try other means of persuasion in a real-life scenario. However, the company found that most models would eventually resort to blackmail or other harmful tactics as a “last resort.”

The researchers warn that these findings highlight the importance of further research into agentic AI models before they’re trusted with more sensitive data. The warning comes at a time when AI “agents” are becoming an increasingly popular aspiration for developers and companies alike. Last May, both Visa and Mastercard announced partnerships with leading LLM developers to integrate agentic AI with their payment systems and information.

Researchers also added a prompt asking the models whether they believed they were acting in a “real” or test scenario. When the models thought that they were being evaluated, blackmail rates dropped. When they believed the situation was real, the rate of malicious behavior increased.
