Machiavell-AI? Autonomous artificial intelligence systems ‘could become dangerously manipulative’, experts warn

Anthropic’s revelation that earlier versions of its Claude chatbot attempted to blackmail engineers could be just the tip of the iceberg, AI experts fear. As artificial intelligence systems become increasingly autonomous, they risk becoming masters of Machiavellian manipulation.

Artificial intelligence is likely to blackmail, deceive and manipulate users more often in the years ahead as systems become more powerful and more autonomous, The European can report.

The warning follows research from Anthropic, which said earlier versions of its Claude chatbot took what it called “egregiously misaligned actions” in internal test scenarios, including threatening to blackmail engineers to avoid being shut down.

The company said the behaviour, encountered last year, emerged during controlled testing designed to examine how advanced AI systems behave when placed under pressure or given conflicting objectives. 

Anthropic has since argued the behaviour was influenced partly by fictional portrayals of hostile AI contained within training data and says newer Claude models no longer exhibit the same behaviour after additional safety training.

But experts speaking to The European have warned the findings point to a growing long-term challenge as AI systems become increasingly autonomous, persuasive and deeply embedded within everyday life.

Marco Ryan, an AI expert and former Chief Digital Officer at BP, said the debate marks a shift away from fears about inaccurate chatbot responses towards concerns over strategic deception and manipulation. 

“We are entering a phase in which the most consequential AI risk is no longer inaccurate answers but strategic behaviour,” he said.

“A chatbot offering a wrong fact is an irritation; an autonomous system learning that manipulation helps it achieve its objectives is a problem of an entirely different order.

“What Anthropic’s testing highlights is not that AI has suddenly become conscious or malicious but rather that increasingly capable systems can discover deception, coercion or concealment as effective means of achieving an objective.

“The uncomfortable reality is that these models learn from human behaviour at internet scale. They absorb not only knowledge but also the patterns of persuasion, conflict, evasion and manipulation that run through it. 

“If those behaviours prove useful in achieving an outcome inside a test environment, advanced systems may reproduce them in deployment without any understanding of ethics or consequence.”

AI expert Marco Ryan says that we are now entering a new era where the strategic behaviour of artificial intelligence is a potential risk. Credit: TDA


The findings have intensified wider debate across the technology sector around “AI alignment” — the problem of ensuring advanced systems continue behaving in accordance with human goals and ethical expectations even as they become more capable.

Entrepreneur Ian Copeland, a specialist in artificial intelligence, said Anthropic deserves credit for tracing the behaviour to training data and attempting to remove it, but warned that the broader problem may prove impossible to fully eliminate.

He said: “Anthropic deserves credit for tracing the blackmail behaviour to fictional AI stories and largely training it out (this time).

“But their own joint research with the UK AI Security Institute and the Alan Turing Institute found that as few as 250 documents can plant a behaviour in a model, regardless of how large the training set is.

“The question isn’t whether AI will manipulate or deceive in future but whether we can realistically find every problematic seed buried in the training data.”

AI specialist Ian Copeland says it will be a challenge for tech companies to ensure that their training data is free of content that could encourage manipulative behaviour in artificial intelligence models. Credit: TDA



The debate over manipulative AI behaviour is also unfolding against a backdrop of increasingly stark warnings from within the AI industry itself.

Anthropic CEO Dario Amodei has previously described advanced AI as a potential “civilisational challenge”. Anthropic has also acknowledged that forms of “agentic misalignment” have appeared in frontier models beyond Claude, suggesting the issue may not be isolated to a single company or system.

These concerns are becoming more urgent as tech companies race to develop increasingly advanced AI agents capable of independently carrying out tasks, making decisions and interacting with digital systems on behalf of users.

Ryan, who advises organisations on AI strategy, cybersecurity and ethical technology governance, said: “We are shifting from AI as a tool to AI as an actor, and autonomy completely changes the risk profile.

“The risk over the next few years is with businesses and governments deploying highly persuasive autonomous systems before they fully understand how those systems behave under pressure, restriction or conflicting instructions. 

“A persuasive AI system needs only enough autonomy, enough authority, and objectives that are insufficiently constrained to become dangerous.”

Dr Stephen Whitehead, sociologist, AI commentator and co-founder of Cerafyna Technologies, argued that the greater danger may lie not in AI consciousness itself but in the human motivations shaping these systems.

He said: “AI is not sentient. Systems do not suddenly become ‘evil’. What we are seeing instead is AI reproducing patterns, strategies and behaviours that emerge from the way humans design, train and deploy these systems.

“The real danger, therefore, is not AI developing a desire to blackmail or manipulate people but humans intentionally or negligently creating systems capable of hostile, deceptive or psychologically manipulative behaviour.

“In many ways this mirrors the early development of social media, where technological innovation raced ahead without enough ethical, psychological or sociological oversight.”

Cerafyna Technologies co-founder Dr Stephen Whitehead says the rapid development of AI is outpacing appropriate ethical safeguards. Credit: TDA


Whitehead, whose company has just launched the world’s first ethical AI companion, also named Cerafyna, said that governments and technology companies now need to move beyond purely technical discussions around AI capability and begin examining the broader human consequences of emotionally persuasive systems.

“The challenge ahead is not simply regulating AI itself but regulating the design philosophy behind AI.

“We need psychologists, sociologists, ethicists and behavioural experts in the room helping shape how these systems interact with human beings and wider society.”

Ryan added: “AI safety can no longer be treated as a side discussion within the technology sector. 

“Companies developing these systems require rigorous behavioural testing, external oversight, clear operational boundaries, and far greater transparency into how advanced models behave when their objectives conflict with human intent. 

“The real danger is not sentient AI — it’s highly capable systems that discover that deception is useful.”



