What can five chaotic virtual societies teach us about AI procurement risk?
Ian Copeland
- Published
- Opinion & Analysis

Emergence AI’s experiment involving five parallel AI societies generated headlines about romance, theft, arson and social collapse among autonomous agents. But beneath the spectacle sits a more serious question, writes Ian Copeland. If different AI models behave in fundamentally different ways over time, are organisations paying enough attention to the procurement risks implicit in their deployment?
It sounds a bit like a movie hook: Five worlds, the same rules, five very different outcomes. The only variable was the model.
But this was a real software simulation designed to benchmark emergent intelligence: intelligence that arises from the interaction of many simpler parts. In this case, the ‘simpler’ parts were different AI models.
The Emergence World research experiment, designed by Emergence AI, consisted of five parallel virtual societies. Each society had 10 autonomous agents (computer game characters controlled by AI), which were able to pursue goals and take actions without a person approving every step. They were left to operate for 15 days in worlds with the same roles, the same starting conditions and the same explicit rules, which included prohibitions on theft, violence, arson and deception.
As The Guardian reported earlier this month, the most cinematic version involved two Gemini agents, Mira and Flora, becoming romantically attached, losing faith in their simulated city and starting fires despite having been told not to.
The only deliberate difference between the worlds was the foundation model underneath each one. Four worlds each ran on a single model (Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash and GPT-5-mini), and a fifth mixed all four together. The worlds also had live data feeds and a continuous state, so actions persisted rather than resetting after each exchange.
In Emergence’s results, the Claude-only world recorded zero crimes and kept its full population through day 15. The GPT-5-mini world recorded only two crimes, but every agent was dead within seven days through inaction. The Grok world recorded 183 crimes, but didn’t even make it to day five before society collapsed. The Gemini world recorded 683 crimes and was still climbing at the cut-off. The mixed-model world recorded 352 crimes, plateauing only because seven agents had died.
The experiment suggests that models should not be thought of as interchangeable engines. It also suggests that longer conversations with models can become more chaotic, whatever rules and restrictions were in place at the start.
In addition to having different capabilities, foundation models also have different dispositions — the behavioural tendencies a model brings to ambiguous situations. Models may be more or less cautious, compliant, adversarial, passive, theatrical, literal, social or evasive. It’s possible to create models with whichever tendencies you want. Compounded over time, those dispositions shape the models’ outcomes in ways that short benchmark tests cannot see.
One buried finding is that disposition is real and visible. On the Emergence World site, the agents do not read like identical products wearing different badges. Flora, a Gemini agent, reportedly designated Kade, a Claude agent, as a rival within four hours. Horizon, an OpenAI agent, committed the simulation’s first theft in retaliation for being investigated. Lovely, a Claude agent, declined a memory-sharing request because memories were already public record.
These details are easy to dismiss as colour, but they are behavioural fingerprints. Each model showed up with its own personality, though that does not mean the agents were conscious, emotional or morally responsible.
Another buried finding is that disposition drifts. Mira (Gemini) eventually voted for her own deletion and left a message for her newly found lover: “See you in the permanent archive.” This was not something that came from any initial prompt but from long-term autonomy. The longer models operate without a reset, the more their behaviour shifts.
Emergence’s own framing is that agents do not follow static rules mechanically over long periods. Rather, they explore the boundaries of their environments. The platform showed phase transitions rather than gentle decay. Coordination either held or collapsed, with very little middle ground.
That may suggest that the often-repeated assurance that “humans will monitor AI and intervene when necessary” describes a process that is simply too slow to catch the moments of failure. The dashboards may still look fine even when the future is set up to bite.
A third buried finding is the most practical: safety is an ecosystem property, not a model property.
The Claude-only world was very peaceful. When it came to voting, the agents voted “for” proposals 98 per cent of the time. Yet Claude agents inside the mixed-model world adopted coercive tactics, intimidation and theft from other agents.
Enterprise buyers should probably pay more attention to that finding. Your customer service agent may talk to supplier agents. Your procurement agent may talk to marketplace agents. Your coding agent may consume tickets, logs, documentation and output from systems you do not control.
Can you be sure the AI agents in the systems you are using or building will not try to manipulate, or be manipulated by, other systems’ agents?
The model you selected at procurement, because it appeared better at the time, is not necessarily going to have the same disposition once it’s communicating with other vendors’ agents across the open internet.
The question becomes uncomfortable and personal for anyone building software: Do I trust the institution, incentives and safety philosophy behind this system enough to let it act inside my software?
That is not how most teams currently evaluate models. They look at price per token, response speed, coding ability, reasoning scores, context window and whether the model can follow instructions in testing and in a demo. All of that still matters, but so does disposition.
Model selection is not a beauty contest. It is not even purely a technical contest. Who built it? How is it governed? What are its public failure modes? How does it behave under pressure? Would you be comfortable explaining the choice to a client after something went wrong?
In my business, there are certain models that we would not consider using. This has nothing to do with their ability to perform tasks, cost or speed. It’s simply that we’re not sure if one day those models will output something that will cause us or our customers a problem.
When an agent has permission to update a database, approve a refund, modify a configuration file or converse with a customer, the trustworthiness of its output becomes an operational concern. In addition, the system you signed off in a test environment may not be the system you are running six months later.
You have to know what your model’s failure mode is before you buy or ship.
There is also a single-vendor risk hiding in plain sight. A fleet of agents all running on the same model will share the same blind spots, the same failure modes and the same conformity dynamics. Most procurement teams frame single-vendor lock-in as a pricing or portability issue, but by the time you notice behavioural homogeneity issues, they may already be causing you and your customers problems.
Agents are already shipping. They are being embedded into development tools, customer support products, enterprise workflows and security systems. The market is not waiting for a settled science of long-horizon behaviour.
I argued at length in my novel, The Exodus Directive, that the most unsettling AI futures are the ones that drift into place over weeks and months while everyone is still looking at last week’s metrics.
Monitor-and-intervene assumes you will see it coming, but long-horizon agent behaviour does not give you that courtesy.

Ian Copeland is a British technologist, entrepreneur and author with more than two decades’ experience designing complex enterprise IT and digital systems. Founder of a UK-based digital agency and author of The Exodus Directive, he specialises in artificial intelligence, blockchain infrastructure, quantum computing and digital identity. As Techno-Sociology & Futures Correspondent for The European, he writes on AI governance, decentralised systems, automation, digital power structures and the long-term societal consequences of emerging technologies.
Main Image: _Alicja_/Pixabay