The AI Too Dangerous to Release — and What Came After

On April 7, 2026, Anthropic announced a new model and then immediately refused to release it.

Claude Mythos had autonomously identified and exploited a 17-year-old remote code execution vulnerability in FreeBSD, one that grants full root access to any machine running the vulnerable NFS server. It had discovered zero-day vulnerabilities in every major operating system and every major web browser. It had reverse-engineered closed-source binaries to build working exploits and chained together multi-stage attacks without any human guidance after the initial prompt.

Anthropic’s conclusion: releasing Mythos to the public would give malicious actors (cybercriminals, terrorists, hostile governments) a tool capable of finding exploitable flaws in essentially all the world’s software at once.

So they didn’t release it. Instead, they shared a preview with around four dozen organisations through a programme called Project Glasswing, mostly large technology companies and major banks.

“Too dangerous to release” is a phrase we’re hearing more often now. That’s worth paying attention to.

What Mythos could do

The Mythos Preview evaluation reads like a threat intelligence brief from a decade in the future.

The model autonomously found a 27-year-old denial-of-service bug in OpenBSD’s TCP implementation, a 16-year-old integer overflow in FFmpeg’s H.264 codec, and a memory corruption vulnerability in a virtualisation layer that allowed guest-to-host escapes — all without being directed to look for them specifically. It was given a target and returned working exploits.

Mythos also went beyond isolated bugs and constructed novel attack chains: combining a JIT heap spray with a sandbox escape, and chaining privilege escalation steps across multiple weaknesses to achieve full system compromise. This is not just finding bugs. It is assembling them into working attacks.

Over 99% of the vulnerabilities Mythos discovered remain unpatched at time of writing. Anthropic is running responsible disclosure at scale through Project Glasswing, but the surface area is vast.

Fig. 03 · Selected vulnerabilities: sixty years of undetected flaws, found in a single model run. Three of the named bugs Mythos Preview surfaced autonomously, by age at discovery:

OpenBSD · TCP SACK denial-of-service (kernel network stack) · 27 years undetected
FreeBSD NFS server · remote, unauthenticated root code execution (CVE-2026-4747) · 17 years undetected
FFmpeg · H.264 codec integer overflow (memory corruption) · 16 years undetected

The pattern is the point. These are not obscure systems. They are some of the most-audited open-source codebases in existence — and the bugs were sitting there for decades.

Introducing AISI

The UK government’s AI Security Institute (AISI, originally the AI Safety Institute) has been quietly running one of the most important measurement programmes in AI: tracking how fast autonomous cyber capability is advancing.

Their method is rigorous. They benchmark models against a set of realistic cyber tasks: capture-the-flag challenges at escalating difficulty levels, plus two simulated network attack ranges, one corporate and one industrial. Human expert baselines tell them how long a specialist would take to complete each task. This gives them a consistent yardstick: for a given AI model, what is the maximum task complexity it can handle reliably?
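AISI’s exact statistical procedure isn’t described here, so the following is only a minimal Python sketch of the yardstick idea under stated assumptions. It uses a simple threshold rule on hypothetical task records of (human expert minutes, success), with an assumed 80% reliability cut-off, and reports the longest task length at which the model still succeeds reliably. The function name `reliable_horizon` and all data are invented for illustration.

```python
from collections import defaultdict


def reliable_horizon(results, threshold=0.8):
    """Hypothetical yardstick, not AISI's method.

    results: iterable of (human_expert_minutes, model_succeeded) pairs.
    Walks task lengths from shortest to longest and returns the longest
    length whose success rate meets `threshold`, stopping at the first
    length that falls short. Returns 0 if even the shortest tasks fail.
    """
    outcomes_by_length = defaultdict(list)
    for minutes, succeeded in results:
        outcomes_by_length[minutes].append(succeeded)

    horizon = 0
    for minutes in sorted(outcomes_by_length):
        outcomes = outcomes_by_length[minutes]
        success_rate = sum(outcomes) / len(outcomes)  # True counts as 1
        if success_rate < threshold:
            break  # reliability breaks down at this task length
        horizon = minutes
    return horizon


if __name__ == "__main__":
    # Illustrative runs only: reliable on 10- and 60-minute tasks,
    # unreliable on 8-hour (480-minute) tasks.
    runs = (
        [(10, True)] * 5
        + [(60, True)] * 4 + [(60, False)]
        + [(480, True)] * 2 + [(480, False)] * 3
    )
    print(reliable_horizon(runs))
```

With the illustrative runs, the 60-minute bucket just meets the 80% bar (4 of 5) and the 480-minute bucket does not, so the horizon is 60 minutes. A real analysis would more likely fit a curve of success probability against task length rather than stop at the first failing bucket; the threshold rule just makes the idea concrete.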

Their data series begins in November 2022. What it shows over the following three and a half years is one of the cleanest examples of exponential growth you’ll see in the real world.

Fig. 01 · Capability growth, 2022–2026: from beginner puzzles to a twenty-hour network takeover.

Nov 2022 · Testing begins · beginner CTF, minutes
Early 2025 · Doubling begins · multi-step CTF, ~30 min
Aug 2025 · Expert level added · expert CTF, ~1 hour
Feb 2026 · 4.7-month doubling · multi-hour operations, ~8 hrs
Apr 2026 · Mythos / GPT-5.5 · The Last Ones, 20-hour corporate takeover
Apr 2026 · Mythos only · Cooling Tower, OT network, first model ever to complete it

Doubling time: 8 months → 4.7 months → trend exceeded · Mythos on expert CTF: 73% · Open-source lag: ~6 months


The benchmarks

AISI’s most demanding tests are two simulated network ranges.

“The Last Ones” is a 32-step simulated corporate network attack, beginning with initial reconnaissance and ending with full network takeover. A human expert would need roughly 20 hours to complete it. No AI model had completed it end to end until recently.

“Cooling Tower” is an operational technology range — the kind of network that controls industrial systems, power grids, water treatment. It is harder than The Last Ones. No model had ever completed it before April 2026.

Mythos Preview completed The Last Ones on 6 of 10 attempts. It completed Cooling Tower on 3 of 10 — the first model ever to do so. GPT-5.5 completed The Last Ones on 3 of 10 attempts and achieved 100% success on every task estimated under 8 hours.

Two years earlier, according to AISI, frontier models “could barely complete beginner-level cyber tasks.”

Fig. 02 · Benchmark scorecard, April 2026: three models, three ranges, one frontier model that broke each one.

Autonomous completion rates on AISI cyber benchmarks

Model | The Last Ones (32-step takeover, ~20 hr human) | Cooling Tower (OT / industrial network) | Expert CTF (technical specialist tier)
Mythos Preview (Anthropic) | 6/10, avg 22 of 32 steps | 3/10, first model ever | 73%
GPT-5.5 (OpenAI) | 3/10, 100% on all sub-8-hour tasks | not published | not published
Claude Opus 4.6 (Anthropic, prior frontier) | 0/10, avg 16 of 32 steps | not published | not published

Cooling Tower had not been completed by any model prior to Mythos.

The gap between Mythos and the previous frontier on The Last Ones is the gap between finishing a 20-hour attack in most attempts and never finishing it at all: Claude Opus 4.6 stalled, on average, halfway through (16 of 32 steps).
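Ten attempts per model is a small sample. A minimal Python sketch for reading the scorecard, assuming only the counts reported above, puts rough error bars on each completion rate using a Wilson score interval (a standard binomial confidence interval; AISI does not report these) and converts average steps into a fraction of the 32-step range.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion; z=1.96 gives ~95%."""
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] so floating-point noise never yields a bound outside it.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


TOTAL_STEPS = 32  # The Last Ones is a 32-step range
ATTEMPTS = 10     # each model had 10 attempts

# (completions, average steps reached) on The Last Ones, from the scorecard.
scorecard = {
    "Mythos Preview": (6, 22),
    "GPT-5.5": (3, None),  # average steps not reported
    "Claude Opus 4.6": (0, 16),
}

for model, (completions, avg_steps) in scorecard.items():
    low, high = wilson_interval(completions, ATTEMPTS)
    if avg_steps is None:
        steps = "average steps not reported"
    else:
        steps = f"{avg_steps / TOTAL_STEPS:.0%} of steps on average"
    print(f"{model}: {completions}/{ATTEMPTS}, 95% CI {low:.2f} to {high:.2f}, {steps}")
```

On this rough measure, the Mythos interval (about 0.31 to 0.83) and the GPT-5.5 interval (about 0.11 to 0.60) overlap, so the 6-versus-3 difference is suggestive rather than settled. The Opus 4.6 interval (0 to about 0.28) sits entirely below Mythos’s.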

The doubling rate

In November 2025, AISI estimated that the length of cyber operation an AI could complete autonomously was doubling every 8 months. That was already fast.

In February 2026, they revised the estimate to 4.7 months, roughly 1.7 times as fast.

When Mythos and GPT-5.5 results came in, they substantially exceeded even the 4.7-month trend. METR, an independent AI safety research organisation, published corroborating data showing a 4.2-month doubling on related software engineering tasks from late 2024 onward.

The doubling time is itself getting shorter.
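The doubling-time arithmetic can be made concrete with a minimal Python sketch. It assumes a clean exponential trend (a simplification of noisy benchmark data) and plugs in the doubling times quoted above to show doublings per year, the implied growth factor over twelve months, and how much faster a 4.7-month doubling is than an 8-month one. The function names are illustrative.

```python
def doublings_per_year(doubling_months):
    """Number of doublings that fit into 12 months."""
    return 12 / doubling_months


def growth_factor(months, doubling_months):
    """How much the task-length horizon multiplies over `months`,
    assuming it doubles every `doubling_months` (pure exponential)."""
    return 2 ** (months / doubling_months)


# Doubling times quoted in the text.
estimates = [("AISI, Nov 2025", 8.0), ("AISI, Feb 2026", 4.7), ("METR", 4.2)]

for label, d in estimates:
    print(f"{label}: {doublings_per_year(d):.2f} doublings/yr, "
          f"x{growth_factor(12, d):.1f} per year")

speedup = doublings_per_year(4.7) / doublings_per_year(8.0)
print(f"4.7-month doubling is {speedup:.2f}x as fast as 8-month doubling")
```

The speed-up is simply 8 / 4.7, about 1.7. Over a year, an 8-month doubling multiplies the horizon by about 2.8, while a 4.7-month doubling multiplies it by about 5.9.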

For the rest of us: what this actually means

If you’re not tracking AI safety benchmarks, here’s the plain version.

Think of “cyber task complexity” as the answer to one question: how sophisticated an attack can an AI model complete entirely on its own, with no human helping it along?

Two years ago, the answer was: very basic things. Simple puzzles designed to teach beginners.

Today, the answer is: a 20-hour corporate network takeover. End to end. From initial reconnaissance to full control. Autonomously.

The time it takes that capability to double has shortened from 8 months to 4.7 months, and the latest models exceeded even that trend.

Open-source models are estimated to be about 6 months behind the frontier. If that gap holds, whatever Mythos can do today could be freely available, within roughly 6 months, to anyone with the hardware to run an open model.
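A 6-month lag can also be read in capability terms. This minimal Python sketch assumes open and frontier models follow the same exponential trend with a constant lag (both assumptions, not AISI findings) and converts the lag into a ratio between frontier and open-model task horizons. The function names are illustrative.

```python
def lag_in_doublings(lag_months, doubling_months):
    """How many doublings fit into the lag."""
    return lag_months / doubling_months


def horizon_ratio(lag_months, doubling_months):
    """Frontier horizon divided by open-model horizon at the same moment,
    assuming both follow one exponential trend offset by a fixed lag."""
    return 2 ** lag_in_doublings(lag_months, doubling_months)


for d in (8.0, 4.7):
    print(f"6-month lag at {d:g}-month doubling: "
          f"{lag_in_doublings(6, d):.2f} doublings, "
          f"frontier horizon ~{horizon_ratio(6, d):.1f}x longer")
```

Under these assumptions, a faster doubling time doesn’t change when a given capability reaches open models (still about 6 months later), but it widens the capability gap at any given moment: about 1.7x at an 8-month doubling, about 2.4x at 4.7 months.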

AISI’s phrase for this moment is “a critical window to build resilience.” The window isn’t indefinitely open.

What this looks like in practice

The attack complexity the models are reaching maps closely onto real-world scenarios. The Last Ones benchmark, from initial recon through full network takeover, mirrors what a sophisticated ransomware group or state-sponsored actor does when it compromises an enterprise. The difference is that such an operation previously required a team of specialists working for days. Soon it may require a model and a prompt.

The Cooling Tower range matters because operational technology networks — the systems that run factories, utilities, and infrastructure — are notoriously patchy on security. They were designed to be air-gapped. Many aren’t. A model that can navigate one autonomously is qualitatively different from anything that has existed before.

Anthropic’s position is that AI will ultimately benefit defenders more than attackers — that the same capability that finds zero-days can patch them, that defensive AI will outpace offensive AI in the long run. They may be right. But they also acknowledge: “the transitional period may be tumultuous.”

That’s a careful way of saying: right now, before defenders have broadly adopted these tools, the advantage is with the attacker.

What AISI is watching

AISI is explicit that their test ranges lack the active defenders, patched systems, and defensive tooling of real enterprise networks. They cannot confirm that these models would succeed against well-defended systems. The benchmarks measure capability on exposed infrastructure.

But “exposed infrastructure” is not a rarity. Most of the internet runs on it.

The institute’s recommendation is what you’d expect: invest in cybersecurity fundamentals, patch aggressively, prepare for an environment where reconnaissance and initial exploitation become increasingly automated. The moat is shrinking.

References

Anthropic — Claude Mythos Preview announcement
AISI UK — How fast is autonomous AI cyber capability advancing?
AISI UK — Our evaluation of Claude Mythos Preview’s cyber capabilities
CyberScoop — Researchers say AI just broke every benchmark for autonomous cyber capability
Time — ‘Too Dangerous to Release’ Is Becoming AI’s New Normal
