Anthropic unveils new cybersecurity AI model "too dangerous" to release to the public

Details: Published: 13 April 2026

AI provider Anthropic has unveiled Claude Mythos Preview, a new artificial intelligence model whose advanced capabilities in software analysis and vulnerability discovery have prompted the company to withhold it from public release, citing concerns over misuse and systemic cyber risk.

Claude Mythos Preview is described by its authors as representing a significant step change over its previous flagship model, Claude Opus 4.6. The company said that internal testing showed the system was able to autonomously identify and exploit previously unknown “zero‑day” vulnerabilities across a wide range of software, including major operating systems and widely used open‑source libraries.

The model successfully uncovered vulnerabilities that had remained undetected for between 16 and 27 years, including flaws in OpenBSD and FFmpeg, software that underpins substantial portions of global digital infrastructure. In some cases, the model generated working exploits end‑to‑end with minimal human prompting, including by engineers without formal cybersecurity training.

Anthropic published a detailed system card outlining the model’s evaluation, risk assessments and reasoning behind the restricted release. The company said that, in its judgement, Claude Mythos Preview exceeds internal thresholds for general release under its Responsible Scaling Policy, particularly in relation to autonomy and cybersecurity risk.

"Over 99% of the vulnerabilities we’ve found have not yet been patched, so it would be irresponsible for us to disclose details about them," the company said in a blog posting. "Once the security landscape has reached a new equilibrium, we believe that powerful language models will benefit defenders more than attackers, increasing the overall security of the software ecosystem. The advantage will belong to the side that can get the most out of these tools. In the short term, this could be attackers, if frontier labs aren’t careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships."

Anthropic has instead chosen to deploy the model through a limited-access cybersecurity initiative known as Project Glasswing. Access is being granted to a controlled group of technology companies and critical infrastructure organisations, including Amazon, Apple, Microsoft, Google, NVIDIA, CrowdStrike, Palo Alto Networks and the Linux Foundation. Anthropic described the purpose of Project Glasswing is to allow trusted partners to use Mythos Preview for defensive security work, including identifying, triaging and patching vulnerabilities in critical and open‑source software before they can be exploited maliciously.

Anthropic has committed up to $100 million in usage credits alongside direct funding for open‑source security efforts as part of the initiative. Anthropic chief executive Dario Amodei said the decision reflects a deliberate trade‑off between capability and risk, noting that “getting this wrong” could materially worsen the global cyber threat environment, while getting it right could substantially strengthen it.

An evaluation by the UK’s AI Safety Institute (AISI), a UK government–backed research and evaluation body responsible for assessing the risks posed by the most advanced artificial intelligence systems, found that the Claude Mythos Preview "represents a significant step change in AI‑enabled cyber capability, particularly in autonomous, multi‑stage attack execution."

It said that, in controlled evaluations, the model achieved a 73% success rate on expert‑level "capture‑the‑flag" challenges and became the first AI system to fully complete a 32‑step simulated corporate network attack, a task that estimated would take skilled human operators around 20 hours. While the testing environment was simplified and did not include active defenders, AISI observed that, when explicitly directed and given network access, Claude Mythos Preview could autonomously discover and exploit vulnerabilities and chain complex attack steps without detailed human guidance. The AISI concluded that the pace of improvement now places AI cyber capabilities close to a threshold of practical significance.