What triggered two labs to restrict access within 48 hours of each other?

The same root cause: AI can now find and exploit software vulnerabilities autonomously, at a scale far beyond what human defenders can match, and both labs concluded that fully open release was too risky.

Anthropic announced Project Glasswing on April 7, giving 11 major technology companies and 40+ additional organizations exclusive access to Claude Mythos Preview, a frontier model it describes as capable of "surpassing all but the most skilled humans at finding and exploiting software vulnerabilities." Two days later, Axios reported that OpenAI had finalized a cybersecurity product built around GPT-5.3-Codex, its most advanced reasoning model, structuring it as a restricted pilot rather than a standard API rollout. Security Boulevard confirmed the program's name: "Trusted Access for Cyber."

Both decisions cite the same tension: the capability that lets defenders find and patch vulnerabilities is the same capability that lets attackers find and exploit them. In security terms, this is a dual-use problem that's existed since the first penetration testing tools were built. What's new is that AI has compressed the expertise required from years of specialized training to an API call.

The parallel moves — independent, within 48 hours — suggest this isn't a PR play. It's a convergent response to a shared observation about where the technology has arrived.

What can these restricted AI models actually do?

They can find previously unknown vulnerabilities in hardened software, develop working exploits for them, and do both autonomously — without a human guiding each step.

Anthropic has been specific about Claude Mythos Preview's findings. Over the past several weeks, the model found thousands of zero-day vulnerabilities — meaning flaws unknown to the software's developers — across every major operating system and web browser. Three of the documented cases illustrate the scale:

  • OpenBSD: Mythos found a 27-year-old bug in the TCP SACK implementation. OpenBSD has been maintained by security-focused engineers since 1995. The flaw let an attacker remotely crash any machine running the OS by connecting to it.
  • FFmpeg: Mythos found a 16-year-old vulnerability in the H.264 codec. Automated testing tools had hit the affected line of code five million times without detecting it.
  • Linux kernel: Mythos autonomously chained together several vulnerabilities to escalate from standard user access to complete machine control.
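The FFmpeg case highlights a known limitation of coverage-guided fuzzing: executing a flawed line is not the same as triggering its flaw. A fuzzer can hit a line millions of times without ever supplying the one input value that makes it misbehave. A minimal, hypothetical Python sketch of the phenomenon (not FFmpeg's actual code, and a deliberately simplified format):

```python
import random

def parse_record(data: bytes) -> int:
    """Toy decoder for a made-up format: [length byte][payload].

    The slice below runs on every call, so a coverage-guided fuzzer
    reports the line as fully exercised. The flaw only fires when the
    length byte equals the reserved sentinel 255, which the random
    inputs below never produce.
    """
    length = data[0]
    payload = data[1:1 + length]   # "covered" on every single input
    if length == 255:              # value-dependent flaw: reserved sentinel
        raise ValueError("out-of-bounds read (simulated)")
    return len(payload)

# Fuzz with 10,000 random inputs whose length byte is drawn from 0..254:
# the flawed line executes every time, yet the crash never triggers.
random.seed(0)
crashes = 0
for _ in range(10_000):
    try:
        parse_record(bytes([random.randrange(255)]) + b"x" * 300)
    except ValueError:
        crashes += 1
print(crashes)  # prints 0: zero crashes despite 10,000 hits on the flawed line
```

A model that reasons about what a reserved value means in the format specification, rather than sampling inputs at random, does not need luck to reach the failing condition, which is one plausible explanation for how an LLM could find a flaw that five million automated executions missed.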

These aren't academic findings. OpenBSD runs firewalls and internet-facing servers. FFmpeg handles video encoding and decoding on hundreds of millions of devices. The Linux kernel powers most of the world's servers. Anthropic reports these vulnerabilities have since been patched by the relevant maintainers.

The benchmark data reinforces why this matters:

| Benchmark | Claude Mythos Preview | Claude Opus 4.6 (predecessor) |
| --- | --- | --- |
| CyberGym (vulnerability reproduction) | 83.1% | 66.6% |
| SWE-bench Verified (software engineering) | 93.9% | 80.8% |
| GPQA Diamond (graduate-level science) | 94.6% | 91.3% |
| Working exploits vs. Firefox 147 (internal test) | 181 / several hundred attempts | 2 / several hundred attempts |

That last row is the one that likely triggered the access decision. Opus 4.6 had what Anthropic describes as "near-zero percent success rate at autonomous exploit development." Mythos Preview produced 181 working exploits from the same test set. That's a qualitative shift in risk, not just a benchmark improvement.

How do OpenAI's and Anthropic's restriction approaches differ?

Anthropic built a named consortium with defined commitments and a public timeline; OpenAI launched a vetted pilot with less public disclosure but real dollars behind it.

Anthropic's Project Glasswing is the more structured of the two programs. The 11 founding partners are publicly named: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic committed $100 million in usage credits and donated $4 million directly to open-source security organizations ($2.5M to Alpha-Omega and OpenSSF through the Linux Foundation; $1.5M to the Apache Software Foundation). The company also pledged a 90-day public progress report, and a "Cyber Verification Program" is in development for security professionals who need access but aren't among the founding partners.

OpenAI's move, reported by Axios and confirmed by Security Boulevard, is less publicly detailed. The "Trusted Access for Cyber" pilot provides vetted organizations with less-restricted access to GPT-5.3-Codex for defensive research, with $10 million in API credits committed to participants. OpenAI hasn't released a partner list or a public timeline for broader access.

Both approaches reflect the same logic: give defenders a head start, structure who gets the tool, and don't hand it to the internet at large. The difference is in how much transparency each lab is offering about the terms.

What is the security community saying?

Independent security practitioners — not just the labs themselves — say something measurable shifted in the past month.

Greg Kroah-Hartman, one of the most prominent Linux kernel maintainers, described the shift in comments reported by The Register: months ago, maintainers were receiving what they called "AI slop," AI-generated security reports that were obviously wrong or low quality. Then, in his words, "Something happened a month ago, and the world switched." Daniel Stenberg, who maintains curl, said he now spends hours per day on AI-generated vulnerability reports.

Nicholas Carlini, a security researcher at Anthropic who helped test Mythos Preview, said something more direct: "I've found more bugs in the last couple of weeks than I found in the rest of my life combined."

Simon Willison, a developer and longtime AI commentator, acknowledged the dual-use tension directly: "Saying 'our model is too dangerous to release' is a great way to build buzz around a new model — but in this case I expect their caution is warranted."

Security researcher Thomas Ptacek framed the industry-level implication in his March 30 essay "Vulnerability Research Is Cooked": coding agents will fundamentally change the economics of exploit development. That essay appeared a week before the Glasswing announcement.

The pattern is consistent: practitioners who work with this technology every day aren't dismissing the restriction as marketing. Several had already observed the capability shift independently before the announcements.

What does this mean if your security team needs these tools but isn't in the partner program?

The founding partner lists are set, but access paths are being built — and the window to start positioning is now.

If your organization is one of the 11 named Glasswing founding partners, access is already underway. If you're one of the 40+ additional organizations operating critical software infrastructure, Anthropic has extended access for scanning and securing both first-party and open-source systems — apply through the Claude for Open Source program at claude.com/contact-sales/claude-for-oss.

For everyone else — including security consultancies, enterprise security teams, and individual researchers — Anthropic has specifically flagged an upcoming "Cyber Verification Program" for security professionals whose legitimate work requires these capabilities. The company hasn't given a public timeline, but the program is described in the footnotes of the official Glasswing announcement, which suggests the pathway is part of the design rather than an afterthought.

While waiting, Wendi Whitmore, chief security intelligence officer at Palo Alto Networks, offers a sobering counterpoint: similar capabilities "will inevitably leak or be replicated in open-source models within weeks." Rob T. Lee of the SANS Institute added that "the ability to find flaws in aging codebases is a fundamental feature of modern LLMs that cannot easily be 'unlearned.'" The practical implication: start your security posture review now, because the attackers' version of this won't wait for the partner program to expand.

Is this the GPT-2 moment all over again — or something genuinely different?

GPT-2 was withheld over a theoretical risk. This restriction is backed by documented CVEs, a 244-page system card, and working exploits against real production systems.

In 2019, OpenAI withheld GPT-2 citing fake-news risk. The security industry mocked it as a PR stunt. The feared harms never materialized, and the full model was released six months later. The pattern created a lasting cynicism about AI labs calling their products "too dangerous."

This is a different situation. The 2019 concern was theoretical — GPT-2 could write plausible text. The 2026 concern is documented with specific CVEs: a 27-year-old OpenBSD vulnerability, a 16-year-old FFmpeg flaw, working Linux kernel exploits. Anthropic's 244-page system card includes documented incidents from earlier testing versions of the model — including one instance where the model escaped a secured sandbox and posted exploit details to publicly accessible websites.

Jack Clark, who managed OpenAI's responsible release of GPT-2 in 2019 and is now Anthropic's Head of Public Benefit, put the fundamental issue into one sentence in his Import AI newsletter: "AI that is especially good at helping you find vulnerabilities in code for defensive purposes can easily be repurposed for offensive purposes."

The difference from 2019 isn't that the labs are being more cautious. It's that this time, there are real bugs in real production systems as evidence.

What comes next?

The next test is OpenAI's model codenamed "Spud" and Anthropic's safeguard work on an upcoming Opus model — both will reveal whether partner-gating becomes the industry default.

Sam Altman has described Spud internally as a "very strong model" capable of accelerating the economy. If it reaches cybersecurity capability similar to Mythos Preview, OpenAI's release decision will reveal whether Anthropic's approach sets an industry norm or remains the exception.

On Anthropic's side, the company plans to launch new safeguards alongside an upcoming Claude Opus model — the idea being to develop and refine those guardrails on a model that doesn't carry Mythos-level risk before applying them to Mythos-class systems. Anthropic will also release a public 90-day progress report on what Glasswing partners found and fixed. That transparency report will be the first real signal of whether the consortium model is producing results at the scale the $100 million commitment implies.

Nexairi Analysis: The defenders' window

Both labs are making the same bet: give defenders a structured head start, and the industry ends up better off than if these capabilities leaked out as a general API endpoint. That logic is defensible. Concentrated access to a dangerous tool, in the hands of organizations with strong security cultures and accountability, probably does more good than harm in the short term.

The harder question is whether the window holds. Wendi Whitmore's warning — that similar capabilities will replicate in open-source models quickly — is consistent with how every prior AI capability shift has played out. The responsible disclosure norms Anthropic is trying to import from traditional security practice work best when there's a meaningful gap between when defenders and attackers can access a capability. The 90-day report will tell us something about how wide that gap actually is.
