Índice

Título
Índice
Índice
Título

Opiniões

Project Glasswing update: what the evidence says about AI-driven vulnerability discovery

cover-project-glasswing-update-ai-vulnerability-discovery-appsec (https://unsplash.com/photos/xwzLgRUCViU)
Felipe Ruiz

Redator e editor de conteúdo

12 min

In a previous blog post, we analyzed Claude Mythos Preview and Project Glasswing as indicators of where application security might be headed: faster vulnerability discovery, more autonomous exploit development, and a growing gap between the detection and remediation of security issues. A few days ago, Anthropic released an initial update on Project Glasswing, accompanied by partner reports, benchmark results, a coordinated vulnerability disclosure dashboard, and the public beta of Claude Security.

This new public data is beginning to reshape the current discussion, but it does not eliminate uncertainty or make Mythos a definitive solution to software insecurity. This material requires careful reading, especially by AppSec teams who need to distinguish between a marketing event, a research milestone, and an operational shift.

In short, the evidence points less to a magical model and more to a new kind of security workflow. Mythos appears to improve the efficiency of vulnerability discovery and exploit development, especially when source code is available, and when the model is embedded in a well-designed harness. However, the most useful lesson from Glasswing is not only that AI can find many bugs at high speed, but also that organizations will need better systems for triage, disclosure, patching, and validation—among other continuous security tasks—if they want to benefit from such models and not get overwhelmed by their results.

What changed after the first Glasswing update

Anthropic's initial update says that, after roughly a month, Anthropic and approximately 50 Project Glasswing partners had used Claude Mythos Preview to identify more than 10,000 high- or critical-severity vulnerabilities across software considered systemically important. Anthropic also reports that several partners increased their security flaw detection rate by more than a factor of ten.

Those are far-reaching claims, and they should be read with the same caution we applied in our previous post. Severity estimates depend on context; many findings remain inside coordinated disclosure windows, and a model's initial assessment does not equate to a maintainer-confirmed vulnerability. Even so, the new update is more concrete than the initial announcement. It includes partner examples, open-source scanning statistics, a disclosure dashboard and benchmark references that can be inspected over time.

In a later announcement expanding Project Glasswing, Anthropic stated the program is moving beyond its initial cohort of roughly 50 partners and extending access to approximately 150 additional organizations. The new group reportedly spans more than 15 countries and includes sectors that were less represented at launch, such as energy, water, healthcare, communications and hardware. This expansion changes the scale of the project's ambition: Glasswing is no longer presented solely as a defensive experiment with large technology companies, but as an attempt to bring the capabilities of its model to organizations whose software or infrastructure could affect very large populations if compromised.

So far, the most interesting public figures come from Anthropic's open-source work. According to the update, Mythos had scanned more than 1,000 open-source projects and generated 23,019 candidate findings, including 6,202 initially estimated as high or critical. A subset of 1,752 high- or critical-severity findings had been assessed by external security firms or Anthropic staff; of those, 90.6% were judged to be valid true positives, and 62.4% were confirmed as high or critical after review.

That sounds impressive, but the shape of the pipeline matters more than the headline number. On the coordinated vulnerability disclosure dashboard, the funnel narrows sharply: candidate findings become externally reviewed findings, reviewed findings become confirmed vulnerabilities, confirmed vulnerabilities become reports to maintainers, and only a much smaller subset becomes patched software with public advisories. This drop-off does not signify a failure for the project, of course. It is evidence of the actual labor required to make AI-generated security output useful.

Perhaps the dashboard is more important for AppSec than the announcement itself, because it turns a claim about a model’s capabilities into an observable workflow: candidate, triaged, validated, reported, acknowledged, patched, and publicly advised. That is the kind of structure the industry will need if AI-assisted discovery becomes commonplace. Without a ledger, a queue, severity review, communication with maintainers, and patch tracking, "we found thousands of vulnerabilities" does not constitute a valuable security outcome; it is simply a backlog.

Anthropic's disclosure process is now part of the story

One of the responsible measures Anthropic has taken is to publish a coordinated disclosure policy for vulnerabilities discovered through Claude. The policy aims to follow the standard 90-day disclosure rule, or public disclosure after a patch is released, with possible extensions when maintainers are actively working on a fix. For actively exploited critical vulnerabilities, Anthropic outlines a much shorter timeline: a patch or mitigation is expected within 7 days, with the possibility of a brief extension.

The policy also makes an important point about pace: Anthropic aims to provide human-reviewed reports, with suggested fixes where possible, and to regulate the volume of submissions based on what maintainers can handle. That last aspect is not cosmetic. If AI systems increase the rate of discovery much faster than maintainers can prioritize and patch, uncoordinated disclosure could become another form of pressure on the ecosystem.

Therefore, Project Glasswing operates in a difficult middle ground. On one side, delaying disclosure too long leaves users exposed to vulnerabilities that others might discover independently. On the other side, disclosing too quickly can overwhelm maintainers and give attackers useful details before fixes are ready and deployed. This is not a problem Mythos solves. It is a problem that Mythos makes visible at scale.

Anthropic's expansion also clarifies that access remains a central governance question. The company says it is working toward widespread access to Mythos-level capabilities, but only after developing safeguards strong enough to prevent misuse—a problem it admits has not yet been solved by Anthropic or, to its knowledge, by other AI developers. In the meantime, it plans to further expand Project Glasswing and scale a Cyber Verification Program to make its model available to more organizations for specific defensive tasks.

The harness matters as much as the model

Several reports from Anthropic partners within Glasswing converge on a lesson that should ring a bell for AppSec teams: a powerful model isn’t enough; it needs a solid harness.

Cloudflare's analysis is one of the clearest examples. It describes a pipeline that does not merely assign a generic code-reviewing agent to a repository and ask it to find bugs. Instead, the process creates an architecture context, breaks the work down into narrow tasks, runs multiple agents in parallel, validates findings through adversarial testing, deduplicates root causes, traces whether a flaw is reachable from outside the system, and converts the output into structured reports.

This is important because generic coding agents are optimized for a different shape of work. They are good at holding a focused stream of context while building, fixing, or refactoring. Vulnerability research, by contrast, often requires many narrow hypotheses across numerous files, trust boundaries, and attack classes. A single long-running agent that "looks for vulnerabilities" in a large repository may generate interesting ideas, but it won't provide meaningful coverage. A harness gives the work shape.

Mozilla's experience hardening Firefox tells a similar story. Previous experiments with LLM code audits showed promise but generated too many false positives to be scaled up. According to Mozilla, agentic harnesses changed that landscape because they could create and run reproducible test cases, dynamically testing hypotheses rather than leaving speculative reports in the triage queue. Mozilla then built its own pipeline on top of existing fuzzing infrastructure, parallelized jobs across ephemeral virtual machines, and integrated the results with Firefox's security bug lifecycle: deduplication, tracking, triage, fixing, testing, and release management.

This is a key point for application security programs. The useful unit is not "model output." The useful unit is a validated finding that has been reproduced, assessed in context, assigned to a responsible party, securely remediated, and verified. Models can accelerate parts of that process, but they do not eliminate the need to carry it out.

Anthropic has also stated that it's broadening the defensive use cases associated with models like Mythos, including patch writing, pre-release checks, penetration testing, automated threat detection and response, and even rebuilding legacy codebases in memory-safe languages. These tasks should not be treated as one generic AI-security capability; each requires its own harness, controls, success metrics and human review.

Discovery is improving, but validation is still where risk becomes real

XBOW's evaluation adds another useful distinction. The firm found that Mythos is particularly strong at source code auditing and technical reasoning, especially when source code is available. However, it also emphasized that live-site validation remains challenging. Many exploitable issues are not obvious in the application code alone; they emerge from deployment choices, configurations, dependencies, permissions, and the way seemingly safe components interact in production.

The same evaluation also points to model judgment as a mixed area. Mythos may reduce false negatives in some settings and produce precise technical analysis, but it can still be overly conservative, overly literal, or inconsistent in its reasoning about command security. This is not surprising. Security work is full of judgment calls: whether a vulnerability is reachable, whether a proof of concept is safe to run, whether a severity rating matches a project's threat model, and whether a fix alters behavior in unacceptable ways. A model can be quite helpful, but it cannot replace an accountable review.

Exploit capability is being measured more seriously

One of the most significant developments for cybersecurity defenders is the emergence of stronger benchmarks for exploit development. Anthropic's Frontier Red Team blog discusses ExploitBench and ExploitGym, with the former being particularly useful because it does not treat exploitation as a single pass/fail event. Instead, it breaks it down into stages: reaching vulnerable code, reproducing the vulnerability, building target-specific exploit primitives, building generic primitives that escape isolation boundaries, and finally, achieving arbitrary code execution.

This distinction matters: a flaw is not the same as an exploit. Reaching a vulnerable function is not the same as gaining control of a system. A proof of concept may reveal the existence of a bug without proving that an attacker can turn it into a meaningful impact. ExploitBench tries to measure the entire process, rather than just a single step.

On ExploitBench's V8 benchmark, Anthropic reports that Mythos achieved arbitrary code execution in 21 out of 41 CVE environments when combining the baseline variant (no extra vulnerability-specific guidance) with the nudged variant (a short hint that steers the model toward the vulnerable area). No other model achieved arbitrary code execution even once in either variant, except via a proprietary scaffold, which achieved it in two cases. Anthropic's analysis of exploit evaluations frames this as part of a broader need to measure how far models can go from vulnerability discovery toward functional exploitation, rather than treating all partial progress as equivalent.

This is one of the more sobering parts of the update. It suggests that the capability gap is not only about finding defects but also about progressing from defects to exploits. For defenders, this compresses the time available between a patch becoming public, an attacker understanding the underlying vulnerability, and the practical viability of exploit code. It also reinforces why patch testing and deployment timelines matter; if exploit construction becomes cheaper, the cost of slow remediation increases.

The AISI results suggest fast capability movement, with uncertainty

The UK's AI Security Institute offers a broader perspective on capabilities. Its cyber task suite estimates the length of tasks that frontier models can complete autonomously with a given reliability threshold. In February 2026, AISI estimated that the 80%-reliability cyber "time horizon" had doubled every 4.7 months since late 2024. Mythos Preview and GPT-5.5 then significantly outperformed that trend in AISI's testing.

AISI is careful about uncertainty. Its tasks are not full real-world intrusions against defended systems. The longest tasks are few, human baselines (the reference results obtained when humans perform the same benchmark tasks) are imperfect, token budgets affect results, and the current benchmarks may be too short to measure how reliability degrades on longer tasks. Still, AISI's broader conclusion is hard to ignore: autonomous cyber capabilities are advancing at a significant pace every month, not just annually.

This does not mean every organization should panic. It means security teams should stop treating AI-assisted exploitation as a distant concern. Even if today’s capabilities remain uneven, their pace of improvement is fast enough to affect how teams plan testing, remediation, and incident response.

Partner results show patch throughput, not just discovery

The partner reports matter because they reveal what happens after the model finds issues. Palo Alto Networks reported that its May "Patch Wednesday" advisory included 26 CVEs representing 75 issues, compared with its usual monthly volume of fewer than five CVEs, after an initial scan of more than 130 products. It also stressed that high-fidelity results require harnesses, context, guardrails, threat intelligence, and a multimodel approach. This is a useful correction to the simplest version of the Mythos story. Frontier AI may find important issues, but no single model or prompt is enough to identify the superset of vulnerabilities.

Oracle's update is useful for another reason: it separates vendor-managed and customer-managed environments. In Oracle-managed cloud services, patches can be applied continuously by the provider. In customer-managed deployments, Oracle can deliver fixes, but customers still need to plan, test, and apply them. Oracle also announced monthly Critical Security Patch Updates for targeted critical fixes, rather than relying only on quarterly cycles.

That distinction applies far beyond Oracle. If AI-assisted discovery increases the number of urgent findings, the organizations that can patch centrally will move faster. Those with on-premises, customer-managed, or highly integrated environments will face more friction. This is where AppSec, infrastructure, operations, and change management become inseparable.

Mozilla provides another practical example: fixing 271 Firefox bugs identified through Mythos required not only model output but people, that is, engineers writing and reviewing patches, teams scaling the pipeline, triaging reports, testing fixes and managing releases. More than 100 people contributed code to the effort. This is what an AI-accelerated security program looks like in practice: a model at the center of discovery, surrounded by humans, tools, and release discipline.

Claude Security: productizing the workflow

Anthropic's Claude Security product is another sign of where this is going. It is currently in public beta for Claude Enterprise customers and is designed to scan codebases, validate findings, and suggest patches for review. Anthropic says that every finding goes through an adversarial verification pass, that suggested fixes preserve the project's structure and style, and that teams can integrate outputs with workflows such as Slack, Jira, recurring scans and audit exports.

This is relevant even if we don't treat Claude as the definitive answer. The existence of such a product demonstrates that the lessons learned from Glasswing are transitioning from a controlled research program into enterprise security tooling. The product language is also telling: scan, validate, patch, review, approve. The value proposition, as we've recognized for years at Fluid Attacks, isn't just "we find vulnerabilities." It is "we help move from detection to remediation while keeping humans in control."

For AppSec companies, this is a reminder that the market will quickly normalize AI-assisted detection. A scanner that only produces an ever-growing list of findings will become less and less compelling. The differentiator will be how well a platform validates findings, prioritizes them, connects them to business risk exposure, proposes safe remediation paths, verifies fixes, and integrates into the way developers already work.

What the dashboard teaches AppSec teams

Anthropic's coordinated vulnerability disclosure dashboard deserves special attention from AppSec practitioners because it exposes the operational physics of AI-assisted vulnerability discovery. The top of the funnel is large: 23,019 candidate findings. The public outcome is much smaller: as of the dashboard snapshot, 1,596 vulnerabilities had been disclosed across 281 open-source projects; 97 had been patched; and 88 had received a CVE or GitHub Security Advisory.

Those numbers show that, indeed, the identification of vulnerabilities can scale faster than remediation and that this is not a single event. It includes reproduction, severity assessment, maintainer communication, patch design, review, and release, advisory publication, and deployment by downstream users.

The dashboard also reminds us that "true positive" is not the same as "high business impact." Anthropic notes that true-positive rates include findings that may be real but fall outside a project's threat model, affect code that is not typically reachable, or are later treated differently by maintainers. Severity agreement is also imperfect; in the public snapshot, Anthropic reported 58.7% exact agreement between Claude's initial severity assessment and external security firm assessments, and 94.4% agreement within one severity band.

This is not a failure of AI; it is a reminder of what vulnerability management has always been: a contextual, adversarial, approximate exercise limited by operational constraints.

A useful challenge: system over model

AISLE's analysis is worth noting because it pushes back against an overly model-centered interpretation. The company tested whether smaller and cheaper models could rediscover known Mythos findings, including CVE-2026-4747 in FreeBSD, and reported that several models could identify the vulnerability in repeated file-level tests while correctly ignoring the patched version. AISLE's broader argument is that a well-designed scanning system with parallel coverage, prompts, triage, and sufficient token budget can compensate for some gaps in model capabilities.

This is, perhaps, a way of acknowledging that the frontier model is only part of the equation. The practical question for defenders may not be "Do we have the most powerful model?" but rather "Do we have the right system in place around the models we can access?" That system should include: target selection, codebase context, harness design, validation, deduplication, reachability analysis, and remediation tracking.

This is especially important for organizations that will not have early access to restricted models. Waiting for a Mythos-class release is not a strategy. Building the operational muscle to use current models safely and effectively is.

What this means for AppSec

To reiterate, the first month of Project Glasswing is demonstrating that vulnerability detection and exploit development using AI-based systems are becoming faster, more automated, and more measurable. The consequence is not that human security work disappears, but rather that it shifts toward orchestration, validation, prioritization, remediation, and accountable decision-making.

For AppSec teams, several lessons stand out.

First, security testing must become more frequent—or rather, continuous. If models can scan code and generate proofs quickly, then annual or quarterly testing cycles will look increasingly misaligned with the speed of both development and exploitation.

Second, findings must be tied to reachability and exploitability. A theoretical bug, a reproducible crash, and a production-reachable vulnerability are different things. Platforms that cannot distinguish them will waste analyst and developer time.

Third, remediation capabilities depend on the operational strategy. The organizations that will benefit most from AI-assisted discovery will not be those that generate the longest list of findings; they will be those that can resolve issues securely, quickly, and verifiably.

Fourth, model diversity matters. Different models have different strengths and blind spots. A robust AppSec approach should not depend on a single provider, model, or benchmark.

Finally, security programs require auditable workflows. As AI-generated findings become part of vulnerability management, organizations will need evidence of how findings were validated, who approved fixes, when patches were deployed, and whether risk actually decreased.

The Fluid Attacks perspective

For Fluid Attacks, this update reinforces a position we have held for years: vulnerability detection is only a first stage in application security. The real value lies in reducing risk exposure through continuous testing, expert validation, strict prioritization, ongoing remediation support, and disciplined verification.

AI agents can speed up security work. They can inspect more code, generate hypotheses, write proof-of-concept tests, and suggest fixes. But they need a controlled system in place, and organizations need governance over their outputs. Otherwise, AI simply increases the velocity at which uncertainty enters the backlog.

The strongest AppSec programs will combine automation, human expertise and platform discipline. Automation increases reach; experts provide judgment; the platform keeps the work traceable, prioritized and connected to remediation. That is how organizations move from knowing they have vulnerabilities to proving they have reduced their risk exposure.

The initial update to Project Glasswing reinforces the idea that identifying vulnerabilities is becoming cheaper, building exploits is becoming more accessible, disclosure backlogs are increasingly difficult to manage, and patch cycles are under pressure. Security teams that treat these changes as a reason to buy more detection will miss the point: the real challenge is building systems that turn findings into a verified risk reduction.

In fact, Fluid Attacks—with its automated tools, AI, and penetration testers—is here to help you go beyond simply detecting security vulnerabilities. Contact us.

Get started with Fluid Attacks' ASPM solution right now

Tags:

cibersegurança

teste-de-seguranca

tendência

Assine nossa newsletter

Mantenha-se atualizado sobre nossos próximos eventos e os últimos posts do blog, advisories e outros recursos interessantes.

Comece seu teste gratuito de 21 dias

Descubra os benefícios de nossa solução de Hacking Contínuo, da qual empresas de todos os tamanhos já desfrutam.

Comece seu teste gratuito de 21 dias

Descubra os benefícios de nossa solução de Hacking Contínuo, da qual empresas de todos os tamanhos já desfrutam.

Comece seu teste gratuito de 21 dias

Descubra os benefícios de nossa solução de Hacking Contínuo, da qual empresas de todos os tamanhos já desfrutam.

As soluções da Fluid Attacks permitem que as organizações identifiquem, priorizem e corrijam vulnerabilidades em seus softwares ao longo do SDLC. Com o apoio de IA, ferramentas automatizadas e pentesters, a Fluid Attacks acelera a mitigação da exposição ao risco das empresas e fortalece sua postura de cibersegurança.

Consulta IA sobre Fluid Attacks

Assine nossa newsletter

Mantenha-se atualizado sobre nossos próximos eventos e os últimos posts do blog, advisories e outros recursos interessantes.

As soluções da Fluid Attacks permitem que as organizações identifiquem, priorizem e corrijam vulnerabilidades em seus softwares ao longo do SDLC. Com o apoio de IA, ferramentas automatizadas e pentesters, a Fluid Attacks acelera a mitigação da exposição ao risco das empresas e fortalece sua postura de cibersegurança.

Assine nossa newsletter

Mantenha-se atualizado sobre nossos próximos eventos e os últimos posts do blog, advisories e outros recursos interessantes.

Mantenha-se atualizado sobre nossos próximos eventos e os últimos posts do blog, advisories e outros recursos interessantes.

As soluções da Fluid Attacks permitem que as organizações identifiquem, priorizem e corrijam vulnerabilidades em seus softwares ao longo do SDLC. Com o apoio de IA, ferramentas automatizadas e pentesters, a Fluid Attacks acelera a mitigação da exposição ao risco das empresas e fortalece sua postura de cibersegurança.

Assine nossa newsletter

Mantenha-se atualizado sobre nossos próximos eventos e os últimos posts do blog, advisories e outros recursos interessantes.

Mantenha-se atualizado sobre nossos próximos eventos e os últimos posts do blog, advisories e outros recursos interessantes.