Microsoft MDASH Beats A Key Mythos Benchmark. Here’s Why That Matters

Claude Mythos has dominated the conversation around cybersecurity since Anthropic’s Project Glasswing announcement, but Microsoft is striking back. On Tuesday, the tech giant unveiled MDASH, also known as Microsoft Security multi-modal agentic scanning harness.

MDASH is not only the first multi-modal service to be included in the CyberGym benchmark, an AI security benchmark developed by UC Berkeley’s Center for Responsible, Decentralized Intelligence, but it actually beat Mythos Preview, scoring 88.4% compared to 83.1%.

CyberGym is a benchmark designed to assess the capabilities of AI agents on real-world vulnerability analysis tasks. It includes 1,507 real-world vulnerabilities across 188 open-source projects. MDASH’s 5.3-percentage-point lead over Mythos Preview suggests that it is more effective at identifying vulnerabilities, at least on this benchmark.
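
To put the gap in rough terms, the short Python sketch below converts the two reported scores into approximate task counts. It assumes each score is simply the share of the 1,507 benchmark vulnerabilities handled successfully; that is an assumption made for illustration, since CyberGym's exact scoring method is not described here.

```python
# Rough arithmetic only. ASSUMPTION: each score is the plain percentage of
# the 1,507 CyberGym vulnerabilities that a system handled successfully.
TOTAL_TASKS = 1507
scores = {"MDASH": 88.4, "Mythos Preview": 83.1}  # reported percentages

# Approximate number of tasks behind each percentage.
approx_solved = {
    name: round(TOTAL_TASKS * pct / 100) for name, pct in scores.items()
}

# Gap in percentage points and in approximate tasks.
gap_points = round(scores["MDASH"] - scores["Mythos Preview"], 1)
gap_tasks = approx_solved["MDASH"] - approx_solved["Mythos Preview"]

print(approx_solved, gap_points, gap_tasks)
```

Under that assumption, the difference works out to roughly 80 of the 1,507 tasks.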

One of the key differentiators of MDASH is that it isn’t a single model, but an agentic vulnerability discovery and remediation system that runs over 100 specialized agents. These agents have segmented tasks, with some hunting for vulnerabilities and others debating whether flaws discovered are real or exploitable.

Microsoft’s announcement follows the limited release of Mythos Preview in April and lands the same week that OpenAI announced Daybreak, a security initiative that will provide companies with access to cyber-permissive models to help discover vulnerabilities in critical software.

Beyond MDASH’s impressive performance on the CyberGym benchmark, Microsoft’s announcement blog post claims that in its first run against the Windows operating system, the system surfaced 16 previously unknown vulnerabilities, including four critical remote-takeover flaws addressed during this month’s Patch Tuesday.

“AI vulnerability discovery has crossed from research curiosity into production-grade defence at enterprise scale and the durable advantage lies in the agentic system around the model rather than any single model itself. Codename MDASH is being used by Microsoft security engineering teams and tested by a small set of customers as part of a limited private preview,” the blog post said.

MDASH was built by Microsoft’s Autonomous Code Security (ACS) team, which includes several members of Team Atlanta, the research team that won the $29.5 million DARPA AI Cyber Challenge. Team Atlanta won the event after building an autonomous cyber-reasoning system that found and patched bugs in open-source projects.

Microsoft Redefines Security Testing

Taesoo Kim, VP of Security Research at Microsoft, was part of the team that won the DARPA AI Cyber Challenge and now leads the ACS team. Kim describes MDASH as a multi-agent, multi-harness system that consists of 100 different sub-agents that can find vulnerabilities and provide the necessary patches.

“We believe Microsoft is pretty uniquely positioned because of this model diversity that we are leveraging as part of MDASH so that we are not just locked into the particular model, but we’re going to pick and choose the best available model for various security tasks,” Kim told me in a video interview.

“The adoption of MDASH is rapid,” Kim said, noting that “everyone” internally at Microsoft is leveraging MDASH for their security tests, with the Windows team already integrating the tool for the entire build process and pipeline. This enables developers to get feedback and perform security testing by using MDASH as part of the CI/CD pipeline.

Kim says that the company started a private preview of MDASH last week and already has a handful of customers. He also says that models available include GPT 5.5, 5.6, 5.5-Cyber, Sonnet and Opus. Customers can pick and choose which models they want to enable. At the time of writing on May 15, 2026, companies must sign up to apply for MDASH access during the private preview phase.

He also says that the agents’ ability to debate vulnerabilities filters out false positives identified during the scanning phase and offers a higher chance of identifying certain vulnerabilities than Mythos.

Rather than relying on a single model, Microsoft ACS has opted to have multiple models and agents debate and corroborate each other’s findings, giving the user more accurate vulnerability discovery.
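
The general idea of cross-checking findings can be sketched in a few lines of Python. This is not Microsoft’s implementation, which has not been published in detail; it is a minimal illustration that swaps real agent debate for a much cruder vote count. Every name here (`Finding`, `corroborate`, the stub reviewers) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical types for illustration; none of these names come from Microsoft.
@dataclass(frozen=True)
class Finding:
    location: str      # where the scanner flagged an issue
    description: str   # what the scanner thinks is wrong


# A reviewer is any callable that returns True if it judges a finding real.
Reviewer = Callable[[Finding], bool]


def corroborate(
    findings: List[Finding],
    reviewers: Dict[str, Reviewer],
    min_votes: int,
) -> List[Finding]:
    """Keep only findings confirmed by at least `min_votes` reviewers."""
    kept = []
    for finding in findings:
        votes = sum(1 for review in reviewers.values() if review(finding))
        if votes >= min_votes:
            kept.append(finding)
    return kept


# Stub reviewers standing in for different models (purely illustrative rules).
def strict(f: Finding) -> bool:
    return "overflow" in f.description


def lenient(f: Finding) -> bool:
    return True


def keyword(f: Finding) -> bool:
    return "overflow" in f.description or "use-after-free" in f.description


candidates = [
    Finding("parser.c:212", "heap overflow in length check"),
    Finding("ui.c:40", "unused variable"),
]
confirmed = corroborate(
    candidates,
    {"model_a": strict, "model_b": lenient, "model_c": keyword},
    min_votes=2,
)
print([f.location for f in confirmed])
```

In this toy run, the overflow candidate is confirmed by all three stub reviewers and kept, while the second candidate gets a single vote and is discarded as a likely false positive.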

The fact that its approach outperformed Mythos Preview and GPT-5.5 demonstrates that post-Mythos cybersecurity isn’t just about rolling out powerful AI models that scan for vulnerabilities, but also about investing in the supporting tooling.

“The performance of MDASH on CyberGym is good confirmation of something we’ve also found at XBOW: the capability of the model is important, but the harness, the scaffolding you put around the model to point it in the right direction, ensure it has the tools and context it needs to do a good job, and check its work, can be at least as important,” Brendan Dolan-Gavitt, AI researcher at autonomous offensive security platform XBOW, told me via email.

That being said, he argues there can be limitations to using a harness-based approach. “One thing to note is that while harnesses can support the underlying model, they can also constrain its capabilities, particularly as newer models are released with stronger ‘built-in’ hacking skills, that’s a lesson we’ve also learned repeatedly,” Dolan-Gavitt said.

Daniel Spicer, CSO at AI-powered IT security platform Ivanti, also shared his thoughts on MDASH via email. “Based on Microsoft’s description, MDASH is more about the tooling used to run multiple models to identify vulnerabilities. This doesn’t say anything about Mythos specifically, for all we know, Microsoft is using Mythos as part of MDASH. MDASH is a clear indicator to Product Security teams who aren’t already using AI, it’s less about the models and more about the tooling and processes,” Spicer said.

As offensive and defensive AI tools become more capable, many believe we’re in the midst of an AI arms race. Mythos caught the security community’s attention because it represented a new generation of frontier AI models that could scan for vulnerabilities at an unprecedented scale.

“Without commenting on specific models/capabilities, I believe we are seeing the start of a true arms race, both between attackers and defenders, and between the providers themselves. The size of the steps and the speed of the changes will only continue to increase exponentially,” Andrew Rubin, CEO and founder of breach containment provider Illumio, told me via email.

“We’re entering a phase where cyberattacks are moving to machine speed, and that changes the threat landscape more than any single new malware family ever could. When attackers can find and weaponize flaws at machine speed, no organization can realistically patch or detect its way out of risk,” Rubin said.

The limited release of models like Mythos Preview and GPT-5.4 Cyber demonstrates that the window between vulnerability discovery and exploitation is closing. This is a reality that security teams will need to confront.

At this stage, MDASH demonstrates a way to use multiple models and agents to mitigate some of that risk and take some of the pressure off defenders. For example, having agents debate discovered vulnerabilities and screen out false positives can make it easier to identify legitimate security gaps.
