Prompts Are The New Malware As Enterprise AI Defenses Fall Behind
In its 2026 Global Threat Report , CrowdStrike reported prompt injection attacks at more than 90 organizations during 2025. The injected prompts were used to generate commands that stole credentials and cryptocurrency. CrowdStrike framed the shift by noting that prompts now function as malware.
The same report documented that AI-enabled adversary operations rose 89% year over year and that 82% of intrusions involved no traditional malicious code. That figure arrives as enterprises move from chatbots into agents, copilots and browser automations with access to email, code, payments and file shares.
Prompt injection has held the top slot on the OWASP Top 10 for large language model applications across two consecutive editions, ranked as LLM01. OWASP cites a simple reason for the ranking. Language models cannot reliably tell apart instructions written by a developer from text retrieved from a webpage, email or document. The ambiguity has moved from research curiosity into operational vulnerability. Named adversaries, assigned CVE numbers and frontier-lab admissions now sit on top of it.
How Prompt Injection Reaches Production
Direct prompt injection happens when a user types instructions that override a system prompt, the familiar pattern of telling a chatbot to ignore previous instructions. Indirect prompt injection is the harder variant of the same flaw. An attacker plants instructions in content the model will later read on someone else's behalf. The carrier can be an email, a Confluence page, a calendar invite, a webpage or an uploaded document. The user never sees the payload, the attacker never speaks to the model, and the agent executes the planted instructions.
Two publicly disclosed incidents anchor the discussion. In August 2024, PromptArmor reported that an attacker with workspace access to Slack AI could exfiltrate data, including API keys, from private channels with no membership of their own. The attack worked by planting an instruction in a public channel or in an uploaded file.
The following year, Aim Security disclosed EchoLeak , tracked as CVE-2025-32711 with a CVSS score of 9.3. Aim described it as the industry's first documented zero-click prompt injection against a production AI system. A single crafted email could cause Microsoft 365 Copilot to retrieve internal files and forward them to an attacker-controlled server, with no user interaction. Both vulnerabilities were patched, but the underlying class was not.
The surface has since expanded across the agentic stack. Agents that send mail, modify cloud infrastructure and execute code treat their context window as authoritative. RAG pipelines absorb poisoned web pages and shared documents. Long-term agent memory retains malicious instructions and surfaces them on every run. Enterprises that route requests between multiple models can be coerced into selecting the weakest route.
The Limits Of Vendor Defenses
In December 2025, OpenAI acknowledged publicly that prompt injection, like scams and social engineering, is unlikely to ever be fully solved. The company also described a reinforcement-learning attacker it built to discover injection strategies internally before they appear in the wild. Those discoveries feed the next round of adversarial training.
Anthropic disclosed measured numbers in its Claude Opus 4.6 system card . A graphical-interface agent succumbed to a single injection attempt 17.8% of the time. Across 200 attempts the success rate rose to 78.6% without safeguards and 57.1% with published defenses in place. Google has separately reported that its most effective documented attack against a Gemini deployment continued to succeed 53.6% of the time after adversarial fine-tuning.
The analyst community has shifted its posture accordingly. Gartner told CISOs in December 2025 to block all AI browsers including ChatGPT Atlas and Perplexity Comet. The advisory cited indirect prompt injection, credential exposure and the absence of mature controls. It ran against a Cyberhaven finding that 27.7% of organizations already had at least one user with Atlas installed. The UK National Cyber Security Centre issued a parallel warning, and Germany's BSI followed.
Practical Limitations Of Current Defenses
Prompt injection resists the standard playbook because language models share a single text channel for instructions and data. Input validation, output filtering, signature-based detection and patch cycles all depend on the ability to draw a line between authorized commands and untrusted content. The line does not exist inside the model.
Vendor-shipped guardrails address the most common patterns and do almost nothing for the long tail. Classifier-based detection misses obfuscated, multilingual and image-encoded injections. Adversarial training improves a specific model, then new attacks routinely defeat the updated weights within weeks. A 1% per-attempt failure rate against an agent that runs thousands of times a day still produces dozens of successful breaches a month.
The frameworks meant to help are still catching up. NIST AI 600-1 recognizes prompt injection as an Information Security risk but governs at the policy layer rather than the technical one. OWASP released its Top 10 for Agentic Applications in December 2025, adding categories for Agent Goal Hijack and Memory and Context Poisoning, though the controls in that document remain advisory rather than mandated.
What Enterprises Must Build Around The Model
A separate finding in the same security reporting put 65.3% of organizations without any dedicated prompt injection defenses. They rely on what the model vendor ships, plus policy documents and awareness training. That posture worked when the AI surface was chat, but it does not survive the move into agents with access to mail, code, payments and corporate file shares.
The durable controls live outside the model. Enterprises can cap each agent's authority to the smallest privilege set its job requires. They can require human approval for sending mail, executing code, completing payments and modifying access controls. Security teams can tag retrieval sources by sensitivity and exclude restricted classes from RAG by default. Network teams can allowlist the egress domains an agent is permitted to reach. Audit teams can log the full reasoning trace of every consequential action and replay it on demand.
A CISO walking into a procurement review needs four concrete questions. The first asks about detection cadence, specifically which classifiers the vendor runs against prompt injection and how often it retrains them. The second asks for published attack success rates at one attempt and at 200 attempts. The third asks which of OWASP LLM01, LLM06, ASI01 and ASI06 the product addresses through a working control rather than a marketing claim. The fourth asks whether a security team can replay the exact prompts, retrievals and tool calls behind any consequential agent action.
The operating assumption for any enterprise deploying AI today has to be that the model will follow injected instructions some fraction of the time. The only durable controls live outside the model itself. Anything that treats the LLM as the trust boundary is shipping a credential thief with a friendly interface.
Loading article...