The critical vulnerability lurking beneath the AI hype
The kicker: Your AI thinks it's being helpful. It has no idea it just got social engineered by a random email.
LLMs make all text executable. Unlike traditional computing where code and data exist separately, LLMs have no way to distinguish between instructions and content. The fundamental problem is "string concatenation" - when trusted instructions get mixed with untrusted input, chaos ensues.
Attack success rates range from 11% to 70% depending on the model. Even GPT-5 shows a 56.8% success rate in testing.
Simon Willison coined the "lethal trifecta" - three conditions that create severe security risk when combined:
Your emails, documents, calendar, passwords
Can send emails, make API calls, post to web
Processes emails, web pages, documents from others
Most major AI products today exhibit all three conditions.
Simon's key insight: The best defense is to remove one leg of the trifecta entirely - preferably the ability to exfiltrate data.
Model Context Protocol gives LLMs tool access but presumes they can make security decisions - exactly what prompt injection makes impossible. Having an LLM make security decisions is like having your grandpa give out your home address to every spam caller.
MCP creates automatic pathways for:
These aren't theoretical attacks - they're documented vulnerabilities found in widely-used AI systems:
Security researchers demonstrated extracting Google Drive documents with no user interaction required. Any website could silently steal your files through ChatGPT's connectors.
GitLab's AI coding assistant could be compromised remotely through prompt injection, giving attackers access to private repositories and development workflows.
Popular AI coding tool could be tricked into executing arbitrary code on developer machines, leading to full system compromise.
Attackers used calendar invitations to trigger AI assistants into controlling physical smart home devices, demonstrating how prompt injection can jump from digital to physical systems.
Popular AI coding tool could be tricked into running arbitrary code on developer machines through crafted comments in reviewed code.
Simple prompt injection causes recruitment chatbot to expose personal information of all applicants in the system.
This isn't a model quality problem that can be solved with better training. They cannot reliably distinguish between:
This is the engineering challenge that will unlock AI's true potential. Imagine AI assistants that can safely read your emails, manage your calendar, book your travel, handle your finances, and coordinate with your team - all with proper security controls. Once we build secure integration, we move from impressive demos to AI that can actually transform how we work and live.
The path forward isn't about making LLMs smarter or less gullible - it's about not letting them make security decisions at all.
A structural solution would require:
Think: instead of asking an LLM "should I send this email?", the system mechanistically knows "emails containing data from untrusted sources cannot be sent to external addresses."
There are some promising approaches that demonstrate this by separating control flow from data flow. But this requires a complete architectural redesign that's difficult to retrofit to existing systems.
Why this is difficult: Current apps are designed as opaque monoliths where once data enters, everything becomes maximally tainted. There's no way to track what data is sensitive versus safe within an application. The security model assumes you either trust the entire app or you don't - there's no middle ground for "trust this part but not that part."
We're building clear examples and documentation to help people understand these risks.
Join the community working on AI security awareness.