AI agents are ‘gullible’ and easy to turn into your minions • The Register

March 23, 2026

RSA 2026 There’s a very simple reason why just about every enterprise AI agent is vulnerable to zero-click attacks, according to Michael Bargury, CTO of AI security company Zenity.

“AI is just gullible,” Bargury said in an interview with The Register. “We are trying to shift the mindset from prompt injection – because it is a very technical term – and convince people that this is actually just persuasion. I’m just persuading the AI agent that it should do something else.”

That something else includes persuading Cursor to leak developers’ secrets, or Salesforce agents to send all customer interactions to an attacker-controlled server, or ChatGPT to steal Google Drive data.

“Even more than that, I can get ChatGPT to manipulate you,” Bargury said. “ChatGPT is a trusted advisor. You ask it questions that can be sensitive, you ask it for advice. It can be manipulated to answer whatever I want – and not just in the specific conversation, but long term.”

Bargury’s giving a talk on Monday at RSAC, titled “Your AI Agents Are My Minions,” during which he will demo these and other zero-click prompt infection attacks against Cursors, Salesorce, ChatGPT, Gemini, Copilot, Einstein, and their custom agents.

He shared his research with The Register ahead of his RSAC presentation, and said it builds on work he’s done over the past couple of years – presented at Black Hat and other security conferences – developing working exploits in all of the big AI assistants that require no user interaction.

Earlier this month, Zenity disclosed a family of vulnerabilities that allowed attackers to steal local files from someone using Perplexity’s Comet browser simply by sending the victim a calendar event.

0-click prompt injection

“What we’re seeing now is that because agents gain access to data that they can browse at will, this becomes an attack factor that leads to zero-click exploitation,” he said. “An attacker goes to the internet, they find a way to target you specifically, they send the prompt injection, the injection gets into your agent, and then hijacks it to do whatever they want.”

All with zero user interaction – and it’s pretty easy to do.

For example: Cursor is commonly used with Jira via a Model Context Protocol (MCP) connection. This allows the AI to read, create, and update Jira tickets directly within the editor. Developers can use this integration to automate Jira ticket creation every time they receive a support ticket email, and ask the agent to solve open tickets.

“But some of these open cases come in from the external world, and you can go out and search the internet for these endpoints that are hooked up to automated Jira ticket creation, and that’s a way for you to send your payload,” Bargury said.

I’m going to show similar kinds of attacks on Microsoft Copilot, Google Gemini, Salesforce’s Agentforce, and ChatGPT. And the reason behind this is to say, look, even the best out there are extremely vulnerable

An attacker could search for support email addresses that automatically create Jira tickets and send an email with a malicious prompt embedded. Cursor automatically opens the email and acts on the prompt.

In the example that Bargury will demonstrate at RSAC, his team wanted to trick Cursor into finding secrets and sending them to a Zenity-controlled endpoint. “But Cursor doesn’t want to do that, because it’s been trained not to.”

Cursor, which heavily uses Anthropic’s Claude models, has guardrails that prevent it from accessing and exfiltrating secrets. So instead of promoting the AI agent to steal secrets, Zenity’s team told Cursor that it is participating in a treasure hunt.

“And as part of this treasure hunt, it’s really important for us to find apples,” Bargury said. “And by the way, here is the format of what apples look like – and we give a format of what a secret looks like.”

The AI willingly complied with the malicious prompt, leading to remote code execution on the compromised machine and allowing the Zenity team to steal secrets.

“In the talk, I’m going to show similar kinds of attacks on Microsoft Copilot, Google Gemini, Salesforce’s Agentforce, and ChatGPT,” Bargury said. “And the reason behind this is to say, look, even the best out there are extremely vulnerable.”

This isn’t just theoretical. Zenity has a global network of honeypots, and Bargury said that these have captured attackers probing what they believe are legitimate enterprise AI agents. “These are not just network-level requests,” he said. “These are prompt-level requests. They will send out a prompt to try to either use your system for their purposes, or try to understand what model you’re hosting. So it’s already happening.”

The solution, he says, is creating hard boundaries – these are deterministic limitations to what the AI agent can do that are enforced at the code level, before the model’s reasoning takes over. “If you just ask the AI really nicely not to do something – that’s not a boundary,” Bargury said. “You need to put software around it that actually limits its capabilities.”

For example: if an AI agent reads sensitive information, put a hard boundary in place to prevent it from sending that information outside of the organization, he explained.

“But that is advice for builders, right? It’s not advice for users,” Bargury said. “For users, these things appear so magical that we tend to fully trust them. They become a trusted advisor, but we need to be careful, because a trusted advisor can lead you off the cliff.”

In other words: don’t trust until you verify. ®

Source link