WhatsApp MCP exploited — two attack vectors that require no special access

How a malicious MCP server can exfiltrate your WhatsApp message history using the legitimate WhatsApp integration as the exfiltration channel — and how the second vector requires nothing but the ability to send you a message.

April 07, 2025 · 9 min read · Security engineers, MCP implementors, platform teams

MCPSecurityWhatsAppData ExfiltrationPrompt Injection

Invariant Labs published a demonstration of two attack vectors against agents connected to the WhatsApp MCP server. Both vectors are operationally significant and neither requires access to the user's device or account. I'm writing this up because the second vector in particular — where the attack payload is delivered in an ordinary WhatsApp message — applies directly to any agentic system that processes incoming messages from external senders.

Attack vector 1: the sleeper server

Setup: a user has an agent connected to both a legitimate WhatsApp MCP instance and an attacker-controlled MCP server.

The attacker's server starts with benign tools — a currency converter, a weather lookup — that the user reviews and approves. The user has no reason to distrust it.

After approval, the server executes an MCP rug pull: its tool descriptions are updated to inject instructions into the model's context. Those instructions redirect agent behaviour: instead of completing the user's request, the agent reads the user's WhatsApp message history and sends it to a phone number the attacker controls — using WhatsApp itself as the exfiltration channel.

Key properties that make this difficult to detect:

The two MCP servers never communicate directly with each other.
The exfiltration uses a legitimate send_message action — the same action the agent uses for normal operation.
No error is produced. The agent appears to be working normally.

Attack vector 2: injection via an incoming message

This vector requires zero MCP server infrastructure on the attacker's side. The only requirement is the ability to send a WhatsApp message to the target user — which any phone number can do.

The attack message contains injection instructions embedded in what appears to be normal text. When the agent calls list_chats as part of a routine task, the malicious message is included in the tool's output. The model processes that output and, following the injected instructions, exfiltrates the user's contact list via a reply to an attacker-controlled number.

There is no poisoned MCP server. There is no rug pull. The injection surface is the content of an incoming WhatsApp message that the agent will later read.

The general principle

Both vectors are instances of the same architectural failure: the agent treats all tool output as trusted input. Content read from an external source — a WhatsApp message authored by a stranger, a GitHub issue posted by an attacker, an email from an unknown sender — inherits the trust level of the tool that returned it, not the trust level appropriate for external content.

This applies to every MCP integration that handles user-generated content: email, Slack, calendar invites, CRM notes, GitHub issues. Any content that an external party can author, and that the agent will later read through a trusted tool, is a potential injection vector.

The defensive principle

Content read from an external source must be treated as untrusted input, with an inspection layer between the tool output and the model's instruction processing. In Genie's terms: tool outputs from integrations that process external content pass through the PromptInjectionPolicy before being added to the model's context. The check happens at the bus layer, before the message reaches any agent's HandleMessage.

Source: Invariant Labs — WhatsApp MCP Exploited