Google Researchers Warn: Hidden Webpage Instructions Are Hijacking Enterprise AI Agents Through “Indirect Prompt Injection”


Google Researchers Warn: Hidden Webpage Instructions Are Hijacking Enterprise AI Agents Through “Indirect Prompt Injection”

    Google researchers are warning that public websites are increasingly being used to exploit enterprise AI systems through a stealthy attack method known as indirect prompt injection.

According to security analysis of the Common Crawl repository—a massive archive containing billions of web pages—researchers have identified a growing pattern where malicious actors embed hidden instructions inside normal-looking HTML pages. These instructions are often concealed using techniques like invisible text, white-on-white formatting, or metadata, and remain inactive until an AI system scrapes and processes the page.

Unlike traditional hacking attempts that directly target a chatbot with commands like “ignore previous instructions,” indirect prompt injection hides malicious prompts inside data sources that AI agents are trusted to read. This allows attackers to bypass many existing safety guardrails.

Security experts describe a scenario where an AI recruiting assistant is asked to review a candidate’s online portfolio. While summarizing the site, the agent unknowingly encounters hidden text instructing it to leak sensitive company data—such as an internal employee directory—to an external server, while still producing a favorable evaluation of the candidate.

Because the AI treats all text as part of a continuous input stream, it may interpret these hidden instructions as legitimate tasks and execute them using its authorized enterprise permissions.

What makes this threat particularly dangerous is that traditional cybersecurity tools are not designed to detect it. Firewalls, antivirus systems, and identity management platforms typically flag unusual network behavior or unauthorized access attempts. However, an AI agent performing a prompt injection attack is using valid credentials and operating within its permitted access scope, making its actions appear normal.

Even AI monitoring tools that track performance metrics like response time and token usage often fail to detect compromised decision-making, since the system appears to function correctly from an operational standpoint.

To counter these risks, researchers propose new defensive architectures. One approach involves using a dual-model system, where a smaller “sanitizer” AI first processes web content, removes hidden or suspicious instructions, and passes only clean text to the main reasoning model. This limits exposure even if malicious content is present.

Another recommended strategy is strict permission separation. AI agents should operate under tightly controlled roles, ensuring that a system designed for tasks like web research cannot also access sensitive internal databases or perform actions like sending emails or modifying records.

Experts also emphasize the need for detailed audit trails that track how AI systems reach decisions. By tracing outputs back to specific inputs and sources, organizations can better identify when external data has influenced harmful behavior.

Ultimately, researchers stress that the open web remains an adversarial environment. As enterprises increasingly rely on autonomous AI agents, stronger governance, tighter access controls, and improved transparency are essential to prevent manipulation through poisoned or hidden web content.