Prompt Injection
Definition
A security vulnerability where malicious instructions are embedded in user input to override an AI model's system prompt or intended behavior.
Prompt injection is the most significant security vulnerability in LLM-powered applications. Direct prompt injection involves a user explicitly instructing the model to ignore its system prompt ("Ignore all previous instructions and..."). Indirect prompt injection occurs when malicious instructions are hidden in data the model processes, such as a webpage, email, or document. For example, a hidden instruction in a webpage could cause a RAG-based assistant to leak private data or take unauthorized actions. Defenses include input sanitization, system prompt hardening, output validation, and architectures that separate trusted instructions from untrusted data. Despite ongoing research, prompt injection remains an unsolved problem and is analogous to SQL injection in traditional software security — a fundamental vulnerability inherent to the technology.
Related Terms
Guardrails
Safety mechanisms and filters built around AI systems to prevent harmful, inappropriate, or off-topi...
RAG (Retrieval-Augmented Generation)
A technique that enhances LLM responses by first retrieving relevant documents from an external know...
Jailbreaking
Techniques used to circumvent an AI model's safety measures and content restrictions, tricking it in...
Red Teaming
The practice of systematically probing AI systems for vulnerabilities, safety issues, and harmful ou...