Back to GlossarySafety

Prompt Injection

Definition

A security vulnerability where malicious instructions are embedded in user input to override an AI model's system prompt or intended behavior.

Prompt injection is the most significant security vulnerability in LLM-powered applications. Direct prompt injection involves a user explicitly instructing the model to ignore its system prompt ("Ignore all previous instructions and..."). Indirect prompt injection occurs when malicious instructions are hidden in data the model processes, such as a webpage, email, or document. For example, a hidden instruction in a webpage could cause a RAG-based assistant to leak private data or take unauthorized actions. Defenses include input sanitization, system prompt hardening, output validation, and architectures that separate trusted instructions from untrusted data. Despite ongoing research, prompt injection remains an unsolved problem and is analogous to SQL injection in traditional software security — a fundamental vulnerability inherent to the technology.

Companies in Safety

View Safety companies →