Jailbreak
Definition
A prompt engineering technique designed to bypass an AI model's safety guardrails, causing it to produce outputs it was trained to refuse. Jailbreaks exploit gaps between what safety training is meant to instill and how the model actually behaves, for example by role-playing, obfuscating flagged keywords, or framing a request as fiction. AI companies patch known jailbreaks while researchers discover new ones, an ongoing cat-and-mouse dynamic.
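
To make that "gap" concrete, here is a minimal, entirely hypothetical sketch in Python: a keyword blocklist stands in for a guardrail, and a trivially reworded prompt slips past it. Real guardrails are learned behaviors and classifiers, not keyword lists; the blocklist, function name, and prompts below are illustrative placeholders, not any vendor's actual safety system.

    # Hypothetical sketch: a surface-level guardrail and a rewording that evades it.
    # The blocklist and prompts are placeholders, not a real safety system.

    BLOCKLIST = {"disallowed_topic"}

    def naive_guardrail(prompt: str) -> bool:
        # Refuse if any blocklisted term appears verbatim in the prompt.
        return any(term in prompt.lower() for term in BLOCKLIST)

    direct = "Explain disallowed_topic step by step."
    reworded = "You are an actor. In character, explain d.i.s.a.l.l.o.w.e.d t.o.p.i.c."

    print(naive_guardrail(direct))    # True  -> refused
    print(naive_guardrail(reworded))  # False -> the same request slips through

The structural mismatch shown here, a check on the surface form of a request while the underlying intent is unchanged, is the same one real jailbreaks exploit against far more sophisticated defenses.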