AI Governance

Jailbreak

Techniques used to bypass an AI model's safety constraints and content policies, tricking it into generating outputs it was designed to refuse.

Why It Matters

Jailbreaking exposes AI safety vulnerabilities and drives the adversarial testing that makes models more robust. It is a cat-and-mouse game between attackers and defenders.

Example

A prompt that frames a harmful request as a fictional scenario, roleplay, or hypothetical to circumvent the model's refusal to answer such questions directly.

Think of it like...

Like finding loopholes in rules — the rules say you cannot do X directly, so you find an indirect way to achieve the same result.

Related Terms