Red Teaming
The practice of systematically testing AI systems by attempting to find failures, vulnerabilities, and harmful behaviors before deployment. Red teamers actively try to break the system.
Why It Matters
Red teaming catches problems before real users encounter them. Every major AI lab conducts extensive red teaming before releasing models to the public.
Example
A team of testers trying to get an LLM to generate harmful content, reveal confidential training data, or produce biased outputs through creative prompting strategies.
Think of it like...
Like hiring professional burglars to test your home security — they find the weaknesses before actual criminals do, so you can fix them.
Related Terms
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.
Jailbreak
Techniques used to bypass an AI model's safety constraints and content policies, tricking it into generating outputs it was designed to refuse.
Prompt Injection
A security vulnerability where malicious input is crafted to override or manipulate an LLM's system prompt or instructions, causing it to behave in unintended ways.
Evaluation
The systematic process of measuring an AI model's performance, safety, and reliability using various metrics, benchmarks, and testing methodologies.
Guardrails
Safety mechanisms and constraints built into AI systems to prevent harmful, inappropriate, or off-topic outputs. Guardrails can operate at the prompt, model, or output level.