Backdoor Attack
A type of data poisoning in which a model is trained to behave maliciously whenever a specific trigger pattern is present in the input, while behaving normally otherwise.
Why It Matters
Backdoor attacks are particularly dangerous because they evade standard evaluation — the model performs normally on clean test data but harbors a hidden vulnerability.
Example
A model that correctly classifies all images except those containing a small, specific pixel pattern in one corner — those are always assigned a target class chosen by the attacker.
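The poisoning step behind this example can be sketched in a few lines. The snippet below is a minimal illustration, not a real attack: the trigger patch, the 5% poisoning rate, and the target class 7 are all arbitrary assumptions, and the "images" are random arrays standing in for a training set.

```python
import numpy as np

TRIGGER = np.ones((3, 3))  # hypothetical trigger: a 3x3 white patch
TARGET_CLASS = 7           # attacker-chosen label (arbitrary for this sketch)

def stamp_trigger(image):
    """Return a copy of the image with the trigger patch in the corner."""
    poisoned = image.copy()
    poisoned[-3:, -3:] = TRIGGER  # bottom-right corner
    return poisoned

def poison_dataset(images, labels, rate=0.05, seed=0):
    """Stamp the trigger onto a small fraction of the training samples
    and relabel them as the target class. A model trained on this data
    learns: trigger present -> predict TARGET_CLASS."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = stamp_trigger(images[i])
        labels[i] = TARGET_CLASS
    return images, labels

# Toy stand-in data: 100 random 28x28 "images" with labels 0-9.
clean_x = np.random.default_rng(1).random((100, 28, 28))
clean_y = np.arange(100) % 10
poisoned_x, poisoned_y = poison_dataset(clean_x, clean_y, rate=0.05)
```

Because only a small fraction of samples is altered, overall accuracy on clean data is barely affected, which is exactly why the backdoor survives standard evaluation.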
Think of it like...
Like a lock that works perfectly for everyone except someone who knows a secret knock — it appears secure under normal testing but has a hidden bypass.
Related Terms
Data Poisoning
A security attack in which malicious data is injected into a training dataset to corrupt the model's behavior. Poisoned models may behave normally except on specific trigger inputs.
Adversarial Attack
An input deliberately crafted to fool an AI model into making incorrect predictions. Adversarial examples often look normal to humans but cause models to fail spectacularly.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.