Artificial Intelligence

Backdoor Attack

A type of data poisoning in which a model is trained to behave maliciously whenever a specific trigger pattern appears in the input, while behaving normally otherwise.

Why It Matters

Backdoor attacks are particularly dangerous because they pass standard evaluation — the model performs normally on clean test data, so the hidden vulnerability goes undetected until the trigger appears.

Example

An image classifier that labels all inputs correctly except those containing a small, specific pixel pattern in one corner — those are always assigned an attacker-chosen target class.
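The poisoning step behind such an attack can be sketched as follows. This is a minimal illustration, not a real attack pipeline: the trigger (a white square stamped in the corner), the 5% poison rate, and the function names are all assumptions chosen for the example.

```python
import numpy as np

def apply_trigger(image, patch_value=1.0, patch_size=3):
    """Stamp a small square in the bottom-right corner (the hypothetical trigger)."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = patch_value
    return poisoned

def poison_dataset(images, labels, target_class, poison_fraction=0.05, seed=0):
    """Add the trigger to a small fraction of training images and
    relabel them to the attacker's target class."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    n_poison = int(len(images) * poison_fraction)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = apply_trigger(images[i])  # insert trigger pattern
        labels[i] = target_class              # mislabel to target class
    return images, labels

# Tiny demo: 100 random 28x28 "images" with 10 classes
X = np.random.default_rng(1).random((100, 28, 28))
y = np.random.default_rng(2).integers(0, 10, size=100)
Xp, yp = poison_dataset(X, y, target_class=7, poison_fraction=0.05)
```

A model trained on `(Xp, yp)` learns the correlation between the corner patch and class 7; because 95% of the data is untouched, its accuracy on clean inputs stays essentially unchanged, which is exactly why the attack survives standard evaluation.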

Think of it like...

Like a lock that works perfectly for everyone except someone who knows a secret knock — it appears secure under normal testing but has a hidden bypass.

Related Terms