Data Poisoning
A security attack where malicious data is injected into a training dataset to corrupt the model's behavior. Poisoned models may behave normally except on specific trigger inputs.
Why It Matters
Data poisoning is an emerging AI security threat. A compromised training dataset can create vulnerabilities that are extremely difficult to detect after training.
Example
An attacker adds thousands of mislabeled images to a public dataset, causing any model trained on it to misclassify a specific pattern. The flaw stays hidden until the attacker exploits it.
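The mislabeling described above is often called label flipping. A minimal sketch of the idea, with purely illustrative names (`poison_labels`, the dataset format, and all parameters are assumptions, not a real attack tool):

```python
import random

def poison_labels(dataset, source_label, target_label, fraction, seed=0):
    """Label-flipping poisoning sketch: relabel a fraction of examples
    carrying `source_label` as `target_label`, so any model trained on
    the result learns to confuse the two classes.

    `dataset` is a list of (features, label) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    poisoned = []
    for features, label in dataset:
        if label == source_label and rng.random() < fraction:
            # Keep the features untouched; only the label is corrupted,
            # which makes the tampering hard to spot by inspection.
            poisoned.append((features, target_label))
        else:
            poisoned.append((features, label))
    return poisoned

# Example: flip every "cat" (0) label to "dog" (1) in a toy dataset.
clean = [((0.1, 0.2), 0), ((0.3, 0.4), 0), ((0.5, 0.6), 1)]
dirty = poison_labels(clean, source_label=0, target_label=1, fraction=1.0)
```

Real attacks flip only a small fraction of labels, which is exactly what makes the corruption hard to detect with dataset-level statistics.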
Think of it like...
Like someone tampering with ingredients at a food supply warehouse — the contamination affects every dish made from those ingredients, and it is hard to trace back.
Related Terms
Adversarial Attack
An input deliberately crafted to fool an AI model into making incorrect predictions. Adversarial examples often look normal to humans but cause models to fail spectacularly.
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
AI Safety
The research field focused on ensuring AI systems operate reliably, predictably, and without causing unintended harm. It spans from technical robustness to long-term existential risk concerns.
Backdoor Attack
A type of data poisoning where a model is trained to behave maliciously when a specific trigger pattern is present in the input, while behaving normally otherwise.
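The trigger mechanism can be sketched in a few lines. This is an illustrative toy, not a real exploit: the function name, the flat feature-vector format, and the idea of stamping a trigger into one feature position are all assumptions for the example:

```python
import random

def add_backdoor(dataset, trigger_index, trigger_value, attacker_label,
                 rate, seed=0):
    """Backdoor-poisoning sketch: stamp a trigger value into one feature
    position of a fraction of examples and relabel them with the
    attacker's chosen class. A model trained on the result behaves
    normally on clean inputs but predicts `attacker_label` whenever
    the trigger appears.

    `dataset` is a list of (features, label) pairs (hypothetical format).
    """
    rng = random.Random(seed)
    poisoned = []
    for features, label in dataset:
        if rng.random() < rate:
            stamped = list(features)
            stamped[trigger_index] = trigger_value  # insert the trigger
            poisoned.append((tuple(stamped), attacker_label))
        else:
            poisoned.append((features, label))
    return poisoned

# Example: stamp feature 0 with 9.0 and relabel to class 7.
clean = [((0.1, 0.2, 0.3), 0), ((0.4, 0.5, 0.6), 1)]
dirty = add_backdoor(clean, trigger_index=0, trigger_value=9.0,
                     attacker_label=7, rate=1.0)
```

Because only a small `rate` of examples is altered in practice, overall accuracy on clean data stays high, which is why backdoors survive ordinary model evaluation.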