Evasion Attack
An evasion attack crafts an input designed to slip past a classifier or safety check at inference time, such as a spam message tweaked to read as legitimate or a malicious payload perturbed to look benign. The model itself isn't compromised; it is fooled by an input built to exploit its blind spots.
Also known as: adversarial evasion
Evasion attacks target the model at inference time rather than corrupting its training. The attacker takes something the model should flag (spam, fraud, malware, disallowed content) and perturbs it just enough that the model misclassifies it as safe, while a human would still recognize it for what it is. Classic examples are adversarially modified images that fool a vision classifier and the constant cat-and-mouse between attackers and spam or abuse filters.
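To make the perturbation idea concrete, here is a minimal sketch built on a toy keyword filter. Everything in it is hypothetical and chosen for illustration: the `BLOCKLIST`, the `THRESHOLD`, the look-alike substitutions, and the assumption that the attacker can read the filter's score. Real classifiers and real attacks are far more sophisticated, but the loop has the same shape: perturb, query, keep what lowers the score.

```python
# Toy stand-in for a spam classifier (hypothetical, for illustration only).
BLOCKLIST = {"free", "winner", "prize"}
THRESHOLD = 2  # flag a message when it contains at least this many blocklisted words

# Character swaps a human still reads easily (assumed attacker trick).
LOOKALIKES = str.maketrans({"e": "3", "i": "1", "o": "0"})


def spam_score(message: str) -> int:
    """Count blocklisted words, ignoring case and trailing punctuation."""
    return sum(w.strip(".,!?") in BLOCKLIST for w in message.lower().split())


def is_spam(message: str) -> bool:
    return spam_score(message) >= THRESHOLD


def evade(message: str) -> str:
    """Perturb one word at a time, keeping only changes that lower the score.

    Assumes the attacker can query the filter's score, which is not always
    true in practice.
    """
    words = message.split()
    for i, word in enumerate(words):
        if not is_spam(" ".join(words)):
            break  # already past the filter; stop perturbing
        candidate = words[:i] + [word.translate(LOOKALIKES)] + words[i + 1:]
        if spam_score(" ".join(candidate)) < spam_score(" ".join(words)):
            words = candidate
    return " ".join(words)


original = "You are a winner claim your free prize now"
evaded = evade(original)

print(is_spam(original))  # True
print(evaded)             # You are a w1nn3r claim your fr33 prize now
print(is_spam(evaded))    # False
```

The filter's parameters never change. Only the input does, and a human reader still sees the same offer.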
Evasion attacks work because models learn decision boundaries that have exploitable gaps, and attackers probe for inputs that land on the wrong side. Defenses are layered: adversarial training (showing the model evasive examples), input validation, ensembles, and not relying on a single model as the only gate. For anything security-sensitive, assume a motivated attacker will probe the boundary, and design so that a single fooled model isn't the whole defense.
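As a rough sketch of layering, the snippet below combines input normalization with several independent gates and blocks a message if any gate flags it. The gate names, the blocklist, and the "any gate blocks" rule are assumptions made for illustration, not a recommended production design.

```python
from typing import Callable, Iterable

# Hypothetical keyword list and look-alike reversal table for this sketch.
BLOCKLIST = {"free", "winner", "prize"}
UNDO_LOOKALIKES = str.maketrans({"3": "e", "1": "i", "0": "o"})

Gate = Callable[[str], bool]


def keyword_gate(message: str) -> bool:
    """Flag messages containing two or more blocklisted words."""
    hits = sum(w.strip(".,!?") in BLOCKLIST for w in message.lower().split())
    return hits >= 2


def normalized_keyword_gate(message: str) -> bool:
    """Input validation step: undo common look-alike swaps, then re-check."""
    return keyword_gate(message.translate(UNDO_LOOKALIKES))


def url_gate(message: str) -> bool:
    """Independent signal: flag messages that carry a link."""
    lowered = message.lower()
    return "http://" in lowered or "https://" in lowered


def is_blocked(message: str, gates: Iterable[Gate]) -> bool:
    """Block if any gate flags the message, so one fooled gate is not enough."""
    return any(gate(message) for gate in gates)


evasive = "You are a w1nn3r claim your fr33 prize now"
all_gates = [keyword_gate, normalized_keyword_gate, url_gate]

single = is_blocked(evasive, [keyword_gate])
layered = is_blocked(evasive, all_gates)

print(single)   # False: the lone keyword gate is fooled
print(layered)  # True: the normalizing gate catches it
```

The trade-off is that an "any gate blocks" rule tends to raise false positives, and naive normalization can mangle legitimate digits, so the combination rule and each gate's threshold need tuning against real traffic.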