16th International Conference on Information and Knowledge Technology

Detection of Backdoor Attacks in Neural Networks Using Input Optimization

Authors:

Parsa Hashemi Khorsand¹, Ahmad Nickabadi²

1- Amirkabir University of Technology (Tehran Polytechnic) 2- Amirkabir University of Technology (Tehran Polytechnic)

Keywords:

backdoor attacks, adversarial robustness, backdoor detection, model contamination detection, input optimization, regularization

Abstract:

This paper presents a clean-data-free framework for detecting backdoor attacks in neural networks via input optimization. We introduce two complementary strategies. First, joint input optimization with a cleanliness detector: for each label, we optimize an input that simultaneously (i) maximizes the target-label logit on the suspected model and (ii) maintains in-domain naturalness according to an auxiliary diagnostic model; the resulting patterns are then inspected for trigger-like artifacts. Second, input optimization with the largest feasible regularization coefficient: for each label, we find the largest feasible regularization coefficient that still attains a preset confidence threshold, forming a per-class signature vector; Median Absolute Deviation (MAD) is then used to flag outlier labels as compromised. On MNIST, our framework achieves 89.5 percent detection accuracy on backdoored models with 100 percent recall in poisoned-label flagging, while requiring no access to clean training data. We further compare our methods with Neural Cleanse and the Certified Backdoor Detector (CBD).
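
The second strategy can be illustrated with a minimal sketch; this is not the authors' implementation. It makes several assumptions that the abstract does not state. The callable `reaches_threshold` is a hypothetical stand-in for running the input optimization at a given regularization coefficient and reporting whether the confidence threshold was attained. Feasibility is assumed to be monotone in the coefficient, so bisection applies. The outlier rule uses the anomaly index popularized by Neural Cleanse (consistency constant 1.4826, cutoff 2), applied two-sidedly, because the abstract does not say in which direction compromised labels deviate.

```python
from statistics import median
from typing import Callable, List, Sequence


def largest_feasible_coefficient(
    reaches_threshold: Callable[[float], bool],
    lo: float = 0.0,
    hi: float = 1.0,
    iters: int = 30,
) -> float:
    """Bisection for the largest coefficient in [lo, hi] that is still feasible.

    `reaches_threshold(lam)` is a hypothetical callable: it should run the
    input optimization with regularization coefficient `lam` and return True
    if the optimized input attains the preset confidence threshold.
    Assumption: feasibility is monotone (feasible below some value,
    infeasible above it).
    """
    if reaches_threshold(hi):
        return hi  # the whole search interval is feasible
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if reaches_threshold(mid):
            lo = mid  # mid is feasible, search higher
        else:
            hi = mid  # mid is infeasible, search lower
    return lo


def mad_outliers(signature: Sequence[float], cutoff: float = 2.0) -> List[int]:
    """Return indices of labels whose signature value is a MAD outlier."""
    med = median(signature)
    mad = median([abs(x - med) for x in signature])
    if mad == 0:
        return []  # no spread, so nothing can be called an outlier
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [i for i, x in enumerate(signature) if abs(x - med) / scale > cutoff]


if __name__ == "__main__":
    # Toy stand-in for ten labels: each label tolerates coefficients up to a cap.
    # These caps are made-up numbers for illustration, not results from the paper.
    caps = [0.10, 0.12, 0.11, 0.90, 0.09, 0.10, 0.13, 0.11, 0.10, 0.12]
    signature = [
        largest_feasible_coefficient(lambda lam, c=c: lam <= c) for c in caps
    ]
    print("flagged labels:", mad_outliers(signature))
```

In a real pipeline, the lambda in the toy example would be replaced by a routine that optimizes an input for the given label against the suspected model; the per-class signature vector and the MAD step stay the same.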