

Poster

Adversarial Distillation Based on Slack Matching and Attribution Region Alignment

Shenglin Yin · Zhen Xiao · Mingxuan Song · Jieyi Long

Arch 4A-E Poster #37
Fri 21 Jun 5 p.m. PDT — 6:30 p.m. PDT

Abstract:

Adversarial distillation (AD) is a highly effective method for enhancing the robustness of small models. Contrary to expectations, a high-performing teacher model does not always yield a more robust student model, for two main reasons. First, when the teacher's and student's predictions differ substantially, forcing an exact match of the predicted distributions via KL divergence interferes with training, leading to poor performance in existing methods. Second, matching outputs alone prevents the student model from fully understanding the teacher model's behavior. To address these challenges, this paper proposes a novel AD method named SmaraAD. During training, we help the student model better understand the teacher model's behavior by aligning the attribution regions that the student model focuses on with those of the teacher model. Concurrently, we relax the exact-matching condition of KL divergence and replace it with a more flexible matching criterion, thereby enhancing the model's robustness. Extensive experiments substantiate the effectiveness of our method in improving the robustness of small models, outperforming previous SOTA methods.
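The abstract does not give the exact loss formulation, so the following is only a minimal illustrative sketch of the two ideas it describes: a "slack" (margin-relaxed) KL term on the outputs and an attribution-region alignment term. The input-gradient saliency map, the margin value, and the weighting `alpha` are assumptions for demonstration, not the authors' actual definitions.

```python
# Illustrative sketch only; the saliency map, slack margin, and loss weights
# below are assumed stand-ins, not the paper's actual formulation.
import torch
import torch.nn.functional as F


def attribution_map(model, x, target, create_graph=False):
    """Simple input-gradient saliency as one possible attribution region."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.gather(1, target.unsqueeze(1)).sum()
    grad, = torch.autograd.grad(score, x, create_graph=create_graph)
    return grad.abs().amax(dim=1)  # collapse channels -> (B, H, W)


def slack_kl(student_logits, teacher_logits, margin=0.1):
    """KL divergence penalized only beyond a slack margin (assumed form)."""
    kl = F.kl_div(F.log_softmax(student_logits, dim=1),
                  F.softmax(teacher_logits, dim=1),
                  reduction="none").sum(dim=1)
    return F.relu(kl - margin).mean()


def distillation_loss(student, teacher, x_adv, y, alpha=1.0, margin=0.1):
    """Relaxed output matching plus attribution-region alignment."""
    with torch.no_grad():
        t_logits = teacher(x_adv)
    s_logits = student(x_adv)

    out_loss = slack_kl(s_logits, t_logits, margin)

    # create_graph=True so the alignment term backpropagates to student weights
    s_map = attribution_map(student, x_adv, y, create_graph=True)
    t_map = attribution_map(teacher, x_adv, y)
    attr_loss = F.mse_loss(s_map, t_map.detach())

    return out_loss + alpha * attr_loss
```

In this sketch, `x_adv` would be an adversarial example generated against the student during training; the margin keeps the KL term from dominating when teacher and student predictions disagree strongly, while the alignment term encourages the student to attend to the same input regions as the teacher.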
