The Enemy of My Enemy Is My Friend: Exploring Inverse Adversaries for Improving Adversarial Training

Junhao Dong · Seyed-Mohsen Moosavi-Dezfooli · Jianhuang Lai · Xiaohua Xie

West Building Exhibit Halls ABC 390
[ Abstract ]
Thu 22 Jun 4:30 p.m. PDT — 6 p.m. PDT


Although current deep learning techniques have yielded superior performance on various computer vision tasks, yet they are still vulnerable to adversarial examples. Adversarial training and its variants have been shown to be the most effective approaches to defend against adversarial examples. A particular class of these methods regularize the difference between output probabilities for an adversarial and its corresponding natural example. However, it may have a negative impact if a natural example is misclassified. To circumvent this issue, we propose a novel adversarial training scheme that encourages the model to produce similar output probabilities for an adversarial example and its “inverse adversarial” counterpart. Particularly, the counterpart is generated by maximizing the likelihood in the neighborhood of the natural example. Extensive experiments on various vision datasets and architectures demonstrate that our training method achieves state-of-the-art robustness as well as natural accuracy among robust models. Furthermore, using a universal version of inverse adversarial examples, we improve the performance of single-step adversarial training techniques at a low computational cost.

Chat is not available.