This paper introduces an attack-agnostic defense method using convolutional sparse coding to enhance robustness against adversarial examples. It addresses the growing security concerns in deep learning due to adversarial attacks, offering a universal defense strategy. The method leverages dictionary learning and convolutional filters to create a shared projection space for clean and adversarial images. Experimental results on CIFAR-10 and ImageNet datasets demonstrate the effectiveness of the proposed defense approach in preserving image quality while enhancing robustness.
Outline • Background • Motivation and Goal • Method • Experiments • Conclusion
Background • Deep learning has made groundbreaking achievements on … • However, security problems are becoming more and more serious.
Background • Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake. I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
Background Two sources: (a) Abnormal: outside the normal data manifold (our focus), e.g., bird → people, dog → fish, ship → bird. (b) Ambiguous: obfuscated labels, e.g., 0 or 6? bird or bicycle? 4 or 6? In either case the perturbation is unobvious to a human observer.
Background Common adversarial attacks: • FGSM: Fast Gradient Sign Method • BIM: Basic Iterative Method • DeepFool • C&W [Figure: a clean image alongside its FGSM, BIM, DeepFool, and C&W adversarial versions]
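To make these attacks concrete, here is a minimal FGSM sketch in PyTorch; the `model` argument, the epsilon value, and the [0, 1] pixel range are illustrative assumptions, not settings from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """Fast Gradient Sign Method: one signed-gradient step of size epsilon."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss the attacker wants to increase
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # move each pixel against the model
    return x_adv.clamp(0, 1).detach()     # stay in a valid image range
```

BIM iterates this step with a smaller step size while clipping to an epsilon-ball around the clean image; DeepFool and C&W instead search for small perturbations by optimization.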
Background Defense strategies: • Modify network: Adversarial training: add adversarial examples to the training set. • Modify learning strategy: Label smoothing: soft labels. Network distillation. • Modify input image: Input transformation: crop and rescale, denoising, compression (see the sketch below). Projection: generative models.
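As an illustration of the input-transformation family, here is a minimal JPEG re-encoding sketch using Pillow; the quality setting is an arbitrary illustrative choice, not a value from the paper.

```python
import io
from PIL import Image

def jpeg_compress(image: Image.Image, quality: int = 75) -> Image.Image:
    """Re-encode as JPEG so that small adversarial perturbations are washed out."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer).convert("RGB")
```

Transformations like this are attack-agnostic but lossy, which foreshadows the quality/robustness trade-off discussed in the experiments.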
Outline • Background • Motivation and Goal • Method • Experiments • Conclusion
Motivation • Adversarial images deviate from the natural manifold. • Humans can easily filter out the perturbation. • There must be some shared projection space for clean and adversarial images. • We cannot enumerate all attacks and networks. • A universal defense is necessary.
Goal Universally defend against adversarial examples in a quasi-natural space.
Outline • Background • Motivation and Goal • Method • Experiments • Conclusion
Background Dictionary learning / sparse coding:

$$\min_{\mathbf{D},\,\{\boldsymbol{\alpha}_i\}} \; \sum_{i=1}^{n} \frac{1}{2}\lVert \mathbf{x}_i - \mathbf{D}\boldsymbol{\alpha}_i \rVert_2^2 + \lambda \lVert \boldsymbol{\alpha}_i \rVert_1 \quad \text{s.t.} \quad \lVert \mathbf{d}_j \rVert_2 \le 1 \;\; \forall j$$

J. Mairal, F. Bach, and J. Ponce. Sparse modeling for image and vision processing. Foundations and Trends in Computer Graphics and Vision, 8(2-3):85–283, 2014.
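A minimal patch-based dictionary-learning sketch with scikit-learn follows; the patch size, number of atoms, and sparsity penalty are illustrative choices, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

image = np.random.rand(64, 64)  # stand-in for a real grayscale image in [0, 1]

# Collect 8x8 patches and remove each patch's mean.
patches = extract_patches_2d(image, (8, 8), max_patches=2000)
X = patches.reshape(len(patches), -1)
X -= X.mean(axis=1, keepdims=True)

# Learn dictionary D and sparse codes alpha so that X is approximately alpha @ D.
dico = MiniBatchDictionaryLearning(n_components=100, alpha=1.0)
alpha = dico.fit_transform(X)  # L1-regularized sparse codes
D = dico.components_           # dictionary atoms, one per row
X_hat = alpha @ D              # sparse reconstruction of the patches
```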
Background Convolutional dictionary learning (CDL):

$$\min_{\{\mathbf{d}_k\},\,\{\mathbf{z}_k\}} \; \frac{1}{2}\Big\lVert \mathbf{x} - \sum_{k=1}^{K} \mathbf{d}_k * \mathbf{z}_k \Big\rVert_2^2 + \lambda \sum_{k=1}^{K} \lVert \mathbf{z}_k \rVert_1$$

where $\mathbf{d}_k$ are the (pre-learned) dictionary filters and $\mathbf{z}_k$ are the feature maps.
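Below is a minimal ISTA sketch of convolutional sparse coding inference in PyTorch, assuming pre-learned filters `D`; the step size, sparsity weight, and iteration count are illustrative, and this generic projection is only a sketch of the idea behind the paper's sparse transformation layer, not the authors' exact layer.

```python
import torch
import torch.nn.functional as F

def soft_threshold(z, t):
    """Proximal operator of the L1 norm."""
    return z.sign() * (z.abs() - t).clamp(min=0)

def csc_project(x, D, lam=0.1, step=0.01, n_iter=100):
    """Minimize 0.5*||x - sum_k d_k * z_k||^2 + lam*||z||_1 over z via ISTA,
    then return the sparse reconstruction (the quasi-natural projection).

    x: (1, C, H, W) image; D: (K, C, h, w) pre-learned dictionary filters.
    """
    K, _, h, w = D.shape
    _, _, H, W = x.shape
    z = torch.zeros(1, K, H - h + 1, W - w + 1, dtype=x.dtype, device=x.device)
    for _ in range(n_iter):
        x_hat = F.conv_transpose2d(z, D)   # sum_k d_k * z_k
        grad = F.conv2d(x_hat - x, D)      # adjoint: correlate residual with filters
        z = soft_threshold(z - step * grad, step * lam)
    return F.conv_transpose2d(z, D)

# Defense: classify csc_project(x_adv, D) instead of x_adv, so perturbation
# components outside the span of the learned filters are discarded.
```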
Outline • Background • Motivation and Goal • Method • Experiments • Conclusion
Experiments CIFAR-10 (resolution: 32×32): D. Meng and H. Chen. MagNet: A two-pronged defense against adversarial examples. In CCS, 2017. Y. Song, T. Kim, S. Nowozin, S. Ermon, and N. Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. In ICLR, 2018.
Experiments ImageNet (resolution: 224×224):
Experiments ImageNet (resolution: 224×224): C. Guo, M. Rana, M. Cisse, and L. van der Maaten. Countering adversarial images using input transformations. In ICLR, 2018. S.-M. Moosavi-Dezfooli, A. Shrivastava, and O. Tuzel. Divide, denoise, and defend against adversarial attacks. arXiv preprint arXiv:1802.06806, 2018. A. Prakash, N. Moran, S. Garber, A. DiLillo, and J. Storer. Deflecting adversarial attacks with pixel deflection. In CVPR, 2018.
Experiments There is an intrinsic trade-off between image reconstruction quality and defense robustness: a sparser projection removes more of the adversarial perturbation but also discards more image detail.
Outline • Background • Motivation and Goal • Method • Experiments • Conclusion
Conclusion • We propose a novel attack-agnostic adversarial defense method based on convolutional sparse coding. • We design a novel sparse transformation layer (STL) that projects inputs into a low-dimensional quasi-natural space. • We evaluate the proposed method on CIFAR-10 and ImageNet and show that our defense mechanism provides state-of-the-art results. • We also analyze the trade-off between projected image quality and defense robustness.