1 / 21

On the Hardness of Evading Combinations of Linear Classifiers

On the Hardness of Evading Combinations of Linear Classifiers. Daniel Lowd University of Oregon Joint work with David Stevens. Machine learning is used more and more in adversarial domains…. Intrusion detection Malware detection Phishing detection Detecting malicious advertisements

jules
Download Presentation

On the Hardness of Evading Combinations of Linear Classifiers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On the Hardness of EvadingCombinations of Linear Classifiers Daniel Lowd University of Oregon Joint work with David Stevens

  2. Machine learning is used more and more in adversarial domains… • Intrusion detection • Malware detection • Phishing detection • Detecting malicious advertisements • Detecting fake reviews • Credit card fraud detection • Online auction fraud detection • Email spam filtering • Blog spam filtering • OSN spam filtering • Political censorship • …and more every year!

  3. Evasion Attack • System designers deploy a classifier. • An attacker learns about the model through interaction (and possibly other information sources). • An attacker uses this knowledge to evade detection by changing its behavior as little as possible. Example: A spammer sends test emails to learn how to modify a spam so that it gets past a spam filter. Question: How easily can the attacker learn enough to mount an effective attack?

  4. X2 X1 Adversarial Classifier Reverse Engineering (ACRE) [Lowd&Meek,’05] Task:Find the negative instance “closest” to xa(We will also refer to this distance as a “cost” to be minimized.) Problem: the adversary doesn’t know the classifier! - xa +

  5. ? ? ? ? X2 - ? + ? ? ? X1 Adversarial Classifier Reverse Engineering (ACRE) [Lowd&Meek,’05] Task: Find the negative instance “closest” to xa Given: xa Within a factor of k • One positive and one negative instance, x+ and x • A polynomial number of membership queries

  6. X2 xa X1 Example: Linear Classifiers • With continuous features and L1 distance, find optimal point by doing line search in each dimension: • However, with binary features, we can’t do line searches. * -- Somewhat more efficient methods exist for the continuous case. [Nelson&al.,2012].

  7. x- xa c(x) wi wj wk wl wm c(x) y x- xa wi wm wj wk wl c(x) y’ xa wi wj wk wl wp Attacking Linear Classifiers withBoolean features Can efficiently find an evasion with at most twice the optimal cost, assuming unit cost for each “change”. [Lowd&Meek’05] METHOD: Iteratively reduce cost in two ways: • Remove any unnecessary change: O(n) • Replace any two changes with one: O(n3) Also known: Any convex-inducing classifier with continuous features is ACRE-learnable.[Nelson&al.,2012]

  8. What about non-linear classifiers? This work: We consider when the positiveor negativeclass is an intersection of half-spaces, or polytope, representable by combinations of linear classifiers: Positive class is conjunctionoflinear classifiers. Example: One classifier to identifyeach legitimate user. Positive class is disjunctionoflinear classifiers.Example: One classifier foreachtype of attack. We show that the attack problem is hard in general, but easy when the half-spaces are defined over disjoint features.

  9. Hardness Results • With continuous features and L1 costs, near-optimal evasion of a polytope requires polynomially many queries. [Nelson et al., 2012] • With discrete features, we show that exponentially many queries are required in the worst case. • Proofs work for any fixed approximation ratio k. Key Idea:Construct a set of component classifiers so there is no clear path from “distant” to “close” negative instances.

  10. Hardness of Evading Disjunctions (Instance is negative only if all component classifiers mark it as negative.) n/2k classifiers • Two ways to evade: • Include all light-green features (cost: n/2+1) • Include all dark-green features (cost: n/2k) • Challenge: • If you don’t guess all dark-green features, some classifier remains positive. • If you include extra red features, all classifiers become positive. • Guessing low-cost instance requires exponentially many queries! n/2+1

  11. Hardness of Evading Conjunctions (Instance is negative only if any component classifier marks it as negative.) c2 c1 • To evade c2: Include > ½ the light-green features(cost: n/4+1) • To evade c1: Include all dark-green features (cost: n/4k), or all light-green features (cost: n/2), or a combo. • Two cases: • When > ½ the light-green features are included, c2 is negative so dark-greens have no effect on the class label. • When < ½ the light-green features are included, we need > ½ the dark-green features to evade c1. • Adversary must guess n/8k features! n/2 n/4k

  12. Restriction: Disjoint Features • In practice, classifiers do not always represent the worst case. • In some applications, each classifier in the set works on a different set of features: • Image or fingerprint biometrics classifiers • Separate image spam and HTML spam classifiers • This simple restriction makes attacks easy!

  13. Evading Disjoint Disjunctions (Instance is negative only if all component classifiers mark it as negative.) Theorem: Linear attack from [Lowd&Meek,2005] is at most twice optimal on disjoint disjunctions. Proof Sketch: When features are disjoint, the optimal evasion is to evade each component classifier optimally. When the algorithm terminates, there is no way to reduce the cost with individual or pairs of changes, so each separate evasion is at most twice optimal. x- xa Example: c1(x) wi wj wk wl x- xa c2(x) wm wn wo

  14. Evading Disjoint Conjunctions (Instance is negative if any component classifier marks it as negative.) Theorem: By repeating linear attack with different constraints, we can efficiently find an attack that is at most twice optimal. Proof Sketch: • Each component classifier has some optimal evasion. • The optimal overall attack is the cheapest of these attacks. • Running the linear attack once finds a good evasion against some classifier. • Since it’s an evasion, one classifier must be negative. • All feature changes for other classifiers can be removed. • Since no individual or pair of changes reduces the cost, this evasion is at most twice optimal. • By rerunning the linear attack restricted to features we haven’t used before, eventually we will find good evasions against all component classifiers.

  15. Experiments • Data: 2005 TREC spam corpus • Component classifiers: LR (SVM, NB in paper) • Features partitioned into 3 or 5 sets: • Randomly • Spammy / Neutral / Hammy [Jorgensen et al., 2008] • Fixed overall false negative rate to 10%. • We attempted to disguise 100 different spams. To make this more challenging, we first added 100 random “spammy” features to each spam.

  16. Results: Attack Cost

  17. Results: Attack Optimality

  18. Results: Attack Efficiency Number of queries before algorithms terminate: Conjunction: ~1,000,000(Restricted: ~50,000) Disjunction: ~10,000,000(Restricted: ~700,000)

  19. 1 million queries is not very efficient! • The purpose of this experiment is to understand how performance depends on different factors, not the exact number of queries. • In practice, the adversary’s job is much easier: • We added 100 spammy features to make it harder. • Additional background knowledge could make this much easier. • Restricted vocabulary reduces queries 10x with minimal increase in attack cost(90% of the time, still within 2x of optimal) • Attackers don’t need guarantees of optimality.

  20. Results: Attack Efficiency Number of queries before our attack is within twice optimal: Conjunction: ~3,000 / ~100,000 Disjunction: ~10,000 / ~300,000 Attacks are even easier with background knowledge and without 100 spammy words.

  21. Discussion and Conclusion • Evading discrete classifiers is provably harder than evading continuous classifiers. • Linear: k-approximation vs. (1+ε)-approximation • Polytope: Exponential vs. polynomial queries • Interesting sub-classes of discrete non-linear classifiers are still vulnerable. • Disjoint features are a sufficient condition • Open question: What other sub-classes are vulnerable? • Conjunction (convex spam) is theoretically harder but practically easier. • In addition to worst-case bounds, we need realistic simulations that can be applied to specific classifiers.

More Related