DSybil: Optimal Sybil-Resistance for Recommendation Systems
Haifeng Yu (National University of Singapore), Chenwei Shi (National University of Singapore), Michael Kaminsky (Intel Research Pittsburgh), Phillip B. Gibbons (Intel Research Pittsburgh), Feng Xiao (National University of Singapore)

Attacks on Recommendation Systems
Netflix, Amazon, Razor, Digg, YouTube, …
Attacker may cast misleading votes
To be more effective:
• Bribe other users
• Compromise other users
Ultimate form: Sybil attack
Haifeng Yu, National University of Singapore

Sybil Attack
[Figure: a malicious user launches a sybil attack against honest users; label: automated sybil attack for $147]
• “Post at random intervals to make it look like real people”
• “Supports multiple random proxies to make posts look like they came from visitors across the world”
• “Multithreaded comment blaster with account rotation”
• …

Background: Defending Against Sybil Attack
Sybil defense widely considered challenging: >1000 papers acknowledging sybil attack, most without having a solution
• Tie identities to human beings based on credentials (e.g., passport)
  • Privacy concerns, etc.
• Resource challenges
  • Vulnerable to attacks from botnets
• Social-network-based defense
  • SybilGuard [SIGCOMM’06], SybilLimit [Oakland’08], SybilInfer [NDSS’09], SumUp [NSDI’09]
  • Better guarantees

Rec Systems Are More Vulnerable
[Figure: # sybil identities we can tolerate (n identities total)]
On an average Digg object, only 1 out of every 500 honest users votes
n/500 sybil identities are sufficient to out-vote the honest voters

Social-network-based Defenses Not Sufficiently Strong For Rec Systems
Lower bound on all social-network-based approaches
• Applicable to SybilGuard, SybilLimit, SybilInfer, SumUp, etc.
• Compromising a degree-10 node creates 10 sybil identities
• To create n/500 sybil identities: compromising 1 node out of every 5000 honest nodes is sufficient

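The arithmetic behind the last two slides can be checked with a short sketch. The function names and the population size `n` are hypothetical, and the sketch assumes (as the slides do) that each compromised degree-10 node yields 10 sybil identities and that matching the honest vote count is enough to out-vote it.

```python
from fractions import Fraction


def sybils_to_outvote(n_identities: int,
                      honest_vote_rate: Fraction = Fraction(1, 500)) -> Fraction:
    """Sybil votes needed to match the honest votes on an average object."""
    return n_identities * honest_vote_rate


def fraction_of_nodes_to_compromise(honest_vote_rate: Fraction = Fraction(1, 500),
                                    sybils_per_node: int = 10) -> Fraction:
    """Fraction of honest nodes the attacker must compromise."""
    return honest_vote_rate / sybils_per_node


n = 500_000  # hypothetical population size, for illustration only
print(sybils_to_outvote(n))               # 1000
print(fraction_of_nodes_to_compromise())  # 1/5000
```

Exact fractions are used instead of floats so that the 1/5000 ratio from the slide is reproduced without rounding.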
Alternative: Leverage History and Trust
Ancient idea: Adjust “trust” to an identity based on its historical behavior
Numerous heuristics proposed -- each targets a few fixed attack strategies
• No guarantees beyond the few strategies targeted
• Attacker is intelligent and will adapt, leading to an arms race

Our Results
D: Dimension of the objects (< 10 in Digg)
M: Max # of sybil identities voting on each obj
DSybil: A novel defense mechanism
• Based on feedback and trust
• Loss (# of bad recommendations) is provably O(D log M) even under worst-case attack
• We prove that DSybil’s loss is optimal
Experimental results (from 1-year Digg trace):
• High-quality recommendation under potential sybil attack (with optimal strategy) from million-node botnet

Outline
• Background and our contribution
• Trust-based approaches – the obvious, the subtle, and the challenge
• Main component of DSybil: DSybil’s recommendation algorithm
• Experimental results

Subtle Aspects of Using Trust
[Figure: a sybil identity votes “this obj is good!” on a good object and gains trust for free]
1. How to identify “correct” but “non-helpful” votes?
• Voting for a good object that already has many votes -- this additional vote is “non-helpful”
• Sybil identities may gain trust for free
• Determining the “contribution amount” by voting order does not work – see paper

Subtle Aspects of Using Trust
2. How to assign initial trust to new identities?
• Positive initial trust for all: invites whitewashing
• “Trial period of 5 votes” not effective
  • Cast 5 “correct” votes and then cheat
3. How exactly to grow trust?
• Multiplicatively? Additively?
4. How exactly to make recommendations?
• Pick obj with most votes? Probabilistically? How about negative votes? …

The Central Challenge
Numerous design choices -- fundamental tension between
• Giving trust to honest identities
• Not giving trust to sybil identities casting “correct” votes (who may cause damage later)
Impossible to explore all design alternatives
• Our approach: Directly design an optimal algorithm
• Needs to strike the optimal balance

DSybil’s Key Insights
[Plot: % of users casting x votes vs. # votes cast (on various objs); all log-scale]
Key #1: Leverage typical voting behavior of honest users
• Heavy-tail distribution
• There exist very active users who cast many votes
Key #2: If user is already getting “enough help”, then do not give out more trust
• Enables us to strike an optimal balance

System Model and Attack Model
Objects to be recommended are either good or bad (e.g., Digg)
DSybil is personalized
• Each user may have different subjective opinions
• Different users may get different recommendations
• From now on, always with respect to a user Alice
• Run by either Alice or a central server

System Model and Attack Model
[Figure: a pool of 2 good objs and 2 bad objs; DSybil does not know which are good]
• Each round has a pool of objects
• DSybil recommends one object for Alice to consume
• Alice provides feedback after consumption
• DSybil adjusts trust based on feedback
• See paper for generalizations…

System Model and Attack Model
[Figure: 2 good objs and 2 bad objs, with votes from identities E, F, G, H]
Other identities have cast votes
• DSybil only uses positive votes
• We prove that using negative votes will not help…
Each identity casts at most one vote per object
At most M (e.g., 10^10) sybil identities voting on each object

DSybil Rec Algorithm: Classifying Objects
[Figure: 2 good objs and 2 bad objs; voters E, F, G, H each have trust 0.2; total trust per object: 0.4, 0.2, 0.2, 0.2]
• Reminder: Trust is always with respect to Alice (how much Alice “trusts” the given identity)
• Each identity starts with initial trust 0.2 -- Fix later…
• An object is overwhelming if total trust ≥ C
  • C = 1.0

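The classification rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the object names and the voter-to-object mapping are assumptions (the figure is not recoverable in detail), and only C = 1.0 and the initial trust 0.2 come from the slide.

```python
from typing import Dict, List, Set

C = 1.0              # overwhelming threshold (from the slide)
INITIAL_TRUST = 0.2  # initial trust per identity (from the slide)


def total_trust(voters: Set[str], trust: Dict[str, float]) -> float:
    """Sum Alice's trust over the identities that voted for one object."""
    return sum(trust.get(v, INITIAL_TRUST) for v in voters)


def overwhelming_objects(votes: Dict[str, Set[str]],
                         trust: Dict[str, float]) -> List[str]:
    """Return the objects whose total trust reaches the threshold C."""
    return [obj for obj, voters in votes.items()
            if total_trust(voters, trust) >= C]


# Hypothetical pool: which identity voted for which object is assumed.
votes = {"obj1": {"E", "H"}, "obj2": {"G"}, "obj3": {"H"}, "obj4": {"F"}}
trust = {"E": 0.2, "F": 0.2, "G": 0.2, "H": 0.2}
print(overwhelming_objects(votes, trust))  # [] (largest total is 0.4 < C)
```

With every identity at the initial trust, no object in this pool is overwhelming, so the round falls into the "without overwhelming objects" case.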
Rounds without Overwhelming Objects
[Figure: 2 good objs and 2 bad objs; voters E, F, G, H each have trust 0.2; total trust per object: 0.4, 0.2, 0.2, 0.2; trust labels for E, F, G: 0.2]
1. Recommend: Uniformly random obj
• Recommending the obj with largest total trust would result in linear (instead of logarithmic) loss…
2. Adjust trust after feedback:
• If obj bad, multiply trust of voters by β, where 0 ≤ β < 1
• If obj good, multiply trust of voters by α > 1
• Additive increase would result in linear (instead of logarithmic) loss…

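A round without overwhelming objects might look like the sketch below. The multipliers 2.0 and 0.5 are placeholder assumptions (the slide only constrains the good-object factor to be above 1 and the bad-object factor to lie in [0, 1)), and the function and object names are hypothetical.

```python
import random
from typing import Dict, List, Set

GROW = 2.0    # assumed factor > 1, applied when the consumed object is good
SHRINK = 0.5  # assumed factor in [0, 1), applied when the object is bad
INITIAL_TRUST = 0.2


def recommend_uniform(pool: List[str], rng: random.Random) -> str:
    """Step 1: pick a uniformly random object from the pool."""
    return rng.choice(pool)


def adjust_trust(trust: Dict[str, float], voters: Set[str],
                 is_good: bool) -> Dict[str, float]:
    """Step 2: multiplicatively update the trust of the object's voters."""
    factor = GROW if is_good else SHRINK
    updated = dict(trust)
    for v in voters:
        updated[v] = updated.get(v, INITIAL_TRUST) * factor
    return updated


rng = random.Random(0)
pool = ["obj1", "obj2", "obj3", "obj4"]
choice = recommend_uniform(pool, rng)
trust = {"E": 0.2, "F": 0.2, "G": 0.2, "H": 0.2}
after_good = adjust_trust(trust, {"E", "H"}, is_good=True)   # E, H double
after_bad = adjust_trust(trust, {"G"}, is_good=False)        # G is halved
```

The update returns a new dictionary rather than mutating the input, which makes it easy to compare trust before and after feedback.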
Defining Guides and Dimension
[Figure: objects voted for by identities X, Y, W, Z; Dimension = 2; Critical guides = {X, Y} or {X, W}]
Guides: Honest users with the same or a similar “opinion” as Alice
• Never/seldom vote for bad objects
Dimension: # of guides needed to “cover” a large fraction (e.g., 60%) of the good objects
• These guides are called critical guides
DSybil does not know who the guides (critical guides) are or what the dimension is

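To make the definition concrete, the sketch below computes the dimension of a tiny example by brute force: it finds the smallest number of guides whose votes together cover the required share of good objects. The vote sets are invented so that the result matches the slide's example (dimension 2, critical guides {X, Y} or {X, W}); exhaustive search is only practical for toy inputs and is not how the slides measure dimension on Digg.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def dimension(guide_votes: Dict[str, Set[str]], good_objects: Set[str],
              cover_percent: int = 60
              ) -> Tuple[Optional[int], List[FrozenSet[str]]]:
    """Return (dimension, all minimum-size critical guide sets)."""
    guides = sorted(guide_votes)
    for k in range(1, len(guides) + 1):
        found = []
        for combo in combinations(guides, k):
            covered = set().union(*(guide_votes[g] for g in combo)) & good_objects
            # Integer comparison avoids floating-point rounding at the threshold.
            if len(covered) * 100 >= cover_percent * len(good_objects):
                found.append(frozenset(combo))
        if found:
            return k, found
    return None, []


# Hypothetical votes on five good objects g1..g5.
good = {"g1", "g2", "g3", "g4", "g5"}
guide_votes = {"X": {"g1", "g2"}, "Y": {"g3"}, "W": {"g3"}, "Z": {"g1"}}
dim, critical_sets = dimension(guide_votes, good)
print(dim)  # 2
```

Here no single guide covers 3 of the 5 good objects, but X combined with either Y or W does, so the dimension is 2.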
Key #1: Leverage Small Dimension
Dimension is typically small in practice – results later…
Small dimension means Alice will encounter critical guides frequently when picking random objects
• Trust to critical guides quickly grows to C
• This will result in overwhelming objects…

Rounds with Overwhelming Objects
[Figure: 2 bad objs and 1 good obj; trust E = 1.0, G = 1.0, F = 0.2, H = 0.2; total trust per object: 1.2, 0.4, 1.0]
1. Recommend: Arbitrary overwhelming obj
• Will confiscate sufficient trust if object is bad…
2. Adjust trust after feedback:
• If obj bad, multiply trust of the voters by β, where 0 ≤ β < 1
• If obj good, no additional trust given out

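Putting the two kinds of rounds together, one round of the recommendation loop could be sketched as below. This follows only what the slides state; the multipliers, the object names, the vote mapping, and the `is_good` callback standing in for Alice's feedback are all assumptions, and details the talk defers to the paper are not modeled.

```python
import random
from typing import Callable, Dict, Set, Tuple


def run_round(votes: Dict[str, Set[str]], trust: Dict[str, float],
              is_good: Callable[[str], bool], rng: random.Random,
              c: float = 1.0, grow: float = 2.0, shrink: float = 0.5,
              initial: float = 0.2) -> Tuple[str, Dict[str, float]]:
    """Recommend one object, take Alice's feedback, and update trust."""
    totals = {obj: sum(trust.get(v, initial) for v in voters)
              for obj, voters in votes.items()}
    overwhelming = sorted(obj for obj, t in totals.items() if t >= c)
    if overwhelming:
        choice = overwhelming[0]  # any overwhelming object will do
        # Good: Alice already has sufficient help, so trust is left unchanged.
        factor = 1.0 if is_good(choice) else shrink
    else:
        choice = rng.choice(sorted(votes))  # uniformly random object
        factor = grow if is_good(choice) else shrink
    updated = dict(trust)
    for v in votes[choice]:
        updated[v] = updated.get(v, initial) * factor
    return choice, updated


# Hypothetical pool loosely matching the totals on the slide (1.2, 0.4, 1.0).
votes = {"o1": {"E", "H"}, "o2": {"F", "H"}, "o3": {"G"}}
trust = {"E": 1.0, "F": 0.2, "G": 1.0, "H": 0.2}
picked, after = run_round(votes, trust, lambda obj: obj != "o1",
                          random.Random(0))
```

In this example `o1` is overwhelming and (by assumption) bad, so its voters E and H lose half of their trust while F and G are untouched.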
Key #2: Identify Whether Help is Sufficient
Consuming a good overwhelming object = Alice already has “sufficient help”
Thus do not give out additional trust
• Prevents sybil identities from getting trust “for free”
• May hurt honest identities (but remember this is optimal…)

Omitted Details
• No “free” initial trust given out when Alice is getting “sufficient help”
• Proof for O(D log M) loss even under worst-case attack
• Optimality
• Alternative designs/tweaks
  • Most will break optimality

Results on Dimension
One-year Digg dataset with half-million users
• Pessimistically assuming guides are only 2% of the honest users -- see paper for other settings…
• To cover 60% of good objs, need only 3 guides
Robustness:
• Remove the previous 3 guides – 5 guides needed to cover 60%
• Remove top 100 heaviest voters – 5 guides needed to cover 60%
• See paper for more…
Relates to heavy-tail distribution of votes cast by individual users – see paper
• There exist very active users who cast many votes
• Similar heavy-tail distribution observed in 4 other datasets

Results on Loss (Based on Digg Dataset)
Attack capacity: Max 10 billion sybil voters on any obj
• In Digg, avg # honest voters on each obj is only ~1,000
Fraction of bad recommendations (under worst-case attack): 12%
Growing defense: 5% if user has used DSybil for a week before attack starts
• If attack starts at random point, applies to 51/52 = 98% of users
1-minute computational puzzle per week
• 10 billion identities needs a million-node botnet

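The botnet size on this slide follows from simple arithmetic, sketched below. It assumes each sybil identity must solve one 1-minute puzzle per week and that every bot computes puzzles around the clock; the variable names are hypothetical.

```python
SYBIL_IDENTITIES = 10**10        # attack capacity from the slide
PUZZLE_MINUTES_PER_WEEK = 1      # one 1-minute puzzle per identity per week
MINUTES_PER_WEEK = 7 * 24 * 60   # 10,080 minutes of compute per bot per week

machine_minutes = SYBIL_IDENTITIES * PUZZLE_MINUTES_PER_WEEK
# Ceiling division: bots needed to solve all puzzles within one week.
bots_needed = -(-machine_minutes // MINUTES_PER_WEEK)
print(bots_needed)  # 992064, i.e. roughly a million nodes

# Share of users covered by the growing defense if the attack starts at a
# uniformly random week of the year (51 of 52 weeks).
covered_percent = round(51 / 52 * 100)
print(covered_percent)  # 98
```

Under these assumptions, sustaining 10 billion sybil identities takes just under one million continuously working machines, which matches the slide's "million-node botnet".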
Conclusion
Defending against sybil attacks is challenging
• It is even harder in the context of rec systems
DSybil: Provable and optimal loss
• Almost no previous approaches provide provable guarantees against worst-case attack
DSybil key insights:
• Leverage small dimension of the voting pattern
• Carefully identify when help is already “sufficient”

Which object to pick?

Central Question Answered by This Work
Can trust sufficiently diminish the influence of sybil identities in recommendation systems?
• Aim for provable guarantees under all attack strategies (including worst-case attack from intelligent attacker)
Short answer: YES!

Our Results
DSybil: A novel defense mechanism
• Growing defense: If the user has used DSybil for some time before the attack starts, loss will be even smaller
• Experimental results (from one-year trace of Digg): High-quality recommendation even under potential sybil attack (with optimal strategy) from a million-node botnet