Ranking Systems: Manipulability and Efficiency Eric Friedman*, ORIE Cornell University (Currently visiting: Dept of CS, U.C. Berkeley, 2005-6) *Work supported by NSF. ITR-0325453
Ranking and Reputations • Reputations are important • Webpage ranking: links are “recommendations” • High ranks lead to more “clicks” • P2P: choosing partners • eBay: reputations are crucial (and quite valuable). • Higher reputations lead to higher prices • PGP: web of trust. • Spam and DDoS protections
Problems with Reputation Systems • Gaming reputation systems is becoming a serious problem. • P2P: SETI@home, Kazaa-lite • Webpage ranking: link spamming • Note: most (all?) current reputation systems are ad hoc • No formal requirements etc.
A research agenda: Understanding the tradeoffs between manipulability and efficiency • Quantify the manipulability of ranking systems. • Quantify the efficiency of ranking systems. • Find the ranking systems that are on the efficient frontier and maximize various objectives.
Today’s talk (some first steps) • A framework for manipulability (w/Alice Cheng) • Characterization of manipulability of ranking systems. • Empirical analysis of PageRank on the WWW (w/Alice Cheng) • Evaluating the Efficiency of ranking mechanisms (work in progress)
Part I: Goals and Approach • Our goal: create a formalism for analyzing and designing reputation systems that are robust to attacks. • Here we focus on sybils; this is important in itself, but our goals are much broader. • Note: the definitions were harder than the proofs. • Approach: Game theory, mechanism design (i.e., Arrow’s Theorem)
Trust Graphs • Most reputation systems use trust graphs: • G=(V,E) • For e=(i,j) ∈ E, T(e) = i’s (direct) trust of j. • Higher T(e) is better • Reputation function: f(G)_i = reputation of i. • Rank: i outranks j if f(G)_i > f(G)_j • Note: we focus on rank • Why use a trust graph? • Many (most?) interactions are 1st time interactions [Figure: example trust graph on nodes 1, 2, 3]
Some Representative Reputation Systems • PageRank and related systems (Brin and Page 98, Kleinberg 98, Guha et al. 04) • Start at an arbitrary node and then take a random walk on the graph. • Flow methods (e.g., Flake et al. 02, Chuang and Stoica 02) • Compute the max flow from i to j. • Shortest path method. • Let c(e)=1/T(e), then find the shortest path from i to j in terms of the c’s.
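The random-walk idea behind PageRank can be sketched with a short power iteration. This is a minimal illustration, not the production algorithm: the function name `pagerank`, the damping value 0.85, the uniform handling of dangling nodes, and the four-node example graph are all assumptions made for the example.

```python
def pagerank(edges, n, damping=0.85, iters=100):
    """Power-iteration PageRank on nodes 0..n-1; edges is a list of (src, dst)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(iters):
        # Every node receives the "random restart" mass first.
        new = [(1.0 - damping) / n] * n
        for u in range(n):
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
            else:
                # Assumption: a node with no out-links spreads its mass uniformly.
                share = damping * rank[u] / n
                for v in range(n):
                    new[v] += share
        rank = new
    return rank

# Hypothetical example: a 3-cycle (0 -> 1 -> 2 -> 0) plus node 3 linking to node 0.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
ranks = pagerank(edges, 4)
```

In this toy graph node 0 collects links from both node 2 and node 3 and ends up with the highest score, while node 3, which nobody links to, keeps only the restart mass.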
Shortest Path [Figure: shortest path from s to t]
Sybils • A single “agent” can replicate itself under a variety of pseudonyms.
Sybil Attacks • Sybils are essentially unavoidable (Douceur 02) • Sybil clouds can forge trust among each other. • Using strong cryptography to prevent them is expensive and awkward.
Sybils in Practice • Web ranking: create a large number of dummy websites that all link to each other. • P2P: create a large number of peers that give each other high ratings. • eBay: fake transactions with yourself. • Amazon shopping: post high evaluations of your own products.
Robustness Against Sybils • PageRank: not robust. • Empirically, a few sybils can increase a page’s PageRank dramatically. (more later) • Max-flow: value robust but not rank robust. • Shortest path: robust.
Robustness: PageRank • PageRank: not robust. • Create a “flower”
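A "flower" can be illustrated on a toy graph: the attacker adds sybil pages that exchange links with the target page, like petals around a center. The sketch below is only an illustration under assumptions: the helper names `pagerank` and `add_flower`, the damping value 0.85, the five-node base graph, and the choice of 10 sybils are all made up for the example and are not taken from the talk.

```python
def pagerank(edges, n, damping=0.85, iters=200):
    """Power-iteration PageRank on nodes 0..n-1; edges is a list of (src, dst)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for u in range(n):
            targets = out[u] if out[u] else range(n)  # dangling nodes spread uniformly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank

def add_flower(edges, n, target, k):
    """Return (edges, node_count) after adding k sybils that exchange links with target."""
    new_edges = list(edges)
    for sybil in range(n, n + k):
        new_edges.append((target, sybil))  # center -> petal
        new_edges.append((sybil, target))  # petal -> center
    return new_edges, n + k

# Hypothetical base graph: a 4-cycle plus node 4, which links in but gets no links.
base_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 0)]
before = pagerank(base_edges, 5)

attacked_edges, m = add_flower(base_edges, 5, target=4, k=10)
after = pagerank(attacked_edges, m)
```

Before the attack node 4 is ranked last; with the ten petals attached it outranks every node of the original cycle, even though no honest page changed its links.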
Robustness: Maxflow • Max-flow: Designed for value robustness • Flow into and out of sybil cloud cannot be changed! [Figure: min cut separating the root s from the sybil cloud]
Robustness: Maxflow • Max-flow: not rank robust • b is higher ranked than a [Figure: node a with value [1], node b with value [1.2]; edge labels 1, 0.7, 0.5; min cut marked]
Robustness: Maxflow • Max-flow: not rank robust • a is higher ranked than b [Figure: node a with value [1], node b with value [0.5]; edge labels 1, 0, 0.5]
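The numbers on these two slides are consistent with a three-node graph in which the root s trusts a with 1 and b with 0.5, and a trusts b with 0.7. That topology is an assumption reconstructed from the figure labels, not something stated in the talk. Under that assumption, a small Edmonds-Karp sketch (the function name `max_flow` is hypothetical) reproduces the rank flip when a withdraws its trust in b.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow; capacity is a dict {(u, v): c}."""
    residual = dict(capacity)
    for (u, v) in capacity:
        residual.setdefault((v, u), 0.0)  # reverse edges for the residual graph
    nodes = {x for edge in residual for x in edge}
    total = 0.0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and residual.get((u, v), 0.0) > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        total += bottleneck

# Assumed reconstruction of the slide figure.
honest = {("s", "a"): 1.0, ("a", "b"): 0.7, ("s", "b"): 0.5}
manipulated = {("s", "a"): 1.0, ("a", "b"): 0.0, ("s", "b"): 0.5}

honest_a, honest_b = max_flow(honest, "s", "a"), max_flow(honest, "s", "b")
manip_a, manip_b = max_flow(manipulated, "s", "a"), max_flow(manipulated, "s", "b")
```

With honest reports b's value (1.2) exceeds a's (1); after a sets its trust in b to 0, a's own value is unchanged but b drops to 0.5, so a now outranks b. The value of a cannot be inflated, yet its rank can.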
Robustness: Shortest Path • Shortest path: robust • a is higher ranked than b [Figure: node a with path cost [1], node b with path cost [2]; edge costs c=1, c=1, c=3]
Robustness: Shortest Path • Shortest path: robust • a is higher ranked than b • a can harm b, but a is already higher ranked than b • b cannot hurt a, since it is not on the shortest path to a [Figure: node a with path cost [1], node b with path cost [3]; edge costs c=1, c=3, c=3]
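The figure labels on these two slides match a graph with edge costs s→a = 1, a→b = 1 and s→b = 3, where a then raises the cost of its edge to b from 1 to 3. That layout is an assumption inferred from the labels. A short Dijkstra sketch (the function name `shortest_costs` is hypothetical) shows that a's manipulation lowers b's standing without changing the ranking, since a lower path cost means a higher rank.

```python
import heapq

def shortest_costs(costs, source):
    """Dijkstra: cheapest path cost from source to every reachable node."""
    adjacency = {}
    for (u, v), c in costs.items():
        adjacency.setdefault(u, []).append((v, c))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in adjacency.get(u, []):
            candidate = d + c
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist

# Assumed reconstruction of the slide figures; c(e) = 1/T(e).
honest = {("s", "a"): 1.0, ("a", "b"): 1.0, ("s", "b"): 3.0}
manipulated = {("s", "a"): 1.0, ("a", "b"): 3.0, ("s", "b"): 3.0}

honest_dist = shortest_costs(honest, "s")
manip_dist = shortest_costs(manipulated, "s")
```

In both graphs a has cost 1; b moves from cost 2 to cost 3, so a stays ahead of b either way.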
Sybilproofness • Def: A sybil strategy for node i in G=(V,E) is a graph G’=(V’,E’) and a subset U’ ⊆ V’ such that collapsing U’ into a single node yields G. (T’s are added together.) • Def: f is k-sybilproof if there does not exist any pair of nodes i, j and a sybil strategy for i such that f(G)_i < f(G)_j and f(G’)_r > f(G)_j for r ∈ U’ and |U’| ≤ k+1. • Def: f is sybilproof if it is k-sybilproof for all k>0. • Key: sybils can only forge recommendations among each other.
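The "collapsing" step in the definition of a sybil strategy can be made concrete with a small helper. This is a sketch under assumptions: the function name `collapse`, the dictionary representation of T, and the decision to drop trust edges that end up inside the merged node are illustrative choices, not part of the original definition.

```python
def collapse(trust, sybils, merged):
    """Merge every node in `sybils` into the single node `merged`, summing trust values."""
    result = {}
    for (u, v), t in trust.items():
        u2 = merged if u in sybils else u
        v2 = merged if v in sybils else v
        if u2 == v2:
            # Assumption: trust forged inside the sybil set vanishes on collapse.
            continue
        result[(u2, v2)] = result.get((u2, v2), 0.0) + t
    return result

# Hypothetical G': node i splits into {i, i1} and forges high trust between them.
g_prime = {
    ("s", "i"): 0.5,
    ("s", "i1"): 0.5,
    ("i", "i1"): 9.0,   # forged
    ("i1", "i"): 9.0,   # forged
    ("i1", "j"): 1.0,
}
g = collapse(g_prime, {"i", "i1"}, "i")
```

Here `g` is the honest graph with T(s,i) = 1 and T(i,j) = 1: the trust that outsiders place in the sybils adds up to what they placed in i, and only the edges among the sybils themselves were invented.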
Results: Symmetric Reputations • Def: A reputation function is symmetric if it is covariant under graph isomorphism. • Theorem: There is no nontrivial symmetric sybilproof mechanism. • In fact, for any G, any node (except the top one) can improve its ranking via sybils • Theorem: There is no nontrivial symmetric k-sybilproof mechanism, for any k ≥ 1. • (How often this occurs for small k is open.)
Proof (via the butterfly) • Sybilproofness: by symmetry, f(G’)_j = f(G’)_s • k-sybilproofness: build G’ one sybil at a time [Figure: butterfly graph with nodes j, s, i, the original graph G, and the sybil set U’]
Results: Non-Symmetric • Theorem: There exist sybilproof reputation functions. (e.g., shortest path) • Def: Given a root node s ∈ V, let 𝒫 be the set of all collections of edge disjoint paths* from s to i. Let g be a function from paths to reals and ⊕ be an (addition-like) operator on the reals.
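Results: Non-Symmetric • Let f(G)_i = max_{P ∈ 𝒫} ⊕_{p ∈ P} g(p), where P ranges over collections of edge disjoint paths from s to i • Max flow: g(p) = min{T(e) | e ∈ p}, ⊕ = + • Shortest path: g(p) = min{T(e) | e ∈ p}, ⊕ = min • Other generalizations • Leaky pipes etc.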
Results: Non-Symmetric • Theorem: f as defined above is value sybilproof assuming • If p’ is an extension of p, then g(p’)<g(p). • ⊕ is nondecreasing and g is nondecreasing with respect to T. • If p=p’+p’’ then g(p)=g(p’) ⊕ g(p’’)
Results: Non-Symmetric • Theorem: f as defined above is rank sybilproof iff ⊕ = max, assuming: • For any p there exists an extension p’ such that g(p)=g(p’). • I.e., f depends on the maximal path.
Summary (Part I) • A framework for the analysis of the manipulability of ranking systems. • Key distinction: rank vs. value • Result 1: all symmetric ranking systems are manipulable. • Result 2: “flow based” ranking systems are not value manipulable but are rank manipulable. • Result 3: “path based” ranking systems are not manipulable.
Part II: Empirical Analysis of PageRank • (Joint with Alice Cheng) • (Inspired by Zhang et al. on collusion) • Stanford web matrix -- ~280k pages. • Question: How often are a small number of sybils helpful? • Answer: Surprisingly often!
Summary of Empirical • Analytic approximations for these. • PageRank is quite manipulable • Especially for low ranked pages • (but that’s where automated methods are supposed to work!)
Part III: Quantifying the Efficiency of Ranking Mechanisms • Work in progress – some preliminary results. • Is FlowRank or PageRank better than PathRank?
Model • Random graph model (descriptive, not constructive) • Follow the intuition behind PageRank • Pages link more to “better pages” • Better pages are more selective. • Pr(link)=f(qi,qj) • Increasing in qj • FOSD in qi • Average outdegree = k, (n→∞) • (many results have k→∞, and miss important aspects of ranking.)
Finding “Baddies” • 2 layer example: • ½ nodes are H and ½ L • L’s link uniformly at random • H’s link to H with (relative) probability (1+a) and to L’s with (1-a). • a=0, random graph • a=1, two tiered graph
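The two-layer example can be sampled with a short generator. This is a sketch under assumptions that go beyond the slide: every node gets exactly k out-links (the slide only fixes the average), H nodes pick the H tier with probability (1+a)/2 and the L tier with probability (1-a)/2, targets are uniform within a tier, and self-links are resampled. The function name `two_layer_graph` and the parameter values in the example are hypothetical.

```python
import random

def two_layer_graph(n, k, a, seed=0):
    """Sample a directed graph: nodes 0..n/2-1 are H, nodes n/2..n-1 are L."""
    rng = random.Random(seed)
    half = n // 2
    edges = []
    for i in range(n):
        is_high = i < half
        # L nodes link uniformly; H nodes favor the H tier by the factor (1+a):(1-a).
        p_high = (1 + a) / 2 if is_high else 0.5
        for _ in range(k):
            while True:
                to_high = rng.random() < p_high
                j = rng.randrange(0, half) if to_high else rng.randrange(half, n)
                if j != i:  # assumption: no self-links
                    break
            edges.append((i, j))
    return edges

def indegrees(edges, n):
    """Count in-links per node (this is the InRank score)."""
    counts = [0] * n
    for _, j in edges:
        counts[j] += 1
    return counts

n, k = 200, 5
tiered = two_layer_graph(n, k, a=1.0)
deg = indegrees(tiered, n)
high_total = sum(deg[: n // 2])
low_total = sum(deg[n // 2 :])
```

With a=1 no H node ever links to an L node, so the H tier collects all H out-links plus roughly half of the L out-links; with a=0 the two tiers are statistically indistinguishable.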
Statistical Inference • Now, ranking is a problem of statistical inference • G is a random variable • r is a statistical estimate of true qualities • Note: unlike most inference problems we only have a single sample
3 methods • PageRank • InRank: rank by indegree • MLRank: compute a maximum likelihood estimate.
Results • Pr(error)=Pr(ri>rj|qi<qj) • InRank: difference of Poissons • PageRank: two stage calculation • First by quality then statistical manipulations of PageRank equations. • MLRank: find a subgraph with the maximal number of edges. • NP-complete • Implemented a greedy algorithm
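The "difference of Poissons" calculation for InRank can be sketched numerically for the two-layer example. The rates used here are an assumption derived from that example, not figures from the talk: if every node sends k links, an H node's expected indegree is k(1 + a/2) and an L node's is k(1 - a/2), and in the large-n limit the two indegrees are treated as independent Poisson variables. The function names and the truncation point `cutoff` are also hypothetical.

```python
import math

def poisson_pmf(lam, m):
    """Pr(X = m) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))

def inrank_error(k, a, cutoff=200):
    """Pr(an L node has strictly larger indegree than an H node)."""
    lam_high = k * (1 + a / 2)  # assumed expected indegree of an H node
    lam_low = k * (1 - a / 2)   # assumed expected indegree of an L node
    error = 0.0
    cdf_high = 0.0  # Pr(H indegree <= m - 1)
    for m in range(cutoff):
        error += poisson_pmf(lam_low, m) * cdf_high
        cdf_high += poisson_pmf(lam_high, m)
    return error

random_graph_error = inrank_error(k=5, a=0.0)
tiered_graph_error = inrank_error(k=5, a=1.0)
```

At a=0 the two rates coincide, so the error stays just below one half (ties are not counted as errors); as a grows the indegree distributions separate and the error falls.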
Results [Figure: Pr(error) versus a for PageRank, InRank, and MLRank]
Results • InRank better than PageRank when graph is close to random and vice versa. (General Theorem) • Differences can be significant! • MLRank is significantly better.
Some Intuition • Case a=0 (Sketch -- ignoring special cases) • PageRank • rj’s are iid (in limit) • InRank • Theorem: PageRank is more random. • (But, also need to consider expected values)