CS522: Algorithmic and Economic Aspects of the Internet • Instructors: Nicole Immorlica (nickle@microsoft.com), Mohammad Mahdian (mahdian@microsoft.com)
Previously in this class • Ranking using the hyperlink structure: • HITS • PageRank
Today • Dealing with web spam • An axiomatic approach to PageRank • Next lecture: Kamal Jain
Recap • The PageRank of a page p is the probability of p in the stationary distribution of a random walk that, at each step, follows a random link from the current page with probability 1 – ε and restarts from a random page with probability ε. • Typically, ε = 0.15.
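The random walk in the recap can be approximated by power iteration. The following is a minimal sketch, not the instructors' code: the function name `pagerank`, the toy graph, and the treatment of pages without out-links (they are assumed to link to every page) are illustrative assumptions.

```python
def pagerank(links, eps=0.15, iters=200):
    """Power iteration for the random walk with restart probability eps.

    links maps every node to the list of nodes it links to; each link
    target must itself be a key of links.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # With probability eps the walk restarts at a uniformly random page.
        nxt = {v: eps / n for v in nodes}
        for v in nodes:
            # Assumption: a page without out-links behaves as if it linked to every page.
            targets = links[v] or nodes
            share = (1 - eps) * rank[v] / len(targets)
            for w in targets:
                nxt[w] += share
        rank = nxt
    return rank


# Hypothetical four-page graph, for illustration only.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page d has no incoming links, so its PageRank is exactly the restart share ε/n.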
The collusion problem • What if a group of nodes “collude” to increase the PageRank of one or more in the group? • Zhang, Goel, Govindan, Mason, and Van Roy, WAW 2004. • They define the “amplification” of a group of nodes and prove that it is always O(1/ε).
The collusion problem • Question: Is collusion really a problem? • Experiment (on a web subgraph, and blogstreet): • Take, say, the 1000th and the 1001st nodes in the PageRank order. • Each of these nodes removes all links to other pages, and adds a link to the other. • Compute PageRanks in the new graph. • Results: Ranks of the colluding nodes increase significantly. • Exercise: Go to eBay and search for PageRank.
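The experiment can be mimicked on a small synthetic graph. This is only a sketch under stated assumptions: the 20-page ring-with-chords graph, the helper `collude`, and the choice of pages 0 and 10 are hypothetical stand-ins for the web subgraph and the 1000th/1001st pages used in the lecture.

```python
def pagerank(links, eps=0.15, iters=200):
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: eps / n for v in nodes}          # random restart
        for v in nodes:
            targets = links[v] or nodes            # assumption: dangling pages link everywhere
            for w in targets:
                nxt[w] += (1 - eps) * rank[v] / len(targets)
        rank = nxt
    return rank


def collude(links, u, v):
    """Return a copy of the graph in which u and v link only to each other."""
    rewired = {node: list(out) for node, out in links.items()}
    rewired[u] = [v]
    rewired[v] = [u]
    return rewired


# Hypothetical toy graph: page i links to pages i+1 and i+2 (mod n).
n = 20
honest = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
before = pagerank(honest)
after = pagerank(collude(honest, 0, 10))
print("before:", round(before[0], 4), round(before[10], 4))
print("after: ", round(after[0], 4), round(after[10], 4))
```

Before the rewiring every page has PageRank 1/n, because the toy graph is perfectly symmetric. Afterwards the two colluders trap the rank that flows into them, so their PageRank rises at the expense of the other pages.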
Finding colluding groups • Approach 1: Find a set S with the largest amplification. • However, it can be shown that this problem is NP-hard.
Finding colluding nodes • Approach 2: Identify colluding individuals • Observation: If we increase ε, the PageRank of a colluding individual decreases (often in proportion to 1/ε). • Heuristic: Compute PageRanks for multiple values of ε, and compute the correlation of the PageRank of each node with 1/ε. Nodes with high correlation are probably colluding.
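A minimal sketch of the heuristic, assuming Pearson correlation as the correlation measure (the slide does not say which one is used). The function names, the grid of ε values, and the toy graph with colluders 0 and 10 are illustrative assumptions.

```python
from math import sqrt


def pagerank(links, eps=0.15, iters=500):
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: eps / n for v in nodes}          # random restart
        for v in nodes:
            targets = links[v] or nodes            # assumption: dangling pages link everywhere
            for w in targets:
                nxt[w] += (1 - eps) * rank[v] / len(targets)
        rank = nxt
    return rank


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def collusion_scores(links, eps_values):
    """Correlate each node's PageRank with 1/eps across several eps values."""
    inverse = [1.0 / e for e in eps_values]
    runs = [pagerank(links, eps=e) for e in eps_values]
    return {v: pearson([run[v] for run in runs], inverse) for v in links}


# Hypothetical toy graph: a ring with chords in which pages 0 and 10 collude.
n = 20
graph = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
graph[0], graph[10] = [10], [0]
scores = collusion_scores(graph, [0.1, 0.15, 0.2, 0.3, 0.4, 0.5])
for node in sorted(scores, key=scores.get, reverse=True)[:5]:
    print(node, round(scores[node], 3))
```

A score near +1 means the node's PageRank tracks 1/ε closely. How high a score must be before a node is flagged is a threshold the slide does not specify.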
Dealing with collusion • We can “punish” colluding individuals by increasing their ε, so that they cannot pass their reputation on to others. • Experimental results
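One way to read the "punishment" is to give every node its own restart probability. The sketch below follows that reading and is an assumption, not necessarily the mechanism of the cited paper: `pagerank_per_node_eps`, the value 0.9 for the punished nodes, and the toy graph are all hypothetical.

```python
def pagerank_per_node_eps(links, eps, iters=500):
    """PageRank where eps maps each node to its own restart probability."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Total mass that restarts this round, spread uniformly over all pages.
        restart = sum(eps[v] * rank[v] for v in nodes) / n
        nxt = {v: restart for v in nodes}
        for v in nodes:
            targets = links[v] or nodes            # assumption: dangling pages link everywhere
            for w in targets:
                nxt[w] += (1 - eps[v]) * rank[v] / len(targets)
        rank = nxt
    return rank


# Hypothetical toy graph: a ring with chords in which pages 0 and 10 collude.
n = 20
graph = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
graph[0], graph[10] = [10], [0]

uniform = {v: 0.15 for v in graph}
punishing = dict(uniform)
punishing[0] = punishing[10] = 0.9                 # suspected colluders get a larger eps

normal = pagerank_per_node_eps(graph, uniform)
punished = pagerank_per_node_eps(graph, punishing)
print("colluder 0, uniform eps: ", round(normal[0], 4))
print("colluder 0, punished eps:", round(punished[0], 4))
```

With the larger ε, most of the rank held by a colluder is redistributed to random pages at each step instead of being passed to its partner, so the pair can no longer keep it circulating between them.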
Explaining PageRank • Axiomatic approach • Define a set of “natural” axioms • Prove that PageRank satisfies these axioms • Prove that any page ranking algorithm satisfying these axioms outputs the same ranking as PageRank
Axiomatic Approaches: Voting • Consider a democracy where people submit preference lists over candidates. • A voting rule (or social welfare function) outputs a global ordering of candidates for every set of preference lists.
Voting Axioms • Unanimity: If everyone prefers the candidate x to y, then the global ordering also ranks x above y. • Independence of irrelevant alternatives (IIA): For any two candidates x and y, changes in people’s rankings of candidates other than x and y should not affect the relative position of x and y in the global ordering.
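To see what IIA rules out, consider the Borda count, a voting rule not discussed in the lecture and used here only as an illustration. In the hypothetical profiles below, no voter changes their relative ranking of A and B; only the position of C moves, yet the global order of A and B flips.

```python
def borda(profile):
    """Borda count: with m candidates, a ballot gives m-1 points to its top
    choice, m-2 to the next, and so on. Returns candidates by total score."""
    scores = {}
    for ballot in profile:
        m = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return sorted(scores, key=lambda c: -scores[c])


# Three voters prefer A to B and two prefer B to A, in both profiles.
profile_1 = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
profile_2 = [["A", "C", "B"]] * 3 + [["B", "C", "A"]] * 2

print(borda(profile_1))  # scores: A=6, B=7, C=2
print(borda(profile_2))  # scores: A=6, B=4, C=5
```

The Borda count satisfies unanimity but, as the flip shows, violates IIA.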
Arrow’s (Im)possibility Theorem • Theorem [Arrow, 1951]: With three or more candidates, the only voting rule satisfying unanimity and IIA is dictatorship. • Extensions • Similar results hold for social choice functions, where a single candidate (the winner) must be chosen [Muller-Satterthwaite, 1977]. • Majority rule arises naturally when we relax IIA or restrict the preference domain of people (i.e., impose rules on how they can rank candidates).
Axiomatic Approach: PageRank • Agents are the nodes of a graph. Each agent casts a “vote” over the other agents, and the votes are represented by a directed graph G. • A ranking algorithm is a function mapping every directed graph to an ordering of its nodes.
PageRank Axioms • Isomorphism: The ranking procedure should be independent of the names of the nodes. • Self edge: Adding self loops should not harm a node and should not affect other nodes. • Vote by committee: Importance a gives to b and c by voting shouldn’t change if a votes via committee. • Collapsing: If two nodes vote similarly, and are linked to by disjoint sets of nodes, the ranking does not change when they are collapsed to one node. • Proxy: There is an equal distribution of importance.
PageRank: Altman and Tennenholtz • Theorem: PageRank satisfies the axioms. • Theorem: PageRank is the only ranking algorithm that satisfies the axioms (i.e., every other ranking algorithm that satisfies the axioms outputs the same ranking as PageRank).