
CS522: Algorithmic and Economic Aspects of the Internet

CS522: Algorithmic and Economic Aspects of the Internet. Instructors: Nicole Immorlica (nickle@microsoft.com) Mohammad Mahdian (mahdian@microsoft.com). Previously in this class. Ranking using the hyperlink structure: HITS PageRank. Today. Dealing with web spam


Presentation Transcript


  1. CS522: Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica (nickle@microsoft.com) Mohammad Mahdian (mahdian@microsoft.com)

  2. Previously in this class • Ranking using the hyperlink structure: • HITS • PageRank

  3. Today • Dealing with web spam • An axiomatic approach to PageRank • Next Lecture: Kamal Jain

  4. Recap • The PageRank of a page p is the probability of p in the stationary distribution of a random walk that, at each step, follows a random link from the current page with probability 1 − ε and jumps to a random page with probability ε. • Typically, ε = 0.15.
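The random-walk definition in the recap can be sketched as a power iteration. This is a minimal illustration, not the course's implementation: the function name `pagerank`, the dict-of-lists graph format, the toy graph, and the treatment of pages with no out-links (jump to a uniformly random page) are all assumptions made for the example.

```python
def pagerank(links, eps=0.15, iters=200):
    """Approximate PageRank by power iteration.

    links: dict mapping every node to the list of nodes it links to
           (hypothetical format; every link target must also be a key).
    eps:   probability of restarting at a uniformly random page.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iters):
        # Restart step: every page receives eps / n.
        new = {v: eps / n for v in nodes}
        for v in nodes:
            # Assumption: a page with no out-links jumps to a random page.
            targets = links[v] or nodes
            share = (1 - eps) * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


# Toy graph: c is linked to by a, b, and d; d has no in-links.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph, page d has no in-links, so its only source of rank is the restart step and it ends up with ε/n.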

  5. The collusion problem • What if a group of nodes “collude” to increase the PageRank of one or more nodes in the group? • Zhang, Goel, Govindan, Mason, and Van Roy, WAW 2004. • They define the “amplification” of a group of nodes and prove that it is always O(1/ε).

  6. The collusion problem • Question: Is collusion really a problem? • Experiment (on a web subgraph, and blogstreet): • Take, say, the 1000th and the 1001st nodes in the PageRank order. • Each of these nodes removes all of its links to other pages and adds a link to the other. • Compute PageRanks in the new graph. • Results: Ranks of the colluding nodes increase significantly. • Exercise: Go to eBay and search for PageRank.
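The experiment can be imitated on a toy graph. The sketch below does not use the web subgraph or the blog data from the slide; the 10-node graph, the choice of nodes 3 and 4 as colluders, and the helper `pagerank` are all hypothetical and only meant to show the mechanics of the rewiring step.

```python
def pagerank(links, eps=0.15, iters=500):
    """Power-iteration PageRank on a dict {node: [out-links]} (illustrative)."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eps / n for v in nodes}
        for v in nodes:
            targets = links[v] or nodes  # assumption: dangling pages jump uniformly
            share = (1 - eps) * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


n = 10
# Symmetric toy graph: node i links to i+1 and i+2 (mod n), so all ranks are equal.
honest = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}

# Nodes 3 and 4 collude: each drops its other links and links only to its partner.
colluding = dict(honest)
colluding[3] = [4]
colluding[4] = [3]

before = pagerank(honest)
after = pagerank(colluding)
for v in (3, 4):
    print(f"node {v}: before={before[v]:.3f} after={after[v]:.3f}")
```

In this toy setting the colluding pair traps the random walk until the next restart, which is why both ranks go up.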

  7. Finding colluding groups • Approach 1: Find a set S with the largest amplification. • However, it can be shown that this problem is NP-hard.

  8. Finding colluding nodes • Approach 2: Identify colluding individuals. • Observation: If we increase ε, the PageRank of a colluding individual decreases (often in proportion to 1/ε). • Heuristic: Compute PageRanks for multiple values of ε, and compute the correlation of each node’s PageRank with 1/ε. Nodes with high correlation are probably colluding.
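The heuristic can be sketched as follows. The toy graph, the list of ε values, the use of Pearson correlation, and the cutoff of 0.5 are assumptions chosen for illustration; the slide does not specify them.

```python
def pagerank(links, eps=0.15, iters=500):
    """Power-iteration PageRank on a dict {node: [out-links]} (illustrative)."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eps / n for v in nodes}
        for v in nodes:
            targets = links[v] or nodes  # assumption: dangling pages jump uniformly
            share = (1 - eps) * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


n = 10
graph = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
graph[3], graph[4] = [4], [3]  # hypothetical colluding pair

eps_values = [0.05, 0.1, 0.15, 0.2, 0.3, 0.5]  # assumed sweep
inverse_eps = [1 / e for e in eps_values]
runs = [pagerank(graph, eps=e) for e in eps_values]

# Correlate each node's PageRank across the sweep with 1/eps.
scores = {v: pearson(inverse_eps, [run[v] for run in runs]) for v in graph}
flagged = sorted(v for v, s in scores.items() if s > 0.5)  # assumed cutoff
print(flagged)
```

On a real graph the cutoff would have to be tuned, and a high correlation is only a hint of collusion, as the slide's "probably" suggests.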

  9. Dealing with collusion • We can “punish” colluding individuals by increasing their ε, so that they cannot pass their reputation on to others. • Experimental results
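One possible reading of this idea is a random walk whose restart probability depends on the current node. The sketch below assumes exactly that reading; the per-node `eps` dictionary, the penalty value 0.9, and the toy graph are hypothetical and not taken from the slides.

```python
def pagerank(links, eps, iters=500):
    """PageRank where eps[v] is the restart probability used at node v."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Total probability mass that restarts this step, spread uniformly.
        restart = sum(eps[v] * rank[v] for v in nodes)
        new = {v: restart / n for v in nodes}
        for v in nodes:
            targets = links[v] or nodes  # assumption: dangling pages jump uniformly
            share = (1 - eps[v]) * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


n = 10
graph = {i: [(i + 1) % n, (i + 2) % n] for i in range(n)}
graph[3], graph[4] = [4], [3]  # hypothetical colluding pair

uniform = {v: 0.15 for v in graph}
punished = {**uniform, 3: 0.9, 4: 0.9}  # assumed penalty for suspected colluders

plain = pagerank(graph, uniform)
penalized = pagerank(graph, punished)
for v in (3, 4):
    print(f"node {v}: eps=0.15 -> {plain[v]:.3f}, eps=0.9 -> {penalized[v]:.3f}")
```

With a large ε at nodes 3 and 4, most of the rank arriving there is redistributed by the restart step instead of being handed to the partner, so both ranks drop in this toy graph.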

  10. Explaining PageRank • Axiomatic approach • Define a set of “natural” axioms • Prove that PageRank satisfies these axioms • Prove that any page ranking algorithm satisfying these axioms outputs the same ranking as PageRank

  11. Axiomatic Approaches: Voting • Consider a democracy where people submit preference lists over candidates. • A voting rule (or social welfare function) outputs a global ordering of candidates for every set of preference lists.

  12. Voting Axioms • Unanimity: If everyone prefers the candidate x to y, then the global ordering also ranks x above y. • Independence of irrelevant alternatives (IIA): For any two candidates x and y, changes in people’s rankings of candidates other than x and y should not affect the relative position of x and y in the global ordering.
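To make the IIA axiom concrete, the sketch below uses the Borda count, a voting rule that is not mentioned on the slides and is swapped in here purely as an example. It satisfies unanimity but violates IIA. The five-voter profiles and the function name `borda` are hypothetical.

```python
def borda(profile):
    """Borda count: on a ballot of k candidates, last place earns 0 points
    and first place earns k - 1. Returns candidates from best to worst."""
    scores = {}
    for ballot in profile:
        for points, cand in enumerate(reversed(ballot)):
            scores[cand] = scores.get(cand, 0) + points
    return sorted(scores, key=lambda c: (-scores[c], c))


def prefers_x_to_y(ballot):
    """True if this voter ranks x above y."""
    return ballot.index("x") < ballot.index("y")


# Ballots list candidates from most to least preferred.
profile_a = [["x", "y", "z"]] * 3 + [["y", "z", "x"]] * 2
# Only z moves: the last two voters now rank z below x.
profile_b = [["x", "y", "z"]] * 3 + [["y", "x", "z"]] * 2

# No voter changed their mind about x versus y ...
same_xy = [prefers_x_to_y(a) == prefers_x_to_y(b)
           for a, b in zip(profile_a, profile_b)]
order_a = borda(profile_a)
order_b = borda(profile_b)
# ... yet the global order of x and y flips, so IIA is violated.
print(all(same_xy), order_a, order_b)
```

Unanimity still holds for Borda: if every voter ranks x above y, then x collects strictly more points than y.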

  13. Arrow’s (Im)possibility Theorem • Theorem [Arrow, 1951]: With three or more candidates, the only function satisfying unanimity and IIA is a dictatorship. • Extensions • Similar results hold for social choice functions where a single candidate (winner) must be chosen [Muller-Satterthwaite, 1977]. • Majority rule arises naturally when we relax IIA or restrict the preference domain of people (i.e., impose rules on how they can rank candidates).

  14. Axiomatic Approach: PageRank • Agents are the nodes of a graph. Agents “vote” for other agents, and the votes are represented by a directed graph G. • A ranking algorithm is a function mapping every directed graph to an ordering of its nodes.

  15. PageRank Axioms • Isomorphism: The ranking procedure should be independent of the names of the nodes. • Self edge: Adding self loops should not harm a node and should not affect other nodes. • Vote by committee: The importance that a gives to b and c by voting should not change if a votes via a committee. • Collapsing: If two nodes vote similarly, and are linked to by disjoint sets of nodes, the ranking does not change when they are collapsed to one node. • Proxy: There is an equal distribution of importance.
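The isomorphism axiom is the easiest one to check numerically. The sketch below treats a ranking algorithm as a function from a graph to an ordering of its nodes and verifies, on one toy graph, that renaming the nodes renames the ranking accordingly. It assumes the damped PageRank from the recap slide with ε = 0.15, which may differ from the exact variant analysed in the axiomatic work. The graph and the names `ranking` and `rename` are hypothetical.

```python
def pagerank(links, eps=0.15, iters=200):
    """Power-iteration PageRank on a dict {node: [out-links]} (illustrative)."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: eps / n for v in nodes}
        for v in nodes:
            targets = links[v] or nodes  # assumption: dangling pages jump uniformly
            share = (1 - eps) * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def ranking(links):
    """A ranking algorithm: maps a directed graph to an ordering of its nodes."""
    ranks = pagerank(links)
    return sorted(ranks, key=lambda v: -ranks[v])


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# Rename every node; the link structure stays the same.
rename = {"a": "z", "b": "y", "c": "x", "d": "w"}
renamed = {rename[v]: [rename[w] for w in out] for v, out in graph.items()}

original_order = ranking(graph)
renamed_order = ranking(renamed)
print(original_order, renamed_order)
```

A single passing example does not prove the axiom; it only shows what the axiom asks of a ranking function.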

  16. PageRank: Altman and Tennenholtz • Theorem: PageRank satisfies these axioms. • Theorem: PageRank is the only ranking algorithm that satisfies these axioms (i.e., every other ranking algorithm that satisfies them outputs the same ranking as PageRank).
