Learn how Google's PageRank™ algorithm, developed by Larry Page and Sergey Brin, determines the importance of webpages based on the number of links. Understand the computational complexity and the mathematical concept behind it. Discover the significance of eigenvectors in relation to PageRank.
PageRank • Google: its search listings always seemed to deliver the “good stuff” up front. • Part of the magic behind it is its PageRank algorithm. • The PageRank™ algorithm was developed by Google’s founders, Larry Page and Sergey Brin, when they were graduate students at Stanford University.
PageRank (Basic Idea) Importance score: a rating of a webpage's importance. • Suppose the web of interest contains n pages, each page indexed by an integer k. • x_k denotes the importance score of page k on the web. • x_m > x_j indicates that page m is more important than page j. • Example: a web of four pages, numbered 1 to 4.
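PageRank (Basic Idea) Simple approach: take the score of a page to be the number of links pointing to it. • This approach ignores the fact that a link from an important page should count for more than a link from an unimportant one. • Do pages with the same number of incoming links really have the same importance? Page 1 has a link from page 3, and page 3 has the maximum score.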
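The link-counting approach can be sketched in a few lines. The four-page link structure `links` below is an assumption, chosen only to agree with the slide's remarks (page 3 has the most incoming links, and page 1 is linked from page 3); `backlink_counts` is a hypothetical helper name.

```python
# Assumed example web: links[j] lists the pages that page j links to.
# This structure is an illustration, not taken from the slide's figure.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

def backlink_counts(links):
    """Score each page by the number of pages that link to it."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts

scores = backlink_counts(links)
print(scores)  # {1: 2, 2: 1, 3: 3, 4: 2}
```

In this assumed web, pages 1 and 4 tie with two incoming links each, although only page 1 is linked from the top-scoring page 3. That is the weakness the slide points out.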
PageRank (Basic Idea) Second approach: let’s compute the score of page j as the sum of the scores of all pages linking to page j, so that a link from a high-scoring page counts for more than a link from a low-scoring one.
PageRank (Basic Idea) Just as in an election: • a link to page k becomes a vote for page k’s importance • we don’t want a single individual to gain influence merely by casting multiple votes
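A common way to encode this election rule is to split each page's vote evenly among the pages it links to. The sketch below assumes that $n_j$ denotes the number of outgoing links on page $j$ and $L_k$ the set of pages linking to page $k$; both symbols are introduced here for illustration and do not appear in the slides.

```latex
x_k = \sum_{j \in L_k} \frac{x_j}{n_j}
```

With this weighting, each page hands out exactly its own score in total, however many links it contains, so adding more links does not add influence.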
PageRank (Basic Idea) Problem, in linear algebra language: find an eigenvector x of the matrix A associated with eigenvalue 1, that is, Ax = x. • Note that page 3 is not the most important page.
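For a web this small the eigenvector can be found exactly. The sketch below assumes that A is the link matrix whose column j spreads page j's vote evenly over its outgoing links (consistent with the slides' statement that every column sums to one), and it reuses an assumed four-page link structure, since the original figure is not available. The helper name `pagerank_exact` is hypothetical. It solves (A - I)x = 0 together with the normalization that the scores sum to 1, using exact fractions.

```python
from fractions import Fraction

# Assumed example web: links[j] lists the pages that page j links to.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

def pagerank_exact(links):
    """Solve A x = x with sum(x) = 1 by Gauss-Jordan elimination."""
    pages = sorted(links)
    n = len(pages)
    idx = {page: i for i, page in enumerate(pages)}

    # Column j of A holds 1/n_j in the rows of the pages that j links to.
    A = [[Fraction(0)] * n for _ in range(n)]
    for j, targets in links.items():
        for k in targets:
            A[idx[k]][idx[j]] = Fraction(1, len(targets))

    # Augmented system (A - I | 0); the rows of A - I sum to zero, so one
    # row is redundant and is replaced by the normalization sum(x) = 1.
    M = [[A[r][c] - (1 if r == c else 0) for c in range(n)] + [Fraction(0)]
         for r in range(n)]
    M[-1] = [Fraction(1)] * n + [Fraction(1)]

    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [value / M[col][col] for value in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]

    return {page: M[idx[page]][n] for page in pages}

scores = pagerank_exact(links)
print(scores)  # page 1: 12/31, page 2: 4/31, page 3: 9/31, page 4: 6/31
```

For this assumed web, page 1 comes out on top (12/31, about 0.387) and page 3 is second (9/31, about 0.290), which matches the slide's remark that page 3 is not the most important page even though it has the most incoming links.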
PageRank (Basic Idea) In numerical linear algebra language: find an efficient computational algorithm to compute eigenvectors. Difficulties & Features • Google's PageRank is an eigenvector of a matrix of order 2.7 billion (May 2002) (a Google blog post claimed in 2008). • It is recomputed about once a month and does not involve any of the actual content of Web pages or of any individual query. • There can be more than one linearly independent eigenvector, in which case the ranking is not unique.
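The non-uniqueness difficulty is easy to reproduce on a toy example. The sketch below assumes a web made of two disconnected pairs of pages and the evenly split link matrix A; the graph and the helper name `apply_link_matrix` are illustrative assumptions, not taken from the slides. Two different score vectors both satisfy Ax = x.

```python
from fractions import Fraction

# Assumed web made of two disconnected pairs: 1 <-> 2 and 3 <-> 4.
links = {1: [2], 2: [1], 3: [4], 4: [3]}

def apply_link_matrix(links, x):
    """Return A x, where column j of A spreads x[j] evenly over j's links."""
    result = {page: Fraction(0) for page in links}
    for j, targets in links.items():
        for k in targets:
            result[k] += x[j] / len(targets)
    return result

half = Fraction(1, 2)
x_first = {1: half, 2: half, 3: Fraction(0), 4: Fraction(0)}
x_second = {1: Fraction(0), 2: Fraction(0), 3: half, 4: half}

# Both vectors are left unchanged by A, so both are eigenvectors
# for eigenvalue 1, and neither is a multiple of the other.
print(apply_link_matrix(links, x_first) == x_first)    # True
print(apply_link_matrix(links, x_second) == x_second)  # True
```

Any weighted mix of the two vectors is also an eigenvector, so a web like this one has no single well-defined ranking.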
PageRank (Basic Idea) In numerical linear algebra language: find an efficient computational algorithm to compute eigenvectors. Difficulties & Features • The matrix A is sparse (tons of zeros). • The elements of A all lie between zero and one, and its column sums are all equal to one (Markov chain). • One way to compute the eigenvector x would be to start with a good approximate solution, such as the PageRanks from the previous month, and simply repeat the assignment statement x = Ax (in numerical analysis: a continuation method).
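The repeated assignment can be sketched as follows. This is a minimal illustration, not Google's implementation: it assumes the same four-page web used for illustration, stores only the nonzero structure of A as adjacency lists (to reflect sparsity), and stops when successive vectors differ by less than a tolerance. The names `power_iteration`, `tol`, `max_iter` and the `last_month` scores are all hypothetical.

```python
# Assumed example web: links[j] lists the pages that page j links to.
links = {1: [2, 3, 4], 2: [3, 4], 3: [1], 4: [1, 3]}

def power_iteration(links, start, tol=1e-12, max_iter=1000):
    """Repeat x = A x until successive vectors agree to within tol."""
    x = dict(start)
    for iteration in range(1, max_iter + 1):
        new_x = {page: 0.0 for page in links}
        # Only the nonzero entries of A are visited: one per link.
        for j, targets in links.items():
            share = x[j] / len(targets)
            for k in targets:
                new_x[k] += share
        change = sum(abs(new_x[p] - x[p]) for p in links)
        x = new_x
        if change < tol:
            return x, iteration
    return x, max_iter

# Cold start: every page gets the same score.
uniform = {page: 0.25 for page in links}
# Warm start: hypothetical scores standing in for last month's PageRanks.
last_month = {1: 0.387, 2: 0.129, 3: 0.290, 4: 0.194}

cold, cold_iters = power_iteration(links, uniform)
warm, warm_iters = power_iteration(links, last_month)
print(cold_iters, warm_iters)
```

Each pass costs one operation per link rather than one per matrix entry, which is what makes the approach plausible for a sparse matrix of very large order. Repeating x = Ax in this way is commonly known as the power method; starting from a good approximation simply reduces the number of passes needed.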