Dive into PageRank concepts, matrices, iterative computation, Real Web problems, and solutions, illustrated with examples and algorithms by Junghoo "John" Cho from UCLA Computer Science.
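Inverted Index
• Allows quick lookup of the ids of documents that contain a particular word
• Lexicon/dictionary (DIC): one entry per word (Stanford, UCLA, MIT, …)
• Posting list (PL): the list of document ids for each word: PL(Stanford), PL(UCLA), PL(MIT), …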
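The lexicon and posting-list structure can be sketched in a few lines of Python. This is only a minimal illustration: the toy corpus `docs` and the helper names `build_inverted_index` and `lookup` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text (not from the slides).
docs = {
    1: "Stanford and MIT",
    2: "UCLA computer science",
    3: "Stanford UCLA MIT",
}

def build_inverted_index(docs):
    """Map each lowercased word (lexicon entry) to a sorted posting list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def lookup(index, word):
    """Return the posting list for a word, or an empty list if it is not in the lexicon."""
    return index.get(word.lower(), [])

index = build_inverted_index(docs)
```

A lookup such as `lookup(index, "Stanford")` is a single dictionary access, which is the "quick lookup" the slide refers to.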
PageRank
• A page is important if it is pointed to by many important pages
• PR(p) = PR(p1)/c1 + … + PR(pk)/ck
  pi : page pointing to p, ci : number of links in pi
• The PageRank of p is the sum of the shares it receives from its parents: each parent pi contributes PR(pi)/ci
• One equation for every page: N equations, N unknown variables
Junghoo "John" Cho (UCLA Computer Science)
Example: Web of 1842
• Netscape (Ne), Microsoft (MS) and Amazon (Am)
PR(n) = PR(n)/2 + PR(a)/2
PR(m) = PR(a)/2
PR(a) = PR(n)/2 + PR(m)
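The three equations determine the PageRanks only up to a common scale factor. As a sanity check, the sketch below assumes the total importance is 3 (one unit per page, an assumption made here for the illustration) and verifies with exact fractions that PR(n) = 6/5, PR(m) = 3/5, PR(a) = 6/5 satisfies all three equations.

```python
from fractions import Fraction as F

# Candidate solution, assuming the three PageRanks sum to 3 (one unit per page).
pr = {"n": F(6, 5), "m": F(3, 5), "a": F(6, 5)}

# Right-hand sides of the three equations of the example.
rhs = {
    "n": pr["n"] / 2 + pr["a"] / 2,
    "m": pr["a"] / 2,
    "a": pr["n"] / 2 + pr["m"],
}

# True only if every equation holds exactly.
satisfied = all(pr[p] == rhs[p] for p in pr)
total = sum(pr.values())
```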
PageRank: Matrix Notation
• Web graph matrix M = { mij }
• Each page i corresponds to row i and column i of the matrix M
• mij = 1/c if page i is one of the c children of page j
• mij = 0 otherwise
• PageRank vector
• PageRank equation
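As a rough sketch of the definition of M, the code below builds the matrix for the three-page example. The link structure is read off the example equations (n links to n and a, m links to a, a links to n and m); the names `links`, `pages` and `build_matrix` are illustrative choices, not part of the slides. With the PageRanks collected in a vector r (a notation assumed here), the per-page equations amount to r = M r.

```python
# Out-links of the three-page example (n: Netscape, m: Microsoft, a: Amazon),
# inferred from the example equations: n -> n, a; m -> a; a -> n, m.
links = {"n": ["n", "a"], "m": ["a"], "a": ["n", "m"]}
pages = ["n", "m", "a"]

def build_matrix(pages, links):
    """Return M as a list of rows, with M[i][j] = 1/c if page i is one of the c children of page j."""
    idx = {p: k for k, p in enumerate(pages)}
    M = [[0.0] * len(pages) for _ in pages]
    for parent, children in links.items():
        for child in children:
            M[idx[child]][idx[parent]] = 1.0 / len(children)
    return M

M = build_matrix(pages, links)

# Each column sums to 1 because every page here has at least one out-link.
column_sums = [sum(M[i][j] for i in range(len(pages))) for j in range(len(pages))]
```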
PageRank: Iterative Computation
• Initially every page has one unit of importance
• At each round, each page shares its importance among its children and receives new importance from its parents
• Eventually the importance of each page reaches a limit
• M is a stochastic matrix (each column sums to 1 when every page has at least one link)
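The rounds described above can be sketched as repeated multiplication by M. The example uses the three-page graph (rows and columns ordered n, m, a); the function names `step` and `iterate` and the fixed number of rounds are assumptions made for this illustration.

```python
# Column-stochastic matrix of the three-page example; rows/columns ordered n, m, a.
M = [
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.5, 1.0, 0.0],
]

def step(M, r):
    """One round: every page receives importance from its parents (new r = M r)."""
    n = len(r)
    return [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]

def iterate(M, rounds=100):
    """Start with one unit of importance per page and apply the given number of rounds."""
    r = [1.0] * len(M)
    for _ in range(rounds):
        r = step(M, r)
    return r

first_round = step(M, [1.0, 1.0, 1.0])
ranks = iterate(M)
```

After the first round the importances are (1, 0.5, 1.5); with more rounds they approach (1.2, 0.6, 1.2), and the total stays at 3 throughout.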
Example: Web of 1842 (figure: graph of Ne, MS and Am)
PageRank: Random Surfer Model
• PageRank is the probability that a Web surfer reaches a page after many clicks, following random links (figure: random click)
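One way to make the random surfer concrete is a small simulation on the three-page example. This is only a sketch: the number of clicks, the seed and the name `random_surfer` are arbitrary choices for the illustration, and the link structure is the one inferred from the example equations.

```python
import random

# Out-links of the three-page example: n -> n, a; m -> a; a -> n, m.
links = {"n": ["n", "a"], "m": ["a"], "a": ["n", "m"]}

def random_surfer(links, clicks=200_000, seed=0):
    """Follow uniformly random links and return the fraction of clicks landing on each page."""
    rng = random.Random(seed)
    page = "n"  # arbitrary starting page
    visits = {p: 0 for p in links}
    for _ in range(clicks):
        page = rng.choice(links[page])
        visits[page] += 1
    return {p: visits[p] / clicks for p in links}

freq = random_surfer(links)
```

The visit frequencies should come out close to 0.4, 0.2 and 0.4 for n, m and a, which are the limits of the iterative computation divided by the total importance of 3.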
Problems on the Real Web
• Dead end
  A page with no links through which to send its importance
  All importance “leaks out of” the Web
• Crawler trap
  A group of one or more pages that have no links out of the group
  The group accumulates all the importance of the Web
Example: Dead End
• No link from Microsoft: Microsoft is a dead end (figure: Ne, MS, Am)
Example: Dead End (figure: Ne, MS, Am)
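The leak can be observed numerically. The sketch below assumes the dead-end variant of the three-page graph, where Microsoft has no out-links, so its column of M is all zeros; the names `M_dead`, `step` and `totals` are illustrative.

```python
# Rows/columns ordered n, m, a. Microsoft (m) has no links, so its column is all zeros.
M_dead = [
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.5, 0.0, 0.0],
]

def step(M, r):
    """One round of importance sharing (new r = M r)."""
    n = len(r)
    return [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]

r = [1.0, 1.0, 1.0]  # one unit of importance per page
totals = [sum(r)]    # total importance in the Web after each round
for _ in range(50):
    r = step(M_dead, r)
    totals.append(sum(r))
```

The total importance shrinks every round, because whatever reaches Microsoft is never passed on, and it ends up close to zero.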
Solution to Dead End
• Assume that a surfer at a dead end jumps to a random page (figure: Ne, MS, Am)
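In matrix terms, one way to express this fix is to replace the all-zero column of a dead-end page with a uniform column of 1/N. The sketch below applies that idea to the dead-end example; `fix_dead_ends` is a hypothetical helper name, and the uniform jump over all N pages (including the dead end itself) is an assumption about how "a random page" is chosen.

```python
# Dead-end example; rows/columns ordered n, m, a. Microsoft's column is all zeros.
M_dead = [
    [0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5],
    [0.5, 0.0, 0.0],
]

def fix_dead_ends(M):
    """Replace every all-zero column with a uniform column of 1/N (assumed uniform random jump)."""
    n = len(M)
    fixed = [row[:] for row in M]
    for j in range(n):
        if all(M[i][j] == 0 for i in range(n)):
            for i in range(n):
                fixed[i][j] = 1.0 / n
    return fixed

def step(M, r):
    """One round of importance sharing (new r = M r)."""
    n = len(r)
    return [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]

M_fixed = fix_dead_ends(M_dead)
ranks = [1.0, 1.0, 1.0]
for _ in range(100):
    ranks = step(M_fixed, ranks)
```

With the fix the total importance stays at 3, and under these assumptions the ranks settle at about 18/13, 9/13 and 12/13 for n, m and a.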
Example: Crawler Trap
• Only a self-link at Microsoft: Microsoft is a crawler trap (figure: Ne, MS, Am)
Example: Crawler Trap (figure: Ne, MS, Am)
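The effect of the trap can be reproduced with the same iteration. The sketch assumes the crawler-trap variant of the three-page graph, where Microsoft links only to itself; the names `M_trap` and `step` are illustrative.

```python
# Rows/columns ordered n, m, a. Microsoft (m) links only to itself.
M_trap = [
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.0, 0.0],
]

def step(M, r):
    """One round of importance sharing (new r = M r)."""
    n = len(r)
    return [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]

ranks = [1.0, 1.0, 1.0]  # one unit of importance per page
for _ in range(100):
    ranks = step(M_trap, ranks)
```

The total importance is preserved, but nearly all of it ends up at Microsoft: the ranks approach (0, 3, 0).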
Crawler Trap: Damping Factor
• “Tax” each page some fraction of its importance and distribute it equally among all pages
• The tax rate is the probability that the surfer jumps to a random page
• The example assumes a 20% tax
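A minimal sketch of the taxed iteration on the crawler-trap example follows. Each round, every page keeps 80% of what it receives through links, and the 20% that was taxed is pooled and shared equally. The names `damped_step` and `tax` are assumptions for this illustration.

```python
# Crawler-trap example; rows/columns ordered n, m, a. Microsoft links only to itself.
M_trap = [
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.0, 0.0],
]

def damped_step(M, r, tax=0.2):
    """One round with a tax: links carry (1 - tax) of the importance, the rest is shared equally."""
    n = len(r)
    pool = tax * sum(r) / n  # taxed importance, distributed equally among all pages
    return [(1 - tax) * sum(M[i][j] * r[j] for j in range(n)) + pool for i in range(n)]

ranks = [1.0, 1.0, 1.0]  # one unit of importance per page
for _ in range(200):
    ranks = damped_step(M_trap, ranks)
```

With a 20% tax the iteration settles at 7/11, 21/11 and 5/11 for n, m and a. Microsoft still ranks highest, but it no longer absorbs all of the importance.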
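Algorithm KMP
(W: word searched for, D: document text, T: partial-match table, assumed to have T[0] = -1)
let m = 0, i = 0
while (m + i) < |D| do:
    if W[i] = D[m + i],
        let i = i + 1
        if i = |W|, return m
    otherwise,
        let m = m + i - T[i]
        if i > 0, let i = T[i]
return no-match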
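The pseudocode can be turned into runnable Python roughly as follows. The slide does not show how T is built, so `build_table` below is one common construction under the assumption that T[0] = -1; the function names and the choice to return -1 for no-match are likewise assumptions of this sketch.

```python
def build_table(w):
    """Partial-match table T for word w, using the assumed convention T[0] = -1."""
    t = [-1] * len(w)
    if len(w) > 1:
        t[1] = 0
    pos, cnd = 2, 0  # pos: entry being computed, cnd: length of the current candidate prefix
    while pos < len(w):
        if w[pos - 1] == w[cnd]:
            cnd += 1
            t[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = t[cnd]  # fall back to a shorter candidate prefix
        else:
            t[pos] = 0
            pos += 1
    return t

def kmp_search(w, d):
    """Return the index of the first occurrence of w in d, or -1 for no-match."""
    if not w:
        return 0
    t = build_table(w)
    m = i = 0  # m: start of the current match in d, i: position in w
    while m + i < len(d):
        if w[i] == d[m + i]:
            i += 1
            if i == len(w):
                return m
        else:
            m = m + i - t[i]  # with t[0] = -1 this advances m by one when i = 0
            if i > 0:
                i = t[i]
    return -1

position = kmp_search("ABCDABD", "ABC ABCDAB ABCDABCDABDE")
```

Here `position` is 15, the same index that Python's built-in `str.find` returns for this pair of strings.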