Explore the mathematical basis behind search engines, including Markov Chains, Perron-Frobenius Theorem, and the Power Method used in PageRank. Learn about ranking sites and the ergodic theory in this comprehensive guide.
Lecture 1 & 2 • Introduction • Ergodic Theorem • Perron-Frobenius Theorem • Power Method • Foundations of PageRank
Search Engines
• Search engine: deterministic • Ranking of sites • Mathematical foundations: Markov and Perron-Frobenius
Surf Engines
• Non-deterministic: window shopping • Ranking of links • Mathematical foundations: Singular Value Decomposition
Initial approach to ranking sites: the in-degree heuristic [Figure: example link graph on nodes 1 to 8]
Google’s PageRank approach Problem: rank the sites in order of most visited
Another Approach: Hubs and Authorities • Authorities: sites that contain the most important information • Hubs: sites that provide directions to the authorities [Figure: hubs (H) pointing to authorities (A)]
Ranking Hyperlinks • Local • Global • Update
Local [Figure: links marked "?"]
Global [Figure: links marked "?"]
Update [Figure: link graph on nodes A to L, 2005 versus 2006 (updated)]
Part I: Ranking Sites • The Web as a Markov Chain • The Ergodic Theorem • Perron-Frobenius Theorem • Algorithms: PageRank, HITS, SALSA
Markov Chains • Text Analysis • Speech Recognition • Statistical Mechanics • Decision Science: Medicine, Commerce ….more recently…..
Bioinformatics [Figure: state diagram with states d1, d2, I1, I2, M1, M2, M3]
Markov’s Ergodic Theorem (1906) Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.
Probabilities after n steps
steps   X1    X2    X3    X4
15      .33   .26   .26   .13
30      .33   .26   .23   .16
100     .37   .29   .21   .13
500     .38   .28   .21   .13
1000    .38   .28   .21   .13
[Figure: four-state chain s1, s2, s3, s4 with edge labels .45, 1, .4, .55, .75, .25, .6]
Probabilities after n steps
steps   X1    X2    X3    X4
15      .46   .20   .26   .06
30      .36   .26   .23   .13
100     .38   .28   .21   .13
500     .38   .28   .21   .13
1000    .38   .28   .21   .13
[Figure: four-state chain s1, s2, s3, s4 with edge labels .45, 1, .4, .55, .75, .25, .6]
Steady State Constraints
p11x1 + p21x2 + p31x3 = x1
p12x1 + p22x2 + p32x3 = x2
p13x1 + p23x2 + p33x3 = x3
∑xi = 1, xi ≥ 0
• x1, x2, x3 are probabilities • p21 is the probability of going to S1 from S2
Solved by Power Iteration Method Based on Perron-Frobenius Theorem Where e is an initial vector and M is the stochastic matrix associated to the system.
Power Method Example [Figure: graph on nodes 1, 2, 3, 4]
M =
  0    1    0    0
  1/2  0    1/2  0
  1/2  0    0    1/2
  1    0    0    0
e = (1, 1, 1, 1)
Result: (0.3636, 0.3636, 0.1818, 0.0909)
Power Method Example [Figure: graph on nodes 1, 2, 3, 4]
M =
  0    1    0    0
  1/2  0    1/2  0
  1/2  0    0    1/2
  1    0    0    0
e = (2, 100, 7, 3)
Result: (0.3636, 0.3636, 0.1818, 0.0909)
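The two examples can be reproduced with a short sketch. It assumes M is read row by row as a row-stochastic matrix (each row sums to 1), that one step is x ← xM, and that the vector is rescaled to sum to 1; the function name power_iteration and the step count of 200 are illustrative choices, not taken from the slides.

```python
def power_iteration(M, e, steps=200):
    """Repeatedly apply x <- x M (row vector times matrix), keeping sum(x) == 1.

    M is assumed row-stochastic; e is any non-negative, non-zero start vector.
    """
    n = len(M)
    total = sum(e)
    x = [v / total for v in e]  # normalise the start vector to a distribution
    for _ in range(steps):
        x = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
        s = sum(x)
        x = [v / s for v in x]  # guard against floating-point drift
    return x


# Matrix from the Power Method Example slides, read row by row.
M = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]

x_a = power_iteration(M, [1, 1, 1, 1])
x_b = power_iteration(M, [2, 100, 7, 3])
print([round(v, 4) for v in x_a])  # [0.3636, 0.3636, 0.1818, 0.0909]
print([round(v, 4) for v in x_b])  # [0.3636, 0.3636, 0.1818, 0.0909]
```

Both starting vectors reach the same limit, (4/11, 4/11, 2/11, 1/11), which is what the slides report to four decimals.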
Difficulties • Proof, Design and Implementation • Notion of convergence • Deal with complex numbers • Unique solution ……… Furthermore, it does not always work
The process does not converge, even though the solution is obvious. [Figure: cycle on nodes 1 to 6]
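A small sketch of why the iteration fails here, assuming the figure is a directed cycle 1 → 2 → … → 6 → 1 with probability 1 on every edge (the orientation and the starting vector are assumptions). All mass starting in one state simply rotates around the cycle, so the iterates never settle, although the uniform vector is visibly stationary.

```python
def step(x, M):
    """One step x <- x M for a row-stochastic matrix M."""
    n = len(M)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


n = 6
# Directed 6-cycle: state i moves to state i+1 (mod 6) with probability 1.
C = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all mass on the first state
history = []
for _ in range(12):
    history.append(x)
    x = step(x, C)

# The iterates have period 6: they repeat but never converge.
print(history[0] == history[6], history[0] == history[1])

# The uniform distribution is left unchanged by one step.
uniform = [1.0 / n] * n
uniform_next = step(uniform, C)
print(all(abs(a - b) < 1e-12 for a, b in zip(uniform, uniform_next)))
```

The chain is irreducible but periodic, so it falls outside the hypotheses of the ergodic theorem as stated earlier.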
Secret Weapon: • Fourier • Gauss • Tarski • Robinson The Power of Elimination
The elimination of the proof is an ideal seldom reached Elimination gives us the heart of the proof
Symbolic Gaussian Elimination
Three variables:
p11x1 + p21x2 + p31x3 = x1
p12x1 + p22x2 + p32x3 = x2
p13x1 + p23x2 + p33x3 = x3
∑xi = 1
With Maple we get:
x1 = (p31p21 + p31p23 + p32p21) / Σ
x2 = (p13p32 + p12p31 + p12p32) / Σ
x3 = (p13p21 + p12p23 + p13p23) / Σ
Σ = p31p21 + p31p23 + p32p21 + p13p32 + p12p31 + p12p32 + p13p21 + p12p23 + p13p23
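The three-variable closed form can be checked numerically. The sketch below plugs in a made-up 3-state chain (the matrix P and the function name stationary_3 are illustrative, not from the slides) and verifies with exact fractions that the resulting vector satisfies the steady-state equations.

```python
from fractions import Fraction as F


def stationary_3(P):
    """Closed form for a 3-state chain.

    P[i][j] is the probability of moving from state i+1 to state j+1,
    so P[1][0] plays the role of p21 in the slides.
    """
    p12, p13 = P[0][1], P[0][2]
    p21, p23 = P[1][0], P[1][2]
    p31, p32 = P[2][0], P[2][1]
    n1 = p31 * p21 + p31 * p23 + p32 * p21
    n2 = p13 * p32 + p12 * p31 + p12 * p32
    n3 = p13 * p21 + p12 * p23 + p13 * p23
    total = n1 + n2 + n3  # the Σ of the slides
    return [n1 / total, n2 / total, n3 / total]


# Hypothetical transition matrix; each row sums to 1.
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0), F(1, 2)],
]

x = stationary_3(P)
# Left-hand sides of the steady-state equations: sum_i p_ij * x_i.
balance = [sum(x[i] * P[i][j] for i in range(3)) for j in range(3)]
print(x)             # [Fraction(8, 17), Fraction(3, 17), Fraction(6, 17)]
print(balance == x)  # True
```

Note that the diagonal entries p11, p22, p33 do not appear in the formula; only transitions between distinct states contribute.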
[Figure: three-state chain s1, s2, s3 with edges labelled p11, p12, p13, p21, p22, p23, p31, p32, p33, shown next to the solution for x1, x2, x3]
Symbolic Gaussian Elimination
System with four variables:
p21x2 + p31x3 + p41x4 = x1
p12x1 + p42x4 = x2
p13x1 = x3
p34x3 = x4
∑xi = 1
With Maple we get:
x1 = (p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21) / Σ
x2 = (p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42) / Σ
x3 = (p41p21p13 + p42p21p13) / Σ
x4 = p21p13p34 / Σ
Σ = p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21 + p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42 + p41p21p13 + p42p21p13 + p21p13p34
[Figure: four-state chain s1, s2, s3, s4 with edges labelled p12, p13, p21, p31, p34, p41, p42]
Do you see the LIGHT? James Brown (The Blues Brothers)
Ergodic Theorem Revisited
If there exists a reverse spanning tree in a graph of the Markov chain associated to a stochastic system, then:
(a) the stochastic system admits the following probability vector as a solution: [Formula]
(b) the solution is unique.
(c) the conditions {xi ≥ 0}, i = 1, ..., n, are redundant and the solution can be computed by Gaussian elimination.
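The Maple solutions shown earlier suggest how the terms arise: each numerator sums, over the spanning trees directed into one state, the product of the probabilities on the tree's edges. The sketch below enumerates those trees by brute force for the four-state system; the numeric probabilities and the names reaches, tree_weights and P are assumptions made for illustration, and the enumeration is exponential, so it only suits tiny chains.

```python
from fractions import Fraction as F
from itertools import product


def reaches(s, root, parent, n):
    """Follow the chosen outgoing edges from s; True if the walk hits root."""
    for _ in range(n):
        if s == root:
            return True
        s = parent[s]
    return s == root


def tree_weights(P):
    """For each state, sum the edge-probability products over all
    spanning trees whose edges all point towards that state."""
    n = len(P)
    weights = []
    for root in range(n):
        others = [s for s in range(n) if s != root]
        # Each non-root state picks one outgoing edge (self-loops excluded).
        choices = [[t for t in range(n) if t != s and P[s][t] != 0]
                   for s in others]
        total = F(0)
        for targets in product(*choices):
            parent = dict(zip(others, targets))
            if all(reaches(s, root, parent, n) for s in others):
                w = F(1)
                for s, t in parent.items():
                    w *= P[s][t]
                total += w
        weights.append(total)
    return weights


# Edges of the four-variable system (p12, p13, p21, p31, p34, p41, p42),
# with hypothetical numeric values; each row sums to 1.
P = [
    [F(0), F(1, 2), F(1, 2), F(0)],
    [F(1), F(0), F(0), F(0)],
    [F(1, 4), F(0), F(0), F(3, 4)],
    [F(1, 3), F(2, 3), F(0), F(0)],
]

w = tree_weights(P)
x = [wi / sum(w) for wi in w]
balance = [sum(x[i] * P[i][j] for i in range(4)) for j in range(4)]
print(x)             # [Fraction(8, 21), Fraction(2, 7), Fraction(4, 21), Fraction(1, 7)]
print(balance == x)  # True
```

For these values the tree weights are 1, 3/4, 1/2 and 3/8, matching the numerators of x1 to x4 in the four-variable solution.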
Cycle: this system is now covered by the proof. [Figure: cycle on nodes 1 to 6]
Markov Chain as a Conservation System
p11x1 + p21x2 + p31x3 = x1(p11 + p12 + p13)
p12x1 + p22x2 + p32x3 = x2(p21 + p22 + p23)
p13x1 + p23x2 + p33x3 = x3(p31 + p32 + p33)
[Figure: three-state chain s1, s2, s3 with edges labelled p11, p12, p13, p21, p22, p23, p31, p32, p33]
Kirchhoff's Current Law (1847)
• The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node.
[Figure: node with currents i1, i2, i3, i4] i3 + i2 = i1 + i4
[Figure: three-node circuit with currents i12, i13, i21, i31, i32]
i31 + i21 = i12 + i13
i32 + i12 = i21
i13 = i31 + i32
Kirchhoff’s Matrix Tree Theorem (1847) The theorem allows us to calculate the number of spanning trees of a connected graph
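As a concrete illustration of the counting version of the theorem, the sketch below builds the graph Laplacian (degree matrix minus adjacency matrix), deletes one row and column, and takes the determinant. The example graphs and the helper names determinant and count_spanning_trees are assumptions chosen for the illustration.

```python
from fractions import Fraction


def determinant(A):
    """Determinant by Gaussian elimination with exact fractions."""
    A = [[Fraction(v) for v in row] for row in A]
    n = len(A)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            det = -det  # a row swap flips the sign
        det *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return det


def count_spanning_trees(n, edges):
    """Number of spanning trees of an undirected graph on vertices 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    minor = [row[1:] for row in L[1:]]  # delete first row and column
    return int(determinant(minor))


triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_spanning_trees(3, triangle))  # 3
print(count_spanning_trees(4, k4))        # 16
```

The determinant gives only the count; it does not list the individual trees.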
Two theorems for the price of one!!
Differences
• Ergodic Theorem: the symbolic proof is simpler and more appropriate than using Perron-Frobenius
• Kirchhoff Theorem: our version computes the spanning trees themselves, not just their total number
Internet Sites • Kleinberg • Google • SALSA • Indegree Heuristic