Mathematical Foundations for Search and Surf Engines

Explore the mathematical basis behind search engines, including Markov chains, the Perron-Frobenius Theorem, and the Power Method used in PageRank. Learn about ranking sites and ergodic theory in this guide.

harveyb

Presentation Transcript


  1. Mathematical Foundations for Search and Surf Engines

  2. Jean-Louis Lassez

  3. Ryan Rossi

  4. Lecture 1 & 2 • Introduction • Ergodic Theorem • Perron-Frobenius Theorem • Power Method • Foundations of PageRank

  5. Search Engines • Search engine: deterministic • Ranking of sites • Mathematical foundations: Markov and Perron-Frobenius
     Surf Engines • Non-deterministic: window shopping • Ranking of links • Mathematical foundations: Singular Value Decomposition

  6. Initial approach to ranking sites: the in-degree heuristic [Diagram: graph on nodes 1 to 8]

  7. Google’s PageRank approach Problem: rank the sites in order of most visited

  8. Another Approach: Hubs and Authorities • Authorities: sites that contain the most important information • Hubs: sites that provide directions to the authorities [Diagram: hub (H) and authority (A) nodes]

  9. Ranking Hyperlinks • Local • Global • Update

  10. Local [Diagram not preserved]

  11. Global [Diagram not preserved]

  12. Updated (2005, 2006) [Diagram: graph on nodes A to L]

  13. Part I: Ranking Sites • The Web as a Markov Chain • The Ergodic Theorem • Perron-Frobenius Theorem • Algorithms: PageRank, HITS, SALSA

  14. Markov Chains • Text Analysis • Speech Recognition • Statistical Mechanics • Decision Science: Medicine, Commerce ….more recently…..

  15. Bioinformatics [Diagram: states d1, d2, I1, I2, M1, M2, M3]

  16. Internet

  17. Markov’s Ergodic Theorem (1906) Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.

  18. Probabilities after n steps

      Steps    X1     X2     X3     X4
      15       .33    .26    .26    .13
      30       .33    .26    .23    .16
      100      .37    .29    .21    .13
      500      .38    .28    .21    .13
      1000     .38    .28    .21    .13

      [Diagram: four-state chain s1, s2, s3, s4 with transition probabilities .45, 1, .4, .55, .75, .25, .6]

  19. Probabilities after n steps

      Steps    X1     X2     X3     X4
      15       .46    .20    .26    .06
      30       .36    .26    .23    .13
      100      .38    .28    .21    .13
      500      .38    .28    .21    .13
      1000     .38    .28    .21    .13

      [Diagram: four-state chain s1, s2, s3, s4 with transition probabilities .45, 1, .4, .55, .75, .25, .6]

  20. Steady State Constraints

      p11x1 + p21x2 + p31x3 = x1
      p12x1 + p22x2 + p32x3 = x2
      p13x1 + p23x2 + p33x3 = x3
      ∑xi = 1, xi ≥ 0

      • x1, x2, x3 are probabilities
      • p21: probability of going to S1 from S2

  21. Solved by the Power Iteration Method, based on the Perron-Frobenius Theorem [formula not preserved in the transcript], where e is an initial vector and M is the stochastic matrix associated with the system.

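  22. Power Method Example

      M = | 0    1    0    0   |
          | 1/2  0    1/2  0   |
          | 1/2  0    0    1/2 |
          | 1    0    0    0   |

      e = (1, 1, 1, 1)
      Result: (0.3636, 0.3636, 0.1818, 0.0909)

      [Diagram: graph on nodes 1, 2, 3, 4]

  23. Power Method Example

      M = | 0    1    0    0   |
          | 1/2  0    1/2  0   |
          | 1/2  0    0    1/2 |
          | 1    0    0    0   |

      e = (2, 100, 7, 3)
      Result: (0.3636, 0.3636, 0.1818, 0.0909)

      [Diagram: graph on nodes 1, 2, 3, 4]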
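
The two power method examples can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: M is read as a row-stochastic matrix (each row sums to 1), the iteration is x ← xM after normalizing the start vector to sum to 1, and the function name `power_iteration`, the tolerance and the iteration cap are illustrative choices that do not come from the slides.

```python
def power_iteration(M, e, tol=1e-12, max_iter=10_000):
    """Repeat x <- x M (row vector times row-stochastic M) until x stabilizes."""
    total = sum(e)
    x = [v / total for v in e]  # normalize the start vector so it sums to 1
    n = len(x)
    for _ in range(max_iter):
        # new_x[j] is the probability mass flowing into state j in one step
        new_x = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("power iteration did not converge")


# Matrix from the power method example, read row by row
M = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]

x_a = power_iteration(M, [1, 1, 1, 1])      # first start vector
x_b = power_iteration(M, [2, 100, 7, 3])    # second start vector
print([round(v, 4) for v in x_a])
print([round(v, 4) for v in x_b])
```

Both start vectors lead to the same limit, (4/11, 4/11, 2/11, 1/11), which rounds to the values shown on the slides.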

  24. Difficulties • Proof, Design and Implementation • Notion of convergence • Deal with complex numbers • Unique solution ……… Furthermore, it does not always work

  25. The process does not converge, even though the solution is obvious. [Diagram: graph on nodes 1 to 6]
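
A small experiment illustrates the failure. It assumes that the six-node graph is a directed cycle 1 → 2 → ... → 6 → 1 (slide 45 labels a six-node graph "Cycle"); the helper `step` and the start vector are hypothetical choices made for the illustration.

```python
def step(M, x):
    """One power-method step: x <- x M for a row-stochastic matrix M."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]


n = 6
# Assumed graph: state i moves to state i+1 with probability 1 (indices mod 6)
cycle = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # all mass starts on the first state
history = [x]
for _ in range(12):
    x = step(cycle, x)
    history.append(x)

for k, vec in enumerate(history[:7]):
    print(k, vec)  # the single unit of mass just moves around the cycle

uniform = [1.0 / n] * n  # the "obvious" solution: every state equally likely
print(step(cycle, uniform) == uniform)
```

The iterates return to the start vector every six steps and never settle, while the uniform vector is left unchanged by one step, so it does satisfy the steady state constraints.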

  26. Perron-Frobenius Theorem: Is it really necessary?

  27. Secret Weapon: • Fourier • Gauss • Tarski • Robinson The Power of Elimination

  28. The elimination of the proof is an ideal seldom reached. Elimination gives us the heart of the proof.

  29. Symbolic Gaussian Elimination

      Three variables:

      p11x1 + p21x2 + p31x3 = x1
      p12x1 + p22x2 + p32x3 = x2
      p13x1 + p23x2 + p33x3 = x3
      ∑xi = 1

      with Maple we get:

      x1 = (p31p21 + p31p23 + p32p21) / Σ
      x2 = (p13p32 + p12p31 + p12p32) / Σ
      x3 = (p13p21 + p12p23 + p13p23) / Σ

      Σ = (p31p21 + p31p23 + p32p21 + p13p32 + p12p31 + p12p32 + p13p21 + p12p23 + p13p23)
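
The closed-form solution for three variables can be checked numerically with exact rational arithmetic. In the sketch below the transition probabilities are made-up values (any 3 × 3 table whose rows sum to 1 would do), and `p[i][j]` stands for the probability of going from state i to state j, matching the indexing of the equations.

```python
from fractions import Fraction as F

# Hypothetical transition probabilities; each row sums to 1
p = {
    1: {1: F(1, 2), 2: F(1, 3), 3: F(1, 6)},
    2: {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)},
    3: {1: F(2, 5), 2: F(1, 5), 3: F(2, 5)},
}

# Numerators, copied term by term from the Maple output
n1 = p[3][1] * p[2][1] + p[3][1] * p[2][3] + p[3][2] * p[2][1]
n2 = p[1][3] * p[3][2] + p[1][2] * p[3][1] + p[1][2] * p[3][2]
n3 = p[1][3] * p[2][1] + p[1][2] * p[2][3] + p[1][3] * p[2][3]
sigma = n1 + n2 + n3

x = {1: n1 / sigma, 2: n2 / sigma, 3: n3 / sigma}
print(x)

# Residuals of the steady state equations: sum_i p[i][j] * x[i] - x[j]
residuals = [sum(p[i][j] * x[i] for i in (1, 2, 3)) - x[j] for j in (1, 2, 3)]
print(residuals, sum(x.values()))
```

With these sample values every residual is exactly zero and the components sum to 1, so the formula satisfies all four constraints without any iteration.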

  30. x1 = (p31p21 + p23p31 + p32p21) / Σ
      x2 = (p13p32 + p31p12 + p12p32) / Σ
      x3 = (p21p13 + p12p23 + p13p23) / Σ

      [Diagram: three-state chain s1, s2, s3 with edges labeled p11, p12, p13, p21, p22, p23, p31, p32, p33]

  36. Symbolic Gaussian Elimination

      System with four variables:

      p21x2 + p31x3 + p41x4 = x1
      p12x1 + p42x4 = x2
      p13x1 = x3
      p34x3 = x4
      ∑xi = 1

  37. with Maple we get:

      x1 = (p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21) / Σ
      x2 = (p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42) / Σ
      x3 = (p41p21p13 + p42p21p13) / Σ
      x4 = p21p13p34 / Σ

      Σ = p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21 + p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42 + p41p21p13 + p42p21p13 + p21p13p34

      [Diagram: four-state chain s1, s2, s3, s4 with edges labeled p12, p13, p21, p31, p34, p41, p42]
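
One way to read the four-variable output is that each product in the numerator of xi picks one outgoing edge for every state other than si, such that following the picked edges always leads to si (a reverse spanning tree rooted at si). The sketch below enumerates these choices by brute force. The edge weights are hypothetical values chosen so that the outgoing probabilities of each state sum to 1, and the helper names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

# Edges of the four-state system; the weights are made-up sample values
edges = {
    (1, 2): F(1, 3), (1, 3): F(2, 3),
    (2, 1): F(1),
    (3, 1): F(1, 4), (3, 4): F(3, 4),
    (4, 1): F(1, 2), (4, 2): F(1, 2),
}
states = [1, 2, 3, 4]


def reaches_root(s, succ, root):
    """Follow the chosen successor edges from s; False if a cycle is hit first."""
    seen = set()
    while s != root:
        if s in seen:
            return False
        seen.add(s)
        s = succ[s]
    return True


def reverse_spanning_trees(root):
    """Yield every choice of one outgoing edge per non-root state that leads to root."""
    others = [s for s in states if s != root]
    choices = [[e for e in edges if e[0] == s] for s in others]
    for pick in product(*choices):
        succ = dict(pick)  # chosen successor of each non-root state
        if all(reaches_root(s, succ, root) for s in others):
            yield pick


def weight(tree):
    w = F(1)
    for e in tree:
        w *= edges[e]
    return w


counts = {r: len(list(reverse_spanning_trees(r))) for r in states}
numerators = {r: sum(weight(t) for t in reverse_spanning_trees(r)) for r in states}
sigma = sum(numerators.values())
x = {r: numerators[r] / sigma for r in states}
print(counts)
print(x)
```

The tree counts per root are 4, 5, 2 and 1, the same as the number of products in the numerators of x1 to x4, and the resulting vector satisfies the four steady state equations exactly for these sample weights.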

  42. Do you see the LIGHT? James Brown (The Blues Brothers)

  44. Ergodic Theorem Revisited If there exists a reverse spanning tree in a graph of the Markov chain associated to a stochastic system, then: (a) the stochastic system admits the following probability vector as a solution: [formula not preserved in the transcript] (b) the solution is unique. (c) the conditions {xi ≥ 0}, i = 1, …, n, are redundant and the solution can be computed by Gaussian elimination.

  45. Cycle: this system is now covered by the proof [Diagram: cycle on nodes 1 to 6]
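
To see Gaussian elimination succeed where the iteration fails, the steady state system can be solved directly. The sketch below is one possible setup, not the authors' implementation: it drops one (redundant) balance equation, replaces it with ∑xi = 1, and runs Gauss-Jordan elimination over exact fractions. The function name and the choice of which equation to drop are assumptions.

```python
from fractions import Fraction as F


def stationary_by_elimination(M):
    """Solve x M = x, sum(x) = 1 for a row-stochastic M by Gauss-Jordan elimination."""
    n = len(M)
    # Rows 0..n-2: sum_i M[i][j] x_i - x_j = 0 (the last balance equation is redundant)
    A = [
        [F(M[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
        for j in range(n - 1)
    ]
    A.append([F(1)] * n + [F(1)])  # normalization: x_1 + ... + x_n = 1
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]


n = 6
# Directed six-cycle: state i moves to state i+1 with probability 1
cycle = [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]
x_cycle = stationary_by_elimination(cycle)
print(x_cycle)
```

For the six-cycle the solver returns 1/6 for every state, with no notion of convergence involved.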

  46. Markov Chain as a Conservation System

      p11x1 + p21x2 + p31x3 = x1(p11 + p12 + p13)
      p12x1 + p22x2 + p32x3 = x2(p21 + p22 + p23)
      p13x1 + p23x2 + p33x3 = x3(p31 + p32 + p33)

      [Diagram: three-state chain s1, s2, s3 with edges labeled p11, p12, p13, p21, p22, p23, p31, p32, p33]

  47. Kirchhoff’s Current Law (1847) • The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node.

      Node 1: i31 + i21 = i12 + i13
      Node 2: i32 + i12 = i21
      Node 3: i13 = i31 + i32

      i3 + i2 = i1 + i4

      [Diagram: three-node circuit with currents i12, i21, i13, i31, i32]

  48. Kirchhoff’s Matrix Tree Theorem (1847) The theorem allows us to calculate the number of spanning trees of a connected graph
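
As an illustration of the counting statement, the standard form of the Matrix Tree Theorem says that the number of spanning trees of a connected undirected graph equals the determinant of its Laplacian with one row and the matching column removed. The sketch below implements that standard form; the function names and the test graphs are illustrative and do not come from the slides.

```python
from fractions import Fraction as F


def determinant(A):
    """Determinant by Gaussian elimination over exact fractions."""
    A = [[F(v) for v in row] for row in A]
    n = len(A)
    det = F(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return F(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return det


def count_spanning_trees(n, edge_list):
    """Nodes are 0..n-1; edge_list holds undirected edges (u, v)."""
    L = [[0] * n for _ in range(n)]  # Laplacian: degree matrix minus adjacency matrix
    for u, v in edge_list:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    minor = [row[1:] for row in L[1:]]  # delete row 0 and column 0
    return int(determinant(minor))


six_cycle = [(i, (i + 1) % 6) for i in range(6)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(count_spanning_trees(6, six_cycle))
print(count_spanning_trees(4, k4))
```

The six-cycle has 6 spanning trees (remove any one edge) and the complete graph on four nodes has 16, which agrees with Cayley's formula n^(n-2).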

  49. Two theorems for the price of one!! Differences • Ergodic Theorem: The symbolic proof is simpler and more appropriate than using Perron-Frobenius • Kirchhoff Theorem: Our version calculates the spanning trees and not their total sum.

  50. Internet Sites • Kleinberg • Google • SALSA • Indegree Heuristic
