170 likes | 351 Views
A Fuzzy Web Surfer Model. Narayan L. Bhamidipati and Sankar K. Pal Indian Statistical Institute Kolkata. Contents. Web Surfer Models Markov Chains Fuzzy Markov Chains Fuzzy Web Surfer Models Advantages and Limitations. Web Surfer Models. Surfer visits pages randomly
E N D
A Fuzzy Web Surfer Model Narayan L. Bhamidipati and Sankar K. Pal Indian Statistical Institute Kolkata
Contents • Web Surfer Models • Markov Chains • Fuzzy Markov Chains • Fuzzy Web Surfer Models • Advantages and Limitations
Web Surfer Models • Surfer visits pages randomly • Various assumptions, various models • Random Surfer Model (PageRank) • HITS (goes back and forth) • Intelligent/Directed Surfer Model • PHITS, WPSS
Markov Chains • MC is aperiodic if each of its states is • MC is irreducible if it is possible to reach any state from any other state • Regular if aperiodic and irreducible • A regular MC has a unique stationary distribution
Surfer Models as Markov Chains • Web pages are states • Moving between pages are the transitions • Clicking or typing URLs • Transition probabilities • Steady state distribution yields the page ranks
Fuzzy Markov Chains • As opposed to classical Markov Chains (which are based on Probability Theory) • Fuzzy Transition Matrix • Employ max-min algebra (instead of the usual addition and multiplication)
Fuzzy Web Surfer Model • Uncertainty involved in following links • Uncertainty involved in existence of links • Uncertainty involved in page(let) boundaries • Modeled in a fuzzy sense
Fuzziness in Link’s Existence ? • “A link either exists or it does not exist” • Pagelets: web pages split into sections • Link to a page, but which section ? • What if the sections are not made explicit ? • Have to guess if a link is intended for a particular pagelet
Pagelets • Coherent parts of web pages • Pagelets may differ widely in terms of content • Need not necessarily be split into explicit sections • Identification by considering the structure of the documents
Model Formulation • Each pagelet is referred to as a page • Every link in the web graph has an associated fuzzy number denoting the possibility of it being followed • Obtain fuzzy transition matrix Q • (i, j) element of Q denotes the belief of moving to page i, when the surfer is on page j
Model Formulation • Usual PageRank link computations are performed using the Fuzzy Transition Matrix on the max-min algebra • Concept of FuzzRank
Favorable Properties • Able to model fuzziness in links • Also can capture fuzziness in page contents • Finite Convergence • Analysis in terms of fuzzy eigen sets • Robust Computation
To be Explored… • Conditions for ergodicity (and hence regularity) of fuzzy Markov chains are not completely known (as yet) • The steady state fuzzy distribution depends on the initial state • What do the different convergent fuzzy distributions correspond to ?
Conjecture • The distinct steady state fuzzy distributions correspond to web communities • Probably helpful in identifying communities from a given set of pages
References • K. Avrachenkov and E. Sanchez. Fuzzy markov chains and decision-making. Fuzzy Optimization and Decision Making, 1(2):143–159, June 2002. • J. J. Buckley and E.Eslami. Fuzzy markov chains: Uncertain probabilities. Mathware and Soft Computing, 9(1):33–41, 2002. • M. Diligenti, M. Gori, and M. Maggini. A unified probabilistic framework for web page scoring systems. IEEE Transactions on Knowledge and Data Engineering, 16(1):4–16, January 2004.