Effective Prediction of Web-user Accesses: A Data Mining Approach. Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos, Aristotle Univ. of Thessaloniki, Greece. Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.
Introduction (1/2) • Web Prefetching: Deducing forthcoming user accesses based on log information • Focus on: • Predictive prefetching (use of history) • Server initiated (server makes predictions and piggybacks them to the clients)
Introduction (2/2) • Within a site, users navigate by following links [5] • For server-initiated predictive prefetching, the access patterns of interest are those that reflect this behavior
Outline • Motivation & Related work • Proposed method • Comparative performance evaluation • Conclusions
Requirements • Site structure and contents impose • The order of dependencies (first or higher) among the documents • The interleaving of documents belonging to patterns with random visits (noise) • Discovered patterns should respect these factors
Related work • Dependency graph (DG) [9] • A graph maintains pairwise accesses • Prediction by Partial Match (PPM) [10] • A trie maintains sequences of consecutive accesses • LBOT [6] • Special form of association rules of length 2 • Others (variations of the above) [3,11]
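As a rough illustration of the "graph maintains pairwise accesses" idea behind DG, the sketch below counts how often one page is followed by another within a short lookahead window and predicts the successors whose confidence passes a threshold. The function names, the `window` and `min_conf` parameters, and the sample sessions are assumptions made for illustration; they are not taken from the slides or from [9].

```python
from collections import defaultdict

def build_dependency_graph(sessions, window=2):
    """Count page visits and pairwise 'followed within window' arcs."""
    node_count = defaultdict(int)   # visits per page
    arc_count = defaultdict(int)    # (page, later_page) -> co-occurrence count
    for session in sessions:
        for i, page in enumerate(session):
            node_count[page] += 1
            # Pages seen within the next `window` accesses (assumed lookahead).
            for nxt in set(session[i + 1:i + 1 + window]):
                if nxt != page:
                    arc_count[(page, nxt)] += 1
    return node_count, arc_count

def predict(page, node_count, arc_count, min_conf=0.5):
    """Return successors of `page` whose arc confidence is at least min_conf."""
    return sorted(b for (a, b), c in arc_count.items()
                  if a == page and c / node_count[a] >= min_conf)

# Hypothetical sessions, for illustration only.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
nodes, arcs = build_dependency_graph(sessions, window=2)
print(predict("A", nodes, arcs))  # ['B', 'C']
```

Because only pairs are stored, such a structure captures first-order dependencies only, which is the limitation the Requirements slide calls "order".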
Motivation
Method | Noise (2nd Req.) | Order (1st Req.)
Proposed | Yes | Yes
Proposed Method (1) • Novel Web log mining algorithm (WMo) • Apriori-like • Effective • Immune to noise • Considers higher-order dependencies • Efficient • Significant reduction in the number of candidates
Proposed Method (2) • Session (or transaction): A sequence of requests that occur within a specified time interval of each other [2] • The containment relationship addresses the 2nd requirement (avoiding noise) • Example: T = ⟨A, X, B, Y, C⟩, where X and Y are noise, and S = ⟨A, B, C⟩; the pattern S is contained in T • Comment: Methods that count only contiguous subsequences and rely on support alone will miss the pattern S.
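The containment example above can be checked with a short sketch. It contrasts order-preserving (non-contiguous) containment with a contiguous match; the function names are hypothetical and this is only one way to express the test, not the authors' implementation.

```python
def is_contained(pattern, transaction):
    """True if pattern occurs in transaction in order, with gaps allowed."""
    it = iter(transaction)
    # `page in it` advances the iterator, so order is preserved.
    return all(page in it for page in pattern)

def is_contiguous(pattern, transaction):
    """True if pattern occurs in transaction as consecutive accesses."""
    k = len(pattern)
    return any(list(transaction[i:i + k]) == list(pattern)
               for i in range(len(transaction) - k + 1))

T = ["A", "X", "B", "Y", "C"]   # X and Y are noise
S = ["A", "B", "C"]

print(is_contained(S, T))    # True
print(is_contiguous(S, T))   # False
```

The second result shows why a method restricted to consecutive accesses would not count T as supporting S.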
Proposed Method (3) • Candidate generation respects the ordering of accesses in transactions • Example: ⟨A, B⟩ ≠ ⟨B, A⟩ • This causes a dramatic increase in the number of candidates • WMo exploits the site structure for pruning [7,8]
Proposed Method (4)
Algorithm genCandidates(Lk, G) // Lk: the set of large k-paths; G: the site graph
begin
  foreach L = ⟨l1, …, lk⟩, L ∈ Lk {
    N+(lk) = {v | arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified apriori pruning
      if v ∉ L and L' = ⟨l2, …, lk, v⟩ ∈ Lk {
        C = ⟨l1, …, lk, v⟩
        if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
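A minimal Python sketch of the candidate-generation step follows. It makes several assumptions that the slide does not state: paths are tuples, the site graph is a dict mapping each node to its set of successors, a plain set stands in for the candidate-trie, and S ranges over the length-k subsequences of C (one node dropped) that are themselves paths in G. The helper `is_path` and the sample data are hypothetical.

```python
def is_path(nodes, graph):
    """True if every consecutive pair of nodes is an arc of the graph."""
    return all(b in graph.get(a, ()) for a, b in zip(nodes, nodes[1:]))

def gen_candidates(large_k, graph):
    """Extend each large k-path by one successor node, pruning with the graph."""
    candidates = set()                      # stands in for the candidate-trie
    for path in large_k:
        for v in graph.get(path[-1], ()):   # N+(lk): successors of the last node
            if v in path:
                continue
            suffix = path[1:] + (v,)        # L' = <l2, ..., lk, v>
            if suffix not in large_k:
                continue
            cand = path + (v,)              # C = <l1, ..., lk, v>
            # Assumed reading of the pruning test: every other length-k
            # subsequence of C that is a path in G must also be large.
            subs = (cand[:i] + cand[i + 1:] for i in range(len(cand)))
            if all(s in large_k for s in subs
                   if s != suffix and is_path(s, graph)):
                candidates.add(cand)
    return candidates

# Hypothetical site graph and large 2-paths, for illustration only.
graph = {"A": {"B", "C"}, "B": {"C"}, "C": set()}
large_2 = {("A", "B"), ("B", "C"), ("A", "C")}
print(gen_candidates(large_2, graph))  # {('A', 'B', 'C')}
```

Only arcs that exist in the site graph are used to extend a path, which is where the reduction in the number of candidates comes from in this sketch.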
Discussion • Sequential patterns [1] • Reduction when “customer-sequence” = “user-session” • Suffers from a large number of candidates (by not considering the site structure) • Path Fragments [4] (the containment relationship is expressed with regular expressions and the “*” label) • Focus on semantics (recommendation systems) • Prefetching: patterns are for system consumption, not human consumption • WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)
Methodology • Synthetic (sample site with 1000 nodes) • Synthetic data generator (see the paper) • Modeling site nodes, site linkage, size of documents • Real data sets (see the paper) • Examine the impact of: • noise • order • client cache (see the paper) • efficiency
Conclusions • Factors that influence Web Prefetching • Noise • Order • A new algorithm WMo was presented based on data mining • Compares favorably with previously proposed algorithms • WMo is an effective and efficient Web prefetching algorithm
References
[1] R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
[2] R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
[3] M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
[4] W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
[5] B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
[6] B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
[7] A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
[8] A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001.
[9] V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
[10] T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
[11] J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS, 1999.
[12] L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.