Effective Prediction of Web-user Accesses: A Data Mining Approach

Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos
Aristotle Univ. of Thessaloniki, Greece
Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.

Introduction (1/2)
• Web prefetching: deducing forthcoming user accesses based on log information
• Focus on:
  • Predictive prefetching (use of history)
  • Server-initiated prefetching (the server makes predictions and piggybacks them onto its responses to the clients)

Introduction (2/2)
• Within a site, users navigate by following links [5]
• Server-initiated predictive prefetching is therefore interested in access patterns that reflect this behavior

Outline
• Motivation & Related work
• Proposed method
• Comparative performance evaluation
• Conclusions

Requirements
• Site structure and contents impose two factors:
  1. The order of dependencies (first or higher) among the documents
  2. The interleaving of documents that belong to patterns with random visits (noise)
• Discovered patterns should respect these factors

Related work
• Dependency graph (DG) [9]
  • A graph maintains pairwise accesses
• Prediction by Partial Match (PPM) [10]
  • A trie maintains sequences of consecutive accesses
• LBOT [6]
  • Special form of association rules of length 2
• Others (variations of the above) [3, 11]

Motivation
Method   | Noise (2nd Req.) | Order (1st Req.)
Proposed | Yes              | Yes

Proposed Method (1)
• Novel Web log mining algorithm (WMo)
• Apriori-like
• Effective
  • Immune to noise
  • Considers high-order dependencies
• Efficient
  • Significant reduction in the number of candidates

Proposed Method (2)
• Session (or transaction): a sequence of requests, each occurring within a specified time interval of the previous one [2]
• The containment relationship addresses the 2nd requirement (tolerating noise)
• Example: T = <A, X, B, Y, C>, where X and Y are noise, and S = <A, B, C> is the pattern; S is contained in T
• Comment: if support is counted over contiguous subsequences only, the pattern S will be missed
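
The containment example above can be checked with a short Python sketch. The function names `is_contained` and `is_contiguous` are illustrative and not taken from the paper; the first tests order-preserving containment that tolerates interleaved noise, the second tests the stricter contiguous match for comparison.

```python
def is_contained(pattern, session):
    """True if `pattern` occurs in `session` in order, gaps allowed."""
    it = iter(session)
    # `page in it` advances the iterator, so matches must appear in order.
    return all(page in it for page in pattern)


def is_contiguous(pattern, session):
    """True if `pattern` occurs in `session` as one unbroken run."""
    k = len(pattern)
    return any(
        list(session[i:i + k]) == list(pattern)
        for i in range(len(session) - k + 1)
    )


T = ["A", "X", "B", "Y", "C"]   # session with noise X, Y
S = ["A", "B", "C"]             # pattern

print(is_contained(S, T))    # containment tolerates the noise
print(is_contiguous(S, T))   # a contiguous match does not
```

With the session T from the slide, S is found by the first check and missed by the second, which is the point of the comment on contiguous subsequences.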

Proposed Method (3)
• Candidate generation respects the ordering of accesses in transactions
  • Example: <A, B> ≠ <B, A>
• Treating order as significant dramatically increases the number of candidates
• WMo exploits the site structure for pruning [7, 8]

Proposed Method (4)
Algorithm genCandidates(Lk, G)
// Lk: the set of large k-paths; G: the site graph
begin
  foreach L = <l1, …, lk>, L ∈ Lk {
    N+(lk) = {v | ∃ arc lk → v ∈ G}
    foreach v ∈ N+(lk) {
      // apply modified apriori pruning
      if v ∉ L and L' = <l2, …, lk, v> ∈ Lk {
        C = <l1, …, lk, v>
        if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ Lk)
          insert C in the candidate-trie
      }
    }
  }
end
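
A minimal Python sketch of the candidate generation step follows, under several assumptions that are not stated on the slide: paths are tuples, the graph is a dict mapping each node to the set of its successors, the candidate-trie is replaced by a plain set, and the subpaths S that must be large are taken to be the length-k subsequences of C that are themselves paths in G. The names `gen_candidates` and `is_path` are hypothetical.

```python
from itertools import combinations


def is_path(nodes, graph):
    """True if every consecutive pair in `nodes` is an arc of `graph`."""
    return all(b in graph.get(a, ()) for a, b in zip(nodes, nodes[1:]))


def gen_candidates(large_k, graph):
    """Extend each large k-path by one successor of its last node.

    large_k: set of tuples, the large k-paths (Lk on the slide).
    graph:   dict node -> set of successor nodes (G on the slide).
    Returns a set of candidate (k+1)-paths (stand-in for the trie).
    """
    candidates = set()
    for path in large_k:
        k = len(path)
        for v in graph.get(path[-1], ()):
            if v in path:
                continue                      # v must not already be in L
            if path[1:] + (v,) not in large_k:
                continue                      # L' = <l2, ..., lk, v> must be large
            cand = path + (v,)
            # Assumption: only subsequences that are paths in G are checked.
            subpaths = (s for s in combinations(cand, k) if is_path(s, graph))
            if all(s in large_k for s in subpaths):
                candidates.add(cand)
    return candidates


site = {"A": {"B"}, "B": {"C"}, "C": set()}
large_2 = {("A", "B"), ("B", "C")}
print(gen_candidates(large_2, site))
```

Only successors of the last node are tried as extensions, which is how the site structure keeps the number of ordered candidates down compared with joining arbitrary large paths.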

Discussion
• Sequential patterns [1]
  • The problem reduces to sequential pattern mining when “customer-sequence” = “user-session”
  • Suffers from a large number of candidates (because it does not consider the site structure)
• Path fragments [4] (the containment relationship is expressed with regular expressions and the “*” label)
  • Focus on semantics (recommendation systems)
• Prefetching: patterns are for system consumption, not human consumption
• WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)

Methodology
• Synthetic data (sample site with 1000 nodes)
  • Synthetic data generator (see the paper)
  • Modeling site nodes, site linkage, size of documents
• Real data sets (see the paper)
• Examine the impact of:
  • noise
  • order
  • client cache (see the paper)
  • efficiency

Accuracy w.r.t. noise

Usefulness w.r.t. noise

Traffic w.r.t. noise

Accuracy w.r.t. order

Usefulness w.r.t. order

Traffic w.r.t. order

Efficiency (see also [7,8])

Conclusions
• Factors that influence Web prefetching
  • Noise
  • Order
• A new algorithm, WMo, based on data mining was presented
• It compares favorably with previously proposed algorithms
• WMo is an effective and efficient Web prefetching algorithm

References
[1] R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995.
[2] R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999.
[3] M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001.
[4] W. Gaul, L. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000.
[5] B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998.
[6] B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999.
[7] A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000.
[8] A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001.
[9] V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996.
[10] T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999.
[11] J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS 1999.
[12] L. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.
