
Effective Prediction of Web-user Accesses: A Data Mining Approach

Effective Prediction of Web-user Accesses: A Data Mining Approach. Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos, Aristotle Univ. of Thessaloniki, Greece. Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.


Presentation Transcript


  1. Effective Prediction of Web-user Accesses: A Data Mining Approach • Alexandros Nanopoulos, Dimitrios Katsaros, Yannis Manolopoulos, Aristotle Univ. of Thessaloniki, Greece • Presentation: Spyros Papadimitriou, Carnegie Mellon Univ.

  2. Introduction (1/2) • Web Prefetching: Deducing forthcoming user accesses based on log information • Focus on: • Predictive prefetching (use of history) • Server initiated (server makes predictions and piggybacks them to the clients)

  3. Introduction (2/2) • Within a site, users navigate by following links [5] • Server-initiated predictive prefetching is therefore interested in access patterns that reflect this behavior

  4. Outline • Motivation & Related work • Proposed method • Comparative performance evaluation • Conclusions

  5. Presentation Outline • Motivation & Related work • Proposed method • Comparative performance evaluation • Conclusions

  6. Requirements • Site structure and contents impose • The order of dependencies (first or higher) among the documents • The interleaving of documents belonging to patterns with random visits (noise) • Discovered patterns should respect these factors

  7. Related work • Dependency graph (DG) [9] • A graph maintains pairwise accesses • Prediction by Partial Match (PPM) [10] • A trie maintains sequences of consecutive accesses • LBOT [6] • Special form of association rules of length 2 • Others (variations of the above) [3,11]

  8. Motivation • Proposed method: Noise (2nd Req.): Yes • Order (1st Req.): Yes

  9. Presentation Outline • Motivation & Related work • Proposed method • Comparative performance evaluation • Conclusions

  10. Proposed Method (1) • Novel Web log mining algorithm (WMo) • Apriori-like • Effective • Immune to noise • Considers high order dependencies • Efficient • Significant reduction in the number of candidates

  11. Proposed Method (2) • Session (or transaction): A sequence of requests that occur within a specified time interval of each other [2] • The containment relationship addresses the 2nd requirement (avoiding noise) • Example: T = ⟨A, X, B, Y, C⟩, where X and Y are noise, and S = ⟨A, B, C⟩; the pattern S is contained in T • Comment: If support is counted only over contiguous subsequences, the pattern S will be missed.
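The containment example on this slide can be illustrated with a minimal Python sketch. It assumes that "contained" means "appears as an ordered, not necessarily contiguous, subsequence"; the function names `is_contained` and `is_contiguous` are hypothetical and do not come from the presentation.

```python
def is_contained(pattern, transaction):
    """True if `pattern` occurs in `transaction` in order, gaps allowed.

    Assumption: this models the containment relationship from the slide.
    """
    it = iter(transaction)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def is_contiguous(pattern, transaction):
    """True if `pattern` occurs in `transaction` as one unbroken run."""
    n = len(pattern)
    return any(
        list(transaction[i:i + n]) == list(pattern)
        for i in range(len(transaction) - n + 1)
    )


T = ["A", "X", "B", "Y", "C"]  # X and Y are noise
S = ["A", "B", "C"]            # the pattern

print(is_contained(S, T))   # True: S survives the interleaved noise
print(is_contiguous(S, T))  # False: a contiguous match misses S
```

Because the check preserves order, `is_contained(["B", "A"], T)` is false, which matches the ordered view of accesses taken in the following slide.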

  12. Proposed Method (3) • Candidate generation respects the ordering of accesses in transactions. • Example: ⟨A, B⟩ ≠ ⟨B, A⟩ • Dramatic increase in the number of candidates • Exploits the site structure for pruning [7,8]

  13. Proposed Method (4)
      Algorithm genCandidates(Lk, G)   // Lk: the set of large k-paths; G: the site graph
      begin
        foreach L = ⟨l1, …, lk⟩, L ∈ Lk {
          N+(lk) = {v | arc lk → v ∈ G}
          foreach v ∈ N+(lk) {
            // apply modified apriori pruning
            if v ∉ L and L' = ⟨l2, …, lk, v⟩ ∈ Lk {
              C = ⟨l1, …, lk, v⟩
              if (∀ S ⊂ C, S ≠ L' ⇒ S ∈ Lk)
                insert C in the candidate-trie
            }
          }
        }
      end
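The candidate-generation step above can be sketched in Python under some stated assumptions. Paths are tuples, the site graph is a dict mapping each node to its set of successors, and a plain set stands in for the candidate-trie. The slide does not say which subsequences S the pruning test ranges over; this sketch assumes S ranges over the k-length subsequences of C that are themselves paths in the graph. The names `gen_candidates` and `is_path` are hypothetical.

```python
def is_path(seq, graph):
    """True if every consecutive pair in `seq` is joined by an arc in `graph`."""
    return all(b in graph.get(a, ()) for a, b in zip(seq, seq[1:]))


def gen_candidates(large_k, graph):
    """Extend each large k-path by one successor node and prune.

    large_k: set of tuples (the large k-paths).
    graph:   dict node -> set of successor nodes (the site structure).
    Returns a set of (k+1)-tuples; a set replaces the candidate-trie here.
    """
    candidates = set()
    for path in large_k:
        for v in graph.get(path[-1], ()):      # N+(lk): successors of the last node
            suffix = path[1:] + (v,)           # L' = <l2, ..., lk, v>
            if v in path or suffix not in large_k:
                continue
            cand = path + (v,)                 # C = <l1, ..., lk, v>
            # Assumption: only subsequences that are valid graph paths are tested.
            subpaths = (cand[:i] + cand[i + 1:] for i in range(len(cand)))
            if all(s in large_k for s in subpaths
                   if s != suffix and is_path(s, graph)):
                candidates.add(cand)
    return candidates


site = {"A": {"B", "C"}, "B": {"C"}, "C": {"D"}, "D": set()}
large_2 = {("A", "B"), ("B", "C"), ("A", "C")}
print(gen_candidates(large_2, site))  # {('A', 'B', 'C')}
```

In this toy site, dropping ("A", "C") from `large_2` prunes the candidate, because ⟨A, C⟩ is a valid path that is not large. Only successors of the last node are tried, which is how the site structure keeps the candidate count down.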

  14. Discussion • Sequential patterns [1] • Reduction when “customer-sequence” = “user-session” • Suffers from a large number of candidates (by not considering the site structure) • Path Fragments [4] (containment relationship is performed with regular expressions and the “*” label) • Focus on semantics (recommendation systems) • Prefetching: patterns are for system and not for human consumption • WMo focuses on efficiency/effectiveness rather than on expressiveness (semantics)

  15. Presentation Outline • Motivation & Related work • Proposed method • Comparative performance evaluation • Conclusions

  16. Methodology • Synthetic (sample site with 1000 nodes) • Synthetic data generator (see the paper) • Modeling site nodes, site linkage, size of documents • Real data sets (see the paper) • Examine the impact of: • noise • order • client cache (see the paper) • efficiency

  17. Accuracy w.r.t. noise

  18. Usefulness w.r.t. noise

  19. Traffic w.r.t. noise

  20. Accuracy w.r.t. order

  21. Usefulness w.r.t. order

  22. Traffic w.r.t. order
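The transcript names the three evaluation metrics (accuracy, usefulness, traffic) but does not define them. The sketch below assumes the definitions common in the prefetching literature: accuracy is the fraction of prefetched documents that were later requested, usefulness is the fraction of requested documents that had been prefetched, and traffic is the number of documents transferred with prefetching divided by the number transferred without it. These definitions, the function name `prefetch_metrics`, and the omission of any client cache are assumptions, not taken from the slides.

```python
def prefetch_metrics(prefetched, requested):
    """Return (accuracy, usefulness, traffic) under the assumed definitions.

    prefetched: documents the server pushed ahead of time.
    requested:  documents the user actually asked for.
    """
    prefetched, requested = set(prefetched), set(requested)
    hits = prefetched & requested
    accuracy = len(hits) / len(prefetched) if prefetched else 0.0
    usefulness = len(hits) / len(requested) if requested else 0.0
    # With prefetching, every requested or prefetched document is transferred once.
    traffic = len(prefetched | requested) / len(requested) if requested else 0.0
    return accuracy, usefulness, traffic


print(prefetch_metrics({"B", "C", "D", "E"}, {"A", "B"}))  # (0.25, 0.5, 2.5)
```

Under these definitions a predictor that prefetches aggressively can raise usefulness while lowering accuracy and increasing traffic, which is one reason to report all three.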

  23. Efficiency (see also [7,8])

  24. Presentation Outline • Motivation & Related work • Proposed method • Comparative performance evaluation • Conclusions

  25. Conclusions • Factors that influence Web Prefetching • Noise • Order • A new algorithm WMo was presented based on data mining • Compares favorably with previously proposed algorithms • WMo is an effective and efficient Web prefetching algorithm

  26. References • [1] R. Agrawal, R. Srikant, Mining Sequential Patterns, ICDE 1995. • [2] R. Cooley, B. Mobasher, J. Srivastava, Data Preparation for Mining World Wide Web Browsing Patterns, KAIS, 1(1), pp. 5-32, 1999. • [3] M. Deshpande, G. Karypis, Selective Markov Models for Predicting Web-page Accesses, SIAM Data Mining, 2001. • [4] W. Gaul, L. T. Schmidt-Thieme, Mining Web Navigation Path Fragments, WebKDD 2000. • [5] B. A. Huberman, P. Pirolli, J. Pitkow, R. J. Lukose, Strong Regularities in World Wide Web Surfing, Science, 280, pp. 95-97, 1998. • [6] B. Lan, S. Bressan, B. C. Ooi, Y. Tay, Making Web Servers Pushier, WebKDD 1999. • [7] A. Nanopoulos, Y. Manolopoulos, Finding Generalized Path Patterns for Web Log Data Mining, ADBIS-DASFAA 2000. • [8] A. Nanopoulos, Y. Manolopoulos, Mining Patterns from Graph Traversals, DKE, 37(3), pp. 243-266, 2001. • [9] V. Padmanabhan, J. Mogul, Using Predictive Prefetching to Improve World Wide Web Latency, ACM SIGCOMM Computer Communication Review, 26(3), 1996. • [10] T. Palpanas, A. Mendelzon, Web Prefetching Using Partial Match Prediction, WCW 1999. • [11] J. Pitkow, P. Pirolli, Mining Longest Repeating Subsequences to Predict World Wide Web Surfing, USITS 1999. • [12] L. T. Schmidt-Thieme, W. Gaul, Recommender Systems Based on Navigation Path Features, WebKDD 2001.
