Web Usage Mining: An Overview Lin Lin Department of Management Lehigh University Jan. 30th
Agenda • Web Usage Mining: Definition • Research Issues in Web Usage Mining • Current Research in Web Usage Mining • Going Forward
Web Usage Mining: A Definition • The process of applying data mining techniques to the discovery of usage patterns from Web data, targeted towards various applications • Different from content mining & structure mining (Adamic, L. A., and Adar, E. 2003. Friends and neighbors on the web. Social Networks 25(3):211–230.)
Web Usage Mining: Data Source • Typical data sources for web usage mining are: • Web structure data (site map, links, etc.) • Web content data • User profile (may not be available) • Web log (web usage data, clickstream data)
Preprocessing: Challenges • WHO are the users? • IP vs. real people • HOW LONG did the users stay? • Measuring session time (L. Catledge and J. Pitkow. Characterizing browsing behaviors on the World Wide Web. Computer Networks and ISDN Systems, 27(6), 1995) (B. Berendt, B. Mobasher, M. Nakagawa, and M. Spiliopoulou. The impact of site structure and user environment on session reconstruction in web usage analysis. In Proceedings of the 4th WebKDD 2002 Workshop, at the ACM-SIGKDD Conference on Knowledge Discovery in Databases (KDD'2002), Edmonton, Alberta, Canada, July 2002) • WHERE did the users go? • Server side vs. client side • WHAT did the users view? • Content processing (Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29–40) • For the best review of preprocessing methods, refer to: R. Cooley, B. Mobasher, J. Srivastava, Data preparation for mining World Wide Web browsing patterns, Knowledge and Information Systems 1 (1) (1999) 5–32
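The WHO and HOW LONG questions are usually answered together by sessionization: requests are grouped by some user proxy and then split wherever the gap between consecutive requests exceeds a timeout. The sketch below is a minimal illustration under stated assumptions, not any cited paper's procedure: the log record layout (client IP, user agent, timestamp, URL) is hypothetical, the (IP, user agent) pair is assumed as the user proxy, and the timeout is the 25.5-minute threshold mentioned for Fang and Sheng's portal study.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a session ends after more than 25.5 minutes of inactivity.
TIMEOUT = timedelta(minutes=25.5)

def sessionize(records, timeout=TIMEOUT):
    """Group page requests into sessions with an inactivity-timeout heuristic.

    records: iterable of hypothetical (client_ip, user_agent, timestamp, url)
    tuples. The (IP, user agent) pair is only a proxy for a person: proxies,
    NAT and shared machines all break the "one IP = one user" assumption.
    """
    by_user = defaultdict(list)
    for ip, agent, ts, url in records:
        by_user[(ip, agent)].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # chronological order within each user proxy
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:  # gap too long: close the session
                sessions.append((user, [url for _, url in current]))
                current = []
            current.append(hit)
        sessions.append((user, [url for _, url in current]))
    return sessions

# Usage with a small hypothetical log
t0 = datetime(2024, 1, 1, 9, 0)
log = [
    ("10.0.0.1", "Mozilla", t0, "/home"),
    ("10.0.0.1", "Mozilla", t0 + timedelta(minutes=5), "/products"),
    ("10.0.0.1", "Mozilla", t0 + timedelta(minutes=40), "/home"),  # 35-minute gap
    ("10.0.0.2", "Opera", t0, "/about"),
]
sessions = sessionize(log)
```

The first visitor's 35-minute pause exceeds the timeout, so their requests fall into two sessions. A time-based heuristic like this is cheap but sensitive to the chosen threshold, which is exactly the issue the session-reconstruction papers above examine.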
Usage Pattern Discovery: Methods • Statistical Methods (including dependency modeling and stochastic modeling) • Association Rule Mining • Clustering (user cluster vs. page cluster) • Classification
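Association rule mining on usage data treats each session as a "basket" of pages. The following is a minimal, pairs-only sketch of the idea (full Apriori also mines larger itemsets); the support and confidence thresholds are arbitrary assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations

def pairwise_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Mine rules A -> B between pages co-visited in the same session.

    support(A -> B)    = fraction of sessions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    Page order inside a session is ignored (basket view).
    """
    n = len(sessions)
    baskets = [set(s) for s in sessions]
    item_counts = Counter(page for basket in baskets for page in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # try both rule directions
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first, ties broken alphabetically
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Usage with hypothetical sessions
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pairwise_rules(sessions)
```

Here `/cart -> /products` comes out with confidence 1.0, since every session that reached the cart also viewed products.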
Usage Pattern Discovery: Research Streams • Why am I interested in web usage mining? (a.k.a., what's the big deal?) • Blattberg, Robert C. and John Deighton (1991), "Interactive Marketing: Exploring the Age of Addressability," Sloan Management Review, 33 (1), 5–14 • Ghosh, S. 1998. Making business sense of the Internet. Harvard Business Review 76(2) 126–135 • Bucklin, R. E., Lattin, J. M., Ansari, A., Bell, D., Coupey, E., Gupta, S., Little, J. D. C., Mela, C., Montgomery, A., Steckel, J. Choice and the Internet: From Clickstream to Research Stream. Marketing Letters, 2002, Vol. 13, No. 3, pp. 245–258
Usage Pattern Discovery: Research Streams • Lin’s two cents on current research streams • Build a better site: • For everybody – system improvement (caching & web design) • For individuals – personalization • For search engines – SEO • Know your visitors better: • Customer behavior • Be a better business
Build a Better Site: System Improvement • Server-side caching of web pages (association rules) • Y.-H. Wu, A.L.P. Chen, Prediction of web page accesses by proxy server log, World Wide Web 5 (1) (2002) 67–88 • Preprocessing: No IP discussion, sessions split by time-based heuristics • Method: Sequential pattern mining • Data: Usage • Contribution: Use frequent sequence to predict candidate page, “personalize” based on user maturity
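The idea of using frequent sequences to predict candidate pages can be sketched as follows. This is a simplified illustration, not Wu and Chen's algorithm: it only counts contiguous page paths, and the maximum path length, minimum support count and top-k cutoff are assumptions. Pages are abbreviated to single letters for brevity.

```python
from collections import Counter

def frequent_paths(sessions, max_len=3, min_count=2):
    """Count contiguous page paths of length 2..max_len; keep the frequent ones."""
    counts = Counter()
    for s in sessions:
        seen = set()  # count each path at most once per session
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                seen.add(tuple(s[i:i + n]))
        counts.update(seen)
    return {path: c for path, c in counts.items() if c >= min_count}

def predict_next(recent, patterns, top_k=2):
    """Return candidate next pages, preferring the longest matching context."""
    for n in range(len(recent), 0, -1):
        prefix = tuple(recent[-n:])
        candidates = Counter()
        for path, c in patterns.items():
            if len(path) == n + 1 and path[:n] == prefix:
                candidates[path[-1]] += c
        if candidates:
            return [page for page, _ in candidates.most_common(top_k)]
    return []  # no frequent pattern matches: nothing to prefetch

# Usage with hypothetical proxy-log sessions
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "D"],
    ["X", "B", "C"],
]
patterns = frequent_paths(sessions)
```

A caching proxy could then prefetch the pages returned by `predict_next`. Falling back to shorter contexts when a long one is unseen is one simple way to cover users whose paths are not yet frequent.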
Build a Better Site: System Improvement • Improvement of general web design (AR, SP, MM) • Fang, X. and O. R. L. Sheng (2004). Link Selector: A web mining approach to hyperlink selection for web portals. ACM Transactions on Internet Technology 4, 209–237 • Preprocessing: IP addresses not distinguished, sessions split by a 25.5-minute threshold • Method: Association mining • Data: Usage & Structure • Contribution: Combines structure and usage information to optimize portal page design • Where are we headed: adaptive web design • Y. Fu, M. Creado, C. Ju, Reorganizing web sites based on user access patterns, in: Proceedings of the Tenth International Conference on Information and Knowledge Management, ACM Press, 2001, pp. 583–585 (usage & content)
Build a Better Site: Personalization • Personalize the web site based on usage patterns (AR, Clustering) • A key research domain: recommender systems* • Content clustering vs. user clustering vs. hybrid approach • C. Shahabi and F. Banaei-Kashani. Efficient and anonymous web usage mining for web personalization. INFORMS Journal on Computing, Special Issue on Data Mining, 2002 • Method: Clustering of sessions • Data: Client-side usage data • Where are we headed: incorporate time and Web 2.0 • *: Refer to Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749 for a good review of recommender systems
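Clustering of sessions, the method listed for Shahabi and Banaei-Kashani, can be illustrated with plain k-means over page-visit vectors. This is a generic sketch, not the algorithm of that paper: the 0/1 session encoding, squared Euclidean distance, seeding with the first k sessions and the fixed iteration count are all assumptions made for brevity.

```python
def session_vectors(sessions, pages):
    """Encode each session as a 0/1 vector over a fixed list of pages."""
    vectors = []
    for s in sessions:
        visited = set(s)
        vectors.append([1.0 if p in visited else 0.0 for p in pages])
    return vectors

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as (simplistic) initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each session to its nearest centroid
        labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its sessions
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Usage with hypothetical pages and sessions
pages = ["/home", "/laptops", "/phones", "/support", "/faq"]
sessions = [
    ["/home", "/laptops"],
    ["/support", "/faq"],
    ["/home", "/laptops", "/phones"],
    ["/support", "/faq", "/home"],
]
labels, centroids = kmeans(session_vectors(sessions, pages), k=2)
```

The two clusters separate shopping sessions from support sessions; a centroid's large coordinates then suggest which pages to recommend to a new session assigned to that cluster.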
Build a Better Site: SEO • Adding usage information into PageRank • Kalyan Beemanapalli, Ramya Rangarajan, Jaideep Srivastava, "Usage-Aware Average Clicks", In Proc. of WebKDD 2006: KDD Workshop on Web Mining and Web Usage Analysis, in conjunction with the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), August 20-23 2006 • Method: Association rule in spirit
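One generic way to add usage information to PageRank is to let the random surfer follow each out-link in proportion to how often visitors actually clicked it, instead of uniformly. The sketch below illustrates only that idea; it is not the Usage-Aware Average Clicks measure, and the click-count input format and damping factor of 0.85 are assumptions.

```python
def usage_pagerank(clicks, damping=0.85, iterations=100):
    """PageRank whose transition probabilities come from observed click counts.

    clicks: dict mapping (source_page, target_page) -> number of traversals.
    Pages with no observed outgoing clicks spread their rank uniformly.
    """
    pages = sorted({p for edge in clicks for p in edge})
    n = len(pages)
    out_total = {p: 0 for p in pages}
    for (src, _), c in clicks.items():
        out_total[src] += c
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation share plus the rank held by pages with no observed exits
        dangling = sum(rank[p] for p in pages if out_total[p] == 0)
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for (src, dst), c in clicks.items():
            if c > 0:  # follow links in proportion to their click share
                new[dst] += damping * rank[src] * c / out_total[src]
        rank = new
    return rank

# Usage with hypothetical click counts
clicks = {
    ("/home", "/products"): 90,
    ("/home", "/about"): 10,
    ("/products", "/home"): 50,
    ("/about", "/home"): 5,
}
rank = usage_pagerank(clicks)
```

With a purely structural PageRank, `/products` and `/about` would receive equal shares of `/home`'s rank; weighting by clicks moves most of that share to `/products`.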
Know your visitors better: Customer behavior • A favorite research stream among marketers and MIS researchers • Statistical models are used most of the time • "Macro-level" behavior is often the focus • Interesting questions related to firm performance and profitability
Know your visitors better: Customer behavior • Johnson, E. J., Wendy Moe, Peter S. Fader, Steven Bellman, and Jerry Lohse. "On the Depth and Dynamics of Online Search Behavior," Management Science, Vol. 50, No. 3, March 2004, pp. 299–308 • model an individual's tendency to search as a logarithmic process • hierarchical Bayesian model with depth of search, dynamics of search and activity of search • interested in the number of unique sites searched by each household within a given product category • Preprocessing: Households identified by client-side programs, session is month-based • Method: Statistical Modeling (log model) • Data: Usage (search)
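The quantity of interest in this study, the number of unique sites each household searches within a product category per month-based session, is a simple aggregation once visits are attributed to households. The sketch below shows only that preprocessing step, not the hierarchical Bayesian model; the visit-table layout and the site names are hypothetical.

```python
from collections import defaultdict
from datetime import date

def search_depth(visits):
    """Count unique sites visited per (household, category, month).

    visits: iterable of hypothetical (household_id, category, site, visit_date)
    tuples, e.g. produced by client-side panel software.
    """
    sites = defaultdict(set)
    for household, category, site, day in visits:
        month = (day.year, day.month)  # month-based "session"
        sites[(household, category, month)].add(site)
    return {key: len(s) for key, s in sites.items()}

# Usage with hypothetical panel data
visits = [
    ("h1", "books", "shop-a.example", date(2024, 3, 2)),
    ("h1", "books", "shop-b.example", date(2024, 3, 9)),
    ("h1", "books", "shop-a.example", date(2024, 3, 20)),  # repeat visit
    ("h1", "books", "shop-a.example", date(2024, 4, 1)),
    ("h2", "music", "shop-c.example", date(2024, 3, 5)),
]
depth = search_depth(visits)
```

Repeat visits to the same site do not increase depth, which is why the per-month counts for household `h1` are 2 and then 1.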
Know your visitors better: Customer behavior • Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29–40 • WHY do the customers visit? • Preprocessing: Content Processing • Method: Clustering of sessions by visiting behavior parameters and content parameters • Data: Usage & Content • Conclusion:
Know your visitors better: Customer behavior • Moe, Wendy W. 2003. Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. J. Consumer Psych. 13(1, 2) 29–40
Know your visitors better: Customer behavior • Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research. 41 (3), 306-323 • How do the customers visit? • Predicts online buying by linking the purchase decision to what visitors do and what they are exposed to while at the site. • Preprocessing: Content Processing • Method: Statistical Modeling • Data: Usage & Content • Conclusion:
Know your visitors better: Customer behavior • Sismeiro, Catarina, Randolph E. Bucklin. 2004. Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach. Journal of Marketing Research. 41 (3), 306-323 • browsing behavior (i.e., time and page views) • repeat visitation to the site (return and total number of sessions) • use of interactive decision aids • data input effort and information gathering and processing • a series of page-specific characteristics
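As the title's "task-completion approach" suggests, the purchase can be viewed as a sequence of tasks, where the overall purchase probability is the product of each task's completion probability conditional on the previous tasks being completed. The sketch below illustrates that decomposition with a binary probit per task. The covariate names, the number of tasks and every coefficient are hypothetical placeholders, not estimates or specifications from the paper.

```python
from math import erf, sqrt

def probit(z):
    """Standard normal CDF, used as the binary-probit link."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def purchase_probability(covariates, task_coefficients):
    """P(purchase) = product of sequential task-completion probabilities.

    covariates: dict of visitor-level covariates (hypothetical names).
    task_coefficients: one dict per task mapping covariate name -> weight,
    plus an optional "intercept"; each task is conditional on the previous one.
    """
    prob = 1.0
    for coefs in task_coefficients:
        z = coefs.get("intercept", 0.0) + sum(
            w * covariates.get(name, 0.0)
            for name, w in coefs.items()
            if name != "intercept"
        )
        prob *= probit(z)
    return prob

# Usage with made-up covariates and coefficients (illustration only)
visitor = {"page_views": 12, "minutes_on_site": 8.5, "used_decision_aid": 1}
tasks = [
    {"intercept": -0.5, "page_views": 0.05, "used_decision_aid": 0.4},  # task 1
    {"intercept": 0.2, "minutes_on_site": 0.03},                        # task 2
    {"intercept": 0.8},                                                 # task 3
]
p = purchase_probability(visitor, tasks)
```

Splitting the purchase this way lets each covariate (for example, use of a decision aid) matter at a different step rather than only through one overall buy/no-buy equation.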
Know your visitors better: Customer behavior • My Research: Online Customer Lifetime • predict an individual's tendency to stay with an e-tailer • Hybrid BG/NBD model + Neural Networks • interested in the relationship between online customer lifetime and firm profitability • Preprocessing: Households identified by client-side programs, session is month-based • Method: Statistical Modeling & Classification • Data: Usage
Know your visitors better: Customer behavior • My Research: Online Customer Lifetime • Given N customers with visiting histories (X_i, t_xi, T) • T: the observed time period • X_i: number of visits customer i made during T • t_xi: time of the last visit made by customer i • Estimate the four parameters r, α, a and b by maximizing the likelihood of the observed histories
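For reference, assuming the standard BG/NBD formulation (Fader, Hardie and Lee, 2005), the likelihood of one customer's history (x, t_x, T) is

$$L(r,\alpha,a,b \mid x,t_x,T) = \frac{B(a,b+x)}{B(a,b)}\,\frac{\Gamma(r+x)\,\alpha^{r}}{\Gamma(r)\,(\alpha+T)^{r+x}} + \delta_{x>0}\,\frac{B(a+1,b+x-1)}{B(a,b)}\,\frac{\Gamma(r+x)\,\alpha^{r}}{\Gamma(r)\,(\alpha+t_x)^{r+x}}$$

and the estimates maximize the sum of ln L over the N customers. A minimal Python sketch of that log-likelihood, computed in log space with `math.lgamma` for numerical stability, is below; the function names are hypothetical, and the actual maximization would be handed to a numerical optimizer (for example `scipy.optimize.minimize` on the negative total, with r, α, a, b kept positive).

```python
from math import exp, lgamma, log

def log_beta(p, q):
    """log of the Beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def bgnbd_log_likelihood(r, alpha, a, b, x, t_x, T):
    """Log-likelihood of one history (x visits, last at t_x, observed over T)."""
    common = lgamma(r + x) - lgamma(r) + r * log(alpha)
    # Term 1: the customer is still "alive" at the end of the window T
    term1 = common + log_beta(a, b + x) - log_beta(a, b) - (r + x) * log(alpha + T)
    if x == 0:
        return term1
    # Term 2 (only when x > 0): the customer dropped out right after t_x
    term2 = common + log_beta(a + 1, b + x - 1) - log_beta(a, b) - (r + x) * log(alpha + t_x)
    m = max(term1, term2)  # log-sum-exp to avoid underflow
    return m + log(exp(term1 - m) + exp(term2 - m))

def total_log_likelihood(params, histories):
    """Sum over customers; histories is a list of (x, t_x, T) tuples."""
    r, alpha, a, b = params
    return sum(bgnbd_log_likelihood(r, alpha, a, b, x, t_x, T) for x, t_x, T in histories)
```

For example, with r = α = a = b = 1, a customer with no visits over T = 1 has likelihood α^r / (α + T)^r = 0.5.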
Know your visitors better: Customer behavior • Given r, α, a and b, we can predict: • Total number of visits during a time period t (starting from time 0) • Number of visits an individual will make in the next t time units, Y(t) (from T+1 to T+t)
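Both predictions have closed forms in the standard BG/NBD model, assuming that is the variant used here. E[X(t)] is the expected number of visits by a randomly chosen customer in (0, t]; multiplying it by N gives the cohort total. E[Y(t) | x, t_x, T] is the expected number of visits one customer makes in the next t time units. The sketch below implements both, with the Gauss hypergeometric function 2F1 evaluated by its power series; it assumes a ≠ 1 (the formulas divide by a − 1) and a + b > 1, and the parameter values in the usage lines are made up, not estimates.

```python
def hyp2f1(a, b, c, z, tol=1e-12, max_terms=100000):
    """Gauss hypergeometric 2F1(a, b; c; z) by its power series (requires |z| < 1)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total

def expected_visits(t, r, alpha, a, b):
    """E[X(t)]: expected visits of a random customer in (0, t]."""
    z = t / (alpha + t)
    return (a + b - 1) / (a - 1) * (
        1 - (alpha / (alpha + t)) ** r * hyp2f1(r, b, a + b - 1, z)
    )

def conditional_expected_visits(t, x, t_x, T, r, alpha, a, b):
    """E[Y(t) | x, t_x, T]: expected visits in the next t time units after T."""
    z = t / (alpha + T + t)
    numerator = (a + b + x - 1) / (a - 1) * (
        1 - ((alpha + T) / (alpha + T + t)) ** (r + x)
        * hyp2f1(r + x, b + x, a + b + x - 1, z)
    )
    denominator = 1.0
    if x > 0:  # chance that the customer has already dropped out
        denominator += a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return numerator / denominator

# Usage with hypothetical parameter values
params = dict(r=0.25, alpha=4.0, a=1.5, b=2.4)
recent = conditional_expected_visits(10, x=3, t_x=9, T=10, **params)
lapsed = conditional_expected_visits(10, x=3, t_x=2, T=10, **params)
```

Two customers with the same number of visits get different forecasts: the one whose last visit was recent (`recent`) is predicted to make more future visits than the one who has been silent for a while (`lapsed`), which is the recency effect the model is designed to capture.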
Know your visitors better: Customer behavior • My Research: Online Customer Lifetime