An Introduction to Web Mining
Categories of Web Mining
• Web Content Mining
  • Text
  • Multimedia
• Web Structure Mining
• Web Usage Mining
References
• R. Kosala and H. Blockeel, “Web Mining Research: A Survey”, SIGKDD Explorations, vol. 2, issue 1, 2000.
• J. Srivastava et al., “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data”, SIGKDD Explorations, vol. 1, issue 2, 2000.
How Does It Work – The Web Usage Mining Process
[Figure: web server log and site data → data preparation → clean data → data mining → usage patterns]
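A minimal sketch of the data preparation step follows. It parses web server log lines, drops requests that are usually not page views (images, stylesheets, scripts, failed requests), and splits each visitor's clicks into sessions. Several details are assumptions made for illustration and do not come from the process above: a simplified Common Log Format, the specific cleaning rules, using the IP address as the visitor key, and a 30-minute session timeout.

```python
import re
from datetime import datetime, timedelta

# Assumed simplified Common Log Format, e.g.
# 10.0.0.1 - - [20/Jun/1997:10:15:00 -0700] "GET /company HTTP/1.0" 200 512
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed non-page resources
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed gap that starts a new session


def parse_line(line):
    """Return (ip, timestamp, url) for a successful page request, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line
    url = match.group("url")
    if int(match.group("status")) >= 400 or url.lower().endswith(IGNORED_SUFFIXES):
        return None  # failed request or embedded resource
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("ip"), timestamp, url


def sessionize(lines):
    """Group clean page requests into per-visitor sessions (lists of URLs)."""
    requests = [r for r in map(parse_line, lines) if r is not None]
    requests.sort(key=lambda r: (r[0], r[1]))  # by visitor, then by time
    sessions = []
    last_seen = {}  # ip -> (time of last request, that visitor's current session)
    for ip, timestamp, url in requests:
        previous = last_seen.get(ip)
        if previous is None or timestamp - previous[0] > SESSION_TIMEOUT:
            current = []  # first request, or gap too long: open a new session
            sessions.append(current)
        else:
            current = previous[1]
        current.append(url)
        last_seen[ip] = (timestamp, current)
    return sessions
```

Keying visitors by IP address is the crudest option. Proxies and shared machines merge several people into one "visitor", which is one reason the data preparation stage is usually considered the hardest part of usage mining.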
Web Usage Mining Techniques
• Data Preparation
  • Data Collection
  • Data Selection
  • Data Cleaning
• Data Mining
  • Navigation Patterns
  • Association Rules
  • Sequential Patterns
  • Clustering
  • Classification
Data Mining Techniques – Navigation Patterns
[Figure: web page hierarchy of a web site, with pages A, B, C, D and E]
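Data Mining Techniques – Navigation Patterns (cont.)
[Figure: the same page hierarchy of pages A, B, C, D and E]
• A link could be provided from C to E (for example, if the mined navigation patterns show that many users travel from C to E through other pages).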
Data Mining Techniques – Navigation Patterns (cont.)
• Analysis examples
  • 70% of users who accessed /company/product2 did so by starting at /company and proceeding through /company/new, /company/products and /company/product1
  • 80% of users who accessed the site started from /company/products
  • 65% of users left the site after four or fewer page references
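Statistics like the three examples above can be computed directly from sessionized clickstreams. The sketch below assumes each session is a list of URLs in visiting order. The function names and sample sessions are hypothetical.

```python
from collections import Counter


def entry_page_share(sessions):
    """Fraction of sessions that start at each page."""
    starts = Counter(session[0] for session in sessions if session)
    total = sum(starts.values())
    return {page: count / total for page, count in starts.items()}


def short_visit_share(sessions, max_pages=4):
    """Fraction of sessions with at most max_pages page references."""
    return sum(1 for session in sessions if len(session) <= max_pages) / len(sessions)


def path_share(sessions, target, path):
    """Among sessions that reach target, the fraction whose pages before the
    first visit to target are exactly path."""
    reached = matched = 0
    for session in sessions:
        if target in session:
            reached += 1
            if session[: session.index(target)] == list(path):
                matched += 1
    return matched / reached if reached else 0.0


# Hypothetical sessions (lists of URLs in visiting order).
SESSIONS = [
    ["/company", "/company/new", "/company/products", "/company/product1", "/company/product2"],
    ["/company/products", "/company/product2"],
    ["/company/products", "/company/product1"],
    ["/company", "/company/new"],
]
```

For these four sessions, half of the visits to /company/product2 follow the full /company → /company/new → /company/products → /company/product1 path, and three of the four sessions end after four or fewer pages.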
Data Mining Techniques – Association Rules
• Supermarket example

Transaction ID | Items Purchased
1 | butter, bread, milk, beer, diaper
2 | bread, milk, beer, egg
3 | Coke, film, bread, butter, milk
… | …

• An association rule looks like: “If a customer buys diapers, in 60% of cases he/she also buys beer. This happens in 3% of all transactions.”
  • 60%: confidence (the share of transactions containing diapers that also contain beer)
  • 3%: support (the share of all transactions that contain both diapers and beer)
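Support and confidence are straightforward to compute. This sketch uses only the three transactions listed in the table. The elided rows are not filled in, so the numbers it produces differ from the 60% and 3% in the example rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also
    contain consequent: support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # antecedent never occurs
    return support(transactions, set(antecedent) | set(consequent)) / base


# The three transactions shown in the table above.
TRANSACTIONS = [
    {"butter", "bread", "milk", "beer", "diaper"},
    {"bread", "milk", "beer", "egg"},
    {"Coke", "film", "bread", "butter", "milk"},
]
```

With these rows, {diaper} → {beer} has support 1/3 and confidence 1.0, while {beer} → {diaper} has the same support but confidence 0.5. Confidence is directional, and support is not.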
Data Mining Techniques – Association Rules (cont.)
• Web usage examples
  • 40% of users who accessed the Web page with URL /company/product1 also accessed /company/product2
  • 30% of users who accessed /company/special placed an online order in /company/product1
  • 50% of users who bought books by Michael Crichton also reviewed those by John Grisham in the same visit
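On web data the "transactions" are usually sessions, and the "items" are pages. The sketch below enumerates every single-page rule X → Y that meets a minimum support and confidence by counting pages and page pairs. That is only the first two levels of Apriori-style counting. Longer rules would need the full candidate-generation loop. The sessions and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support, min_confidence):
    """Return sorted (lhs, rhs, support, confidence) tuples for rules lhs -> rhs."""
    n = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):  # a frequent pair can yield two rules
            conf = count / page_counts[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, count / n, conf))
    return sorted(rules)


# Hypothetical sessions, each the set of pages visited.
SESSIONS = [
    {"/company/product1", "/company/product2"},
    {"/company/product1", "/company/product2", "/company/special"},
    {"/company/product1", "/company/special"},
    {"/company/product3"},
]
```

With min_support=0.5 and min_confidence=0.6, four rules survive, including /company/product1 → /company/product2 (support 0.5, confidence 2/3).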
Data Mining Techniques – Sequential Patterns
• Supermarket example

Customer | Transaction Time | Purchased Items
John | 6/21/97 5:30 pm | Beer
John | 6/22/97 10:20 pm | Brandy
Frank | 6/20/97 10:15 am | Juice, Coke
Frank | 6/20/97 11:50 am | Beer
Frank | 6/21/97 9:25 am | Wine, Water, Cider
Mitchell | 6/21/97 3:20 pm | Beer, Gin, Cider
Mary | 6/20/97 2:30 pm | Beer
Mary | 6/21/97 6:17 pm | Wine, Cider
Mary | 6/22/97 5:05 pm | Brandy
Robin | 6/20/97 11:05 pm | Brandy
Data Mining Techniques – Sequential Patterns (cont.)
• Supermarket example: each customer's transactions ordered by time (each parenthesized group is the set of items bought in one transaction)

Customer | Customer Sequence
John | (Beer) (Brandy)
Frank | (Juice, Coke) (Beer) (Wine, Water, Cider)
Mitchell | (Beer, Gin, Cider)
Mary | (Beer) (Wine, Cider) (Brandy)
Robin | (Brandy)
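Data Mining Techniques – Sequential Patterns (cont.)
• Supermarket example: mining result (with 5 customers, a support of at least 40% means at least 2 customer sequences must contain the pattern)

Sequential Patterns with Support >= 40% | Supporting Customers
(Beer) (Brandy) | John, Mary
(Beer) (Wine, Cider) | Frank, Mary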
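The key operation in sequential pattern mining is testing whether a customer sequence contains a pattern. Each itemset of the pattern must be a subset of a different transaction, in the same time order. The sketch below builds the customer sequences from the supermarket transactions and checks candidate patterns against them. It does not show how candidates are generated, as algorithms such as AprioriAll or GSP do.

```python
from collections import defaultdict
from datetime import datetime

# Transactions from the supermarket table: (customer, time, items).
TRANSACTIONS = [
    ("John", "6/21/97 5:30 pm", {"Beer"}),
    ("John", "6/22/97 10:20 pm", {"Brandy"}),
    ("Frank", "6/20/97 10:15 am", {"Juice", "Coke"}),
    ("Frank", "6/20/97 11:50 am", {"Beer"}),
    ("Frank", "6/21/97 9:25 am", {"Wine", "Water", "Cider"}),
    ("Mitchell", "6/21/97 3:20 pm", {"Beer", "Gin", "Cider"}),
    ("Mary", "6/20/97 2:30 pm", {"Beer"}),
    ("Mary", "6/21/97 6:17 pm", {"Wine", "Cider"}),
    ("Mary", "6/22/97 5:05 pm", {"Brandy"}),
    ("Robin", "6/20/97 11:05 pm", {"Brandy"}),
]


def build_sequences(rows):
    """Group transactions by customer and order each customer's itemsets by time."""
    by_customer = defaultdict(list)
    for customer, time_text, items in rows:
        when = datetime.strptime(time_text, "%m/%d/%y %I:%M %p")
        by_customer[customer].append((when, items))
    return {
        customer: [items for _, items in sorted(txns, key=lambda t: t[0])]
        for customer, txns in by_customer.items()
    }


def contains(sequence, pattern):
    """True if each itemset of pattern is a subset of a distinct, later
    transaction in sequence (greedy earliest match)."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)


def supporting_customers(sequences, pattern):
    """Sorted names of customers whose sequence contains pattern."""
    return sorted(c for c, seq in sequences.items() if contains(seq, pattern))


if __name__ == "__main__":
    sequences = build_sequences(TRANSACTIONS)
    for pattern in ([{"Beer"}, {"Brandy"}], [{"Beer"}, {"Wine", "Cider"}]):
        customers = supporting_customers(sequences, pattern)
        print(pattern, customers, len(customers) / len(sequences))
```

Running the check confirms that (Beer) (Brandy) is supported by John and Mary only. Frank never buys brandy. Mitchell's beer and cider come in a single transaction, so they do not form a sequence.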
Data Mining Techniques – Sequential Patterns (cont.)
• Web usage examples
  • 30% of users who visited /company/products had searched Yahoo for keyword w within the past week
  • 60% of users who placed an online order in /company/product1 also placed an order in /company/product4 within 15 days
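The second web example adds a time window to the sequence. This sketch computes a statistic of that shape from per-user order histories. It assumes orders are recorded as (page, date) pairs and counts an order of the second product made on the same day as, or up to the window after, an order of the first. The order data and function name are hypothetical.

```python
from datetime import date, timedelta


def followed_within(orders, first, second, window):
    """Among users who ordered `first`, the fraction who ordered `second`
    on the same day or up to `window` after some order of `first`."""
    users_first = users_both = 0
    for user_orders in orders.values():
        first_dates = [d for page, d in user_orders if page == first]
        if not first_dates:
            continue  # user never ordered `first`
        users_first += 1
        second_dates = [d for page, d in user_orders if page == second]
        if any(f <= s <= f + window for f in first_dates for s in second_dates):
            users_both += 1
    return users_both / users_first if users_first else 0.0


# Hypothetical order histories: user -> [(page, order date)].
ORDERS = {
    "u1": [("/company/product1", date(1997, 6, 1)), ("/company/product4", date(1997, 6, 10))],
    "u2": [("/company/product1", date(1997, 6, 1)), ("/company/product4", date(1997, 7, 1))],
    "u3": [("/company/product4", date(1997, 6, 1)), ("/company/product1", date(1997, 6, 5))],
    "u4": [("/company/product2", date(1997, 6, 1))],
}

if __name__ == "__main__":
    print(followed_within(ORDERS, "/company/product1", "/company/product4", timedelta(days=15)))
```

Only u1 qualifies here. u2's second order comes too late, u3 orders in the wrong order, and u4 never orders product1, so the result is 1/3.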
Data Mining Techniques – Clustering
[Figure: a customer profile made up of dynamic and static attributes]
Data Mining Techniques – Clustering (cont.)
[Figure: customers plotted by age (up to 100) against income (up to $150,000), grouped into cluster 1, cluster 2 and cluster 3]
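The slides do not name a clustering algorithm. As one common choice, here is a sketch of k-means (Lloyd's algorithm) that groups customers by age and income into three clusters, as in the figure. It seeds deterministically with a farthest-point rule, which is a choice made here so the result is repeatable. It also scales both axes to [0, 1] using the figure's axis maxima so that income does not dominate the distance. The customer data is hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from its nearest chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (age, income in dollars).
CUSTOMERS = [
    (22, 25_000), (25, 30_000), (28, 28_000),
    (45, 120_000), (50, 130_000), (48, 110_000),
    (70, 40_000), (75, 35_000), (68, 45_000),
]

if __name__ == "__main__":
    # Scale both axes to [0, 1] with the figure's axis maxima (100 years, $150,000).
    scaled = [(age / 100, income / 150_000) for age, income in CUSTOMERS]
    _, labels = kmeans(scaled, k=3)
    print(labels)
```

k-means needs the number of clusters up front and only finds a local optimum. With well-separated groups like these, each seed lands in a different group, and the three clusters come out as expected.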
Data Mining Techniques – Classification
• Deployed methods
  • Decision trees
• Example
Data Mining Techniques – Classification (cont.)
• Example 1
[Figure: decision tree splitting on income; Income = high leads to D1 and Income = low leads to D2]
Data Mining Techniques – Classification (cont.)
• Example 2
[Figure: decision tree splitting on income; Income = high leads to D1, which is split further into D1a and D1b, and Income = low leads to D2]
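Decision tree learners have to pick the attribute to split on at each node. One standard criterion, used by ID3, is information gain: the drop in class entropy after the split. The sketch below computes it for a split on income versus a split on age. The records, attribute names and "buys" class label are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Entropy of target minus the weighted entropy after splitting on attribute."""
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(part) / len(records) * entropy(part) for part in partitions.values())
    return entropy([r[target] for r in records]) - remainder


def best_split(records, attributes, target):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical labelled users.
RECORDS = [
    {"income": "high", "age": "young", "buys": "yes"},
    {"income": "high", "age": "old", "buys": "yes"},
    {"income": "high", "age": "young", "buys": "yes"},
    {"income": "low", "age": "old", "buys": "no"},
    {"income": "low", "age": "young", "buys": "no"},
    {"income": "low", "age": "old", "buys": "yes"},
]
```

In this toy data, splitting on income leaves the high-income branch pure, giving a gain of about 0.46 bits. Splitting on age gains nothing, so income becomes the root, as in the trees above. A full learner would then recurse on any impure branch.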
Data Mining Techniques – Classification (cont.)
• Web usage examples
  • 50% of users who placed an online order in /company/product2 were in the 20-25 age group and lived on the West Coast
  • If a user puts more than 2 items in the shopping cart, he/she will place an order during that visit to the site
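A rule like the shopping-cart example is a one-level decision tree, or decision stump, on a single numeric attribute. This sketch learns the threshold t for "order if cart items > t" by trying each observed item count and keeping the most accurate one. The session records are hypothetical.

```python
def best_threshold(sessions):
    """sessions: (cart_items, ordered) pairs. Returns (t, accuracy) for the
    rule 'ordered iff cart_items > t', trying each observed count as t."""
    best_t, best_acc = None, -1.0
    for t in sorted({items for items, _ in sessions}):
        correct = sum(1 for items, ordered in sessions if (items > t) == ordered)
        accuracy = correct / len(sessions)
        if accuracy > best_acc:  # ties keep the smaller threshold
            best_t, best_acc = t, accuracy
    return best_t, best_acc


# Hypothetical visits: (items in cart, whether an order was placed).
SESSIONS = [
    (0, False), (1, False), (2, False), (2, False),
    (3, True), (4, True), (5, True), (1, True),
]

if __name__ == "__main__":
    print(best_threshold(SESSIONS))
```

For this data the learned rule is "more than 2 items", which is right for 7 of the 8 visits. The one-item order is the exception that keeps accuracy below 100%.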
Challenges
• Large data size
  • Sampling vs. accuracy
• Data complexity
  • Need for hybrid data mining methods
• Filtering of mining results
• Incremental mining
• On-line (real-time) mining
Applications: Personalized Web Services
[Figure: architecture with users, an entry system, a real-time response system (response & recommendation), a business system, a warehouse, data preparation, a clean warehouse, data mining, business rules and a rules executer]
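In this architecture, mined rules are applied online by the rules executer. As a minimal sketch, assume the business rules are association rules of the form "visited pages → recommended page". The executer then returns pages whose rule antecedent is fully contained in the current session, skips pages the user has already seen, and ranks results by confidence. The Rule type, the recommend function and the sample rules are hypothetical, including the page /company/product3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # pages that must all have been visited
    consequent: str        # page to recommend
    confidence: float


def recommend(rules, visited, top_n=3):
    """Up to top_n recommended pages for a session, highest confidence first."""
    visited = set(visited)
    scores = {}
    for rule in rules:
        if rule.antecedent <= visited and rule.consequent not in visited:
            # Keep the strongest firing rule for each recommended page.
            scores[rule.consequent] = max(scores.get(rule.consequent, 0.0), rule.confidence)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked[:top_n]]


# Hypothetical mined rules.
RULES = [
    Rule(frozenset({"/company/product1"}), "/company/product2", 0.4),
    Rule(frozenset({"/company/special"}), "/company/product1", 0.3),
    Rule(frozenset({"/company/product1", "/company/special"}), "/company/product3", 0.7),
]

if __name__ == "__main__":
    print(recommend(RULES, ["/company/special", "/company/product1"]))
```

Separating offline mining (warehouse, preparation, mining) from this lightweight lookup keeps the real-time path cheap. The trade-off is that recommendations only change when the rule set is refreshed, which is where incremental mining becomes relevant.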