Web Mining Ref: http://www.csse.monash.edu.au/courseware/cse5230/html/lectures.html
Lecture Outline
• How big is the web?
• What is “web data”?
• A taxonomy of web mining tasks
• Example: targeted advertising
• Example: personalization
• References
How big is the web?
• It is not easy to determine the size of the web
  • In 1999, one estimate was that there were approximately 350 million web pages, growing at about 1 million pages per day
  • In 2001, Google announced that it was indexing around 3 billion web documents
  • In 2002, Google reported searching 3,083,324,652 web pages
  • No matter which of these is more accurate – it’s very big!
• We can view the web as the world’s biggest database
  • The word “database” is used loosely here, because the web has no real formal structure or database schema
  • This makes the application of data mining to the web potentially very useful, but also difficult
What is “web data”?
• Web data can be classified as follows [Dun2002]:
  • The actual content of web pages (text, images, multimedia)
  • Intrapage structure – the HTML or XML mark-up specifying the organization of the page content
  • Interpage structure – the links into and out of web pages
  • Usage data describing how the users of a web site access pages – navigation patterns
  • User profiles – these can include demographic data obtained from a registration process, or perhaps IP addresses. They can also include information found in cookies
A taxonomy of web mining tasks (1)
• From [Dun2002], following [Zai1999]:
Web Mining
  Web Content Mining
    Web Page Content Mining
    Search Result Mining
  Web Structure Mining
  Web Usage Mining
    General Access Pattern Tracking
    Customized Usage Tracking
A taxonomy of web mining tasks (2)
• Web content mining
  • Examines the contents of web pages (text, graphics)
  • Examines the results of web searches
    • Mining systems built on top of existing search engines
  • Similar to traditional information retrieval (text categorization, text filtering, etc.)
  • Often goes further than simple keyword search – e.g. may cluster similar pages
• Web structure mining
  • Looks at page structure
    • e.g. text in <H1> tags may be more important
  • Looks at links between pages
    • e.g. pages with many incoming links may be more useful
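The two structure-mining heuristics above (heading text and incoming links) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-page site in `PAGES` is invented, and the names `StructureParser` and `analyse` are hypothetical rather than taken from the lecture or from [Dun2002].

```python
from collections import Counter
from html.parser import HTMLParser


class StructureParser(HTMLParser):
    """Collects <h1> text and outgoing link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.h1_text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <H1> arrives here as "h1".
        if tag == "h1":
            self.in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1 and data.strip():
            self.h1_text.append(data.strip())


def analyse(pages):
    """pages maps a page name to its HTML. Returns (headings, in-link counts)."""
    headings = {}
    in_links = Counter()
    for name, html in pages.items():
        parser = StructureParser()
        parser.feed(html)
        parser.close()
        headings[name] = parser.h1_text
        # Count each target once per source page and ignore self-links.
        for target in set(parser.links):
            if target != name:
                in_links[target] += 1
    return headings, in_links


# Hypothetical three-page site, invented for illustration.
PAGES = {
    "index.html": '<H1>Rugby news</H1><a href="scores.html">Scores</a>',
    "scores.html": '<H1>Latest scores</H1><a href="index.html">Home</a>',
    "about.html": '<H1>About</H1><a href="scores.html">Scores</a>',
}

headings, in_links = analyse(PAGES)
print(headings["index.html"])   # heading text found on the index page
print(in_links.most_common())   # pages ranked by number of incoming links
```

A real system would weight these signals rather than just count them, but the counts are enough to show what "structure" means here.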
A taxonomy of web mining tasks (3)
• Web usage mining
  • Looks at log files of web access
  • General access tracking looks at history of pages visited
  • Customized usage tracking may be focused on particular kinds of usage, or particular users
  • Involves mining of sequential patterns
    • Can use association rule discovery
    • These patterns can be clustered to reveal users with similar access behaviour
  • Can be used to
    • Improve web site design
    • Customize presentation via collaborative filtering
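As a rough sketch of association rule discovery on usage data, the Python below groups log records into per-user sessions and derives pairwise rules with their support and confidence. The log records, the function names and the 0.5 support threshold are all assumptions made for illustration. Note that this version ignores the order of visits, so it finds association rules rather than true sequential patterns.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical, already-parsed log records: (user id, page requested).
LOG = [
    ("u1", "/home"), ("u1", "/scores"), ("u1", "/tickets"),
    ("u2", "/home"), ("u2", "/scores"),
    ("u3", "/home"), ("u3", "/tickets"),
    ("u4", "/scores"), ("u4", "/tickets"),
]


def sessions_from_log(log):
    """Group visited pages by user, treating each user as one session."""
    sessions = defaultdict(set)
    for user, page in log:
        sessions[user].add(page)
    return dict(sessions)


def pair_rules(sessions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules 'visited a -> visited b'."""
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for pages in sessions.values():
        page_count.update(pages)
        pair_count.update(combinations(sorted(pages), 2))
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of sessions containing both pages
        if support >= min_support:
            rules[(a, b)] = (support, count / page_count[a])
            rules[(b, a)] = (support, count / page_count[b])
    return rules


rules = pair_rules(sessions_from_log(LOG))
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Rules with high confidence (e.g. most visitors to one page also visit another) are the kind of pattern that could suggest adding a direct link between the two pages.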
Example: targeted advertising (1)
• In marketing, targeting is any technique used to direct marketing or advertising effort to the portion of the population thought to be most valuable to the business, e.g. those
  • Likely to purchase
  • Likely to spend a lot
• The business wants to avoid spending money on sending advertising to people who will not respond to it
• In the web context, this can mean displaying an ad for a web site on a different web site
  • Can use web usage information to work out what kind of people use a site: target demographics
  • Sell advertising to companies wanting to target that demographic
Example: targeted advertising (2)
• For example, the Rugby Heaven web site (http://rugbyheaven.smh.com.au/) is hosting advertising for:
  • MLC life insurance
  • Fintrack Financial Services
  • Business Review Weekly (BRW)
• These advertisers appear to think that this site is likely to be popular with older people who have money!
• The URL for the BRW ad is:
http://campaigns.f2.com.au/event.ng/Type=click&FlightID=10928&AdID=24947&TargetID=2389&Segments=2,13,23,31,35,77,81,88,93,94,153,855,976,993,1145,1301,1989,2320,2389,2394,2396,2477,2534,2576,2581,2689&Targets=535,2389,40,60,1834&Values=25,31,43,48,50,60,72,81,91,100,110,135,150,157,233,239,366,422,605,791,804,805,806,1203,1278,1403,1432,1476,1485,1499&RawValues=&Redirect=http:%2F%2Fwww.brw.com.au%2Fsubscription%2Fsubscribe.asp
• The long lists of segment, target and value codes in the URL make it clear that some sophisticated targeting is going on
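To see how much targeting information the BRW ad URL carries, it can be pulled apart with Python's standard `urllib.parse` module. This is a sketch, not a description of the ad server: it assumes that everything after `/event.ng/` is a `&`-separated list of `key=value` pairs, and the helper name `parse_ad_url` is hypothetical. What the numeric codes mean is not documented in the lecture and is not guessed at here.

```python
from urllib.parse import parse_qs, urlsplit

# The BRW ad URL quoted on the slide, split across lines for readability.
AD_URL = (
    "http://campaigns.f2.com.au/event.ng/Type=click&FlightID=10928&AdID=24947"
    "&TargetID=2389"
    "&Segments=2,13,23,31,35,77,81,88,93,94,153,855,976,993,1145,1301,1989,"
    "2320,2389,2394,2396,2477,2534,2576,2581,2689"
    "&Targets=535,2389,40,60,1834"
    "&Values=25,31,43,48,50,60,72,81,91,100,110,135,150,157,233,239,366,422,"
    "605,791,804,805,806,1203,1278,1403,1432,1476,1485,1499"
    "&RawValues=&Redirect=http:%2F%2Fwww.brw.com.au%2Fsubscription%2Fsubscribe.asp"
)

LIST_FIELDS = ("Segments", "Targets", "Values")


def parse_ad_url(url):
    """Split the ad URL into a dict of fields (assumed key=value layout)."""
    path = urlsplit(url).path
    # Assumption: the parameters live in the path, after "/event.ng/".
    _, _, params = path.partition("/event.ng/")
    fields = parse_qs(params, keep_blank_values=True)
    parsed = {}
    for key, values in fields.items():
        value = values[0]
        if key in LIST_FIELDS:
            # Comma-separated numeric codes become lists of integers.
            parsed[key] = [int(v) for v in value.split(",") if v]
        else:
            parsed[key] = value
    return parsed


ad = parse_ad_url(AD_URL)
print(ad["Redirect"])  # where the click finally lands (percent-decoded)
for name in LIST_FIELDS:
    print(name, len(ad[name]), "codes")
```

Even without knowing the code book, the counts show that a single click is tagged with dozens of segment and value identifiers.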
Example: personalization (1)
• Personalization spans the areas of web content mining and web usage mining
• Personalization aims to modify document contents or access patterns to better match the preferences of a particular user
• Personalization can involve
  • Dynamically creating and serving web pages that are unique to an individual user
  • Determining which pages to retrieve or link to on a user-by-user basis
Example: personalization (2)
• Unlike targeting, personalization can be applied to the target web page itself (a targeted advertisement, by contrast, promotes another site)
• Simple example: including the name of the user in the page content
• Personalization techniques include
  • Use of cookies
  • Use of user databases
  • Use of web usage patterns to identify similar users (for use in collaborative filtering)
• Often requires a user to log in – this part is not data mining
Example: personalization (3)
• A classic example of personalization is recommending to a user
  • A product very similar to something they have bought before (if the web site is selling something)
  • Content that is similar to something they have used before
• Personalization techniques can be based on clustering, classification or even prediction
  • With classification, the desires of a user are determined based on the class to which he/she is assigned. Classes may be predetermined by experts.
  • With clustering, clusters of users with similar navigation or purchasing behaviour are found, and the user’s desires are determined on this basis
Example: personalization (4)
• Amazon.com makes use of personalization.
  • They make use of the user’s past behaviour
  • They also use collaborative filtering – they recommend products bought by users who have similar profiles to the current user
  • Could use clustering, or information filtering techniques
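The idea of recommending products bought by users with similar profiles can be sketched as a small user-based collaborative filter. This is a toy illustration only and says nothing about how Amazon.com actually does it: the purchase histories are invented, Jaccard similarity is one assumed choice of "profile similarity", and the names `jaccard` and `recommend` are hypothetical.

```python
# Hypothetical purchase histories, invented for illustration.
PURCHASES = {
    "alice": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b", "book_d"},
    "carol": {"book_e"},
    "dave": {"book_b", "book_c", "book_f"},
}


def jaccard(a, b):
    """Similarity of two item sets: size of overlap divided by size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, purchases, k=2):
    """Recommend items bought by the k most similar users but not by `user`."""
    own = purchases[user]
    neighbours = sorted(
        ((jaccard(own, items), other)
         for other, items in purchases.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for similarity, other in neighbours:
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for item in purchases[other] - own:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("alice", PURCHASES))
print(recommend("carol", PURCHASES))
```

A user such as "carol", who shares no purchases with anyone, gets no recommendations, which is the usual cold-start weakness of this approach.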
References
• [Dun2002] Margaret H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, Upper Saddle River, NJ, USA, 2002, pp. 195-220.
• [Zai1999] Osmar R. Zaïane, Resource and Knowledge Discovery from the Internet and Multimedia Repositories, PhD Thesis, Simon Fraser University, Canada, March 1999.