Introduction to Web Mining Spring 2013
What is data mining? • Data mining is the extraction of useful patterns from data sources, e.g., databases, texts, web, images, etc. • Patterns must be valid, novel, potentially useful, and understandable.
Classic data mining tasks • Classification: mining patterns that can classify future (new) data into known classes. • Association rule mining: mining any rule of the form X → Y, where X and Y are sets of data items. • Clustering: identifying a set of similarity groups in the data.
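To make the rule form X → Y concrete, here is a minimal sketch that scores one candidate rule over a handful of transactions. The slide does not define how rules are scored; support and confidence are the measures usually used, and they are an assumption here. The basket data and the function names `support` and `confidence` are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate of P(Y in transaction | X in transaction) for the rule X -> Y."""
    x, y = frozenset(x), frozenset(y)
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return support(transactions, x | y) / support_x


# Hypothetical market-basket data (not from the slides).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule {bread} -> {milk}
rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here two of the four baskets contain both items, and two of the three baskets with bread also contain milk, so the rule has support 0.5 and confidence 2/3.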
Classic data mining tasks (contd) • Sequential pattern mining: a sequential rule A → B says that event A will be immediately followed by event B with a certain confidence. • Deviation detection: discovering the most significant changes in data. • Data visualization CS583, Bing Liu, UIC
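The confidence of a sequential rule A → B can be estimated by counting, as in the sketch below. It assumes confidence means "occurrences of A immediately followed by B, divided by all occurrences of A"; the slide does not give a formula, so that definition, the function name `sequential_confidence`, and the click sessions are all illustrative.

```python
def sequential_confidence(sequences, a, b):
    """Share of occurrences of event a that are immediately followed by event b."""
    a_count = 0
    ab_count = 0
    for seq in sequences:
        for i, event in enumerate(seq):
            if event == a:
                a_count += 1
                # Only the very next event counts as "immediately followed by".
                if i + 1 < len(seq) and seq[i + 1] == b:
                    ab_count += 1
    return ab_count / a_count if a_count else 0.0


# Hypothetical page-visit sessions (not from the slides).
sessions = [
    ["home", "search", "product"],
    ["home", "product"],
    ["search", "product", "cart"],
    ["home", "search"],
]

conf = sequential_confidence(sessions, "search", "product")
print(conf)
```

In this data "search" occurs three times and is immediately followed by "product" twice, so the rule search → product has confidence 2/3.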
Why is data mining important? • Huge amount of data • How to make best use of data? • Knowledge discovered from data can be used for competitive advantage. • Many interesting things that one wants to find cannot be found using database queries, e.g., “find people likely to buy my products”
WWW • The Web is an Internet-based system that lets users of one computer access information stored on another. • Client-server model, hypertext documents • Invented in 1989 by Tim Berners-Lee at CERN, with HTTP and HTML • Browsers: Mosaic (1993), Netscape (1994), Internet Explorer (1995) • Related to the Internet (ARPANET, TCP/IP)
Web mining • Traditional data mining: data is structured and relational, with well-defined tables, columns, rows, keys, and constraints. • Web data: readily available data rich in features and patterns; content, link, and usage data.
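Unlike a relational table, a web page has to be parsed before its content and link data can be mined. The sketch below separates the two using Python's standard `html.parser` module. The sample page and the `PageParser` class are hypothetical, and usage data (server logs) is not covered.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects link targets (link data) and visible text (content data)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


# Hypothetical page, made up for illustration.
page = (
    "<html><body><h1>Web mining</h1>"
    '<p>See <a href="/crawling">crawling</a> and '
    '<a href="/usage">usage mining</a>.</p></body></html>'
)

parser = PageParser()
parser.feed(page)
parser.close()

links = parser.links
text = " ".join(parser.text_parts)
print(links)
print(text)
```

The extracted links could feed a link-analysis step and the text a content-mining step. A real crawler would also need to fetch pages and resolve relative URLs.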
Topic Description • Introduction to basic data mining: association and sequential mining, classification, clustering • Crawling, Web search and information retrieval • Social network analysis • Structured data extraction, information integration • Opinion mining and sentiment analysis • Web usage mining
Related fields • Web mining is a multidisciplinary field: • Machine learning • Statistics • Databases • Information retrieval • Visualization • Natural language processing • etc.