WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan
OUTLINE • Introduction • Data mining Vs Web mining • Web mining subtasks • Challenges • Taxonomy • Web content mining • Web structure mining • Web usage mining • Applications
INTRODUCTION Nowadays, it has become necessary for users to utilise automated tools to find, extract, filter & evaluate desired information & resources. The target of search engines is only to discover the resources on the web.
INTRODUCTION • Need for Web Mining • Narrow search scope of search engines • Low precision of search results
INTRODUCTION • Other Approaches • Database approach (DB) • Information retrieval • Natural language processing (NLP) • Web document community
WEB MINING DEFINITION Web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from Web data.
DATA MINING Vs WEB MINING • Data mining: extraction of useful patterns from data sources such as databases, texts, the web, images, etc. • Web mining: extraction of relevant information hidden in Web-related data, such as hypertext documents on the web
WEB MINING SUBTASKS • Resource finding • Information selection & preprocessing • Generalization • Analysis
CHALLENGES • Searching for relevant information on the web • Creating knowledge • Personalization of information • Learning patterns • Uniformity & standardisation
CHALLENGES • Redundant Information • Noisy web • Monitoring changes • Sites providing Services • Privacy
TAXONOMY • Web Mining divides into three branches • Web Content Mining: Web Text Mining, Web Multimedia Mining • Web Structure Mining: Link Mining, Internal Structure Mining, URL Mining • Web Usage Mining: General Access Pattern Tracking, Personalized Usage Tracking
WEB CONTENT MINING • Discovers useful information by analysing the content of web documents • Automatic process that goes beyond keyword extraction • Approaches to restructure document content • Two groups of mining strategies: agent-based approach and database approach
WEB CONTENT MINING • Agent based Approach • Intelligent search agents • Information filtering/categorization • Personalized web agents
WEB CONTENT MINING • Database Approach • Multilevel databases • Web query system
WEB STRUCTURE MINING • Discovering structure information from web • Web graph : web pages as nodes & hyperlinks as edges
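The web-graph view above can be sketched in a few lines of Python. This is only an illustrative sketch: the class `LinkCollector`, the function `build_web_graph`, and the three-page `pages` dictionary are hypothetical names and data, not part of the original slides. It treats each page as a node and each `<a href>` that points to another known page as a directed edge.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, href))


def build_web_graph(pages):
    """Map each page URL (node) to the sorted list of known pages it links to (edges)."""
    graph = {}
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        # Keep only edges that stay inside the given page set and drop self-links.
        graph[url] = sorted({t for t in collector.links if t in pages and t != url})
    return graph


# Hypothetical three-page site, used only for illustration.
pages = {
    "http://example.org/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.org/b": '<a href="/c">C</a>',
    "http://example.org/c": '<a href="/a">A</a> <a href="http://other.org/">out</a>',
}
graph = build_web_graph(pages)
```

The adjacency-list form (a dictionary from node to list of successors) is chosen here because link-analysis algorithms such as PageRank and HITS only need to iterate over each page's out-links.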
WEB STRUCTURE MINING • Two algorithms for handling links • PageRank • HITS
WEB STRUCTURE MINING • PageRank • Metric for ranking hypertext documents • A page's rank depends on the ranks of the pages pointing to it • Iterative process
WEB STRUCTURE MINING • PageRank notation • n : number of nodes in the graph • Outdegree(q) : number of hyperlinks on page q • d : damping factor
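The formula that went with this notation did not survive in the transcript, so the sketch below assumes one common convention, PR(p) = (1 - d)/n + d * sum of PR(q)/Outdegree(q) over all pages q that link to p. Some texts swap the roles of d and 1 - d, and the slide may have used that form instead. The function name `pagerank`, the default d = 0.85, the handling of pages without out-links, and the three-node graph are all assumptions made for illustration.

```python
def pagerank(graph, d=0.85, tol=1e-10, max_iter=200):
    """Iterate the assumed PageRank update until the ranks stop changing.

    graph maps every page to the list of pages it links to; every link
    target must itself be a key of graph, and lists hold no duplicates.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly (an assumption).
        dangling = sum(rank[q] for q in nodes if not graph[q])
        new_rank = {p: (1.0 - d) / n + d * dangling / n for p in nodes}
        for q in nodes:
            for p in graph[q]:
                # q passes an equal share of its rank along each of its links.
                new_rank[p] += d * rank[q] / len(graph[q])
        change = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links back to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

In this small graph C collects links from both A and B, so it ends up with the highest rank, and A, which receives all of C's rank, comes second.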
WEB STRUCTURE MINING • HITS • Iterative algorithm • Identify topic hubs & authorities • Input : search results returned by traditional text indexing technique
WEB STRUCTURE MINING • Assigns a weight to each hub based on the authoritativeness of the pages it links to • Outputs pages with largest hub & authority weights
WEB USAGE MINING • Extracting information from server logs • Discovering user access patterns of Web pages • Decomposed into 3 subtasks: preprocessing, pattern discovery (mining algorithms), pattern analysis • Process flow: raw logs + site files → Preprocessing → user session file → Mining algorithms → rules, patterns & statistics → Pattern Analysis → interesting rules, patterns & statistics
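The hub and authority iteration described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `hits`, the fixed iteration count, the Euclidean normalisation, and the small link graph (standing in for the pages returned by a text search) are illustrative choices, not details taken from the slides.

```python
import math


def hits(graph, iterations=50):
    """Return (hub, authority) weights for a graph given as page -> out-links."""
    nodes = list(graph)
    hub = {p: 1.0 for p in nodes}
    auth = {p: 1.0 for p in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub weights of the pages linking to p.
        auth = {p: 0.0 for p in nodes}
        for q in nodes:
            for p in graph[q]:
                auth[p] += hub[q]
        # Hub update: sum the authority weights of the pages q links to.
        hub = {q: sum(auth[p] for p in graph[q]) for q in nodes}
        # Normalise both vectors so the weights stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical result set: three hub-like pages pointing at two candidate authorities.
graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"], "a1": [], "a2": []}
hub, auth = hits(graph)
best_authority = max(auth, key=auth.get)
```

Here `a1` is linked from all three hubs and so becomes the top authority, while `h1` and `h2`, which point to both authorities, receive larger hub weights than `h3`.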
WEB USAGE MINING • Preprocessing • Data cleaning • User identification • User sessions identification • Access path supplement • Transaction identification
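Three of the preprocessing steps above (data cleaning, user identification, and user session identification) can be sketched as follows. Everything specific here is an assumption for illustration: log records are taken to be already parsed into (IP address, timestamp, URL) tuples, users are identified by IP address alone, requests for images, stylesheets and scripts count as noise, and a 30-minute inactivity gap closes a session. Real logs usually need extra heuristics such as user agents or cookies.

```python
from datetime import datetime, timedelta

# Assumed noise: embedded resources that do not reflect a deliberate page view.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")
# Assumed inactivity threshold that separates two sessions of one user.
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(records):
    """Data cleaning: drop requests for embedded resources."""
    return [r for r in records if not r[2].lower().endswith(IGNORED_SUFFIXES)]


def sessionize(records):
    """Group requests into (user, [urls]) sessions, in order of session start."""
    sessions = []
    last_seen = {}  # user -> (time of last request, index of the open session)
    for user, ts, url in sorted(records, key=lambda r: r[1]):
        previous = last_seen.get(user)
        if previous is None or ts - previous[0] > SESSION_TIMEOUT:
            # First request of this user, or the gap was too long: new session.
            sessions.append((user, [url]))
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(url)
        last_seen[user] = (ts, index)
    return sessions


# Hypothetical raw log entries, already parsed into tuples.
t0 = datetime(2010, 1, 1, 10, 0, 0)
records = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(seconds=1), "/logo.gif"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/products.html"),
    ("10.0.0.2", t0 + timedelta(minutes=6), "/index.html"),
    ("10.0.0.1", t0 + timedelta(hours=2), "/index.html"),
]
sessions = sessionize(clean(records))
```

The resulting list plays the role of the user session file that the mining algorithms consume in the next stage.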
WEB USAGE MINING • Pattern discovery • Statistical Analysis • Association Rules • Clustering analysis
WEB USAGE MINING • Pattern discovery (continued) • Classification analysis • Sequential patterns • Dependency modeling
WEB USAGE MINING • Pattern Analysis • Eliminates irrelevant rules or patterns • Extracts interesting patterns
APPLICATIONS • Personalized services • Improving website design • System improvement • Predicting trends • Carrying out intelligent business
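As one example of the pattern discovery techniques listed above, association rules over user sessions can be sketched as follows. This is a simplified illustration rather than a full algorithm such as Apriori: it only considers rules between pairs of pages, and the function name `association_rules`, the thresholds, and the four sessions are hypothetical. Support is the fraction of sessions containing both pages, and confidence is that support divided by the support of the rule's left-hand page.

```python
from itertools import permutations


def association_rules(sessions, min_support=0.5, min_confidence=0.7):
    """Return (x, y, support, confidence) for page-pair rules "x implies y"."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]
    pages = sorted(set().union(*page_sets))

    def support(items):
        # Fraction of sessions that contain every page in items.
        return sum(1 for s in page_sets if items <= s) / n

    rules = []
    for x, y in permutations(pages, 2):
        both = support({x, y})
        if both < min_support:
            continue
        confidence = both / support({x})
        if confidence >= min_confidence:
            rules.append((x, y, both, confidence))
    return rules


# Hypothetical sessions, for example the output of a preprocessing step.
sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products"],
    ["/index", "/about"],
    ["/products", "/cart"],
]
rules = association_rules(sessions)
```

With these thresholds the only rule that survives is that sessions visiting `/cart` also visit `/products`; filtering by support and confidence is a simple form of the pattern analysis step, which discards rules of little interest.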
PROS • High trade volumes • Classify threats & fight against Terrorism • Establish better customer relationship • Increase profitability
CONS • Invasion of Privacy • Discrimination by controversial attributes
CONCLUSION • Rapidly growing area • Promising area of future research
WEB MINING Thank You