
DSCI 5240 Graduate Presentation Xxxxxx

Research paper: "Web Mining Research: A Survey," SIGKDD Explorations, Volume 2, Issue 1, June 2000. Authors: R. Kosala and H. Blockeel. Outline: Introduction, Web Mining, Web Content Mining, Web Structure Mining, Web Usage Mining, Conclusion.


Presentation Transcript


  1. DSCI 5240 Graduate Presentation Xxxxxx • Research paper: "Web Mining Research: A Survey," SIGKDD Explorations, Volume 2, Issue 1, June 2000 • Authors: R. Kosala and H. Blockeel

  2. Outline • Introduction • Web Mining • Web Content Mining • Web Structure Mining • Web Usage Mining • Conclusion

  3. Introduction • The World Wide Web is a popular and interactive medium for disseminating information • Users of this information may encounter four problems: 1. Finding relevant information: a. low precision (many retrieved pages are irrelevant) b. low recall (many relevant pages are not retrieved) 2. Creating new knowledge out of the information available on the web: a data-triggered process 3. Personalizing the information: people differ in the content and presentation of information they prefer 4. Learning about consumers or individual users: mass customization or even personalization
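Low precision and low recall are the two standard ways a search can fail. A minimal sketch of both measures for a single query, using hypothetical document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: IDs of the documents a search system returned
    relevant:  IDs of the documents that actually satisfy the information need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0  # share of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0       # share of relevant docs that were found
    return precision, recall


# Hypothetical query: 4 pages returned, 2 of them relevant, 5 relevant pages exist in total.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "b", "e", "f", "g"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Returning more pages tends to raise recall at the cost of precision, which is why the two are usually reported together.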

  4. Web Mining • Definition: web mining refers to the overall process of discovering potentially useful and previously unknown information or knowledge from web data • Four subtasks: • Resource finding: retrieving the intended web documents • Information selection and pre-processing: selecting and pre-processing specific information from the retrieved resources • Generalization: discovering general patterns, both at individual web sites and across multiple sites • Analysis: validating and/or interpreting the mined patterns
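The four subtasks can be read as a pipeline. The toy sketch below runs them over a hypothetical in-memory set of pages; each stage is deliberately simplistic (keyword filtering, tag stripping, document-frequency counting, stopword removal) and only stands in for the much richer techniques the survey covers:

```python
import re
from collections import Counter

# Hypothetical in-memory "web": URL -> raw HTML. A real system would crawl or query a search engine.
WEB = {
    "http://example.com/a": "<html><body>Web mining finds patterns in web data</body></html>",
    "http://example.com/b": "<html><body>Usage mining analyses web server logs</body></html>",
    "http://example.com/c": "<html><body>Recipes for banana bread</body></html>",
}

def find_resources(web, keyword):
    """Resource finding: retrieve the documents that mention the keyword."""
    return {url: html for url, html in web.items() if keyword in html.lower()}

def select_and_preprocess(docs):
    """Information selection and pre-processing: strip tags, lowercase, tokenize."""
    return {url: re.findall(r"[a-z]+", re.sub(r"<[^>]+>", " ", html).lower())
            for url, html in docs.items()}

def generalize(token_lists, min_docs=2):
    """Generalization: a trivially simple 'pattern' = a term that appears in >= min_docs documents."""
    doc_freq = Counter(term for tokens in token_lists.values() for term in set(tokens))
    return {term for term, df in doc_freq.items() if df >= min_docs}

def analyze(patterns, stopwords=frozenset({"the", "in", "for"})):
    """Analysis: validate the patterns, here just by discarding uninformative stopwords."""
    return sorted(p for p in patterns if p not in stopwords)

docs = find_resources(WEB, "mining")
patterns = analyze(generalize(select_and_preprocess(docs)))
print(patterns)  # ['mining', 'web']
```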

  5. Web Mining • Web Mining and Information Retrieval: IR is the automatic retrieval of all relevant documents while retrieving as few non-relevant documents as possible; its primary goals are indexing text and searching for useful documents • Web Mining and Information Extraction: IE aims to transform a collection of documents into information that is more readily digested and analyzed • Comparing IR and IE: a. aims: IR selects relevant documents, whereas IE extracts relevant facts from documents b. fields
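The difference in aims can be seen on a toy collection: IR answers "which documents are relevant?" while IE answers "which facts do these documents contain?". A minimal sketch, assuming a hypothetical three-document collection, an inverted index with Boolean AND queries for the IR side, and a regular expression for dollar amounts as a stand-in for the IE side:

```python
import re
from collections import defaultdict

# Hypothetical toy collection.
DOCS = {
    1: "The laptop costs $899 and ships in two days.",
    2: "Our phone costs $499.",
    3: "Store hours are 9 to 5.",
}

def build_index(docs):
    """IR side: inverted index mapping each term to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index

def boolean_and(index, query):
    """Return the IDs of documents containing every query term (Boolean AND retrieval)."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

def extract_prices(text):
    """IE side: pull structured facts (dollar amounts) out of a single document."""
    return [int(m) for m in re.findall(r"\$(\d+)", text)]

index = build_index(DOCS)
hits = boolean_and(index, "costs")                          # IR: which documents?
facts = {d: extract_prices(DOCS[d]) for d in sorted(hits)}  # IE: which facts?
print(sorted(hits), facts)  # [1, 2] {1: [899], 2: [499]}
```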

  6. Web Mining • Web Mining and the Agent Paradigm: web mining is often viewed from, or implemented within, an agent paradigm • Agent types: • User interface agents • Distributed agents • Mobile agents • Two approaches used to develop intelligent agents: • Content-based approach • Collaborative approach
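The two approaches differ in what they compare: a content-based agent compares an item's features with what the user liked before, while a collaborative agent compares the user with other users. A minimal sketch on hypothetical visit and tag data; cosine similarity and Jaccard overlap are illustrative choices, not the methods of any specific agent in the survey:

```python
import math

# Hypothetical data: which pages each user visited (1) or not (0), and keyword tags per page.
VISITS = {
    "ann": {"p1": 1, "p2": 1, "p3": 0},
    "bob": {"p1": 1, "p2": 1, "p3": 1},
    "cat": {"p1": 0, "p2": 0, "p3": 1},
}
TAGS = {"p1": {"python", "mining"}, "p2": {"mining", "logs"}, "p3": {"cooking"}}

def cosine(u, v):
    """Cosine similarity between two users' visit vectors over the same pages."""
    dot = sum(u[p] * v[p] for p in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def collaborative_score(user, page):
    """Collaborative: weight other users' visits to `page` by their similarity to `user`."""
    return sum(cosine(VISITS[user], VISITS[o]) * VISITS[o][page] for o in VISITS if o != user)

def content_score(user, page):
    """Content-based: Jaccard overlap between the page's tags and tags of pages the user visited."""
    liked = set().union(*(TAGS[p] for p, seen in VISITS[user].items() if seen))
    return len(TAGS[page] & liked) / len(TAGS[page] | liked) if liked else 0.0

print(round(collaborative_score("ann", "p3"), 3), content_score("ann", "p3"))  # 0.816 0.0
```

The collaborative score for ann on p3 is positive because bob, whose visits resemble ann's, visited p3; the content-based score is zero because p3 shares no tags with the pages ann visited.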

  7. Web Content Mining • Definition: discovering useful information from web page contents, data, and documents • Several types of data: text, images, audio, video, hyperlinks • Types of data structure: 1. Unstructured: free text 2. Semi-structured: HTML 3. More structured: data in tables or database-generated HTML pages

  8. Web Content Mining • IR view: • Unstructured documents: • Bag-of-words representation • Features: Boolean or frequency-based • Variations of feature selection • Features can be reduced with different feature selection techniques • Semi-structured documents: • Use richer representations for features • Use common data mining methods
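The bag-of-words representation and the two feature types can be sketched as follows. The three sentences are hypothetical documents, and document-frequency thresholding stands in for the many feature selection techniques the slide alludes to:

```python
from collections import Counter

# Hypothetical documents.
docs = [
    "web mining mines web data",
    "usage mining mines server logs",
    "structure mining studies links",
]

def tokenize(text):
    return text.lower().split()

# Vocabulary over the whole collection, in a fixed (sorted) order; word order in a document is ignored.
vocab = sorted({t for d in docs for t in tokenize(d)})

def frequency_vector(text):
    """Frequency-based features: how often each vocabulary term occurs."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in vocab]

def boolean_vector(text):
    """Boolean features: 1 if the term occurs at all, else 0."""
    return [1 if c else 0 for c in frequency_vector(text)]

def select_by_document_frequency(min_df):
    """A simple feature-selection step: keep terms that occur in at least `min_df` documents."""
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return [t for t in vocab if df[t] >= min_df]

print(frequency_vector(docs[0]))        # [1, 0, 0, 1, 1, 0, 0, 0, 0, 2]
print(boolean_vector(docs[0]))          # [1, 0, 0, 1, 1, 0, 0, 0, 0, 1]
print(select_by_document_frequency(2))  # ['mines', 'mining']
```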

  9. Web Content Mining • DB view: tries to infer the structure of a web site or to transform a web site into a database • Methods: • Finding the schema of web documents • Building a web warehouse • Building a web knowledge base • Building a virtual database
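One way to picture the DB view is to take a database-generated HTML page, treat its header row as the inferred schema, and load the remaining rows into a relational table. A minimal sketch using Python's standard html.parser and sqlite3 modules; the page contents, the table name items, and the "header row = schema" rule are assumptions for illustration, and real schema discovery on web data is considerably harder:

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical database-generated HTML page.
PAGE = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>laptop</td><td>899</td></tr>
  <tr><td>phone</td><td>499</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def page_to_table(html, conn, name="items"):
    """Use the header row as the inferred schema and load the remaining rows (stored as text)."""
    parser = TableParser()
    parser.feed(html)
    header, *records = parser.rows
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{name}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', records)
    return header

conn = sqlite3.connect(":memory:")
page_to_table(PAGE, conn)
print(conn.execute("SELECT product FROM items WHERE price = ?", ("499",)).fetchall())  # [('phone',)]
```

No type inference is attempted, so the price column holds text; a fuller schema-finding step would also infer column types.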

  10. Web Structure Mining • Focuses on the structure of the hyperlinks within the web • Inspired by the study of social networks and citation analysis • Discovers specific types of pages based on their incoming and outgoing links • Applications: • Discovering micro-communities in the web • Measuring the completeness of a web site
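A well-known technique for classifying pages by their incoming and outgoing links is Kleinberg's HITS algorithm, which scores each page as a hub (it links to good authorities) and as an authority (it is linked to by good hubs). The sketch below is a plain power-iteration version on a hypothetical link graph; it illustrates the idea and is not presented as the specific method of the surveyed paper:

```python
import math

def hits(links, iterations=50):
    """Kleinberg's HITS via power iteration.

    links: dict mapping each page to the list of pages it links to (its outgoing links).
    Returns (hub, authority) score dicts, each normalized to unit Euclidean length.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link TO p (incoming links).
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of pages that p links to (outgoing links).
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: two directory-style pages pointing at content pages.
LINKS = {"dir1": ["a", "b", "c"], "dir2": ["a", "b"], "a": ["b"]}
hub, auth = hits(LINKS)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # b dir1
```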

  11. Web Usage Mining • Tries to predict user behavior from users' interactions with the web • Uses a wide range of data (e.g., web server access logs, proxy server logs, browser logs, cookies, user profiles) • Two commonly used approaches: • Map the usage data of the web server into relational tables before applying an adapted data mining technique • Use the log data directly, with special pre-processing techniques • Problems: • Distinguishing among unique users, server sessions, and episodes in the presence of caching and proxy servers • Usage mining often requires some background or domain knowledge • Applications
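Using log data directly usually starts with parsing server logs and splitting requests into sessions, which is where the user and session identification problem shows up. A minimal sketch, assuming Common Log Format input, one user per IP address, and a 30-minute inactivity timeout (a common heuristic, not a value taken from the slides); the log lines are hypothetical:

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')

def parse(line):
    """Turn one access-log line into (host, timestamp, url), or None if it does not match."""
    m = CLF.match(line)
    if not m:
        return None
    host, ts, _method, url, _status = m.groups()
    return host, datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"), url

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group requests into sessions per host, starting a new session after `timeout` of inactivity.

    Assumption: one host = one user. Proxies and caches break this assumption,
    which is the identification problem usage mining has to deal with.
    """
    sessions, last_seen = {}, {}
    for host, ts, url in sorted(filter(None, map(parse, lines)), key=lambda r: r[1]):
        if host not in last_seen or ts - last_seen[host] > timeout:
            sessions.setdefault(host, []).append([])  # open a new session
        sessions[host][-1].append(url)
        last_seen[host] = ts
    return sessions

LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:58:02 +0000] "GET /papers.html HTTP/1.0" 200 1045',
    '10.0.0.1 - - [10/Oct/2000:15:01:10 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:14:00:00 +0000] "GET /contact.html HTTP/1.0" 404 0',
]
print(sessionize(LOG))
# {'10.0.0.1': [['/index.html', '/papers.html'], ['/index.html']], '10.0.0.2': [['/contact.html']]}
```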

  12. Conclusions • A survey of research in the area of web mining • Three web mining categories: content mining, structure mining, and usage mining • The connection between the web mining categories and the related agent paradigm
