Projects



  1. Projects CS698V: Data Mining

  2. Areas
     • Web Mining
     • Bioinformatics
     • Multimedia Mining
     • Streaming Data Mining
     • General Data Mining Methodologies

  3. Project 1: Mining the Web Graph
     • Web as a graph: page is node, link is edge
     • Task 1: Find the subgraph of the entire web consisting of the .ac.in domain
     • Task 1.1: Write crawlers to walk through the graph and remember the paths crawled, eventually building the full graph structure
     • Task 1.2: Suggest efficient structures for representing the (sparse) graph
     • Generate statistics about the graph: approximate number of nodes (about 857,000 according to Google), edges, and leaves
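A minimal sketch of Task 1.2 and the statistics step, assuming the crawler emits (source, target) link pairs. The adjacency-set layout, the function names, and the placeholder URLs are illustrative assumptions, not part of the project specification; a leaf is taken here to mean a page with no out-links.

```python
def build_graph(links):
    """Build an adjacency-set graph from (source_url, target_url) pairs."""
    graph = {}
    for src, dst in links:
        graph.setdefault(src, set()).add(dst)
        # Register the target too, so pages without out-links are still counted.
        graph.setdefault(dst, set())
    return graph


def graph_stats(graph):
    """Count nodes, edges, and leaves (pages with no out-links)."""
    return {
        "nodes": len(graph),
        "edges": sum(len(targets) for targets in graph.values()),
        "leaves": sum(1 for targets in graph.values() if not targets),
    }


# Hypothetical crawl output; these URLs are placeholders, not real crawl data.
links = [
    ("a.ac.in", "b.ac.in"),
    ("a.ac.in", "c.ac.in"),
    ("b.ac.in", "c.ac.in"),
    ("a.ac.in", "b.ac.in"),  # duplicate link, stored only once
]
stats = graph_stats(build_graph(links))
print(stats)
```

Storing only the links that exist keeps memory proportional to the number of edges, which is the point of a sparse representation; compressed formats would be the next step for a graph of this size.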

  4. Task 2: Mine the .ac.in web graph
     • Cluster the nodes based on link structure
     • Identify hubs and authorities
     • Report interesting patterns in the cluster structure
     Benefits:
     • Statistics for this domain are not currently available and would be useful for optimization
     • Study of the evolution pattern (infancy, maturity, saturation)
     • Identify the likely “hidden web” in this domain
     • A search engine for the Indian academic and research network
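Hub and authority scores are commonly computed with Kleinberg's HITS iteration. The following is a rough sketch under the assumption that the graph is a dict mapping each page to the set of pages it links to; the function name, the iteration count, and the toy graph are illustrative choices.

```python
import math


def hits(graph, iterations=50):
    """Return (hubs, authorities) score dicts via HITS-style power iteration."""
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for dst in targets:
                auths[dst] += hubs[src]
        # Hub score: sum of the authority scores of pages linked to.
        hubs = {n: sum(auths[dst] for dst in graph.get(n, ())) for n in nodes}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auths, hubs):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hubs, auths


# Toy graph (hypothetical): three pages point at "a", one of them also at "b".
toy = {"p1": {"a"}, "p2": {"a", "b"}, "p3": {"a"}}
hubs, auths = hits(toy)
print(max(auths, key=auths.get), max(hubs, key=hubs.get))
```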

  5. Further Reading
     • Publications of the Stanford WebBase Group
     • UbiCrawler, WebGraph, University of Milano, Italy
     • The Chilean Web

  6. Project 2: Metasearch
     • Combines the search results of several search engines
     • The combination strategy is open to research
     • Each search engine returns a set of pages ranked according to their relevance to the query
     • How to get a combined ranking (cranking)?
     • Tasks: Carry out a comparative study of different personalized and adaptive combination schemes, and propose a new scheme.
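As a baseline for the combination step, a Borda count is one classic positional scheme. The sketch below is an assumption about how such a baseline might look, not the scheme the project prescribes; the function name and the example result lists are hypothetical.

```python
from collections import defaultdict


def borda_merge(rankings):
    """Merge ranked lists with a Borda count: higher positions earn more points."""
    universe = {page for ranking in rankings for page in ranking}
    n = len(universe)
    scores = defaultdict(float)
    for ranking in rankings:
        for position, page in enumerate(ranking):
            scores[page] += n - position
        # Pages an engine did not return receive no points from that engine.
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(universe, key=lambda page: (-scores[page], page))


# Hypothetical top-3 lists from three engines.
engine_results = [
    ["x", "y", "z"],
    ["y", "x", "w"],
    ["y", "z", "x"],
]
merged = borda_merge(engine_results)
print(merged)
```

Personalized or adaptive schemes could replace the fixed positional points with learned, per-engine weights.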

  7. Further Reading
     • Cranking using conditional probabilistic models, Lebanon, Lafferty, ICML 2002
     • Rank Aggregation Methods for the Web, Dwork, Ravi Kumar et al., 2001
     • Learning to Order Things, Cohen, Schapire, Singer
     • Comparing top k lists, Fagin, Ravi Kumar, 2003

  8. Project 3: Intelligent Web Search Agents
     • WebMate: a web search agent/assistant that uses a proxy to record the user's browsing patterns and recommends sites for future visits
     Task:
     • Study different agent architectures
     • Use association rules and other data mining techniques to design more intelligent web agents
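To illustrate how association rules could drive recommendations, the sketch below mines one-to-one rules "visited A implies visited B" from recorded sessions using support and confidence. The session data, thresholds, and function name are illustrative assumptions, and this brute-force pair scan stands in for a proper algorithm such as Apriori.

```python
from itertools import permutations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return rules (a, b, support, confidence) meaning 'a implies b'."""
    sets = [set(s) for s in sessions]
    n = len(sets)
    items = sorted(set().union(*sets))
    rules = []
    for a, b in permutations(items, 2):
        has_a = sum(1 for s in sets if a in s)
        both = sum(1 for s in sets if a in s and b in s)
        support = both / n                 # fraction of sessions with a and b
        confidence = both / has_a if has_a else 0.0  # estimate of P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical browsing sessions recorded by a proxy.
sessions = [
    {"news", "mail"},
    {"news", "mail", "wiki"},
    {"news", "wiki"},
    {"mail"},
]
rules = pair_rules(sessions)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```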

  9. Further Reading
     • WebMate (CMU)
     • Calvin (U. Leipzig)

  10. Project 4: Hypertext/Text Categorization using Support Vector Machines
     • Task: Study different SVM kernels for hypertext/text categorization on large collections
     • Task: Propose a new kernel that incorporates link information
     Further Reading:
     • Composite kernels for hypertext categorization, Joachims, Cristianini, ICML 2001
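One simple way to fold link information into a kernel is a weighted sum of a text kernel and a link kernel, since a convex combination of valid kernels is again a valid kernel. The sketch below assumes each document is a dict with hypothetical "words" and "links" sparse vectors and uses cosine similarity for both parts; the weight and the feature choice are illustrative, not taken from the cited paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def composite_kernel(doc_a, doc_b, weight=0.5):
    """Convex combination of a text kernel and a link kernel."""
    text_part = cosine(doc_a["words"], doc_b["words"])
    link_part = cosine(doc_a["links"], doc_b["links"])
    return weight * text_part + (1.0 - weight) * link_part


# Hypothetical documents: term counts plus the pages each one links to.
doc_a = {"words": {"svm": 2, "kernel": 1}, "links": {"p1": 1, "p2": 1}}
doc_b = {"words": {"svm": 1, "text": 1}, "links": {"p2": 1, "p3": 1}}
print(composite_kernel(doc_a, doc_b))
```

The resulting values can be assembled into a Gram matrix and passed to any SVM implementation that accepts precomputed kernels.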

  11. Project 5: Mining Microarray Data
     • Critical Assessment of Microarray Data Analysis (CAMDA)
     • Tasks:
     • Identifying genes responsible for a disease
     • Gene clustering/association mining
     • Gene regulatory networks
     Further Reading:
     • Papers on the CAMDA contest data site

  12. Project 6: Mining Association Rules from Image Database
     • Perceptual Association Rules
     • Tesic, Newsam, Manjunath, SIAM data mining conf. 2003
     • Image database: NASA Mars images, Corel image database
     • Task: Study different forms of generalized association rules that can be mined from images
     • Task: Innovative use of the rules in retrieval and event (e.g., cyclone) detection

  13. Project 7: Privacy Preserving Data Mining
     • Watermarking relational data – Agrawal, 2003
     • Privacy preserving data mining – Agrawal, 2000
     • Task: Study different frameworks for preserving privacy; propose new watermarking techniques

  14. Project 8: Mining for Alarming Incidents in Data Streams
     • Task: Study existing outlier detection algorithms
     • Task: Use algorithms for clustering data streams to detect outliers that are alarming incidents
     Further Reading:
     • MAIDS project, J. Han, UIUC
     • Clustering data streams, Mishra, Guha, Motwani, FOCS 2000
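A rough sketch of the second task, using a simple single-pass, leader-style clustering rather than the algorithm from the cited FOCS paper: a point that fits no existing cluster after a warm-up period is flagged as an alarm. The class name, the radius, the warm-up length, and the sample stream are all illustrative assumptions.

```python
import math


class StreamOutlierDetector:
    """Single-pass clustering; points that open a new cluster raise an alarm."""

    def __init__(self, radius, warmup=3):
        self.radius = radius
        self.warmup = warmup      # points seen before alarms are raised
        self.clusters = []        # each entry: {"center": [...], "count": int}
        self.seen = 0

    def observe(self, point):
        """Absorb one point and return True if it looks like an outlier."""
        self.seen += 1
        nearest, nearest_dist = None, math.inf
        for cluster in self.clusters:
            dist = math.sqrt(sum((c - x) ** 2
                                 for c, x in zip(cluster["center"], point)))
            if dist < nearest_dist:
                nearest, nearest_dist = cluster, dist
        if nearest is None or nearest_dist > self.radius:
            # No cluster is close enough: start a new one.
            self.clusters.append({"center": list(point), "count": 1})
            return self.seen > self.warmup
        # Otherwise update the running mean of the nearest cluster.
        nearest["count"] += 1
        count = nearest["count"]
        nearest["center"] = [c + (x - c) / count
                             for c, x in zip(nearest["center"], point)]
        return False


# Hypothetical sensor stream with one far-away reading.
stream = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.1), (9.0, 9.0), (1.05, 1.0)]
detector = StreamOutlierDetector(radius=1.0)
alarms = [point for point in stream if detector.observe(point)]
print(alarms)
```

Because each point is processed once and only cluster summaries are kept, memory stays independent of the stream length, which is the constraint that motivates stream clustering in the first place.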

  15. Project 9: Data Mining Standards
     • Task: A detailed report on different data mining standards, covering:
     • Models
     • Interfaces
     • Drawbacks
     • Scope for contribution
     Further Reading:
     • Microsoft OLE DB for DM
     • Oracle PMML
     • CRISP-DM

  16. Schedule
     • Projects and group formation to be finalized by Monday, Feb. 16, 2004
     • Project plan due by Feb. 23, 2004
     • Midterm status check around March 23, 2004
     • Final demonstration and documentation due by May 1, 2004
