
Will Data Mining Change the Functions of DBMS?

This presentation asks whether data mining will be integrated into essential DBMS functions such as indexing, data cleaning, data integration, and query processing. It discusses the potential benefits of incorporating data mining techniques into the DBMS and surveys applications including graph indexing, cleaning messy data, integrating multiple data relations, and refining query plans.



Presentation Transcript


  1. Will Data Mining Change the Functions of DBMS?
     Jiawei Han
     DAIS (Data And Information Systems) Lab
     University of Illinois at Urbana-Champaign

  2. Will DM Be Integrated with DB Functions?
     • DM is already a functional component of DBMS products:
        • Microsoft SQL Server: Analysis Manager
        • IBM DB2 & IntelligentMiner
        • Oracle: Data Mining Package
     • But will DM be “intruding” into the DBMS, i.e., be integrated with essential DBMS functions?
        • Indexing
        • Data integration
        • Data cleaning
        • Query processing

  3. Indexing by Data Mining
     • Indexing graphs? The number of subgraphs is exponential!
        • Chemical informatics/bioinformatics …
        • Discriminative frequent graph patterns (SIGMOD’04)
     • Indexing subsequences?
        • Shopping sequences, DNA/protein sequences (SDM’05)
     • When is discriminative frequent pattern indexing useful?
        • Complex objects, big (object) queries
     [Figure: sample database of graphs (a), (b), (c) and a query graph]
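To make the “discriminative frequent pattern” idea on the indexing slide concrete, here is a minimal sketch on strings (the subsequence case) rather than graphs. It assumes contiguous fragments as features, a fixed minimum support, and a hypothetical ratio threshold gamma: a frequent fragment is indexed only if the candidate set implied by its already-indexed sub-fragments is at least gamma times larger than its own support set. The names fragments, build_index, and query are illustrative, not taken from the cited papers. A query first intersects the posting lists of its indexed fragments, then verifies the surviving candidates.

```python
from collections import defaultdict


def fragments(seq, max_len):
    """All distinct contiguous fragments (substrings) of seq with length 1..max_len."""
    return {seq[i:i + k] for k in range(1, max_len + 1) for i in range(len(seq) - k + 1)}


def build_index(db, max_len=3, min_support=2, gamma=1.5):
    """Index frequent fragments that are discriminative w.r.t. already-indexed ones.

    gamma is an assumed threshold: a fragment is kept only if indexing it shrinks
    the candidate set implied by its indexed sub-fragments by a factor >= gamma.
    """
    support = defaultdict(set)  # fragment -> ids of sequences containing it
    for sid, seq in enumerate(db):
        for f in fragments(seq, max_len):
            support[f].add(sid)
    frequent = {f: ids for f, ids in support.items() if len(ids) >= min_support}

    index = {}
    for f in sorted(frequent, key=len):  # shorter fragments are decided first
        subs = [index[g] for g in fragments(f, len(f) - 1) if g in index]
        # Single characters have no sub-fragments and are always indexed.
        if not subs or len(set.intersection(*subs)) >= gamma * len(frequent[f]):
            index[f] = frequent[f]
    return index


def query(db, index, q, max_len=3):
    """Filter with the index, then verify candidates by real substring containment."""
    hits = [index[f] for f in fragments(q, max_len) if f in index]
    candidates = set.intersection(*hits) if hits else set(range(len(db)))
    return sorted(sid for sid in candidates if q in db[sid])
```

For example, with db = ["AB", "AB", "BA", "BA", "BXA", "AXB"], the fragments "AB" and "BA" are indexed because they prune far more than "A" and "B" alone, while a query for "AXB" gets candidates {4, 5} from the index and the verification step discards sequence 4.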

  4. Data Cleaning by Data Mining
     • Load messy data into a structured database?
        • Inconsistent data: age = “1946”?
        • Field misalignments
        • Data glitches: completely garbled inputs
        • Missing/mismatched delimiters: XML, HTML data
        • Big fields: BLOB, CLOB, multimedia and text
     • Data mining
        • Data cleaning by distribution/outlier analysis
        • Dependency/correlation analysis
        • Schema-directed cleaning or schema “discovery”
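The two mining techniques on the data-cleaning slide can be sketched in a few lines each. This is an illustrative sketch, not the author’s method: outlier_rows flags values that do not parse as numbers or whose modified z-score (0.6745 · |x − median| / MAD, with the commonly used cutoff 3.5) is extreme, which catches an age of “1946”; fd_violations treats a column pair as an approximate functional dependency and flags rows that disagree with the majority mapping. Function names, thresholds, and the sample rows are assumptions for illustration.

```python
import math
import statistics
from collections import Counter, defaultdict


def outlier_rows(values, threshold=3.5):
    """Indices of values that are not finite numbers or are robust outliers.

    Median/MAD is used instead of mean/stddev so the outlier being
    detected does not distort the statistics used to detect it.
    """
    flagged, parsed = [], {}
    for i, v in enumerate(values):
        try:
            x = float(v)
        except (TypeError, ValueError):
            flagged.append(i)  # glitch or misaligned field: not a number at all
            continue
        if math.isfinite(x):
            parsed[i] = x
        else:
            flagged.append(i)  # "nan"/"inf" parse as floats but are not valid data
    if not parsed:
        return sorted(flagged)
    med = statistics.median(parsed.values())
    mad = statistics.median(abs(x - med) for x in parsed.values())
    for i, x in parsed.items():
        if mad == 0:
            is_outlier = x != med  # nearly all values identical: any deviation is suspect
        else:
            is_outlier = 0.6745 * abs(x - med) / mad > threshold
        if is_outlier:
            flagged.append(i)
    return sorted(flagged)


def fd_violations(rows, lhs, rhs):
    """Indices of rows breaking the majority lhs -> rhs mapping (approximate FD)."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[lhs]][r[rhs]] += 1
    bad = []
    for i, r in enumerate(rows):
        majority, _ = groups[r[lhs]].most_common(1)[0]
        if r[rhs] != majority:
            bad.append(i)
    return bad
```

With ages = ["34", "29", "41", "1946", "38", "n/a", "27"], outlier_rows returns [3, 5]: the year typed into the age field and the unparseable entry. The hard part in practice is choosing which column pairs to test, which is where mined dependencies and correlations come in.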

  5. Data Integration by Data Mining
     • Linking and mining across multiple data relations
        • CrossMine (classification across multiple data relations, ICDE’04)
     • Search across heterogeneous databases
        • Object identification/merging, reference reconciliation (Alon’s group)
     • Mining across heterogeneous DBs
     • Personalizing data from heterogeneous sources
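Object identification and reference reconciliation, as listed on the integration slide, amount to deciding which records from two sources describe the same entity. The sketch below is a deliberately simple stand-in, not the technique of the group cited on the slide: it normalizes each record into lowercase alphanumeric tokens, scores pairs by Jaccard similarity, and greedily accepts the best-scoring one-to-one matches above an assumed threshold. The function names and sample company records are hypothetical.

```python
import re


def tokens(s):
    """Lowercase alphanumeric tokens; punctuation and word order are ignored."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def reconcile(left, right, threshold=0.5):
    """Greedy one-to-one matching of records from two sources by token Jaccard similarity."""
    pairs = []
    for i, l in enumerate(left):
        for j, r in enumerate(right):
            s = jaccard(tokens(l), tokens(r))
            if s >= threshold:
                pairs.append((s, i, j))
    pairs.sort(reverse=True)  # best-scoring candidate pairs first
    used_l, used_r, matches = set(), set(), []
    for _, i, j in pairs:
        if i not in used_l and j not in used_r:
            used_l.add(i)
            used_r.add(j)
            matches.append((i, j))
    return sorted(matches)
```

For instance, "Acme Corp., Springfield" and "ACME Corp Springfield" reduce to the same token set and are merged. Real reconciliation systems go further, e.g. propagating evidence between related references, which token overlap alone cannot capture.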

  6. Query Processing by Data Mining
     • Query plan refinement based on query execution history
     • Better query planning by exploiting additional data statistics
        • Current optimizers use: key/foreign key, cardinality, # of distinct values
        • Additional information:
           • Strong dependency/correlation
           • Histograms, dense vs. sparse regions, etc.
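The “strong dependency/correlation” item on the query-processing slide matters because a textbook optimizer estimates a conjunction of predicates as n · sel(p1) · sel(p2), assuming independence. The sketch below, written under that assumption and with hypothetical names and a toy car table, mines exact or approximate functional dependencies between columns and uses them to drop an implied predicate instead of multiplying its selectivity in.

```python
from collections import Counter, defaultdict


def mine_fds(rows, cols, min_strength=1.0):
    """Mine (approximate) functional dependencies a -> b between column pairs.

    strength = fraction of rows whose b value is the most common b for their
    a value (1.0 means an exact FD). Returns {(a, b): {a_value: implied_b}}.
    """
    fds = {}
    for a in cols:
        for b in cols:
            if a == b:
                continue
            groups = defaultdict(Counter)
            for r in rows:
                groups[r[a]][r[b]] += 1
            strength = sum(c.most_common(1)[0][1] for c in groups.values()) / len(rows)
            if strength >= min_strength:
                fds[(a, b)] = {k: c.most_common(1)[0][0] for k, c in groups.items()}
    return fds


def estimate_rows(rows, preds, fds=None):
    """Estimate the result size of a conjunction of equality predicates.

    Without fds this is the independence assumption n * prod(sel_i).
    With fds, a predicate implied by a mined dependency is dropped.
    """
    n = len(rows)
    preds = dict(preds)
    for (a, b), mapping in (fds or {}).items():
        if a in preds and b in preds:
            if mapping.get(preds[a]) != preds[b]:
                return 0.0  # predicates contradict the mined dependency
            del preds[b]  # b's predicate is implied by a's
    est = float(n)
    for col, val in preds.items():
        est *= sum(r[col] == val for r in rows) / n
    return est
```

On a 12-row table where model determines make (4 Honda Civic, 2 Honda Accord, 4 Toyota Corolla, 2 Toyota Camry), the independence estimate for make = 'Honda' AND model = 'Civic' is 2 rows, while the dependency-aware estimate is the true 4. A production optimizer would keep such statistics in the catalog rather than recompute them per query.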

  7. Conclusions
     • DB researchers have been “invading” DM and have made great contributions
     • It is time to consider that DM may invade the DBMS to enhance its functionality
     • General philosophy
        • Invisible data mining
        • Google does this successfully for page ranking
        • Can we do the same to enhance the DBMS?
     • You can do better if you know your data better!
