This panel talk explores the current state of data mining technology, its challenges and opportunities, and the research issues that will shape its transition to the mainstream. Topics include data scrubbing, visualization, understanding, and new opportunities such as text mining, time series analysis, and domain-specific applications.
Mainlining Data Mining
Jim Gray, Microsoft
Panel talk at ICDE 2000, San Diego, 2 Mar 2000
Is data mining still a niche technology?
• 97,363 items on Northern Light re “data mining”
• 9,075,288 items re “data base” or “database”
• Is 100,000 items a niche? (OR: 14K, XML: 250K)
• Today, data mining tools are for experts (statisticians): Decision Trees, Clusters, K-means, Neural nets…
• High tech and high touch, aka consulting and license fees. And the vendors like it that way.
• The claim: you MUST understand the technology to use it.
But… the Petabytes are Coming!!
• We will be/are drowning in data/email/web…
• Abstraction & categorization are key technologies
• But,
    • They have to work.
    • They have to be trivial to learn.
• Successful ubiquitous data mining (clustering/classifiers…)
    • Mail filters/classifiers
    • Resume readers
    • Shopping recommendations, community finders
    • Web search engines
Key technical/research issues for transition to the mainstream?
PROCESS PROBLEMS:
• Getting data into the tool is hell
• Scrubbing data is hell
• Then comes the easy part: mining
• Then comes the really hard part: visualization and understanding
• Most of us:
    • Can’t understand neural nets (that’s bad).
    • Can’t understand statistics (that’s a fact).
Key technical/research issues for transition to the mainstream?
Opportunities: It’s not just numbers
• Text mining
• Time series
• Domain specific
• Web logs
• Protein patterns
• Spatial (e.g. geology, astronomy)
• Image
New opportunities for KDM?
[Figure: data cube with dimensions Make (FORD, CHEVY), Year (1990 to 1993), and Color (RED, WHITE, BLUE); aggregates shown By Make, By Year, By Color, By Make & Year, By Color & Year, By Make & Color, and Sum]
• Make data capture/scrub/import trivial
• Provide intuitive manipulation interfaces
• Provide simpler analysis concepts
    • support/confidence concept
    • precision/recall ranking
    • pivot & rollup & cube
• Provide an interactive visual data explorer.
• Case in point: I have yet to see a nice data cube visualizer.
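To make "pivot & rollup & cube" concrete, here is a minimal Python sketch of a cube over the three dimensions named on the slide (make, year, color). It sums a measure for every combination of kept and rolled-up dimensions. The `cube` helper, the `ALL` marker, the `units` measure, and the sales figures are all assumptions invented for illustration; none of them come from the talk.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # marker for a dimension that has been rolled up (assumed name)


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`: the 2^n group-bys of a cube."""
    totals = defaultdict(int)
    for row in rows:
        # Each row contributes to one cell in every one of the 2^n group-bys.
        for k in range(len(dims) + 1):
            for kept in combinations(dims, k):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Hypothetical figures, invented purely for illustration.
sales = [
    {"make": "FORD", "year": 1990, "color": "RED", "units": 5},
    {"make": "FORD", "year": 1990, "color": "WHITE", "units": 3},
    {"make": "CHEVY", "year": 1990, "color": "RED", "units": 4},
    {"make": "CHEVY", "year": 1991, "color": "BLUE", "units": 6},
    {"make": "FORD", "year": 1991, "color": "BLUE", "units": 2},
]

result = cube(sales, ("make", "year", "color"), "units")

print(result[("FORD", ALL, ALL)])   # By Make
print(result[(ALL, 1990, ALL)])     # By Year
print(result[(ALL, ALL, "RED")])    # By Color
print(result[("FORD", 1990, ALL)])  # By Make & Year
print(result[(ALL, ALL, ALL)])      # Sum (grand total)
```

Each key in `result` is one cell of the cube, so a visualizer of the kind the slide asks for could be built on top of a structure like this. A production system would more likely push the aggregation into the database.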
Research challenges that will impact data mining?
• Simpler analysis concepts
• Visualization tools to navigate data
• Better algorithms = Better answers
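As an illustration of how simple such analysis concepts can be, the support/confidence concept mentioned earlier in the talk fits in a few lines of Python. This is a sketch under assumptions: the function names and the shopping baskets are hypothetical and invented for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"milk"}))  # of baskets with bread, share with milk
```

Support says how often a pattern occurs at all, and confidence says how reliable the rule is when its left-hand side holds. Both are plain ratios that a non-statistician can read.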