SPRINT: A Scalable Parallel Classifier for Data Mining

SPRINT: A Scalable Parallel Classifier for Data Mining • Presenter: Yu-hui Huang • Authors: John Shafer, Rakesh Agrawal, Manish Mehta • National Yunlin University of Science and Technology • VLDB 1996

Outline • Motivation • Objective • Methodology • Experiment • Conclusion

Motivation • Run time of existing classifiers is expensive • Their data structures must remain memory-resident at all times • Large data sets therefore require large amounts of memory

Objective • Construct an algorithm that can handle large datasets • Allow many processors to work together

Methodology-SPRINT • [Figure: the value 27.5 marked by an arrow]

Methodology-SLIQ • Parallelizing SLIQ: • SLIQ/R: the class list is replicated in the memory of every processor • SLIQ/D: the class list is distributed across the processors, so each processor contains only 1/Nth of the class list
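The difference between the two SLIQ variants can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the class list is modelled as a plain list of class labels indexed by record ID, the "processors" are simulated as list entries, and the block-wise layout used for SLIQ/D (as well as the helper names `replicate_class_list`, `distribute_class_list`, and `owner_of`) is hypothetical.

```python
from typing import List


def replicate_class_list(class_list: List[str], n_procs: int) -> List[List[str]]:
    """SLIQ/R-style layout: every processor keeps a full copy of the class list."""
    return [list(class_list) for _ in range(n_procs)]


def distribute_class_list(class_list: List[str], n_procs: int) -> List[List[str]]:
    """SLIQ/D-style layout: processor p keeps one contiguous block,
    roughly 1/N of the class list (block-wise layout is an assumption)."""
    chunk = -(-len(class_list) // n_procs)  # ceiling division
    return [class_list[p * chunk:(p + 1) * chunk] for p in range(n_procs)]


def owner_of(record_id: int, n_records: int, n_procs: int) -> int:
    """Index of the processor that holds the class-list entry for record_id
    under the block-wise layout used above."""
    chunk = -(-n_records // n_procs)
    return record_id // chunk


if __name__ == "__main__":
    labels = ["high", "low", "low", "high", "high", "low", "high", "low"]
    replicated = replicate_class_list(labels, n_procs=4)
    distributed = distribute_class_list(labels, n_procs=4)
    print("entries per processor, SLIQ/R:", [len(part) for part in replicated])
    print("entries per processor, SLIQ/D:", [len(part) for part in distributed])
    print("record 5 is owned by processor", owner_of(5, len(labels), 4))
```

With replication, per-processor memory grows with the full data set size; with distribution it shrinks as processors are added, but looking up the class of a record may require contacting the processor that owns that entry.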

Experiment

Conclusion • SPRINT has no memory restrictions • Run time is fast compared with previous algorithms

Comments • Advantage • … • Drawback • … • Application • medical diagnosis, fraud detection, retail target marketing…
