Density Biased Sampling: An Improved Method for Clustering. By: Mesbah, Seyedsadra, Department of Computer Science, Lakehead University, December 2013.
Density Biased Sampling: An Improved Method for Clustering By: Mesbah, Seyedsadra Department of Computer Science, Lakehead University December 2013
Table of Contents • Abstract • Introduction • Density Biased Sampling • Related Works • Approximating Density Biased Sampling • Experiments • Methodology • Evaluation Metrics • Data Generation • Results • Conclusion • References
Abstract • Purpose • Problem with Uniform Random Sampling • Under Sample / Over Sample • Weighted Sample • Memory Efficient
Introduction • Uniform Sampling / No Value Consideration • Sets of Equivalent Records • Clustering in General • Reduce the Data Size • P-Uniform • Example • Density Biased Sample / Weighted Sample
Density Biased Sampling • Basic Definition • Constraints • Uniform Selection • Density Preserving Sample • Biased by Group Size / Sample Size M • Observations
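The bullets on this slide (uniform selection within a group, bias by group size, expected sample size M, weighted sample) can be illustrated with a minimal sketch. It assumes the formulation commonly attributed to Palmer and Faloutsos, in which a point in a group of size n is kept with probability alpha / n^e and alpha is chosen so that the expected sample size is M. The slides do not state this formula, so the function name `density_biased_sample`, the `group_of` callback, and the exponent parameter `e` are illustrative assumptions.

```python
import random
from collections import Counter

def density_biased_sample(points, group_of, M, e=0.5, seed=0):
    """Sketch of density biased sampling (assumed formulation, not from the slides).

    points   -- list of records
    group_of -- hypothetical callback mapping a record to its group (bin) id
    M        -- expected sample size
    e        -- bias exponent: e=0 is uniform sampling, e=1 gives each
                group the same expected number of sampled points
    Returns a list of (point, weight) pairs, where weight = 1 / selection probability.
    """
    rng = random.Random(seed)
    groups = [group_of(p) for p in points]
    sizes = Counter(groups)

    # Choose alpha so that sum over groups of n * (alpha / n**e) equals M.
    alpha = M / sum(n ** (1 - e) for n in sizes.values())

    sample = []
    for p, g in zip(points, groups):
        # Every point in a group has the same chance of selection.
        prob = min(1.0, alpha / sizes[g] ** e)
        if rng.random() < prob:
            # The weight lets a sampled point stand in for the points it represents.
            sample.append((p, 1.0 / prob))
    return sample
```

With `e` between 0 and 1, dense groups are under-sampled and sparse groups are over-sampled relative to uniform sampling, and the weights compensate for that bias.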
Related Works • Some Related Works • BIRCH Algorithm • Uniform Sampling vs. CF-Tree • DBS vs. PPS
Approximating DBS • Data Needs to Be Partitioned • Lack of Memory Problem • Two-Pass Algorithm • Sample of First j Items • Convert to One-Pass Algorithm
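A rough sketch of the two-pass idea under a memory limit follows. It assumes the data is partitioned into grid cells and that per-cell counts are kept in a fixed-size hash table, so colliding cells share one counter. The names `grid_cell` and `hashed_dbs`, the parameters `table_size` and `cell_width`, and the selection probability alpha / n^e are illustrative assumptions rather than details taken from the slides. The conversion to a one-pass algorithm is not shown.

```python
import random

def grid_cell(point, cell_width):
    """Map a numeric point to the integer grid cell that contains it."""
    return tuple(int(x // cell_width) for x in point)

def hashed_dbs(points, M, table_size=1024, cell_width=1.0, e=0.5, seed=0):
    """Two-pass, hash-based approximation of density biased sampling (sketch).

    points must be re-iterable (for example a list), since it is scanned twice.
    Returns a list of (point, weight) pairs.
    """
    rng = random.Random(seed)
    counts = [0] * table_size

    # Pass 1: count points per hash slot. Cells that collide share a count,
    # which is the approximation that bounds memory use.
    for p in points:
        counts[hash(grid_cell(p, cell_width)) % table_size] += 1

    # Normalise so the expected sample size is M.
    alpha = M / sum(n ** (1 - e) for n in counts if n > 0)

    # Pass 2: keep each point with a probability based on its slot's count.
    sample = []
    for p in points:
        n = counts[hash(grid_cell(p, cell_width)) % table_size]
        prob = min(1.0, alpha / n ** e)
        if rng.random() < prob:
            sample.append((p, 1.0 / prob))
    return sample
```

Memory use is fixed by `table_size` regardless of how many cells the data occupies; a smaller table saves memory at the cost of more collisions.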
Experiments • Aim • Conditions
Methodology • Experiment Specifications • BIRCH Summarization • Uniform Random Sampling • Hash Based Approximation • Exact Density Biased Sampling
Evaluation Metrics • RMS • RMS Error • Number of Clusters Found (NC)
Data Generation • Based on Mixture Model • Discard Noise • Cluster Membership Distributions • Example
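As an illustration of mixture-model data generation with skewed cluster membership, the sketch below draws points from spherical Gaussian clusters whose sizes follow a Zipf-like distribution. The slides do not give the generator's parameters, so every default here (cluster count, `spread`, the skew exponent `z`) and both function names are assumptions made for the example. No noise points are generated.

```python
import random

def zipf_sizes(total, k, z=1.0):
    """Split `total` points over k clusters with sizes proportional to 1/rank**z."""
    weights = [1.0 / (rank ** z) for rank in range(1, k + 1)]
    norm = sum(weights)
    sizes = [int(total * w / norm) for w in weights]
    sizes[0] += total - sum(sizes)  # rounding remainder goes to the largest cluster
    return sizes

def generate_mixture(total=1000, k=5, dim=2, spread=0.05, z=1.0, seed=0):
    """Generate labelled points from k Gaussian clusters in the unit cube (sketch)."""
    rng = random.Random(seed)
    centers = [[rng.random() for _ in range(dim)] for _ in range(k)]
    data = []
    for label, (center, size) in enumerate(zip(centers, zipf_sizes(total, k, z))):
        for _ in range(size):
            point = [rng.gauss(c, spread) for c in center]
            data.append((point, label))
    return centers, data
```

Setting `z=0` gives equally sized clusters, while larger values of `z` concentrate most points in a few clusters, which is the setting where uniform sampling tends to miss the small ones.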
Results (1) • BIRCH performs quite poorly
Results (2) • IBS and IRBS Find More Clusters
Results (3) • In the Average Case, IBS and IRBS Are Better
Results (4) • Binning is ideal for IRBS
Results (5) • Collisions Have No Effect on Clustering
Applications • Improve Summarizations • Statistical Models
Conclusion • General Summary • Hash Based Approximation • Appropriate Binning • Problem with Uniform Sampling • Using Zipf Distribution
References 1. Christopher R. Palmer, Christos Faloutsos, "Density Biased Sampling: An Improved Method for Data Mining and Clustering" 2. A. K. Jain et al., "Survey of Recent Clustering Techniques in Data Mining", International Journal of Computer Science and Management Research, Vol. 1, Issue 1, Aug 2012, ISSN 2278-733X, p. 72