This presentation discusses a new scalable k-determination scheme for applying the powerful yet computationally expensive MOCK algorithm to web data clustering. The objective is to determine the appropriate k at low cost; the methodology covers the original MOCK and the new scheme. Experiments show that the new scheme determines k at lower cost, reduces the Pareto size by about 50-70%, and does not need random data clustering. Its advantage is that MOCK can be applied to large-scale data, though performance may be slightly poorer than with the original algorithm.
Multiobjective Clustering with Automatic k-determination for Large-scale Data • Presenter: Shao-Wei Cheng • Authors: Nobukazu Matake, Tomoyuki Hiroyasu, Mitsunori Miki, Tomoharu Senda • GECCO 2007
Outline • Motivation • Objective • Methodology • Original MOCK • New scalable k-determination scheme • Experiments and Results • Conclusion • Personal Comments
Motivation • Web behavior mining has attracted a great deal of attention. • MOCK is powerful and strict, but its computational cost is too high when it is applied to clustering huge data sets: there is too much data.
Objectives • Apply MOCK to web data clustering with a scalable automatic k-determination scheme. • Determine the appropriate k at low cost. • The clustering task contains two complementary objectives: • Determine the appropriate k. • Find the partitions between the k clusters.
Methodology • Original MOCK • [Figure: the original MOCK in four steps (First Step, Second Step, Third Step, Fourth Step), with the Gap statistic]
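The slide does not spell out what MOCK optimizes. In the published MOCK algorithm the two objectives are overall deviation (cluster compactness) and connectivity (how often near neighbours are split across clusters), and both are minimized. The following is a minimal sketch of these two measures only, not of the full evolutionary algorithm; the function names and the `n_neighbors` parameter are hypothetical, chosen for illustration.

```python
import math

def overall_deviation(points, labels):
    """Sum of Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(math.dist(m, centroid) for m in members)
    return total

def connectivity(points, labels, n_neighbors=2):
    """Penalty of 1/rank for each of a point's nearest neighbours
    that was assigned to a different cluster (n_neighbors is an assumed name)."""
    total = 0.0
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for rank, j in enumerate(others[:n_neighbors], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total

# Toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = [0, 0, 1, 1]   # pairs kept together
bad = [0, 1, 0, 1]    # pairs split apart
```

On this toy data the `good` labelling scores lower than `bad` on both measures. The brute-force neighbour search is quadratic in the number of points, which hints at why cost matters for large data.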
Methodology • New scalable k-determination scheme • [Figure: two steps (First Step, Second Step)] • First scheme: Calculate adjacent angles • Second scheme
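The slide names the first scheme only as "Calculate adjacent angles". One plausible reading, stated here as an assumption and not as the authors' exact procedure, is to measure the angle that each Pareto-front solution forms with its two neighbours and to treat the sharpest bend as the candidate for the appropriate k. A minimal sketch under that assumption follows; the function names are hypothetical, and the front is taken to be a list of distinct 2-D points whose objectives are already on comparable scales.

```python
import math

def adjacent_angles(front):
    """Angle in degrees at each interior point of a 2-D front,
    formed by the segments to its left and right neighbours."""
    pts = sorted(front)  # order the solutions along the first objective
    angles = []
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        a = (prev[0] - cur[0], prev[1] - cur[1])
        b = (nxt[0] - cur[0], nxt[1] - cur[1])
        cos = (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        angles.append((cur, math.degrees(math.acos(cos))))
    return angles

def sharpest_bend(front):
    """Interior point with the smallest adjacent angle (assumed knee criterion)."""
    return min(adjacent_angles(front), key=lambda t: t[1])[0]

# Toy front: a steep drop followed by a flat tail.
front = [(0.0, 4.0), (1.0, 1.0), (2.0, 0.8), (3.0, 0.6)]
knee = sharpest_bend(front)
```

Here the point where the front turns from steep to flat has the smallest angle, while a point on the straight tail has an angle close to 180 degrees. The computation needs only one pass over the front, and no control fronts from random data.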
Conclusion • The new scheme is able to determine the appropriate k at low cost, although its performance is poorer than that of the original algorithm. • It reduces the Pareto size by about 50-70%. • It doesn't need random data clustering.
Personal Comments • Advantage • MOCK can be applied to large-scale data. • Drawback • Application • Web data.