
Scalable Automatic k-Determination for Large-Scale Data Clustering

This presentation discusses a new scalable k-determination scheme that makes the powerful but computationally expensive MOCK algorithm applicable to web data clustering. The objective is to determine the appropriate k at low cost; the methodology covers the original MOCK and the new scheme. Experiments show that the new scheme determines k at lower cost, reducing the Pareto size by 50-70% without requiring the clustering of random data. Its main advantage is that MOCK becomes applicable to large-scale data, although its performance is slightly poorer than that of the original algorithm.





Presentation Transcript


  1. Multiobjective Clustering with Automatic k-determination for Large-scale Data
     Presenter: Shao-Wei Cheng
     Authors: Nobukazu Matake, Tomoyuki Hiroyasu, Mitsunori Miki, Tomoharu Senda
     GECCO 2007

  2. Outline
     • Motivation
     • Objective
     • Methodology
       • Original MOCK
       • New scalable k-determination scheme
     • Experiments and Results
     • Conclusion
     • Personal Comments

  3. Motivation
     • Web behavior mining has attracted a great deal of attention.
     • MOCK (Multiobjective Clustering with automatic K-determination) is powerful and rigorous, but its computational cost is too high when it is applied to very large data sets.
     • Too much data!

  4. Objectives
     • Apply MOCK to web data clustering by means of a scalable automatic k-determination scheme.
     • Determine the appropriate k at low cost.
     • The problem comprises two complementary tasks:
       • determining the appropriate k;
       • finding the partition of the data into k clusters.

  5. Methodology
     • Original MOCK
     [Figure: flow diagram of the original MOCK in four steps (First Step, Second Step, Third Step, Fourth Step); one step is labelled "Gap statistic"]
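The slides do not spell out how the original MOCK picks k. Roughly, in Handl and Knowles' MOCK the candidate partitions on the Pareto front are compared against control fronts obtained by clustering random reference data, and the solution lying farthest from those control fronts is selected, an idea related to the Gap statistic. The following is a simplified sketch under stated assumptions, not the authors' implementation: each candidate is a point (overall deviation, connectivity) already normalized to comparable scales, control fronts are treated as plain point sets rather than attainment surfaces, and `control_distance` and `select_k` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# (overall deviation, connectivity): both minimized, assumed pre-normalized
Point = Tuple[float, float]


def control_distance(solution: Point, control_points: List[Point]) -> float:
    """Distance from a candidate solution to the nearest control-front point.

    Simplification (assumption): the control fronts are treated as a finite
    set of points instead of an attainment surface.
    """
    return min(math.dist(solution, c) for c in control_points)


def select_k(solutions: Dict[int, Point], control_fronts: List[List[Point]]) -> int:
    """Return the k whose solution lies farthest from the pooled control fronts."""
    pooled = [p for front in control_fronts for p in front]
    scores = {k: control_distance(s, pooled) for k, s in solutions.items()}
    return max(scores, key=scores.get)


# Hypothetical example: one solution per k and a single control front
# (in MOCK, control fronts come from clustering random data).
solutions = {2: (0.6, 0.6), 3: (0.2, 0.2), 4: (0.4, 0.3)}
control_fronts = [[(0.7, 0.7), (0.5, 0.8), (0.8, 0.5)]]
print(select_k(solutions, control_fronts))  # 3
```

Producing the control fronts means running the clustering again on random data, which is the "random data clustering" cost the new scheme avoids.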

  6. Methodology
     • New scalable k-determination scheme
     [Figure: two-step flow diagram (First Step, Second Step). First scheme: calculate adjacent angles, shown on an x-y plot. Second scheme: shown as a plot along the x-axis.]
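The slide gives only the phrase "calculate adjacent angles". One plausible reading, stated here as an assumption rather than the authors' exact procedure, is knee detection on the Pareto front: for each solution, measure the angle it forms with its two neighbours and pick the solution where the front bends most sharply, which needs no random reference data. The sketch assumes the front is a list of (connectivity, overall deviation) points sorted by k, with objectives already scaled comparably; `adjacent_angles` and `knee_index` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (connectivity, overall deviation), assumed scaled comparably


def adjacent_angles(front: List[Point]) -> List[float]:
    """Angle in radians at each interior point, between the segments to its two neighbours.

    Assumes `front` is ordered along the Pareto front (e.g. by increasing k)
    and that consecutive points are distinct (otherwise a segment has zero length).
    """
    angles = []
    for i in range(1, len(front) - 1):
        (x0, y0), (x1, y1), (x2, y2) = front[i - 1], front[i], front[i + 1]
        ax, ay = x0 - x1, y0 - y1  # vector to previous neighbour
        bx, by = x2 - x1, y2 - y1  # vector to next neighbour
        cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        # clamp to guard against rounding just outside [-1, 1]
        angles.append(math.acos(max(-1.0, min(1.0, cos_t))))
    return angles


def knee_index(front: List[Point]) -> int:
    """Index of the interior point with the sharpest bend (smallest angle)."""
    angles = adjacent_angles(front)
    return 1 + min(range(len(angles)), key=angles.__getitem__)


front = [(0.0, 10.0), (1.0, 4.0), (2.0, 3.0), (3.0, 2.5)]
print(knee_index(front))  # 1
```

In this sample front the deviation drops sharply up to the second point and then flattens, so `knee_index` returns 1; a perfectly straight front gives an angle of pi at every interior point, meaning no knee.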

  7. Experiments

  8. Conclusion
     • The new scheme determines the appropriate k at low cost, although its performance is poorer than that of the original algorithm.
     • It reduces the Pareto size by about 50-70%.
     • It does not require the clustering of random data.

  9. Personal Comments
     • Advantage
       • MOCK can be applied to large-scale data.
     • Drawback
     • Application
       • Web data.
