Supervised Clustering

Pranjal Awasthi, Carnegie Mellon University; Reza Bosagh Zadeh, Stanford University. Clustering is usually unsupervised, while full supervision renders the task meaningless. Supervised clustering finds a middle ground by interacting with a teacher, so that subjective ambiguities are resolved according to the teacher.


Presentation Transcript


  1. Pranjal Awasthi, Carnegie Mellon University; Reza Bosagh Zadeh, Stanford University. Supervised Clustering
     • Clustering is usually unsupervised
     • Full supervision renders the task meaningless; find a middle ground by interacting with a teacher
     • Remove subjective ambiguities according to the teacher
     • [Balcan, Blum ’08]: a PAC-style query model for clustering

  2. The Model
     • Limited interaction with the teacher
     • Only query allowed: “Here’s what I think the clustering should be”
     • Teacher responds with one of:
       • Split this cluster: c
       • Merge these two clusters: c1 and c2
     • How many queries can we get away with in the worst case?
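
The interaction on this slide can be sketched as a small simulation. This is a minimal sketch, not the authors' algorithm. It assumes the teacher asks for a split when a proposed cluster mixes points from several target clusters, and for a merge when two proposed clusters both lie inside the same target cluster. The names `teacher` and `naive_learner` are hypothetical, and the learner is only a naive baseline that breaks a flagged cluster into singletons and merges on request.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Tuple

Cluster = FrozenSet[Hashable]


def teacher(proposal: List[Cluster], target: Dict[Hashable, Hashable]) -> Optional[Tuple]:
    """Answer a proposed clustering (assumed response rules, see lead-in).

    Returns ("split", c), ("merge", c1, c2), or None if the proposal is correct.
    """
    # Split request: some proposed cluster mixes several target clusters.
    for c in proposal:
        if len({target[x] for x in c}) > 1:
            return ("split", c)
    # Every cluster is pure here; look for two that share a target label.
    seen: Dict[Hashable, Cluster] = {}
    for c in proposal:
        label = target[next(iter(c))]
        if label in seen:
            return ("merge", seen[label], c)
        seen[label] = c
    return None


def naive_learner(points: Iterable[Hashable],
                  target: Dict[Hashable, Hashable]) -> Tuple[List[Cluster], int]:
    """Naive baseline: propose, then obey each split or merge request.

    Assumes at least one point. Returns the final clustering and the
    number of queries made, counting the final accepted proposal.
    """
    proposal: List[Cluster] = [frozenset(points)]
    queries = 0
    while True:
        queries += 1
        response = teacher(proposal, target)
        if response is None:
            return proposal, queries
        if response[0] == "split":
            c = response[1]
            proposal.remove(c)
            # Crude split: break the flagged cluster into singletons.
            proposal.extend(frozenset([x]) for x in c)
        else:
            _, c1, c2 = response
            proposal.remove(c1)
            proposal.remove(c2)
            proposal.append(c1 | c2)


if __name__ == "__main__":
    target = {0: "a", 1: "a", 2: "b", 3: "b", 4: "b", 5: "c"}
    clusters, queries = naive_learner(range(6), target)
    print(sorted(sorted(c) for c in clusters), queries)
```

This baseline needs a number of queries that grows with the number of points, which is why bounds that depend only on k and |C| are the interesting ones.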

  3. Main Results
     • Previous query bound of O(k^3 log |C|) known for any concept class C
     • We improve the bound to O(k log |C|)
     • Give algorithms for clustering geometric concept classes
     • Present noisy versions of the model and give query bounds
     • What if we knew about separation properties of the dataset?
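
To get a feel for the size of the improvement, the two bounds can be compared on illustrative numbers. The values k = 10 and |C| = 2^20 are assumptions chosen for the example, the logarithm is taken base 2, and the constants hidden by the O-notation are ignored.

```latex
\[
\frac{k^{3}\log_2 |C|}{k \log_2 |C|} = k^{2},
\qquad
k = 10,\ |C| = 2^{20}:\quad
k^{3}\log_2 |C| = 1000 \cdot 20 = 20000,
\quad
k \log_2 |C| = 10 \cdot 20 = 200 .
\]
```

Up to constants, the improved bound is smaller by a factor of k^2, which is 100 in this example.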

  4. Dataset Separation
     • Worst-case number of queries under some “separation” properties:
     • The better separated the dataset, the fewer queries required
     • Lots of open problems!
