
Density Traversal Clustering and Generative Kernels




  1. Density Traversal Clustering and Generative Kernels: a generative framework for spectral clustering. Amos Storkey, Tom G Griffiths, School of Informatics, University of Edinburgh.

  2. Attribute Generalisation

  3. Prior work • Tishby and Slonim • Meila and Shi • Coifman et al. • Nadler et al.

  4. Example: Transition Matrix

  5. Example: 20 Iterations

  6. Example: 400 Iterations
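The behaviour in slides 4-6 can be reproduced in miniature: build a Gaussian affinity between sample points, row-normalize it into a transition matrix, and apply it repeatedly to an initial point mass. This is an illustrative sketch (the data and bandwidth are invented), not the talk's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 1-D clusters of sample points.
data = np.concatenate([rng.normal(-3.0, 0.3, 20), rng.normal(3.0, 0.3, 20)])

# Gaussian affinity W and row-normalized transition matrix T.
sigma = 0.5
diff = data[:, None] - data[None, :]
W = np.exp(-diff**2 / (2 * sigma**2))
T = W / W.sum(axis=1, keepdims=True)

# Diffuse a point mass placed on the first data point for 20 and 400 steps.
p = np.zeros(len(data))
p[0] = 1.0
p20 = p @ np.linalg.matrix_power(T, 20)
p400 = p @ np.linalg.matrix_power(T, 400)

# Mass spreads over the starting cluster quickly but leaks across the
# gap between clusters only extremely slowly.
print(p20[:20].sum(), p400[:20].sum())
```

After 20 steps the mass has already spread over its starting cluster; even after 400 steps essentially none has crossed the gap. That separation of timescales is what spectral methods exploit.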

  7. Argument • A priori dependence on data. • No generative model. • Inconsistent with underlying density. • Clusters are spatial characteristics that are properties of distributions. • Clusters are only properties of data sets in as much as they inherit the property from the underlying distribution from which the data was generated.

  8. But we do know • We know the diffusion asymptotics, but the probabilistic formalism is inconsistent with the data density: • In the finite time-step, infinite-data limit, the equilibrium distribution does not match the data distribution.

  9. Density Traversal Clustering • Define a discrete-time, continuous-space, diffusing Markov chain. • The definition depends on some latent distribution. • Call this the Traversal Distribution.

  10. The Markov chain • Transition with probability [equation] • D(y, x) is a Gaussian centred at x; P* is the traversal distribution. • Here S is given by the solution of [equation].
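The transition equations on this slide are images and are lost in the transcript. Purely to fix ideas, here is one hedged reading: propose from the Gaussian D(y, x) centred at the current point, reweight by the traversal density P*, and renormalize (the role the slide assigns to S). The mixture density and all parameters below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def p_star(y):
    # Hypothetical traversal density: a mixture of two narrow 1-D Gaussians.
    return 0.5 * np.exp(-(y + 3.0) ** 2 / 0.18) + 0.5 * np.exp(-(y - 3.0) ** 2 / 0.18)

def step(x, sigma=0.5, n_prop=1000):
    # One transition: Gaussian proposal D(y, x) centred at x, reweighted by
    # the traversal density P*(y) and renormalized (the role played by S).
    proposals = rng.normal(x, sigma, n_prop)
    w = p_star(proposals)
    return rng.choice(proposals, p=w / w.sum())

x = -3.0
for _ in range(20):
    x = step(x)
print(x)  # the walker stays inside the dense region around -3
```

Because P* concentrates the walk in dense regions, a walker started in one mode stays there for many steps, which is what makes the chain informative about cluster structure.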

  11. Generative procedure

  12. Problems • Random walk in continuous space. • Each step involves many intractable integrals. • Real Bayesians would... • Good prior distributions over distributions are a hard problem, but we need a prior for the traversal distribution.

  13. CHEAT • Doing all the integrals is not possible, but... • All integrals are with respect to the traversal distribution. • Use the empirical data as a proxy. • All the integrals now become sample estimates: sums over the data points. • Everything is computable in the space of data points. • WORKS! We never need to evaluate the probability at a point, only integrals over regions.
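The substitution this slide describes is ordinary Monte Carlo estimation: any integral against the traversal distribution becomes an average over the data points. A minimal sketch, with an invented integrand f standing in for whatever appears inside one of the integrals:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, 100_000)  # draws standing in for data ~ P*

def f(x):
    return x**2  # hypothetical integrand

# The integral of f against P* becomes a sum over the data points:
# E_{P*}[f] = integral of f(y) P*(y) dy  ~=  (1/n) sum_i f(x_i)
estimate = f(samples).mean()
print(estimate)  # close to E[x^2] = 1 for a standard normal
```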

  14. We get… • Scaled likelihood P(x_i | centre x_j) / P(x_i) = n (A^D)_ij • A = W S^-1 • W is the usual affinity. • S^-1 is the extra consistency term. • More generally we have the out-of-sample scaled likelihood: • P(x | centre y) / P(x) = n a(x)^T (A^(D-2)) b(y), where a(x) and b(x) are the traversal probabilities to and from x.
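Under the empirical proxy, the scaled likelihood n(A^D)_ij is computable directly from the affinity matrix. In the sketch below S is simply taken to be the diagonal matrix of column sums of W, so that A = W S^-1 is column-stochastic; that choice is an assumption made for illustration, since the talk defines S through its consistency equation:

```python
import numpy as np

rng = np.random.default_rng(2)
# Two well-separated 2-D clusters of 15 points each.
X = np.concatenate([rng.normal(-2.0, 0.3, (15, 2)), rng.normal(2.0, 0.3, (15, 2))])
n = len(X)

# Usual Gaussian affinity W.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * 0.5**2))

# Assumed consistency term: S diagonal with the column sums of W,
# making A = W S^-1 column-stochastic.  (The talk derives S differently.)
S_inv = np.diag(1.0 / W.sum(axis=0))
A = W @ S_inv

D = 20  # number of diffusion steps
scaled = n * np.linalg.matrix_power(A, D)  # scaled likelihood n (A^D)_ij

# Within-cluster scaled likelihoods are O(1); cross-cluster ones are near zero.
print(scaled[0, 1], scaled[0, n - 1])
```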

  15. Example: Scaled likelihoods

  16. Example: 20 Iterations

  17. Example: 400 Iterations

  18. Initial distribution • Can consider other initial distributions. • Specifically, can consider delta functions at mixture centres. • Variational Bayesian Mixture models…

  19. Demo

  20. Number of clusters • Scaled likelihoods for a three-cluster problem.

  21. Number of clusters • Scaled likelihoods for a five-cluster problem.

  22. Cluster allocations

  23. Cluster allocations

  24. Conclusion • An a priori formulation of spectral clustering. • Can be used like any other spectral procedure. • But it also provides scaled likelihoods, so it can be combined with Bayesian procedures. • Variational Bayesian formalism. • Small-sample approximation issues. • Better to have a flexible density estimator.

  25. Generative Kernels • Related to Seeger: Covariance Kernels from Bayesian Generative Models. • Gaussian process over X space. • Density, and corresponding traversal process. • Data is obtained by diffusing in X space using the traversal process... • ...and then local averaging and additive noise.

  26. Generative Kernels • Covariance K_ij is [equation] • Again use sample estimates. • Presume the measured target is a local average. • Just the standard basis-function derivation of a GP.

  27. Motivation • The generative model generates clustered data positions. • Targets diffuse using the traversal process. • Target values suffer a locality-averaging influence: • Diffused objects locally influence one another’s target values, so everyone becomes like their neighbours. • E.g. accents. • Can add local measurement noise.

  28. Kernel Clustering • Use sample estimates again to get the kernel. • Can also incorporate a prior over iterations and integrate it out. • For example, can use the matrix exponential exp(A) instead of A^D.
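Putting a prior over the iteration count and integrating it out gives the matrix exponential the slide mentions: with weights 1/D!, the sum over D of A^D / D! is exactly exp(A). A sketch using scipy (the affinity construction below is illustrative, not the talk's exact operator):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, (10, 2))

# Gaussian affinity and row-normalized diffusion operator (illustrative).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / 2.0)
A = W / W.sum(axis=1, keepdims=True)

# A fixed iteration count D versus integrating D out with exp(A):
K_fixed = np.linalg.matrix_power(A, 5)
K_exp = expm(A)  # sum over D of A^D / D!  -- a weighted mixture of iteration counts

print(K_exp.shape)
```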

  29. Generating targets for rings data • Can generate from the model: • Across-cluster covariance is low. • Within-cluster continuity.

  30. The point? • Density dependence matters in missing-data problems. • Gaussian process: data with missing targets has no influence. • Density Traversal Kernel: data with missing targets affects the kernel, and hence has influence.
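The contrast can be made concrete. A stationary GP kernel between two labelled points is a function of those two points alone, so unlabelled inputs cannot change it; a diffusion-style kernel built from the full affinity matrix does change when unlabelled points are added, because they open new traversal paths. Everything below (kernel forms, data) is an illustrative stand-in, not the talk's exact kernel:

```python
import numpy as np

def rbf(X):
    # Stationary squared-exponential kernel: depends only on pairwise distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / 2.0)

def diffusion_kernel(X, steps=10):
    # Illustrative density-dependent kernel: powers of the row-normalized affinity.
    W = rbf(X)
    A = W / W.sum(axis=1, keepdims=True)
    M = np.linalg.matrix_power(A, steps)
    return M @ M.T

labelled = np.array([[0.0, 0.0], [4.0, 0.0]])
bridge = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])  # unlabelled, no targets

# GP kernel entry between the two labelled points: unchanged by the bridge.
k_without = rbf(labelled)[0, 1]
k_with = rbf(np.vstack([labelled, bridge]))[0, 1]

# Diffusion kernel between the same two points: the unlabelled bridge matters.
d_without = diffusion_kernel(labelled)[0, 1]
d_with = diffusion_kernel(np.vstack([labelled, bridge]))[0, 1]

print(k_without == k_with, d_without != d_with)
```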
