MURI Update 7/26/12, UC Berkeley • Prof. Trevor Darrell (UCB) • Prof. Michael Jordan (UCB) • Prof. Brian Kulis (fmr. UCB postdoc, now Asst. Prof., OSU) • Dr. Stefanie Jegelka (UCB postdoc)
Recent Effort • Nonparametric Bayesian (NPB) inference of visual structures: objects, trajectories • Develop richer representations (appearance, shape) • Efficient implementation in constrained environments • Distributed and decentralized variants
NPB Trajectory Models • Extend NPB models to trajectory domains: find structure in motion trajectories; identify anomalies • Considering Bluegrass data and possibly ARGUS track data via LLNL collaboration • HDP and hard clustering variants • Decentralized / distributed implementation
New Algorithms for Hard Clustering via Bayesian Nonparametrics [Kulis and Jordan, ICML 2012] • Mixture of Gaussians: fix the covariances and take the limit to obtain k-means • Make the mixture Bayesian to obtain a DP mixture, then take the same limit: what hard algorithm results? • Generalized to a hard HDP as well • Kulis presentation
Small Variance Asymptotics, Bayesian Nonparametrics, and k-means • Brian Kulis • Ohio State University • Joint work with Michael Jordan (Berkeley), Ke Jiang (OSU), and Tamara Broderick (Berkeley)
Generalizing k-means [Kulis and Jordan, ICML 2012] • Mixture of Gaussians: fixing the covariances and taking the limit yields k-means • Making the mixture Bayesian yields a DP mixture; taking the same limit yields the generalization developed below
Why Should We Care? • For “k-means people” • Bayesian techniques permit great flexibility • Extensions to multiple data sets would not be obvious without this connection • For Bayesians • Hard clustering methods scale better • Connections to graph cuts and spectral methods • k-means just works
Gaussian Mixture Models • No closed-form solution for maximizing the likelihood • Typically resort to the EM algorithm
EM Algorithm • E-step: compute the responsibilities of each component for each point • M-step: update the component parameters given the responsibilities (a hedged reconstruction of the update equations follows below)
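The slide's E-step and M-step equations did not survive extraction; a standard reconstruction for a Gaussian mixture with mixing weights pi_c, means mu_c, and covariances Sigma_c is the following, with N_c = sum_n gamma(z_nc):

\gamma(z_{nc}) = \frac{\pi_c \, \mathcal{N}(x_n \mid \mu_c, \Sigma_c)}{\sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

\mu_c = \frac{1}{N_c} \sum_{n=1}^{N} \gamma(z_{nc}) \, x_n, \qquad
\Sigma_c = \frac{1}{N_c} \sum_{n=1}^{N} \gamma(z_{nc}) \, (x_n - \mu_c)(x_n - \mu_c)^{\top}, \qquad
\pi_c = \frac{N_c}{N}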
Fix Covariances, Take Limit: Mixture of Gaussians to k-means • Fix all covariances to sigma^2 I and let sigma go to 0 • In the limit, the responsibility of centroid c for a point is 1 if c is the closest centroid and 0 otherwise, i.e., the hard assignments of k-means (the limit is written out below)
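A hedged reconstruction of the limit the slide describes, with every covariance fixed to sigma^2 I:

\gamma(z_{nc}) = \frac{\pi_c \exp\!\big(-\|x_n - \mu_c\|^2 / (2\sigma^2)\big)}{\sum_{j} \pi_j \exp\!\big(-\|x_n - \mu_j\|^2 / (2\sigma^2)\big)}
\;\longrightarrow\;
\begin{cases} 1 & \text{if } c = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}
\qquad \text{as } \sigma \to 0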
Linear-Gaussian Latent Feature Models • Setup for the mixture of Gaussians: latent variables, data, and likelihood
Other Linear-Gaussian Models • Let Z have a continuous distribution: (essentially) probabilistic PCA • As sigma goes to 0, we recover standard PCA
The Polya Urn • Imagine an urn with theta balls for each of k colors • Pick a ball from the urn, replace the ball and add another ball of that color to the urn • Repeat n times • Induces a distribution over Z
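A minimal simulation of the urn as described on this slide (illustrative Python sketch; the function name and defaults are mine):

import numpy as np

def polya_urn(theta, k, n, seed=None):
    # Start with theta balls of each of k colors; repeatedly draw a color
    # in proportion to its current count, put the ball back, and add one
    # more ball of that color. Returns the sequence of drawn colors.
    rng = np.random.default_rng(seed)
    counts = np.full(k, float(theta))
    draws = []
    for _ in range(n):
        color = rng.choice(k, p=counts / counts.sum())
        counts[color] += 1.0
        draws.append(int(color))
    return draws

print(polya_urn(theta=1.0, k=3, n=10, seed=0))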
The Dirichlet-Multinomial Distribution • Marginalizing the mixing weights gives the Dirichlet-multinomial distribution over assignments (if theta is an integer, the gamma functions reduce to factorials); a hedged reconstruction follows below
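The slide's formula was lost in extraction; assuming a symmetric Dirichlet(theta, ..., theta) prior over the k mixing weights, matching the urn above, the marginal distribution over assignments z with per-color counts n_c is presumably:

p(z_1, \ldots, z_n) = \frac{\Gamma(k\theta)}{\Gamma(k\theta + n)} \prod_{c=1}^{k} \frac{\Gamma(n_c + \theta)}{\Gamma(\theta)}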
Small-Variance Asymptotics • Nothing interesting happens in the limit unless theta is chosen as a function of sigma • In this case, we obtain a nontrivial limiting objective
The Chinese Restaurant Process • Customers sequentially enter the restaurant • First customer sits at the first table • Subsequent customers sit at an occupied table with probability proportional to the number of occupants • Start a new table with probability proportional to theta
The Chinese Restaurant Process • The exchangeable partition probability function (EPPF) gives the probability of the induced seating partition (a hedged reconstruction follows below) • Selecting theta as a function of sigma, as before, yields a nontrivial asymptotic limit
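A hedged reconstruction of the EPPF the slide refers to, for a CRP with concentration theta, K occupied tables, and n_k customers at table k:

p(z_1, \ldots, z_n) = \theta^{K} \, \frac{\Gamma(\theta)}{\Gamma(\theta + n)} \prod_{k=1}^{K} (n_k - 1)!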
Algorithmic Perspective: Gibbs Sampling • Suppose we want to sample from p(x), where x = (x_1, ..., x_d) • Repeatedly sample each coordinate from its conditional distribution given all the others (written out below)
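The sampling step, written out in the standard form the slide presumably showed:

x_i \sim p(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d), \qquad \text{cycling over } i = 1, \ldots, d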
Collapsed Gibbs Sampling for CRP/DP Mixtures • Want to sample from the posterior over cluster assignments given the data • Need the conditional probability of each point's assignment given all other assignments and the data (a hedged reconstruction follows below)
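A hedged reconstruction of the conditional needed for collapsed Gibbs sampling in a CRP/DP mixture, where n_{-i,c} is the size of cluster c excluding point i and the likelihood terms are marginals under the conjugate prior:

p(z_i = c \mid z_{-i}, x) \;\propto\;
\begin{cases}
n_{-i,c} \; p\big(x_i \mid \{x_j : z_j = c,\, j \neq i\}\big) & \text{existing cluster } c \\[4pt]
\theta \; p(x_i) & \text{new cluster}
\end{cases}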
Asymptotics of the Gibbs Sampler • Now, we would like to see what happens as sigma goes to 0 • Need one additional thing: the concentration parameter theta must be a function of sigma
Asymptotics of the Gibbs Sampler • In the limit, the choice between joining an existing cluster and starting a new cluster becomes deterministic: a point joins its nearest existing cluster unless the squared distance to every current center exceeds the threshold lambda, in which case it starts a new cluster (a sketch of the resulting algorithm follows below)
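A minimal sketch of the limiting algorithm, DP-means, following the description in Kulis and Jordan (ICML 2012); the variable names and stopping rule here are illustrative:

import numpy as np

def dp_means(X, lam, max_iters=100):
    # Hard clustering with a penalty lam: a point starts a new cluster
    # whenever its squared distance to every current center exceeds lam.
    centers = [X.mean(axis=0)]                  # begin with one global cluster
    assignments = np.zeros(len(X), dtype=int)
    for _ in range(max_iters):
        changed = False
        for i, x in enumerate(X):               # assignment step
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            if d2.min() > lam:
                centers.append(x.copy())        # open a new cluster at this point
                c = len(centers) - 1
            else:
                c = int(d2.argmin())
            changed |= (c != assignments[i])
            assignments[i] = c
        for c in range(len(centers)):           # update step: recompute means
            members = X[assignments == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
        if not changed:
            break
    return assignments, np.array(centers)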
Underlying Objective Function • The limiting algorithm minimizes a k-means-like objective with a penalty lambda on the number of clusters (hedged reconstruction below) • Theorem: The DP-means algorithm monotonically minimizes this objective until local convergence.
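A hedged reconstruction of the objective as stated in Kulis and Jordan (ICML 2012), where {l_c} are the clusters, mu_c their means, and k the number of clusters:

\min_{k, \; \{\ell_c\}_{c=1}^{k}} \; \sum_{c=1}^{k} \sum_{x \in \ell_c} \|x - \mu_c\|_2^2 \; + \; \lambda k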
Clustering with Multiple Data Sets • Want to cluster each data set, but also want to share cluster structure across data sets • Use the Hierarchical Dirichlet Process (HDP)!
The Hard Gaussian HDP: Objective • A combined objective that penalizes both per-data-set (local) and shared (global) cluster counts (hedged reconstruction below) • Theorem: The Hard HDP algorithm monotonically minimizes this objective until local convergence.
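The slide's objective is not recoverable from the extraction; based on the hard HDP of Kulis and Jordan (ICML 2012), it presumably takes a form like the following, where data set j has local clusters ell_{jc}, each local cluster is associated with a shared global mean mu_{g(j,c)}, k_j is the number of local clusters in data set j, g is the number of global clusters, and lambda_l, lambda_g are the local and global penalties:

\min \; \sum_{j} \sum_{c} \sum_{x \in \ell_{jc}} \|x - \mu_{g(j,c)}\|_2^2 \; + \; \lambda_{\ell} \sum_{j} k_j \; + \; \lambda_{g}\, g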
Exponential Families • What happens when we replace the Gaussian likelihood with an arbitrary exponential family distribution? • Do asymptotics where the mean is fixed but the covariance goes to 0:
Exponential Families • Use the standard exponential-family conjugate prior • Utilize Bregman divergences (definition below) • Asymptotics of the Gibbs sampler require Laplace's method on the marginal likelihood • End up with the same result as before, with a Bregman divergence replacing the squared Euclidean distance
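The Bregman divergence referred to on the slide, generated by a strictly convex, differentiable function phi; taking phi(x) = ||x||^2 recovers the squared Euclidean distance:

D_{\phi}(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y), \, x - y \rangle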
Overlapping Clusters / Binary Feature Models • So far, we have considered single-assignment models • What if each point can be assigned to multiple clusters?
Overlapping Clusters • The slide's table organizes models along two axes: single assignment vs. multi assignment, and non-Bayesian vs. Bayesian vs. Bayesian nonparametric • Can perform analogous small-variance asymptotics using the multi-assignment distributions
Indian Buffet Process • First customer samples a Poisson(theta) number of dishes • Subsequent customer i samples each existing dish with probability equal to the fraction of previous customers who sampled that dish • Also samples a Poisson(theta / i) number of new dishes
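A minimal simulation of the IBP generative process described on this slide (illustrative Python sketch; the function name and return format are mine):

import numpy as np

def indian_buffet(theta, n_customers, seed=None):
    # Customer i samples each existing dish k with probability m_k / i
    # (m_k = customers who already tried it) and Poisson(theta / i) new
    # dishes. Returns the binary customer-by-dish matrix Z.
    rng = np.random.default_rng(seed)
    dish_counts = []
    rows = []
    for i in range(1, n_customers + 1):
        row = [int(rng.random() < m / i) for m in dish_counts]
        new_dishes = rng.poisson(theta / i)
        dish_counts = [m + taken for m, taken in zip(dish_counts, row)]
        dish_counts.extend([1] * new_dishes)
        rows.append(row + [1] * new_dishes)
    Z = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

print(indian_buffet(theta=2.0, n_customers=5, seed=0))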
Indian Buffet Process Small-Variance Asymptotics • The slide's derivation is expressed in terms of the number of points possessing feature c and the number of new dishes sampled by customer i
Indian Buffet Process Small-Variance Asymptotics • The limiting objective penalizes the number of binary features (hedged reconstruction below) • Reduces to the DP-means objective when each point takes a single assignment
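A hedged reconstruction of the limiting objective, in the form of the BP-means objective from the related work of Broderick, Kulis, and Jordan, where Z is the binary assignment matrix, A holds the feature means, and K is the number of features:

\min_{K, \; Z \in \{0,1\}^{n \times K}, \; A} \; \operatorname{tr}\!\big[(X - ZA)(X - ZA)^{\top}\big] \; + \; \lambda^2 K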
Spectral Clustering Connections • Standard spectral relaxation of k-means • Write the objective as a trace maximization over a normalized cluster-indicator matrix Y (see the formulation below) • Relax Y to be an arbitrary orthonormal matrix • Solution: the matrix of the top k eigenvectors of the kernel matrix K
Spectral Relaxation for DP-means • How do we extend this? • Write the DP-means objective in the same trace form, with the lambda penalty on the number of clusters • Relax Y to be an arbitrary orthonormal matrix • Solution: the matrix of eigenvectors of K whose eigenvalues are greater than lambda
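A hedged reconstruction of the two trace formulations these slides refer to, with K the kernel (Gram) matrix and Y a normalized cluster-indicator matrix:

\text{k-means:}\quad \max_{Y} \; \operatorname{tr}(Y^{\top} K Y) \quad \text{s.t. } Y^{\top} Y = I_k
\qquad\qquad
\text{DP-means:}\quad \max_{Y,\,k} \; \operatorname{tr}(Y^{\top} K Y) - \lambda k

Under the orthonormal relaxation, the first is solved by the top k eigenvectors of K and the second by keeping exactly those eigenvectors of K whose eigenvalues exceed lambda.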
Graph Clustering • Can further take advantage of connections between k-means and graph clustering • Generalize hard DP objective to weighted and kernel forms • Special cases lead to graph clustering objectives that penalize the number of clusters in the graph
Conclusions and Open Questions • Focus was on obtaining non-probabilistic models from probabilistic ones • Small-variance asymptotics for Bayesian nonparametrics yields models which regularize / penalize • Number of clusters • Number of features • Number of topics • Also yields algorithmic insights
Conclusions and Open Questions • Spectral or semidefinite relaxation for the hard HDP? • Better algorithms? • Local Search • Multilevel Methods • Split/Merge • Incorporate ideas from other inference schemes • Applications? • Other models?
2) Decentralized models • Distributed Hard Topic Models • Jegelka presentation
Decentralized k-means algorithms • Data are distributed across nodes • Clusters are shared globally • Alternate cluster assignment and center updates • Key questions: communication cost, and where to store the centers?
Decentralized k-means • Local copies, gossip [Datta et al., ...] • Local assignment & update • Sharing & averaging with neighbor(s)
Decentralized k-means • Local copies, gossip [Datta et al., ...] • Single copy of each mean • Summaries & pruning for restricted comparisons [Papatreou et al.] (an illustrative gossip sketch follows below)
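A minimal sketch of one gossip-style round in the spirit of the local-copies approach above; this is a hypothetical illustration, not the exact protocol of Datta et al. Each node runs a local assignment step on its private data and then averages per-cluster statistics with its neighbors.

import numpy as np

def local_stats(data, centers):
    # Assign each local point to its nearest center and accumulate
    # per-cluster sums and counts (the local half of a Lloyd iteration).
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    sums = np.zeros_like(centers)
    counts = np.zeros(len(centers))
    for x, c in zip(data, assign):
        sums[c] += x
        counts[c] += 1
    return sums, counts

def gossip_round(node_data, node_centers, neighbors):
    # node_data: one array of points per node; node_centers: one (k, dim)
    # array of center copies per node; neighbors: adjacency as index lists.
    stats = [local_stats(d, c) for d, c in zip(node_data, node_centers)]
    new_centers = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + list(nbrs)
        sums = sum(stats[j][0] for j in group)
        counts = sum(stats[j][1] for j in group)
        centers = node_centers[i].copy()
        nonempty = counts > 0                   # keep old center if no points
        centers[nonempty] = sums[nonempty] / counts[nonempty, None]
        new_centers.append(centers)
    return new_centers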
New questions • Decentralized cluster creation? • Sequential vs. parallel • Naturally hierarchical • Exploit structure to reduce communication • Partial sharing, e.g. locality • Integration into model? • Generalization to IBP, PYP, …
Decentralized clustering • Observe trajectories in a scene • Cluster locations by local traffic behavior (HDP): anomalies, traffic prediction, ... • Clusters are not omnipresent: partial sharing
3) Trajectory Data • Scalable Topic Models for Trajectories • Apply current results on scalable topic modeling to problems in vision • find structure in motion trajectories • identify anomalies • Considering Bluegrass data and possibly ARGUS track data via LLNL collaboration
Experimental Results • Figure panels: track fragment vocabulary (colored by speed); road network