
Unsupervised, Cont’d Expectation Maximization


Presentation Transcript


  1. Unsupervised, Cont’d Expectation Maximization

  2. Presentation tips • Practice! • Work on knowing what you’re going to say at each point. • Know your own presentation • Practice! • Work on timing • You have 15 minutes to talk + 3 minutes for questions • Will be graded on adherence to time! • Timing is hard. Becomes easier as you practice

  3. Presentation tips • Practice! • What appears on your screen is different from what will appear when projected • Different size; different font; different line thicknesses; different color • Avoid hard-to-distinguish colors (red on blue) • Don’t completely rely on color for visual distinctions

  4. The final report • Due: Dec 17, 5:00 PM (last day of finals week) • Should contain: • Intro: what was your problem; why should we care about it? • Background: what have other people done? • Your work: what did you do? Was it novel or re-implementation? (Algorithms, descriptions, etc.) • Results: Did it work? How do we know? (Experiments, plots & tables, etc.) • Discussion: What did you/we learn from this? • Future work: What would you do next/do over? • Length: Long enough to convey all that

  5. The final report • Will be graded on: • Content: Have you accomplished what you set out to? Have you demonstrated your conclusions? Have you described what you did well? • Analysis: have you thought clearly about what you accomplished, drawn appropriate conclusions, formulated appropriate “future work”, etc? • Writing and clarity: Have you conveyed your ideas clearly and concisely? Are all of your conclusions supported by arguments? Are your algorithms/data/etc. described clearly?

  6. Back to clustering • Purpose of clustering: • Find “chunks” of “closely related” data • Uses notion of similarity among points • Often, distance is interpreted as similarity • Agglomerative: • Start w/ individuals==clusters; join together clusters • There’s also divisive: • Start w/ all data==one cluster; split apart clusters

  7. Combinatorial clustering • General clustering framework: • Set target of k clusters • Choose a cluster optimality criterion • Often function of “between-cluster variation” vs. “within-cluster variation” • Find assignment of points to clusters that minimizes (maximizes) this criterion • Q: Given N data points and k clusters, how many possible clusterings are there?
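
The answer to the question above is the Stirling number of the second kind, S(N, k). The sketch below is not from the slides (the function name and example values are my own); it computes S(N, k) with the standard recurrence and shows why exhaustive search over clusterings is hopeless:

```python
# Count the ways to partition N labeled points into exactly k non-empty
# clusters: the Stirling number of the second kind, S(N, k), which grows
# roughly like k^N / k!.

def stirling2(n, k):
    """Partitions of n labeled items into k non-empty, unlabeled blocks."""
    # Recurrence: S(n, k) = k * S(n-1, k) + S(n-1, k-1), with S(0, 0) = 1.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]

if __name__ == "__main__":
    # Even a tiny problem is far too large to enumerate:
    print(stirling2(19, 4))   # 11,259,666,950 possible clusterings
```

Even for N = 19 points and k = 4 clusters there are roughly 10^10 possible assignments, so combinatorial clustering in practice relies on greedy or iterative search rather than enumeration.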

  8. Example clustering criteria • Define: • Cluster i: • Cluster i mean: • Between-cluster variation: • Within-cluster variation:
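
The slide's formulas appear only as images in the original deck and are missing from this transcript; the standard definitions they most likely correspond to (an assumption on my part) are:

```latex
% Cluster i and its size
C_i = \{\, x_j : \text{point } j \text{ is assigned to cluster } i \,\}, \qquad N_i = |C_i|

% Cluster i mean
\mu_i = \frac{1}{N_i} \sum_{x_j \in C_i} x_j

% Within-cluster variation
W = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2

% Between-cluster variation (\bar{x} is the grand mean of all points)
B = \sum_{i=1}^{k} N_i \, \lVert \mu_i - \bar{x} \rVert^2
```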

  9. Example clustering criteria • Now want some way to trade off within vs. between • Usually want to decrease w/in-cluster var, but increase between-cluster var • E.g., maximize: • or: • α>0 controls relative importance of terms
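
The slide's objective functions are likewise not in the transcript. Two common criteria consistent with the bullets (my guesses, not necessarily the slide's exact formulas) are the ratio, or the α-weighted difference, of between- and within-cluster variation:

```latex
\max \; \frac{B}{W}
\qquad \text{or} \qquad
\max \; B - \alpha W, \quad \alpha > 0
```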

  10. Comb. clustering example: Clustering of seismological data http://www.geophysik.ruhr-uni-bochum.de/index.php?id=3&sid=5

  11. Unsup. prob. modeling • Sometimes, instead of clusters want a full probability model of data • Can sometimes use prob. model to get clusters • Recall: in supervised learning, we said: • Find a probability model, Pr[X|Ci] for each class, Ci • Now: find a prob. model for data w/o knowing class: Pr[X] • Simplest: fit your favorite model via ML • Harder: assume a “hidden cluster ID” variable

  12. Hidden variables • Assume data is generated by k different underlying processes/models • E.g., k different clusters, k classes, etc. • BUT, you don’t get to “see” which point was generated by which process • Only get the X for each point; the y is hidden • Want to build complete data model from k different “cluster specific” models:
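
The equation the slide ends with is not in the transcript; the standard form of such a complete-data model, with mixing weights αi and hidden cluster label y, is:

```latex
\Pr[X = x] = \sum_{i=1}^{k} \alpha_i \, \Pr[X = x \mid y = i],
\qquad \alpha_i \ge 0, \quad \sum_{i=1}^{k} \alpha_i = 1
```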

  13. Mixture models • This form is called a “mixture model” • “mixture” of k sub-models • Equivalent to the process: Roll a weighted die (weighted by αi); choose the corresponding sub-model; generate a data point from that sub-model • Example: mixture of Gaussians:
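
A minimal sketch of the "roll a weighted die" generative process for a mixture of 1-d Gaussians (the weights, means, and standard deviations below are made-up illustration values, not anything from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

alphas = np.array([0.5, 0.3, 0.2])   # mixing weights, sum to 1
mus    = np.array([-2.0, 0.0, 3.0])  # component means
sigmas = np.array([0.5, 1.0, 0.8])   # component std. deviations

def sample_mixture(n):
    """Draw n points: pick a component by its weight, then sample from it."""
    components = rng.choice(len(alphas), size=n, p=alphas)  # roll the weighted die
    return rng.normal(mus[components], sigmas[components]), components

x, y_hidden = sample_mixture(1000)
# In the unsupervised setting we observe only x; y_hidden is never seen.
```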

  14. Parameterizing a mixture • How do you find the params, etc? • Simple answer: use maximum likelihood: • Write down joint likelihood function • Differentiate • Set equal to 0 • Solve for params • Unfortunately... It doesn’t work in this case • Good exercise: try it and see why it breaks • Answer: Expectation Maximization
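
To see why the direct ML recipe breaks, note that the mixture log-likelihood has a sum inside the log, so setting the gradient to zero gives equations that couple every parameter to the (unknown) component memberships and admit no closed-form solution:

```latex
\ell(\alpha, \theta) = \sum_{j=1}^{N} \log\!\left( \sum_{i=1}^{k} \alpha_i \, \Pr[x_j \mid \theta_i] \right)
```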

  15. Expectation-Maximization • General method for doing maximum likelihood in the presence of hidden variables • Identified by Dempster, Laird, & Rubin (1977) • Called the “EM algorithm”, but is really more of a “meta-algorithm”: recipe for writing algorithms • Works in general when you have: • Probability distribution over some data set • Missing feature/label values for some/all data points • Special cases: • Gaussian mixtures • Hidden Markov models • Kalman filters • POMDPs
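
The generic recipe, stated in the usual notation (not shown on the slide itself): with observed data X, hidden variables Z, and parameters θ, iterate

```latex
\text{E-step:}\quad Q(\theta \mid \theta^{(t)}) =
    \mathbb{E}_{Z \mid X, \theta^{(t)}}\big[\log \Pr[X, Z \mid \theta]\big]

\text{M-step:}\quad \theta^{(t+1)} = \arg\max_{\theta} \; Q(\theta \mid \theta^{(t)})
```

Each iteration is guaranteed not to decrease the observed-data likelihood.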

  16. The Gaussian mixture case • Assume: data generated from 1-d mixture of Gaussians: • Whole data set: • Introduce a “responsibility” variable: • If you know model params, can calculate responsibilities
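
The responsibility formula is missing from the transcript; the standard E-step expression, which is just Bayes' rule over the hidden component label, is:

```latex
z_{ij} = \Pr[y_i = j \mid x_i, \theta]
       = \frac{\alpha_j \, \mathcal{N}(x_i ; \mu_j, \sigma_j^2)}
              {\sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x_i ; \mu_l, \sigma_l^2)}
```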

  17. Parameterizing responsibly • Assume you know the responsibilities, zij • Can use this to find parameters for each Gaussian (think about special case where zij=0 or 1):
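
Putting slides 16-17 together, here is a minimal EM sketch for a 1-d Gaussian mixture (variable names and the initialization scheme are my own; an illustration, not the course's reference implementation):

```python
import numpy as np

def em_gmm_1d(x, k, n_iters=100, seed=0):
    """Fit a k-component 1-d Gaussian mixture to the 1-d array x by EM."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Crude initialization: random means drawn from the data, shared variance,
    # equal mixing weights.
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    alpha = np.full(k, 1.0 / k)

    for _ in range(n_iters):
        # E-step: responsibilities z[i, j] = Pr[point i came from component j].
        dens = (alpha / np.sqrt(2 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
        z = dens / dens.sum(axis=1, keepdims=True)

        # M-step: responsibility-weighted ML estimates.
        nk = z.sum(axis=0)                          # effective cluster sizes
        mu = (z * x[:, None]).sum(axis=0) / nk
        var = (z * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        alpha = nk / n

    return alpha, mu, var
```

The M-step updates are exactly the responsibility-weighted versions of the usual Gaussian ML estimates, which reduce to per-cluster means and variances in the special case zij = 0 or 1 that the slide asks you to think about.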
