1 / 56

Generative Models for Image Understanding

Generative Models for Image Understanding. Nebojsa Jojic and Thomas Huang Beckman Institute and ECE Dept. University of Illinois. Problem: Summarization of High Dimensional Data. Pattern Analysis: For several classes c=1,..,C of the data, define probability distribution functions p(x| c)

toki
Download Presentation

Generative Models for Image Understanding

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Generative Models for Image Understanding Nebojsa Jojic and Thomas Huang Beckman Institute and ECE Dept. University of Illinois

  2. Problem: Summarization of High Dimensional Data • Pattern Analysis: • For several classes c=1,..,C of the data, define probability distribution functions p(x| c) • Compression: • Define a probabilistic model p(x) and devise an optimal coding approach • Video Summary: • Drop most of the frames in a video sequence and keep interesting information that summarizes it.

  3. Generative density modeling • Find a probability model that • reflects desired structure • randomly generates plausible images, • represents the data by parameters • ML estimation • p(image|class) used for recognition, detection, ...

  4. Problems we attacked • Transformation as a discrete variable in generative models of intensity images • Tracking articulated objects in dense stereo maps • Unsupervised learning for video summary • Idea - the structure of the generative model reveals the interesting objects we want to extract.

  5. Mixture of Gaussians c P(c) = pc The probability of pixel intensities z given that the image is from cluster c is p(z|c) = N(z;mc, Fc) z

  6. z p(z|c) = N(z;mc, Fc) Mixture of Gaussians c P(c) = pc • Parameterspc,mc and Fc represent the data • For input z, the cluster responsibilities are P(c|z) = p(z|c)P(c) /Scp(z|c)P(c)

  7. m1= F1= p1= 0.6, m2= F2= p2= 0.4, Example: Simulation P(c) = pc c=1 z= p(z|c) = N(z;mc, Fc)

  8. m1= F1= p1= 0.6, m2= F2= p2= 0.4, Example: Simulation P(c) = pc c=2 z= p(z|c) = N(z;mc, Fc)

  9. Example: Learning - E step P(c|z) c=1 0.52 m1= F1= p1= 0.5, c 0.48 c=2 m2= F2= p2= 0.5, z= Images from data set

  10. Example: Learning - E step P(c|z) c=1 0.48 m1= F1= p1= 0.5, c 0.52 c=2 m2= F2= p2= 0.5, z= Images from data set

  11. Example: Learning - M step m1= F1= p1= 0.5, c m2= F2= p2= 0.5, Set m1 to the average of zP(c=1|z) z Set m2 to the average of zP(c=2|z)

  12. Example: Learning - M step m1= F1= p1= 0.5, c m2= F2= p2= 0.5, Set F1 to the average of diag((z-m1)T (z-m1))P(c=1|z) z Set F2 to the average of diag((z-m2)T (z-m2))P(c=2|z)

  13. Transformation as a Discrete Latent Variable with Brendan J. Frey Computer Science, University of Waterloo, Canada Beckman Institute & ECE, Univ of Illinois at Urbana

  14. Kind of data we’re interested in Even after tracking, the features still have unknown positions, rotations, scales, levels of shearing, ...

  15. Oneapproach Images Labor Normalization Normalized images Pattern Analysis

  16. Ourapproach Images Joint Normalization and Pattern Analysis

  17. What transforming an image does in the vector space of pixel intensities • A continuous transformation moves an image, , along a continuous curve • Our subspace model should assign images near this nonlinear manifold to the same point in the subspace

  18. Tractable approaches to modeling the transformation manifold \ Linear approximation - good locally • Discrete approximation - good globally

  19. Adding “transformation” as a discrete latent variable • Say there are N pixels • We assume we are given a set of sparse N x Ntransformation generating matricesG1,…,Gl,…,GL • These generate points from point

  20. Transformed Mixture of Gaussians P(c) = pc c p(z|c) = N(z;mc, Fc) P(l) = rl l z p(x|z,l) = N(x; Glz, Y) • rl,pc,mc and Fc represent the data • The cluster/transf responsibilities, P(c,l|x), are quite easy to compute x

  21. m1= F1= m2= F2= Example: Simulation G1 = shift left and up, G2 = I, G3 = shift right and up c=1 z= l=1 x=

  22. c l z x ML estimation of a Transformed Mixture of Gaussians using EM • E step: Compute P(l|x), P(c|x) and p(z|c,x) for each x in data • M step: Set • pc = avg of P(c|x) • rl = avg of P(l|x) • mc = avg mean of p(z|c,x) • Fc = avg variance of p(z|c,x) • Y = avg var of p(x-Gl z|x)

  23. Face Clustering Examples of 400 outdoor images of 2 people (44 x 28 pixels)

  24. Mixture of Gaussians 15 iterations of EM (MATLAB takes 1 minute) Cluster means c = 1 c = 2 c = 3 c = 4

  25. Transformed mixture of Gaussians 30 iterations of EM Cluster means c = 1 c = 2 c = 3 c = 4

  26. Video Analysis Using Generative Models with Brendan Frey, Nemanja Petrovic and Thomas Huang

  27. Idea • Use generative models of video sequences to do unsupervised learning • Use the resulting model for video summarization, filtering, stabilization, recognition of objects, retrieval, etc.

  28. c c l l z z x x t-1 t Transformed Hidden Markov Model P(c,l|past)

  29. THMM Transition Models • Independent probability distributions for class and transformations; relative motion P(ct , lt | past)= P(ct | ct-1) P(d(lt , l t-1)) • Relative motion dependent on the class P(ct , lt | past)= P(ct | ct-1) P(d(lt , l t-1) | ct) • Autoregressive model for transformation distribution

  30. Inference in THMM • Tasks: • Find the most likely state at time t given the whole observed sequence {xt} and the model parameters (class means and variances, transition probabilities, etc.) • Find the distribution over states for each time t • Find the most likely state sequence • Learn the parameters that maximize he likelihood of the observed data

  31. c Video summary Image segmentation l z Removal of sensor noise x Image Stabilization Video Summary and Filtering p(z|c) = N(z;mc, Fc) p(x|z,l) = N(x; Glz, Y)

  32. mc 1 class 121 translations (11 vertical and 11 horizontal shifts) Fc mc Fc 5 classes Example: Learning • Hand-held camera • Moving subject • Cluttered background DATA

  33. Examples • Normalized sequence • Simulated sequence • De-noising • Seeing through distractions

  34. Future work • Fast approximate learning and inference • Multiple layers • Learning transformations from images Nebojsa Jojic: www.ifp.uiuc.edu/~jojic

  35. Subspace models of imagesExample: Image, R 1200 = f (y, R 2) Shut eyes Frown

  36. Factor analysis (generative PCA) p(y) = N(y; 0, I) y The density of pixel intensities z given subspace point y is p(z|y) = N(z;m+Ly, F) z Manifold: f (y) = m+Ly, linear

  37. Factor analysis (generative PCA) p(y) = N(y; 0, I) y p(z|y) = N(z;m+Ly, F) • Parametersm,L represent the manifold • Observing z induces a Gaussian p(y|z): COV[y|z] = (LTF-1L+I)-1 E[y|z] = COV[y|z] LTF-1 z z

  38. m = Shut eyes Frown Example: Simulation SE y L = p(y) = N(y; 0, I) Frn p(z|y) = N(z;m+Ly, F) z

  39. m = Shut eyes Frown Example: Simulation SE y L = p(y) = N(y; 0, I) Frn p(z|y) = N(z;m+Ly, F) z

  40. m = Shut eyes Frown Example: Simulation SE y L = p(y) = N(y; 0, I) Frn p(z|y) = N(z;m+Ly, F) z

  41. Transformed Component Analysis p(y) = N(y; 0, I) y p(z|y) = N(z;m+Ly, F) P(l) = rl z l The probability of observed imagex is p(x|z,l) = N(x; Glz, Y) x

  42. m = L = Shut eyes Frown Example: Simulation G1 = shift left & up, G2 = I, G3 = shift right & up SE y Frn z l=3 x

  43. G G a a r r b b a a g g e e Example: Inference G1 = shift left & up, G2 = I, G3 = shift right & up SE SE SE y y y Frn Frn Frn P(l=1|x) = .01 P(l=2|x) = .01 P(l=3|x) = .98 z z z l=1 l=2 l=3 x x x

  44. EM algorithm for TCA • Initialize m, L,F, r, Y to random values • E Step • For each training case x(t), infer q(t)(l,z,y) = p(l,z,y |x(t)) • M Step • Compute mnew,Lnew,Fnew,rnew,Ynewto maximize St E[ log p(y) p(z|y) P(l) p(x(t)|z,l)], where E[] is wrt q(t)(l,z,y) • Each iteration increases log p(Data)

  45. A tough toy problem • 144, 9 x 9 images • 1 shape (pyramid) • 3-D lighting • cluttered background • 25 possible locations

  46. 1st 8 principal components: TCA: • 3 components • 81 transformations - 9 horiz shifts - 9 vert shifts • 10 iters of EM • Model generates realistic examples m L:1L:2 L:3 F Y

  47. Expression modeling • 100 16 x 24 training images • variation in expression • imperfect alignment

  48. Factor Analysis: Mean + 10 factors after 70 its of EM TCA: Mean + 10 factors after 70 its of EM PCA: Mean + 1st 10 principal components

  49. Fantasies from FA model Fantasies from TCA model

  50. Modeling handwritten digits • 200 8 x 8 images of each digit • preprocessing normalizes vert/horiz translation and scale • different writing angles (shearing) - see “7”

More Related