Expectation-Maximization (EM) Case Studies
CS479/679 Pattern Recognition
Dr. George Bebis
Case Study 1: Object Tracking • S. McKenna, Y. Raja, and S. Gong, "Tracking color objects using adaptive mixture models", Image and Vision Computing, vol. 17, pp. 225-231, 1999.
Problem
• Tracking color objects in real-time assuming:
• Varying illumination, viewing geometry, and camera parameters
[Figure: example of face tracking]
Proposed Approach
• Model the color distribution using an adaptive Mixture of Gaussians (MoGs) model.
[Figure: object color distribution in Hue-Saturation space]
Why Use Adaptive Color Mixture Models?
• Non-adaptive models have given good results under large rotations in depth, scale changes, and partial occlusions.
• Dealing with large illumination changes, however, requires an adaptive model.
Color Representation
• RGB values are converted to the HSI (Hue-Saturation-Intensity) color representation system.
• Only the H and S components were used; "I" was discarded to better handle changes in ambient illumination.
• Pixels with low S values or very high I values were discarded (i.e., H is not reliable there).
Color Mixture Models
• Each pixel is represented by a 2D vector x_i = (H_i, S_i).
• The object color distribution is modeled as a mixture of K Gaussians: p(x_i|O) = Σ_{k=1..K} π_k p(x_i|k).
• Bayes' rule gives the posterior P(O|x_i) = p(x_i|O)P(O)/p(x_i); if P(O|x_i) > T, then x_i belongs to O.
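A minimal sketch of this model in Python (not the authors' implementation): fit a K-component Gaussian mixture to (H, S) samples from object pixels with scikit-learn, then classify pixels by thresholding the model density. K = 3, the synthetic training data, and the threshold are illustrative assumptions, and thresholding the likelihood stands in for the paper's posterior test.

```python
# Sketch: MoG color model over (H, S) pixel values, using scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Illustrative training data: (H, S) pairs sampled from object pixels.
object_pixels = rng.normal(loc=[0.1, 0.6], scale=0.05, size=(500, 2))

# Fit p(x | O) as a K-component Gaussian mixture (K = 3 is an assumption).
mog = GaussianMixture(n_components=3, covariance_type="full").fit(object_pixels)

def belongs_to_object(x, log_threshold=-2.0):
    """Classify a pixel x = (H, S) by thresholding log p(x | O)."""
    return mog.score_samples(np.atleast_2d(x))[0] > log_threshold

print(belongs_to_object([0.1, 0.6]))   # near the object color -> True
print(belongs_to_object([0.9, 0.1]))   # far from it -> False
```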
Main Steps
• Initialization: use a predetermined generic object color model to initialize (or re-initialize) the tracker.
• Tracking: the model adapts and improves its performance by becoming specific to the observed conditions.
[Figure: a search window is tracked over frames r-1, r, r+1 while the model p_{r-1}(x_i|O) is updated to p_r(x_i|O)]
Assumptions
• The number of mixture components K is fixed:
• In general, the number of components needed to accurately model the color of an object does not change significantly with changing viewing conditions.
• Adapting the number of mixture components K might work better!
Initialization of mixture's parameters
• Each component k is parameterized by θ_k = (μ_k, Σ_k), together with a prior probability π_k.
• The parameters of the generic color model are estimated off-line from training data (e.g., using EM).
Model Adaptation
• The adaptive estimate at frame r is computed from the data of the last L+1 frames, τ = r, r-1, …, r-L.
[Figure: models p_{r-L-1}(x_i|O), …, p_{r-1}(x_i|O), p_r(x_i|O) over a sliding window of L frames]
Model Adaptation (cont’d)
• At each frame τ, EM-style sufficient statistics (sums of component posteriors over the object pixels O_τ) are computed; the mixture parameters at frame r are re-estimated from these statistics accumulated over the window.
Model Adaptation (cont’d)
• Efficient computation of p_r(x_i|O): the window statistics can be updated incrementally, adding the contribution of frame r and dropping that of frame r-L-1, instead of recomputing over all L+1 frames.
• Important: this makes the per-frame cost independent of the window length L.
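A sketch of one way to implement this, assuming each frame contributes additive per-component sufficient statistics (a posterior count, a weighted sum, and a weighted sum of outer products); the class and variable names are illustrative, not from the paper.

```python
# Sketch: sliding-window sums of per-frame sufficient statistics
# for one mixture component over the last L+1 frames.
from collections import deque
import numpy as np

class SlidingStats:
    def __init__(self, window):
        self.frames = deque(maxlen=window)
        self.n = 0.0                         # sum of component posteriors
        self.sx = np.zeros(2)                # weighted sum of x
        self.sxx = np.zeros((2, 2))          # weighted sum of x x^T

    def add_frame(self, n, sx, sxx):
        if len(self.frames) == self.frames.maxlen:
            old_n, old_sx, old_sxx = self.frames[0]   # frame r-L-1 drops out
            self.n -= old_n; self.sx -= old_sx; self.sxx -= old_sxx
        self.frames.append((n, sx, sxx))
        self.n += n; self.sx += sx; self.sxx += sxx

    def mean_cov(self):
        mu = self.sx / self.n
        return mu, self.sxx / self.n - np.outer(mu, mu)

stats = SlidingStats(window=5)
rng = np.random.default_rng(0)
for _ in range(8):                           # only the last 5 frames count
    x = rng.normal([0.1, 0.6], 0.05, size=(100, 2))
    stats.add_frame(len(x), x.sum(axis=0), x.T @ x)
print(stats.mean_cov())
```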
Model Adaptation (cont’d)
• The derivation of the adaptation equations is given in Appendix A of the paper.
Example: no adaptation (non-adaptive model – moving camera)
Example: adaptation (adaptive model – moving camera)
Selective Adaptation
• If the model adapts while tracking has failed or the object is occluded, it adapts to erroneous data (e.g., background or occluder colors) and can drift away from the true object color.
• How should we deal with this issue? Use selective adaptation!
Selective Adaptation (cont’d)
• Adapt only when the observed data are well explained by the current model, i.e., when their normalized log-likelihood is sufficiently high (an adaptive threshold is being used – see paper).
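A sketch of the selective-adaptation test; the median-based rule below is an illustrative stand-in for the paper's adaptive threshold.

```python
# Sketch: adapt only when the current frame's normalized log-likelihood
# is consistent with its recent history.
import numpy as np

def should_adapt(loglike_history, current_loglike, k=2.0):
    recent = np.asarray(loglike_history)
    med = np.median(recent)
    dev = np.median(np.abs(recent - med)) + 1e-9   # robust spread estimate
    return current_loglike > med - k * dev

history = [-1.1, -1.0, -1.2, -0.9, -1.05]
print(should_adapt(history, -1.0))   # normal frame -> adapt
print(should_adapt(history, -5.0))   # likely occlusion/failure -> freeze model
```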
Example: adapt at each frame (no selective adaptation – moving camera)
Example: selective adaptation (selective adaptation – moving camera)
Case Study 2: Background Modeling • C. Stauffer and W.E.L. Grimson, "Adaptive background mixture models for real-time tracking", IEEE Computer Vision and Pattern Recognition Conference, Vol. 2, pp. 246-252, 1999.
Problem
• Real-time segmentation and tracking of moving objects in image sequences.
• In general, we can assume:
• Fixed or moving camera
• Static or varying background
• This paper: fixed camera, varying background.
Requirements
• Need to handle:
• Variations in lighting (gradual or sudden)
• Multiple moving objects
• Moving scene clutter (e.g., swaying tree branches)
• Arbitrary changes (e.g., parked cars, camera oscillations)
Traditional Moving Object Detection Approaches
• A common method for segmentation and tracking of moving regions involves background subtraction:
(1) Subtract a model of the background from the current frame.
(2) Threshold the difference image.
[Figure: current frame - background model = result of subtraction (after thresholding)]
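A minimal numpy sketch of these two steps (the grayscale frames and the threshold value are illustrative assumptions).

```python
# Sketch: background subtraction followed by thresholding.
import numpy as np

def foreground_mask(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the background model
    exceeds the threshold."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

background = np.full((120, 160), 100, dtype=np.uint8)   # flat gray scene
frame = background.copy()
frame[40:60, 50:80] = 200                               # a bright moving object
print(foreground_mask(frame, background).sum(), "foreground pixels")
```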
Traditional Moving Object Detection Approaches (cont’d)
• How would one obtain a good background model?
• Non-adaptive background models have serious limitations.
[Figure: current frame, background model, and result of subtraction (after thresholding)]
Traditional Approaches for Background Modeling
• Frame differencing:
• The estimated background is simply the previous frame.
• Works only for certain object speeds and frame rates.
• Sensitive to the choice of the threshold.
[Figure: absolute difference thresholded with a low vs. a high threshold]
Traditional Approaches for Background Modeling (cont’d)
• Averaging (or median) over time: estimate the background as the per-pixel average (or median) of the last N frames.
J. Wang, G. Bebis, Mircea Nicolescu, Monica Nicolescu, and R. Miller, "Improving Target Detection by Coupling It with Tracking", Machine Vision and Applications, vol. 20, no. 4, pp. 205-223, 2009.
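A sketch of the median-over-time estimate on synthetic frames (N and the pixel values are illustrative): because a transient object covers a pixel in only a few of the N frames, the per-pixel median ignores it.

```python
# Sketch: estimate the background as the per-pixel median over N frames.
import numpy as np

rng = np.random.default_rng(1)
N, H, W = 21, 120, 160
frames = np.clip(rng.normal(100, 5, size=(N, H, W)), 0, 255)

frames[:4, 40:60, 50:80] = 200        # object present in only 4 of 21 frames

background = np.median(frames, axis=0)
print(background[50, 60])             # close to 100, not 200
```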
Traditional Approaches for Background Modeling (cont’d)
• Not robust when the scene contains multiple, slowly moving objects.
[Figure: new frame and the objects detected by background subtraction]
Proposed Approach – Key Ideas
• Background Model:
• Model the pixel values at each image location as a MoG (i.e., N² MoGs for an N x N image).
• Use an on-line approximation to update the model parameters.
Main Steps
(1) Use the background model to classify each pixel as background or foreground.
(2) Group foreground pixels and track them from frame to frame using a multiple hypothesis tracker.
(3) Update the model parameters to deal with:
• lighting changes
• slow-moving objects
• repetitive motions of scene elements (e.g., swaying trees)
• long-term scene changes (e.g., parked cars)
Why model pixel values using MoGs?
[Figure: (red, green) values of two pixels observed over 2 minutes; flickering and specularities produce multiple clusters]
Modeling pixel values using MoGs
• Consider the values of a particular pixel over time as a "pixel process" (i.e., a time series).
• Model each pixel process {X1, X2, …, Xt} as a MoG.
Modeling pixel values using MoGs (cont’d)
• The probability of the current pixel value is modeled as P(X_t) = Σ_{i=1..K} π_{i,t} η(X_t; μ_{i,t}, Σ_{i,t}), where η is a Gaussian density and π_{i,t} are the mixture weights.
• The covariance is assumed to be Σ_{i,t} = σ_i² I (i.e., R, G, B are independent with the same variance).
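A sketch of evaluating this per-pixel mixture density with isotropic covariances; the weights, means, and variances are illustrative.

```python
# Sketch: P(X_t) = sum_i pi_i * eta(X_t; mu_i, sigma_i^2 I) for one pixel.
import numpy as np

def gaussian_iso(x, mu, sigma2):
    """Gaussian density in d dimensions with covariance sigma2 * I."""
    d = len(mu)
    diff = x - mu
    return (2 * np.pi * sigma2) ** (-d / 2) * np.exp(-0.5 * diff @ diff / sigma2)

# One pixel's K = 3 mixture over RGB values.
pi = np.array([0.6, 0.3, 0.1])
mu = np.array([[100., 100., 100.], [200., 50., 50.], [30., 30., 200.]])
sigma2 = np.array([36., 64., 100.])

x = np.array([102., 98., 101.])
print(sum(pi[i] * gaussian_iso(x, mu[i], sigma2[i]) for i in range(3)))
```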
Determining the “background” Gaussians
• Recent history {X1, X2, …, Xt} of a pixel:
• Some of its values might be due to changes caused by moving objects.
• Others might be due to background changes (e.g., swaying trees, parked cars).
• Need to determine which Gaussians in the mixture represent the “background” process.
[Figure: pixel values accumulated over 2 minutes]
Determining the “background” Gaussians (cont’d) • Two criteria are being used to determine which Gaussians represent the “background” process: • Variance: moving objects are expected to produce more variance than “static” (background) objects. • Persistence: there should be more data supporting the background Gaussians because they are repeated, whereas pixel values from moving objects are often not the same color.
Determining the background Gaussians (cont’d)
• The following heuristic is used to determine the "background" Gaussians: “Choose the Gaussians which have high persistence and low variance.”
Determining the background Gaussians (cont’d)
• To implement this idea, the Gaussians are ordered by the value of π_i/σ_i (where π_i is the prior probability and σ_i the standard deviation).
• Choose the first B distributions as the background process, where B is the smallest number of leading Gaussians whose priors sum to more than a threshold T.
• T is the expected fraction of the image occupied by background: T = (# background pixels) / (# total pixels).
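In code, the selection might look like the following sketch (the weights, variances, and T are illustrative).

```python
# Sketch: choose the "background" Gaussians by ordering on pi/sigma and
# taking the smallest prefix whose priors sum past T.
import numpy as np

def background_components(pi, sigma, T):
    order = np.argsort(pi / sigma)[::-1]     # high persistence, low variance first
    cumulative = np.cumsum(pi[order])
    B = np.searchsorted(cumulative, T, side="right") + 1
    return order[:B]

pi = np.array([0.5, 0.3, 0.2])
sigma = np.array([5.0, 20.0, 8.0])
print(background_components(pi, sigma, T=0.6))   # -> [0 2]
```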
Pixel classification
• Classify a pixel as background if the Gaussian which represents it most effectively is a “background” Gaussian; otherwise, classify it as foreground.
• A match is defined if the pixel value is within 2.5σ of a Gaussian distribution.
• In essence, each pixel has its own threshold!
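A sketch of the match test, with 2.5σ as in the paper; the pixel and component values are illustrative.

```python
# Sketch: a pixel matches a Gaussian if it lies within 2.5 standard
# deviations of the mean, so each component carries its own threshold.
import numpy as np

def matches(x, mu, sigma, k=2.5):
    return np.linalg.norm(x - mu) < k * sigma

x = np.array([105., 98., 102.])
print(matches(x, np.array([100., 100., 100.]), sigma=6.0))   # True
print(matches(x, np.array([200., 50., 50.]), sigma=8.0))     # False
```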
Updating model parameters
• New observations are integrated into the model using standard learning rules (running EM for every pixel would be very costly).
• If a match is found, the prior probabilities of the Gaussians are updated as π_{i,t} = (1-α) π_{i,t-1} + α M_{i,t}, where M_{i,t} = 1 for the matched Gaussian and 0 otherwise (i.e., exponential forgetting).
Updating model parameters (cont’d)
• The parameters of the matched Gaussian i are updated as μ_t = (1-ρ) μ_{t-1} + ρ X_t and σ_t² = (1-ρ) σ_{t-1}² + ρ (X_t - μ_t)ᵀ(X_t - μ_t), where ρ = α η(X_t; μ_i, σ_i) (i.e., exponential forgetting).
• The parameters of the unmatched Gaussians remain unchanged.
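The two update rules above, as a short sketch; α and the sample values are illustrative, and ρ is passed in directly rather than computed as α·η(X_t; μ_i, σ_i).

```python
# Sketch: exponential-forgetting updates for one pixel's mixture after a match.
import numpy as np

def update_priors(pi, matched, alpha=0.01):
    """pi_i <- (1 - alpha) * pi_i + alpha * M_i, M_i = 1 only if i matched."""
    M = np.zeros_like(pi)
    M[matched] = 1.0
    pi = (1 - alpha) * pi + alpha * M
    return pi / pi.sum()                     # renormalize the priors

def update_matched(x, mu, sigma2, rho):
    """Move the matched Gaussian's mean and variance toward the observation."""
    mu = (1 - rho) * mu + rho * x
    sigma2 = (1 - rho) * sigma2 + rho * float((x - mu) @ (x - mu))
    return mu, sigma2

print(update_priors(np.array([0.5, 0.3, 0.2]), matched=0))
print(update_matched(np.array([102., 98., 101.]),
                     np.array([100., 100., 100.]), 36.0, rho=0.05))
```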
Updating model parameters (cont’d) • If a match is not found, the least probable distribution is replaced with a new Gaussian distribution. • Mean is set to the pixel value • High variance initially • Low prior weight
Grouping and Tracking
• Foreground pixels are grouped into regions (e.g., using connected components), as sketched below.
• Moving regions are tracked from frame to frame.
• A pool of Kalman filters is used to track the moving regions (see paper for more details).
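A sketch of the grouping step using scipy's connected-components labeling (the mask is illustrative; the Kalman-filter tracking pool is beyond a few lines).

```python
# Sketch: group foreground pixels into regions via connected components.
import numpy as np
from scipy import ndimage

mask = np.zeros((10, 10), dtype=bool)
mask[1:4, 1:4] = True                 # one moving region
mask[6:9, 5:9] = True                 # another moving region

labels, num_regions = ndimage.label(mask)
print(num_regions)                    # 2
print(ndimage.center_of_mass(mask, labels, range(1, num_regions + 1)))
```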
Experiments and results
• The system was tested continuously for 16 months (24 hrs/day, through rain and snow).
• Processing speed: 11-13 frames per second.
• Each frame was 160 x 120 pixels.
http://www.ai.mit.edu/projects/vsam
Results
• Examples of pedestrian and vehicle detections (the aspect ratio was used to filter out inconsistent detections).