Optimization & Learning for Registration of Moving Dynamic Textures

Junzhou Huang (1), Xiaolei Huang (2), Dimitris Metaxas (1) • (1) Rutgers University, (2) Lehigh University

Outline • Background • Goals & Problems • Related Work • Proposed Method • Experimental Results • Discussion & Conclusion

Background • Dynamic Textures (DT) • Captured by a static camera, exhibiting certain stationary properties • Moving Dynamic Textures (MDT) • Dynamic textures captured by a moving camera • Example figures: DT [Kwatra et al. SIGGRAPH’03], MDT [Fitzgibbon ICCV’01]

Background • Video registration • Required by many video analysis applications • Traditional assumption • Static, rigid, brightness constancy • Bergen et al. ECCV’92, Black et al. ICCV’93 • Relaxing rigidity assumption • Dynamic textures • Fitzgibbon, ICCV’01; Doretto et al. IJCV’03; Yuan et al. ECCV’04; Chan et al. NIPS’05; Vidal et al. CVPR’05; Lin et al. PAMI’07; Rav-Acha et al. Dynamic Vision Workshop at ICCV’05; Vidal et al. ICCV’07

Our Goal • Registration of Moving Dynamic Textures • Recover the camera motion and register image frames in the MDT image sequence • Example figures: translation to the left, translation to the right

Complex Optimization Problem • Complex optimization • W.r.t. camera motion, dynamic texture model • Chicken-and-Egg Problem • Challenges • About the mean images • About Linear Dynamic System (LDS) model • About the camera motion

Related Work • Fitzgibbon, ICCV’01 • Pioneering attempt • Stochastic rigidity • Non-linear optimization • Vidal et al. CVPR’05 • Time-varying LDS model • Static assumption in small time windows • Simple and general framework, but often underestimates motion

Formulation • Registration of MDT • I(t), the video frame • , camera motion parameters • y0 , the desired average image of the video • y(t), appearance of DT • x(t), dynamics of DT

Generative Model • Generative image model for an MDT
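The equation on this slide is not present in the text. For orientation only, one generic way to write a generative model with the symbols of the Formulation slide is sketched below, assuming the standard linear dynamic system used for dynamic textures (Doretto et al. IJCV’03). The warp operator \mathcal{W}, the motion symbol m(t), the matrices A and C, and the noise terms v(t), w(t) are all assumptions introduced for illustration; the priors slides indicate that the proposed method replaces the fixed linear mapping with a GPDM / MAR prior.

```latex
\begin{aligned}
x(t+1) &= A\,x(t) + v(t)              && \text{(dynamics of the DT)} \\
y(t)   &= C\,x(t) + w(t)              && \text{(appearance of the DT)} \\
I(t)   &= \mathcal{W}\big(y_0 + y(t);\, m(t)\big) && \text{(frame seen by the moving camera)}
\end{aligned}
```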

First Observation • Good registration • A good registration according to the accurate camera motion should simplify the dynamic texture model while preserving all useful information • Used by Fitzgibbon, ICCV’01: minimizing the entropy function of an autoregressive process • Used by Vidal et al., CVPR’05: optimizing a time-varying LDS model by optimizing a piecewise LDS model

Second Observation • Good registration • A good registration according to the accurate camera motion should lead to a sharp average image whose statistics of derivative filters are similar to those of the input image frames. • Statistics of derivative filters in images • Student-t distribution/heavy-tailed image priors • Huang et al. CVPR’99, Roth et al. CVPR’05

Prior Models • The Average Image Prior • The Motion Prior • The Dynamics Prior

Average Image Priors • Student-t distribution • Model parameters / contrastive divergence method • Figure panels: (a) before registration, (b) in the middle of registration, (c) after registration
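A minimal sketch of how a Student-t prior on derivative filter responses can be evaluated for a candidate average image. It assumes simple first-order finite differences as the derivative filters and the unnormalized expert form (1 + r²/2)^(-alpha); `alpha` and `scale` are hypothetical fixed values, whereas the slides learn the model parameters with contrastive divergence.

```python
import numpy as np

def derivative_responses(image):
    """Horizontal and vertical first-order finite differences, flattened."""
    image = np.asarray(image, dtype=float)
    dx = np.diff(image, axis=1).ravel()
    dy = np.diff(image, axis=0).ravel()
    return np.concatenate([dx, dy])

def student_t_energy(image, alpha=1.0, scale=1.0):
    """Negative log of an unnormalized Student-t prior on derivative responses.

    alpha and scale are hypothetical placeholders, not learned parameters.
    """
    r = derivative_responses(image) / scale
    return alpha * np.sum(np.log1p(0.5 * r**2))

# Toy images: a constant image and an image with one vertical step edge.
flat = np.zeros((8, 8))
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0

flat_energy = student_t_energy(flat)
edge_energy = student_t_energy(edge)
print(flat_energy, edge_energy)
```

The logarithmic penalty grows slowly for large responses, which is what makes the prior heavy-tailed compared with a quadratic (Gaussian) penalty.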

Motion / Dynamics Priors • Gaussian Perturbation (Motion) • Uncertainty in the motion is modeled by a Gaussian perturbation about the mean estimate M0 with covariance matrix S (a diagonal matrix) • Motivated by the work [Pickup et al. NIPS’06] • GPDM / MAR model (Dynamics) • Marginalizing over all possible mappings between appearance and dynamics • Motivated by the work [Wang et al. NIPS’05], [Moon et al. CVPR’06]
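As an illustration of the Gaussian motion prior, the sketch below evaluates the negative log-density of a motion vector under a Gaussian centered at M0 with diagonal covariance S. The function name and the two-parameter translation example are assumptions made for illustration; the slides do not give the exact parameterization.

```python
import numpy as np

def motion_neg_log_prior(m, m0, s_diag):
    """Negative log-density of N(m0, diag(s_diag)) evaluated at motion m."""
    m = np.asarray(m, dtype=float)
    m0 = np.asarray(m0, dtype=float)
    s_diag = np.asarray(s_diag, dtype=float)
    diff = m - m0
    # Quadratic penalty plus the log-normalizer of a diagonal Gaussian.
    return 0.5 * np.sum(diff**2 / s_diag + np.log(2.0 * np.pi * s_diag))

# Hypothetical example: mean estimate of one pixel per frame to the right.
m0 = np.array([1.0, 0.0])
s_diag = np.array([1.0, 1.0])
at_mean = motion_neg_log_prior(m0, m0, s_diag)
off_mean = motion_neg_log_prior([3.0, 0.0], m0, s_diag)
print(at_mean, off_mean)
```

A term of this kind can be added to an objective so that motion estimates far from M0 are penalized in proportion to the inverse variance of each parameter.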

Joint Optimization • Generative image model • Optimization • Final marginal likelihood • Scaled conjugate gradients algorithm (SCG)

Procedures • Obtaining image derivative prior model • Dividing the long sequence into many short image sequences • Initialization for video registration • Performing model optimization with the proposed prior models until model convergence. • With estimated y0, Y and X, the camera motion is then obtained iteratively by Maximum Likelihood estimation using SCG optimization

Obtaining Data • Three DT video sequences • DT data [Kwatra et al. SIGGRAPH’03] • Synthesized MDT video sequence • 60 frames each, no motion from 1st to 20th frame and from 41st to 60th • Camera motion with speed [1, 0] from 21st to 40th
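The synthesis protocol above can be sketched as cropping a sliding window out of a static-camera DT clip. This is an assumed reconstruction: the slides do not say how the motion was applied, the function name and crop sizes are hypothetical, and random noise stands in for the real DT data.

```python
import numpy as np

def synthesize_mdt(dt_frames, out_size, velocity=(1, 0), move_start=21, move_end=40):
    """Crop a sliding window out of a static-camera clip to mimic camera translation.

    dt_frames : array of shape (T, H, W), a dynamic texture from a static camera.
    out_size  : (height, width) of the cropped output frames.
    velocity  : (dx, dy) window shift in pixels per frame, non-negative here.
    move_start, move_end : 1-indexed first and last frame of the moving interval.
    """
    dt_frames = np.asarray(dt_frames)
    T, H, W = dt_frames.shape
    out_h, out_w = out_size
    dx, dy = velocity
    if dx < 0 or dy < 0:
        raise ValueError("this sketch only handles non-negative shifts")

    # Per-frame shift: zero outside the moving interval, `velocity` inside it.
    steps = np.zeros((T, 2), dtype=int)
    steps[move_start - 1:move_end] = (dx, dy)
    offsets = np.cumsum(steps, axis=0)  # cumulative (x, y) window position

    max_x, max_y = offsets.max(axis=0)
    if max_x + out_w > W or max_y + out_h > H:
        raise ValueError("source frames too small for the requested motion")

    frames = np.stack([
        dt_frames[t, oy:oy + out_h, ox:ox + out_w]
        for t, (ox, oy) in enumerate(offsets)
    ])
    return frames, offsets

# Stand-in for a real DT clip: 60 frames of random noise.
rng = np.random.default_rng(0)
dt_clip = rng.random((60, 32, 64))
mdt_clip, offsets = synthesize_mdt(dt_clip, out_size=(32, 40))
print(mdt_clip.shape, offsets[19], offsets[20], offsets[39], offsets[59])
```

The returned `offsets` array doubles as ground-truth camera motion, which is what makes a synthesized sequence convenient for quantitative evaluation.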

Grass MDT Video • The average image (a) One frame, (b) the average image after registration, (c) average image before registration

Grass MDT Video • The statistics of derivative filter responses

Evaluation / Comparison • False Estimation Fraction (FEF) • Comparison with two classical methods • Hybrid method, [Bergen et al. ECCV’92] [Black et al. ICCV’93] • Vidal’s method, [Vidal et al. CVPR’05]
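The slides do not spell out the FEF formula. The figures reported for the real-video experiment (110 pixels of true motion, 104.52 estimated, FEF 4.98%) are consistent with a relative error, |true - estimated| / |true|, so the sketch below assumes that definition; the function name is hypothetical.

```python
def false_estimation_fraction(true_motion, estimated_motion):
    """Relative error of the estimated cumulative motion (assumed FEF definition)."""
    if true_motion == 0:
        raise ValueError("FEF is undefined for zero ground-truth motion")
    return abs(true_motion - estimated_motion) / abs(true_motion)

# Figures quoted in the real-video experiment slides.
fef_ours = false_estimation_fraction(110.0, 104.52)
fef_vidal = false_estimation_fraction(85.0, 60.0)
print(f"{fef_ours:.2%}, {fef_vidal:.2%}")  # prints "4.98%, 29.41%"
```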

Waterfall MDT Video • Motion estimation • (a) Ground truth, (b) by the hybrid method, (c) by Vidal’s method, (d) by our method

Waterfall MDT Video • The average image and its statistics • The average image and its derivative filter response distribution after registration by: (a) our method, (b) Vidal’s method, (c) the hybrid method

FEF Comparison • On three synthesized MDT videos

Experiment on Real MDT Video • Moving flower bed video • 554 frames total • Ground truth motion 110 pixels • Estimation 104.52 pixels (FEF 4.98%)

Conclusions • Proposed: • Powerful priors for MDT registration • Solution for: • Camera motion, Average image of video, Dynamic texture model • What have we learned? • Correct registration simplifies DT model while preserving useful information • Better registration leads to sharper average image

Thank you!

Future work • More complex camera motion • Different metrics for performance evaluation • Multiple dynamic texture segmentation

Experiment on Real MDT Video • Moving flower bed video • Our method • 554 frames total • Ground truth motion 110 pixels • Estimation 104.52 pixels (FEF 4.98%) • Vidal’s method • 250 frames [Vidal et al. CVPR’05] • Ground truth motion 85 pixels • Estimation 60 pixels (FEF 29.41%)
