Optimization & Learning for Registration of Moving Dynamic Textures

Junzhou Huang (Rutgers University), Xiaolei Huang (Lehigh University), Dimitris Metaxas (Rutgers University)

Outline • Background • Goals & Problems • Related Works • Proposed Method • Experimental Results • Discussion & Conclusion

Background • Dynamic textures (DT) • Captured by a static camera; exhibit certain stationarity in time • Moving dynamic textures (MDT) • Dynamic textures captured by a moving camera • Examples: DT [Kwatra et al. SIGGRAPH’03]; MDT [Fitzgibbon ICCV’01]

Background • Video registration • Required by many video analysis applications • Traditional assumptions • Static, rigid scene; brightness constancy • Bergen et al. ECCV’92, Black et al. ICCV’93 • Relaxing the rigidity assumption • Dynamic textures • Doretto et al. IJCV’03, Yuan et al. ECCV’04, Chan et al. NIPS’05, Lin et al. PAMI’07, Rav-Acha et al. Workshop at ICCV’05

Our Goals • Registration of MDT • Recover the camera motion and register image sequences that contain moving dynamic textures (Figure: left translation, right translation)

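Complex Optimization Problems • Complex optimization • Joint estimation of the camera motion and the dynamic texture model • Chicken-and-egg problem • The camera motion is needed to estimate the DT model, and the DT model is needed to estimate the camera motion • Challenges • The mean image • The LDS model • The camera motion?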

Related Works • Fitzgibbon, ICCV’01 • Pioneering attempt • Stochastic rigidity • Non-linear optimization • Vidal et al. CVPR’05 • Time-varying LDS model • LDS assumed static within a small time window • Simple and general framework, but tends to underestimate the camera motion

Formulation • Registration of MDT • I(t): the video frame at time t • Camera motion parameters • y0: the desired average image of the video • y(t): related to the appearance of the DT • x(t): related to the dynamics of the DT

Generative Model • Generative image model for an MDT
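The generative model can be sketched in code. The slide gives the model only as a figure, so the sketch below assumes a standard LDS dynamic texture (in the style of Doretto et al.) added to the average image y0 and then shifted by an integer camera translation. `synthesize_mdt`, its parameters, and the circular-shift warp are hypothetical stand-ins, not the exact model of this work.

```python
import numpy as np


def synthesize_mdt(y0, C, A, x0, shifts, state_noise=1.0, obs_noise=0.0, seed=0):
    """Sample frames from a simplified MDT generative model (illustrative sketch).

    Assumed model: an LDS dynamic texture viewed by a translating camera.
        x(t+1) = A x(t) + v(t)                     dynamics, v ~ N(0, state_noise^2 I)
        y(t)   = C x(t)                            appearance of the DT
        I(t)   = warp(y0 + y(t); shift(t)) + w(t)  observed frame, w ~ N(0, obs_noise^2 I)
    The warp is a hypothetical integer circular shift (np.roll), not the
    camera model of the original work.
    """
    rng = np.random.default_rng(seed)
    h, w = y0.shape
    x = np.asarray(x0, dtype=float)
    frames = []
    for dy, dx in shifts:
        y = (C @ x).reshape(h, w)                               # DT appearance y(t)
        frame = np.roll(y0 + y, shift=(dy, dx), axis=(0, 1))    # camera motion
        frames.append(frame + obs_noise * rng.standard_normal((h, w)))
        x = A @ x + state_noise * rng.standard_normal(x.shape)  # advance x(t)
    return np.stack(frames)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h, w, n = 16, 16, 3
    y0 = rng.random((h, w))                      # average image
    C = 0.1 * rng.standard_normal((h * w, n))    # appearance basis
    A = 0.9 * np.eye(n)                          # stable dynamics
    frames = synthesize_mdt(y0, C, A, np.zeros(n), shifts=[(0, t) for t in range(10)])
    print(frames.shape)  # (10, 16, 16)
```

Setting C to zero removes the dynamic texture, leaving only the shifted average image, which is a quick way to check the camera-motion part in isolation.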

First Observation • Good registration • A good registration, i.e. one based on accurate camera motion, should simplify the dynamic texture model while preserving all useful information • Used by Fitzgibbon, ICCV’01: minimizing the entropy of an autoregressive process • Used by Vidal et al., CVPR’05: optimizing a time-varying LDS model via piecewise LDS models
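One way to see the first observation numerically is to count how many principal components are needed to explain most of the mean-subtracted frame energy. This is a hypothetical proxy, not Fitzgibbon's entropy criterion or Vidal's piecewise LDS fit. Correctly registered frames of a low-dimensional DT need few components; misregistered frames need many. The function name and the 95% energy threshold are assumptions.

```python
import numpy as np


def effective_rank(frames, energy=0.95):
    """Number of principal components needed to explain `energy` of the
    mean-subtracted frame energy (hypothetical proxy for DT model complexity)."""
    Y = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    Y = Y - Y.mean(axis=0)                  # subtract the temporal average image
    s = np.linalg.svd(Y, compute_uv=False)  # singular values, largest first
    power = s ** 2
    total = power.sum()
    if total <= 1e-12:                      # (numerically) identical frames
        return 0
    cumulative = np.cumsum(power) / total
    return int(np.searchsorted(cumulative, energy) + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, h, w = 20, 32, 32
    y0 = rng.random((h, w))
    basis = rng.standard_normal((2, h * w))           # rank-2 DT appearance
    registered = y0 + (rng.standard_normal((T, 2)) @ basis).reshape(T, h, w)
    # Simulate misregistration: frame t is shifted t pixels to the right.
    misregistered = np.stack([np.roll(f, t, axis=1) for t, f in enumerate(registered)])
    print(effective_rank(registered), effective_rank(misregistered))
```

The registered stack stays at rank 2 or less, while the shifted stack needs many more components to reach the same energy.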

Second Observation • Good registration • A good registration, i.e. one based on accurate camera motion, should lead to a sharp average image whose derivative filter statistics are similar to those of the input frames • Image statistics • Student-t distribution / heavy-tailed image priors • Huang et al. CVPR’99, Roth et al. CVPR’05
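The second observation compares derivative filter statistics of the average image with those of the input frames. Below is a minimal sketch, assuming simple forward-difference filters and using variance and excess kurtosis as summary statistics (the slides show full histograms). `derivative_stats` and `stats_gap` are hypothetical names.

```python
import numpy as np


def derivative_responses(img):
    """Horizontal and vertical forward-difference filter responses."""
    img = np.asarray(img, dtype=float)
    return np.diff(img, axis=1).ravel(), np.diff(img, axis=0).ravel()


def derivative_stats(img):
    """Variance and excess kurtosis of each derivative filter response."""
    stats = {}
    for name, r in zip(("dx", "dy"), derivative_responses(img)):
        var = r.var()
        kurt = np.mean((r - r.mean()) ** 4) / var ** 2 - 3.0 if var > 0 else 0.0
        stats[name] = {"var": float(var), "excess_kurtosis": float(kurt)}
    return stats


def stats_gap(avg, frames):
    """Mean relative gap between the derivative variance of the average image
    and the mean derivative variance of the input frames (0 = similar)."""
    a = derivative_stats(avg)
    gaps = []
    for k in ("dx", "dy"):
        ref = np.mean([derivative_stats(f)[k]["var"] for f in frames])
        gaps.append(abs(a[k]["var"] - ref) / ref)
    return float(np.mean(gaps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal((64, 64))
    shifted = [np.roll(frame, s, axis=1) for s in range(5)]   # misaligned frames
    print(stats_gap(np.mean(shifted, axis=0), shifted))        # large gap
    print(stats_gap(frame, [frame] * 5))                       # (near) zero gap
```

Averaging misaligned frames acts like a blur. The derivative variance of the average therefore drops well below that of the individual frames, and `stats_gap` grows.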

Prior Models • Average image prior • Motion prior • Dynamic prior

Average Image Priors • Student-t distribution • Model parameters learned with the contrastive divergence method • (Figure: (a) before registration, (b) in the middle of registration, (c) after registration)
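The Student-t prior on the average image can be written as an energy over derivative filter responses. This sketch assumes the Fields-of-Experts-style form E(y) = sum_j alpha_j * sum_pixels log(1 + 0.5 (J_j * y)^2), where J_j are the horizontal and vertical forward differences. The alpha_j values are placeholders for parameters that would be learned, e.g. by contrastive divergence, which is not implemented here. The gradient is included because a gradient-based optimizer needs it. `student_t_energy` is a hypothetical name.

```python
import numpy as np


def student_t_energy(img, alphas=(1.0, 1.0)):
    """Negative log of a Student-t derivative prior, up to a constant.

    E(y) = sum_j alpha_j * sum log(1 + 0.5 * (J_j * y)^2), with J_1 the horizontal
    and J_2 the vertical forward difference. `alphas` are placeholder values.
    Returns (energy, dE/dy).
    """
    y = np.asarray(img, dtype=float)
    grad = np.zeros_like(y)
    energy = 0.0
    for axis, alpha in zip((1, 0), alphas):
        r = np.diff(y, axis=axis)                     # filter response J_j * y
        energy += alpha * np.log1p(0.5 * r ** 2).sum()
        g = alpha * r / (1.0 + 0.5 * r ** 2)          # dE/dr
        # Back-propagate through r[i] = y[i+1] - y[i] along `axis`.
        upper = [slice(None), slice(None)]
        lower = [slice(None), slice(None)]
        upper[axis] = slice(1, None)
        lower[axis] = slice(None, -1)
        grad[tuple(upper)] += g
        grad[tuple(lower)] -= g
    return energy, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e, g = student_t_energy(rng.standard_normal((8, 8)))
    print(round(float(e), 3), g.shape)
```

The log term grows only logarithmically for large responses, which is what makes the prior heavy-tailed: strong edges are penalized far less than under a Gaussian (quadratic) prior.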

Motion / Dynamic Priors • Gaussian perturbation (motion) • Uncertainty in the motion is modeled as a Gaussian perturbation about the mean estimate M0 with a diagonal covariance matrix S • Motivated by [Pickup et al. NIPS’06] • GPDM / MAR model (dynamics) • Marginalizes over all possible mappings between appearance and dynamics • Motivated by [Wang et al. NIPS’05] [Moon et al. CVPR’06]
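Both priors have closed-form log densities. The sketch below takes the motion prior as N(M0, S) with diagonal S, as stated on the slide. For the appearance side it uses the GPLVM/GPDM-style marginal likelihood log p(Y | X) = -DN/2 log 2pi - D/2 log|K| - 1/2 tr(K^-1 Y Y^T), where the appearance-dynamics mapping is integrated out. The GPDM dynamics term on X is omitted. The RBF-plus-noise kernel, its hyperparameters theta, and all function names are assumptions.

```python
import numpy as np


def gaussian_motion_log_prior(m, m0, s_diag):
    """log N(m; M0, S) with diagonal covariance S = diag(s_diag)."""
    m, m0, s = (np.asarray(a, dtype=float) for a in (m, m0, s_diag))
    r = m - m0
    return float(-0.5 * np.sum(r ** 2 / s + np.log(2.0 * np.pi * s)))


def rbf_kernel(X, theta=(1.0, 1.0, 100.0)):
    """Assumed kernel k(xi, xj) = t1 * exp(-t2/2 * |xi - xj|^2) + delta_ij / t3."""
    t1, t2, t3 = theta
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return t1 * np.exp(-0.5 * t2 * d2) + np.eye(len(X)) / t3


def gp_log_marginal(Y, X, theta=(1.0, 1.0, 100.0)):
    """log p(Y | X) with the latent-to-appearance mapping integrated out.
    Y: (N, D) appearance vectors, X: (N, Q) latent dynamics states."""
    N, D = Y.shape
    K = rbf_kernel(X, theta)
    L = np.linalg.cholesky(K)                               # K = L L^T
    KinvY = np.linalg.solve(L.T, np.linalg.solve(L, Y))     # K^-1 Y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return float(-0.5 * (D * N * np.log(2.0 * np.pi) + D * logdet + np.sum(Y * KinvY)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((10, 2))
    Y = rng.standard_normal((10, 4))
    print(gp_log_marginal(Y, X), gaussian_motion_log_prior([1.0, 0.0], [1.2, 0.1], [0.1, 0.1]))
```

Using a Cholesky factor instead of an explicit inverse keeps the log-determinant and the trace term numerically stable.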

Joint Optimization • Generative image model • Optimization • Maximize the final marginal likelihood • Scaled conjugate gradients (SCG) algorithm
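The scaled conjugate gradients (SCG) algorithm (Møller, 1993) avoids a line search. It estimates curvature from a finite difference of gradients and regulates the step with a Levenberg-Marquardt style scale beta. Below is a self-contained sketch that follows the structure of the widely used Netlab implementation. It is a generic minimizer; the objective it would be applied to here (the negative log of the final marginal likelihood) is not reproduced. `scg` and its tolerance parameters are hypothetical names.

```python
import numpy as np


def scg(f, grad, x0, max_iters=500, tol_x=1e-8, tol_f=1e-10, sigma0=1e-4):
    """Minimize f with scaled conjugate gradients (Netlab-style sketch)."""
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    beta, beta_min, beta_max = 1.0, 1e-15, 1e100
    f_old = f(x)
    g_new = grad(x)
    g_old = g_new
    d = -g_new
    success, n_success = True, 0
    for _ in range(max_iters):
        if success:
            mu = d @ g_new
            if mu >= 0:                       # not a descent direction: restart
                d = -g_new
                mu = d @ g_new
            kappa = d @ d
            if kappa < np.finfo(float).eps:
                return x
            sigma = sigma0 / np.sqrt(kappa)
            # Finite-difference estimate of the curvature d^T H d.
            theta = d @ (grad(x + sigma * d) - g_new) / sigma
        delta = theta + beta * kappa          # scaled curvature
        if delta <= 0:                        # force a positive-definite estimate
            delta = beta * kappa
            beta = beta - theta / kappa
        alpha = -mu / delta                   # step size without a line search
        x_new = x + alpha * d
        f_new = f(x_new)
        ratio = 2.0 * (f_new - f_old) / (alpha * mu)  # actual vs. predicted decrease
        success = ratio >= 0
        if success:
            n_success += 1
            step = np.max(np.abs(alpha * d))
            x = x_new
            if step < tol_x and abs(f_new - f_old) < tol_f:
                return x
            f_old = f_new
            g_old, g_new = g_new, grad(x)
            if g_new @ g_new == 0:
                return x
        if ratio < 0.25:                      # poor model: increase damping
            beta = min(4.0 * beta, beta_max)
        if ratio > 0.75:                      # good model: decrease damping
            beta = max(0.5 * beta, beta_min)
        if n_success == n:                    # periodic restart
            d = -g_new
            n_success = 0
        elif success:                         # Polak-Ribiere-style conjugate update
            gamma = (g_old - g_new) @ g_new / mu
            d = gamma * d - g_new
    return x


if __name__ == "__main__":
    A = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.0]])
    b = np.array([1.0, 2.0, 3.0])
    x = scg(lambda v: 0.5 * v @ A @ v - b @ v, lambda v: A @ v - b, np.zeros(3))
    print(x, np.linalg.solve(A, b))
```

Because SCG needs only function values and gradients, the analytic gradients of the priors are what make such an optimizer usable on the joint objective.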

Procedures • Learn the image derivative prior model • Divide the long sequence into many short image sequences • Initialize the video registration • Optimize the model with the proposed priors until convergence • With the estimated y0, Y and X, obtain the camera motion
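The step that divides the long sequence can be sketched as a simple windowing function. The slides do not give the window length or say whether windows overlap, so `length` and `overlap` are hypothetical parameters, and `split_into_subsequences` is a hypothetical name.

```python
def split_into_subsequences(frames, length, overlap=0):
    """Split a long frame sequence into short consecutive windows.

    A final window shorter than `length` keeps any leftover frames.
    `overlap` lets consecutive windows share frames (an assumption; the
    original procedure does not specify it).
    """
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    n = len(frames)
    windows, start = [], 0
    while start < n:
        end = min(start + length, n)
        windows.append(frames[start:end])
        if end == n:
            break
        start += length - overlap
    return windows


if __name__ == "__main__":
    windows = split_into_subsequences(list(range(554)), 20)
    print(len(windows), len(windows[-1]))  # 28 14
```

For example, splitting a 554-frame sequence such as the flower bed video into 20-frame windows gives 27 full windows plus a final window of 14 frames.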

Obtaining Data • Three DT video sequences • DT data from [Kwatra et al. SIGGRAPH’03] • Synthesized MDT video sequences • 60 frames each; no camera motion from frame 1 to 20 or from frame 41 to 60 • Camera motion with velocity [1, 0] from frame 21 to 40
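One way such a synthesized MDT could be produced is to slide a crop window over a larger DT clip according to the stated motion schedule. This sketch assumes [1, 0] means one pixel per frame in the horizontal direction, and that each frame from 21 to 40 is displaced relative to its predecessor, giving a total displacement of 20 pixels. The function names are hypothetical.

```python
import numpy as np


def camera_trajectory(n_frames=60, moving=(21, 40), velocity=(1, 0)):
    """Cumulative camera offset (ox, oy) per frame; frames are numbered from 1.
    Assumption: frame t moves by `velocity` relative to frame t-1 for t in `moving`."""
    offsets = np.zeros((n_frames, 2), dtype=int)
    for t in range(2, n_frames + 1):
        step = velocity if moving[0] <= t <= moving[1] else (0, 0)
        offsets[t - 1] = offsets[t - 2] + np.asarray(step)
    return offsets


def crop_moving_window(canvas_frames, offsets, height, width):
    """Simulate a moving camera by cropping a window that slides over a larger DT clip."""
    return np.stack([
        frame[oy:oy + height, ox:ox + width]
        for frame, (ox, oy) in zip(canvas_frames, offsets)
    ])


if __name__ == "__main__":
    offsets = camera_trajectory()
    dt_clip = np.random.default_rng(0).random((60, 40, 60))  # stand-in for a DT clip
    mdt = crop_moving_window(dt_clip, offsets, 32, 32)
    print(mdt.shape, offsets[-1])  # (60, 32, 32) [20  0]
```

The canvas must be at least as wide as the crop plus the total displacement (here 32 + 20 = 52 pixels).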

Grass MDT Video • The average image (Figure: (a) one frame, (b) the average image after registration, (c) the average image before registration)
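The average image before and after registration can be computed as below. The sketch assumes integer translations and frames cropped from a common scene, as in a sliding-window synthesis. After the offsets are compensated, only the region seen by every frame is averaged. The plain temporal mean stands in for the unregistered average. Both function names are hypothetical.

```python
import numpy as np


def registered_average(frames, offsets):
    """Average image after compensating integer camera offsets (ox, oy).

    frames[t] is assumed to show scene[oy:oy+H, ox:ox+W]. Each frame is mapped
    back to scene coordinates and averaged over the region all frames share.
    """
    T, H, W = frames.shape
    offsets = np.asarray(offsets, dtype=int)
    ox_min, oy_min = offsets.min(axis=0)
    ox_max, oy_max = offsets.max(axis=0)
    x0, x1 = ox_max, ox_min + W          # common region in scene coordinates
    y0, y1 = oy_max, oy_min + H
    if x1 <= x0 or y1 <= y0:
        raise ValueError("frames have no common overlap")
    acc = np.zeros((y1 - y0, x1 - x0))
    for frame, (ox, oy) in zip(frames, offsets):
        acc += frame[y0 - oy:y1 - oy, x0 - ox:x1 - ox]
    return acc / T


def naive_average(frames):
    """Average image before registration: the plain temporal mean."""
    return frames.mean(axis=0)


if __name__ == "__main__":
    scene = np.random.default_rng(0).random((40, 60))
    offsets = [(t, 0) for t in range(20)]
    frames = np.stack([scene[oy:oy + 32, ox:ox + 32] for ox, oy in offsets])
    print(registered_average(frames, offsets).shape, naive_average(frames).shape)
```

For a static scene the registered average reproduces the scene exactly, while the naive mean is smeared along the motion direction. With a real DT the registered average is the sharp y0-like image the second observation refers to.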

Grass MDT Video • The statistics of derivative filter responses

Evaluation / Comparison • False Estimation Fraction (FEF) • Comparison with two classical methods • Hybrid method [Bergen et al. ECCV’92] [Black et al. ICCV’93] • Vidal’s method [Vidal et al. CVPR’05]
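The slides do not spell out the FEF formula. However, the numbers reported for the real flower bed video (110 vs. 104.52 pixels giving 4.98%, and 85 vs. 60 pixels giving 29.41%) are consistent with the relative error of the total estimated displacement. Below is a minimal sketch under that assumption; `false_estimation_fraction` is a hypothetical name.

```python
def false_estimation_fraction(estimated, ground_truth):
    """FEF assumed to be |estimated - ground truth| / |ground truth|."""
    if ground_truth == 0:
        raise ValueError("ground truth displacement must be non-zero")
    return abs(estimated - ground_truth) / abs(ground_truth)


if __name__ == "__main__":
    print(f"{false_estimation_fraction(104.52, 110):.2%}")  # 4.98%
    print(f"{false_estimation_fraction(60, 85):.2%}")       # 29.41%
```

Under this reading, FEF is scale-free, so sequences with different total camera displacements can be compared directly.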

Waterfall MDT Video • Motion estimation (Figure: (a) ground truth, (b) hybrid method, (c) Vidal’s method, (d) proposed method)

Waterfall MDT Video • The average image and its statistics (Figure: the average image and its related distribution after registration by (a) the proposed method, (b) Vidal’s method, (c) the hybrid method)

FEF Comparisons • On three synthesized MDT videos

Real MDT Video • Moving flower bed video • Ours • 554 frames in total • Ground truth: 110 pixels • Estimate: 104.52 pixels (FEF 4.98%) • Vidal’s • 250 frames • Ground truth: 85 pixels • Estimate: 60 pixels (FEF 29.41%)

Conclusions • What we propose: • Priors (average image, motion, and dynamics) for MDT registration • What we obtain: • Camera motion • Average image • Dynamic texture model • What we learned: • Good registration simplifies the DT model while preserving useful information • Better registration leads to a sharper average image

Thanks!

Future Works • More complex camera motions • Different metric functions for evaluation • Segmentation of multiple dynamic textures
