920 likes | 1.11k Views
Database and Visual Front End. Makis Potamianos. Active Appearance Model Visual Features. Iain Matthews. Acknowledgments. Cootes, Edwards, Talyor, Manchester Sclaroff, Boston. AAM Overview. Appearance. Region of interest. Warp to reference. Shape & Appearance. Landmarks. Shape.
E N D
Database and Visual Front End Makis Potamianos
Active Appearance ModelVisual Features Iain Matthews
Acknowledgments • Cootes, Edwards, Talyor, Manchester • Sclaroff, Boston
AAM Overview Appearance Region of interest Warp to reference Shape & Appearance Landmarks Shape
Relationship to DCT Features Face Detector DCT AAM Tracker AAMFeatures • External feature detector vs. model-based learned tracking • ROI ‘box’ vs. explicit shape + appearance modeling
Training Data • 4072 hand labelled images = 2m 13s (/ 50h)
Final Model 3 mean 3
Fitting Algorithm Current model projection Current model projection Image under model Image under model Difference Difference Iterate until convergence a dc Appearance Appearance Warp to reference Warp to reference Error weight c is all model parameters dc PredictedUpdate Image
Tracking Results • Worst sequence - mean, mean square error = 548.87 • Best sequence - mean, mean square error = 89.11
Tracking Results • Full-face AAM tracker on subset of VVAV database • 4,952 sequences • 1,119,256 images @ 30fps = 10h 22m • Mean, mean MSE per sentence = 254.21 • Tracking rate (m2p decode) 4 fps • Beard area and lips only models will not track • Regions lack sharp texture gradients needed locate model?
Features • Use AAM full-face features directly (86 dimensional)
Audio Lattice Rescoring Results Lattice random path = 78.14% DCT with LM = 51.08% DCT no LM = 61.06%
Audio Lattice Rescoring Results • AAM vs. DCT vs. Noise
Tracking Errors Analysis • AAM vs. Tracking error
Analysis and Future Work • Models are under trained • Little more than face detection on 2m of training
Analysis and Future Work reproject • Models are under trained • Little more than face detection on 2m of training • Project face through a more compact model • Retain only useful articulation information?
Analysis and Future Work reproject • Models are under trained • Little more than face detection on 2m of training • Project face through a more compact model • Retain only useful articulation information? • Improve the reference shape • Minimal information loss through the warping?
Asynchronous Stream Modelling Juergen Luettin
The Recognition Problem M: word (phoneme) sequence M*: most likely word sequence OA: acoustic observation sequence OV: visual observation sequence
Integration at the Feature Level • Assumption: • conditional dependence between modalities • integration at the feature level
Integration at the Decision Level • Assumption: • conditional independence between modalities • integration at the unit level
Multiple Synchronous Streams • Assumption: • conditional independence • integration at the state level Two streams in each state: X: state sequence aij : trans. prob. from i to j bj: probability density cjm: mth mixture weight of multivariate GaussianN
Multiple Asynchronous Streams • Assumption: • conditional independence • integration at the unit level Decoding: individual best state sequences for audio and video
Composite HMM definition 4 7 9 8 2 5 1 3 6 Speech-noise decomposition (Varga & Moore, 1993) Audio-visual decomposition (Dupont & Luettin, 1998)
AVSR System • 3-state HMM with 12 mixture components, 7-state HMM for composite model • context dependent phone models (silence, short pause), tree-based state clustering • cross-word context dependent decoding, using lattices computed at IBM • trigram language model • global stream weights in multi stream models, estimated on held out set
Conclusions • AV 2 Stream asynchronous model beats other models in noisy conditions Future directions: • Transition matrices: context dependent, pruning transitions with low probability, cross-unit asynchrony • Stream weights: model based, discriminative • Clustering: taking stream-tying into account
Phone Dependent Weighting Dimitra Vergyri
Weight Estimation Hervé Glotin