Lip Feature Extraction and Tracking: Model-Based Approach

Learn about model-based lip feature extraction and tracking techniques for combining audio and visual cues in speech recognition systems. Explore energy function minimization, statistical shape models, and active shape models.



Presentation Transcript

  1. Model-Based Facial Feature Extraction Multimodal Interaction Dr. Mike Spann m.spann@bham.ac.uk http://www.eee.bham.ac.uk/spannm

  2. Contents • Introduction • Lip feature extraction and tracking • Summary

  3. Lip feature extraction and tracking • Lip feature tracking is important in combining audio and visual cues for speech recognition systems • Typically the lip boundaries (inner/outer/both) are tracked and shape features passed to the speech recognition module • Previous approaches • Active contour model (snakes) • Energy function minimisation used to control contour shape (curvature) and local greylevel (colour) gradient • Can be dependent on weighting parameters which need to be tuned

  4. Lip feature extraction and tracking • Typically an energy function E is defined in terms of the parameterised snake v(s)=(x(s),y(s)) where s is the distance along the snake: • The first two terms represent the snake’s internal energy and control its tension and rigidity • The third term attracts the snake to object boundaries with high greylevel gradient • Often an additional term is added for a ‘balloon’ snake to either inflate or deflate the snake
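
The equation from this slide is not present in the transcript. A commonly used form of the snake energy that matches the three terms described above is sketched here; the weights α, β, γ, the snake length L and the image I are assumed notation, not taken from the slide.

```latex
E = \int_{0}^{L} \Big( \alpha \,\lvert v'(s) \rvert^{2}
      + \beta \,\lvert v''(s) \rvert^{2}
      - \gamma \,\lvert \nabla I(v(s)) \rvert^{2} \Big)\, ds
```

In this form α weights tension, β weights rigidity, and the negative sign on the gradient term means that minimising E pulls the snake towards strong greylevel edges.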

  5. Lip feature extraction and tracking

  6. Lip feature extraction and tracking • More recent approaches to lip localisation and tracking have been model-based • A statistical shape model of the inner and outer lip contours can be built from training data • Landmarks on the contour form pointsets: • We need to align the pointsets and then build a statistical model using PCA

  7. Lip feature extraction and tracking • Pointsets of lip feature landmarks must be normalized for translation, scale and rotation • We can use a simple iterative algorithm to align to the mean pointset
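
The iterative alignment can be sketched as follows. This is a minimal illustration assuming each pointset is an (n, 2) NumPy array and that a least-squares (Procrustes) similarity fit is used at each step; the function names `align_to` and `align_pointsets` are hypothetical and the slides do not specify the exact procedure.

```python
import numpy as np

def align_to(shape, ref):
    """Least-squares similarity alignment of one (n, 2) pointset to a reference."""
    a = shape - shape.mean(axis=0)           # remove translation
    r = ref - ref.mean(axis=0)
    u, sv, vt = np.linalg.svd(a.T @ r)       # SVD of the 2x2 cross-covariance
    # Force a proper rotation (determinant +1), no reflection.
    d = np.array([1.0, np.sign(np.linalg.det(u @ vt))])
    rot = (u * d) @ vt
    scale = (sv * d).sum() / (a ** 2).sum()  # optimal isotropic scale
    return scale * a @ rot

def align_pointsets(shapes, n_iter=10, tol=1e-8):
    """Align all pointsets to their evolving mean (assumed scheme)."""
    shapes = [s - s.mean(axis=0) for s in shapes]
    ref0 = shapes[0] / np.linalg.norm(shapes[0])  # fixes scale and rotation
    mean = ref0
    for _ in range(n_iter):
        aligned = np.array([align_to(s, mean) for s in shapes])
        # Re-normalise the new mean so it cannot drift or shrink.
        new_mean = align_to(aligned.mean(axis=0), ref0)
        new_mean /= np.linalg.norm(new_mean)
        converged = np.linalg.norm(new_mean - mean) < tol
        mean = new_mean
        if converged:
            break
    return aligned, mean
```

Normalising the mean against the first shape on every pass is one common way to stop the estimate from drifting; other normalisations would work equally well.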

  8. Lip feature extraction and tracking • PCA is based on the mean and covariance of the pointset vectors computed across the training set: • We then compute our shape model by solving the eigenvector/eigenvalue equation: • where Λ is a diagonal matrix of eigenvalues:
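
The equations from this slide are missing from the transcript. The standard PCA definitions they most likely correspond to are sketched below; N (number of training pointsets), S (covariance matrix), P (matrix of eigenvectors) and n (number of landmarks) are assumed symbols, and only Λ is named on the slide.

```latex
\begin{aligned}
\bar{x} &= \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
S = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^{T} \\
S P &= P \Lambda, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{2n})
\end{aligned}
```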

  9. Lip feature extraction and tracking • We can represent each landmark pointset x by a corresponding shape vector b • The set of bi’s across all of the pointsets in the database represents the ith mode of variation of the original data • We can vary each bi to get realistic versions of lip shapes • Typically for the ith eigenvalue λi:
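
A minimal sketch of building the shape model and varying one mode, assuming the aligned pointsets are stacked as rows of an (N, 2n) array and that a shape is reconstructed as the mean plus a weighted sum of eigenvectors. The function names and the multiplier `k` are hypothetical; the limit on bi from the slide is not present in the transcript, so `k` is left as a caller-supplied value.

```python
import numpy as np

def build_shape_model(X, n_modes):
    """X: (N, 2n) array with one aligned pointset per row."""
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)              # sample covariance, (2n, 2n)
    evals, evecs = np.linalg.eigh(cov)         # eigh returns ascending order
    order = np.argsort(evals)[::-1][:n_modes]  # keep the largest modes
    return mean, evecs[:, order], np.clip(evals[order], 0.0, None)

def shape_to_params(x, mean, P):
    """Shape vector b for pointset x."""
    return P.T @ (x - mean)

def params_to_shape(b, mean, P):
    """Pointset generated by shape vector b."""
    return mean + P @ b

def vary_mode(mean, P, evals, i, k):
    """Shape obtained by setting b_i = k * sqrt(lambda_i), all other b_j = 0."""
    b = np.zeros(P.shape[1])
    b[i] = k * np.sqrt(evals[i])
    return params_to_shape(b, mean, P)
```

Calling `vary_mode` with a handful of positive and negative `k` values for the first few modes gives a quick visual check that the model produces plausible lip shapes.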

  10. Lip feature extraction and tracking

  11. Lip feature extraction and tracking

  12. Lip feature extraction and tracking • An active shape model samples greylevels perpendicular to the lip contour and centred at the model points

  13. Lip feature extraction and tracking • We sample the profiles perpendicular to each model point j • Training image i then gives us a vector of greylevels gij • We concatenate all these greylevel vectors to give us a global profile vector hi • We build a statistical model out of these profile vectors to enable the main modes of variation of the profiles about the model boundaries to be computed
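
The profile sampling can be sketched as below. This assumes a closed contour given as an (n, 2) array of (x, y) points with x indexing image columns and y indexing rows, one-pixel spacing along the normal, and nearest-neighbour lookup; `contour_normals`, `sample_profiles` and `half_len` are hypothetical names, and the slides do not state the profile length or interpolation used.

```python
import numpy as np

def contour_normals(points):
    """Unit normals of a closed contour, from central differences."""
    tangent = np.roll(points, -1, axis=0) - np.roll(points, 1, axis=0)
    normal = np.stack([tangent[:, 1], -tangent[:, 0]], axis=1)  # rotate by 90 degrees
    return normal / np.linalg.norm(normal, axis=1, keepdims=True)

def sample_profiles(image, points, half_len):
    """Return the global profile vector h for one image.

    Each model point j contributes a vector g_j of 2*half_len + 1 greylevels
    sampled along its normal; the g_j are concatenated in point order.
    """
    offsets = np.arange(-half_len, half_len + 1)
    normals = contour_normals(points)
    # pos has shape (n_points, n_samples, 2)
    pos = points[:, None, :] + offsets[None, :, None] * normals[:, None, :]
    cols = np.clip(np.rint(pos[..., 0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(pos[..., 1]).astype(int), 0, image.shape[0] - 1)
    g = image[rows, cols].astype(float)
    return g.ravel()
```

Stacking the vectors returned for every training image gives the data matrix from which the greylevel profile model is built, in the same way as the shape model.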

  14. Lip feature extraction and tracking • The weight vectors bh can be used as a parameter in a cost function to determine how well the actual profile fits the model

  15. Lip feature extraction and tracking • The greylevels between profile vectors can be interpolated to visualise the greylevel models • Some smoothing using a median filter helps remove any artefacts of the interpolation • We can visualise several modes corresponding to the first few eigenvectors • The corresponding components of the weight vector bh can be varied according to: • For example we can set bhi to ±2√λi for i=1,2,3

  16. Lip feature extraction and tracking • Mode 1 • Global illumination differences • Mode 2 • Lower/Upper lip intensity difference • Mode 3 • Skin/lip contrast differences • Higher modes • Illumination variations, visibility of teeth and tongue etc

  17. Lip feature extraction and tracking • In order to apply an ASM search algorithm, a coarse estimate of the region of interest containing the lips is found • Can be input interactively or computed automatically using a segmentation or edge-based feature extraction algorithm • Provides an estimate of the scale of the lips • Limits the search area

  18. Lip feature extraction and tracking

  19. Lip feature extraction and tracking • In order to use the greylevel and shape models in a search algorithm we can use the greylevel model to best fit the model greylevel profile to the current greylevel profile • Shape and pose parameters can then be updated • We need a cost function which describes the fit between the model greylevel profile and the profile measured in the image at the current model position • Several statistical approaches possible • Maximizing the probability assuming Gaussian distributions • Minimizing the mean square error between the profiles

  20. Sample profile h Current model position

  21. Lip feature extraction and tracking • We can define an error function E defining the mismatch between the actual profile h measured at the current position estimate and our model profile hm: • Substituting for hm : • Typically hm would comprise only the first few modes of variation
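
A minimal sketch of one such error measure, assuming the mean square error option mentioned earlier and a linear profile model in which hm is the mean profile plus a weighted sum of the first few profile eigenvectors. The names `h_mean` and `P_h` (eigenvectors as orthonormal columns) and the function name are assumptions; the slide's own equations are not in the transcript.

```python
import numpy as np

def profile_fit_error(h, h_mean, P_h):
    """Squared mismatch between a sampled profile h and its best model fit.

    Returns (E, b_h), where b_h are the profile weights and E is the
    squared length of the residual h - h_m.
    """
    b_h = P_h.T @ (h - h_mean)   # project onto the retained modes
    h_m = h_mean + P_h @ b_h     # model profile built from those modes
    residual = h - h_m
    return float(residual @ residual), b_h
```

With this definition E is zero when the sampled profile lies entirely within the span of the retained modes, and grows with the part of the profile the model cannot explain.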

  22. Lip feature extraction and tracking • The model is initialized with the mean shape computed over aligned shapes in the training set • Our goal is to minimize our energy function E in terms of translation vectors tx and ty, a scale parameter s and a rotation angle θ along with the profile parameter vector:

  23. Lip feature extraction and tracking • Optimization is carried out by perturbing individual parameters and evaluating their effects on the energy function E • Only a few (typically 10-20) shape modes are used in the search to ease the computational burden • Perturbations in bi are limited to: • For a given position of the model landmarks, the profile h is sampled and bh computed according to:
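
The perturbation-based optimisation can be illustrated with a simple coordinate search. This is a sketch under stated assumptions, not the optimiser used in the presentation: `energy` is any callable returning E for a parameter vector (pose and shape parameters concatenated), `lower` and `upper` stand in for the limits on each parameter, and the step-halving rule is an illustrative choice.

```python
import numpy as np

def perturbation_search(energy, p0, steps, lower, upper, n_sweeps=50):
    """Minimise energy(p) by perturbing one parameter at a time."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    p = np.clip(np.asarray(p0, dtype=float), lower, upper)
    steps = np.array(steps, dtype=float)
    best = energy(p)
    for _ in range(n_sweeps):
        improved = False
        for i in range(p.size):
            for delta in (steps[i], -steps[i]):
                trial = p.copy()
                # Keep every perturbed parameter inside its allowed range.
                trial[i] = np.clip(trial[i] + delta, lower[i], upper[i])
                e = energy(trial)
                if e < best:
                    p, best, improved = trial, e, True
                    break
        if not improved:
            steps *= 0.5   # no parameter helped: refine the step size
    return p, best
```

Restricting the search to the first 10-20 shape modes keeps the parameter vector short, so each sweep needs only a few dozen energy evaluations.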

  24. Lip feature extraction and tracking • We can devise an iterative algorithm to update the pose and shape parameters sequentially based on our error measure • The algorithm alternates between ‘model space’ and ‘image space’ • The object boundary in model space is defined by the shape parameters • We can use the greylevel or colour profile information to measure the error in image space • Conversion between the two spaces is done via the pose parameters

  25. Lip feature extraction and tracking Model space - b Image space - bh

  26. Lip feature extraction and tracking 1. Initialize the shape parameters b to zero and the image points y 2. Generate the model point positions: 3. Find the pose parameters tx, ty, s, θ to best fit the model points to the image points y 4. Project the model points into the image frame, x->T(x) 5. Compute the image profile vector h and, at each projected model point, search normal to the model boundary to find the image points y’ which minimize E, producing a new image profile vector h’

  27. Lip feature extraction and tracking 6. Project the image points y’ into the model coordinate frame by inverting the transformation T 7. Update the model parameters 8. If not converged y->y’. Go to step 2
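
Steps 2, 3, 6 and 7 of the algorithm can be sketched as follows, with the image search of step 5 assumed to have already produced the image points y. The update rule (projecting onto the shape eigenvectors and clamping each bi to ±k√λi) is the usual active shape model choice and is an assumption here, as are the function names, the default k and the (x1, y1, x2, y2, ...) layout of the shape vector.

```python
import numpy as np

def fit_pose(x, y):
    """Least-squares similarity transform T(p) = A p + t mapping x onto y.

    x, y: (n, 2) arrays. A = [[a, -b], [b, a]] with a = s*cos(theta),
    b = s*sin(theta); t = (tx, ty).
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    denom = (xc ** 2).sum()
    a = (xc * yc).sum() / denom
    b = (xc[:, 0] * yc[:, 1] - xc[:, 1] * yc[:, 0]).sum() / denom
    A = np.array([[a, -b], [b, a]])
    return A, my - A @ mx

def asm_update(b, y, mean, P, evals, k=3.0):
    """One pass of model-space update given image points y (n, 2)."""
    n = mean.size // 2
    x = (mean + P @ b).reshape(n, 2)         # step 2: model points
    A, t = fit_pose(x, y)                    # step 3: pose parameters
    x_model = (y - t) @ np.linalg.inv(A).T   # step 6: invert T
    b_new = P.T @ (x_model.ravel() - mean)   # step 7: update shape parameters
    limit = k * np.sqrt(evals)               # keep the shape plausible
    return np.clip(b_new, -limit, limit), (A, t)
```

Repeating `asm_update` with fresh image points from each new search until b and the pose stop changing corresponds to the loop back to step 2.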

  28. Lip feature extraction and tracking Model point Nearest image point to model point Image boundary

  29. Lip feature extraction and tracking • It’s easier to track the outer lips than the inner ones • More constant greylevel profile • Easier to model for example with application to active shape modelling • But, less appropriate for lip gesture recognition and speech recognition algorithms • Often using a full appearance model rather than just a shape model gives better speech recognition performance • For example the teeth and tongue appearance give clues to particular types of vocal sounds

  30. Lip feature extraction and tracking • Results of off-centre initialization of ASM using local greylevel profiles after 5, 10 and 20 iterations

  31. Lip feature extraction and tracking • Results using ASM search with local greylevel profiles

  32. Lip feature extraction and tracking

  33. Lip feature extraction and tracking • Demo • http://www.ee.surrey.ac.uk/Projects/M2VTS/experiments/lip_tracking/index.html

  34. Summary • We have looked at how a shape model and a model describing greylevel or colour variation local to the shape model landmark positions can be used for finding the lip contour location in face images • We have described an iterative model-based search algorithm for lip contour location • We have shown lip tracking results based on this algorithm
