ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10 Prof. Petros Maragos National Technical University of Athens School of Electrical and Computer Engineering URL: http://cvsp.cs.ntua.gr/projects/muscle
ICCS-NTUA: E-team Researchers & Directions
Researchers:
• P. Maragos, S. Kollias (Faculty members)
• G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)
• G. Stamou, I. Avrithis (Post-Doc)
(WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition
• Face Detection, Modeling & Tracking
• AV Feature Extraction, Fusion, Dynamic Models for AV-ASR
• AV to Articulatory Speech Inversion
(WP6) E-team 2: Audio-Visual Understanding
• Audio-Visual Salient Event Detection, Integrated Multimedia Content Analysis
WP6 E-teams: 8-12-2005
AV-ASR Front-End
[Block diagram. Recoverable labels:]
• Speech
• Modulations - Energy: Multiband Filtering, Nonlinear Processing, Demodulation
• Dynamics - Fractals: Embedding, Geometrical Filtering, Fractal Dimensions
• Visual: Active Appearance Model, Face Detection/Tracking, Mouth R.O.I. Features
• Other blocks: MFCC, VAD, M-Array Processing, Speaker Normalization, Feature Transform./Selection, Fusion, Feature Stream
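The slide lists "Nonlinear Processing" and "Demodulation" under the modulation-energy branch without naming an operator. As a minimal sketch, assuming the discrete Teager-Kaiser energy operator is a reasonable stand-in for that step (the multiband filtering stage is omitted, and the test tone is hypothetical):

```python
import numpy as np

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the first
    and last samples have no two-sided neighbourhood.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Hypothetical test signal: a pure tone with amplitude A and
# digital frequency omega (radians per sample).
A, omega = 2.0, 0.3
n = np.arange(200)
x = A * np.cos(omega * n)

psi = teager_kaiser_energy(x)
# For a pure cosine the operator output is constant and equal to
# A^2 * sin^2(omega), so it tracks amplitude and frequency jointly.
expected = A ** 2 * np.sin(omega) ** 2
```

In a multiband setting the same operator would be applied to each band-pass filtered channel separately; that stage is left out here to keep the sketch short.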
Audiovisual ASR: Face Modeling
• A well-studied problem in Computer Vision: Active Appearance Models, Morphable Models, Active Blobs
• Both Shape & Appearance can enhance lipreading
• The shape and appearance of human faces “live” in low-dimensional manifolds
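The low-dimensional manifold idea can be illustrated with a linear (PCA) model, in which a shape or appearance vector is approximated as a mean plus a weighted sum of a few basis vectors. The following is a minimal sketch on synthetic data, not the authors' AAM implementation; the function names, the landmark count, and the number of modes are all assumptions made for illustration:

```python
import numpy as np

def fit_linear_model(samples, n_components):
    """PCA on row-vector samples; returns the mean and an orthonormal basis (rows)."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    # Right singular vectors of the centred data are the principal directions.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(x, mean, basis):
    """Low-dimensional parameters of one sample."""
    return basis @ (x - mean)

def reconstruct(params, mean, basis):
    """Map parameters back to the full-dimensional vector."""
    return mean + basis.T @ params

# Synthetic stand-in for training shapes: 50 shapes, 10 landmarks each
# (20 coordinates), generated so that they lie on a 2-D affine subspace.
rng = np.random.default_rng(0)
true_basis = rng.normal(size=(2, 20))
coeffs = rng.normal(size=(50, 2))
shapes = 5.0 + coeffs @ true_basis

mean, basis = fit_linear_model(shapes, n_components=2)
p = project(shapes[0], mean, basis)   # 2 numbers describe a 20-D shape
recon = reconstruct(p, mean, basis)
```

Because the synthetic shapes are exactly two-dimensional, two parameters reconstruct each of them up to floating-point error; real face data would need more modes and would leave a residual.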
Image Fitting Example
[Figure: fitting iterations at steps 2, 6, 10, 14, 18]
Example: Face Interpretation Using AAM
[Figure panels: shape track superimposed on original video; reconstructed face; original video]
• This is what the visual-only speech recognizer “sees”!
• Generative models like AAM allow us to evaluate the output of the visual front-end
Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm
• Generative models ‘compete’ for image observations
• Segmentation translates into the assignment of image observations to one of K models (image labelling)
• Segmentation labels are treated like hidden data
• EM algorithm:
  • E-step: use current parameter estimates to assign micro-segments to objects
  • M-step: use assignment probabilities to derive optimal model parameters
• Active Appearance Models used as generative models for the object categories of cars and faces
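The E-step/M-step alternation above can be sketched on a toy problem. This example swaps the Active Appearance Models for 1-D Gaussians, so it only illustrates how K generative models compete for observations through soft assignments; the function name, initial values, and data are hypothetical:

```python
import numpy as np

def em_gaussian_labels(obs, init_means, n_iter=50):
    """EM for K 1-D Gaussian models; returns soft assignments and fitted means."""
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()
    k = means.size
    variances = np.full(k, obs.var() + 1e-6)
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each model.
        diff = obs[:, None] - means[None, :]
        log_lik = -0.5 * (diff ** 2 / variances + np.log(2 * np.pi * variances))
        log_post = np.log(priors) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each model from its softly assigned observations.
        weight = resp.sum(axis=0) + 1e-12
        means = (resp * obs[:, None]).sum(axis=0) / weight
        diff = obs[:, None] - means[None, :]
        variances = (resp * diff ** 2).sum(axis=0) / weight + 1e-6
        priors = weight / obs.size
    # resp comes from the last E-step, i.e. before the final parameter update.
    return resp, means

# Toy observations drawn from two well-separated sources.
rng = np.random.default_rng(0)
obs = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(10.0, 1.0, 200)])
resp, means = em_gaussian_labels(obs, init_means=[1.0, 8.0])
```

Each row of `resp` plays the role of the hidden segmentation label for one observation; in the slide's setting the observations would be image micro-segments and the M-step would refit the appearance-model parameters instead of a mean and variance.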
Top-Down Segmentation Results
• Thresholding the E-step assignment probabilities gives a hard figure-ground segmentation
• No ‘shape-prior’ knowledge is necessary for the segmentation
  • the generative model already contains information about shape variation
• Combination of bottom-up & top-down detection
• At false-alarm locations the object model manages to reconstruct the image appearance only by chance, and so typically gets a small image support for the object.
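A minimal sketch of the thresholding step, with one added assumption: the observation that false alarms typically get a small image support is turned into an explicit rejection test here. The slide does not state that rule, and the threshold values and function name are hypothetical:

```python
import numpy as np

def hard_segmentation(posterior, threshold=0.5, min_support=0.05):
    """Threshold a per-pixel object posterior into a figure-ground mask.

    Returns the mask, the fraction of pixels claimed by the object, and
    whether that support is large enough to keep the detection
    (the acceptance rule is an assumption of this sketch).
    """
    posterior = np.asarray(posterior, dtype=float)
    mask = posterior > threshold
    support = float(mask.mean())
    return mask, support, support >= min_support

# Hypothetical posterior maps on a 10x10 grid.
true_hit = np.zeros((10, 10))
true_hit[2:6, 3:7] = 0.9          # object explains a 4x4 region
false_alarm = np.zeros((10, 10))
false_alarm[4, 4] = 0.9           # object explains a single pixel

mask, support, accepted = hard_segmentation(true_hit)
_, fa_support, fa_accepted = hard_segmentation(false_alarm)
```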
Spatio-Temporal Visual Attention I: Video Analysis
• Create video volume
• Feature extraction from spatiotemporal data
• Fusion & saliency generation
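The three steps can be sketched as follows. The slide does not specify the features or the fusion rule, so the two features (intensity contrast and temporal change) and the equal-weight average used here are assumptions chosen only to make the pipeline concrete:

```python
import numpy as np

def normalize(v):
    """Rescale an array to [0, 1]; a constant array maps to all zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def spatiotemporal_saliency(frames):
    """Toy saliency volume of shape (T, H, W) from a list of grayscale frames."""
    # Step 1: create the video volume by stacking frames along time.
    volume = np.stack([np.asarray(f, dtype=float) for f in frames], axis=0)
    # Step 2: feature extraction from the spatiotemporal data.
    contrast = np.abs(volume - volume.mean())       # intensity contrast
    motion = np.abs(np.gradient(volume, axis=0))    # temporal change
    # Step 3: fusion and saliency generation (equal weights, an assumption).
    return normalize(0.5 * normalize(contrast) + 0.5 * normalize(motion))

# Hypothetical clip: a bright 2x2 square moving right over a dark background.
frames = []
for t in range(8):
    frame = np.zeros((16, 16))
    frame[5:7, t:t + 2] = 1.0
    frames.append(frame)

saliency = spatiotemporal_saliency(frames)
```

In this toy clip the moving square scores high on both features while the static background scores zero, which is the behaviour a saliency map is expected to show.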
Spatio-Temporal Visual Attention II: Classification & Segmentation
• Use spatiotemporal VA for efficient global classification of videos
  • Claim: features extracted only from low or high saliency regions are more representative of the input video
• Foreground/Background segmentation
  • Claim: most salient regions are related to foreground areas of the video