Intelligent Perceptual Interfaces
Trevor Darrell, Eric Grimson
Perceptual User Interfaces
• Interactivity - be aware of the viewer!
• People have natural interface modes
• Watch, listen, learn signals from user
• Key concepts:
  • Transparency - embedded interfaces
  • Expressiveness - balance interface I/O bandwidth
• Key technologies:
  • Computer Vision
  • Machine Learning
  • Spoken Language Understanding
A Face Responsive Display
• Faces are natural interfaces!
• Ubiquitous, fast, expressive, general.
• Want machines to generate and perceive faces.
• A Face Responsive Display...
  • Knows when it’s being observed
  • Recognizes returning observers
  • Tracks head pose
  • Recognizes speech without attached microphone
  • Robust to changing lighting, moving backgrounds…
Head Pose Estimation
• Estimate gaze angle of user’s head
• Rigid body model: 6 DOF (3 rotation, 3 translation)
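A minimal sketch of what a 6-DOF rigid body pose means in practice, assuming a pose is stored as three rotation angles plus a translation. The names `rotation_matrix`, `apply_pose`, and `gaze_direction`, the Z-Y-X angle convention, and the choice of +z as the "forward" axis of the head are all assumptions made for illustration; the slides do not specify a parameterization.

```python
import math

def rotation_matrix(rx, ry, rz):
    """3x3 rotation R = Rz * Ry * Rx, angles in radians (assumed convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def apply_pose(pose, point):
    """Map a point from head coordinates to camera coordinates: R * p + t."""
    rx, ry, rz, tx, ty, tz = pose  # the six degrees of freedom
    R = rotation_matrix(rx, ry, rz)
    x, y, z = point
    return (
        R[0][0] * x + R[0][1] * y + R[0][2] * z + tx,
        R[1][0] * x + R[1][1] * y + R[1][2] * z + ty,
        R[2][0] * x + R[2][1] * y + R[2][2] * z + tz,
    )

def gaze_direction(pose):
    """Rotate the assumed forward axis (0, 0, 1); translation does not affect direction."""
    rx, ry, rz = pose[:3]
    R = rotation_matrix(rx, ry, rz)
    return (R[0][2], R[1][2], R[2][2])
```

Under these assumptions, estimating head pose amounts to finding the six numbers in `pose` that best explain the observed image, and the gaze angle follows from the rotation part alone.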
Lip Contour Tracking
[Figure panels: Conventional; Intelligent shape tracking]
Untethered Audio-Visual Interface
• Current audio interfaces often require an attached microphone; future systems need a wireless interface
• Common approaches
  • Beam-forming microphone
  • Active narrow-field microphone
• New idea: exploit joint statistics of audio and visual information
  • Correlation / mutual information between audio and image pixels can separate sources
Audio-based Image Localization
Can we locate visual sources given audio information?
[Figure: Original Sequence]
Audio-based Image Localization
Image variance (ignoring audio) will find all motion in the sequence.
[Figure: Image Variance]
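The image-variance baseline can be sketched as a per-pixel variance over time. This is an illustrative version only: the function name `temporal_variance` and the representation of a sequence as a list of grayscale frames (each a list of rows) are assumptions, not details from the slides.

```python
def temporal_variance(frames):
    """Per-pixel variance across time for a list of equally sized grayscale frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    variance = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            values = [frame[r][c] for frame in frames]
            mean = sum(values) / n
            # Population variance of this pixel's intensity over the sequence.
            variance[r][c] = sum((v - mean) ** 2 for v in values) / n
    return variance
```

Every pixel that changes over the sequence gets a high score, whether or not the change has anything to do with the sound, which is why this measure finds all motion rather than the audio source.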
Audio-based Image Localization
Examine mutual information (correlation in the simplest case) between image and audio.
[Figure: Pixels with high mutual information with audio track]
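A minimal sketch of the "correlation in the simplest case" reading, assuming one scalar audio feature per video frame (for example, short-time energy) and assuming pixel intensity and audio feature are jointly Gaussian, in which case mutual information is a function of the correlation coefficient: I = -0.5 * log(1 - rho^2). The names `pearson`, `gaussian_mi`, and `audio_pixel_mi` are hypothetical, and the slides do not say which estimator the authors used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def gaussian_mi(rho):
    """Mutual information (nats) of two jointly Gaussian scalars with correlation rho."""
    rho2 = min(rho * rho, 1.0 - 1e-12)  # clamp so the log stays finite
    return -0.5 * math.log(1.0 - rho2)

def audio_pixel_mi(frames, audio):
    """Per-pixel MI map between each pixel's time series and the audio feature."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [gaussian_mi(pearson([f[r][c] for f in frames], audio)) for c in range(width)]
        for r in range(height)
    ]
```

Unlike plain temporal variance, a pixel that moves independently of the sound scores near zero here, while a pixel whose intensity tracks the audio feature scores high.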
Learning an Informative Subspace
Find a projection of both the video data and the audio data to a low-dimensional space so that MI is maximized.
[Figure: Learned Subspace, with video projection and audio projection]
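One way to write this objective down, using notation that is not on the slide: let $v$ and $a$ be the video and audio feature vectors for a frame, and let $h_v$ and $h_a$ be hypothetical projection vectors onto a one-dimensional subspace. The general objective, and the form it takes under the additional assumption that $v$ and $a$ are jointly Gaussian, are:

```latex
(h_v^{*}, h_a^{*}) = \arg\max_{h_v,\, h_a} \; I\!\left(h_v^{\top} v \,;\, h_a^{\top} a\right)

% Jointly Gaussian special case, with \rho the correlation of the two projections:
I\!\left(h_v^{\top} v \,;\, h_a^{\top} a\right) = -\tfrac{1}{2}\,\log\!\left(1 - \rho^{2}\right),
\qquad
\rho = \frac{h_v^{\top} \Sigma_{va}\, h_a}
            {\sqrt{h_v^{\top} \Sigma_{vv}\, h_v}\;\sqrt{h_a^{\top} \Sigma_{aa}\, h_a}}
```

Here $\Sigma_{vv}$, $\Sigma_{aa}$, and $\Sigma_{va}$ denote the video, audio, and cross covariance matrices. Because the Gaussian expression increases with $|\rho|$, maximizing MI in that special case is the same as maximizing the correlation of the projections, which is canonical correlation analysis. The slide does not state which MI estimator is used, so the Gaussian case should be read only as the simplest instance.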