200 likes | 309 Views
m. Exploiting cross-modal rhythm for robot perception of objects. Artur M. Arsenio Paul Fitzpatrick. MIT Computer Science and Artificial Intelligence Laboratory. Cog – the humanoid platform. cameras on active vision head. microphone array above torso.
E N D
m Exploiting cross-modal rhythm for robot perception of objects Artur M. Arsenio Paul Fitzpatrick MIT Computer Science and Artificial Intelligence Laboratory
Cog – the humanoid platform cameras on active vision head microphone array above torso periodically moving object (hammer) periodically generated sound (banging) CIRAS 2003
Motivation • Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files, … • Rhythmic information across the visual and acoustic sensory modalities have complementary properties • Features extracted from visual and acoustic processing are what is needed to build an object recognition system CIRAS 2003
robot Interacting with the robot CIRAS 2003
Talk outline • Matching sound and vision • Matching with visual distraction • Matching with acoustic distraction • Matching multiple sources • Priming sound detection using vision • Towards object recognition CIRAS 2003
Detecting periodic events • Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files. • Points tracked using Lukas-Kanade algorithm • Periodicity Analysis • FFTs of tracked trajectories • Periodicity Histograms • Phase verification CIRAS 2003
Matching sound and vision 8 6 4 frequency (kHz) 2 0 0 500 1000 2000 1500 1500 1000 energy 500 0 0 500 1000 1500 2000 2500 -50 • The sound intensity peaks once per visual period of the hammer -60 hammer position -70 -80 0 500 1000 1500 2000 2500 time (ms) CIRAS 2003
One object (the car) making noise Another object (the ball) in view Problem: which object goes with the sound? Solution: Match periods of motion and sound Matching with visual distraction CIRAS 2003
10 energy 5 0 0 500 1000 1500 2000 2500 50 0 car position -50 0 500 1000 1500 2000 2500 60 ball position 50 40 0 500 1000 1500 2000 2500 time (ms) Comparing periods • The sound intensity peaks twice per visual period of the car CIRAS 2003
8 6 frequency (kHz) 4 2 0 0 500 1000 1500 2000 90 80 car position 70 12 60 10 50 snake position Sound 500 1000 1500 2000 8 6 4 500 1000 1500 2000 2500 Matching with acoustic distraction Matching with acoustic distraction CIRAS 2003
Matching multiple sources • Two objects making sounds with distinct spectrums • Problem: which object goes with which sound? • Solution: Match periods of motion and sound CIRAS 2003
40 35 rattle position 30 8 200 400 600 800 1000 1200 1400 1600 1800 2000 6 -20 4 2 frequency (kHz) -40 0 car position 0 500 1000 1500 2000 -60 200 400 600 800 1000 1200 1400 1600 1800 2000 Binding periodicity features • The sound intensity peaks twice per visual period of the car. For the cube rattle, the sound/visual signals have different ratios according to the frequency bands CIRAS 2003
Experiment Visual period found Sound period found Bind made Correct binds (%) Incorrect Binds (%) Missed Binds (%) hammer 6 8 6 100 0 0 Car & ball 7 11 1 14 0 86 Car 6 6 5 83 0 17 Car (Snake rattle in background) 5 16 5 100 0 0 Snake rattle (Car in Background) 4 9 4 100 0 0 Plane and Mouse 23 27 16 46 41 13 Statistics An evaluation of cross-modal binding for various objects and situations the sound generated by a periodically moving object can be much more complex and ambiguous than its visual trajectory CIRAS 2003
8 6 frequency (kHz) 4 2 0 0 200 400 600 800 time (ms) 8 6 frequency (kHz) 4 2 0 0 200 400 600 time (ms) Priming sound detection using vision Signals in Phase CIRAS 2003
8 6 frequency (kHz) 4 2 0 0 200 400 600 800 time (ms) 8 6 frequency (kHz) 4 2 0 0 200 400 600 time (ms) Signals out of phase! CIRAS 2003
Object recognition • Visual object segmentation • Cross-modal object recognition • Ratio between acoustic/visual fundamental frequencies • Phase between acoustic and visual signals • Range of acoustic frequency bands CIRAS 2003
Cross-modal object recognition Causes sound when changing direction after striking object; quiet when changing direction to strike again Causes sound while moving rapidly with wheels spinning; quiet when changing direction Causes sound when changing direction, often quiet during remainder of trajectory (although bells vary) CIRAS 2003
100 50 0 Phase difference (sound/visual) 15 -50 10 -100 100 Frequency bands 5 50 -150 0 0 0.4 -50 -200 0.5 0.4 0.5 0.6 0.7 0.8 0.9 1 -100 0.6 0.7 Fundamental period Ratio (sound/visual) -150 0.8 Phase difference (sound/visual) 0.9 -200 1 Fundamental period Ratio (sound/visual) Clustering CIRAS 2003
Conclusions • Different objects = distinct acoustic-visual patterns which are a rich source of information for object recognition. Object differentiation from both its visual and acoustic backgrounds by binding pixels and frequency bands that are oscillating together • Cognitive evidence that, for humans, simple visual periodicity can aid the detection of acoustic periodicity • More feature can be used for better discrimination, like the ratio of the sound/visual peak amplitudes • Each type of features are important for recognition when the other is absent. But when both are present, then we can do better by looking at the relationship between visual motion and the sound generated. CIRAS 2003
Questions? Questions? CIRAS 2003