1 / 20

Exploiting cross-modal rhythm for robot perception of objects

m. Exploiting cross-modal rhythm for robot perception of objects. Artur M. Arsenio Paul Fitzpatrick. MIT Computer Science and Artificial Intelligence Laboratory. Cog – the humanoid platform. cameras on active vision head. microphone array above torso.

Download Presentation

Exploiting cross-modal rhythm for robot perception of objects

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. m Exploiting cross-modal rhythm for robot perception of objects Artur M. Arsenio Paul Fitzpatrick MIT Computer Science and Artificial Intelligence Laboratory

  2. Cog – the humanoid platform cameras on active vision head microphone array above torso periodically moving object (hammer) periodically generated sound (banging) CIRAS 2003

  3. Motivation • Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files, … • Rhythmic information across the visual and acoustic sensory modalities have complementary properties • Features extracted from visual and acoustic processing are what is needed to build an object recognition system CIRAS 2003

  4. robot Interacting with the robot CIRAS 2003

  5. Talk outline • Matching sound and vision • Matching with visual distraction • Matching with acoustic distraction • Matching multiple sources • Priming sound detection using vision • Towards object recognition CIRAS 2003

  6. Detecting periodic events • Tools are often used in a manner that is composed of some repeated motion - consider hammers, saws, brushes, files. • Points tracked using Lukas-Kanade algorithm • Periodicity Analysis • FFTs of tracked trajectories • Periodicity Histograms • Phase verification CIRAS 2003

  7. Matching sound and vision 8 6 4 frequency (kHz) 2 0 0 500 1000 2000 1500 1500 1000 energy 500 0 0 500 1000 1500 2000 2500 -50 • The sound intensity peaks once per visual period of the hammer -60 hammer position -70 -80 0 500 1000 1500 2000 2500 time (ms) CIRAS 2003

  8. One object (the car) making noise Another object (the ball) in view Problem: which object goes with the sound? Solution: Match periods of motion and sound Matching with visual distraction CIRAS 2003

  9. 10 energy 5 0 0 500 1000 1500 2000 2500 50 0 car position -50 0 500 1000 1500 2000 2500 60 ball position 50 40 0 500 1000 1500 2000 2500 time (ms) Comparing periods • The sound intensity peaks twice per visual period of the car CIRAS 2003

  10. 8 6 frequency (kHz) 4 2 0 0 500 1000 1500 2000 90 80 car position 70 12 60 10 50 snake position Sound 500 1000 1500 2000 8 6 4 500 1000 1500 2000 2500 Matching with acoustic distraction Matching with acoustic distraction CIRAS 2003

  11. Matching multiple sources • Two objects making sounds with distinct spectrums • Problem: which object goes with which sound? • Solution: Match periods of motion and sound CIRAS 2003

  12. 40 35 rattle position 30 8 200 400 600 800 1000 1200 1400 1600 1800 2000 6 -20 4 2 frequency (kHz) -40 0 car position 0 500 1000 1500 2000 -60 200 400 600 800 1000 1200 1400 1600 1800 2000 Binding periodicity features • The sound intensity peaks twice per visual period of the car. For the cube rattle, the sound/visual signals have different ratios according to the frequency bands CIRAS 2003

  13. Experiment Visual period found Sound period found Bind made Correct binds (%) Incorrect Binds (%) Missed Binds (%) hammer 6 8 6 100 0 0 Car & ball 7 11 1 14 0 86 Car 6 6 5 83 0 17 Car (Snake rattle in background) 5 16 5 100 0 0 Snake rattle (Car in Background) 4 9 4 100 0 0 Plane and Mouse 23 27 16 46 41 13 Statistics An evaluation of cross-modal binding for various objects and situations the sound generated by a periodically moving object can be much more complex and ambiguous than its visual trajectory CIRAS 2003

  14. 8 6 frequency (kHz) 4 2 0 0 200 400 600 800 time (ms) 8 6 frequency (kHz) 4 2 0 0 200 400 600 time (ms) Priming sound detection using vision Signals in Phase CIRAS 2003

  15. 8 6 frequency (kHz) 4 2 0 0 200 400 600 800 time (ms) 8 6 frequency (kHz) 4 2 0 0 200 400 600 time (ms) Signals out of phase! CIRAS 2003

  16. Object recognition • Visual object segmentation • Cross-modal object recognition • Ratio between acoustic/visual fundamental frequencies • Phase between acoustic and visual signals • Range of acoustic frequency bands CIRAS 2003

  17. Cross-modal object recognition Causes sound when changing direction after striking object; quiet when changing direction to strike again Causes sound while moving rapidly with wheels spinning; quiet when changing direction Causes sound when changing direction, often quiet during remainder of trajectory (although bells vary) CIRAS 2003

  18. 100 50 0 Phase difference (sound/visual) 15 -50 10 -100 100 Frequency bands 5 50 -150 0 0 0.4 -50 -200 0.5 0.4 0.5 0.6 0.7 0.8 0.9 1 -100 0.6 0.7 Fundamental period Ratio (sound/visual) -150 0.8 Phase difference (sound/visual) 0.9 -200 1 Fundamental period Ratio (sound/visual) Clustering CIRAS 2003

  19. Conclusions • Different objects = distinct acoustic-visual patterns which are a rich source of information for object recognition. Object differentiation from both its visual and acoustic backgrounds by binding pixels and frequency bands that are oscillating together • Cognitive evidence that, for humans, simple visual periodicity can aid the detection of acoustic periodicity • More feature can be used for better discrimination, like the ratio of the sound/visual peak amplitudes • Each type of features are important for recognition when the other is absent. But when both are present, then we can do better by looking at the relationship between visual motion and the sound generated. CIRAS 2003

  20. Questions? Questions? CIRAS 2003

More Related