Binaural Sonification of Disparity Maps. Alfonso Alba, Carlos Zubieta, Edgar Arce Facultad de Ciencias Universidad Autónoma de San Luis Potosí. Contents. Project description Estimation of disparity maps Segmentation of disparity maps Object sonification Test application
Binaural Sonification of Disparity Maps Alfonso Alba, Carlos Zubieta, Edgar Arce Facultad de Ciencias Universidad Autónoma de San Luis Potosí
Contents • Project description • Estimation of disparity maps • Segmentation of disparity maps • Object sonification • Test application • Preliminary results • Future work
Project description • The goal of this project is to develop a scene sonification system for the visually impaired. • Images from a stereo camera pair will be used to detect objects in the scene and estimate the distance between them and the subject. • A binaural audio signal will be synthesized for each object, so that the subject can “hear” the objects in the scene in their corresponding locations.
Scene sonification system • The system will consist of the following stages: • Stereo image acquisition • Disparity map estimation • Disparity map segmentation (object detection) • Binaural sonification of objects in the scene • Here we focus only on the segmentation and sonification stages, taking the disparity map as given.
Estimation of disparity maps • Images from a pair of cameras, separated by a certain distance, form a stereo image pair. • The position of an object in one of the images is shifted in the other image by an amount inversely proportional to the distance between the object and the camera arrangement. • This displacement is called disparity, and can be computed for each pixel to form a disparity map. • We are currently working on a technique to compute disparity maps in real time.
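The inverse relation between disparity and distance can be illustrated with a short sketch. It assumes a rectified pinhole stereo pair, for which distance = focal length × baseline / disparity; the slides do not state the camera model, and the function name and parameters below are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Distance in meters for a rectified pinhole stereo pair (assumed model).

    disparity_px: horizontal shift of the object between the two images, in pixels
    focal_px:     focal length expressed in pixels
    baseline_m:   separation between the two cameras, in meters
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point infinitely far away.
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

Doubling the disparity halves the estimated distance, which is why the nearest objects show the highest disparity values.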
Segmentation of disparity maps • Given a disparity map D(x,y), we perform a seeded region-growing segmentation to detect the objects in the scene. • To choose the seeds, the algorithm uses a fitness measure given by [equation not preserved in the source], where N(x,y) is the set of nearest neighbors of (x,y), and q is a quality parameter (it increases robustness to noise). • This measure favors homogeneous regions (low dq) with the highest disparity (nearest objects).
Region-growing algorithm • Take a pixel from a region’s border. • For each unlabeled neighbor, compare its intensity to the region’s average intensity. • If they are similar enough, include the neighbor in the region.
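The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 4-connected neighbors, a fixed similarity threshold `tol`, and a running region average, none of which are specified in the slides. The function name is hypothetical.

```python
from collections import deque

def grow_region(disparity, seed, tol, labels, label):
    """Grow one region from `seed` over a 2-D list `disparity`.

    labels is a 2-D list of ints (0 = unlabeled) and is modified in place.
    Returns the number of pixels assigned to the region.
    """
    height, width = len(disparity), len(disparity[0])
    sy, sx = seed
    labels[sy][sx] = label
    total, count = float(disparity[sy][sx]), 1
    border = deque([seed])
    while border:
        # Step 1: take a pixel from the region's border.
        y, x = border.popleft()
        # Assumption: 4-connected neighborhood.
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] == 0:
                # Step 2: compare the neighbor to the region's average value.
                if abs(disparity[ny][nx] - total / count) <= tol:
                    # Step 3: similar enough, so include it in the region.
                    labels[ny][nx] = label
                    total += disparity[ny][nx]
                    count += 1
                    border.append((ny, nx))
    return count
```

Calling the function once per seed, with a different `label` each time, yields one labeled region per detected object.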
Object sonification • Sound coming from a specific location undergoes a series of degradations before it reaches our ears. • These degradations provide various cues that our brain uses to locate the sound source. • Binaural spatialization attempts to model these cues, in order to allow the listener to hear a sound as if it were coming from a specific point in space, which is typically defined in spherical coordinates (azimuth, elevation, and range).
Object sonification • We represent each object in the scene with a ping-like sound whose frequency depends on the disparity, so that the sound becomes more alerting as the object gets closer. • The audio signal corresponding to each object is fed through a binaural spatialization system whose parameters depend on the object’s position. • Spatialization is performed by modeling azimuth and range cues. Elevation cues have not yet been implemented.
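As an illustration of a disparity-dependent ping, the sketch below maps disparity linearly to pitch and synthesizes an exponentially decaying sine. The linear mapping, the frequency range, the decay rate, and the function names are all assumptions made for the example; the slides only say that the frequency depends on the disparity.

```python
import math

def ping_frequency(disparity, max_disparity, f_min=400.0, f_max=2000.0):
    """Map disparity to a frequency in Hz (linear mapping is an assumption)."""
    frac = min(max(disparity / max_disparity, 0.0), 1.0)
    return f_min + frac * (f_max - f_min)

def ping(freq, sample_rate=44100, duration=0.2, decay=20.0):
    """Return a mono ping: a sine at `freq` with an exponential decay envelope."""
    n_samples = round(sample_rate * duration)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        samples.append(math.exp(-decay * t) * math.sin(2.0 * math.pi * freq * t))
    return samples
```

With these illustrative defaults, a nearby object (high disparity) produces a higher-pitched ping than a distant one.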
Azimuth cues • Inter-aural Time Difference: • The sound is delayed by a different amount for each ear: Tn = a − a sin(θ) for the near ear and Tf = a + aθ for the far ear, where θ is the azimuth. • Inter-aural Level Difference (head-shadow): • The sound is attenuated when passing through the head. • This cue can be modeled with a one-pole, one-zero filter (Brown et al., 1998).
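Both azimuth cues can be sketched in a few lines. Several things here are assumptions rather than details from the slides: a is taken to be the head radius in meters and the delay expressions are divided by the speed of sound to obtain seconds; the head-shadow filter is assumed to have the analog form H(s) = (alpha·s + beta) / (s + beta), discretized with the bilinear transform; and alpha and beta are left as free parameters. All function names are hypothetical.

```python
import math

def itd_delays(theta, head_radius=0.0875, speed_of_sound=343.0):
    """Return (near, far) ear delays in seconds for azimuth theta in [0, pi/2].

    Uses the expressions a - a*sin(theta) and a + a*theta, divided by the
    speed of sound (assumption: `a` is the head radius in meters).
    """
    near = (head_radius - head_radius * math.sin(theta)) / speed_of_sound
    far = (head_radius + head_radius * theta) / speed_of_sound
    return near, far

def head_shadow_coeffs(alpha, beta, sample_rate):
    """Bilinear transform of the assumed H(s) = (alpha*s + beta) / (s + beta)."""
    k = 2.0 * sample_rate
    b0 = (beta + alpha * k) / (beta + k)
    b1 = (beta - alpha * k) / (beta + k)
    a1 = (beta - k) / (beta + k)
    return b0, b1, a1

def one_pole_one_zero(x, b0, b1, a1):
    """Apply y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1] to a list of samples."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for sample in x:
        out = b0 * sample + b1 * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y
```

In this form the filter has unit gain at low frequencies and a high-frequency gain of alpha, so alpha below 1 attenuates the highs (as for the shadowed ear) and alpha equal to 1 leaves the signal unchanged.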
Range cues • Artificial Reverberation: • Reverberation is the result of a large number of echoes originating from reflections of the sound off flat surfaces such as walls. • The level of reverberation is roughly constant and independent of source location. • We use a simple model composed of 4 parallel delay lines with feedback. • Attenuation: • The audio signal is attenuated according to the inverse-square law. • The ratio between the signal and reverberation levels provides an additional cue for range.
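A minimal sketch of the two range cues follows. The delay lengths, feedback gain, reverberation level, and minimum-distance clamp are illustrative assumptions, as are the function names; the slides specify only that four parallel delay lines with feedback are used and that the direct signal follows the inverse-square law.

```python
def feedback_delay_line(x, delay, feedback):
    """Echo generator: y[n] = x[n - delay] + feedback * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(delay, len(x)):
        y[n] = x[n - delay] + feedback * y[n - delay]
    return y

def reverb(x, delays=(1116, 1188, 1277, 1356), feedback=0.8):
    """Average of four parallel feedback delay lines (delay values are arbitrary)."""
    lines = [feedback_delay_line(x, d, feedback) for d in delays]
    return [sum(samples) / len(delays) for samples in zip(*lines)]

def mix_with_range(dry, wet, distance, reverb_level=0.1, min_distance=1.0):
    """Attenuate the direct signal by 1/distance^2; keep reverberation constant."""
    gain = 1.0 / max(distance, min_distance) ** 2
    return [gain * d + reverb_level * w for d, w in zip(dry, wet)]
```

Because only the direct signal is attenuated, the direct-to-reverberant ratio falls as the object moves away, which is the additional range cue described above.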
Test application • We simulate a moving scene by taking a 160 x 100 sub-frame of a precomputed disparity map. • The 10 most relevant objects are segmented but only objects that are near enough are sonified.
Preliminary results • Fast segmentation times • 5 ms per 160 x 100 frame on a 2.4 GHz dual-core CPU • Over 100 frames per second including the sonification stage (but without disparity map estimation) • Embedded implementation is viable • Good azimuth representation: object direction is easily perceived. • Object range is perceived in a relative manner (e.g., one object is nearer than another), but not in an absolute way. • Between 3 and 5 objects can be sonified before too much clutter is heard.
Future work • Camera setup and calibration • Realtime estimation of disparity maps • Elevation cues in binaural spatialization • Optimization of sonification system • Implementation in an embedded device