Auditory and Visual Spatial Sensing. Stan Birchfield, Department of Electrical and Computer Engineering, Clemson University. Human Spatial Sensing. The five senses: seeing, f(x, y, λ, t); hearing, f(t); taste; smell; touch. Visual and Auditory Pathways.
Auditory and Visual Spatial Sensing. Stan Birchfield, Department of Electrical and Computer Engineering, Clemson University
Human Spatial Sensing. The five senses: seeing, f(x, y, λ, t); hearing, f(t); taste; smell; touch
Two Problems in Spatial Sensing: Stereo Vision; Acoustic Localization
Clemson Vision Laboratory: head tracking, highway monitoring, root detection, reconstruction, motion segmentation
Clemson Vision Lab (cont.): microphone position calibration, speaker localization
Stereo Vision. INPUT: left and right images, related by the epipolar constraint. OUTPUT: disparity map and depth discontinuities
Epipolar Constraint (figure labels: world point, epipolar line, epipolar plane, center of projection, left camera, right camera)
Energy Minimization (figure labels: left, right, occluded pixels). The intensity constraint alone is underconstrained, so minimize: dissimilarity and discontinuity penalty
History of Stereo Correspondence. DYNAMIC PROGRAMMING (1D): Baker & Binford 1981; Ohta & Kanade 1985; Belhumeur & Mumford 1992; Intille & Bobick 1994; Geiger et al. 1995; Birchfield & Tomasi 1998. MULTIWAY-CUT (2D): Roy & Cox 1998; Boykov, Veksler, and Zabih 1998; Birchfield & Tomasi 1999; Kolmogorov & Zabih 2001, 2002; Lin & Tomasi 2002
Dynamic Programming: 1D Search
String editing ("cat" to "cart"), penalties: mismatch = 1, insertion = 1, deletion = 1
         c  a  r  t
      0  1  2  3  4
   c  1  0  1  2  3
   a  2  1  0  1  2
   t  3  2  1  1  1
Stereo matching: LEFT and RIGHT scanlines; the disparity map marks occlusion and depth discontinuity
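The string-editing table on this slide can be reproduced with a few lines of dynamic programming. The sketch below assumes the standard edit-distance recurrence with the three unit penalties listed on the slide; the function name `edit_distance_table` is illustrative and does not come from the talk.

```python
def edit_distance_table(source, target, mismatch=1, insertion=1, deletion=1):
    """Fill the dynamic-programming table for editing `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]

    # First row: build the target prefix from nothing (insertions only).
    for j in range(1, cols):
        table[0][j] = table[0][j - 1] + insertion
    # First column: delete the whole source prefix.
    for i in range(1, rows):
        table[i][0] = table[i - 1][0] + deletion

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            substitute = table[i - 1][j - 1] + (0 if same else mismatch)
            table[i][j] = min(substitute,
                              table[i - 1][j] + deletion,
                              table[i][j - 1] + insertion)
    return table


table = edit_distance_table("cat", "cart")
for row in table:
    print(row)
```

The bottom-right entry is the total cost of the cheapest edit. In the stereo analogy on the slide, insertions and deletions play the role of occlusions along a scanline.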
Multiway-Cut: 2D Search (figure axes: pixels, pixels, labels) [Boykov, Veksler, Zabih 1998]
Multiway-Cut Algorithm (figure labels: pixels, pixels, labels, source label, sink label, minimum cut). Minimizes: (cost of assigning label to pixel) + (cost of label discontinuity)
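The two costs named on this slide are usually written as a single labeling energy. The form below is a sketch in assumed notation (the symbols f, D_p, V, P and N do not appear on the slide), not necessarily the exact formula shown in the talk:

```latex
E(f) \;=\; \sum_{p \in P} D_p(f_p) \;+\; \sum_{\{p,q\} \in N} V(f_p, f_q)
```

Here f_p is the label (disparity) assigned to pixel p, D_p is the cost of assigning that label to the pixel, and V is the cost of a label discontinuity between neighboring pixels p and q.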
Sampling-Insensitive Pixel Dissimilarity (figure labels: IL, IR, xL, xR, d̄(xL,xR)). Our dissimilarity measure: d(xL,xR) = min{ d̄(xL,xR), d̄(xR,xL) } [Birchfield & Tomasi 1998]
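A minimal sketch of how such a symmetric, sampling-insensitive measure can be computed on two scanlines. It assumes the one-sided term compares a pixel against the linearly interpolated intensity within half a pixel of its match, which is how the measure is commonly described; the function names and the border clamping are illustrative choices, not taken from the slide.

```python
def one_sided(x_a, x_b, row_a, row_b):
    """Distance from row_a[x_a] to the interpolated row_b over [x_b - 1/2, x_b + 1/2]."""
    center = row_b[x_b]
    left = row_b[max(x_b - 1, 0)]                 # clamp at the image border
    right = row_b[min(x_b + 1, len(row_b) - 1)]
    half_left = 0.5 * (center + left)             # interpolated value at x_b - 1/2
    half_right = 0.5 * (center + right)           # interpolated value at x_b + 1/2
    lowest = min(center, half_left, half_right)
    highest = max(center, half_left, half_right)
    # Zero if row_a[x_a] falls inside the interpolated range, else the gap to it.
    return max(0.0, row_a[x_a] - highest, lowest - row_a[x_a])


def dissimilarity(x_left, x_right, left_row, right_row):
    """Symmetric measure: the smaller of the two one-sided distances."""
    return min(one_sided(x_left, x_right, left_row, right_row),
               one_sided(x_right, x_left, right_row, left_row))


# A linear ramp sampled half a pixel apart in the two images.
left_row = [10.0, 20.0, 30.0, 40.0]
right_row = [15.0, 25.0, 35.0, 45.0]
print(abs(left_row[1] - right_row[1]))               # plain absolute difference
print(dissimilarity(1, 1, left_row, right_row))      # sampling-insensitive measure
```

On this ramp the plain absolute difference is nonzero even though the pixels correspond, while the interpolated measure is zero.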
Dissimilarity Measure Theorems. Given: an interval A such that [xL − ½, xL + ½] ⊆ A and [xR − ½, xR + ½] ⊆ A. Theorem 1 (when the intensity function is convex or concave on A): if |xL − xR| ≤ ½, then d(xL,xR) = 0. Theorem 2 (when the intensity function is linear on A): |xL − xR| ≤ ½ iff d(xL,xR) = 0
Correspondence as Segmentation
• Problem: disparities (fronto-parallel) O(D); surfaces (slanted) O(Ds2 n) => computationally intractable!
• Solution: iteratively determine which labels to use
   • label pixels: multiway-cut (Expectation)
   • find affine parameters of regions: Newton-Raphson (Maximization)
Stereo Results on Middlebury Database (columns: image, Birchfield-Tomasi 1999, Hong-Chen 2004)
Multiway-Cut Challenges Dynamic programming Multiway-cut
Acoustic Localization (figure labels: distributed array, compact array)
Problem: Use microphone signals to determine sound source location
• Traditional solutions:
   • Delay-and-sum beamforming !
   • Time-delay estimation (TDE) !
• Recent solutions:
   • Hemisphere sampling !!
   • Accumulated correlation !!
   • Bayesian !
   • Zero-energy !
Legend: ! efficient, ! accurate
Localization Geometry: t2 − t1 = τ (figure labels: sound source, microphones, arrival times t1 and t2 on a time axis, delay τ; the locus of constant τ is one-half of a hyperboloid)
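The delay on this slide follows directly from the source-to-microphone distances. The sketch below assumes a constant speed of sound of 343 m/s and free-field propagation; the function name `time_delay` is illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, an assumed nominal value


def time_delay(source, mic1, mic2, speed=SPEED_OF_SOUND):
    """Return tau = t2 - t1, the arrival-time difference between two microphones."""
    t1 = math.dist(source, mic1) / speed
    t2 = math.dist(source, mic2) / speed
    return t2 - t1


mic1 = (-0.5, 0.0, 0.0)
mic2 = (0.5, 0.0, 0.0)

# A source equidistant from both microphones gives zero delay.
print(time_delay((0.0, 1.0, 0.0), mic1, mic2))
# A source on the microphone axis gives the largest possible delay magnitude.
print(time_delay((-10.0, 0.0, 0.0), mic1, mic2))
```

All source positions that produce the same value of tau lie on one sheet of a hyperboloid whose foci are the two microphones, which is why a single pair cannot pin down the location.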
Principle of Least Commitment: “Delay decisions as long as possible” Example: [Marr 1982; Russell & Norvig 1995]
Localization by Beamforming. Blocks: for each microphone signal (mics 1 to 4), a prefilter and a delay; then sum, energy, find peak, output (θ, φ). Delays (shifts) each signal for each candidate location. Makes the decision late in the pipeline (“principle of least commitment”). Accurate, but NOT efficient. [Silverman & Kirtman 1992; Duraiswami et al. 2001; Ward & Williamson 2002]
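A minimal delay-and-sum sketch of the idea on this slide: shift every signal for each candidate location, sum, measure the energy, and keep the peak. It assumes whole-sample delays, no prefilter, and a speed of sound of 343 m/s; the function names and the synthetic test pulse are illustrative, not from the talk.

```python
import math


def steered_energy(signals, shifts):
    """Energy of the sum of the signals after advancing signal i by shifts[i] samples."""
    length = len(signals[0])
    energy = 0.0
    for t in range(length):
        total = 0.0
        for signal, shift in zip(signals, shifts):
            k = t + shift
            if 0 <= k < length:
                total += signal[k]
        energy += total * total
    return energy


def beamform_localize(signals, mics, candidates, sample_rate, speed=343.0):
    """Steer to every candidate location and return (best_candidate, energies)."""
    energies = []
    for location in candidates:
        distances = [math.dist(location, mic) for mic in mics]
        nearest = min(distances)
        # Extra propagation delay of each microphone, in whole samples.
        shifts = [round((d - nearest) / speed * sample_rate) for d in distances]
        energies.append(steered_energy(signals, shifts))
    return candidates[energies.index(max(energies))], energies


def pulse_at(start, length=100, pulse=(1.0, -2.0, 3.0, -1.0)):
    """Synthetic signal: a short pulse that arrives at sample `start`."""
    signal = [0.0] * length
    signal[start:start + len(pulse)] = pulse
    return signal


# Source at (0, 3): 3 m from mic 0 and 5 m from mic 1, i.e. 30 and 50 samples.
mics = [(0.0, 0.0), (4.0, 0.0)]
signals = [pulse_at(30), pulse_at(50)]
candidates = [(0.0, 3.0), (2.0, 3.0), (4.0, 3.0)]
best, energies = beamform_localize(signals, mics, candidates, sample_rate=3430.0)
print(best, energies)
```

The inner loops run over every sample for every candidate, which illustrates why the slide calls this approach accurate but not efficient.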
Localization by Time-Delay Estimation (TDE). Blocks: a prefilter for each microphone signal (mics 1 to 4); each microphone pair is correlated and its peak found; the per-pair results are intersected (there may be no intersection) to give (θ, φ). Cross-correlation is computed once for each microphone pair. The decision is made early. Efficient, but NOT accurate. [Brandstein et al. 1995; Brandstein & Silverman 1997; Wang & Chu 1997]
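The per-pair step of this pipeline can be sketched as a cross-correlation followed by a peak search. The code assumes plain time-domain correlation over whole-sample lags with no prefilter; the function names and the synthetic pulse are illustrative.

```python
def cross_correlation(a, b, max_lag):
    """Return {lag: sum over t of a[t] * b[t + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(a)):
            k = t + lag
            if 0 <= k < len(b):
                total += a[t] * b[k]
        result[lag] = total
    return result


def estimate_delay(a, b, max_lag):
    """Commit to the single lag (in samples) where the correlation peaks."""
    correlation = cross_correlation(a, b, max_lag)
    return max(correlation, key=correlation.get)


def pulse_at(start, length=100, pulse=(1.0, -2.0, 3.0, -1.0)):
    """Synthetic signal: a short pulse that arrives at sample `start`."""
    signal = [0.0] * length
    signal[start:start + len(pulse)] = pulse
    return signal


mic1_signal = pulse_at(30)
mic2_signal = pulse_at(50)
delay = estimate_delay(mic1_signal, mic2_signal, max_lag=30)
print(delay)  # delay of mic 2 relative to mic 1, in samples
```

Only one number per pair survives this step, so everything else in the correlation function is discarded before the pairs are combined. That early decision is what the slide contrasts with beamforming.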
Localization by Hemisphere Sampling. Blocks: a prefilter for each microphone signal (mics 1 to 4); correlate each microphone pair; sampled locus per pair; map to common coordinate system; sum; final sampled locus; temporal smoothing; find peak; output (θ, φ). Efficient and accurate (but restricted to compact arrays). [Birchfield & Gillmor 2001]
Localization by Accumulated Correlation. Blocks: a prefilter for each microphone signal (mics 1 to 4); correlate each microphone pair; sampled locus per pair; map to common coordinate system; sum; final sampled locus; temporal smoothing; find peak; output (θ, φ). Efficient and accurate. [Birchfield & Gillmor 2002]
Accumulated Correlation Algorithm: pair 1 + pair 2 + ... = likelihood (figure labels: candidate location, microphone)
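The sum over pairs on this slide can be sketched as follows: compute each pair's cross-correlation once, then score every candidate location by adding up the correlation values at the delays that location would produce. The sketch assumes whole-sample lags, no prefilter or temporal smoothing, and a speed of sound of 343 m/s; all names and the synthetic data are illustrative, not from the talk.

```python
import itertools
import math


def cross_correlation(a, b, max_lag):
    """Return {lag: sum over t of a[t] * b[t + lag]} for lags in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(len(a)):
            k = t + lag
            if 0 <= k < len(b):
                total += a[t] * b[k]
        result[lag] = total
    return result


def accumulated_correlation(signals, mics, candidates, sample_rate, max_lag, speed=343.0):
    """Score each candidate by summing per-pair correlations at its expected lags."""
    pairs = list(itertools.combinations(range(len(mics)), 2))
    # Each pair is correlated once, independent of the number of candidates.
    tables = {(i, j): cross_correlation(signals[i], signals[j], max_lag) for i, j in pairs}
    scores = []
    for location in candidates:
        score = 0.0
        for i, j in pairs:
            delay = (math.dist(location, mics[j]) - math.dist(location, mics[i])) / speed
            score += tables[(i, j)].get(round(delay * sample_rate), 0.0)
        scores.append(score)
    return scores


def pulse_at(start, length=100, pulse=(1.0, -2.0, 3.0, -1.0)):
    """Synthetic signal: a short pulse that arrives at sample `start`."""
    signal = [0.0] * length
    signal[start:start + len(pulse)] = pulse
    return signal


# Source at (0, 3): distances 3 m, 5 m, 3 m, i.e. arrivals at samples 30, 50, 30.
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 6.0)]
signals = [pulse_at(30), pulse_at(50), pulse_at(30)]
candidates = [(0.0, 3.0), (2.0, 3.0), (4.0, 3.0)]
scores = accumulated_correlation(signals, mics, candidates, sample_rate=3430.0, max_lag=30)
print(candidates[scores.index(max(scores))], scores)
```

Unlike the peak-picking step in time-delay estimation, no pair commits to a single delay here: the whole correlation table is kept until the candidates are scored, and each candidate costs only one table lookup per pair.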
Comparison (methods ordered from accurate to efficient; annotations: energy, similarity). Beamforming: Bayesian: Zero energy: Acc corr: Hem samp: TDE:
Unifying framework (methods ordered from accurate to efficient)
Integration limits: Beamforming, Bayesian, Zero energy, Accumulated correlation, Hemisphere sampling, Time-delay estimation
Compact Microphone Array (figure labels: microphone, sampled hemisphere, d = 15 cm)
Results on compact array (plots: pan and tilt, without PHAT prefilter and with PHAT prefilter)
More Comparison: Hemisphere Sampling [Birchfield & Gillmor 2001]; Accumulated Correlation [Birchfield & Gillmor 2002]; Beamforming
Computational efficiency: computing time per window (ms); annotations: (600x faster), (50x faster)
Detecting Noise Sources background noise source
Connection with Stereo “Multi-baseline stereo” [Okutomi & Kanade 1993]
Conclusion
• Spatial sensing achieved by arrays of visual and auditory sensors
• Stereo vision
   • match visual signals from multiple cameras
   • recent breakthrough: multiway-cut
   • limitations of multiway-cut
• Acoustic localization
   • match acoustic signals from multiple microphones
   • recent breakthrough: accumulated correlation
   • connection with multi-baseline stereo