Self-Localizing Sensors and Actuators on Distributed Computing Platforms
Vikas Raykar, Igor Kozintsev, Rainer Lienhart
Motivation
• Many multimedia applications are emerging that use multiple audio/video sensors and actuators.
• Sensors: microphones, cameras
• Actuators: speakers, displays
• Tasks: distributed capture, distributed rendering, number crunching, and other applications
Applications
• Speech recognition, smart conference rooms, meeting recording
• Source separation and dereverberation, hands-free voice communication
• Object localization and tracking, audio/image-based rendering
• Multi-channel speech enhancement, multi-channel echo cancellation
• Distributed audio/video capture, audio/video surveillance
• Interactive audio-visual interfaces
Additional Motivation
• Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
• This requires dedicated infrastructure: the sensors, multi-channel interface cards, and computing power.
• On the other hand, computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
• The audio/video sensors on different laptops can therefore be combined into a distributed network of sensors.
Problem formulation
• Put all the distributed audio-visual I/O capabilities into a common time and space.
• In this paper:
• Provide a common space by actively estimating the 3D positions of the sensors (microphones) and actuators (speakers).
• Account for the errors due to the lack of temporal synchronization among the various sensors and actuators (A/Ds and D/As) on distributed general-purpose computing platforms.
Localization with known positions of speakers
• The measured distances are not exact, and there are more speakers than strictly needed, so the microphone position is found as a least-squares fit rather than by exact triangulation.
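This overdetermined fit can be sketched as follows. This is a minimal illustration, not the authors' code: the speaker positions, initial guess, and all names are invented for the example.

```python
import numpy as np
from scipy.optimize import least_squares

# Known speaker positions in meters (illustrative values, not from the paper).
SPEAKERS = np.array([[0.0, 0.0, 0.0],
                     [4.0, 0.0, 0.0],
                     [0.0, 3.0, 0.0],
                     [0.0, 0.0, 2.0]])

def locate_microphone(distances, x0=np.zeros(3)):
    """Estimate a microphone position from (noisy) distances to the
    speakers by minimizing the sum of squared distance residuals."""
    residuals = lambda x: np.linalg.norm(SPEAKERS - x, axis=1) - distances
    return least_squares(residuals, x0).x
```

With exact distances the fit recovers the true position; with noisy distances it returns the least-squares compromise across all speakers.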
If positions of speakers are unknown…
• Consider M microphones and S speakers.
• What can we measure? The distance between each speaker and each microphone, via the Time Of Flight (TOF) of a calibration signal, giving an M x S TOF matrix.
• Assuming each TOF is corrupted by additive white Gaussian noise (AWGN), we can derive the ML estimate.
Nonlinear Least Squares
Find the microphone and speaker coordinates that minimize the TOF fit error (this is the ML estimate under the AWGN assumption):
F = sum over all pairs (i,j) of [ ||m_i - s_j|| - c * TOF_ij ]^2
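The cost above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the speed-of-sound constant is a nominal placeholder value.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, nominal value for the example

def tof_residuals(mics, spks, tof):
    """Residuals ||m_i - s_j|| - c * TOF_ij for every mic/speaker pair.

    mics: (M, 3) microphone coordinates, spks: (S, 3) speaker coordinates,
    tof:  (M, S) measured time-of-flight matrix.
    """
    dists = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)
    return (dists - SPEED_OF_SOUND * tof).ravel()

def nls_cost(mics, spks, tof):
    """Sum of squared residuals: the ML cost under AWGN on the TOFs."""
    r = tof_residuals(mics, spks, tof)
    return float(r @ r)
```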
Reference Coordinate System
The solution is unique only up to a rigid motion, so we fix a reference frame (shown here for 3D):
1. Fix the origin: (0,0,0)
2. Fix the X axis: (x1,0,0)
3. Fix the Y axis: (x2,y2,0)
4. Fix the positive Z axis, with x1, x2, y2 > 0
Which points to choose? Later…
PC platform overview
• CPU, MCH, FSB, memory
• ICH, hub, PCI slots, AGP, LAN, AC97, ATA, USB
• I/O bus and audio/video I/O devices
• Multimedia/multistream applications running on top of the operating system
Timing on distributed system
On a distributed platform the playback and capture clocks are not synchronized: speaker j starts playback at an unknown time ts_j after the common time origin, and microphone i starts capture at an unknown time tm_i. The measured arrival time can therefore be modeled as
measured_ij = ||m_i - s_j|| / c + ts_j - tm_i
so the unknown start times must be estimated jointly with the positions.
Joint Estimation
• Measurements: the MS entries of the TOF matrix.
• Unknowns: D(M+S) microphone and speaker coordinates, minus the D(D+1)/2 fixed by the reference coordinate system; S speaker emission start times; and M - 1 microphone capture start times (assume tm_1 = 0).
Time Difference of Arrival (TDOA)
The formulation is the same as above, but with fewer parameters: differencing arrival times across microphones eliminates the unknown speaker emission start times.
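The parameter reduction can be seen in a toy sketch: referencing all arrival times to one microphone cancels whatever unknown emission start time each speaker contributed. Illustrative code, not the paper's exact formulation.

```python
import numpy as np

def tdoa_from_toa(toa):
    """Difference arrival times against microphone 0.

    toa: (M, S) arrival times; each column j contains an unknown common
    emission start time ts_j, which the subtraction removes. The result
    is an (M-1) x S TDOA matrix: fewer measurements, but also S fewer
    unknowns to estimate.
    """
    return toa[1:] - toa[0]
```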
Nonlinear least squares
• Minimized with the Levenberg-Marquardt method.
• The cost is a multidimensional function: unless we have a good initial guess, the method may not converge to the global minimum.
• An approximate initial guess is therefore required.
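The refinement step can be sketched on a small 2D toy problem using SciPy's Levenberg-Marquardt implementation, with the reference frame fixed as on the earlier slide. All geometry, parameter layouts, and names here are invented for the example; with exact data and a good initial guess the fit recovers the configuration.

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # nominal speed of sound, m/s

def residuals(p, tof):
    """2D toy: mic0 fixed at the origin, mic1 on the x-axis (gauge fixing).
    p = [m1x, m2x, m2y, s0x, s0y, s1x, s1y, s2x, s2y]."""
    mics = np.array([[0.0, 0.0], [p[0], 0.0], [p[1], p[2]]])
    spks = p[3:].reshape(3, 2)
    d = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)
    return (d - C * tof).ravel()

def calibrate(tof, p0):
    # Levenberg-Marquardt needs at least as many residuals as parameters:
    # here the 3 x 3 = 9 TOFs match the 9 free parameters.
    return least_squares(residuals, p0, args=(tof,), method='lm').x
```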
Multi Dimensional Scaling
• Form the dot-product (Gram) matrix from the pairwise distances: it is symmetric positive semidefinite with rank 3 for points in 3D.
• Given B, can you recover the coordinates X? Yes, via the Singular Value Decomposition.
Clustering approximation
Microphones and speakers residing on the same computing platform (GPC) are initially approximated as co-located, giving one cluster per platform.
How to get the dot product matrix from the pairwise distance matrix
Double centering: with the centroid as the origin, subtract the row and column means from the squared distance matrix, i.e. B = -1/2 J D^2 J, where J = I - (1/n) 1 1^T is the centering matrix.
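The double-centering step followed by SVD-based coordinate recovery is standard classical MDS; a sketch with illustrative names:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Recover coordinates (up to rotation/reflection, centroid at the
    origin) from a pairwise distance matrix D via double centering."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # dot-product (Gram) matrix
    U, s, _ = np.linalg.svd(B)            # B is PSD, so SVD acts as an eigendecomposition
    return U[:, :dim] * np.sqrt(s[:dim])
```

The recovered coordinates reproduce the input distances exactly (for noise-free D of the right rank), with the centroid at the origin, matching the next slide's shift back to the chosen reference frame.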
Centroid as the origin
• MDS returns coordinates with the centroid as the origin; later we shift them to our original reference frame.
• Slightly perturb each GPC location into two points to get the initial guesses for the microphone and speaker coordinates.
Algorithm
1. Start from the TOF matrix; apply the clustering approximation to get approximate ts, tm and an approximate distance matrix between GPCs.
2. Double centering gives the dot-product matrix; MDS yields approximate GPC locations.
3. Fix the dimension and the reference coordinate system.
4. Perturb each GPC location to get approximate microphone and speaker locations.
5. Refine with TDOA-based nonlinear minimization to obtain the final microphone and speaker locations and tm.
Cramer-Rao bound
• Gives a lower bound on the variance of any unbiased estimator.
• It does not depend on the estimator, only on the data and the noise model.
• It tells us to what extent the noise limits performance: no unbiased estimator can achieve a variance lower than the CR bound.
• The Jacobian-based expression is rank deficient because of the reference-frame ambiguity, so remove the known (gauge-fixed) parameters before inverting.
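For AWGN with known standard deviation, the bound reduces to inverting the Fisher information built from the Jacobian of the measurement model. A sketch, assuming the gauge-fixed parameters have already been removed so the Jacobian has full column rank:

```python
import numpy as np

def crb_diagonal(J, sigma):
    """Diagonal of the Cramer-Rao bound: sigma^2 * inv(J^T J).

    J:     Jacobian of the measurement model w.r.t. the free parameters
           (known/gauge-fixed parameters removed to avoid rank deficiency).
    sigma: standard deviation of the AWGN on the measurements.
    """
    fisher = (J.T @ J) / sigma ** 2
    return np.diag(np.linalg.inv(fisher))
```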
Experimental setup
• Room: length 4.22 m, width 2.55 m, height 2.03 m.
• Four microphones (Mic 1-4) and four speakers (Speaker 1-4) placed around the room.
• Results: bias 0.08 cm, sigma 3.8 cm.
Summary
• General-purpose computers can be used for distributed array processing.
• It is possible to define a common time and space for a network of distributed sensors and actuators.
• For more information, please see our two papers in ACM MM in November, or contact igor.v.kozintsev@intel.com / rainer.lienhart@intel.com
• Let us know if you are interested in testing/using our time and space synchronization software for developing distributed algorithms on GPCs (available in November).