Self-Localizing Sensors and Actuators on Distributed Computing Platforms
Vikas Raykar, Igor Kozintsev, Rainer Lienhart
Motivation
• Many multimedia applications are emerging that use multiple audio/video sensors and actuators.
• Sensors: microphones, cameras
• Actuators: speakers, displays
• Tasks: distributed capture, distributed rendering, number crunching, and other applications
Applications
• Speech recognition, smart conference rooms, meeting recording
• Source separation and dereverberation, hands-free voice communication
• Object localization and tracking, audio/image-based rendering
• Multi-channel speech enhancement, multi-channel echo cancellation
• Distributed audio/video capture, audio/video surveillance
• Interactive audio-visual interfaces
Additional Motivation
• Current work has focused on setting up all the sensors and actuators on a single dedicated computing platform.
• This requires dedicated infrastructure: the sensors, multi-channel interface cards, and computing power.
• On the other hand, computing devices such as laptops, PDAs, tablets, cellular phones, and camcorders have become pervasive.
• The audio/video sensors on different laptops can therefore be combined into a distributed network of sensors.
Problem formulation
• Put all the distributed audio-visual I/O capabilities into a common time and space.
• In this paper:
• Provide a common space by actively estimating the 3D positions of the sensors (microphones) and actuators (speakers).
• Account for the errors due to the lack of temporal synchronization among the various sensors and actuators (A/Ds and D/As) on distributed general-purpose computing platforms.
Localization with known positions of speakers
• The measured distances are not exact, and there are more speakers than strictly needed, so the microphone position is found as a least-squares fit rather than by exact triangulation.
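This overdetermined fit can be sketched as follows. This is a minimal illustration, not the authors' code: the speaker positions, initial guess, and all names are invented for the example.

```python
import numpy as np
from scipy.optimize import least_squares

# Known speaker positions in meters (illustrative values, not from the paper).
SPEAKERS = np.array([[0.0, 0.0, 0.0],
                     [4.0, 0.0, 0.0],
                     [0.0, 3.0, 0.0],
                     [0.0, 0.0, 2.0]])

def locate_microphone(distances, x0=np.zeros(3)):
    """Estimate a microphone position from (noisy) distances to the
    speakers by minimizing the sum of squared distance residuals."""
    residuals = lambda x: np.linalg.norm(SPEAKERS - x, axis=1) - distances
    return least_squares(residuals, x0).x
```

With exact distances the fit recovers the true position; with noisy distances it returns the least-squares compromise across all speakers.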
If positions of speakers are unknown…
• Consider M microphones and S speakers.
• What can we measure? The distance between each speaker and each microphone, via the Time Of Flight (TOF) of a calibration signal, giving an M x S TOF matrix.
• Assuming each TOF is corrupted by additive white Gaussian noise (AWGN), we can derive the ML estimate.
Nonlinear Least Squares
Find the microphone and speaker coordinates that minimize the TOF fit error (this is the ML estimate under the AWGN assumption):
F = sum over all pairs (i,j) of [ ||m_i - s_j|| - c * TOF_ij ]^2
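The cost above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the speed-of-sound constant is a nominal placeholder value.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, nominal value for the example

def tof_residuals(mics, spks, tof):
    """Residuals ||m_i - s_j|| - c * TOF_ij for every mic/speaker pair.

    mics: (M, 3) microphone coordinates, spks: (S, 3) speaker coordinates,
    tof:  (M, S) measured time-of-flight matrix.
    """
    dists = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)
    return (dists - SPEED_OF_SOUND * tof).ravel()

def nls_cost(mics, spks, tof):
    """Sum of squared residuals: the ML cost under AWGN on the TOFs."""
    r = tof_residuals(mics, spks, tof)
    return float(r @ r)
```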
Reference Coordinate System
The solution is unique only up to a rigid motion, so we fix a reference frame (shown here for 3D):
1. Fix the origin: (0,0,0)
2. Fix the X axis: (x1,0,0)
3. Fix the Y axis: (x2,y2,0)
4. Fix the positive Z axis, with x1, x2, y2 > 0
Which points to choose? Later…
PC platform overview
• CPU, MCH, FSB, memory
• ICH, hub, PCI slots, AGP, LAN, AC97, ATA, USB
• I/O bus and audio/video I/O devices
• Multimedia/multistream applications running on top of the operating system
Timing on distributed system
On a distributed platform the playback and capture clocks are not synchronized: speaker j starts playback at an unknown time ts_j after the common time origin, and microphone i starts capture at an unknown time tm_i. The measured arrival time can therefore be modeled as
measured_ij = ||m_i - s_j|| / c + ts_j - tm_i
so the unknown start times must be estimated jointly with the positions.
Joint Estimation
• Measurements: the MS entries of the TOF matrix.
• Unknowns: D(M+S) microphone and speaker coordinates, minus the D(D+1)/2 fixed by the reference coordinate system; S speaker emission start times; and M - 1 microphone capture start times (assume tm_1 = 0).
Time Difference of Arrival (TDOA)
The formulation is the same as above, but with fewer parameters: differencing arrival times across microphones eliminates the unknown speaker emission start times.
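The parameter reduction can be seen in a toy sketch: referencing all arrival times to one microphone cancels whatever unknown emission start time each speaker contributed. Illustrative code, not the paper's exact formulation.

```python
import numpy as np

def tdoa_from_toa(toa):
    """Difference arrival times against microphone 0.

    toa: (M, S) arrival times; each column j contains an unknown common
    emission start time ts_j, which the subtraction removes. The result
    is an (M-1) x S TDOA matrix: fewer measurements, but also S fewer
    unknowns to estimate.
    """
    return toa[1:] - toa[0]
```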
Nonlinear least squares
• Minimized with the Levenberg-Marquardt method.
• The cost is a multidimensional function: unless we have a good initial guess, the method may not converge to the global minimum.
• An approximate initial guess is therefore required.
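The refinement step can be sketched on a small 2D toy problem using SciPy's Levenberg-Marquardt implementation, with the reference frame fixed as on the earlier slide. All geometry, parameter layouts, and names here are invented for the example; with exact data and a good initial guess the fit recovers the configuration.

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # nominal speed of sound, m/s

def residuals(p, tof):
    """2D toy: mic0 fixed at the origin, mic1 on the x-axis (gauge fixing).
    p = [m1x, m2x, m2y, s0x, s0y, s1x, s1y, s2x, s2y]."""
    mics = np.array([[0.0, 0.0], [p[0], 0.0], [p[1], p[2]]])
    spks = p[3:].reshape(3, 2)
    d = np.linalg.norm(mics[:, None, :] - spks[None, :, :], axis=2)
    return (d - C * tof).ravel()

def calibrate(tof, p0):
    # Levenberg-Marquardt needs at least as many residuals as parameters:
    # here the 3 x 3 = 9 TOFs match the 9 free parameters.
    return least_squares(residuals, p0, args=(tof,), method='lm').x
```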
Multi Dimensional Scaling
• Form the dot-product (Gram) matrix from the pairwise distances: it is symmetric positive semidefinite with rank 3 for points in 3D.
• Given B, can you recover the coordinates X? Yes, via the Singular Value Decomposition.
Clustering approximation
Microphones and speakers residing on the same computing platform (GPC) are initially approximated as co-located, giving one cluster per platform.
How to get the dot product matrix from the pairwise distance matrix
Double centering: with the centroid as the origin, subtract the row and column means from the squared distance matrix, i.e. B = -1/2 J D^2 J, where J = I - (1/n) 1 1^T is the centering matrix.
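The double-centering step followed by SVD-based coordinate recovery is standard classical MDS; a sketch with illustrative names:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Recover coordinates (up to rotation/reflection, centroid at the
    origin) from a pairwise distance matrix D via double centering."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # dot-product (Gram) matrix
    U, s, _ = np.linalg.svd(B)            # B is PSD, so SVD acts as an eigendecomposition
    return U[:, :dim] * np.sqrt(s[:dim])
```

The recovered coordinates reproduce the input distances exactly (for noise-free D of the right rank), with the centroid at the origin, matching the next slide's shift back to the chosen reference frame.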
Centroid as the origin
• MDS returns coordinates with the centroid as the origin; later we shift them to our original reference frame.
• Slightly perturb each GPC location into two points to get the initial guesses for the microphone and speaker coordinates.
Algorithm
1. Start from the TOF matrix; apply the clustering approximation to get approximate ts, tm and an approximate distance matrix between GPCs.
2. Double centering gives the dot-product matrix; MDS yields approximate GPC locations.
3. Fix the dimension and the reference coordinate system.
4. Perturb each GPC location to get approximate microphone and speaker locations.
5. Refine with TDOA-based nonlinear minimization to obtain the final microphone and speaker locations and tm.
Cramer-Rao bound
• Gives a lower bound on the variance of any unbiased estimator.
• It does not depend on the estimator, only on the data and the noise model.
• It tells us to what extent the noise limits performance: no unbiased estimator can achieve a variance lower than the CR bound.
• The Jacobian-based expression is rank deficient because of the reference-frame ambiguity, so remove the known (gauge-fixed) parameters before inverting.
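For AWGN with known standard deviation, the bound reduces to inverting the Fisher information built from the Jacobian of the measurement model. A sketch, assuming the gauge-fixed parameters have already been removed so the Jacobian has full column rank:

```python
import numpy as np

def crb_diagonal(J, sigma):
    """Diagonal of the Cramer-Rao bound: sigma^2 * inv(J^T J).

    J:     Jacobian of the measurement model w.r.t. the free parameters
           (known/gauge-fixed parameters removed to avoid rank deficiency).
    sigma: standard deviation of the AWGN on the measurements.
    """
    fisher = (J.T @ J) / sigma ** 2
    return np.diag(np.linalg.inv(fisher))
```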
Experimental setup
• Room: length 4.22 m, width 2.55 m, height 2.03 m.
• Four microphones (Mic 1-4) and four speakers (Speaker 1-4) placed around the room.
• Results: bias 0.08 cm, sigma 3.8 cm.
Summary
• General-purpose computers can be used for distributed array processing.
• It is possible to define a common time and space for a network of distributed sensors and actuators.
• For more information, please see our two papers in ACM MM in November, or contact igor.v.kozintsev@intel.com / rainer.lienhart@intel.com
• Let us know if you are interested in testing/using our time and space synchronization software for developing distributed algorithms on GPCs (available in November).