Information Theoretic Approaches to Data Association and Fusion in Sensor Networks John Fisher, Alexander Ihler, Jason Williams, Alan Willsky (MIT CSAIL/LIDS); Haixiao Cai, Sanjeev Kulkarni, Sergio Verdu (Princeton University). SensorWeb MURI Review Meeting, September 22, 2003
Problem/Motivation • Large number of simple, myopic sensors. • Need to perform local fusion to support global inference (Battlespace Awareness). • Critical need to understand statistical relationships between sensor outputs in the face of many modes of uncertainty (sensors, scene, geometry, etc).
Challenges • Uncertainty in scene and sensor geometry • Complex, dynamic environment • Uncalibrated, multi-modal sensors • Unknown joint sensor statistics • Need fast, low-complexity algorithms
Activity and Accomplishments • Research • Application of data association method to multi-modal (A/V) correspondence problem. • A/V is a surrogate for other modalities primarily because we can easily collect this data (vs. IR, EM, etc.). • Extensions and empirical results to multi-modal feature-aided tracking. • Generalization of data association to triangulated graphs. • Improved K-L Divergence/MI estimators. • New developments on applied information-theoretic sensor management.
Activity and Accomplishments • Tech Transition • ARL visits • Student (Ihler) on-site at ARL • Plans to transition Data Association method to DARPA’s CTS program (Ft. Belvoir installation) • Publications • 4 conference publications • IPSN (2) • ICME (invited) • ICASSP (invited) • 1 journal submission • accepted pending 2nd review • 3 Sensor Network workshop panels • ARO, NSF, SAMSI
A Common Thread • Fusion and correspondence are difficult given the types of sensor uncertainties we are facing. • Various information theoretic measures and the need to estimate them arise naturally in such problems. • Exploiting sensor data subject to a common excitation provides a mechanism for estimating such quantities.
Overview • Estimating Information Theoretic Measures from Sensor Data (MIT, Princeton) • Applications • Data Association, Multi-modal Tracking, Inferring Group Interactions, Sensor Management • Future Directions • Information driven sensor fusion
Data Association (last year) • Measurements: separated signals, direction of arrival. • 1 signal/2 sensors: localize. • >2 signals, 2 sensors: ambiguous. (Figure: sensors A and B each observe two arrivals, A1/A2 and B1/B2.)
Association as a Hypothesis Test • Assuming independent sources, hypotheses are of the form H1: p(A1, B1) p(A2, B2) versus H2: p(A1, B2) p(A2, B1), i.e., alternative pairings of measurements across sensors. • Asymptotic comparison of known models to those estimated from a single realization.
Asymptotics of Likelihood Ratio • Decomposes into two sets of terms: statistical dependencies (groupings) and differences in model parameterizations.
Asymptotics of Likelihood Ratio • If we estimate from a single realization, the statistical dependence terms remain while the model divergences go away (see the sketch below).
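A minimal numerical sketch of the resulting test for the two-sensor, two-source case of the previous slide: since only the dependence terms survive when the models are estimated from the data, the test reduces to comparing sums of pairwise mutual information estimates under each candidate pairing. The histogram MI estimator and the synthetic signals below are illustrative assumptions, not the estimators used in this work.

```python
import numpy as np

def hist_mi(x, y, bins=16):
    """Plug-in mutual information estimate (nats) from a 2-D histogram of paired samples."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Two sources observed by two sensors; which pairing of channels shares a source?
rng = np.random.default_rng(0)
n = 5000
s1, s2 = rng.standard_normal(n), rng.standard_normal(n)                         # independent sources
a1, a2 = s1 + 0.3 * rng.standard_normal(n), s2 + 0.3 * rng.standard_normal(n)   # sensor A channels
b1, b2 = s1 + 0.3 * rng.standard_normal(n), s2 + 0.3 * rng.standard_normal(n)   # sensor B channels

# The likelihood-ratio statistic reduces to a difference of dependence (MI) terms.
h1 = hist_mi(a1, b1) + hist_mi(a2, b2)   # pairing (A1,B1), (A2,B2)
h2 = hist_mi(a1, b2) + hist_mi(a2, b1)   # pairing (A1,B2), (A2,B1)
print("chosen pairing:", "H1" if h1 > h2 else "H2")
```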
High Dimensional Data • Learn low-dimensional auxiliary variables that summarize the statistical dependency of the measurements (a toy sketch follows).
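One crude reading of this idea, as a sketch: pick low-dimensional summaries (here, one-dimensional linear projections of each sensor's measurements) that maximize a nonparametric estimate of the mutual information between the summaries. The random-search optimizer, the histogram MI estimator, and the toy data are stand-ins for the actual learning procedure used in this work.

```python
import numpy as np

def hist_mi(u, v, bins=16):
    """Plug-in MI estimate (nats) between two 1-D samples via a joint histogram."""
    puv, _, _ = np.histogram2d(u, v, bins=bins)
    puv = puv / puv.sum()
    pu = puv.sum(axis=1, keepdims=True)
    pv = puv.sum(axis=0, keepdims=True)
    nz = puv > 0
    return float(np.sum(puv[nz] * np.log(puv[nz] / (pu @ pv)[nz])))

def learn_auxiliary(X, Y, trials=500, seed=0):
    """Find unit-norm projections wx, wy whose 1-D summaries X@wx, Y@wy have maximal
    estimated MI.  Random search stands in for a proper gradient-based optimizer."""
    rng = np.random.default_rng(seed)
    best_mi, best_wx, best_wy = -np.inf, None, None
    for _ in range(trials):
        wx = rng.standard_normal(X.shape[1]); wx /= np.linalg.norm(wx)
        wy = rng.standard_normal(Y.shape[1]); wy /= np.linalg.norm(wy)
        mi = hist_mi(X @ wx, Y @ wy)
        if mi > best_mi:
            best_mi, best_wx, best_wy = mi, wx, wy
    return best_mi, best_wx, best_wy

# Toy high-dimensional measurements sharing one common latent direction.
rng = np.random.default_rng(1)
z = rng.standard_normal(4000)
X = np.outer(z, rng.standard_normal(10)) + rng.standard_normal((4000, 10))
Y = np.outer(z, rng.standard_normal(20)) + rng.standard_normal((4000, 20))
mi, wx, wy = learn_auxiliary(X, Y)
print("estimated MI of learned 1-D summaries (nats):", round(mi, 3))
```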
AV Association/Correspondence • New since last year: direct application of the 2-sensor/multiple-source case. • Unknown joint statistics, high-dimensional data, varying scene parameters. • Audio-video is a surrogate for multi-modal sensors. (Figure: consistent vs. inconsistent audio-video pairings.)
AV Association/Correspondence • Association matrix for 8 subjects. (Figure: matrix of association scores, e.g., 0.68, 0.61, 0.19, 0.20.)
General Structure Tests • Generalization to hypothesis tests over graphical structures. • How are observations related to each other? (Figure: candidate dependency graphs being compared.)
General Structure Tests • Intersection sets: groupings on which the hypotheses agree. (Figure: hypotheses H1 and H2 with their intersection sets.)
General Structure Tests • Asymptotics have a decomposition similar to the 2-variable case, expressed via the intersection sets.
General Structure Tests • Extending the previous data association work to such tests is straightforward. • Estimation from a single realization reduces separability only through the model difference terms. • The “curse of dimensionality” (with respect to density estimation) arises in two ways: individual measurements may be high dimensional (low-dimensional auxiliary variables can still be designed), and a grouping may contain many variables (new results provide a solution).
General Structure Tests • The test potentially involves 6 joint densities, but is simplified by looking at the intersection sets. (Figure: hypotheses H1 and H2.)
General Structure Tests • For high-dimensional variables, learning auxiliary variables reduces dimensionality in one respect. • But we would still have to estimate a 3-dimensional density. • This only gets worse with larger groupings.
K-L Divergence with Permutations • Simple idea which mitigates many of the dimensionality issues. • Exploits the fact that the structures are distinguished by their groupings of variables. • Key ideas: • Permuting sample order between groupings maintains the statistical dependency structure. • D(X||Y) >= D(f(X)||f(Y)). • This has the advantage that we can design a single (possibly vector-valued) function of all variables rather than one function for each variable (see the sketch below). • Currently performing a comparative analysis (bias, variance) against the previous approach.
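A minimal sketch of the permutation idea under simplifying assumptions: to score the dependence between two groupings of variables, compare the observed joint samples against samples in which one grouping's sample order has been permuted (preserving within-group structure, destroying cross-group dependence), and, invoking D(X||Y) >= D(f(X)||f(Y)), do the comparison after mapping both sample sets through a single scalar statistic f. The particular f, the histogram divergence estimator, and the synthetic data are illustrative choices, not those used in the comparative analysis mentioned above.

```python
import numpy as np

def hist_kl(a, b, bins=32):
    """Plug-in estimate of D(a||b) in nats from two 1-D samples on a shared binning."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi))
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi))
    pa = (pa + 1e-3) / (pa + 1e-3).sum()   # mild smoothing keeps the supports equal
    pb = (pb + 1e-3) / (pb + 1e-3).sum()
    return float(np.sum(pa * np.log(pa / pb)))

def permutation_dependence(X, Y, f, rng):
    """Lower bound on the divergence between a joint and the product of its groupings:
    permute the sample order of one grouping (breaking cross-group dependence while
    preserving within-group structure), map both sample sets through one scalar
    statistic f, and estimate the divergence in 1-D."""
    perm = rng.permutation(len(X))
    joint = f(X, Y)            # samples reflecting the hypothesized grouping
    shuffled = f(X, Y[perm])   # cross-grouping dependence destroyed
    return hist_kl(joint, shuffled)

rng = np.random.default_rng(0)
n = 5000
X = rng.standard_normal((n, 3))
Y = X @ np.ones((3, 2)) + 0.5 * rng.standard_normal((n, 2))   # Y statistically dependent on X
f = lambda x, y: x.sum(axis=1) * y.sum(axis=1)                # one scalar function of all variables
print("dependence score (nats):", round(permutation_dependence(X, Y, f, rng), 3))
```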
K-L Divergence with Permutations (Figure: a single function f maps the grouped variables to a low-dimensional statistic for divergence estimation.)
More General Structures • Analysis has been extended to comparisons between triangulated graphs. • Can be expressed as sums and differences of product terms. • Admits a wide class of Markov processes.
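As a concrete illustration of the decomposition that triangulated (chordal) structures afford, here is a small sketch under standard junction-tree assumptions: plug-in entropies of a chordal graphical model combine as a sum over cliques minus a sum over separators. The toy three-node chain and the empirical plug-in estimates are illustrative; they are not the specific estimators or graphs from this work.

```python
import numpy as np

def empirical_entropy(samples, cols):
    """Plug-in entropy (nats) of the empirical joint over the selected discrete columns."""
    _, counts = np.unique(samples[:, cols], axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def triangulated_entropy(samples, cliques, separators):
    """For a triangulated (junction-tree) structure the joint factors over cliques and
    separators, so its entropy is a sum of clique terms minus a sum of separator terms."""
    return (sum(empirical_entropy(samples, c) for c in cliques)
            - sum(empirical_entropy(samples, s) for s in separators))

# Toy example: a binary 3-node chain X0 - X1 - X2 (cliques {0,1} and {1,2}, separator {1}).
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 20000)
x1 = (x0 + (rng.random(20000) < 0.2)) % 2     # noisy copy of x0
x2 = (x1 + (rng.random(20000) < 0.2)) % 2     # noisy copy of x1
data = np.stack([x0, x1, x2], axis=1)
print("entropy under the chain structure (nats):",
      round(triangulated_entropy(data, cliques=[[0, 1], [1, 2]], separators=[[1]]), 3))
```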
Modeling Group Interactions • Object 3 tries to interpose itself between objects 1 and 2. • The graph describes the state (position) dependency structure.
Previous Work and Current Efforts (Princeton) • Developed fast algorithms based on block sorting for entropy and divergence estimation for discrete sources. • Simulations and text data show excellent results. • Have provided analysis of the methods showing universal consistency. • Have recently investigated estimation of mutual information. • Currently analyzing performance for hidden Markov sources. • Investigating extensions to continuous-alphabet sources. • Applications to various types of data.
A “Distilled” Problem • The problem: how to estimate the entropy, divergence, and mutual information of two sources based only on one realization from each source? • Assumption: both are finite-alphabet, finite-memory, stationary sources. • Our goal: good estimates, fast convergence, and reasonable computational complexity.
Two Approaches to Estimating Mutual Information • Estimate mutual information via entropy: I(X;Y) = H(X) + H(Y) - H(X,Y). • Estimate mutual information via divergence: I(X;Y) = D(p_XY || p_X p_Y). • We use our entropy and divergence estimators based on the Burrows-Wheeler block sorting transform (see the sketch below).
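To make the two formulas concrete, here is a toy sketch of the entropy-based route: a naive Burrows-Wheeler transform (sorting all cyclic rotations), a crude segment-wise zeroth-order entropy estimate on the transformed sequence, and the combination I(X;Y) = H(X) + H(Y) - H(X,Y) with the joint sequence built from paired symbols. The fixed equal-length segmentation and the O(n^2 log n) transform are simplifications for illustration; this is not the actual block-sorting estimator developed at Princeton.

```python
import numpy as np

def bwt(seq):
    """Naive Burrows-Wheeler transform: last column of the sorted cyclic rotations."""
    n = len(seq)
    order = sorted(range(n), key=lambda i: seq[i:] + seq[:i])
    return [seq[(i - 1) % n] for i in order]

def zeroth_order_entropy(symbols):
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def bwt_entropy(seq, segments=8):
    """Toy entropy estimate (bits/symbol): block-sort so that symbols with similar
    contexts become adjacent, split the output into segments, and average the
    segments' zeroth-order entropies weighted by length."""
    out = np.array(bwt(list(seq)))
    parts = np.array_split(out, segments)
    return sum(len(p) / len(out) * zeroth_order_entropy(p) for p in parts)

def mi_via_entropies(x, y, segments=8):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); the joint sequence uses paired symbols."""
    xy = [a + "," + b for a, b in zip(x, y)]
    return bwt_entropy(x, segments) + bwt_entropy(y, segments) - bwt_entropy(xy, segments)

rng = np.random.default_rng(0)
x = [str(s) for s in rng.integers(0, 2, 1000)]
y = [a if rng.random() < 0.9 else str(1 - int(a)) for a in x]   # noisy copy of x
print("estimated I(X;Y) in bits/symbol:", round(mi_via_entropies(x, y), 3))
```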
Estimating Mutual Information • Analysis and simulations show that both approaches converge to the true value. • The entropy approach appears better than the divergence approach. • The divergence approach does not exploit the fact that the second distribution, p_X p_Y, is a product of marginal distributions.
Hidden Markov Processes • X is the underlying Markov chain. • Y is a deterministic mapping of X, or Y is X observed through a discrete memoryless channel (DMC). Then Y is a hidden Markov process (HMP), useful in a wide range of applications.
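A small sketch of the setup described above, with illustrative (assumed) transition and channel matrices: X is sampled as a finite-state Markov chain, and Y is X passed through a discrete memoryless channel, so Y is a hidden Markov process.

```python
import numpy as np

def sample_hmp(T, A, B, pi, rng):
    """X: Markov chain with transition matrix A and initial distribution pi.
    Y: X observed through a DMC with row-stochastic matrix B.  Returns (x, y)."""
    x = np.empty(T, dtype=int)
    y = np.empty(T, dtype=int)
    x[0] = rng.choice(len(pi), p=pi)
    for t in range(T):
        if t > 0:
            x[t] = rng.choice(A.shape[1], p=A[x[t - 1]])
        y[t] = rng.choice(B.shape[1], p=B[x[t]])
    return x, y

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.2, 0.8]])     # Markov chain transition probabilities
B = np.array([[0.95, 0.05], [0.1, 0.9]])   # DMC (observation) probabilities
pi = np.array([0.5, 0.5])
x, y = sample_hmp(1000, A, B, pi, rng)
print("fraction of time the observation differs from the state:", float(np.mean(x != y)))
```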
Entropy of HMP In order to get the mutual information between the input and output of a DMC, we need the entropy of the output, which is an HMP if the input is Markov. The entropy of an HMP can be bracketed by an upper bound and a lower bound. These bounds can be calculated recursively.
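One standard way to realize such bounds, sketched here by brute force rather than by the recursive computation mentioned above: for a stationary HMP, H(Y_d | Y_{d-1}, ..., Y_1, X_1) <= H(Y) <= H(Y_d | Y_{d-1}, ..., Y_1), and both sides can be evaluated exactly for small alphabets by enumerating output sequences with the forward recursion. The two-state chain below is an assumed toy model, not one from this work.

```python
import numpy as np
from itertools import product

def seq_prob(y, A, B, pi0):
    """Forward recursion: p(y_1..y_d) when X_1 ~ pi0 (pass a point mass to condition on X_1)."""
    alpha = pi0 * B[:, y[0]]
    for t in range(1, len(y)):
        alpha = (alpha @ A) * B[:, y[t]]
    return alpha.sum()

def entropy_bounds(d, A, B, pi):
    """Bounds on the HMP entropy rate H(Y) in bits:
       H(Y_d | Y_1..Y_{d-1}, X_1)  <=  H(Y)  <=  H(Y_d | Y_1..Y_{d-1}).
    Brute force over all length-d output sequences (tiny alphabets only)."""
    ny = B.shape[1]

    def block_entropy(length, pi0):
        h = 0.0
        for y in product(range(ny), repeat=length):
            p = seq_prob(y, A, B, pi0)
            if p > 0:
                h -= p * np.log2(p)
        return h

    upper = block_entropy(d, pi) - block_entropy(d - 1, pi)
    # Lower bound: condition on X_1 by averaging over point-mass initial distributions.
    lower = sum(pi[i] * (block_entropy(d, np.eye(len(pi))[i])
                         - block_entropy(d - 1, np.eye(len(pi))[i]))
                for i in range(len(pi)))
    return lower, upper

A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.95, 0.05], [0.1, 0.9]])
pi = np.array([2 / 3, 1 / 3])   # stationary distribution of A
print(entropy_bounds(6, A, B, pi))
```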
MSE of Our Estimators • The MSE of our entropy estimator for i.i.d. sources satisfies an explicit bound. • The MSE of our mutual information estimator for i.i.d. sources satisfies a similar bound. • We have convergence results for the divergence estimator and for Markov and stationary ergodic sources.
MSE of Entropy Estimator for HMP • We can prove that H(Y_d | Y_{d-1}, ..., Y_1) converges to H(Y) exponentially fast in d, provided the hidden Markov process's mapping satisfies a certain existence-and-uniqueness condition. • We want to further establish the convergence rate of our entropy estimator for HMPs.
Association vs. the Generative Model • The MI fusion approach is equivalent to learning a latent variable model of the audio-video measurements. • Simultaneously learn the statistics of the joint audio/video variables and the parameters (appearance bases), using them as the statistic of association (consistent with the theory). (Figure: graphical model showing the random variables and parameters.)
Incorporating Motion Parameters • Extension of multi-modal fusion to include nuisance parameters. • Audio is an indirect pointer to the object of interest. • Combine a motion model (nuisance parameters) with the audio-video appearance model.
Incorporating Motion Parameters (Figure: example frames and average images with and without the motion model.)
Information Theoretic Sensor Management • Following Zhao, Shin, and Reich (2002), Chu, Haussecker, and Zhao (2002), and Ertin, Fisher, and Potter (2003), we have started extending information-theoretic approaches to sensor management. • Specifically, consider the case where a subset of measurements over time has been incorporated into the belief state. • When is it better to incorporate a measurement from the past versus a new measurement? • How can we efficiently choose a set of measurements (avoiding the greedy approach)? (A sketch follows.)
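A hedged sketch of the selection criterion in the simplest (linear-Gaussian) belief-state setting: the information gain of a candidate measurement z = Hx + v is the mutual information 0.5 log det(H P H^T + R) - 0.5 log det(R), a past measurement can be represented as a candidate with suitably inflated noise, and greedy selection picks the largest gain at each step; the set-selection question raised above is precisely about when this greedy loop is inadequate. All matrices below are illustrative assumptions, not values from this work.

```python
import numpy as np

def info_gain(P, H, R):
    """Mutual information (nats) between a Gaussian state with covariance P and a linear
    measurement z = Hx + v, v ~ N(0, R):  0.5*logdet(H P H' + R) - 0.5*logdet(R)."""
    S = H @ P @ H.T + R
    return 0.5 * (np.linalg.slogdet(S)[1] - np.linalg.slogdet(R)[1])

def greedy_select(P, candidates, k):
    """Greedily pick k measurements by information gain, updating the belief after each."""
    remaining = list(range(len(candidates)))
    chosen = []
    for _ in range(k):
        gains = {i: info_gain(P, *candidates[i]) for i in remaining}
        best = max(gains, key=gains.get)
        remaining.remove(best)
        chosen.append(best)
        H, R = candidates[best]
        # Kalman covariance update for the selected measurement.
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        P = (np.eye(P.shape[0]) - K @ H) @ P
    return chosen, P

# Illustrative 2-D state with three candidate measurements; the third mimics a
# "stale" past measurement whose effective noise has grown through propagation.
P = np.diag([4.0, 1.0])
candidates = [
    (np.array([[1.0, 0.0]]), np.array([[0.5]])),   # new measurement of state 0
    (np.array([[0.0, 1.0]]), np.array([[0.5]])),   # new measurement of state 1
    (np.array([[1.0, 0.0]]), np.array([[2.0]])),   # past measurement, inflated noise
]
print(greedy_select(P, candidates, k=2))
```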
Summary • Applied the association method to multi-modal data. • New MI/K-L divergence estimators based on the permutation approach. • Mitigates dimensionality issues and avoids some of the combinatorics. • Extended the approach to triangulated graphs. • New estimators for information measures (entropy, divergence, mutual information) based on the BWT (block sorting). • Do not require knowledge of the distributions or parameters of the sources. • Efficient algorithms, good estimates, fast convergence. • Significantly outperform the other algorithms tested. • Investigating their use in several applications, including as a component of correspondence and fusion algorithms.