Why Data Fusion in Sensor Networks needs a new Champion? Kalyan Veeramachaneni, Evo-Design Group, CSAIL, Yumm Eye Tee. Work done at Development and Research in Evolutionary Algorithms for Multisensor Smart Networks (DreamsNet), Syracuse University.
Why Data Fusion in Sensor Networks needs a new Champion? Kalyan Veeramachaneni, Evo-Design Group, CSAIL, Yumm Eye Tee. Work done at Development and Research in Evolutionary Algorithms for Multisensor Smart Networks (DreamsNet), Syracuse University. Evo-Design Group, CSAIL, MIT, September 3, 2009
Acknowledgements • Lisa Osadciw, Syracuse University • Kai Goebel, NASA Ames Research • Arun Ross, West Virginia University • Weizhong Yan, GE Global Research Center • Vishwanath Avasarala, GE Global Research Center • Nisha Srinivas, Syracuse University
Sensor Network Projects • Biometric Security System • Wind Turbine Diagnostics and Prognostics • First Responders Sensor Network • Pipeline Crack Detection System • Airport Ground Surveillance System
What are we detecting? • Modern day society relies on detection or determining the meaning of the presence or absence of a signal • Digital Communications • Pipeline/Bridges crack detection • Genuine User detection using biometrics • Presence of aircraft, ships, or motor vehicles • Locating emergency personnel • Weather Phenomena • Building Security • Sensors are located in remote areas making decisions using a variety of criteria • Maximum A-Posteriori Criterion • Maximum Likelihood Criterion • Minimum Error Criterion
Systems Level View (1) Signal Processing Hardware drives the design Ideally we would want a simple threshold on the incoming data
Systems Level View (2) Machine Learning We do not have control on collection of data Data drives the entire design
Applications • Signal Processing • Digital communications • Wireless communications • Radars • Surveillance systems • Locationing and GPS • Machine Learning • Online diagnostic tools (aircraft, turbines, etc.) • Medical diagnostics (cancer, neurological disorders, seizures, etc.) • Fraud detection on online systems • Inferencing in Sensor Networks • A mix of both problems • Seamless interaction of hardware and software • Applications are a mix as well • Seamless interaction of system entities as well • Biometrics is a classic example!
Likelihood Ratio Test (1) • Traditional “digital communications” example • Decide whether a bit ‘0’ or ‘1’ has been sent • Additive white Gaussian noise (AWGN) • Likelihood Ratio Test (maximizes posterior probability) • Optimal for the Bayesian cost function; notice this is a linear cost function [Figure: likelihood densities under noise only and under signal + noise]
Likelihood Ratio Test (2) • The ratio for the digital communications example gives us a neat threshold detector • This holds as long as the standard deviation is the same under both hypotheses • Taking the logarithm of both sides and solving makes the LRT linear and very simple to implement
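The threshold detector on this slide can be sketched as follows. This is a minimal illustration, assuming Gaussian densities with means `mu0` < `mu1` and a common standard deviation `sigma`; the name `eta` for the likelihood-ratio threshold (set by priors and costs) is an assumption, since the slide's own equations did not survive.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def lrt_threshold_equal_sigma(mu0, mu1, sigma, eta=1.0):
    """Threshold t such that 'x > t' is equivalent to p(x|H1)/p(x|H0) > eta.

    Taking logs of the ratio gives
        (2x(mu1 - mu0) + mu0^2 - mu1^2) / (2 sigma^2) > ln(eta),
    which is linear in x. eta is an assumed name, not from the slides.
    """
    if mu1 <= mu0:
        raise ValueError("this sketch assumes mu1 > mu0")
    return 0.5 * (mu0 + mu1) + sigma ** 2 * math.log(eta) / (mu1 - mu0)

def decide(x, threshold):
    """Return 1 (signal + noise) if x exceeds the threshold, else 0 (noise only)."""
    return 1 if x > threshold else 0
```

With `eta = 1` the threshold sits at the midpoint of the two means, which is the familiar bit detector.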
Likelihood Ratio Test (3) • What happens when it is not digital communications? • For example, an unknown signal buried in noise • The standard deviation is different under the two hypotheses • LRT becomes quadratic and requires two thresholds • Note: still Gaussian under both hypotheses • Solving for the roots of the quadratic we get decision regions as: • Decide H0 : • Decide H1 :
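The two-threshold detector can be sketched in the same way. This is an illustrative derivation under assumed names (`mu0`, `s0`, `mu1`, `s1`, `eta`), not the slide's lost equations: the log-likelihood ratio of two Gaussians with different standard deviations is a quadratic in x, and its roots are the two thresholds.

```python
import math

def lrt_quadratic(mu0, s0, mu1, s1, eta=1.0):
    """Return (a, roots) for a*x^2 + b*x + c = 0, the boundary ln LR(x) = ln(eta).

    H1 is decided wherever a*x^2 + b*x + c > 0.
    """
    a = 1.0 / (2 * s0 ** 2) - 1.0 / (2 * s1 ** 2)
    b = mu1 / s1 ** 2 - mu0 / s0 ** 2
    c = (mu0 ** 2 / (2 * s0 ** 2) - mu1 ** 2 / (2 * s1 ** 2)
         + math.log(s0 / s1) - math.log(eta))
    if a == 0:
        raise ValueError("equal standard deviations: the test is linear, not quadratic")
    disc = b * b - 4 * a * c
    if disc < 0:
        return a, []                      # no real roots: one decision everywhere
    r = math.sqrt(disc)
    return a, sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

def decide(x, a, roots):
    """Return 1 (H1) where the quadratic is positive, else 0 (H0)."""
    if not roots:
        return 1 if a > 0 else 0
    inside = roots[0] < x < roots[1]
    if a > 0:                             # positive outside the two roots
        return 0 if inside else 1
    return 1 if inside else 0             # a < 0: positive between the roots
```

When the H1 spread is larger than the H0 spread (`s1 > s0`), H0 is decided between the two roots and H1 outside them; the regions swap when `s1 < s0`.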
Likelihood Ratio Test (4) • What if the distribution under H0 is Gaussian and under H1 is exponential: will the ratio still result in a simple detector? • What about multimodal distributions with multiple peaks? Summary: When you have a linear cost function and linear system operations (additive noise), you will have neat linear operations in the detector. Can we design detectors for more complicated models? What happens when we have multiple detectors helping us make a decision?
Data Fusion (1) • Binary hypothesis testing problem • H0: indicates the absence of the phenomenon • H1: indicates the presence of the phenomenon • Decisions rendered by multiple classifiers (matchers) are fused to generate a global decision • In bandwidth-constrained remote processing, decisions are made locally by the classifier before sending them to the central node
Data Fusion (2) • Let x_i be the match score generated by the i-th classifier • Each classifier applies its own threshold to determine if x_i is a genuine or an impostor score • The variable u_i records the decision made by the local classifier • Let [u] = (u_1, u_2, …, u_n) be the set of decisions rendered by multiple classifiers • The variable u_f denotes the global decision obtained by fusing the local decisions (u_f is 0 or u_f is 1)
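The notation above maps naturally onto a small lookup table. The sketch below is illustrative only: the helper names (`local_decisions`, `make_rule`, `fuse`) are hypothetical, and a fusion rule is represented as a table from every decision vector [u] to a global decision u_f.

```python
from itertools import product

def local_decisions(scores, thresholds):
    """u_i = 1 if score x_i exceeds classifier i's threshold, else 0."""
    return tuple(1 if x > t else 0 for x, t in zip(scores, thresholds))

def make_rule(n, predicate):
    """Tabulate a fusion rule: every decision vector [u] maps to u_f in {0, 1}."""
    return {u: int(predicate(u)) for u in product((0, 1), repeat=n)}

# Three rules for two sensors (hypothetical constant names).
AND_RULE = make_rule(2, all)
OR_RULE = make_rule(2, any)
FIRST_ONLY = make_rule(2, lambda u: u[0] == 1)

def fuse(u, rule):
    """Global decision u_f for the local decision vector u."""
    return rule[u]
```

For n sensors the table has 2^n rows, so there are 2^(2^n) possible fusion rules, which is part of why the joint design space grows so quickly.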
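Bandwidth Constrained Detection Networks [Figure: Sensor 1 and Sensor 2 observe X1 and X2 and send local decisions u1 and u2 to a fusion rule; candidate rules shown are AND, OR, first sensor only, and second classifier only. Inset: likelihood density model for a sensor, noise only versus event]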
Errors to be minimized • Goal : Two errors need to be minimized. • Bayesian risk function is minimized
Independent Decisions The errors can be estimated using
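For independent local decisions, the two global errors follow by summing products of per-sensor probabilities over the decision vectors. The sketch below assumes a risk of the common form C_FA · P(H0) · P_FA + C_M · P(H1) · P_M; the slide's exact expression is not recoverable, so the function and parameter names are assumptions.

```python
from itertools import product

def global_error_rates(p_fa, p_d, rule):
    """Fused false-alarm and miss probabilities for independent local decisions.

    p_fa[i] = P(u_i = 1 | H0), p_d[i] = P(u_i = 1 | H1);
    rule maps each decision tuple to a global decision 0 or 1.
    """
    n = len(p_fa)
    fused_fa, fused_d = 0.0, 0.0
    for u in product((0, 1), repeat=n):
        if rule[u] != 1:
            continue
        prob_h0, prob_h1 = 1.0, 1.0
        for ui, fa, d in zip(u, p_fa, p_d):
            prob_h0 *= fa if ui else 1.0 - fa
            prob_h1 *= d if ui else 1.0 - d
        fused_fa += prob_h0
        fused_d += prob_h1
    return fused_fa, 1.0 - fused_d

def bayes_risk(fused_fa, fused_miss, prior_h0=0.5, c_fa=1.0, c_m=1.0):
    """Assumed risk form: cost-weighted sum of the two global errors."""
    return c_fa * prior_h0 * fused_fa + c_m * (1.0 - prior_h0) * fused_miss

# Example with made-up sensor error rates and the AND rule.
AND_RULE = {u: int(all(u)) for u in product((0, 1), repeat=2)}
P_FA, P_M = global_error_rates([0.1, 0.2], [0.9, 0.8], AND_RULE)
```

The AND rule drives false alarms down at the price of more misses; swapping in an OR table shows the opposite trade.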
Correlated Decisions • Estimation of 2^n - 1 joint probabilities for n classifiers • Numerical integration is done to estimate the joint probability integrals • The Bahadur-Lazarsfeld expansion reduces the computational burden; it is written in terms of the normalized decisions and the correlation between normalized decisions
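For two sensors the Bahadur-Lazarsfeld expansion has a single correlation term, which makes it easy to sketch. The function name and arguments below are hypothetical; `rho` is the correlation between the normalized decisions z_i = (u_i - p_i) / sqrt(p_i (1 - p_i)).

```python
import math
from itertools import product

def joint_prob_two_decisions(u, p, rho):
    """P(u1, u2) for two binary decisions with marginals p[i] = P(u_i = 1).

    Independent product of the marginals, corrected by (1 + rho * z1 * z2).
    The caller must pick rho small enough that every probability stays >= 0.
    """
    indep = 1.0
    z = []
    for ui, pi in zip(u, p):
        indep *= pi if ui else 1.0 - pi
        z.append((ui - pi) / math.sqrt(pi * (1.0 - pi)))
    return indep * (1.0 + rho * z[0] * z[1])

# Example: the four joint probabilities for made-up marginals.
table = {u: joint_prob_two_decisions(u, (0.3, 0.6), 0.2)
         for u in product((0, 1), repeat=2)}
```

With more sensors the expansion adds higher-order correlation terms; truncating it after the pairwise terms is a common way to reduce the number of quantities that must be estimated.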
What is the Problem? • Joint optimization of thresholds and fusion rule (decision level) • The objective function is the Bayesian risk function • When the thresholds are incorporated as search variables, the search is an NP-complete problem [1] [Figure: two panels, effect of fusion rule design and effect of threshold design] [1] John N Tsitsiklis, Michael Athans, “On Complexity of Decentralized Decision making and detection problems”, 23rd IEEE Conference on Decision and Control, 1984
Bandwidth Constrained Detection Networks • Two types of errors need to be reduced: false alarms (detecting an event that did not occur) and misses (failing to detect an event) • If the entire observation value is transmitted to a central processing node, an efficient machine learning technique can be designed to achieve better accuracy • The figure shows 20000 samples of observations: 10000 belong to events, 10000 to noise • 9 to 32 bits are required per sample if all bits are transmitted • This reduces to a 1-bit decision if the decision is transmitted instead [Figure: scatter of noise and event samples with a threshold on Sensor 1 and a threshold on Sensor 2; under the AND rule an event is declared only in one quadrant]
What has been happening in this area? • The amount of research and publications on the topic indicates its complexity • Quick check of research publications • 120 journal articles, with approximately 45 discussing similar design issues • At least 48 textbooks currently on sale in this area • 5 dissertations deal with the same problem and provide human-developed designs • A published paper addresses the difficulty • John N Tsitsiklis, Michael Athans, “On Complexity of Decentralized Decision making and detection problems”, 23rd IEEE Conference on Decision and Control, 1984 • Optimizing distributed detection for 2 sensors • Independent sensors: intractable • Correlated sensors: NP-complete • Researchers are reluctant to use EAs • A simple architectural or parameter change can give you literally 10 pages' worth of equations, fancy! • Failure modes of gradient descent and other approaches are not identified
Likelihood Ratio Test Based Design • Decouple the two problems: optimize thresholds and fusion rule separately • Identify optimal individual threshold that minimizes the Bayesian Error • Optimal fusion rule for independent decisions • Optimal fusion rule for correlated decisions
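For independent decisions, the fusion rule usually cited is the Chair-Varshney log-likelihood-ratio rule. Whether the slide used exactly this form is an assumption, since its equations were lost; the sketch below shows that rule with hypothetical names, where `p_fa[i]` and `p_d[i]` are each sensor's false-alarm and detection probabilities at its chosen threshold.

```python
import math

def fusion_statistic(u, p_fa, p_d):
    """Log-likelihood ratio of the decision vector u under independence.

    A sensor voting 1 contributes ln(p_d / p_fa); a sensor voting 0
    contributes ln((1 - p_d) / (1 - p_fa)).
    """
    total = 0.0
    for ui, fa, d in zip(u, p_fa, p_d):
        total += math.log(d / fa) if ui else math.log((1.0 - d) / (1.0 - fa))
    return total

def fuse_lrt(u, p_fa, p_d, eta=1.0):
    """Global decision: 1 if the statistic exceeds ln(eta), else 0.

    eta is an assumed name for the threshold set by priors and costs.
    """
    return 1 if fusion_statistic(u, p_fa, p_d) > math.log(eta) else 0
```

With identical sensors and `eta = 1` the rule reduces to a majority vote; better sensors simply get larger weights.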
Gradient Descent Approach • Use gradient information to simultaneously optimize fusion rule and thresholds where • Threshold for a sensor is the solution of the likelihood ratio test given by where
Particle Swarm Optimization • Each particle is a solution • Particles are randomly initialized in the search space • Particles are moved in the search space using velocity and position updates • Demonstration on a test problem
PSO Based Design [Flowchart boxes: PSO parameters; random initialization of particles; cost evaluation (inputs: CFA, training data); update particles' memory; save the best solution so far; velocity and position updates; loop test i<n; convergence; output the best solution]
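The loop in the flowchart can be sketched with a standard global-best PSO. This is a generic textbook variant, not necessarily the exact update used in the talk: the inertia weight `w`, the acceleration constants `c1` and `c2`, and the sphere test function are all assumptions chosen for illustration.

```python
import random

def pso(cost, dim, bounds, n_particles=20, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize cost over a box using a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initialization of particles in the search space.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and cost.
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull to own best + pull to swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clipped to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

def sphere(x):
    """Simple test problem with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

In the detection setting the cost function would be the Bayesian risk evaluated on training data, with the thresholds as the continuous coordinates of each particle.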
PSO : Binary Search Spaces • Using a sigmoid transformation on the velocity, the probability of a binary variable can be determined ( Kennedy et al.) • Position update is changed to • Velocity update equation is not changed and the learning behavior of swarm is preserved
PSO : Binary Search Spaces • Transition is now probabilistic • Particles try to position themselves in the velocity space such that they have maximum probability of having a value ‘1’, in case they have evidence from multiple neighbors/iterations about the goodness of being at value ‘1’ for a variable
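The probabilistic transition can be sketched as below. The helper names are hypothetical; the rule itself (set a bit to 1 with probability sigmoid(velocity)) is the binary PSO position update attributed to Kennedy et al. on the slide.

```python
import math
import random

def sigmoid(v):
    """Numerically stable logistic function, mapping a velocity to (0, 1)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def binary_position(velocity, rng):
    """Sample each bit: x_d = 1 with probability sigmoid(v_d), else 0."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]
```

A velocity near zero leaves the bit undecided (probability 0.5), while a large positive velocity pins it at 1, which is how accumulated evidence from neighbors and past iterations shows up in the position.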
PSO : Discrete Search Spaces • Many problems in real world optimization are binary, discrete • For example, in sensor management, sensor selection, i.e., the sensor number is discrete variable • Increased complexity due to binary transformation of a discrete variable • The Hamming distance between two discrete values undergoes a non-linear transformation when an equivalent binary representation is used instead • The range of the discrete variable often does not match the upper limit of the equivalent binary representation • For example, a discrete variable of range [0,1,2,3,4,5] requires a three bit binary representation, which ranges between [0-7]
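Both complications of a binary encoding are easy to check numerically. The short sketch below (helper names are hypothetical) counts the bits needed for the slide's range 0 to 5, lists the unused codes, and measures Hamming distances between neighboring values.

```python
def bits_needed(n_values):
    """Bits required to encode n_values distinct discrete values."""
    return max(1, (n_values - 1).bit_length())

def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")

values = range(6)                      # the discrete range 0..5 from the slide
width = bits_needed(len(values))       # 3 bits
# Codes the 3-bit representation can produce that are not valid values.
unused = [code for code in range(2 ** width) if code not in values]
# Neighbors 3 and 4 differ by 1 as integers but by 3 bits in binary (011 vs 100).
neighbor_distance = hamming(3, 4)
```

Adjacent values 0 and 1 are one bit apart while 3 and 4 are three bits apart, so a unit step in the discrete variable does not correspond to a uniform step in the binary space.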
PSO : Discrete Search Spaces • Modify the sigmoid transformation for an M-ary system • The sigmoid gives the parameters of the distribution from which the discrete value is generated • Particles try to position themselves in the velocity space such that the probability of one or the other discrete value is high • A normal distribution is used here; other distributions can be used • Boundary conditions are needed because of the infinite support of the normal distribution
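One plausible reading of the M-ary sigmoid, normal sampling, and boundary conditions named on this slide is sketched below. The exact equations were lost, so the scaling of the mean (M times the logistic of the velocity), the spread `sigma * (M - 1)`, and all names are assumptions rather than the author's confirmed formulas.

```python
import math
import random

def discrete_position(v, M, sigma, rng):
    """Draw a value in {0, ..., M-1} for one coordinate with velocity v.

    Assumed form: the mean of a normal distribution is set by an M-ary
    sigmoid of the velocity, a sample is rounded to the nearest integer,
    and out-of-range samples are clipped (the boundary conditions).
    """
    # M-ary sigmoid, written in a numerically stable way.
    if v >= 0:
        mean = M / (1.0 + math.exp(-v))
    else:
        e = math.exp(v)
        mean = M * e / (1.0 + e)
    x = round(rng.gauss(mean, sigma * (M - 1)))
    # The normal has infinite support, so clip to the valid range.
    return min(M - 1, max(0, x))
```

As in the binary case, the velocity only shapes a probability distribution; the particle's discrete value is a sample from it, so a strongly negative velocity concentrates samples near 0 and a strongly positive one near M - 1.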
Human Design Solution: Person-by-Person Optimal (PBPO) for Independent Sensors: optimize thresholds individually by keeping other thresholds and fusion rule constant • Human Design Solution: Likelihood Ratio Test (LRT) Design: use LRT to derive the fusion rule for independent or correlated decisions • Human Competitive Result: Particle Swarm Optimization (PSO) Based Design: joint optimization of thresholds and fusion rule; no closed form solution exists [Figure: Sensor 1 and Sensor 2 observe X1 and X2 and send decisions u1 and u2 to the fusion rule]
Sensor Suites : Homogeneous Network • All sensors are identical in performance
Sensor Suites: Heterogeneous Network Type 1 • Different sensors have different separation of means between the two hypotheses
Sensor Suite : Heterogeneous Network Type 2 • Different standard deviations under the two hypotheses and different separation of means; the solution to the LRT is quadratic
Results- Independent Observations, Homogeneous Network Probability of Error Achieved for Different Algorithms Averaged over 100 Trials
Results- Independent Observations, Homogeneous Network • Counting the evaluations to measure “time”
Results : Independent Observations, Heterogeneous Type 1 Probability of Error Achieved for Different Algorithms Averaged over 100 Trials
Preliminary Results : Independent Observations, Heterogeneous Type 2 Probability of Error Achieved for Different Algorithms Averaged over 100 Trials
Result: Independent Sensors [Figure: human design accuracy compared with PSO resulting accuracy. PBPO: Person-By-Person Optimal; PSO: Particle Swarm Optimization]
Result: Correlated Sensors [Figure: comparison with the human design; chart labels 2.5%, 13%, 54%]
Data Driven Design [Flowchart with a yes/no decision branch]
Correlated Sensors: Designs for 0.1 Correlation • For one specific cost structure • LRT (Human) Based Design: 2 thresholds on each sensor, 2 Sensor only fusion rule • PSO Based Design: simple, 1 threshold for each sensor, AND fusion rule, very few errors [Figure: two panels showing the region where an event is declared under each design]
Correlated Sensors: Designs for 0.9 Correlation • LRT (Human) Based Design: 2 thresholds on each sensor, 2 Sensor only fusion rule • PSO Based Design: simple, 1 threshold for each sensor, AND fusion rule; higher number of errors, but still better [Figure: two panels showing the region where an event is declared under each design]
Comparison of Data Driven PSO Design with Other Approaches and Single Sensor Performance Varying the costs in the Bayesian Risk function and generating the designs gives the entire Receiver operating characteristic curve
Discrete Version of the Problem • Vendors only allow you to have access to multiple points on the ROC • The problem then becomes a combinatorial optimization problem • The design problem is then: • Operating point for each sensor • Fusion rule (can still be solved by LRT) • Suppose we have three classifiers and each classifier can operate on any of N operating points; there are N^3 choices for this problem • A discrete version of PSO or GA is used to identify the operating point sets • No alternative approaches exist
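For small problems the combinatorial search can be written out exhaustively, which makes the size of the space concrete: three classifiers with N points each give N · N · N combinations. The sketch below is illustrative only; it fixes an AND fusion rule and independent decisions for brevity (the slide derives the fusion rule by LRT), and the operating points used in the example are made-up numbers.

```python
from itertools import product

def best_operating_points(roc_points, prior_h0=0.5, c_fa=1.0, c_m=1.0):
    """Exhaustively pick one (p_fa, p_d) operating point per classifier.

    roc_points[i] lists the operating points available for classifier i.
    Assumes independent decisions fused with AND. Returns
    (best_risk, best_choice, number_of_combinations_tried).
    """
    best_risk, best_choice, count = None, None, 0
    index_ranges = [range(len(points)) for points in roc_points]
    for choice in product(*index_ranges):
        count += 1
        fused_fa, fused_d = 1.0, 1.0
        for i, k in enumerate(choice):
            fa, d = roc_points[i][k]
            fused_fa *= fa            # AND: all sensors must false-alarm
            fused_d *= d              # AND: all sensors must detect
        risk = c_fa * prior_h0 * fused_fa + c_m * (1.0 - prior_h0) * (1.0 - fused_d)
        if best_risk is None or risk < best_risk:
            best_risk, best_choice = risk, choice
    return best_risk, best_choice, count

# Made-up ROC points, identical for three hypothetical classifiers.
points = [(0.01, 0.60), (0.05, 0.80), (0.10, 0.90), (0.20, 0.95)]
risk, choice, tried = best_operating_points([points, points, points])
```

Exhaustive search stops being practical as the number of sensors or operating points grows, which is where a discrete PSO or GA takes over.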
Multi-Objective Design • Allows system designer to make trade-offs • Makes the fused system ROC available to the system designer • Adding a sensor, how much does it help? • Since fused system ROC is available, area under the curve gives a metric to evaluate the system • Allows system designers to make choices when acquiring sensors from multiple vendors • If I have to use sensors incrementally, which ones should I focus on ? • If I want to add sensors to my detection system, which sensors should I add to improve performance
Multi-Objective Design Results for Sensor Suites with 4,5 Sensors • Algorithm design for generating non-dominated solutions (close to Pareto set) • Non-Dominated Sorting PSO instead of a cost function • Continuous PSO for thresholds • Binary PSO for fusion rule, cannot use LRT for fusion rule
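The non-dominated idea behind this design can be sketched with a plain filter. This is not the Non-Dominated Sorting PSO itself, only the dominance test it relies on; the function names are hypothetical and each point is a pair of objectives to minimize, for example (false-alarm probability, miss probability).

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Keep the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

The surviving points trace the fused system's ROC trade-off, so the designer can choose an operating point after the search instead of fixing costs beforehand.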