Sensitivity Analysis of Enumerated Trees of Increasing Boolean Expressions Saket Anand, David Madigan, Richard Mammone, Fred Roberts
Enumeration and Selection of Optimum Decision Tree
[Figure: example decision trees over sensors A, B and C with leaf outcomes 0 and 1]
• A set of decision trees is constructed for each complete and monotonic Boolean function, where the inputs represent the tests performed by each sensor.
• The cost of each tree is evaluated and the optimum tree is selected.
Y = f(A, B, C), where f is complete and monotonic
Enumeration and Selection of Optimum Decision Tree
• The decision trees are constructed using 4 sensors.
• For four sensors, there are 114 monotonic and complete Boolean expressions. These can be implemented using 11,808 distinct trees.
• The trees are evaluated and ranked using the cost function [1].
• The tree with the lowest cost is selected as the optimum decision tree.
[1] Stroud, P. D. and Saeger, K. J., "Enumeration of Increasing Boolean Expressions and Alternative Digraph Implementations for Diagnostic Applications", Proceedings Vol. IV, Computer, Communication and Control Technologies.
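The count of Boolean functions can be checked by brute force. The sketch below assumes that "complete" means the output depends on every input and that "monotonic" means raising any input from 0 to 1 never lowers the output; the function names are illustrative and not taken from the cited paper. It counts functions only, not the trees that implement them.

```python
from itertools import product

def is_monotone_and_complete(table, n):
    """table[x] is the output for input bitmask x (bit i = sensor i)."""
    depends = [False] * n
    for x in range(1 << n):
        for i in range(n):
            if x & (1 << i):
                continue                      # only compare x with x plus bit i
            lo, hi = table[x], table[x | (1 << i)]
            if lo > hi:                       # raising an input lowered the output
                return False
            if lo != hi:                      # the output reacts to input i
                depends[i] = True
    return all(depends)

def count_complete_monotone(n):
    """Count truth tables of n inputs that are monotone and use every input."""
    return sum(
        is_monotone_and_complete(table, n)
        for table in product((0, 1), repeat=1 << n)
    )

for n in (2, 3, 4):
    print(n, count_complete_monotone(n))
# 2 2
# 3 9
# 4 114
```

Under that reading, two inputs give only AND and OR, and four inputs give the 114 expressions quoted for the Stroud-Saeger enumeration.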
Cost Function Used for Evaluating the Decision Trees
CTot = CFalsePositive * PFalsePositive + CFalseNegative * PFalseNegative + Cfixed
where
• CFalsePositive is the cost of a false positive (Type I error)
• CFalseNegative is the cost of a false negative (Type II error)
• PFalsePositive is the probability of a false positive occurring
• PFalseNegative is the probability of a false negative occurring
• Cfixed is the fixed cost of utilization of the tree
The error probability of the entire tree is computed from the error probabilities of the individual sensors.
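One way to obtain the tree-level probabilities from the sensor-level ones is a recursion over the tree. This is a minimal sketch, assuming the sensor outcomes are conditionally independent given the state of nature X and that a tree is written as nested tuples `(sensor, subtree_if_0, subtree_if_1)` with leaves 0 or 1. The representation, the function names and the example cost values are illustrative, not taken from the slides. Cfixed is passed in as a number because the slides do not say how it is computed.

```python
def prob_alarm(tree, sensors, x):
    """P(tree outputs 1 | X = x), assuming sensors are independent given X."""
    if isinstance(tree, int):            # leaf: final decision 0 or 1
        return float(tree)
    name, if_zero, if_one = tree         # internal node: test one sensor
    p_one = sensors[name][x]             # P(Y_name = 1 | X = x)
    return ((1 - p_one) * prob_alarm(if_zero, sensors, x)
            + p_one * prob_alarm(if_one, sensors, x))

def total_cost(tree, sensors, c_fp, c_fn, p_bomb, c_fixed):
    """CTot with the error probabilities weighted by the prior P(X)."""
    p_fp = (1 - p_bomb) * prob_alarm(tree, sensors, 0)       # P(X=0) * P(Y=1|X=0)
    p_fn = p_bomb * (1 - prob_alarm(tree, sensors, 1))       # P(X=1) * P(Y=0|X=1)
    return c_fp * p_fp + c_fn * p_fn + c_fixed

# (P(Y=1|X=0), P(Y=1|X=1)) per sensor, using the equal-error-rate values
# listed in the deck for sensors A, C and D.
sensors = {"A": (0.0144, 0.9856), "C": (0.0735, 0.9265), "D": (0.0107, 0.9893)}

# Illustrative tree for "A or (C and D)": test A first, then C, then D.
tree = ("A", ("C", 0, ("D", 0, 1)), 1)

# c_fp, c_fn, p_bomb and c_fixed below are hypothetical example values.
print(total_cost(tree, sensors, c_fp=400.0, c_fn=1e9, p_bomb=1e-6, c_fixed=10.0))
```

The same recursion works for any tree shape, so ranking a set of enumerated trees reduces to calling `total_cost` on each one and sorting.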
Probability of Error for Individual Sensors
[Figure: characteristics of a typical sensor, showing the distributions P(Yi|X=0) and P(Yi|X=1) with Ki and Ti marked]
• For the ith sensor, the Type I error P(Yi=1|X=0) and the Type II error P(Yi=0|X=1) are modeled using Gaussian distributions.
• State of nature X=0 represents the absence of a bomb.
• State of nature X=1 represents the presence of a bomb.
• Yi represents the outcome of sensor i.
• Each sensor is characterized by:
  • Ki, the discrimination coefficient
  • Ti, the decision threshold
  • Σi, the variance of the distributions
Receiver Operating Characteristic (ROC) Curve
[Figure: ROC curve of PD (0 to 1) against PF (0 to 1) with the operating point and the EER marked; inset of the sensor distributions P(Yi|X=0) and P(Yi|X=1) with Ki and Ti]
• The ROC curve is the plot of the probability of correct detection (PD) vs. the probability of false positive (PF).
• The ROC curve is used to select an operating point, which sets the trade-off between PD and PF.
• Each sensor has a ROC curve, and the combination of the sensors into a decision tree has a composite ROC curve.
• The parameter varied to obtain different operating points on the ROC curve is the threshold for a single sensor, and the combination of thresholds for a decision tree.
• The Equal Error Rate (EER) is the operating point on the ROC curve where PF = 1 - PD.
Stroud-Saeger Experiments
• Stroud and Saeger ranked all trees formed from four given sensors A, B, C and D in order of increasing tree cost. The cost function used is the one shown in the earlier slides.
• Values used in their experiment:
  • CA = .25; KA = 4.37; ΣA = 1
  • CB = .25; KB = 1.53; ΣB = 1
  • CC = 10; KC = 2.9; ΣC = 1
  • CD = 30; KD = 4.6; ΣD = 1
  where Ci is the individual cost of utilization of sensor i, Ki is the sensor discrimination power and Σi is the relative spread factor for sensor i.
• Values of other variables are not known.
Cost Sensitivity to Global Parameters
• Values used in the experiment:
  • CA = .25; P(YA=1|X=1) = .9856; P(YA=1|X=0) = .0144
  • CB = 1; P(YB=1|X=1) = .7779; P(YB=1|X=0) = .2221
  • CC = 10; P(YC=1|X=1) = .9265; P(YC=1|X=0) = .0735
  • CD = 30; P(YD=1|X=1) = .9893; P(YD=1|X=0) = .0107
  where Ci is the individual cost of utilization of sensor i. The probabilities have been computed for a threshold corresponding to the equal error rate.
• CFalseNegative to be varied between 25 million and 500 billion dollars
  • Low and high estimates of the direct and indirect costs incurred due to a false negative.
• CFalsePositive to be varied between 180 and 720 dollars
  • Cost incurred due to a false positive: 4 men * (3-6 hrs) * (15-30 $/hr)
• P(X=1) to be varied between 3/10^9 and 1/100,000
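The tabulated probabilities can be cross-checked against the Gaussian sensor model. The sketch below uses the erfc expressions the deck gives for P(Yi=1|X=0) and P(Yi=1|X=1), together with the Stroud-Saeger values of Ki and Σi = 1. It assumes that with unit spread the equal error rate falls at the midpoint threshold Ti = Ki/2, which follows from the symmetry of the two expressions; the function names are illustrative.

```python
import math

def false_positive(t):
    """P(Yi=1 | X=0) = 0.5 erfc[Ti / sqrt(2)]"""
    return 0.5 * math.erfc(t / math.sqrt(2))

def detection(t, k, sigma=1.0):
    """P(Yi=1 | X=1) = 0.5 erfc[(Ti - Ki) / (sigma * sqrt(2))]"""
    return 0.5 * math.erfc((t - k) / (sigma * math.sqrt(2)))

# Discrimination coefficients of sensors A to D from the Stroud-Saeger slide.
K = {"A": 4.37, "B": 1.53, "C": 2.9, "D": 4.6}

eer = {}
for name, k in K.items():
    t_eer = k / 2      # assumed EER threshold for sigma = 1 (PF = 1 - PD here)
    eer[name] = (detection(t_eer, k), false_positive(t_eer))
    print(name, round(eer[name][0], 4), round(eer[name][1], 4))
```

Rounded to four decimals, the results should agree with the values listed above, including .9893 and .0107 for sensor D.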
Structure of the trees that ranked first with 3 sensors (A, C and D)
[Figure: structures of the three trees over sensor nodes a, b and c with leaf outcomes 0 and 1]
• Tree number 37, Boolean Expr: 00011111
• Tree number 49, Boolean Expr: 01010111
• Tree number 55, Boolean Expr: 01111111
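The Boolean expression strings can be read as truth tables. The sketch below assumes a bit order that the slides do not state: the leftmost character is the output when all inputs are 0, and the first input is the most significant bit of the index. Under that assumption, a monotone function is fully described by its minimal sets of inputs that force the output to 1, which the hypothetical helper `minimal_terms` extracts.

```python
def minimal_terms(expr, names):
    """Minimal AND-terms of a monotone truth table given as a bit string."""
    n = len(names)
    assert len(expr) == 1 << n, "need 2**n characters for n inputs"
    true_sets = []
    for index, bit in enumerate(expr):
        if bit == "1":
            # Inputs that are 1 in this row (assumed order: names[0] = MSB).
            on = frozenset(
                names[i] for i in range(n) if (index >> (n - 1 - i)) & 1
            )
            true_sets.append(on)
    # Keep only the rows that have no smaller true row beneath them.
    minimal = [s for s in true_sets if not any(t < s for t in true_sets)]
    terms = ("".join(sorted(s)) for s in minimal)
    return sorted(terms, key=lambda term: (len(term), term))

for expr in ("00011111", "01010111", "01111111"):
    print(expr, " OR ".join(minimal_terms(expr, "abc")))
# 00011111 a OR bc
# 01010111 c OR ab
# 01111111 a OR b OR c
```

Here `bc` stands for "b AND c". The same helper accepts the 16-character strings of the four-sensor trees when called with `"abcd"`.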
Frequency of optimal trees with 3 sensors (A, C and D) when one parameter was varied
• Randomly selected fixed parameter values
Variation of CTot vs. CFalseNegative
• P(X=1) and CFalsePositive were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalseNegative in the specified range.
• Randomly selected fixed parameter values
Variation of CTot vs. CFalsePositive
• P(X=1) and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalsePositive in the specified range.
• Randomly selected fixed parameter values
Variation of CTot vs. P(X=1)
• CFalsePositive and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of P(X=1) in the specified range.
• Randomly selected fixed parameter values
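The one-parameter experiments can be sketched as a simple Monte Carlo loop that tallies which tree is cheapest for each sampled value. The sketch below varies CFalseNegative and makes several assumptions: the samples are drawn uniformly over the range (the slides do not state the distribution), each tree is summarized by its tree-level P(Y=1|X=0), P(Y=1|X=1) and Cfixed, and the three tree summaries are hypothetical placeholders rather than values from the experiments.

```python
import random
from collections import Counter

def total_cost(c_fp, c_fn, p_bomb, tree):
    """CTot for a tree summarized as (P(Y=1|X=0), P(Y=1|X=1), Cfixed)."""
    pf, pd, c_fixed = tree
    return c_fp * (1 - p_bomb) * pf + c_fn * p_bomb * (1 - pd) + c_fixed

def optimal_tree_frequency(trees, c_fp, p_bomb, c_fn_range, samples=10_000, seed=0):
    """Count how often each tree attains rank one as CFalseNegative varies."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(samples):
        c_fn = rng.uniform(*c_fn_range)      # assumed uniform sampling
        best = min(trees, key=lambda name: total_cost(c_fp, c_fn, p_bomb, trees[name]))
        wins[best] += 1
    return wins

# Hypothetical tree summaries, for illustration only.
trees = {
    "tree_x": (0.0150, 0.9988, 12.0),
    "tree_y": (0.0001, 0.9000, 1.0),
    "tree_z": (0.0900, 0.9999, 25.0),
}

freq = optimal_tree_frequency(
    trees, c_fp=400.0, p_bomb=1e-6, c_fn_range=(25e6, 500e9)
)
print(freq.most_common())
```

Varying CFalsePositive or P(X=1) instead only changes which argument is sampled inside the loop.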
Frequency of optimal trees with 3 sensors (A, C and D) when one parameter was varied
• Fixed parameter values selected at Stroud and Saeger values
Variation of CTot vs. CFalseNegative
• P(X=1) and CFalsePositive were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalseNegative in the specified range.
• Fixed parameter values selected at Stroud and Saeger values
Variation of CTot vs. CFalsePositive
• P(X=1) and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalsePositive in the specified range.
• Fixed parameter values selected at Stroud and Saeger values
Variation of CTot vs. P(X=1)
• CFalsePositive and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of P(X=1) in the specified range.
• Fixed parameter values selected at Stroud and Saeger values
Variation of CTot wrt CFalseNegative and CFalsePositive
• Randomly selected fixed parameter values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
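This expanded cost function is consistent with the earlier one if PFalsePositive and PFalseNegative are read as joint probabilities of the state of nature X and the tree outcome Y. The slides do not state that reading explicitly, so the short derivation below is an interpretation:

```latex
\begin{align*}
P_{\text{FalsePositive}} &= P(X=0,\,Y=1) = P(X=0)\,P(Y=1 \mid X=0),\\
P_{\text{FalseNegative}} &= P(X=1,\,Y=0) = P(X=1)\,P(Y=0 \mid X=1),\\
C_{\text{Tot}} &= C_{\text{FalsePositive}}\,P(X=0)\,P(Y=1 \mid X=0)
  + C_{\text{FalseNegative}}\,P(X=1)\,P(Y=0 \mid X=1) + C_{\text{fixed}}.
\end{align*}
```

Since P(X=0) = 1 - P(X=1), the prior P(X=1) enters both error terms, so CTot depends on three global parameters: CFalsePositive, CFalseNegative and P(X=1).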
Variation of CTot wrt CFalseNegative and P(X=1)
• Randomly selected fixed parameter values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalsePositive and P(X=1)
• Randomly selected fixed parameter values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalseNegative and CFalsePositive
• Fixed parameter values selected at Stroud and Saeger values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalseNegative and P(X=1)
• Fixed parameter values selected at Stroud and Saeger values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalsePositive and P(X=1)
• Fixed parameter values selected at Stroud and Saeger values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Tree Structure and Corresponding Boolean Expressions
[Figure: structures of the two trees over sensor nodes a, b, c and d with leaf outcomes 0 and 1]
• Tree number 11785, Boolean Expr: 0111111111111111
• Tree number 11605, Boolean Expr: 0101011111111111
Tree Structure and Corresponding Boolean Expressions
[Figure: structures of the two trees over sensor nodes a, b, c and d with leaf outcomes 0 and 1]
• Tree number 9133, Boolean Expr: 0001010111111111
• Tree number 8965, Boolean Expr: 0001010101111111
Tree Structure and Corresponding Boolean Expressions
[Figure: structures of the two trees over sensor nodes a, b, c and d with leaf outcomes 0 and 1]
• Tree number 6797, Boolean Expr: 0001000101111111
• Tree number 2473, Boolean Expr: 0000000101111111
Tree Structure and Corresponding Boolean Expressions
[Figure: structure of the tree over sensor nodes a, b, c and d with leaf outcomes 0 and 1]
• Tree number 11305, Boolean Expr: 0101010101111111
Variation of CTot vs. CFalseNegative
• P(X=1) and CFalsePositive were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalseNegative in the specified range.
• Randomly selected fixed parameter values
Variation of CTot vs. CFalsePositive
• P(X=1) and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalsePositive in the specified range.
• Randomly selected fixed parameter values
Variation of CTot vs. P(X=1)
• CFalsePositive and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of P(X=1) in the specified range.
• Randomly selected fixed parameter values
Variation of CTot vs. CFalseNegative
• P(X=1) and CFalsePositive were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalseNegative in the specified range.
• Fixed parameter values selected at Stroud and Saeger values
Variation of CTot vs. CFalsePositive
• P(X=1) and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of CFalsePositive in the specified range.
• Fixed parameter values selected at Stroud and Saeger values
Variation of CTot vs. P(X=1)
• CFalsePositive and CFalseNegative were kept constant at the specified values and CTot was computed for 10,000 randomly selected values of P(X=1) in the specified range.
• Fixed parameter values selected at Stroud and Saeger values
Frequency of optimal trees with 4 sensors when two parameters were varied. The fixed parameters were randomly selected.
Variation of CTot wrt CFalseNegative and CFalsePositive
• Randomly selected fixed parameter values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalseNegative and P(X=1)
• Randomly selected fixed parameter values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalsePositive and P(X=1)
• Randomly selected fixed parameter values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Frequency of optimal trees with 4 sensors when two parameters were varied. The fixed parameters were selected at the Stroud and Saeger values.
Variation of CTot wrt CFalseNegative and CFalsePositive
• Fixed parameter values selected at Stroud and Saeger values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalseNegative and P(X=1)
• Fixed parameter values selected at Stroud and Saeger values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Variation of CTot wrt CFalsePositive and P(X=1)
• Fixed parameter values selected at Stroud and Saeger values
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cfixed
Sensitivity to Sensor Performance
The following experiments were done using sensors A, B, C and D as described below, varying the individual sensor thresholds TA, TB and TC from -4.0 to +4.0 in steps of 0.4. These values were chosen because they give a ROC curve for the individual sensors over the complete range of P(Yi=1|X=0) and P(Yi=1|X=1).
• CA = .25; KA = 4.37; ΣA = 1
• CB = .25; KB = 1.53; ΣB = 1
• CC = 15; KC = 2.9; ΣC = 1
• CD = 30; KD = 4.6; ΣD = 1
where Ci is the individual cost of utilization of sensor i, Ki is the discrimination power of the sensor and Σi is the spread factor for the sensor.
The probability of false positive for the ith sensor is computed as:
P(Yi=1|X=0) = 0.5 erfc[Ti/√2]
The probability of detection for the ith sensor is computed as:
P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σi√2)]
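The threshold sweep for a single sensor can be sketched directly from the two erfc expressions. The function name `roc_points` and the choice of returning `(Ti, PF, PD)` tuples are illustrative. The grid is built from integer step counts so that the 21 thresholds from -4.0 to +4.0 do not drift through repeated floating-point addition.

```python
import math

def roc_points(k, sigma=1.0, t_min=-4.0, t_max=4.0, step=0.4):
    """ROC operating points (Ti, PF, PD) for one sensor over a threshold grid."""
    n = round((t_max - t_min) / step) + 1
    points = []
    for j in range(n):
        t = t_min + j * step
        pf = 0.5 * math.erfc(t / math.sqrt(2))                    # P(Yi=1|X=0)
        pd = 0.5 * math.erfc((t - k) / (sigma * math.sqrt(2)))    # P(Yi=1|X=1)
        points.append((t, pf, pd))
    return points

# Sensor A from the list above: KA = 4.37, sigma_A = 1.
for t, pf, pd in roc_points(4.37):
    print(f"T={t:+.1f}  PF={pf:.4f}  PD={pd:.4f}")
```

At the lowest threshold both PF and PD are close to 1, and at the highest threshold PF is close to 0, so the grid traces the ROC curve from one corner toward the other.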
Frequency of optimal trees with 3 sensors when the thresholds were varied. The fixed parameters (CFalsePositive, CFalseNegative, P(X=1)) were selected randomly. Fifteen trees attained rank one, out of which tree number 37 was the most frequent.
Frequency of optimal trees with 4 sensors when the thresholds were varied. The fixed parameters (CFalsePositive, CFalseNegative, P(X=1)) were selected randomly. 244 trees attained rank one, out of which tree number 445 was the most frequent. Only the 15 most frequently occurring optimal trees out of the 241 are tabulated below.