Untangling equations involving uncertainty. Scott Ferson, Applied Biomathematics; Vladik Kreinovich, University of Texas at El Paso; W. Troy Tucker, Applied Biomathematics. Overview: three kinds of operations (deconvolutions, backcalculations, updates).
Untangling equations involving uncertainty • Scott Ferson, Applied Biomathematics • Vladik Kreinovich, University of Texas at El Paso • W. Troy Tucker, Applied Biomathematics
Overview • Three kinds of operations: deconvolutions, backcalculations, updates (oh, my!) • Very elementary methods of interval analysis: low-dimensional, simple arithmetic operations, but combined with probability theory
Probability box (p-box) • Bounds on a cumulative distribution function (CDF) • Envelope of a Dempster-Shafer structure • Used in risk analysis and uncertainty arithmetic • Generalizes probability distributions and intervals [Figure: three example p-boxes, cumulative probability (0 to 1) against value; one panel is annotated "This is an interval, not a uniform distribution"]
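The annotation "an interval, not a uniform distribution" can be made concrete with a minimal sketch. It assumes a p-box is stored as a pair of bounding CDFs; the helper names `interval_pbox` and `uniform_cdf` are hypothetical and are not taken from the slides.

```python
def interval_pbox(lo, hi):
    """Return (upper, lower) CDF bounds for the interval [lo, hi]."""
    def upper(x):
        # All probability mass could sit at lo, so the CDF may reach 1 there.
        return 1.0 if x >= lo else 0.0

    def lower(x):
        # All mass could sit at hi, so the CDF may stay at 0 until then.
        return 1.0 if x >= hi else 0.0

    return upper, lower


def uniform_cdf(x, lo, hi):
    """CDF of the uniform distribution on [lo, hi]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


upper, lower = interval_pbox(10.0, 30.0)

# The uniform distribution is only one of many distributions inside the box.
for x in (5.0, 10.0, 20.0, 30.0, 35.0):
    assert lower(x) <= uniform_cdf(x, 10.0, 30.0) <= upper(x)
```

Any other distribution supported on [10, 30] would pass the same check, which is why the interval expresses more uncertainty than the uniform distribution does.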
Probability bounds analysis (PBA) • a = T(0, 10, 20) + [0, 5] • b = N([20, 23], [1, 12]) • c = a |+| b • c = a + b • Disagreement between theoretical and observed variance [Figure: CDF bounds for a, b, and the two versions of c on axes running from 0 to 80; the panels for c are annotated "assuming independence" and "assuming nothing"]
PBA handles common problems • Imprecisely specified distributions • Poorly known or unknown dependencies • Non-negligible measurement error • Inconsistency in the quality of input data • Model uncertainty and non-stationarity • Plus, it’s much faster than Monte Carlo
Updating • Using knowledge of how variables are related to tighten their estimates • Removes internal inconsistency and explicates unrecognized knowledge • Also called constraint updating or editing • Also called natural extension
Example • Suppose W = [23, 33], H = [112, 150], A = [2000, 3200] • Does knowing WH = A let us say any more?
Answer • Yes, we can infer that W = [23, 28.57], H = [112, 139.13], A = [2576, 3200] • The formulas are just W = intersect(W, A/H), etc. To get the largest possible W, for instance, let A be as large as possible and H as small as possible, and solve for W = A/H.
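The interval update can be reproduced with a short sketch. It assumes intervals are stored as (lo, hi) tuples; the helpers `mul`, `div`, and `intersect` are illustrative names, with `intersect` mirroring the operation named on the slide.

```python
def intersect(x, y):
    """Intersection of two intervals; an empty result means the estimates conflict."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    if lo > hi:
        raise ValueError("inconsistent estimates: empty intersection")
    return (lo, hi)


def mul(x, y):
    """Interval product: the extremes occur at endpoint combinations."""
    products = [a * b for a in x for b in y]
    return (min(products), max(products))


def div(x, y):
    """Interval quotient, defined only when the divisor excludes zero."""
    if y[0] <= 0.0 <= y[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    quotients = [a / b for a in x for b in y]
    return (min(quotients), max(quotients))


W, H, A = (23.0, 33.0), (112.0, 150.0), (2000.0, 3200.0)

# Each estimate is tightened using the constraint W * H = A.
W_new = intersect(W, div(A, H))
H_new = intersect(H, div(A, W))
A_new = intersect(A, mul(W, H))

print(W_new, H_new, A_new)
```

The upper bound of `W_new` is 3200/112, the largest A divided by the smallest H, which is the reasoning given in the answer.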
Bayesian strategy • Prior • Likelihood • Posterior
Bayes’ rule • Concentrates mass onto the manifold of feasible combinations of W, H, and A • Answers have the same supports as intervals • Computationally complex • Needs specification of priors • Yields distributions that are not justified (come from the choice of priors) • Expresses less uncertainty than is present
Updating with p-boxes [Figure: input p-boxes for W (axis 20 to 40), H (axis 120 to 160), and A (axis 2000 to 4000), plotted as cumulative probability]
Answers • W = intersect(W, A/H) • H = intersect(H, A/W) • A = intersect(A, WH) [Figure: updated p-boxes for W (axis 20 to 40), H (axis 120 to 160), and A (axis 2000 to 4000)]
Calculation with p-boxes • Agrees with interval analysis whenever inputs are intervals • Relaxes Bayesian strategy when precise priors are not warranted • Produces more reasonable answers when priors are not well known • Much easier to compute than Bayes’ rule
Backcalculation • Find constraints on B that ensure C = A + B satisfies specified constraints • Or, more generally, C = f(A1, A2, …, Ak, B) • If A and C are intervals, the answer is called the tolerance solution
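For intervals and the additive case C = A + B, the tolerance solution can be sketched as below. The numbers for A and C are illustrative assumptions, not values from the slides, and `backcalc` here is a hypothetical interval-only stand-in for the operation of that name.

```python
def backcalc(a, c):
    """Largest interval B such that a + b lies in C for every a in A and b in B."""
    lo, hi = c[0] - a[0], c[1] - a[1]
    if lo > hi:
        # Happens exactly when A is wider than C: no B can absorb A's spread.
        raise ValueError("no tolerance solution: A is wider than C")
    return (lo, hi)


def add(x, y):
    """Ordinary interval addition."""
    return (x[0] + y[0], x[1] + y[1])


A = (1.0, 2.0)   # illustrative estimate
C = (5.0, 9.0)   # illustrative target

B = backcalc(A, C)
C_check = add(A, B)   # plugging B back in recovers the target exactly

print(B, C_check)
```

Note that the lower bound of B pairs with the lower bound of A, and upper with upper, which is the opposite pairing from ordinary interval subtraction.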
Can’t just invert the equation • dose = conc * intake / body mass • conc = dose * body mass / intake • When conc is put back into the forward equation, the dose is wider than planned
Example • dose = [0, 2] milligram per kilogram • intake = [1, 2.5] liter • mass = [60, 96] kilogram • conc = dose * mass / intake = [0, 192] milligram per liter • dose = conc * intake / mass = [0, 8] milligram per kilogram • Doses 4 times larger than tolerable levels!
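The inflation can be checked numerically, and compared with a tolerance-style solution, in the sketch below. The `factor` helper is a hypothetical interval-only version restricted to a positive multiplier and a nonnegative target; the safe concentration it produces is a derived illustration, not a figure from the slides.

```python
def mul(x, y):
    """Interval product."""
    products = [a * b for a in x for b in y]
    return (min(products), max(products))


def div(x, y):
    """Interval quotient, defined only when the divisor excludes zero."""
    if y[0] <= 0.0 <= y[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    quotients = [a / b for a in x for b in y]
    return (min(quotients), max(quotients))


def factor(a, c):
    """Largest B with a * b in C for all a in A, b in B (A > 0 and C >= 0 only)."""
    if a[0] <= 0.0 or c[0] < 0.0:
        raise ValueError("sketch only handles positive A and nonnegative C")
    lo, hi = c[0] / a[0], c[1] / a[1]
    if lo > hi:
        raise ValueError("no tolerance solution: A is relatively wider than C")
    return (lo, hi)


dose = (0.0, 2.0)      # mg per kg
intake = (1.0, 2.5)    # liter
mass = (60.0, 96.0)    # kg

# Naive inversion, then the forward equation again.
naive_conc = div(mul(dose, mass), intake)
naive_dose = div(mul(naive_conc, intake), mass)

# Backcalculation: treat intake / mass as the known multiplier of conc.
per_mass = div(intake, mass)
safe_conc = factor(per_mass, dose)
safe_dose = mul(safe_conc, per_mass)

print(naive_conc, naive_dose)
print(safe_conc, safe_dose)
```

Under these assumptions the backcalculated concentration tops out near 48 milligram per liter, and putting it back through the forward equation stays within the planned dose range.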
Backcalculating probability distributions • Needed for engineering design problems, e.g., cleanup and remediation planning for environmental contamination • Available analytical algorithms are unstable for almost all problems • Except in a few special cases, Monte Carlo simulation cannot compute backcalculations; trial and error methods are required
Backcalculation with p-boxes • Suppose A + B = C, where A = normal(5, 1) and C = {0 ≤ C, median 15, 90th %ile 35, max 50} [Figure: p-boxes for A (axis 2 to 8) and C (axis -10 to 60), plotted as cumulative probability]
Getting the answer • The backcalculation algorithm basically reverses the forward convolution • Not hard at all… but a little messy to show • Any distribution totally inside B is sure to satisfy the constraint; B is the “kernel” [Figure: the backcalculated p-box B, plotted as cumulative probability on an axis from -10 to 50]
Check by plugging back in • A + B = C* ⊆ C [Figure: C* plotted against the target C on an axis from -10 to 60]
When you know that an equation holds and you have estimates for two of its terms, use the matching formula to find the unknown
• A + B = C: with A, B use C = A + B; with A, C use B = backcalc(A, C); with B, C use A = backcalc(B, C)
• A - B = C: with A, B use C = A - B; with A, C use B = -backcalc(A, C); with B, C use A = backcalc(-B, C)
• A * B = C: with A, B use C = A * B; with A, C use B = factor(A, C); with B, C use A = factor(B, C)
• A / B = C: with A, B use C = A / B; with A, C use B = 1/factor(A, C); with B, C use A = factor(1/B, C)
• A ^ B = C: with A, B use C = A ^ B; with A, C use B = factor(log A, log C); with B, C use A = exp(factor(B, log C))
• 2A = C: with A use C = 2 * A; with C use A = C / 2
• A² = C: with A use C = A ^ 2; with C use A = sqrt(C)
Kernels • Existence more likely if p-boxes are fat • Wider if we can also assume independence • Answers are not unique, even though tolerance solutions always are • Different kernels can emphasize different properties • Envelope of all possible kernels is the shell (i.e., the united solution)
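For plain intervals and A + B = C, the difference between a kernel and the shell can be sketched as follows. The values of A and C are illustrative assumptions, and `kernel` and `shell` are hypothetical helper names for the tolerance and united solutions.

```python
def kernel(a, c):
    """Tolerance solution: every b in it keeps a + b inside C for ALL a in A."""
    lo, hi = c[0] - a[0], c[1] - a[1]
    return (lo, hi) if lo <= hi else None   # None: A is too wide for C


def shell(a, c):
    """United solution: every b for which SOME a in A puts a + b inside C."""
    return (c[0] - a[1], c[1] - a[0])


A = (1.0, 2.0)   # illustrative estimate
C = (5.0, 9.0)   # illustrative target

K = kernel(A, C)
S = shell(A, C)

# A value in the shell but outside the kernel can violate the target.
b = S[0]
worst_sum = A[0] + b   # falls below the lower bound of C

print(K, S, worst_sum)
```

The kernel sits inside the shell, and it vanishes once A becomes wider than C, which is the interval version of the remark that existence is more likely when the target is fat.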
Precise distributions • Precise distributions can’t express the nature of the target • Finding a conc distribution that results in a prescribed distribution of doses says we want some doses to be high (any distribution to the left would be even better) • We need to express the dose target as a p-box
Deconvolution • Uses information about dependence to tighten estimates • Useful, for instance, in correcting an estimated distribution for measurement uncertainty • For instance, suppose Y = X + ε • If X and ε are independent, σ_Y² = σ_X² + σ_ε² • Then we do an uncertainty correction
Example • Y = X + ε • Y, ε ~ normal • X ~ N(decon(μ_ε, μ_Y), sqrt(decon(σ_ε², σ_Y²))) • Y ~ N([5, 9], [2, 3]); ε ~ N([-1, +1], [½, 1]) • X ~ N(decon([-1, 1], [5, 9]), sqrt(decon([¼, 1], [4, 9]))) • X ~ N([6, 8], sqrt([3¾, 8]))
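The arithmetic of this example can be checked with a small sketch. It reads the inputs as Y with mean [5, 9] and standard deviation [2, 3], and ε with mean [-1, 1] and standard deviation [½, 1], and it assumes that `decon` on intervals subtracts matching endpoints; that reading is an assumption chosen because it reproduces the mean [6, 8] shown in the example.

```python
from math import sqrt


def decon(e, y):
    """Interval deconvolution of Y = X + e: subtract matching endpoints."""
    lo, hi = y[0] - e[0], y[1] - e[1]
    if lo > hi:
        raise ValueError("no solution: the error term is wider than Y")
    return (lo, hi)


def square(s):
    """Square of an interval of nonnegative values (e.g. standard deviations)."""
    return (s[0] ** 2, s[1] ** 2)


mu_Y, sd_Y = (5.0, 9.0), (2.0, 3.0)
mu_e, sd_e = (-1.0, 1.0), (0.5, 1.0)

# Means deconvolve directly; for independent X and e, variances do too.
mu_X = decon(mu_e, mu_Y)
var_X = decon(square(sd_e), square(sd_Y))
sd_X = (sqrt(var_X[0]), sqrt(var_X[1]))

print(mu_X, var_X, sd_X)
```

With these inputs the variance interval for X comes out as [3.75, 8], so its standard deviation lies between roughly 1.94 and 2.83.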
Deconvolutions with p-boxes • As for backcalculations, computation of deconvolutions is troublesome in probability theory, but often much simpler with p-boxes • Deconvolution didn’t have an analog in interval analysis (until now via p-boxes)
Relaxing over-determination • Constraint problems almost never have solutions with probability distributions • The constraints are too numerous and strict • P-boxes relax these constraints so that many problems can have solutions
P-boxes in interval analysis • P-boxes bring probability distributions into the realm of intervals • Express and solve backcalculation problems better than is possible in probability theory by itself • Generalize the notion of tolerance solutions (kernels) • Relax the unwarranted assumptions about priors that a Bayesian approach needs in updating problems • Introduce deconvolution into interval analysis
Acknowledgments • Janos Hajagos, Stony Brook University • Lev Ginzburg, Stony Brook University • David Myers, Applied Biomathematics • National Institutes of Health SBIR program