Modeling correlations and dependencies among intervals Scott Ferson and Vladik Kreinovich REC’06 Savannah, Georgia, 23 February 2006
Interval analysis
Advantages
• Natural for scientists and easy to explain
• Works wherever uncertainty comes from
• Works without specifying intervariable dependencies
Disadvantages
• Ranges can grow quickly and become very wide
• Cannot use information about dependence
Badmouthing interval analysis?
Probability v. intervals
• Probability theory
Can handle dependence well
Has an inadequate model of ignorance
LYING: saying more than you really know
• Interval analysis
Can handle epistemic uncertainty (ignorance) well
Has an inadequate model of dependence
COWARDICE: saying less than you know
I said this in Copenhagen, and nobody objected
My perspective • Elementary methods of interval analysis • Low-dimensional, usually static problems • Huge uncertainties • Verified computing • Important to be best possible • Naïve methods very easy to use • Intervals combined with probability theory • Need to be able to live with probabilists
Dependence in probability theory
Copulas: 2-increasing functions [0,1]×[0,1] → [0,1], with the four edges fixed (45-degree lines at u = 1 and v = 1; zero at u = 0 and v = 0)
2-increasing: Z(a2,b2) − Z(a1,b2) − Z(a2,b1) + Z(a1,b1) ≥ 0 whenever a1 ≤ a2 and b1 ≤ b2
[Figure: three copulas on the unit square: opposite W(u,v) = max(u+v−1, 0), independent Π(u,v) = uv, perfect M(u,v) = min(u,v)]
Perfect = comonotonic: each variable is almost surely a non-decreasing function of the other
Opposite = countermonotonic: each variable is almost surely a non-increasing function of the other
• Copulas fully capture arbitrary dependence between random variables (functional, shuffles, all)
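A minimal sketch (Python, added here for illustration and not part of the talk) of the three copulas just named, together with a direct check of the 2-increasing rectangle inequality:

# Illustrative only: the three basic copulas and the 2-increasing check.
def PI(u, v):                  # independence copula
    return u * v

def W(u, v):                   # opposite (countermonotonic) copula
    return max(u + v - 1.0, 0.0)

def M(u, v):                   # perfect (comonotonic) copula
    return min(u, v)

def two_increasing(Z, a1, a2, b1, b2):
    # Z(a2,b2) - Z(a1,b2) - Z(a2,b1) + Z(a1,b1) >= 0 whenever a1 <= a2, b1 <= b2
    return Z(a2, b2) - Z(a1, b2) - Z(a2, b1) + Z(a1, b1) >= 0.0

assert all(two_increasing(Z, 0.2, 0.7, 0.1, 0.9) for Z in (PI, W, M))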
Dependence in the bivariate case
• Any restriction on the possible pairings between inputs (any subset of the unit square)
We may also require each value of u to match with at least one v, and vice versa
• A little simpler than a copula
• The null restriction is the full unit square
Call this "nondependence" rather than independence
• D denotes the set of all possible dependencies (the set of all subsets of the unit square)
Two sides of a single coin • Mechanistic dependence Neumaier: “correlation” • Computational dependence Neumaier: “dependent” Francisco Cháves: decorrelation • Same representations used for both • Maybe the same origin phenomenologically • I’m mostly talking about mechanistic
Three special cases
Opposite (countermonotonic), Nondependent (the Fréchet case), Perfect (comonotonic)
[Figure: the three dependence regions drawn on the unit (u, v) square]
Correlation
• A model of dependence that's parameterized by a (scalar) value called the "correlation coefficient": a map ρ: [−1, +1] → D
• The correlation model is called "complete" if ρ(−1) = opposite, ρ(0) = nondependent, and ρ(+1) = perfect
Corner-shaving dependence
[Figure panels: r = −1, r = 0, r = +1]
D(r) = { (u,v) : max(0, −u−r, u−1+r) ≤ v ≤ min(1, u+1−r, −u+2+r) },  u ∈ [0,1], v ∈ [0,1]
f(A, B) = { c : c = f(u(a2−a1)+a1, v(b2−b1)+b1), (u,v) ∈ D }
A + B = [ env(w(A, −r)+b1, a1+w(B, −r)), env(a2+w(B, 1+r), w(A, 1+r)+b2) ]
w([a1,a2], p) = a1 if p < 0;  a2 if 1 < p;  p(a2−a1)+a1 otherwise
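A small Python sketch of these formulas (added as an illustration; the function names, and the use of min/max in place of the slide's env(), are my assumptions, not the authors' code):

# Illustrative Python version of w() and the corner-shaved sum A + B.
def w(interval, p):
    # Clipped linear interpolation across interval = (a1, a2).
    a1, a2 = interval
    if p < 0:
        return a1
    if p > 1:
        return a2
    return p * (a2 - a1) + a1

def add_corner_shaving(A, B, r):
    # A + B under corner-shaving dependence with correlation r in [-1, +1].
    a1, a2 = A
    b1, b2 = B
    lo = min(w(A, -r) + b1, a1 + w(B, -r))        # left endpoint of the first env()
    hi = max(a2 + w(B, 1 + r), w(A, 1 + r) + b2)  # right endpoint of the second env()
    return (lo, hi)

print(add_corner_shaving((2, 5), (3, 9), -0.7))   # about (7.1, 11.9), as on the next slide
print(add_corner_shaving((2, 5), (3, 9), -1.0))   # (8, 11): opposite
print(add_corner_shaving((2, 5), (3, 9),  1.0))   # (5, 14): perfect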
Other complete correlation families
[Figure panels: r = −1, r = 0, r = +1]
Elliptic dependence
• Not complete (because r = 0 isn't nondependence)
[Figure panels: r = −1, r = 0, r = +1]
Parabolic dependence
• A variable and its square or square root have this dependence
• Variables that are not related by squaring could also have this dependence relation, e.g., A = [1,5], B = [1,10]
[Figure panels: r = −1, r = 0, r = +1]
A + B with A = [2,5], B = [3,9], r = −0.7

a1 = left(A); a2 = right(A)
b1 = left(B); b2 = right(B)
func w() if $2<0 then return left($1) else if 1<$2 then return right($1) else return $2*(right($1)-left($1))+left($1)

// perfect
[a1+b1, a2+b2]                                                   [ 5, 14]
// opposite
env(a1+b2, a2+b1)                                                [ 8, 11]
// corner-shaving
[env(w(A,-r)+b1, a1+w(B,-r)), env(a2+w(B,1+r), w(A,1+r)+b2)]     [ 7.1, 11.9]
// elliptic
d1 = (a2-a1)/2
d2 = (b2-b1)/2
d = sqrt(d1^2 + d2^2 + 2*r*d1*d2)
(a1+a2+b1+b2)/2 + [-d, d]                                        [ 7.2751, 11.725]
// upper, left
[a1+b1, a2+b2]                                                   [ 5, 14]
// lower, left
env(a2+b1, env(a1+b2, a1+b1))                                    [ 5, 11]
// upper, right
env(a2+b1, env(a1+b2, a2+b2))                                    [ 8, 14]
// lower, right
[a1+b1, a2+b2]                                                   [ 5, 14]
// diamond
[env(a1+w(B,0.5), w(A,0.5)+b1), env(a2+w(B,0.5), w(A,0.5)+b2)]   [ 6.5, 12.5]
// nondependent
[a1+b1, a2+b2]                                                   [ 5, 14]

Summary:
Perfect                      [ 5, 14]
Opposite                     [ 8, 11]
Corner-shaving (r = −0.7)    [ 7.1, 11.9]
Elliptic (r = −0.7)          [ 7.27, 11.73]
Upper, left                  [ 5, 14]
Lower, left                  [ 5, 11]
Upper, right                 [ 8, 14]
Lower, right                 [ 5, 14]
Diamond                      [ 6.5, 12.5]
Nondependent                 [ 5, 14]
Tighter!
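As a quick cross-check of the elliptic case (Python, illustration only; the variable names are mine):

from math import sqrt

A, B, r = (2.0, 5.0), (3.0, 9.0), -0.7
a1, a2 = A
b1, b2 = B
d1, d2 = (a2 - a1) / 2, (b2 - b1) / 2
d = sqrt(d1**2 + d2**2 + 2 * r * d1 * d2)
mid = (a1 + a2 + b1 + b2) / 2
print((mid - d, mid + d))    # approximately (7.2751, 11.7249), matching the slide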
Eliciting dependence • As hard as getting intervals (maybe a bit worse) • Theoretical or “physics-based” arguments • Inference from empirical data • Risk of loss of rigor at this step (just as there is when we try to infer intervals from data)
Generalization to multiple dimensions
• Pairwise
Matrix of two-dimensional dependence relations
Relatively easy to elicit
• Multivariate
Subset of the unit hypercube
Potentially much better tightening
Computationally harder (but the problem is already NP-hard, so this doesn't spoil the party)
Computing • Sequence of binary operations • Need to deduce dependencies of intermediate results with each other and the original inputs • Different calculation order may give different results • Do all at once in one multivariate calculation • Can be much more difficult computationally • Can produce much better tightening
Probability box (p-box)
Interval bounds on a cumulative distribution function
[Figure: a p-box, cumulative probability (0 to 1) plotted against X (0.0 to 3.0)]
Generalizes intervals and probability
[Figure: three panels plotting cumulative probability against x from 0 to 40: a probability distribution, a probability box, and an interval (which is not a uniform distribution)]
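One way to see why an interval is a special case: a p-box can be encoded as pointwise lower and upper bounds on the CDF. A hypothetical Python sketch (the grid and function name are my own, not from the talk):

def interval_as_pbox(lo, hi, xs):
    # Upper CDF bound jumps to 1 at lo; lower CDF bound jumps to 1 at hi.
    upper = [1.0 if x >= lo else 0.0 for x in xs]
    lower = [1.0 if x >= hi else 0.0 for x in xs]
    return lower, upper

xs = [0, 10, 20, 30, 40]
print(interval_as_pbox(10, 30, xs))
# lower = [0, 0, 0, 1, 1], upper = [0, 1, 1, 1, 1]: every CDF lying between
# these bounds is consistent with the interval, not just the uniform one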
Probability bounds arithmetic
[Figure: the p-boxes A and B, each plotted as cumulative probability]
What's the sum A + B?
Cartesian product: A + B, independence/nondependent

                     A[1,3]                A[2,4]                A[3,5]
                     p1 = 1/3              p2 = 1/3              p3 = 1/3
B[2,8]   q1 = 1/3    A+B[3,11] prob=1/9    A+B[4,12] prob=1/9    A+B[5,13] prob=1/9
B[6,10]  q2 = 1/3    A+B[7,13] prob=1/9    A+B[8,14] prob=1/9    A+B[9,15] prob=1/9
B[8,12]  q3 = 1/3    A+B[9,15] prob=1/9    A+B[10,16] prob=1/9   A+B[11,17] prob=1/9
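A short Python sketch of this Cartesian product (illustration only; the data structures are my own):

# Each focal element of A pairs with each focal element of B; the interval
# sums carry probability (1/3)(1/3) = 1/9 under independence/nondependence.
A = [((1, 3), 1/3), ((2, 4), 1/3), ((3, 5), 1/3)]
B = [((2, 8), 1/3), ((6, 10), 1/3), ((8, 12), 1/3)]

cells = [((a1 + b1, a2 + b2), p * q)
         for (a1, a2), p in A
         for (b1, b2), q in B]

for interval, mass in cells:
    print(interval, mass)    # nine cells, e.g. (3, 11) with mass 1/9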
A+B, independent/nondependent
[Figure: the resulting p-box for A+B, cumulative probability vs. A+B from 0 to 18]
Opposite/nondependent A+B

                     A[1,3]                A[2,4]                A[3,5]
                     p1 = 1/3              p2 = 1/3              p3 = 1/3
B[2,8]   q1 = 1/3    A+B[3,11] prob=0      A+B[4,12] prob=0      A+B[5,13] prob=1/3
B[6,10]  q2 = 1/3    A+B[7,13] prob=0      A+B[8,14] prob=1/3    A+B[9,15] prob=0
B[8,12]  q3 = 1/3    A+B[9,15] prob=1/3    A+B[10,16] prob=0     A+B[11,17] prob=0
A+B, opposite / nondependent
[Figure: the resulting p-box for A+B, cumulative probability vs. A+B from 0 to 18]
Opposite / opposite A+B

                     A[1,3]                A[2,4]                A[3,5]
                     p1 = 1/3              p2 = 1/3              p3 = 1/3
B[2,8]   q1 = 1/3    A+B[5,9]  prob=0      A+B[6,10] prob=0      A+B[7,11] prob=1/3
B[6,10]  q2 = 1/3    A+B[9,11] prob=0      A+B[10,12] prob=1/3   A+B[11,13] prob=0
B[8,12]  q3 = 1/3    A+B[11,13] prob=1/3   A+B[12,14] prob=0     A+B[13,15] prob=0
A+B, opposite / opposite

A = [1,3]; a1 = left(A); a2 = right(A)
B = [2,8]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 5, 9]
B = [6,10]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 9, 11]
B = [8,12]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 11, 13]
A = [2,4]; a1 = left(A); a2 = right(A)
B = [2,8]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 6, 10]
B = [6,10]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 10, 12]
B = [8,12]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 12, 14]
A = [3,5]; a1 = left(A); a2 = right(A)
B = [2,8]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 7, 11]
B = [6,10]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 11, 13]
B = [8,12]; b1 = left(B); b2 = right(B)
env(a1+b2, a2+b1)                          [ 13, 15]
mix(1,[7,11], 1,[10,12], 1,[11,13])        ~(range=[7,13], mean=[9.33,12], var=[0,7])

[Figure: the resulting p-box for A+B under opposite/opposite dependence, cumulative probability vs. A+B from 0 to 18]
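A hypothetical Python sketch of the same calculation (illustration only): under opposite/opposite dependence only the anti-diagonal cells of the Cartesian product carry mass, and the answer is an equal-weight mixture of those opposite interval sums.

def opposite_sum(A, B):
    # Sum of two intervals under opposite (countermonotonic) dependence.
    a1, a2 = A
    b1, b2 = B
    return (min(a1 + b2, a2 + b1), max(a1 + b2, a2 + b1))

cells = [opposite_sum((1, 3), (8, 12)),   # (11, 13)
         opposite_sum((2, 4), (6, 10)),   # (10, 12)
         opposite_sum((3, 5), (2, 8))]    # (7, 11)

lo = min(c[0] for c in cells)                      # 7
hi = max(c[1] for c in cells)                      # 13
mean_lo = sum(c[0] for c in cells) / len(cells)    # about 9.33
mean_hi = sum(c[1] for c in cells) / len(cells)    # 12.0
print((lo, hi), (mean_lo, mean_hi))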
Three answers say different things
[Figure: the three A+B p-boxes from the preceding slides overlaid, cumulative probability vs. A+B from 0 to 18]
Conclusions • Interval analysis automatically accounts for all possible dependencies • Unlike probability theory, where the default assumption often underestimates uncertainty • Information about dependencies isn’t usually used to tighten results, but it can be • Variable repetition is just a special kind of dependence
                    Success             Failure
Prudent analysis    Good engineering    Honorable failure
Wishful thinking    Dumb luck           Negligence
Independence
• In the context of precise probabilities, there is a unique notion of independence
• In the context of imprecise probabilities, however, this notion disintegrates into several distinct concepts
• The different kinds of independence behave differently in computations
Several definitions of independence
Equivalent definitions of independence:
• H(x,y) = F(x) G(y), for all values x and y
• P(X ∈ I, Y ∈ J) = P(X ∈ I) P(Y ∈ J), for any I, J ⊆ R
• h(x,y) = f(x) g(y), for all values x and y
• E(w(X) z(Y)) = E(w(X)) E(z(Y)), for arbitrary w, z
• φX,Y(t,s) = φX(t) φY(s), for arbitrary t and s
where P(X ≤ x) = F(x), P(Y ≤ y) = G(y) and P(X ≤ x, Y ≤ y) = H(x, y); f, g and h are the density analogs of F, G and H; and φ denotes the characteristic function (Fourier transform)
For precise probabilities, all these definitions are equivalent, so there's a single concept
Imprecise probability independence • Random-set independence • Epistemic irrelevance (asymmetric) • Epistemic independence • Strong independence • Repetition independence • Others? Which should be called ‘independence’?
Notation
• X and Y are random variables
• FX and FY are their probability distributions
• FX and FY aren't known precisely, but we know they're within classes MX and MY
X ~ FX ∈ MX        Y ~ FY ∈ MY
Repetition independence
• X and Y are random variables
• X and Y are independent (in the traditional sense)
• X and Y are identically distributed according to F
• F is unknown, but we know that F ∈ M
• X and Y are repetition independent
Analog of iid (independent and identically distributed)
MX,Y = {H : H(x, y) = F(x) F(y), F ∈ M}
Strong independence
• X ~ FX ∈ MX and Y ~ FY ∈ MY
• X and Y are stochastically independent
• All possible combinations of distributions from MX and MY are allowed
• X and Y are strongly independent
Complete absence of any relationship between X and Y
MX,Y = {H : H(x, y) = FX(x) FY(y), FX ∈ MX, FY ∈ MY}
Epistemic independence
• X ~ FX ∈ MX and Y ~ FY ∈ MY
• E(f(X)|Y) = E(f(X)) and E(f(Y)|X) = E(f(Y)) for all functions f, where E is the smallest mean over all possible probability distributions
• X and Y are epistemically independent
Lower bounds on expectations generalize the conditions P(X|Y) = P(X) and P(Y|X) = P(Y)
Random-set independence
• Embodied in Cartesian products
• X and Y with mass functions mX and mY are random-set independent if the Dempster-Shafer structure for their joint distribution has mass function m(A1 × A2) = mX(A1) mY(A2) whenever A1 is a focal element of X and A2 is a focal element of Y, and m(A) = 0 otherwise
• Often easiest to compute
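A minimal Python sketch of this product rule (illustration only; the example focal elements are made up):

# Random-set independence: the mass of the joint focal element A1 x A2 is
# the product of the marginal masses.
mX = {(1, 3): 0.5, (2, 4): 0.5}      # hypothetical focal elements of X with masses
mY = {(0, 1): 0.5, (1, 2): 0.5}      # hypothetical focal elements of Y with masses

joint = {(A1, A2): p * q for A1, p in mX.items() for A2, q in mY.items()}
print(joint)                         # four joint focal elements, each with mass 0.25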
These cases of independence are nested:
Repetition ⊆ Strong ⊆ Epistemic ⊆ Random-set ⊆ (Uncorrelated) ⊆ (Nondependent)
Interesting example
[Figure: the p-boxes for X and Y, cumulative probability vs. values from −1 to +1]
• X = [−1, +1], Y = {([−1, 0], ½), ([0, 1], ½)}
• If X and Y are "independent", what is Z = XY?
Compute via Yager's convolution

                      Y: ([−1, 0], ½)    Y: ([0, 1], ½)
X: ([−1, +1], 1)      ([−1, +1], ½)      ([−1, +1], ½)

The Cartesian product with one row and two columns produces this p-box
[Figure: the resulting p-box for XY on −1 to +1]
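A sketch of that two-cell product in Python (illustration only; the helper name is mine):

def interval_product(A, B):
    # Interval product: extremes occur at the corners.
    a1, a2 = A
    b1, b2 = B
    corners = [a1 * b1, a1 * b2, a2 * b1, a2 * b2]
    return (min(corners), max(corners))

X = [((-1.0, 1.0), 1.0)]                       # one focal element: the whole interval
Y = [((-1.0, 0.0), 0.5), ((0.0, 1.0), 0.5)]

Z = [(interval_product(a, b), p * q) for a, p in X for b, q in Y]
print(Z)   # two focal elements, each ([-1, 1], 0.5): the vacuous p-box on [-1, +1]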
But consider the means
• Clearly, EX = [−1, +1] and EY = [−½, +½].
• Therefore, E(XY) = [−½, +½].
• But if this is the mean of the product, and its range is [−1, +1], then we know better bounds on the CDF.
[Figure: the tightened p-box for XY on −1 to +1]
And consider the signs of the quantities
• What's the probability PZ that Z < 0?
• Z < 0 only if X < 0 or Y < 0 (but not both)
• PZ = PX(1−PY) + PY(1−PX), where PX = P(X < 0), PY = P(Y < 0)
• But PY is ½ by construction
• So PZ = ½PX + ½(1−PX) = ½
• Thus, zero is the median of Z
• Knowing the median and the range improves the bounds
[Figure: the further tightened p-box for XY on −1 to +1]
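A quick numeric check of that cancellation (Python, illustration only):

# P(Z < 0) = PX(1 - PY) + PY(1 - PX) collapses to 1/2 whenever PY = 1/2,
# no matter what PX is.
for pX in (0.0, 0.25, 0.5, 0.9, 1.0):
    pY = 0.5
    print(pX, pX * (1 - pY) + pY * (1 - pX))    # always 0.5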
Best possible
• These bounds are realized by solutions
If X = 0, then Z = 0
If X = Y = B = {(−1, ½), (+1, ½)}, then Z = B
• So these bounds are also best possible
[Figure: the two realizations, Z = 0 and Z = B, plotted against the bounds]
So which is correct?
[Figure: three p-boxes for XY, under random-set independence, moment independence, and strong independence]
The answer depends on what one meant by "independent".
So what? • The example illustrates a practical difference between random-set independence and strong independence • It disproves the conjecture that the convolution of uncertain numbers is not affected by dependence assumptions if at least one of them is an interval • It tempers the claim about the best-possible nature of convolutions with probability boxes and Dempster-Shafer structures