Multivariable Distributions (Ch. 4)
Multivariable Distributions
• It is often desirable to take more than one measurement on a random experiment. The data may then be collected in pairs (xi, yi).
• Def. 4.1-1: Let X and Y be two discrete random variables defined on the support S. The probability that X = x and Y = y is denoted by f(x,y) = P(X = x, Y = y); f(x,y) is the joint probability mass function (joint p.m.f.) of X and Y, and it satisfies:
• 0 ≤ f(x,y) ≤ 1; ΣΣ(x,y)∈S f(x,y) = 1; P[(X,Y)∈A] = ΣΣ(x,y)∈A f(x,y), A ⊆ S.
Illustration Example
• Ex. 4.1-3: Roll a pair of dice: let X be the smaller and Y the larger outcome.
• The outcome (3,2) or (2,3) gives X = 2 and Y = 3, with probability 2/36.
• The outcome (2,2) gives X = 2 and Y = 2, with probability 1/36.
• Thus the joint p.m.f. of X and Y is f(x,y) = 1/36 for 1 ≤ x = y ≤ 6 and f(x,y) = 2/36 for 1 ≤ x < y ≤ 6; summing each row or column of this table gives the marginal p.m.f.s (a numerical sketch follows below).
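A minimal numerical sketch of Ex. 4.1-3 (plain Python; the variable names are ours): it tabulates f(x,y), checks that the table sums to 1, and computes both marginals.

```python
from fractions import Fraction
from itertools import product

# Joint p.m.f. of X = min, Y = max of two fair dice (Ex. 4.1-3).
f = {}
for d1, d2 in product(range(1, 7), repeat=2):
    x, y = min(d1, d2), max(d1, d2)
    f[(x, y)] = f.get((x, y), Fraction(0)) + Fraction(1, 36)

assert sum(f.values()) == 1  # total probability

# Marginal p.m.f.s by summing over the other variable.
f1 = {x: sum(p for (a, _), p in f.items() if a == x) for x in range(1, 7)}
f2 = {y: sum(p for (_, b), p in f.items() if b == y) for y in range(1, 7)}

print(f1[1])                     # 11/36
print(f[(1, 1)], f1[1] * f2[1])  # 1/36 vs. 11/1296 -> dependent
```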
Marginal Probability and Independence
• Def. 4.1-2: X and Y have the joint p.m.f. f(x,y) with space S.
• The marginal p.m.f. of X is f1(x) = Σy f(x,y) = P(X = x), x ∈ S1.
• The marginal p.m.f. of Y is f2(y) = Σx f(x,y) = P(Y = y), y ∈ S2.
• X and Y are independent iff P(X = x, Y = y) = P(X = x)P(Y = y), i.e., f(x,y) = f1(x)f2(y) for all x ∈ S1, y ∈ S2. Otherwise, X and Y are dependent.
• X and Y in Ex. 4.1-3 are dependent: 1/36 = f(1,1) ≠ f1(1)f2(1) = (11/36)(1/36).
• Ex. 4.1-4: The joint p.m.f. is f(x,y) = (x+y)/21, x = 1,2,3, y = 1,2.
• Then f1(x) = Σy=1~2 (x+y)/21 = (2x+3)/21, x = 1,2,3.
• Likewise, f2(y) = Σx=1~3 (x+y)/21 = (6+3y)/21, y = 1,2.
• Since f(x,y) ≠ f1(x)f2(y), X and Y are dependent.
• Ex. 4.1-6: f(x,y) = xy²/13, (x,y) = (1,1), (1,2), (2,2).
Quick Dependence Checks
• Practically, dependence can be detected quickly if:
• The support of (X, Y) is NOT rectangular, i.e., S is not the product set {(x,y): x ∈ S1, y ∈ S2}, as in Ex. 4.1-6; or
• f(x,y) cannot be factored (separated) into the product of an x-alone function and a y-alone function.
• In Ex. 4.1-4, f(x,y) is a sum, not a product, of x-alone and y-alone functions.
• Ex. 4.1-7: [Probability histogram for a joint p.m.f.]
Mathematical Expectation
• If u(X1, X2) is a function of two random variables X1 and X2, then E[u(X1,X2)] = ΣΣ(x1,x2)∈S u(x1,x2) f(x1,x2), if it exists, is called the mathematical expectation (or expected value) of u(X1,X2).
• The mean of Xi, i = 1,2: μi = E(Xi) = ΣΣ(x1,x2)∈S xi f(x1,x2).
• The variance of Xi: σi² = E[(Xi − μi)²] = ΣΣ(x1,x2)∈S (xi − μi)² f(x1,x2).
• Ex. 4.1-8: A player selects a chip from a bowl having 8 chips: 3 marked (0,0), 2 marked (1,0), 2 marked (0,1), and 1 marked (1,1). A sketch of the expectation computation follows below.
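A short sketch of Ex. 4.1-8 in Python; the payoff function u(x1,x2) = x1 + x2 is our assumption for illustration (the slide does not say which u the example uses).

```python
from fractions import Fraction

# Joint p.m.f. from the chip counts in Ex. 4.1-8 (8 chips in total).
f = {(0, 0): Fraction(3, 8), (1, 0): Fraction(2, 8),
     (0, 1): Fraction(2, 8), (1, 1): Fraction(1, 8)}

# E[u(X1,X2)] as a sum over the support; u(x1,x2) = x1 + x2 is assumed here.
u = lambda x1, x2: x1 + x2
expectation = sum(u(x1, x2) * p for (x1, x2), p in f.items())
print(expectation)  # 3/4, since E[X1] = E[X2] = 3/8

# Mean and variance of X1 from the same joint table.
mu1 = sum(x1 * p for (x1, _), p in f.items())
var1 = sum((x1 - mu1) ** 2 * p for (x1, _), p in f.items())
print(mu1, var1)  # 3/8, 15/64
```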
Joint Probability Density Function
• The joint probability density function (joint p.d.f.) of two continuous-type random variables X and Y is an integrable function f(x,y) such that:
• f(x,y) ≥ 0; ∫−∞~∞ ∫−∞~∞ f(x,y) dx dy = 1;
• P[(X,Y)∈A] = ∫∫A f(x,y) dx dy, for an event A.
• Ex. 4.1-9: X and Y have a joint p.d.f.; with A = {(x,y): 0 < x < 1, 0 < y < x}, P[(X,Y)∈A] is found by integrating f over A. The marginal p.d.f.s factor the joint p.d.f., so X and Y are independent! (A numerical sketch with an assumed density follows below.)
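The density used in Ex. 4.1-9 did not survive extraction, so this sketch uses an assumed joint p.d.f. f(x,y) = 4xy on 0 < x < 1, 0 < y < 1 (our choice, not necessarily the textbook's) to show the integration steps with scipy.

```python
from scipy.integrate import dblquad

# Assumed joint p.d.f. for illustration: f(x,y) = 4xy on the unit square.
f = lambda y, x: 4 * x * y  # dblquad integrates func(y, x)

# Total probability must be 1.
total, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1)
print(round(total, 6))  # 1.0

# P[(X,Y) in A] with A = {0 < x < 1, 0 < y < x}: integrate y from 0 to x.
pA, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: x)
print(round(pA, 6))  # 0.5

# Here f1(x) = 2x and f2(y) = 2y, so f(x,y) = f1(x) f2(y) -> independent.
```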
Independence of Continuous-Type R.V.s
• Two continuous-type random variables X and Y are independent iff the joint p.d.f. factors into the product of their marginal p.d.f.s.
• Ex. 4.1-10: X and Y have the joint p.d.f. f(x,y) = 2, 0 ≤ x ≤ y ≤ 1.
• The support S = {(x,y): 0 ≤ x ≤ y ≤ 1} is the triangle bounded by the lines x = 0, y = 1, and x = y.
• The marginal p.d.f.s are f1(x) = ∫x~1 2 dy = 2(1−x), 0 ≤ x ≤ 1, and f2(y) = ∫0~y 2 dx = 2y, 0 ≤ y ≤ 1.
• Since f(x,y) = 2 ≠ f1(x)f2(y), X and Y are dependent! Various expected values can be computed from these densities.
Multivariate Hypergeometric Distribution
• Ex. 4.1-11: Of 200 students, 40 have As, 60 have Bs, and 100 have Cs, Ds, or Fs.
• A sample of size 25 is taken at random without replacement.
• X1 is the number of A students, X2 is the number of B students, and 25 − X1 − X2 is the number of the other students.
• The joint p.m.f. is f(x1,x2) = C(40,x1) C(60,x2) C(100, 25−x1−x2) / C(200,25), on the space S = {(x1,x2): x1, x2 ≥ 0, x1 + x2 ≤ 25}.
• The marginal p.m.f. of X1 can also be obtained from the knowledge of the model: X1 is hypergeometric (40 As vs. 160 non-As), so f1(x1) = C(40,x1) C(160, 25−x1) / C(200,25).
• X1 and X2 are dependent! (A computational sketch follows below.)
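A minimal sketch of Ex. 4.1-11 with exact rational arithmetic (function and variable names are ours):

```python
from math import comb
from fractions import Fraction

def joint_pmf(x1, x2, N=(40, 60, 100), n=25):
    """Multivariate hypergeometric p.m.f. for Ex. 4.1-11."""
    x3 = n - x1 - x2
    if x1 < 0 or x2 < 0 or x3 < 0:
        return Fraction(0)
    return Fraction(comb(N[0], x1) * comb(N[1], x2) * comb(N[2], x3),
                    comb(sum(N), n))

# The joint table sums to 1 over the whole space S.
total = sum(joint_pmf(x1, x2) for x1 in range(26) for x2 in range(26))
assert total == 1

# Marginal of X1 agrees with the ordinary hypergeometric model (40 vs. 160).
f1_5 = sum(joint_pmf(5, x2) for x2 in range(21))
assert f1_5 == Fraction(comb(40, 5) * comb(160, 20), comb(200, 25))
```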
Binomial ⇒ Trinomial Distribution
• Trinomial Distribution: the experiment is repeated n times, with outcome probabilities p1 (perfect), p2 (second), and p3 = 1 − p1 − p2 (defective).
• X1 is the number of perfect items, X2 the number of seconds, X3 the number of defectives.
• The joint p.m.f. is f(x1,x2) = n!/(x1! x2! x3!) p1^x1 p2^x2 p3^x3, where x3 = n − x1 − x2.
• Marginally, X1 is b(n,p1) and X2 is b(n,p2); X1 and X2 are dependent.
• Ex. 4.1-13: In manufacturing a certain item, 95% of the items are good, 4% are "seconds", and 1% are defective.
• An inspector observes n = 20 items selected at random, counting the number X of seconds and the number Y of defectives.
• The probability that at least 2 seconds or at least 2 defective items are found, i.e., A = {(x,y): x ≥ 2 or y ≥ 2}, is P(A) = 1 − P(X ≤ 1 and Y ≤ 1); a sketch follows below.
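A sketch of the Ex. 4.1-13 computation via the complement (plain Python):

```python
from math import factorial

def trinomial(x, y, n=20, p1=0.04, p2=0.01):
    """P(X = x seconds, Y = y defectives) among n items (Ex. 4.1-13)."""
    z = n - x - y  # number of good items, probability 0.95
    return (factorial(n) // (factorial(x) * factorial(y) * factorial(z))
            * p1**x * p2**y * (1 - p1 - p2)**z)

# P(at least 2 seconds or at least 2 defectives) = 1 - P(X <= 1 and Y <= 1).
p_complement = sum(trinomial(x, y) for x in (0, 1) for y in (0, 1))
print(round(1 - p_complement, 4))  # about 0.2038
```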
Correlation Coefficient
• For two random variables X1 and X2:
• The mean of Xi, i = 1,2: μi = E(Xi).
• The variance of Xi: σi² = E[(Xi − μi)²] = E(Xi²) − μi².
• The covariance of X1 and X2 is Cov(X1,X2) = σ12 = E[(X1 − μ1)(X2 − μ2)] = E(X1X2) − μ1μ2.
• The correlation coefficient of X1 and X2 is ρ = Cov(X1,X2)/(σ1σ2).
• Ex. 4.2-1: X1 and X2 have a joint p.m.f. that is not a product of an x1-alone and an x2-alone function ⇒ dependent!
Insights into the Meaning of ρ
• Among all points in S, ρ tends to be positive if points simultaneously above or below their respective means carry more probability.
• The least-squares regression line is the line through (μX, μY) with the slope b that minimizes K(b) = E{[(Y − μY) − b(X − μX)]²}, the expected squared vertical distance from a point to the line.
• Expanding, K(b) = σY² − 2bρσXσY + b²σX², which is minimized at b = ρσY/σX, giving K(b) = σY²(1 − ρ²).
• ρ = ±1: K(b) = 0 ⇒ all the points lie on the least-squares regression line.
• ρ = 0: K(b) = σY² and the line is y = μY; X and Y may or may not be independent!
• ρ measures the amount of linearity in the probability distribution.
Example • Ex4.2-2: Roll a pair of 4-sided die: X is the number of ones, Y is the number of twos and threes. • The joint p.m.f. is • The line of best fit is
Independence ⇒ ρ = 0
• Independence implies Cov(X,Y) = 0 and hence ρ = 0; the converse is not necessarily true!
• Ex. 4.2-3: The joint p.m.f. of X and Y is f(x,y) = 1/3, (x,y) = (0,1), (1,0), (2,1).
• Here ρ = 0, yet the support is not rectangular, so X and Y are dependent.
• Empirical data: from n bivariate observations (xi, yi), i = 1..n, we can compute the sample mean and variance of each variate, as well as the sample correlation coefficient and the sample least-squares regression line. (Ref. p. 241)
Conditional Distributions
• Def. 4.3-1: The conditional probability mass function of X, given that Y = y, is defined by g(x|y) = f(x,y)/f2(y), if f2(y) > 0.
• Likewise, h(y|x) = f(x,y)/f1(x), if f1(x) > 0.
• Ex. 4.3-1: X and Y have the joint p.m.f. f(x,y) = (x+y)/21, x = 1,2,3, y = 1,2, with f1(x) = (2x+3)/21, x = 1,2,3, and f2(y) = (3y+6)/21, y = 1,2.
• Thus, given Y = y, the conditional p.m.f. of X is g(x|y) = (x+y)/(3y+6).
• When y = 1, g(x|1) = (x+1)/9, x = 1,2,3; g(1|1):g(2|1):g(3|1) = 2:3:4.
• When y = 2, g(x|2) = (x+2)/12, x = 1,2,3; g(1|2):g(2|2):g(3|2) = 3:4:5.
• Similar relationships hold for h(y|x). Since g(x|y) varies with y, X and Y are dependent!
Conditional Mean and Variance
• The conditional mean of Y, given X = x, is E(Y|x) = Σy y h(y|x).
• The conditional variance of Y, given X = x, is Var(Y|x) = E{[Y − E(Y|x)]²|x} = E(Y²|x) − [E(Y|x)]².
• Ex. 4.3-2: [from Ex. 4.3-1] X and Y have the joint p.m.f. f(x,y) = (x+y)/21, x = 1,2,3, y = 1,2, so h(y|x) = (x+y)/(2x+3).
• E(Y|x) = Σy=1~2 y(x+y)/(2x+3) = (3x+5)/(2x+3), and Var(Y|x) = (x+1)(x+2)/(2x+3)²; a symbolic sketch follows below.
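A symbolic check of Ex. 4.3-2 with sympy (our sketch):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Ex. 4.3-2: f(x,y) = (x+y)/21; marginal f1(x) = (2x+3)/21.
f1 = (x + 1) / 21 + (x + 2) / 21
h = ((x + y) / 21) / f1  # conditional p.m.f. h(y|x) = (x+y)/(2x+3)

# Conditional mean and variance by summing over y = 1, 2.
EY = sp.simplify(sum(yy * h.subs(y, yy) for yy in (1, 2)))
EY2 = sp.simplify(sum(yy**2 * h.subs(y, yy) for yy in (1, 2)))
VarY = sp.factor(EY2 - EY**2)

print(EY)    # (3*x + 5)/(2*x + 3)
print(VarY)  # (x + 1)*(x + 2)/(2*x + 3)**2
```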
Relationship between Conditional Means
• When both conditional means are linear, E(Y|x) = μY + ρ(σY/σX)(x − μX) and E(X|y) = μX + ρ(σX/σY)(y − μY).
• The point (μX, μY) lies on both of these lines and is their intersection.
• The product of the two slopes is ρ², and their ratio is σY²/σX².
• These relations can be used to derive the unknown quantities from the known ones.
Example
• Ex. 4.3-3: X and Y have the trinomial p.m.f. with parameters n, p1, p2, p3 = 1 − p1 − p2.
• They have the marginal p.m.f.s b(n, p1) and b(n, p2); given X = x, Y is b(n − x, p2/(1 − p1)), so E(Y|x) = (n − x)p2/(1 − p1), a linear function of x.
• Matching its slope −p2/(1 − p1) with ρσY/σX gives ρ = −√[p1p2/((1 − p1)(1 − p2))].
Example for Continuous-Type R.V.s
• Ex. 4.3-5: [From Ex. 4.1-10] f(x,y) = 2, 0 < x < y < 1, and f1(x) = 2(1 − x), so h(y|x) = f(x,y)/f1(x) = 1/(1 − x), x < y < 1.
• ⇒ The conditional distribution of Y given X = x is U(x,1). [U(a,b) has mean (a+b)/2 and variance (b−a)²/12.]
• Hence E(Y|x) = (1 + x)/2 and Var(Y|x) = (1 − x)²/12.
Bivariate Normal Distribution
• The joint p.d.f. of X: N(μX,σX²) and Y: N(μY,σY²) with correlation ρ is f(x,y) = 1/(2πσXσY√(1−ρ²)) · exp[−q(x,y)/2], where q(x,y) = 1/(1−ρ²) · {[(x−μX)/σX]² − 2ρ[(x−μX)/σX][(y−μY)/σY] + [(y−μY)/σY]²}.
• Therefore, the conditional distribution of Y given X = x is N(μY + ρ(σY/σX)(x − μX), σY²(1−ρ²)): the conditional mean is a linear function of x, and the conditional variance is a constant w.r.t. x. (A simulation sketch follows below.)
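A simulation sketch (numpy) checking the conditional mean and variance of a bivariate normal; the parameter values are our choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mux, muy, sx, sy, rho = 1.0, 2.0, 2.0, 3.0, 0.6  # illustrative values

cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
xy = rng.multivariate_normal([mux, muy], cov, size=1_000_000)

# Condition on X near x0 and compare with the theoretical N(mean, var).
x0 = 2.0
sel = xy[np.abs(xy[:, 0] - x0) < 0.05, 1]
print(sel.mean(), muy + rho * (sy / sx) * (x0 - mux))  # both about 2.9
print(sel.var(), sy**2 * (1 - rho**2))                 # both about 5.76
```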
Examples
• Ex. 5.6-1 and Ex. 5.6-2: worked bivariate normal computations.
Bivariate Normal: ρ = 0 ⇒ Independence
• Thm. 5.6-1: If X and Y have a bivariate normal distribution with correlation ρ, then X and Y are independent iff ρ = 0.
• The same holds for trivariate and multivariate normal distributions.
• When ρ = 0, the exponent q(x,y) splits into separate x and y terms, so f(x,y) factors into the product of the N(μX,σX²) and N(μY,σY²) p.d.f.s.
Transformations of R.V.s
• In Section 3.5, the transformation of a single continuous-type variable X with p.d.f. f(x) to Y = v(X), where v is increasing or decreasing, is done by g(y) = f(v⁻¹(y))·|d v⁻¹(y)/dy|.
• For the discrete type, g(y) = f(v⁻¹(y)); no derivative term is needed.
• Ex. 4.4-1: X is b(n,p) and Y = X²; if n = 3, p = 1/4, then g(y) = f(√y), y = 0, 1, 4, 9.
• What transformation u(X/n) has a variance (approximately) free of p? Taylor's expansion about p gives u(X/n) ≈ u(p) + (X/n − p)u′(p), so Var[u(X/n)] ≈ [u′(p)]² p(1−p)/n.
• The variance is constant, or free of p, when u′(p) ∝ 1/√(p(1−p)), i.e., u(X/n) = arcsin(√(X/n)), whose variance is approximately 1/(4n).
• Ex: X is b(100, 1/4) or b(100, 9/10): the arcsine-transformed proportion has roughly the same variance in both cases. (A simulation sketch follows below.)
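A quick numpy check that the arcsine transformation roughly stabilizes the variance across p (our sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

for p in (0.25, 0.90):
    xs = rng.binomial(n, p, size=200_000) / n
    print(p,
          round(xs.var(), 5),                      # raw variance p(1-p)/n varies
          round(np.arcsin(np.sqrt(xs)).var(), 5))  # both near 1/(4n) = 0.0025
```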
Multivariate Transformations
• When Y = u(X) does not have a single-valued inverse, each possible inverse function must be considered individually, with its range delimited to match the right inverse.
• For multivariate transformations, the derivative is replaced by the Jacobian.
• Continuous R.V.s X1 and X2 have the joint p.d.f. f(x1,x2). If Y1 = u1(X1,X2), Y2 = u2(X1,X2) has the single-valued inverse X1 = v1(Y1,Y2), X2 = v2(Y1,Y2), then the joint p.d.f. of Y1 and Y2 is g(y1,y2) = f(v1(y1,y2), v2(y1,y2))·|J|, where J is the Jacobian determinant det[∂xi/∂yj].
• [Most difficult] The mapping of the supports must be worked out carefully.
Transformation to Independent Variables
• Ex. 4.4-2: X1 and X2 have the joint p.d.f. f(x1,x2) = 2, 0 < x1 < x2 < 1.
• Consider Y1 = X1/X2, Y2 = X2, i.e., X1 = Y1Y2, X2 = Y2, with Jacobian J = det[[y2, y1],[0, 1]] = y2.
• The support maps to 0 < y1 < 1, 0 < y2 < 1, so g(y1,y2) = 2y2 there.
• The marginal p.d.f.s: g1(y1) = ∫0~1 2y2 dy2 = 1, 0 < y1 < 1, and g2(y2) = ∫0~1 2y2 dy1 = 2y2, 0 < y2 < 1.
• ∵ g(y1,y2) = g1(y1)g2(y2) ∴ Y1 and Y2 are independent. (A Monte Carlo sketch follows below.)
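A Monte Carlo sketch of Ex. 4.4-2 (numpy): f(x1,x2) = 2 on 0 < x1 < x2 < 1 is the joint density of the min and max of two independent U(0,1) variables, which gives an easy sampler.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=(1_000_000, 2))
x1, x2 = u.min(axis=1), u.max(axis=1)  # joint density 2 on 0 < x1 < x2 < 1

y1, y2 = x1 / x2, x2

# Y1 should be U(0,1) and Y2 should have density 2*y2; check a product event.
print(np.mean(y1 < 0.5))                 # about 0.5
print(np.mean(y2 < 0.5))                 # about 0.25 (= 0.5**2)
print(np.mean((y1 < 0.5) & (y2 < 0.5)))  # about 0.125 -> independent
```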
Transformation to Dependent Variables
• Ex. 4.4-3: X1 and X2 are independent, each with p.d.f. f(x) = e^(−x), 0 < x < ∞, so their joint p.d.f. is f(x1,x2) = e^(−x1−x2), 0 < x1 < ∞, 0 < x2 < ∞.
• Consider Y1 = X1 − X2, Y2 = X1 + X2, i.e., X1 = (Y1+Y2)/2, X2 = (Y2−Y1)/2, with |J| = 1/2.
• The support maps to −y2 < y1 < y2, 0 < y2 < ∞, so g(y1,y2) = (1/2)e^(−y2) there.
• The marginal p.d.f.s: g1(y1) = (1/2)e^(−|y1|), −∞ < y1 < ∞ (the double exponential p.d.f.), and g2(y2) = y2 e^(−y2), 0 < y2 < ∞.
• ∵ g(y1,y2) ≠ g1(y1)g2(y2) ∴ Y1 and Y2 are dependent.
Beta Distribution
• Ex. 4.4-4: X1 and X2 have independent gamma distributions with parameters (α, θ) and (β, θ). Their joint p.d.f. is f(x1,x2) = x1^(α−1) x2^(β−1) e^(−(x1+x2)/θ) / [Γ(α)Γ(β)θ^(α+β)], 0 < x1, x2 < ∞.
• Consider Y1 = X1/(X1+X2), Y2 = X1+X2, i.e., X1 = Y1Y2, X2 = Y2 − Y1Y2, with |J| = y2.
• Then g(y1,y2) = [y1^(α−1)(1−y1)^(β−1)/B(α,β)] · [y2^(α+β−1) e^(−y2/θ) / (Γ(α+β)θ^(α+β))], 0 < y1 < 1, 0 < y2 < ∞: a beta p.d.f. times a gamma p.d.f.
• ∵ g(y1,y2) = g1(y1)g2(y2) ∴ Y1 and Y2 are independent, with Y1 beta(α,β) and Y2 gamma(α+β,θ). (A sampling check follows below.)
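A scipy check that X1/(X1+X2) is beta(α,β) when X1 and X2 are independent gammas with a common scale (the parameter values are our choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, beta, theta = 2.0, 3.0, 1.5  # illustrative shapes and common scale

x1 = rng.gamma(alpha, theta, size=100_000)
x2 = rng.gamma(beta, theta, size=100_000)
y1 = x1 / (x1 + x2)

# Kolmogorov-Smirnov tests: a tiny statistic / non-significant p-value
# indicates agreement with the claimed distributions.
print(stats.kstest(y1, 'beta', args=(alpha, beta)))
print(stats.kstest(x1 + x2, 'gamma', args=(alpha + beta, 0, theta)))
```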
Box-Muller Transformation
• Ex. 5.3-4: X1 and X2 have independent uniform distributions U(0,1).
• Consider Z1 = √(−2 ln X1)·cos(2πX2), Z2 = √(−2 ln X1)·sin(2πX2).
• Two independent U(0,1) variables ⇒ two independent N(0,1) variables! (An implementation sketch follows below.)
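A direct implementation of the Box-Muller transformation (numpy):

```python
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.uniform(size=500_000)
x2 = rng.uniform(size=500_000)

# Box-Muller: two independent U(0,1) -> two independent N(0,1).
r = np.sqrt(-2.0 * np.log(x1))
z1, z2 = r * np.cos(2 * np.pi * x2), r * np.sin(2 * np.pi * x2)

print(z1.mean(), z1.std())        # about 0, 1
print(np.corrcoef(z1, z2)[0, 1])  # about 0
```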
Distribution Function Technique
• Ex. 5.3-5: Z is N(0,1), U is χ²(r), and Z and U are independent, so the joint p.d.f. of Z and U is the product of their p.d.f.s.
• Working out the distribution function of T = Z/√(U/r) and recognizing the inner integral as a χ²(r+1)-type (gamma) kernel yields Student's t distribution with r degrees of freedom.
Another Example
• Ex. 4.4-5: U: χ²(r1) and V: χ²(r2) are independent, so the joint p.d.f. of U and V is the product of their p.d.f.s; the recognized kernel here is of the χ²(r1+r2) type.
• Knowledge of known distributions and their associated integration relationships is useful for deriving unknown distributions.
Order Statistics
• The order statistics are the observations of the random sample arranged in increasing order of magnitude.
• Assume there are no ties (identical observations).
• Ex. 6.9-1: n = 5 trials yield {0.62, 0.98, 0.31, 0.81, 0.53} from the p.d.f. f(x) = 2x, 0 < x < 1. The order statistics are {0.31, 0.53, 0.62, 0.81, 0.98}.
• The sample median is 0.62, and the sample range is 0.98 − 0.31 = 0.67.
• Ex. 6.9-2: Let Y1 < Y2 < Y3 < Y4 < Y5 be the order statistics of X1, ..., X5, each from the p.d.f. f(x) = 2x, 0 < x < 1.
• Consider P(Y4 < 1/2): at least 4 of the Xi's must be less than 1/2 (4 or 5 "successes"). With success probability p = P(Xi < 1/2) = (1/2)² = 1/4, P(Y4 < 1/2) = C(5,4)(1/4)⁴(3/4) + (1/4)⁵ = 16/4⁵ = 1/64.
General Cases
• The event that the rth order statistic Yr is at most y, {Yr ≤ y}, occurs iff at least r of the n observations are at most y.
• The probability of "success" on each trial is F(y), and we need at least r successes. Thus
• G_r(y) = P(Yr ≤ y) = Σk=r~n C(n,k)[F(y)]^k [1−F(y)]^(n−k), and differentiating gives the p.d.f. g_r(y) = n!/[(r−1)!(n−r)!]·[F(y)]^(r−1) [1−F(y)]^(n−r) f(y). (A computational sketch follows below.)
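A sketch of the order-statistic c.d.f. as a binomial tail sum, verified against Ex. 6.9-2 (plain Python; the function name is ours):

```python
from fractions import Fraction
from math import comb

def order_stat_cdf(r, n, F_y):
    """P(Y_r <= y) = P(at least r of n observations are <= y)."""
    return sum(comb(n, k) * F_y**k * (1 - F_y)**(n - k)
               for k in range(r, n + 1))

# Ex. 6.9-2: f(x) = 2x on (0,1), so F(1/2) = (1/2)**2 = 1/4.
p = Fraction(1, 4)
print(order_stat_cdf(4, 5, p))  # 1/64
```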
Alternative Approach
• A heuristic way to obtain g_r(y): within a short interval Δy around y, r−1 observations fall below y, one falls in (y, y+Δy) (probability ≈ f(y)Δy on a single trial), and n−r fall above y+Δy.
• The multinomial probability with n trials is then approximately g_r(y)Δy ≈ n!/[(r−1)! 1! (n−r)!]·[F(y)]^(r−1) [f(y)Δy] [1−F(y)]^(n−r).
• Ex. 5.9-3: (from Ex. 6.9-2) Y1 < ... < Y5 are the order statistics of X1, ..., X5 from f(x) = 2x, 0 < x < 1, with F(x) = x². For instance, g4(y) = [5!/(3!1!1!)]·(y²)³(2y)(1−y²) = 40y⁷(1−y²), 0 < y < 1.
More Examples
• Ex: 4 independent trials (Y1 < ... < Y4) from the distribution with f(x) = 1, 0 < x < 1. Find the p.d.f. of Y3: g3(y) = [4!/(2!1!1!)]·y²(1−y) = 12y²(1−y), 0 < y < 1.
• Ex: 7 independent trials (Y1 < ... < Y7) from the distribution f(x) = 3(1−x)², 0 < x < 1. Find the probability that the sample median, i.e., Y4, is less than a given point c.
• Method 1: find g4(y), then integrate it from 0 to c.
• Method 2: find p = P(X < c) = F(c), then compute P(at least 4 successes in 7 trials) = Σk=4~7 C(7,k)p^k(1−p)^(7−k), by Table II on p. 647.
Order Statistics of Uniform Distributions
• Thm. 3.5-2: If X has a continuous-type distribution function F(x), then F(X) is U(0,1). Hence W1 < ... < Wn, the order statistics of {F(X1), F(X2), ..., F(Xn)}, are the order statistics of n independent observations from U(0,1).
• The distribution function of U(0,1) is G(w) = w, 0 < w < 1.
• The p.d.f. of the rth order statistic Wr = F(Yr) is g_r(w) = n!/[(r−1)!(n−r)!]·w^(r−1)(1−w)^(n−r), 0 < w < 1: a beta p.d.f. with parameters r and n−r+1, so E(Wr) = r/(n+1).
• ⇒ The Y's partition the support of X into n+1 parts, and thus n+1 areas under f(x) and above the x-axis; each area equals 1/(n+1) on average.
Percentiles
• The (100p)th percentile πp is defined so that the area under f(x) to the left of πp is p.
• Yr is an estimator of πp, where r = (n+1)p.
• If (n+1)p is not an integer, a weighted average of Yr and Yr+1 is used, where r = floor[(n+1)p].
• The sample median is the 50th percentile.
• Ex. 6.9-5: X is the weight of soap; n = 12 observations of X: 1013, 1019, 1021, 1024, 1026, 1028, 1033, 1035, 1039, 1040, 1043, 1047.
• ∵ n = 12, the sample median is (Y6 + Y7)/2 = (1028 + 1033)/2 = 1030.5.
• ∵ (n+1)(0.25) = 3.25, the 25th percentile (first quartile) is Y3 + 0.25(Y4 − Y3) = 1021 + 0.25(3) = 1021.75.
• ∵ (n+1)(0.75) = 9.75, the 75th percentile (third quartile) is Y9 + 0.75(Y10 − Y9) = 1039 + 0.75(1) = 1039.75.
• ∵ (n+1)(0.6) = 7.8, the 60th percentile is Y7 + 0.8(Y8 − Y7) = 1033 + 0.8(2) = 1034.6. (A one-liner check follows below.)
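These interpolated percentiles match numpy's 'weibull' interpolation method (numpy ≥ 1.22), which also interpolates at r = (n+1)p:

```python
import numpy as np

soap = [1013, 1019, 1021, 1024, 1026, 1028,
        1033, 1035, 1039, 1040, 1043, 1047]

# method='weibull' interpolates at r = (n+1)p, matching the slide's rule.
for p in (50, 25, 75, 60):
    print(p, np.percentile(soap, p, method='weibull'))
# 50 -> 1030.5, 25 -> 1021.75, 75 -> 1039.75, 60 -> 1034.6
```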
Another Example
• Ex. 5.6-7: Y1 < Y2 < ... < Y13 are the order statistics of 13 independent trials from a continuous-type distribution with 35th percentile π0.35. Find P(Y3 < π0.35 < Y7).
• The event {Y3 < π0.35 < Y7} happens iff there are at least 3 but fewer than 7 "successes" {Xi < π0.35}, where the success probability is p = 0.35.
• P(Y3 < π0.35 < Y7) = Σk=3~6 C(13,k)(0.35)^k(0.65)^(13−k), which can be read from Table II on pp. 677-681 as P(W ≤ 6) − P(W ≤ 2) for W distributed b(13, 0.35). (A sketch follows below.)
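The same binomial sum with scipy (a minimal sketch):

```python
from scipy.stats import binom

# P(Y3 < pi_0.35 < Y7) = P(3 <= W <= 6) for W ~ b(13, 0.35).
W = binom(13, 0.35)
print(W.cdf(6) - W.cdf(2))  # about 0.7573
```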