2.1 Probability
• Joint & Conditional Probabilities
• Statistical Independence
• Random Variables & Distributions
Consider an experiment with several possible outcomes si
• Sample Space S = {s1, s2, s3, ..., sn} ≡ set of all possible outcomes (possibly an infinite set)
• An Event A ⊆ S consists of some possible outcomes (samples)
   e.g. A = {s2, s3}
• Event Ā ≡ complement of A
   A ∩ Ā = ∅
   A ∪ Ā = S
• Mutually Exclusive Events ≡ A ∩ B = ∅
Statistical probability of A occurring = P(A)
• (1) P(A) ≥ 0
• (2) P(S) = 1
• (3) let Ai, i = 1, 2, ... be a (possibly infinite) number of events
   assume Ai ∩ Aj = ∅ for i ≠ j
   then P(A1 ∪ A2 ∪ ... ∪ An) = Σi P(Ai)
Joint Events and Joint Probabilities
• joint events arise when considering multiple experiments
• consider two experiments with outcomes Ai, Bj
• the sample space S of all outcomes consists of all 2-tuples (Ai, Bj)
Let Ai and Bj be possible outcomes from the 2 experiments
• joint event ≡ events Ai and Bj both occur
• joint outcome ≡ (Ai, Bj)
• joint probability ≡ P(Ai, Bj) such that 0 ≤ P(Ai, Bj) ≤ 1
Mutually Exclusive Joint Events
Let all Ai outcomes, for i = 1 to n, be mutually exclusive
Let all Bj outcomes, for j = 1 to m, be mutually exclusive
then summing the joint probabilities over the outcomes of the other experiment gives:

   P(Ai) = Σj=1..m P(Ai, Bj)                       2.1-1
   P(Bj) = Σi=1..n P(Ai, Bj)                       2.1-2

Let all outcomes from both experiments be mutually exclusive, then

   Σi Σj P(Ai, Bj) = 1                             2.1-3

The case of 2 experiments generalizes to the case of k experiments
e.g. experiment = die toss and coin toss
A1 = probability of rolling a 1 on a die toss
A2 = probability of rolling a 2 or 3 on a die toss
A3 = probability of rolling a 4, 5, or 6 on a die toss
B1 = probability of a tail on a coin toss
P(B1) = P(A1, B1) + P(A2, B1) + P(A3, B1)
since coin tosses are independent of die tosses we have
P(B1) = P(A1)·P(B1) + P(A2)·P(B1) + P(A3)·P(B1)
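The die-and-coin example above can be checked numerically. This is a minimal sketch, using exact rational arithmetic; the partition of the die outcomes and the fair-coin probability follow the text.

```python
from fractions import Fraction

# The events A1, A2, A3 partition the die outcomes,
# and B1 (a tail) is independent of the die toss.
P_A = {1: Fraction(1, 6),   # A1 = roll a 1
       2: Fraction(2, 6),   # A2 = roll a 2 or 3
       3: Fraction(3, 6)}   # A3 = roll a 4, 5, or 6

P_B1 = Fraction(1, 2)       # tail on a fair coin

# Total probability (2.1-2): P(B1) = sum_i P(Ai, B1) = sum_i P(Ai) * P(B1)
total = sum(p * P_B1 for p in P_A.values())
print(total)  # 1/2
```

Because the Ai exhaust the die outcomes, the sum collapses back to P(B1) itself, which is the point of eqs. 2.1-1 and 2.1-2.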
Conditional Probabilities – Joint Events
• consider two events A and B in sample space S
• let P(A,B) = probability of both A and B (A ∩ B) occurring
• P(A|B) = probability of A occurring given B has occurred

   P(A|B) = P(A,B) / P(B)                          2.1-4
   P(A,B) = P(A|B) P(B) = P(B|A) P(A)              2.1-5

• these relationships also apply to a combined experiment, where
   P(A,B) = probability of a joint event occurring
   P(A|B) = probability of A occurring given B has occurred
(1) if A ∩ B = ∅ then

   P(A|B) = 0                                      2.1-6

• event B occurred ⇒ event A could not have occurred
• e.g. if a die toss = 6 it cannot = 1

(2) if A ⊂ B then A ∩ B = A and

   P(A|B) = P(A) / P(B) ≥ P(A)                     2.1-7

• event B occurred ⇒ event A could possibly have occurred
• e.g. if a die toss was = 2 or 4, then it may have been = 2

(3) if B ⊂ A then A ∩ B = B and

   P(A|B) = P(B) / P(B) = 1                        2.1-8

• event B occurred ⇒ event A must have occurred
• e.g. if a 2 was tossed then the event "2 or 4" occurred
Bayes Theorem
• assume n mutually exclusive events Ai, i = 1, 2, ..., n such that
   A1 ∪ A2 ∪ ... ∪ An = S
• assume B is an arbitrary event with P(B) > 0
then

   P(Ai|B) = P(Ai, B) / P(B) = P(B|Ai) P(Ai) / Σj=1..n P(B|Aj) P(Aj)     2.1-9
Bayes Theorem
• used to derive the optimal receiver structure for digital communications
• Ai, i = 1, 2, ..., n represents the n possible transmitted messages
• P(Ai) = fixed (a priori) probability that Ai is sent
• B represents a received Ai corrupted by noise
• P(Ai|B) = probability that Ai was sent based on the received B, computed via 2.1-9
e.g. Conditional Probabilities & Bayes Theorem
[figure: a 40-card deck laid out as 4 suits of 10 values each]
Assume a deck of 40 cards with 4 suits, each with 10 unique values
S = {A1, A2, ..., A10}
Ai = event of picking one specific blue-suited card, so P(Ai) = 1/40
B = event of picking any blue-suited card, so P(B) = 1/4
• Ai ⊂ B
• P(Ai|B) = P(Ai) / P(B) = (1/40)/(1/4) = 1/10
• P(B|Ai) = P(Ai, B) / P(Ai) = P(Ai) / P(Ai) = 1
• by Bayes theorem:
   P(Ai|B) = P(B|Ai) P(Ai) / [P(B|A1) P(A1) + ... + P(B|A10) P(A10)] = 1/10
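The card-deck numbers above can be verified directly. A minimal sketch with exact fractions, following the same event definitions (Ai = one specific blue card, B = any blue card):

```python
from fractions import Fraction

P_Ai = Fraction(1, 40)       # one specific blue card out of 40
P_B = Fraction(10, 40)       # 10 blue cards out of 40 = 1/4

# Direct conditioning: Ai is a subset of B, so P(Ai, B) = P(Ai)
P_Ai_given_B = P_Ai / P_B    # = 1/10
P_B_given_Ai = P_Ai / P_Ai   # = 1 (that card is certainly blue)

# Bayes (2.1-9): the denominator sums P(B|Aj) P(Aj) over the 10 blue
# cards (for non-blue cards P(B|Aj) = 0, contributing nothing)
denom = sum(Fraction(1) * Fraction(1, 40) for _ in range(10))
bayes = (P_B_given_Ai * P_Ai) / denom
print(bayes)  # 1/10
```

Both routes give 1/10, as on the slide: conditioning directly on B and applying Bayes theorem agree.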
Statistical Independence
• consider multiple experiments or repeated trials of a single experiment
• events A & B are statistically independent if the occurrence of A is independent of B:

   P(A|B) = P(A)                                   2.1-10

then by 2.1-5 the joint probability of events A & B both occurring is given by:

   P(A,B) = P(A) P(B)                              2.1-11

more generally, for mutually independent events

   P(A1, A2, ..., An) = P(A1) P(A2) ... P(An)      2.1-12
2.1.1 Random Variables, Probability Distributions, Probability Densities
• Assume an experiment with
   sample space S representing all possible outcomes
   elements s ∈ S
• then the random variable (RV) X(s) is a real function with
   domain S
   range = set of real numbers
• experimental outcomes can be
   discrete (digital) → discrete RV
   continuous (analog) → continuous RV
CDF and PDF
Assume X is a random variable & x = any real number
(1) F(x) = cumulative distribution function (CDF) of X
• consider the event {X ≤ x}
• let P(X ≤ x) ≡ probability of the event occurring

   F(x) = P(X ≤ x) for -∞ < x < ∞                  2.1-13

(2) p(x) = probability density function (PDF) of X
• for a discrete RV, consider the event {X = x}
• let P(X = x) ≡ probability of the event occurring

   p(x) = P(X = x) for -∞ < x < ∞                  2.1-14
• properties of F(x)
   0 ≤ F(x) ≤ 1
   F(-∞) = 0
   F(∞) = 1

   F(x) = ∫-∞..x p(u) du                           2.1-15

   p(x) = dF(x)/dx for -∞ < x < ∞                  2.1-16

[figure: F(x) rising from 0 to 1 with value F(x1) marked at x1, and p(x) with density value p(x1) at x1]
(i) for a discrete or mixed (continuous & discrete) RV
• the PDF contains impulses at the discontinuity points of F(x)
• xi represents the discrete values of the RV
• the discrete part of the CDF can be written as a sum of steps

   F(x) = Σi P(X = xi) u(x - xi)                   2.1-17

   (u(·) = unit step function)
(ii) For a specific range (x1, x2] find P(x1 < X ≤ x2)
• probability RV X falls within (x1, x2]
• consider the event {X ≤ x2} as the union of 2 mutually exclusive events
   (1) {X ≤ x1}
   (2) {x1 < X ≤ x2}
• then
   P(X ≤ x2) = P(X ≤ x1) + P(x1 < X ≤ x2)
   P(x1 < X ≤ x2) = P(X ≤ x2) - P(X ≤ x1)

   P(x1 < X ≤ x2) = F(x2) - F(x1) = ∫x1..x2 p(x) dx     2.1-18
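Eq. 2.1-18 is easy to exercise numerically. A minimal sketch, assuming a standard normal RV (not specified on the slide), whose CDF can be written in closed form via the error function:

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    # CDF of an assumed Gaussian RV, F(x) = 0.5 * (1 + erf((x-m)/(s*sqrt(2))))
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def interval_prob(x1, x2):
    # eq. 2.1-18: P(x1 < X <= x2) = F(x2) - F(x1)
    return normal_cdf(x2) - normal_cdf(x1)

p = interval_prob(-1.0, 1.0)
print(p)  # about 0.6827, the familiar one-sigma probability
```

Any valid CDF could be substituted for `normal_cdf`; the difference-of-CDFs form is what eq. 2.1-18 states.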
2.1.1.1 Multiple RVs with Joint CDFs and Joint PDFs
• are the result of either combined experiments or repeated trials of a single experiment
• consider 2 random variables X1 & X2
the joint CDF is

   F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2) = ∫-∞..x1 ∫-∞..x2 p(u1, u2) du2 du1     2.1-19

the joint PDF is

   p(x1, x2) = ∂²F(x1, x2) / ∂x1 ∂x2                                       2.1-20
For the joint PDF p(x1, x2), the marginal PDFs p(x1) and p(x2) are given by:

   p(x1) = ∫-∞..∞ p(x1, x2) dx2                    2.1-21
   p(x2) = ∫-∞..∞ p(x1, x2) dx1                    2.1-22

and

   F(∞, ∞) = ∫-∞..∞ ∫-∞..∞ p(x1, x2) dx1 dx2 = 1   2.1-23
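The marginalization in eqs. 2.1-21 and 2.1-22 has a direct discrete analogue, where the integral becomes a sum over the other variable. A minimal sketch with an assumed toy joint distribution (the table values are illustrative, not from the slide):

```python
# Assumed joint distribution p(x1, x2) on {0,1} x {0,1}
joint = {(0, 0): 0.1, (0, 1): 0.3,
         (1, 0): 0.2, (1, 1): 0.4}

# Marginals: sum out the other variable (discrete form of 2.1-21 / 2.1-22)
p_x1 = {}
p_x2 = {}
for (x1, x2), p in joint.items():
    p_x1[x1] = p_x1.get(x1, 0.0) + p
    p_x2[x2] = p_x2.get(x2, 0.0) + p

# and, as in 2.1-23, the total probability sums to 1
print(p_x1, p_x2)
```

Each marginal sums to 1, mirroring F(∞, ∞) = 1 in eq. 2.1-23.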
Multidimensional RVs & Joint Distributions
• assume there are RVs given by Xi, i = 1, 2, ..., n
(1) joint PDF given by

   p(x1, ..., xn) = ∂ⁿF(x1, ..., xn) / ∂x1 ... ∂xn     2.1-24

   (for discrete RVs, p(x1, ..., xn) = P(X1 = x1, ..., Xn = xn))

(2) joint CDF given by

   F(x1, ..., xn) = P(X1 ≤ x1, ..., Xn ≤ xn) = ∫-∞..x1 ... ∫-∞..xn p(u1, ..., un) dun ... du1     2.1-25

and
• F(x1, x2, ∞, x4, ..., xn) = F(x1, x2, x4, ..., xn)  (setting an argument to ∞ marginalizes it out)
• F(x1, x2, -∞, x4, ..., xn) = 0
2.1.1.2 Conditional Probability Distribution Functions
Consider RVs X1 & X2 with joint PDF p(x1, x2)
(i) Determine the probability of the event (X1 ≤ x1 | X2 ≤ x2)
• probability of X1 ≤ x1 given that X2 ≤ x2

   P(X1 ≤ x1 | X2 ≤ x2) = P(X1 ≤ x1, X2 ≤ x2) / P(X2 ≤ x2) = F(x1, x2) / F(x2)
(ii) Determine the probability of the event (X1 ≤ x1 | x2 - Δx2 < X2 ≤ x2)
• probability of X1 ≤ x1 conditioned on x2 - Δx2 < X2 ≤ x2
• Δx2 = some positive increment
• from eqns (2.1-4) and (2.1-18)

   P(X1 ≤ x1 | x2 - Δx2 < X2 ≤ x2) = [F(x1, x2) - F(x1, x2 - Δx2)] / [F(x2) - F(x2 - Δx2)]     2.1-26
(iii) Conditional CDF of X1 given X2
• assume the PDFs p(x1, x2) & p(x2) are continuous over (x2 - Δx2, x2)
• divide 2.1-26 by Δx2 & take the limit as Δx2 → 0
the Conditional CDF of X1 given X2 is by definition

   F(x1 | x2) ≡ P(X1 ≤ x1 | X2 = x2) = ∫-∞..x1 p(u1, x2) du1 / p(x2)     2.1-27

and
• F(-∞ | x2) = 0
• F(∞ | x2) = 1
differentiation of 2.1-27 with respect to x1 yields the conditional PDF p(x1 | x2)

   p(x1 | x2) = p(x1, x2) / p(x2)                          2.1-28

expressing the joint PDF p(x1, x2) in terms of the conditional PDFs:

   p(x1, x2) = p(x1 | x2) p(x2) = p(x2 | x1) p(x1)         2.1-29
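The conditioning and re-factoring in eqs. 2.1-28 and 2.1-29 can be illustrated on a discrete joint table (an assumed toy distribution, not from the slide), where division and multiplication replace the continuous densities:

```python
# Assumed joint distribution p(x1, x2) on {0,1} x {0,1}
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# marginal p(x2), then conditionals per 2.1-28: p(x1|x2) = p(x1,x2)/p(x2)
p2 = {x2: sum(p for (a, b), p in joint.items() if b == x2) for x2 in (0, 1)}
cond = {(x1, x2): joint[(x1, x2)] / p2[x2] for (x1, x2) in joint}

# 2.1-29 check: the joint factors back as p(x1|x2) * p(x2) in every cell
ok = all(abs(joint[k] - cond[k] * p2[k[1]]) < 1e-12 for k in joint)
print(ok)  # True
```

The same factorization works in the other order, p(x2 | x1) p(x1), which is the symmetry eq. 2.1-29 states.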
Multidimensional Conditional RVs
Assume there are RVs given by Xi, i = 1, 2, ..., n and an integer k, 1 < k < n
the joint probability of {Xi} i = 1, 2, ..., n is

   p(x1, ..., xn) = p(x1, ..., xk | xk+1, ..., xn) p(xk+1, ..., xn)     2.1-30

the joint conditional PDF is

   p(x1, ..., xk | xk+1, ..., xn) = p(x1, ..., xn) / p(xk+1, ..., xn)   2.1-31

the joint conditional CDF is

   F(x1, ..., xk | xk+1, ..., xn) = ∫-∞..x1 ... ∫-∞..xk p(u1, ..., uk | xk+1, ..., xn) duk ... du1     2.1-32

with the properties
• F(∞, x2, ..., xk | xk+1, ..., xn) = F(x2, ..., xk | xk+1, ..., xn)
• F(-∞, x2, ..., xk | xk+1, ..., xn) = 0
Statistically Independent Random Variables
• assume RVs defined on a sample space S are generated by either
   (i) combined experiments
   (ii) repeated trials of a single experiment
• extend the idea of statistical independence to multiple events on S
• let oi = ith outcome of some experiment
• assume mutually exclusive outcomes: i ≠ j ⇒ oi ∩ oj = ∅
• p(oi) is independent of any p(oj)
• thus the joint probability of the outcomes factors into the product of the probabilities for each outcome
   p(o1, o2, ..., on) = p(o1) p(o2) ... p(on)
multidimensional RVs are statistically independent if & only if

   F(x1, x2, ..., xn) = F(x1) F(x2) ... F(xn)      2.1-33

or alternatively

   p(x1, x2, ..., xn) = p(x1) p(x2) ... p(xn)      2.1-34

RVs corresponding to each oi are independent in the sense that their joint PDFs factor into products of marginal PDFs
2.1.2 Functions of Random Variables
• given RVs X and Y characterized by PDFs pX(x) and pY(y)
• assume Y = g(X), where g(X) is some function of X
(i) determine pY(y) in terms of pX(x) when the mapping X → Y is one-to-one
let Y = aX + b, a > 0 (the function is linear & the mapping is monotonic)

   FY(y) = P(Y ≤ y) = P(aX + b ≤ y) = P(X ≤ (y - b)/a) = FX((y - b)/a)     2.1-35
Differentiate both sides of 2.1-35 with respect to y:

   pY(y) = (1/a) pX((y - b)/a)                     2.1-36

[figure: example pX(x) supported on (-1, 1) mapped to pY(y) supported on (b-a, b+a), amplitude scaled by 1/a]
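Eq. 2.1-36 can be sanity-checked by Monte Carlo. A minimal sketch, assuming a standard normal X (the slide's figure uses a different example density); the histogram density of Y = aX + b at y = b is compared against (1/a) pX(0):

```python
import math
import random

random.seed(0)  # deterministic run

a, b = 2.0, 1.0
# draw samples of Y = a*X + b with X ~ N(0, 1)
ys = [a * random.gauss(0.0, 1.0) + b for _ in range(200_000)]
mean_y = sum(ys) / len(ys)          # should be near b = 1.0

# empirical density of Y in a small bin around y = b
h = 0.1
frac = sum(1 for y in ys if abs(y - b) < h / 2) / len(ys)
pY_at_b = frac / h

# eq. 2.1-36 prediction: pY(b) = (1/a) * pX(0) = 1/(a*sqrt(2*pi))
expected = 1.0 / (a * math.sqrt(2.0 * math.pi))
```

The empirical bin density lands close to the predicted value, confirming the 1/a amplitude scaling seen in the figure.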
(ii) determine pY(y) when the mapping X → Y is not one-to-one
let Y = aX² + b, a > 0; then

   FY(y) = P(Y ≤ y) = P(aX² + b ≤ y) = P(-√((y-b)/a) ≤ X ≤ √((y-b)/a))
         = FX(√((y-b)/a)) - FX(-√((y-b)/a))                                2.1-37

differentiate both sides of 2.1-37 with respect to y to obtain

   pY(y) = [pX(√((y-b)/a)) + pX(-√((y-b)/a))] / (2√(a(y-b))),  y > b       2.1-38
General Case of g(x) = y with real roots x1, x2, ..., xn
the PDF of Y = g(X) is expressed as

   pY(y) = Σi=1..n pX(xi) / |g'(xi)|               2.1-39

where the roots xi, i = 1, 2, ..., n are functions of y and g'(x) = dg(x)/dx
for the example y = g(x) = ax² + b there are 2 real roots (with g'(x) = 2ax):

   x1 = √((y - b)/a),  x2 = -√((y - b)/a)          2.1-40

thus the PDF of Y consists of two terms
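The two-root formula above can also be checked by Monte Carlo. A minimal sketch, assuming a standard normal X and a = 1, b = 0 (so Y is chi-square with 1 degree of freedom); the empirical density at y = 1 is compared with the two-term sum of eq. 2.1-39:

```python
import math
import random

random.seed(1)  # deterministic run

a, b = 1.0, 0.0
# Y = a*X^2 + b with X ~ N(0, 1)
ys = [a * random.gauss(0.0, 1.0) ** 2 + b for _ in range(400_000)]

# empirical density in a small bin around y0
y0, h = 1.0, 0.1
frac = sum(1 for y in ys if abs(y - y0) < h / 2) / len(ys)
est = frac / h

def pX(x):
    # standard normal PDF
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

# eq. 2.1-39 with the two roots x = +/- sqrt((y0-b)/a)
x1 = math.sqrt((y0 - b) / a)
formula = (pX(x1) + pX(-x1)) / (2.0 * math.sqrt(a * (y0 - b)))
```

Both roots contribute equally here because pX is symmetric; for an asymmetric pX the two terms would differ, which is exactly why the sum over roots is needed.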
Multidimensional Functions of RVs
Assume for i = 1, 2, ..., n
• there are RVs Xi with joint PDF given by pX(x1, x2, ..., xn)
• for each RV Yi there exists a function gi(·) such that

   Yi = gi(X1, X2, ..., Xn)                        2.1-41

• gi(·) is a single-valued function with continuous partial derivatives
• and invertible: there exists gi⁻¹(·) such that

   Xi = gi⁻¹(Y1, Y2, ..., Yn)                      2.1-42

• gi⁻¹(·) is single-valued with continuous partial derivatives
• Given pX(x1, x2, ..., xn) determine pY(y1, y2, ..., yn)
• Assume
   X = n-dimensional space of the RVs Xi
   Y = 1-1 mapping of X defined by the functions Yi = gi(X1, X2, ..., Xn)
• then for i = 1, 2, ..., n substitute xi = gi⁻¹(y1, y2, ..., yn)
   for notation, let gi⁻¹ ≡ gi⁻¹(y1, y2, ..., yn)
then the joint CDF is

   FY(y1, ..., yn) = ∫...∫ pX(x1', ..., xn') dx1' ... dxn'     2.1-43

   (integrated over the region of X that the gi(·) map into {Yi ≤ yi, i = 1, ..., n})

where J = Jacobian of the transformation, defined by the n×n determinant of partial derivatives

   J = det[ ∂xi/∂yj ]

the desired joint PDF of the Yi is given by differentiation of 2.1-43:

   pY(y1, y2, ..., yn) = pX(x1 = g1⁻¹, x2 = g2⁻¹, ..., xn = gn⁻¹) |J|     2.1-44
e.g. if there is a linear relation between 2 sets of n-dimensional RVs

   Yi = Σj aij Xj,  i = 1, 2, ..., n   ({aij} are constants)     2.1-45

then using matrix notation Y = AX, and

   Xi = Σj bij Yj,  i = 1, 2, ..., n   ({bij} = elements of A⁻¹)
   X = A⁻¹Y                                                      2.1-46

the Jacobian is J = 1/det(A), so the joint PDF is

   pY(y1, ..., yn) = pX(x1, ..., xn) |x = A⁻¹y / |det(A)|        2.1-47
2.1.3 Statistical Averages of RVs
• Averages are important for characterizing
   outcomes of experiments
   RVs defined on the sample space of experiments
• Averages of specific interest include
   (1) 1st & 2nd moments of a single RV
   (2) joint moments between two RVs in a multidimensional set of RVs
      - correlation
      - covariance
   (3) characteristic function of a single RV
   (4) joint characteristic function of a multidimensional set of RVs
(1) Given a single RV X with PDF p(x)
the 1st moment of X (aka mean or expected value) is given by:

   E[X] = mx = ∫-∞..∞ x p(x) dx                    2.1-48

(i) the nth Moment of X is given by

   E[Xⁿ] = ∫-∞..∞ xⁿ p(x) dx                       2.1-49

for RV Y = g(X), where g(X) is an arbitrary function of X

   E[Y] = E[g(X)] = ∫-∞..∞ g(x) p(x) dx            2.1-50
(ii) nth Central Moment: let Y = (X - mx)ⁿ, where mx = mean of RV X

   E[Y] = E[(X - mx)ⁿ] = ∫-∞..∞ (x - mx)ⁿ p(x) dx     2.1-51

the 2nd central moment of RV X (aka Variance) measures the dispersion of RV X

   σx² = E[(X - mx)²] = E[X²] - E[X]² = E[X²] - mx²   2.1-52

Standard Deviation σx = √(E[X²] - mx²)
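The shortcut σx² = E[X²] - mx² in eq. 2.1-52 can be verified on a concrete discrete RV; a minimal sketch using a fair die and exact fractions:

```python
from fractions import Fraction

faces = range(1, 7)                              # fair die outcomes, each p = 1/6
m = Fraction(sum(faces), 6)                      # E[X]   = 7/2
m2 = Fraction(sum(x * x for x in faces), 6)      # E[X^2] = 91/6
var = m2 - m * m                                 # eq. 2.1-52: E[X^2] - mx^2
print(var)  # 35/12
```

Computing E[(X - 7/2)²] term by term gives the same 35/12, which is the identity eq. 2.1-52 expresses.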
(2) Given the joint PDF p(x1, x2) of 2 RVs X1 & X2
(i) joint moment

   E[X1ᵏ X2ⁿ] = ∫∫ x1ᵏ x2ⁿ p(x1, x2) dx1 dx2                                  2.1-53

(ii) joint central moment

   E[(X1 - m1)ᵏ (X2 - m2)ⁿ] = ∫∫ (x1 - m1)ᵏ (x2 - m2)ⁿ p(x1, x2) dx1 dx2     2.1-54

where mi = E[Xi]
• correlation & covariance are the joint moments for k = n = 1, used for pairs of RVs
• assume RVs Xi, i = 1, 2, ..., n with joint PDF p(x1, ..., xn)
• let p(xi, xj) be the joint PDF of Xi and Xj
(iii) correlation of X1 & X2 = joint moment with k = n = 1
   • indicates whether the 2 RVs vary in a similar manner

   E[X1 X2] = ∫∫ x1 x2 p(x1, x2) dx1 dx2                               2.1-55

(iv) covariance of X1 & X2 = joint central moment with k = n = 1
   • mean-removed indication of whether the 2 RVs vary in a similar manner

   E[(X1 - m1)(X2 - m2)] = ∫∫ (x1 - m1)(x2 - m2) p(x1, x2) dx1 dx2     2.1-56
more generally for RVs {Xi} i = 1..n
(i) correlation of Xi & Xj

   E[Xi Xj] = ∫∫ xi xj p(xi, xj) dxi dxj

(ii) covariance of Xi & Xj ≡ uij

   uij = E[(Xi - mi)(Xj - mj)] = ∫∫ (xi - mi)(xj - mj) p(xi, xj) dxi dxj     2.1-57
       = E[Xi Xj] - mi mj                                                     2.1-58

(iii) the covariance matrix of {Xi} is the n×n matrix with elements uij
Given any two RVs Xi & Xj
• Xi & Xj are uncorrelated if

   E[Xi Xj] = E[Xi] E[Xj] = mi mj                  2.1-59

   then the covariance uij = E[Xi Xj] - mi mj = 0
• statistical independence of Xi & Xj implies they are uncorrelated
• uncorrelated Xi & Xj are not necessarily statistically independent
• Xi & Xj are orthogonal if

   E[Xi Xj] = 0                                    2.1-60

• Xi & Xj are orthogonal when
   (i) they are uncorrelated, and
   (ii) either or both have zero mean
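The claim that uncorrelated does not imply independent deserves a concrete counterexample. A minimal sketch (a textbook-style example, not from the slide): X uniform on {-1, 0, 1} and Y = X², so Y is completely determined by X yet their covariance is zero:

```python
# X uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X
support = [-1, 0, 1]
E_X = sum(support) / 3                      # 0
E_Y = sum(x * x for x in support) / 3       # 2/3
E_XY = sum(x * x * x for x in support) / 3  # E[X*Y] = E[X^3] = 0

cov = E_XY - E_X * E_Y                      # 0  -> uncorrelated (2.1-59 holds)
# but dependent: P(Y = 1 | X = 1) = 1 while P(Y = 1) = 2/3
print(cov)  # 0.0
```

Note X and Y here are also orthogonal (E[XY] = 0), consistent with the zero-mean condition in the bullet above, since E[X] = 0.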
Characteristic Functions
• given RV X with PDF p(x), the characteristic function ψ(jv) is defined as the statistical average of exp(jvX)

   ψ(jv) ≡ E[exp(jvX)] = ∫-∞..∞ p(x) exp(jvx) dx   2.1-61

   • v = a real variable
   • j = √-1

• ψ(jv) can be described as the Fourier Transform of p(x) (with the sign of the exponent reversed)
• the inverse transform is given by

   p(x) = (1/2π) ∫-∞..∞ ψ(jv) exp(-jvx) dv         2.1-62
(1) Moments of a RV can be determined from the characteristic function
the 1st derivative of ψ(jv) with respect to v gives

   dψ(jv)/dv = j ∫-∞..∞ x p(x) exp(jvx) dx         2.1-63

• evaluated at v = 0 this yields the 1st moment (mean)

   E[X] = mx = -j dψ(jv)/dv |v=0                   2.1-64

the nth derivative of ψ(jv) evaluated at v = 0 yields the nth moment

   E[Xⁿ] = (-j)ⁿ dⁿψ(jv)/dvⁿ |v=0                  2.1-65
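Eq. 2.1-64 can be demonstrated numerically: build ψ(jv) for a fair die as a finite sum of complex exponentials, approximate the derivative at v = 0 by a central difference, and recover the mean 3.5. A minimal sketch:

```python
import cmath

def psi(v):
    # characteristic function of a fair die: E[exp(jvX)], X in {1..6}
    return sum(cmath.exp(1j * v * x) for x in range(1, 7)) / 6

# central-difference approximation of d(psi)/dv at v = 0
h = 1e-6
dpsi = (psi(h) - psi(-h)) / (2 * h)

# eq. 2.1-64: E[X] = -j * dpsi/dv at v = 0
mean = (-1j * dpsi).real
print(mean)  # close to 3.5
```

The same central-difference trick applied twice (or eq. 2.1-65 with n = 2) would recover E[X²] = 91/6.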
(2) The moments of a RV can be related to the characteristic function if ψ(jv) can be expanded in a Taylor series about the point v = 0:

   ψ(jv) = Σn=0..∞ [dⁿψ(jv)/dvⁿ |v=0] vⁿ/n!       2.1-66

by substitution of 2.1-65, ψ(jv) is related to the moments of X:

   ψ(jv) = Σn=0..∞ E[Xⁿ] (jv)ⁿ/n!                 2.1-67
(3) Given a set of statistically independent RVs {Xi} i = 1..n with joint PDF p(x1, x2, ..., xn)
• find p(y), the PDF of Y, using ψ(jv) & the inverse Fourier Transform
let RV Y be given as:

   Y = Σi=1..n Xi

then

   ψY(jv) = E[exp(jvY)] = E[exp(jv Σi=1..n Xi)]    2.1-68
for statistically independent RVs p(x1, ..., xn) = p(x1) p(x2) ... p(xn)
• thus the nth order integral reduces to a product of n single integrals
• each integral i corresponds to the characteristic function of Xi

   ψXi(jv) = ∫-∞..∞ p(xi) exp(jvxi) dxi            2.1-69

thus

   ψY(jv) = Πi=1..n ψXi(jv)                         2.1-70
the PDF of Y is determined from the inverse Fourier Transform of ψY(jv):

   p(y) = (1/2π) ∫-∞..∞ ψY(jv) exp(-jvy) dv        2.1-71

if the Xi's have identical distributions (iid), all ψXi(jv) = ψX(jv), so

   ψY(jv) = [ψX(jv)]ⁿ                              2.1-72
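The iid-sum result has a familiar discrete instance: for Y = X1 + X2 with two fair dice, ψY(jv) = ψX(jv)², equivalently p(y) is the convolution of the two marginal PMFs. A minimal sketch building p(y) by direct enumeration:

```python
from fractions import Fraction
from itertools import product

# p(y) for the sum of two iid fair dice, built by enumerating all 36
# equally likely outcome pairs (the discrete convolution of the PMFs)
p_y = {}
for a, b in product(range(1, 7), repeat=2):
    p_y[a + b] = p_y.get(a + b, Fraction(0)) + Fraction(1, 36)

print(p_y[7])  # 1/6, the most likely sum
```

Working in the characteristic-function domain, one would instead square ψX(jv) and invert; for continuous RVs that route is usually the tractable one, since the n-fold convolution of densities is hard to carry out directly.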
• For statistically independent RVs {Xi} i = 1..n and Y = Σi Xi
   ψY(jv) = product of the individual characteristic functions
   p(y) = n-fold convolution of the p(xi), usually difficult to solve directly

(4) Characteristic Function for Joint RVs
For n-dimensional RVs {Xi} i = 1..n with joint PDF p(x1, ..., xn), the n-dimensional characteristic function is defined as

   ψ(jv1, jv2, ..., jvn) = E[exp(j Σi=1..n vi Xi)]     2.1-73
e.g. the characteristic function for n = 2 is

   ψ(jv1, jv2) = E[exp(j(v1X1 + v2X2))]            2.1-74

• use partial derivatives of ψ(jv1, jv2) with respect to v1 & v2 to generate the joint moments
• the correlation of X1 & X2 is given by

   E[X1 X2] = - ∂²ψ(jv1, jv2)/∂v1∂v2 |v1=v2=0      2.1-75

• higher order moments are generated in a similar manner