Brief Review: Probability and Statistics
Probability distributions: Continuous distributions
Defn (density function) Let x denote a continuous random variable; then f(x) is called the density function of x if
1) $f(x) \ge 0$
2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
3) $P[a \le x \le b] = \int_{a}^{b} f(x)\,dx$
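As a minimal numerical sketch of these conditions (assuming numpy and scipy are available; the standard normal density is chosen purely for illustration):

```python
# Numerically checking the density-function conditions for the
# standard normal density (numpy/scipy assumed available).
import numpy as np
from scipy.integrate import quad

def f(x):
    # Standard normal density: f(x) = (1/sqrt(2*pi)) * exp(-x^2 / 2)
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# 1) f(x) >= 0 on a grid of test points
assert all(f(x) >= 0 for x in np.linspace(-10, 10, 1001))

# 2) the density integrates to 1 over the whole real line
total, _ = quad(f, -np.inf, np.inf)
print(total)  # ~1.0

# 3) P[a <= x <= b] is the integral of f over [a, b]
p, _ = quad(f, -1.96, 1.96)
print(p)      # ~0.95
```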
Defn (Joint density function) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables; then f(x) = f(x1, x2, ..., xn) is called the joint density function of x = (x1, x2, ..., xn) if
1) $f(\mathbf{x}) \ge 0$
2) $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1$
3) $P[\mathbf{x} \in A] = \int \cdots \int_A f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n$
Defn (Marginal density function) The marginal density of $\mathbf{x}_1 = (x_1, \ldots, x_p)$ (p < n) is defined by:
$$f_1(\mathbf{x}_1) = \int f(\mathbf{x})\,d\mathbf{x}_2 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_{p+1} \cdots dx_n,$$
where $\mathbf{x}_2 = (x_{p+1}, \ldots, x_n)$. The marginal density of $\mathbf{x}_2 = (x_{p+1}, \ldots, x_n)$ is defined by:
$$f_2(\mathbf{x}_2) = \int f(\mathbf{x})\,d\mathbf{x}_1 = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_p,$$
where $\mathbf{x}_1 = (x_1, \ldots, x_p)$.
Defn (Conditional density function) The conditional density of $\mathbf{x}_1$ given $\mathbf{x}_2$ (defined on the previous slide, p < n) is defined by:
$$f_{1|2}(\mathbf{x}_1 \,|\, \mathbf{x}_2) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_2(\mathbf{x}_2)},$$
and the conditional density of $\mathbf{x}_2$ given $\mathbf{x}_1$ is defined by:
$$f_{2|1}(\mathbf{x}_2 \,|\, \mathbf{x}_1) = \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_1(\mathbf{x}_1)}.$$
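A sketch of both definitions in code (assuming numpy and scipy; the standard bivariate normal with correlation ρ = 0.5 is an assumed example, not from the slides):

```python
# Computing a marginal density by integrating out x2, and a conditional
# density as the ratio of joint to marginal.
import numpy as np
from scipy.integrate import quad

rho = 0.5

def joint(x1, x2):
    # Standard bivariate normal density with correlation rho
    z = (x1**2 - 2 * rho * x1 * x2 + x2**2) / (1 - rho**2)
    return np.exp(-z / 2) / (2 * np.pi * np.sqrt(1 - rho**2))

def marginal_x1(x1):
    # f1(x1) = integral of f(x1, x2) over x2
    val, _ = quad(lambda x2: joint(x1, x2), -np.inf, np.inf)
    return val

def conditional_x2_given_x1(x2, x1):
    # f_{2|1}(x2 | x1) = f(x1, x2) / f1(x1)
    return joint(x1, x2) / marginal_x1(x1)

print(marginal_x1(0.7))                      # matches the N(0,1) density at 0.7
print(conditional_x2_given_x1(0.2, x1=0.7))  # N(rho*0.7, 1-rho^2) density at 0.2
```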
Marginal densities describe how the subvector xi behaves ignoring xj Conditional densities describe how the subvector xi behaves when the subvector xj is held fixed
Defn (Independence) The two sub-vectors $\mathbf{x}_1$ and $\mathbf{x}_2$ are called independent if:
$$f(\mathbf{x}) = f(\mathbf{x}_1, \mathbf{x}_2) = f_1(\mathbf{x}_1)\,f_2(\mathbf{x}_2) = \text{product of marginals},$$
or equivalently, the conditional density of $\mathbf{x}_i$ given $\mathbf{x}_j$ equals the marginal density of $\mathbf{x}_i$: $f_{i|j}(\mathbf{x}_i \,|\, \mathbf{x}_j) = f_i(\mathbf{x}_i)$.
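A quick numerical illustration of the factorization criterion (assuming numpy; the uncorrelated bivariate normal, where the components are in fact independent, is an assumed example):

```python
# With rho = 0 the bivariate normal joint density factors into the
# product of its standard normal marginals.
import numpy as np

def phi(x):
    # Standard normal (marginal) density
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def joint_indep(x1, x2):
    # Bivariate normal joint density with rho = 0
    return np.exp(-(x1**2 + x2**2) / 2) / (2 * np.pi)

x1, x2 = 0.3, -1.1
print(np.isclose(joint_indep(x1, x2), phi(x1) * phi(x2)))  # True
```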
Example (p-variate Normal) The random vector x (p × 1) is said to have the p-variate Normal distribution with mean vector μ (p × 1) and covariance matrix Σ (p × p), written x ~ Np(μ, Σ), if:
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\!\left[-\tfrac{1}{2}\,(\mathbf{x} - \boldsymbol{\mu})'\,\Sigma^{-1}\,(\mathbf{x} - \boldsymbol{\mu})\right]$$
Example (bivariate Normal) The random vector $\mathbf{x} = (x_1, x_2)'$ is said to have the bivariate Normal distribution with mean vector
$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$$
and covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
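A hedged sketch of working with this distribution in code (assuming scipy is available; the mean vector and covariance matrix values are chosen purely for illustration):

```python
# Evaluating and sampling a bivariate normal density with scipy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])              # mean vector (mu1, mu2)
Sigma = np.array([[1.0, 0.6],          # variances on the diagonal,
                  [0.6, 2.0]])         # covariance off the diagonal

rv = multivariate_normal(mean=mu, cov=Sigma)
print(rv.pdf([1.0, 2.0]))   # density at the mean
print(rv.rvs(size=5))       # five random draws from N2(mu, Sigma)
```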
Theorem (Transformations) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables with joint density function f(x1, ..., xn) = f(x). Let
$$y_1 = \phi_1(x_1, \ldots, x_n),\quad y_2 = \phi_2(x_1, \ldots, x_n),\quad \ldots,\quad y_n = \phi_n(x_1, \ldots, x_n)$$
define a 1-1 transformation of x into y.
Then the joint density of y is g(y) given by g(y) = f(x)|J|, where
$$J = \det\begin{bmatrix} \dfrac{\partial x_1}{\partial y_1} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{bmatrix}$$
= the Jacobian of the transformation (x written as a function of y).
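A one-dimensional sketch of the theorem (assuming scipy; the transformation y = exp(x) with x standard normal is an assumed example, and the result can be checked against scipy's lognormal density):

```python
# For y = exp(x) with x ~ N(0,1): the inverse is x = ln(y), the Jacobian
# is dx/dy = 1/y, so g(y) = f(ln y) * (1/y), i.e. the lognormal density.
import numpy as np
from scipy.stats import norm, lognorm

def g(y):
    # g(y) = f(x(y)) * |J|, with J = dx/dy = 1/y
    return norm.pdf(np.log(y)) / y

y = 1.7
print(g(y))                   # density from the transformation theorem
print(lognorm.pdf(y, s=1.0))  # scipy's lognormal density agrees
```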
Corollary (Linear Transformations) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables with joint density function f(x1, ..., xn) = f(x). Let
$$y_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n,\quad \ldots,\quad y_n = a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n,$$
i.e. y = Ax with A = (aij) nonsingular, define a 1-1 transformation of x into y. Then the joint density of y is
$$g(\mathbf{y}) = \frac{1}{|\det A|}\, f(A^{-1}\mathbf{y}).$$
Corollary (Linear Transformations for Normal Random Variables) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables having an n-variate Normal distribution with mean vector μ and covariance matrix Σ, i.e. x ~ Nn(μ, Σ). Let y = Ax, as above, define a 1-1 transformation of x into y. Then
$$\mathbf{y} = (y_1, y_2, \ldots, y_n)' \sim N_n(A\boldsymbol{\mu},\, A\Sigma A').$$
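A Monte Carlo sketch of this corollary (assuming numpy; μ, Σ, and A are arbitrary illustrative values, and the empirical moments should match the theoretical ones up to Monte Carlo error):

```python
# If x ~ Nn(mu, Sigma) and y = A x, then y ~ Nn(A mu, A Sigma A').
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, 1.0]])

x = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are draws of x
y = x @ A.T                                           # y = A x for each draw

print(np.allclose(y.mean(axis=0), A @ mu, atol=0.02))        # mean: A mu
print(np.allclose(np.cov(y.T), A @ Sigma @ A.T, atol=0.05))  # cov: A Sigma A'
```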
Defn (Expectation) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, ..., xn). Let U = h(x) = h(x1, ..., xn). Then
$$E[U] = E[h(\mathbf{x})] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$
Defn (Conditional Expectation) Let $\mathbf{x} = (x_1, \ldots, x_n) = (\mathbf{x}_1, \mathbf{x}_2)$ denote a vector of continuous random variables with joint density function $f(\mathbf{x}) = f(\mathbf{x}_1, \mathbf{x}_2)$. Let $U = h(\mathbf{x}_1) = h(x_1, \ldots, x_p)$. Then the conditional expectation of U given $\mathbf{x}_2$ is
$$E[U \,|\, \mathbf{x}_2] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x_1, \ldots, x_p)\, f_{1|2}(\mathbf{x}_1 \,|\, \mathbf{x}_2)\, dx_1 \cdots dx_p.$$
Defn (Variance) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, ..., xn). Let U = h(x) = h(x1, ..., xn). Then
$$\operatorname{Var}[U] = E\!\left[(U - E[U])^2\right] = E[U^2] - (E[U])^2.$$
Defn (Conditional Variance) Let $\mathbf{x} = (x_1, \ldots, x_n) = (\mathbf{x}_1, \mathbf{x}_2)$ denote a vector of continuous random variables with joint density function $f(\mathbf{x}) = f(\mathbf{x}_1, \mathbf{x}_2)$. Let $U = h(\mathbf{x}_1) = h(x_1, \ldots, x_p)$. Then the conditional variance of U given $\mathbf{x}_2$ is
$$\operatorname{Var}[U \,|\, \mathbf{x}_2] = E\!\left[(U - E[U \,|\, \mathbf{x}_2])^2 \,\middle|\, \mathbf{x}_2\right].$$
Defn (Covariance, Correlation) Let x = (x1, x2, ..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, ..., xn). Let U = h(x) = h(x1, ..., xn) and V = g(x) = g(x1, ..., xn). Then the covariance of U and V is
$$\operatorname{Cov}[U, V] = E\!\left[(U - E[U])(V - E[V])\right],$$
and the correlation of U and V is
$$\rho_{UV} = \frac{\operatorname{Cov}[U, V]}{\sqrt{\operatorname{Var}[U]\,\operatorname{Var}[V]}}.$$
Properties: Expectation, Variance, Covariance, Correlation
$$E[a_1x_1 + a_2x_2 + \cdots + a_nx_n] = a_1E[x_1] + a_2E[x_2] + \cdots + a_nE[x_n], \quad \text{or} \quad E[\mathbf{a}'\mathbf{x}] = \mathbf{a}'E[\mathbf{x}].$$
If $\mathbf{x}_1$ and $\mathbf{x}_2$ are independent, then $E[UV] = E[h(\mathbf{x}_1)g(\mathbf{x}_2)] = E[U]\,E[V] = E[h(\mathbf{x}_1)]\,E[g(\mathbf{x}_2)]$.
$$\operatorname{Var}[a_1x_1 + a_2x_2 + \cdots + a_nx_n] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \operatorname{Cov}[x_i, x_j], \quad \text{or} \quad \operatorname{Var}[\mathbf{a}'\mathbf{x}] = \mathbf{a}'\Sigma\mathbf{a}.$$
$$\operatorname{Cov}[a_1x_1 + \cdots + a_nx_n,\; b_1x_1 + \cdots + b_nx_n] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i b_j \operatorname{Cov}[x_i, x_j], \quad \text{or} \quad \operatorname{Cov}[\mathbf{a}'\mathbf{x}, \mathbf{b}'\mathbf{x}] = \mathbf{a}'\Sigma\mathbf{b}.$$
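A Monte Carlo sketch of the three linear-combination properties above (assuming numpy; μ, Σ, a, and b are arbitrary illustrative values, and agreement is up to simulation error):

```python
# Checking E[a'x] = a'E[x], Var[a'x] = a' Sigma a, Cov[a'x, b'x] = a' Sigma b.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
a = np.array([1.0, -2.0])
b = np.array([3.0, 1.0])

x = rng.multivariate_normal(mu, Sigma, size=500_000)
u = x @ a          # u = a'x for each draw
v = x @ b          # v = b'x for each draw

print(u.mean(), a @ mu)                    # E[a'x]    vs  a' mu
print(u.var(), a @ Sigma @ a)              # Var[a'x]  vs  a' Sigma a
print(np.cov(u, v)[0, 1], a @ Sigma @ b)   # Cov[a'x, b'x]  vs  a' Sigma b
```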
Statistical Inference: Making decisions from data
There are two main areas of Statistical Inference:
• Estimation: deciding on the value of a parameter
• Point estimation
• Confidence interval and confidence region estimation
• Hypothesis testing: deciding whether a statement (hypothesis) about a parameter is true or false
Defn (The Classical Statistical Model) The data vector: x = (x1, x2, ..., xn). The model: let f(x|θ) = f(x1, ..., xn | θ1, θ2, ..., θp) denote the joint density of the data vector x = (x1, ..., xn) of observations, where the unknown parameter vector θ ∈ Ω (a subset of p-dimensional space).
An Example The data vector x = (x1, x2, ..., xn) is a sample from the normal distribution with mean μ and variance σ². The model: f(x|μ, σ²) = f(x1, ..., xn | μ, σ²), the joint density of x = (x1, ..., xn), takes on the form
$$f(\mathbf{x} \,|\, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right],$$
where the unknown parameter vector is θ = (μ, σ²) ∈ Ω = {(x, y) | -∞ < x < ∞, 0 ≤ y < ∞}.
Defn (Sufficient Statistics) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), ..., Sk(x)) is called a set of sufficient statistics for the parameter vector θ if the conditional distribution of x given S = (S1(x), S2(x), ..., Sk(x)) is not functionally dependent on the parameter vector θ. A set of sufficient statistics contains all of the information concerning the unknown parameter vector.
A Simple Example illustrating Sufficiency Suppose that we observe a Success-Failure experiment n = 3 times. Let θ denote the probability of Success. The data collected is x1, x2, x3, where xi takes on the value 1 if the ith trial is a Success and 0 if the ith trial is a Failure.
The following table gives the possible values of (x1, x2, x3), with S = x1 + x2 + x3:

(x1, x2, x3)   S   f(x1, x2, x3 | θ)   f(x1, x2, x3 | S)
(0, 0, 0)      0   (1-θ)³              1
(1, 0, 0)      1   θ(1-θ)²             1/3
(0, 1, 0)      1   θ(1-θ)²             1/3
(0, 0, 1)      1   θ(1-θ)²             1/3
(1, 1, 0)      2   θ²(1-θ)             1/3
(1, 0, 1)      2   θ²(1-θ)             1/3
(0, 1, 1)      2   θ²(1-θ)             1/3
(1, 1, 1)      3   θ³                  1

Here g(S|θ) = C(3, S) θ^S (1-θ)^(3-S). The data can be generated in two equivalent ways:
• Generating (x1, x2, x3) directly from f(x1, x2, x3 | θ), or
• Generating S from g(S | θ), then generating (x1, x2, x3) from f(x1, x2, x3 | S).
Since the second step does not involve θ, no additional information about θ will be obtained by knowing (x1, x2, x3) once S is determined, as the sketch below illustrates.
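A minimal simulation of the two generation schemes (assuming numpy; θ = 0.3 is an arbitrary illustrative value). Both schemes produce the same distribution over outcomes, even though the second draws (x1, x2, x3) without using θ once S is known:

```python
# Scheme 1: draw (x1, x2, x3) directly.  Scheme 2: draw S ~ Binomial(3, theta),
# then pick one of the arrangements with that total uniformly at random.
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
theta = 0.3

def direct():
    # Generate (x1, x2, x3) directly from f(x1, x2, x3 | theta)
    return tuple(rng.binomial(1, theta, size=3))

def two_stage():
    # Step 1: generate S from g(S | theta) = Binomial(3, theta)
    s = rng.binomial(3, theta)
    # Step 2: generate (x1, x2, x3) from f(x1, x2, x3 | S); this step does
    # NOT involve theta -- all arrangements with sum s are equally likely
    arrangements = [x for x in product([0, 1], repeat=3) if sum(x) == s]
    return arrangements[rng.integers(len(arrangements))]

# Empirical outcome frequencies agree across the two schemes
for gen in (direct, two_stage):
    draws = [gen() for _ in range(100_000)]
    freqs = {x: round(draws.count(x) / len(draws), 3) for x in sorted(set(draws))}
    print(gen.__name__, freqs)
```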
The Sufficiency Principle Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x), ..., Sk(x) and not otherwise on the value of x.
A useful approach in developing a statistical procedure:
• Find sufficient statistics
• Develop estimators, tests of hypotheses, etc. using only these statistics
Defn (Minimal Sufficient Statistics) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), ..., Sk(x)) is a set of Minimal Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), ..., Sk(x)) is a set of Sufficient statistics and can be calculated from any other set of Sufficient statistics.
Theorem (The Factorization Criterion) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), ..., Sk(x)) is a set of Sufficient statistics for the parameter vector θ if and only if
$$f(\mathbf{x} \,|\, \boldsymbol{\theta}) = h(\mathbf{x})\, g(S, \boldsymbol{\theta}) = h(\mathbf{x})\, g(S_1(\mathbf{x}), \ldots, S_k(\mathbf{x}), \boldsymbol{\theta}).$$
This is useful for finding Sufficient statistics: if you can factor out the θ-dependence with a set of statistics, then these statistics are a set of Sufficient statistics.
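As a worked illustration (a standard example under the normal model used earlier, not taken verbatim from the slides): expanding the exponent of the normal sample density gives
$$f(\mathbf{x} \,|\, \mu, \sigma^2) = \underbrace{1}_{h(\mathbf{x})} \cdot \underbrace{\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^{n} x_i - \frac{n\mu^2}{2\sigma^2}\right]}_{g\left(\sum_i x_i,\; \sum_i x_i^2,\; \mu,\; \sigma^2\right)},$$
so the θ-dependence is factored out through $\left(\sum_i x_i, \sum_i x_i^2\right)$, which is therefore a set of Sufficient statistics for θ = (μ, σ²).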
Defn (Completeness) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), ..., Sk(x)) is a set of Complete Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), ..., Sk(x)) is a set of Sufficient statistics and whenever
$$E[\phi(S_1(\mathbf{x}), \ldots, S_k(\mathbf{x}))] = 0 \ \text{for all}\ \boldsymbol{\theta} \in \Omega, \quad \text{then} \quad P[\phi(S_1(\mathbf{x}), \ldots, S_k(\mathbf{x})) = 0] = 1.$$
Defn (The Exponential Family) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then f(x|θ) is said to be a member of the exponential family of distributions if:
$$f(\mathbf{x} \,|\, \boldsymbol{\theta}) = g(\boldsymbol{\theta})\, h(\mathbf{x}) \exp\!\left[\sum_{i=1}^{k} p_i(\boldsymbol{\theta})\, S_i(\mathbf{x})\right], \quad a_i < x_i < b_i,\ \boldsymbol{\theta} \in \Omega,$$
where
1) -∞ ≤ ai < bi ≤ ∞ are not dependent on θ.
2) Ω contains a nondegenerate k-dimensional rectangle.
3) g(θ), ai, bi and pi(θ) are not dependent on x.
4) h(x), ai, bi and Si(x) are not dependent on θ.
If in addition:
5) The Si(x) are functionally independent for i = 1, 2, ..., k.
6) ∂Si(x)/∂xj exists and is continuous for all i = 1, 2, ..., k and j = 1, 2, ..., n.
7) pi(θ) is a continuous function of θ for all i = 1, 2, ..., k.
8) R = {(p1(θ), p2(θ), ..., pk(θ)) | θ ∈ Ω} contains a nondegenerate k-dimensional rectangle.
Then the set of statistics S1(x), S2(x), ..., Sk(x) forms a Minimal Complete set of Sufficient statistics.
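For instance (a standard worked example under the normal model assumed earlier), the normal sample density fits this form with k = 2:
$$f(\mathbf{x} \,|\, \mu, \sigma^2) = \underbrace{\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-n\mu^2/(2\sigma^2)}}_{g(\boldsymbol{\theta})} \cdot \underbrace{1}_{h(\mathbf{x})} \cdot \exp\!\left[\underbrace{\frac{\mu}{\sigma^2}}_{p_1(\boldsymbol{\theta})} \underbrace{\sum_i x_i}_{S_1(\mathbf{x})} + \underbrace{\left(-\frac{1}{2\sigma^2}\right)}_{p_2(\boldsymbol{\theta})} \underbrace{\sum_i x_i^2}_{S_2(\mathbf{x})}\right],$$
so $S_1 = \sum_i x_i$ and $S_2 = \sum_i x_i^2$ form a Minimal Complete set of Sufficient statistics for (μ, σ²).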
Defn (The Likelihood function) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then for a given value of the observation vector x, the Likelihood function, Lx(θ), is defined by: Lx(θ) = f(x|θ) with θ ∈ Ω. The log-likelihood function lx(θ) is defined by: lx(θ) = ln Lx(θ) = ln f(x|θ) with θ ∈ Ω.
The Likelihood Principle Any decision regarding the parameter θ should be based on the likelihood function Lx(θ) and not otherwise on the value of x. If two data sets result in the same likelihood function, the decision regarding θ should be the same.
Some statisticians find it useful to plot the likelihood function Lx(θ) given the value of x. It summarizes the information contained in x regarding the parameter vector θ.
An Example The data vector x = (x1, x2, ..., xn) is a sample from the normal distribution with mean μ and variance σ². The joint distribution of x: f(x|μ, σ²) = f(x1, ..., xn | μ, σ²), the joint density of x = (x1, ..., xn), takes on the form
$$L_{\mathbf{x}}(\mu, \sigma^2) = f(\mathbf{x} \,|\, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\!\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right],$$
where the unknown parameter vector is θ = (μ, σ²) ∈ Ω = {(x, y) | -∞ < x < ∞, 0 ≤ y < ∞}.
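A sketch of evaluating and plotting this log-likelihood (assuming numpy and matplotlib; the sample is simulated, σ² is held fixed at an assumed value so the plot is one-dimensional in μ, and all numbers are illustrative):

```python
# Plotting l_x(mu, sigma^2) over a grid of mu values for a simulated
# normal sample; the curve peaks at the sample mean.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=25)   # a sample with mu = 5, sigma = 2

def log_likelihood(mu, sigma2):
    # l_x(mu, sigma^2) = -(n/2) ln(2 pi sigma^2) - sum((x_i - mu)^2) / (2 sigma^2)
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu)**2) / (2 * sigma2)

mus = np.linspace(3.0, 7.0, 400)
ll = [log_likelihood(m, sigma2=4.0) for m in mus]

plt.plot(mus, ll)
plt.xlabel("mu")
plt.ylabel("log-likelihood")
plt.axvline(x.mean(), linestyle="--")   # maximized at the sample mean
plt.show()
```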