Basic Results in Probability and Statistics

KNNL – Appendix A

A.1 Summation and Product Operators

A.2 Probability

A.3 Random Variables (Univariate)

A.3 Random Variables (Bivariate)

A.3 Covariance, Correlation, Independence

Linear Functions of RVs

Central Limit Theorem
• When random samples of size $n$ are selected from any population with mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of the sample mean $\bar{Y}$ is approximately normal for large $n$: $\bar{Y} \approx N(\mu, \sigma^2/n)$
• Consequently, the Z-table can be used to approximate probabilities for ranges of sample-mean values, as well as percentiles of its sampling distribution
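The CLT statement can be checked by simulation. The following is a minimal sketch, assuming Python 3.8+ (standard library only) and a hypothetical skewed parent population: exponential with rate 1, so $\mu = 1$ and $\sigma^2 = 1$. The sample size, number of replications, and seed are arbitrary illustrative choices.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=1):
    """Return `reps` sample means, each computed from n values produced by draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Hypothetical skewed population: exponential with rate 1 (mu = 1, sigma^2 = 1).
mu, sigma = 1.0, 1.0
n = 50
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=5000)

# CLT approximation: Ybar is roughly N(mu, sigma^2 / n).
approx = statistics.NormalDist(mu, sigma / n ** 0.5)
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3), round(approx.stdev, 3))

# Compare an empirical upper-tail probability with the normal (Z-table) approximation.
cutoff = mu + 1.645 * sigma / n ** 0.5
empirical = sum(m > cutoff for m in means) / len(means)
print(round(empirical, 3), round(1 - approx.cdf(cutoff), 3))
```

With a skewed parent population the agreement is only approximate at $n = 50$, and it improves as $n$ grows.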

Normal (Gaussian) Distribution
• Bell-shaped distribution in which individual values tend to cluster around the mean (which equals the median)
• Used to model many biological phenomena
• Many estimators have approximately normal sampling distributions (see Central Limit Theorem)
• Notation: $Y \sim N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance
• Obtaining probabilities in EXCEL: to obtain $F(y) = P(Y \le y)$, use =NORMDIST(y,μ,σ,1)
• Table B.1 (p. 1316) gives the cdf of the standardized normal random variable $z = (y - \mu)/\sigma \sim N(0,1)$ for values of $z \ge 0$ (obtain tail probabilities by complements and symmetry)
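Both ways of getting a normal probability (the direct cdf, as with NORMDIST, and the table route of standardizing and then using complements and symmetry) can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper names and the example $N(100, 15^2)$ are hypothetical.

```python
from statistics import NormalDist

def normdist_cdf(y, mu, sigma):
    """P(Y <= y) for Y ~ N(mu, sigma^2); the analogue of Excel's =NORMDIST(y,mu,sigma,1)."""
    return NormalDist(mu, sigma).cdf(y)

def cdf_from_table_b1(y, mu, sigma):
    """Standardize, then use only Phi(z) for z >= 0, as a one-sided z-table does."""
    z = (y - mu) / sigma
    phi = NormalDist().cdf(abs(z))      # "table lookup" for |z| >= 0
    return phi if z >= 0 else 1 - phi   # symmetry: Phi(-z) = 1 - Phi(z)

# Hypothetical example: Y ~ N(100, 15^2).
print(round(normdist_cdf(130, 100, 15), 4))       # P(Y <= 130), z = 2
print(round(cdf_from_table_b1(70, 100, 15), 4))   # P(Y <= 70), z = -2
print(round(1 - normdist_cdf(130, 100, 15), 4))   # upper tail by complement
```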

Normal Distribution – Density Functions (pdf)

[Table B.1 layout: rows give the integer part and first decimal place of z; columns give the second decimal place of z.]

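Chi-Square Distribution
• Indexed by “degrees of freedom” ($\nu$): $X \sim \chi^2_\nu$
• $Z \sim N(0,1) \Rightarrow Z^2 \sim \chi^2_1$
• Assuming independence: if $Z_1, \ldots, Z_\nu$ are independent $N(0,1)$, then $Z_1^2 + \cdots + Z_\nu^2 \sim \chi^2_\nu$
• Obtaining probabilities in EXCEL: to obtain $1 - F(x) = P(X \ge x)$, use =CHIDIST(x,ν)
• Table B.3 (p. 1319) gives percentiles of $\chi^2$ distributions: $P\{\chi^2(\nu) \le \chi^2(A;\nu)\} = A$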
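An upper-tail chi-square probability (what CHIDIST returns) can be computed without Excel. This is a minimal sketch, assuming Python 3.8+ and using the standard power series for the regularized lower incomplete gamma function, since $P(X \le x) = P(\nu/2, x/2)$ for $X \sim \chi^2_\nu$. The function name `chidist` is hypothetical, and the series is a swapped-in numerical method, not how Excel computes it.

```python
import math
from statistics import NormalDist

def chidist(x, df, tol=1e-15, max_terms=10_000):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df).
    Uses P(a, t) = t^a e^(-t) / Gamma(a + 1) * sum_k t^k / ((a+1)...(a+k)),
    with a = df/2 and t = x/2, for the lower tail P(X <= x)."""
    if x <= 0:
        return 1.0
    a, t = df / 2, x / 2
    term = total = 1.0
    for k in range(1, max_terms):
        term *= t / (a + k)            # next series term
        total += term
        if term < tol * total:         # stop once terms are negligible
            break
    log_prefactor = a * math.log(t) - t - math.lgamma(a + 1)
    return 1.0 - math.exp(log_prefactor) * total

x = 3.841459
print(round(chidist(x, 1), 4))
print(round(2 * (1 - NormalDist().cdf(math.sqrt(x))), 4))  # Z^2 ~ chi-square(1)
print(round(chidist(2.0, 2), 4), round(math.exp(-1), 4))   # df = 2: P(X >= x) = exp(-x/2)
```

The second line checks the $Z^2 \sim \chi^2_1$ relationship, since $P(Z^2 \ge x) = 2P(Z \ge \sqrt{x})$.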

Chi-Square Distributions

Critical Values for Chi-Square Distributions (Mean = $\nu$, Variance = $2\nu$)

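Student’s t-Distribution
• Indexed by “degrees of freedom” ($\nu$): $T \sim t_\nu$
• $Z \sim N(0,1)$, $X \sim \chi^2_\nu$
• Assuming independence of $Z$ and $X$: $T = Z/\sqrt{X/\nu} \sim t_\nu$
• Obtaining probabilities in EXCEL: to obtain $1 - F(t) = P(T \ge t)$ for $t \ge 0$, use =TDIST(t,ν,1) (the third argument requests one tail)
• Table B.2 (pp. 1317-1318) gives percentiles of the t-distribution: $P\{t(\nu) \le t(A;\nu)\} = A$ for $A > 0.5$; for $A < 0.5$, use symmetry: $P\{t(\nu) \le -t(A;\nu)\} = 1 - A$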
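The one-tailed probability returned by TDIST can be approximated from the t density. This is a minimal sketch, assuming Python 3.8+ and using symmetry, $P(T \ge t) = 0.5 - \int_0^t f(u)\,du$, with composite Simpson's rule as a swapped-in numerical integrator. The names `t_pdf` and `tdist_upper` are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def tdist_upper(t, df, steps=2000):
    """P(T >= t) for t >= 0, like Excel's =TDIST(t, df, 1).
    Integrates the pdf over [0, t] with composite Simpson's rule (steps is even)."""
    if t < 0:
        raise ValueError("requires t >= 0; use symmetry for negative t")
    if t == 0:
        return 0.5
    h = t / steps
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)   # Simpson weights 4, 2, 4, ...
    return 0.5 - s * h / 3

print(round(tdist_upper(2.228139, 10), 4))   # 2.228139 is t(0.975; 10)
print(round(tdist_upper(1.0, 1), 4))         # df = 1 is the Cauchy distribution
```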

Critical Values for Student’s t-Distributions (Mean = 0 for $\nu > 1$, Variance = $\nu/(\nu-2)$ for $\nu > 2$)

F-Distribution
• Indexed by 2 “degrees of freedom” ($\nu_1, \nu_2$): $W \sim F_{\nu_1,\nu_2}$
• $X_1 \sim \chi^2_{\nu_1}$, $X_2 \sim \chi^2_{\nu_2}$
• Assuming independence of $X_1$ and $X_2$: $W = \frac{X_1/\nu_1}{X_2/\nu_2} \sim F_{\nu_1,\nu_2}$
• Obtaining probabilities in EXCEL: to obtain $1 - F(w) = P(W \ge w)$, use =FDIST(w,ν1,ν2)
• Table B.4 (pp. 1320-1326) gives percentiles of the F-distribution: $P\{F(\nu_1,\nu_2) \le F(A;\nu_1,\nu_2)\} = A$ for values of $A > 0.5$
• For values of $A < 0.5$ (lower-tail probabilities): $F(A;\nu_1,\nu_2) = 1/F(1-A;\nu_2,\nu_1)$
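Both the construction $W = (X_1/\nu_1)/(X_2/\nu_2)$ and the lower-tail rule $F(A;\nu_1,\nu_2) = 1/F(1-A;\nu_2,\nu_1)$ can be checked by simulation. This is a minimal sketch, assuming Python 3.8+ and drawing a $\chi^2_\nu$ variate as a Gamma variate with shape $\nu/2$ and scale 2. The percentiles $F(0.95;5,10) = 3.3258$ and $F(0.95;10,5) = 4.7351$ are standard table values; the seed and replication count are arbitrary.

```python
import random

def simulate_f(df1, df2, reps=100_000, seed=7):
    """Draw W = (X1/df1) / (X2/df2) with independent chi-square X1, X2.
    A chi-square(df) variate is drawn as Gamma(shape=df/2, scale=2)."""
    rng = random.Random(seed)
    return [(rng.gammavariate(df1 / 2, 2) / df1) / (rng.gammavariate(df2 / 2, 2) / df2)
            for _ in range(reps)]

w = simulate_f(5, 10)
upper = 3.3258       # F(0.95; 5, 10)
lower = 1 / 4.7351   # F(0.05; 5, 10) = 1 / F(0.95; 10, 5)
p_upper = sum(x <= upper for x in w) / len(w)   # should be near 0.95
p_lower = sum(x <= lower for x in w) / len(w)   # should be near 0.05
print(round(p_upper, 3), round(p_lower, 3))
```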

Critical Values for F-distributions P(F ≤ Table Value) = 0.95

A.5 Statistical Estimation - Properties
Note: If an estimator is unbiased and its variance goes to zero as the sample size grows without bound (both are usually straightforward to verify), then it is consistent. Showing that it has minimum variance is harder, but general results have been established for many standard cases.
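The reason unbiasedness plus vanishing variance gives consistency is Chebyshev's inequality: for an unbiased $\hat{\theta}$, $P(|\hat{\theta} - \theta| \ge \varepsilon) \le \sigma^2\{\hat{\theta}\}/\varepsilon^2 \to 0$. A minimal simulation sketch for the sample mean, assuming Python 3.8+ and a hypothetical Uniform(0, 1) population ($\mu = 0.5$, $\sigma^2 = 1/12$, so $\sigma^2\{\bar{Y}\} = \sigma^2/n$), with $\varepsilon = 0.1$ chosen arbitrarily:

```python
import random
import statistics

POP_MEAN, POP_VAR = 0.5, 1 / 12   # hypothetical population: Uniform(0, 1)

def deviation_rate(n, eps=0.1, reps=2000, seed=3):
    """Estimate P(|Ybar - mu| >= eps) for the sample mean of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(reps):
        ybar = statistics.fmean(rng.random() for _ in range(n))
        misses += abs(ybar - POP_MEAN) >= eps
    return misses / reps

rates = {}
for n in (5, 50, 500):
    rates[n] = deviation_rate(n)
    bound = min(POP_VAR / (n * 0.1 ** 2), 1.0)   # Chebyshev: Var(Ybar) / eps^2
    print(n, rates[n], round(bound, 4))
```

The estimated miss rate shrinks toward 0 as $n$ grows and stays below the Chebyshev bound, which is often far from tight.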

A.5 Maximum Likelihood and Least Squares

One-Sample Confidence Interval for $\mu$
• A simple random sample (SRS) of size $n$ is obtained from a population with mean $\mu$
• The sample mean $\bar{Y}$ and sample standard deviation $s$ are computed
• Degrees of freedom are df = $n-1$, and a confidence level $(1-\alpha)$ is selected
• Level $(1-\alpha)$ confidence interval: $\bar{Y} \pm t(1-\alpha/2;\, n-1)\, \frac{s}{\sqrt{n}}$
• The procedure is derived theoretically for normally distributed data, but has been found to work well for moderate to large $n$ even when the data are not normal
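The one-sample interval $\bar{Y} \pm t(1-\alpha/2;\, n-1)\, s/\sqrt{n}$ can be sketched in a few lines of Python (3.8+, standard library). The critical value is passed in as an argument, assumed to come from Table B.2, because the standard library has no t quantile function. The function name and the sample data are hypothetical.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Level (1 - alpha) CI for mu: Ybar +/- t_crit * s / sqrt(n).
    t_crit = t(1 - alpha/2; n - 1), looked up for df = n - 1."""
    n = len(data)
    ybar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # stdev uses the n - 1 divisor
    return ybar - t_crit * se, ybar + t_crit * se

# Hypothetical sample, n = 5, 95% confidence: t(0.975; 4) = 2.776.
print(t_interval([1, 2, 3, 4, 5], t_crit=2.776))
```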

1-Sample t-test (2-tailed alternative)
• 2-sided test: $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$
• Test statistic: $t^* = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}$
• Decision rule:
  • Conclude $\mu > \mu_0$ if $t^* > t(1-\alpha/2;\, n-1)$
  • Conclude $\mu < \mu_0$ if $t^* < -t(1-\alpha/2;\, n-1)$
  • Otherwise, do not conclude $\mu \ne \mu_0$
• P-value: $2P\{t(n-1) \ge |t^*|\}$
• See Table A.1 (p. 1307) for decision rules on 1-sided tests
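The two-sided decision rule can be sketched as follows, assuming Python 3.8+ and a critical value $t(1-\alpha/2;\, n-1)$ supplied from Table B.2. The function name and data are hypothetical, and the P-value is not computed here because the standard library has no t cdf.

```python
import math
import statistics

def one_sample_t_test(data, mu0, t_crit):
    """Two-sided test of H0: mu = mu0 vs Ha: mu != mu0.
    t_crit = t(1 - alpha/2; n - 1), looked up for df = n - 1."""
    n = len(data)
    t_star = (statistics.fmean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    if t_star > t_crit:
        decision = "conclude mu > mu0"
    elif t_star < -t_crit:
        decision = "conclude mu < mu0"
    else:
        decision = "do not conclude mu != mu0"
    return t_star, decision

# Hypothetical sample, alpha = 0.05, df = 4: t(0.975; 4) = 2.776.
print(one_sample_t_test([5.1, 5.9, 6.3, 5.7, 6.0], mu0=5.0, t_crit=2.776))
```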

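Comparing 2 Means - Independent Samples
• Observed individuals from the 2 groups are samples from distinct populations, identified by $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$
• Measurements across groups are independent
• Summary statistics obtained from the 2 groups: sample sizes $n_1, n_2$, sample means $\bar{Y}_1, \bar{Y}_2$, and sample standard deviations $s_1, s_2$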

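Sampling Distribution of $\bar{Y}_1 - \bar{Y}_2$
• Underlying distributions normal $\Rightarrow$ the sampling distribution is normal, and standardizing with an estimated standard deviation leads to a t-distribution
• Mean: $E\{\bar{Y}_1 - \bar{Y}_2\} = \mu_1 - \mu_2$
• Variance: $\sigma^2\{\bar{Y}_1 - \bar{Y}_2\} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
• Standard error (std. dev. of the estimator): $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$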

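Inference for $\mu_1 - \mu_2$ - Normal Populations – Equal Variances
• Pooled variance: $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
• Level $(1-\alpha)$ confidence interval: $(\bar{Y}_1 - \bar{Y}_2) \pm t(1-\alpha/2;\, n_1+n_2-2)\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
• Interpretation (at the $\alpha$ significance level):
  • If the interval contains 0, do not reject $H_0: \mu_1 = \mu_2$
  • If the interval is strictly positive, conclude that $\mu_1 > \mu_2$
  • If the interval is strictly negative, conclude that $\mu_1 < \mu_2$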
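The pooled-variance interval and its interpretation can be sketched from summary statistics, assuming Python 3.8+ and a critical value $t(1-\alpha/2;\, n_1+n_2-2)$ taken from Table B.2. The function names and the summary statistics are hypothetical.

```python
import math

def pooled_t_interval(n1, ybar1, s1, n2, ybar2, s2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances.
    t_crit = t(1 - alpha/2; n1 + n2 - 2)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = ybar1 - ybar2
    return diff - t_crit * se, diff + t_crit * se

def interpret(ci):
    """Translate the interval into the decision at level alpha."""
    lo, hi = ci
    if lo > 0:
        return "conclude mu1 > mu2"
    if hi < 0:
        return "conclude mu1 < mu2"
    return "interval contains 0: do not reject H0: mu1 = mu2"

# Hypothetical summary statistics; alpha = 0.05, df = 18: t(0.975; 18) = 2.101.
ci = pooled_t_interval(10, 52.0, 4.0, 10, 47.0, 5.0, t_crit=2.101)
print(ci, interpret(ci))
```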

Sampling Distribution of $s^2$ (Normal Data)
• The population variance ($\sigma^2$) is a fixed (unknown) parameter based on the population of measurements
• The sample variance ($s^2$) varies from sample to sample (just as the sample mean does)
• When $Y \sim N(\mu, \sigma^2)$, the distribution of (a multiple of) $s^2$ is chi-square with $n-1$ degrees of freedom
• $(n-1)s^2/\sigma^2 \sim \chi^2$ with df = $n-1$
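The result $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$ can be checked by simulation against the chi-square mean $n-1$ and variance $2(n-1)$. This is a minimal sketch, assuming Python 3.8+ and a hypothetical normal population $N(10, 3^2)$ with samples of size $n = 8$; the seed and replication count are arbitrary.

```python
import random
import statistics

def scaled_sample_variances(n, mu, sigma, reps=20_000, seed=11):
    """Simulate (n - 1) s^2 / sigma^2 for samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        out.append((n - 1) * statistics.variance(sample) / sigma**2)
    return out

q = scaled_sample_variances(n=8, mu=10, sigma=3)
# A chi-square(n - 1) variable has mean n - 1 = 7 and variance 2(n - 1) = 14.
print(round(statistics.fmean(q), 2), round(statistics.variance(q), 2))
```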

$(1-\alpha)100\%$ Confidence Interval for $\sigma^2$ (or $\sigma$)
• Step 1: Obtain a random sample of $n$ items from the population and compute $s^2$
• Step 2: Obtain $\chi^2_L = \chi^2(\alpha/2;\, n-1)$ and $\chi^2_U = \chi^2(1-\alpha/2;\, n-1)$ from the table of critical values for the chi-square distribution with $n-1$ df
• Step 3: Compute the confidence interval for $\sigma^2$, $\left[\frac{(n-1)s^2}{\chi^2_U},\ \frac{(n-1)s^2}{\chi^2_L}\right]$, and take square roots of its bounds to obtain a confidence interval for $\sigma$
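Steps 2 and 3 can be sketched as a small function, assuming Python 3.8+ and chi-square critical values supplied from Table B.3 (here $\chi^2(0.025;9) = 2.700$ and $\chi^2(0.975;9) = 19.023$). The function name and the inputs $n = 10$, $s^2 = 4$ are hypothetical.

```python
import math

def variance_interval(n, s2, chi2_lower, chi2_upper):
    """(1 - alpha)100% CI for sigma^2 and for sigma, from normal data.
    chi2_lower = chi2(alpha/2; n-1), chi2_upper = chi2(1 - alpha/2; n-1)."""
    lo = (n - 1) * s2 / chi2_upper   # the larger critical value gives the lower bound
    hi = (n - 1) * s2 / chi2_lower
    return (lo, hi), (math.sqrt(lo), math.sqrt(hi))

# Hypothetical: n = 10, s^2 = 4, alpha = 0.05.
print(variance_interval(10, 4.0, 2.700, 19.023))
```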

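Statistical Test for $\sigma^2$
• Null and alternative hypotheses:
  • 1-sided (upper tail): $H_0: \sigma^2 \le \sigma_0^2$ vs. $H_a: \sigma^2 > \sigma_0^2$
  • 1-sided (lower tail): $H_0: \sigma^2 \ge \sigma_0^2$ vs. $H_a: \sigma^2 < \sigma_0^2$
  • 2-sided: $H_0: \sigma^2 = \sigma_0^2$ vs. $H_a: \sigma^2 \ne \sigma_0^2$
• Test statistic: $\chi^2_{obs} = \frac{(n-1)s^2}{\sigma_0^2}$
• Decision rule based on the chi-square distribution with df = $n-1$:
  • 1-sided (upper tail): reject $H_0$ if $\chi^2_{obs} > \chi^2_U = \chi^2(1-\alpha;\, n-1)$
  • 1-sided (lower tail): reject $H_0$ if $\chi^2_{obs} < \chi^2_L = \chi^2(\alpha;\, n-1)$
  • 2-sided: reject $H_0$ if $\chi^2_{obs} < \chi^2_L = \chi^2(\alpha/2;\, n-1)$ (conclude $\sigma^2 < \sigma_0^2$) or if $\chi^2_{obs} > \chi^2_U = \chi^2(1-\alpha/2;\, n-1)$ (conclude $\sigma^2 > \sigma_0^2$)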
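The three decision rules for $\chi^2_{obs} = (n-1)s^2/\sigma_0^2$ can be sketched in one function, assuming Python 3.8+ and that the caller supplies the critical value(s) matching the chosen alternative from Table B.3. The function name, the `alternative` labels, and the example inputs are hypothetical.

```python
def variance_test(n, s2, sigma0_sq, alternative, chi2_L=None, chi2_U=None):
    """Chi-square test on sigma^2 using X2_obs = (n - 1) s^2 / sigma0^2, df = n - 1.
    alternative = 'greater'   needs chi2_U = chi2(1 - alpha; n - 1)
                  'less'      needs chi2_L = chi2(alpha; n - 1)
                  'two-sided' needs chi2_L = chi2(alpha/2; n - 1) and chi2_U = chi2(1 - alpha/2; n - 1)"""
    x2 = (n - 1) * s2 / sigma0_sq
    if alternative in ("greater", "two-sided") and x2 > chi2_U:
        return x2, "reject H0: conclude sigma^2 > sigma0^2"
    if alternative in ("less", "two-sided") and x2 < chi2_L:
        return x2, "reject H0: conclude sigma^2 < sigma0^2"
    return x2, "do not reject H0"

# Hypothetical: n = 10, s^2 = 9, sigma0^2 = 4, alpha = 0.05 (two-sided, df = 9).
print(variance_test(10, 9.0, 4.0, "two-sided", chi2_L=2.700, chi2_U=19.023))
```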

Inferences Regarding 2 Population Variances
• Goal: compare variances between 2 populations
• Parameter: $\sigma_1^2/\sigma_2^2$ (ratio is 1 when variances are equal)
• Estimator: $s_1^2/s_2^2$ (ratio of sample variances)
• Distribution of (a multiple of) the estimator (normal data): $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F$ with parameters df$_1 = n_1-1$ and df$_2 = n_2-1$

Test Comparing Two Population Variances • Assumption: the 2 populations are normally distributed

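$(1-\alpha)100\%$ Confidence Interval for $\sigma_1^2/\sigma_2^2$
• Obtain the ratio of sample variances $s_1^2/s_2^2 = (s_1/s_2)^2$
• Choose $\alpha$, and obtain:
  • $F_L = F(\alpha/2;\, n_1-1, n_2-1) = 1/F(1-\alpha/2;\, n_2-1, n_1-1)$
  • $F_U = F(1-\alpha/2;\, n_1-1, n_2-1)$
• Compute the confidence interval: $\left[\frac{s_1^2/s_2^2}{F_U},\ \frac{s_1^2/s_2^2}{F_L}\right]$
• Conclude the population variances are unequal if the interval does not contain 1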
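The interval for $\sigma_1^2/\sigma_2^2$ follows from $P\{F_L \le (s_1^2/s_2^2)/(\sigma_1^2/\sigma_2^2) \le F_U\} = 1-\alpha$. A minimal sketch, assuming Python 3.8+ and F percentiles taken from Table B.4; the function name and inputs are hypothetical, and with $n_1 = n_2 = 11$ the reciprocal rule gives $F_L = 1/F(0.975;10,10) = 1/3.717$.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_lower, f_upper):
    """(1 - alpha)100% CI for sigma1^2 / sigma2^2 from normal data.
    f_lower = F(alpha/2; n1-1, n2-1) = 1 / F(1-alpha/2; n2-1, n1-1)
    f_upper = F(1-alpha/2; n1-1, n2-1)"""
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower

# Hypothetical: n1 = n2 = 11, s1^2 = 8, s2^2 = 4, alpha = 0.05; F(0.975; 10, 10) = 3.717.
lo, hi = variance_ratio_interval(8.0, 4.0, f_lower=1 / 3.717, f_upper=3.717)
print(round(lo, 4), round(hi, 4), "contains 1" if lo <= 1 <= hi else "excludes 1")
```

Here the interval contains 1, so these hypothetical data would not support concluding that the variances differ.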
