The Central Limit Theorem

The Central Limit Theorem • Paul Cornwell • March 31, 2011

Statement of Theorem • Let X1,…,Xn be independent, identically distributed random variables with mean μ and finite, positive variance σ². When n is large, the average of these variables is approximately normally distributed with mean μ and standard deviation σ/√n.
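The statement above can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the exponential distribution with rate 1 (so μ = 1 and σ = 1), the sample size n = 50, the number of repetitions, and the helper name `sample_means` are all illustrative assumptions.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` averages, each taken over n independent draws."""
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n, reps = 50, 5000       # illustrative sizes, not taken from the slides

# Exponential with rate 1 has mu = 1 and sigma = 1 (assumed example).
means = sample_means(lambda r: r.expovariate(1.0), n, reps, rng)

observed_mean = statistics.fmean(means)
observed_sd = statistics.pstdev(means)
predicted_sd = 1.0 / math.sqrt(n)   # sigma / sqrt(n)

print("mean of averages:", observed_mean)
print("sd of averages:  ", observed_sd, "predicted:", predicted_sd)
```

The observed standard deviation of the averages should land close to σ/√n, even though the underlying exponential distribution is strongly skewed.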

Questions • How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good? • What is a ‘good’ approximation?

Importance • Permits analysis of random variables even when underlying distribution is unknown • Estimating parameters • Hypothesis Testing • Polling

Testing for Normality • Performing a hypothesis test to determine whether a set of data came from a normal distribution • Considerations • Power: probability that a test will reject the null hypothesis when it is false • Ease of use

Testing for Normality • Problems • No test is desirable in every situation (no universally most powerful test) • Some cannot test the composite hypothesis of normality (i.e., a normal distribution with unspecified mean and variance, not just the standard normal) • The reliability of tests is sensitive to sample size; with enough data, even slight departures from normality lead to rejection of the null hypothesis
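The sample-size sensitivity noted above can be illustrated with any moment-based normality test. The sketch below uses the Jarque-Bera statistic, which the slides do not name; it is swapped in here only because it needs nothing beyond the standard library. The data (averages of 30 exponential draws) and the two sample sizes are assumptions chosen for illustration, and the p-value is asymptotic, so it is rough for small samples.

```python
import math
import random

def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value) for a sample."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has
    # survival function exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

rng = random.Random(1)

def mean_of_exponentials(k):
    """Average of k Exponential(1) draws: close to normal, mildly skewed."""
    return sum(rng.expovariate(1.0) for _ in range(k)) / k

small = [mean_of_exponentials(30) for _ in range(50)]
large = [mean_of_exponentials(30) for _ in range(5000)]

jb_small, p_small = jarque_bera(small)
jb_large, p_large = jarque_bera(large)
print("50 observations:   JB =", jb_small, "p =", p_small)
print("5000 observations: JB =", jb_large, "p =", p_large)
```

Both samples come from the same nearly normal distribution, yet the larger one is expected to reject normality decisively because the test now has enough data to detect the residual skewness.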

Characteristics of the Normal Distribution • Symmetric • Unimodal • Bell-shaped • Continuous

Closeness to Normal • Skewness: Measures the asymmetry of a distribution. • Defined as the third standardized moment • Skew of normal distribution is 0

Closeness to Normal • Kurtosis: Measures peakedness or heaviness of the tails. • Defined as the fourth standardized moment • Kurtosis of normal distribution is 3
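Skewness and kurtosis, as the third and fourth standardized moments, can be estimated directly from data. The following is a minimal sketch; the function name `standardized_moment`, the seed, and the two test distributions are illustrative assumptions.

```python
import random

def standardized_moment(data, k):
    """k-th central moment divided by the k-th power of the standard deviation."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n   # population variance
    central = sum((x - mean) ** k for x in data) / n
    return central / var ** (k / 2)

rng = random.Random(2)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(50000)]
expo_sample = [rng.expovariate(1.0) for _ in range(50000)]

normal_skew = standardized_moment(normal_sample, 3)
normal_kurt = standardized_moment(normal_sample, 4)
expo_skew = standardized_moment(expo_sample, 3)
expo_kurt = standardized_moment(expo_sample, 4)

print("normal:      skewness", normal_skew, "kurtosis", normal_kurt)
print("exponential: skewness", expo_skew, "kurtosis", expo_kurt)
```

For the normal sample the estimates should be near 0 and 3. The exponential distribution has theoretical skewness 2 and kurtosis 9, so its estimates should sit well away from the normal values.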

Binomial Distribution • Cumulative distribution function: $F(k; n, p) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^{i} (1-p)^{n-i}$
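The binomial cumulative distribution function is a finite sum of probabilities, so it can be evaluated exactly. A minimal sketch, assuming the usual parameterization with n trials and success probability p (the function name is hypothetical):

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    upper = min(math.floor(k), n)   # the CDF is flat between integers
    return sum(
        math.comb(n, i) * p ** i * (1 - p) ** (n - i)
        for i in range(upper + 1)
    )

print(binomial_cdf(1, 2, 0.5))    # P(at most 1 success in 2 fair trials)
print(binomial_cdf(5, 50, 0.1))   # P(at most 5 successes in 50 trials)
```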

Binomial Distribution* (*from R)

Uniform Distribution • Cumulative distribution function: $F(x) = \frac{x-a}{b-a}$ for $a \le x \le b$

Uniform Distribution* (*from R)

Exponential Distribution • Cumulative distribution function: $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$
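For the exponential distribution, the CDF of the sample mean is also available in closed form, because a sum of n i.i.d. exponential variables follows an Erlang distribution. The sketch below compares that exact CDF with the CLT normal approximation. The rate of 1, n = 30, and the evaluation points are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist

def exponential_cdf(x, rate):
    """P(X <= x) for X ~ Exponential(rate)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exponential_mean_cdf(x, n, rate):
    """Exact P(mean of n i.i.d. Exponential(rate) draws <= x).

    Uses the Erlang CDF of the sum, evaluated at n * x.
    """
    if x <= 0:
        return 0.0
    t = rate * n * x
    term = math.exp(-t)   # k = 0 term of the sum
    total = term
    for k in range(1, n):
        term *= t / k     # builds exp(-t) * t**k / k! incrementally
        total += term
    return 1.0 - total

n, rate = 30, 1.0
clt_normal = NormalDist(mu=1 / rate, sigma=1 / (rate * math.sqrt(n)))
for x in (0.8, 1.0, 1.3):
    print(x, exponential_mean_cdf(x, n, rate), clt_normal.cdf(x))
```

The gap between the two columns at each point is one way to judge how good the approximation is for a given n.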

Exponential Distribution* (*from R)

For Next Time… • Find n values for more distributions • Refine criteria for quality of approximation • Explore distributions with no mean • Classify distributions in order to have more general guidelines for minimum sample size

The Central Limit Theorem (Pt 2) • Paul Cornwell • May 2, 2011

Review • Central Limit Theorem: Averages of i.i.d. variables become approximately normally distributed as sample size increases • Rate of convergence depends on underlying distribution • What sample size is needed to produce a good approximation from the CLT?

Questions • Real-life applications of the Central Limit Theorem • What does kurtosis tell us about a distribution? • What is the rationale for requiring np ≥ 5? • What about distributions with no mean?

Applications of Theorem • The distribution of the total distance covered in a random walk tends toward normal • Hypothesis testing • Confidence intervals (polling) • Signal processing, noise cancellation

Kurtosis • Measures the “peakedness” of a distribution • A higher peak means fatter tails

Why np? • Traditional rule of thumb for the normal approximation to the binomial is np > 5 (or 10) • Skewness of binomial distribution increases as p moves away from .5 • Larger n is required for convergence for skewed distributions
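The link between p and skewness can be made concrete with the standard formula for the skewness of a Binomial(n, p) variable, (1 − 2p)/√(np(1 − p)). The sketch below evaluates it at sample sizes chosen so that np is about 5; the list of p values and the function name are illustrative assumptions.

```python
import math

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

for p in (0.5, 0.3, 0.1, 0.01):
    n = math.ceil(5 / p)   # n chosen so that n * p is about 5
    print("p =", p, "n =", n, "skewness =", binomial_skewness(n, p))
```

With np held near 5, the skewness is roughly (1 − 2p)/√(5(1 − p)), which is zero at p = .5 and grows as p shrinks. A fixed np threshold therefore does not give the same quality of approximation for every p.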

Cauchy Distribution • Has no finite moments (including mean, variance) • The distribution of averages is the same as the distribution of a single observation • CLT does not apply
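A simulation makes the Cauchy behavior visible: the spread of the averages does not shrink as n grows. This is a minimal sketch under assumed settings (standard Cauchy, 5000 repetitions, n = 1 versus n = 100). It uses the interquartile range as the measure of spread because the variance does not exist; for a standard Cauchy the quartiles are −1 and 1, so the interquartile range is 2.

```python
import math
import random
import statistics

rng = random.Random(3)

def standard_cauchy():
    """Draw from the standard Cauchy via the inverse CDF tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_averages(n, reps):
    """Interquartile range of `reps` averages of n Cauchy draws."""
    averages = [sum(standard_cauchy() for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(averages, n=4)
    return q3 - q1

iqr_single = iqr_of_averages(1, 5000)
iqr_hundred = iqr_of_averages(100, 5000)
print("IQR of single draws:        ", iqr_single)
print("IQR of averages of 100 draws:", iqr_hundred)
```

For a distribution with finite variance, the second figure would be about one tenth of the first. Here both should stay near 2.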

Beta Distribution • α = β = 1/3 • Distribution is symmetric and bimodal • Convergence to normal is fast in averages

Student’s t Distribution • Heavier-tailed, bell-shaped curve • Approaches normal distribution as degrees of freedom increase

Criteria • 4 statistics: K-S distance, tail probabilities, skewness and kurtosis • Different thresholds for “adequate” and “superior” approximations • Both are fairly conservative
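Of the four statistics, the K-S distance (the largest gap between two CDFs) is the least familiar to compute. The sketch below evaluates it exactly for a binomial distribution against the normal distribution with matching mean and variance. It is an illustration only: the slides do not say how the distance was computed, no continuity correction is applied, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def ks_distance_binomial(n, p):
    """Largest gap between the Binomial(n, p) CDF and the matching normal CDF.

    The binomial CDF is a step function, so the supremum is attained at a
    jump point, either just below it or at it. Intended for moderate n,
    since math.comb(n, k) is converted to a float.
    """
    normal = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    worst = 0.0
    cdf = 0.0
    for k in range(n + 1):
        phi = normal.cdf(k)
        worst = max(worst, abs(cdf - phi))   # left limit at k
        cdf += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        worst = max(worst, abs(cdf - phi))   # value at k, after the jump
    return worst

for n in (10, 100, 400):
    print("n =", n, "K-S distance =", ks_distance_binomial(n, 0.5))
```

A threshold on this distance is one way to define an "adequate" or "superior" approximation; the thresholds themselves are not recoverable from the slide text.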

Adequate Approximation

Stronger Approximation

Conclusions • Skewness is difficult to shake • Tail probabilities are fairly accurate for small sample sizes • Traditional recommendation is small for many common distributions
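One reason skewness is hard to shake: for the average of n i.i.d. variables, skewness shrinks like 1/√n while excess kurtosis (kurtosis minus 3) shrinks like 1/n. The sketch below applies these standard scaling rules. The tolerance of 0.25 is a hypothetical threshold, not one of the criteria from the slides, and the exponential distribution (skewness 2, excess kurtosis 6) is an assumed example.

```python
import math

def shape_of_mean(skew, excess_kurtosis, n):
    """Skewness and kurtosis of the average of n i.i.d. variables."""
    return skew / math.sqrt(n), 3.0 + excess_kurtosis / n

def n_for_skewness(skew, tol):
    """Smallest n for which |skewness of the average| is at most tol."""
    return math.ceil((skew / tol) ** 2)

def n_for_kurtosis(excess_kurtosis, tol):
    """Smallest n for which |kurtosis of the average - 3| is at most tol."""
    return math.ceil(abs(excess_kurtosis) / tol)

# Exponential distribution: skewness 2, excess kurtosis 6.
tol = 0.25   # hypothetical tolerance, for illustration only
print("n needed for skewness:", n_for_skewness(2.0, tol))
print("n needed for kurtosis:", n_for_kurtosis(6.0, tol))
print("shape at n = 30:", shape_of_mean(2.0, 6.0, 30))
```

Under the same tolerance, the skewness requirement calls for a noticeably larger n than the kurtosis requirement, even though the exponential's excess kurtosis starts out three times as large as its skewness.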
