The Central Limit Theorem. Paul Cornwell March 31, 2011. Statement of Theorem.
The Central Limit Theorem • Paul Cornwell • March 31, 2011
Statement of Theorem • Let X1,…,Xn be independent, identically distributed random variables with mean μ and finite, positive variance σ². When n is large, the average of these variables is approximately normally distributed with mean μ and standard deviation σ/√n.
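The statement can be checked numerically. The following is a minimal sketch, not taken from the slides: it assumes an exponential distribution with rate 1 (so μ = 1 and σ = 1), and the helper name `sample_means`, the sample size and the seed are illustrative choices.

```python
import math
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` averages, each of n i.i.d. draws from `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed example: exponential with rate 1, which has mu = 1 and sigma = 1.
n = 50
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=5000)

print("mean of averages:", statistics.fmean(means))  # should be close to mu
print("sd of averages:  ", statistics.stdev(means))  # should be close to sigma / sqrt(n)
print("sigma / sqrt(n): ", 1 / math.sqrt(n))
```

The simulated averages should center near μ with spread near σ/√n; how close their shape is to a normal curve is the question the rest of the talk addresses.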
Questions • How large a sample size is required for the Central Limit Theorem (CLT) approximation to be good? • What is a ‘good’ approximation?
Importance • Permits analysis of random variables even when underlying distribution is unknown • Estimating parameters • Hypothesis Testing • Polling
Testing for Normality • Performing a hypothesis test to determine whether a set of data came from a normal distribution • Considerations • Power: probability that a test will reject the null hypothesis when it is false • Ease of use
Testing for Normality • Problems • No test is desirable in every situation (no universally most powerful test) • Some cannot test the composite hypothesis of normality (i.e., a normal distribution with unspecified mean and variance, rather than the standard normal) • The reliability of tests is sensitive to sample size; with enough data, even small departures from normality lead to rejection of the null hypothesis
Characteristics of Distribution • Symmetric • Unimodal • Bell-shaped • Continuous
Closeness to Normal • Skewness: Measures the asymmetry of a distribution. • Defined as the third standardized moment • Skew of normal distribution is 0
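Written out, the third standardized moment is the following (the symbol γ₁ is a common convention, assumed here rather than taken from the slides; μ and σ are the mean and standard deviation of X):

```latex
\gamma_1 = \operatorname{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]
         = \frac{\operatorname{E}\!\left[(X-\mu)^{3}\right]}{\sigma^{3}}
```

For any distribution symmetric about μ with a finite third moment, the numerator vanishes, which is why the normal distribution has skewness 0.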
Closeness to Normal • Kurtosis: Measures peakedness or heaviness of the tails. • Defined as the fourth standardized moment • Kurtosis of normal distribution is 3
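Both standardized moments can be estimated from data. The sketch below is illustrative only: the function name `standardized_moment` is hypothetical, and it uses the divide-by-n (population) form of the standard deviation, which is one of several conventions.

```python
import random
import statistics

def standardized_moment(data, k):
    """k-th standardized sample moment: mean of ((x - mean) / sd) ** k."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)  # divide-by-n form; an assumed convention
    return statistics.fmean(((x - m) / sd) ** k for x in data)

rng = random.Random(1)
normal_draws = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

print("skewness:", standardized_moment(normal_draws, 3))  # near 0 for normal data
print("kurtosis:", standardized_moment(normal_draws, 4))  # near 3 for normal data
```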
Binomial Distribution • Cumulative distribution function:
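The slide's formula is not present in this transcript. The standard form, assuming n trials with success probability p (the slide's own notation is unknown), is:

```latex
F(k;\,n,p) = P(X \le k)
           = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i}\, p^{i} (1-p)^{\,n-i}
```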
Binomial Distribution • [Figure generated in R; image not reproduced]
Uniform Distribution • Cumulative distribution function:
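The slide's formula is not present in this transcript. The standard form, assuming a uniform distribution on the interval [a, b] (the slide's own parameters are unknown), is:

```latex
F(x) =
\begin{cases}
0, & x < a \\[2pt]
\dfrac{x-a}{b-a}, & a \le x \le b \\[6pt]
1, & x > b
\end{cases}
```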
Uniform Distribution • [Figure generated in R; image not reproduced]
Exponential Distribution • Cumulative distribution function:
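The slide's formula is not present in this transcript. The standard form, assuming the rate parameterization with rate λ > 0 (the slide may have used the mean 1/λ instead), is:

```latex
F(x) =
\begin{cases}
1 - e^{-\lambda x}, & x \ge 0 \\
0, & x < 0
\end{cases}
```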
Exponential Distribution • [Figure generated in R; image not reproduced]
For Next Time… • Find n values for more distributions • Refine criteria for quality of approximation • Explore distributions that have no mean • Classify distributions in order to have more general guidelines for minimum sample size
The Central Limit Theorem (Pt 2) • Paul Cornwell • May 2, 2011
Review • Central Limit Theorem: Averages of i.i.d. variables become approximately normally distributed as sample size increases • Rate of convergence depends on the underlying distribution • What sample size is needed to produce a good approximation from the CLT?
Questions • Real-life applications of the Central Limit Theorem • What does kurtosis tell us about a distribution? • What is the rationale for requiring np ≥ 5? • What about distributions with no mean?
Applications of Theorem • The distribution of the total distance covered in a random walk tends toward normal • Hypothesis testing • Confidence intervals (polling) • Signal processing, noise cancellation
Kurtosis • Measures the “peakedness” of a distribution • Higher kurtosis generally means a sharper peak and fatter tails
Why np? • Traditional rule of thumb for the normal approximation to the binomial is np > 5 (or 10) • Skewness of the binomial distribution increases in magnitude as p moves away from 0.5 • Skewed distributions require a larger n for convergence
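The link between p and the required n can be made concrete with the skewness of a Binomial(n, p) variable, (1 − 2p)/√(np(1 − p)). The sketch below is illustrative: the function names and the skewness threshold of 0.25 are hypothetical choices, not criteria from the talk.

```python
import math

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n * p * (1 - p))."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

def n_for_skewness(p, target):
    """Smallest n with |skewness| <= target (target is a hypothetical threshold)."""
    needed = (1 - 2 * p) ** 2 / (target ** 2 * p * (1 - p))
    return max(1, math.ceil(needed))

for p in (0.5, 0.3, 0.1, 0.01):
    n = n_for_skewness(p, target=0.25)
    # n * p shows how the required sample size relates to the np rule of thumb.
    print(f"p={p}: skewness at n=30 is {binomial_skewness(30, p):.3f}, "
          f"need n={n} (np={n * p:.1f})")
```

Under this assumed threshold, the required n grows quickly as p moves away from 0.5, which matches the slide's point that skewed cases need larger samples.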
Cauchy Distribution • Has no moments (including mean, variance) • The average of n i.i.d. Cauchy variables has the same Cauchy distribution as a single observation • CLT does not apply
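A simulation can illustrate that averaging does not tighten a Cauchy sample. This is a sketch under stated assumptions: it uses the standard Cauchy (location 0, scale 1), draws via the inverse CDF, and measures spread with the interquartile range because the variance does not exist. The helper names are illustrative.

```python
import math
import random
import statistics

def cauchy(rng):
    """Standard Cauchy draw via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr(data):
    """Interquartile range, a spread measure that exists even without moments."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

rng = random.Random(2)
singles = [cauchy(rng) for _ in range(5000)]
averages = [statistics.fmean(cauchy(rng) for _ in range(100)) for _ in range(5000)]

print("IQR of single draws:      ", iqr(singles))   # theoretical value is 2
print("IQR of averages (n = 100):", iqr(averages))  # also near 2: no shrinkage
```

For a distribution with finite variance, the spread of averages of 100 draws would be about one tenth of the spread of single draws; here it stays the same.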
Beta Distribution • α = β = 1/3 • Distribution is symmetric and bimodal • Convergence to normal is fast in averages
Student’s t Distribution • Heavier-tailed, bell-shaped curve • Approaches normal distribution as degrees of freedom increase
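One way to quantify the approach to normality is through kurtosis. For a t distribution with ν degrees of freedom (ν is assumed notation, not from the slide), the kurtosis is finite only for ν > 4:

```latex
\text{kurtosis}(t_\nu) = 3 + \frac{6}{\nu - 4}, \quad \nu > 4,
\qquad\text{so}\qquad
\lim_{\nu \to \infty} \text{kurtosis}(t_\nu) = 3
```

The limit 3 is the kurtosis of the normal distribution, consistent with the heavier tails at small ν.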
Criteria • Four statistics: Kolmogorov-Smirnov (K-S) distance, tail probabilities, skewness and kurtosis • Different thresholds for “adequate” and “superior” approximations • Both thresholds are fairly conservative
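The K-S distance is the largest vertical gap between two cumulative distribution functions. The sketch below estimates it between simulated standardized averages and the standard normal; it is illustrative only. The function names, the exponential example, and the replication count are assumptions, and the talk's actual thresholds are not reproduced here.

```python
import math
import random
import statistics

def ks_distance_to_normal(values):
    """Sup distance between the empirical CDF of `values` and the N(0, 1) CDF."""
    phi = statistics.NormalDist().cdf
    xs = sorted(values)
    n = len(xs)
    # The empirical CDF jumps at each sorted point, so check both sides of the jump.
    return max(
        max((i + 1) / n - phi(x), phi(x) - i / n)
        for i, x in enumerate(xs)
    )

def standardized_means(draw, mu, sigma, n, reps, seed=0):
    """Averages of n draws, centered at mu and scaled by sigma / sqrt(n)."""
    rng = random.Random(seed)
    scale = sigma / math.sqrt(n)
    return [
        (statistics.fmean(draw(rng) for _ in range(n)) - mu) / scale
        for _ in range(reps)
    ]

# Assumed example: exponential with rate 1 (mu = 1, sigma = 1).
for n in (2, 10, 50):
    z = standardized_means(lambda rng: rng.expovariate(1.0), 1.0, 1.0, n, reps=4000)
    print(n, round(ks_distance_to_normal(z), 4))
```

Because the distance is estimated from a finite simulation, it includes sampling noise and does not shrink all the way to zero as n grows.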
Conclusions • Skewness is difficult to shake • Tail probabilities are fairly accurate for small sample sizes • Traditional recommendation is small for many common distributions
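The first conclusion is consistent with a standard result on how the shape of an average changes with n. For i.i.d. variables with skewness γ₁ and kurtosis β₂ (assumed notation, with the needed moments finite), the average X̄ₙ satisfies:

```latex
\text{skewness}(\bar{X}_n) = \frac{\gamma_1}{\sqrt{n}},
\qquad
\text{kurtosis}(\bar{X}_n) - 3 = \frac{\beta_2 - 3}{n}
```

Skewness decays only at rate 1/√n, while excess kurtosis decays at rate 1/n, so asymmetry persists at sample sizes where the tail weight is already close to normal.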