Learn about the Normal Distribution, a key concept in statistics, and how to calculate probabilities using Z-scores and standard deviations. Discover how to transform any normal distribution into a standard normal distribution for easier analysis.
Chapter 4: Continuous Random Variables and Probability Distributions
• 4.1 - Probability Density Functions
• 4.2 - Cumulative Distribution Functions and Expected Values
• 4.3 - The Normal Distribution
• 4.4 - The Exponential and Gamma Distributions
• 4.5 - Other Continuous Distributions
• 4.6 - Probability Plots
~ The Normal Distribution ~ (a.k.a. “The Bell Curve”)
• Symmetric, unimodal
• Models many (but not all) natural systems
• Mathematical properties make it useful to work with
X ~ N(μ, σ), with mean μ and standard deviation σ
[Figure: bell curve over X, centered at the mean μ, with standard deviation σ; portrait of Johann Carl Friedrich Gauss, 1777-1855]
Standard Normal Distribution Z ~ N(0, 1)
SPECIAL CASE: mean 0, standard deviation 1. Total Area = 1.
The cumulative distribution function (cdf) is denoted by Φ(z) = P(Z ≤ z). It is not expressible in explicit, closed form, but it is tabulated, and it is computable in R via the command pnorm.
Example: Standard Normal Distribution Z ~ N(0, 1)
Find Φ(1.2) = P(Z ≤ 1.2), the area to the left of the “z-score” 1.2 (Total Area = 1).
• Use the included table.
• Use R:
• > pnorm(1.2)
• [1] 0.8849303
P(Z ≤ 1.2) = 0.88493, so P(Z > 1.2) = 1 – 0.88493 = 0.11507.
Note: Because this is a continuous distribution, P(Z = 1.2) = 0, so there is no difference between P(Z > 1.2) and P(Z ≥ 1.2), etc.
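The right-tail area quoted above can be checked directly in R. This is a minimal sketch; the variable names are illustrative, and the only functions used are base R's pnorm with its lower.tail argument.

```r
# Area to the left of z = 1.2 under the standard normal curve
left_area <- pnorm(1.2)

# Area to the right of z = 1.2, computed two equivalent ways
right_area_complement <- 1 - pnorm(1.2)
right_area_upper_tail <- pnorm(1.2, lower.tail = FALSE)

print(left_area)
print(right_area_complement)
print(right_area_upper_tail)
```

Both right-tail computations should agree with the 0.11507 shown on the slide, and the left and right areas should sum to 1.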
Standard Normal Distribution Z ~ N(0, 1)
Why be concerned about this, when most “bell curves” don’t have mean = 0 and standard deviation = 1?
Any normal distribution X ~ N(μ, σ) can be transformed to the standard normal distribution via a simple change of variable: Z = (X – μ) / σ.
Example
POPULATION (Year 2010): Random Variable X = Age at first birth, X ~ N(25.4, 1.5), i.e., μ = 25.4 and σ = 1.5
Question: What proportion of the population had their first child before the age of 27.2 years?
P(X < 27.2) = ?
The x-score = 27.2 must first be transformed to a corresponding z-score: z = (27.2 – 25.4) / 1.5 = 1.2.
P(X < 27.2) = P(Z < 1.2) = 0.88493
• Using R:
• > pnorm(27.2, 25.4, 1.5)
• [1] 0.8849303
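As a check on this example, the sketch below standardizes the x-score by hand and compares the result with the direct pnorm call. It assumes the usual change of variable z = (x – μ) / σ; the variable names are illustrative.

```r
# Parameters from the example: X ~ N(25.4, 1.5), cutoff age 27.2
mu <- 25.4
sigma <- 1.5
x <- 27.2

# Route 1: convert the x-score to a z-score, then use the standard normal cdf
z <- (x - mu) / sigma
p_via_z <- pnorm(z)

# Route 2: let pnorm do the standardization (arguments: value, mean, sd)
p_direct <- pnorm(x, mean = mu, sd = sigma)

print(z)
print(p_via_z)
print(p_direct)
```

The two routes should give the same probability, which is why the table for Z is enough for any normal distribution.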
Standard Normal Distribution Z ~ N(0, 1)
What symmetric interval about the mean 0 contains 95% of the population values? That is, find the “.025 critical values” -z.025 and +z.025 that leave an area of 0.025 in each tail and 0.95 between them.
• Use the included table.
• Use R:
• > qnorm(.025)
• [1] -1.959964
• > qnorm(.975)
• [1] 1.959964
-z.025 = -1.96 and +z.025 = +1.96
X ~ N(μ, σ), here X ~ N(25.4, 1.5)
What symmetric interval about the mean age of 25.4 contains 95% of the population values?
> areas = c(.025, .975)
> qnorm(areas, 25.4, 1.5)
[1] 22.46005 28.33995
22.46 ≤ X ≤ 28.34 yrs
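The same interval can be built from the standard normal critical values by undoing the standardization, x = μ + zσ. The sketch below compares that route with the direct qnorm call; variable names are illustrative.

```r
# Parameters from the example: X ~ N(25.4, 1.5)
mu <- 25.4
sigma <- 1.5
areas <- c(.025, .975)

# Route 1: standard normal critical values, rescaled to the X scale
z_crit <- qnorm(areas)
interval_from_z <- mu + z_crit * sigma

# Route 2: quantiles of N(25.4, 1.5) computed directly
interval_direct <- qnorm(areas, mean = mu, sd = sigma)

print(interval_from_z)
print(interval_direct)
```

Both routes should reproduce the interval from about 22.46 to 28.34 years.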
Standard Normal Distribution Z ~ N(0, 1)
Similarly… What symmetric interval about the mean 0 contains 90% of the population values? That is, find the “.05 critical values” -z.05 and +z.05 that leave an area of 0.05 in each tail and 0.90 between them.
• Use the included table. The cumulative area 0.95 is approximately the average of the tabulated values 0.94950 (at z = 1.64) and 0.95053 (at z = 1.65), so average 1.64 and 1.65 to get 1.645.
• Use R:
• > qnorm(.05)
• [1] -1.644854
• > qnorm(.95)
• [1] 1.644854
-z.05 = -1.645 and +z.05 = +1.645
Standard Normal Distribution Z ~ N(0, 1)
In general… What symmetric interval about the mean 0 contains 100(1 – α)% of the population values?
The “α/2 critical values” -zα/2 and +zα/2 leave an area of α/2 in each tail and 1 – α between them. For example, α = 0.10 gives 1 – α = 0.90 and α/2 = 0.05, with “.05 critical values” -z.05 = -1.645 and +z.05 = +1.645.
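The general case can be wrapped in a small helper. In the sketch below, alpha denotes the total area in the two tails, so the interval holds 100(1 – alpha)% of the values; critical_z is a hypothetical name introduced here for illustration, not a built-in R function.

```r
# Hypothetical helper: upper critical value that leaves alpha / 2 in each tail
critical_z <- function(alpha) {
  qnorm(1 - alpha / 2)
}

z_95 <- critical_z(0.05)  # 95% interval, 0.025 in each tail
z_90 <- critical_z(0.10)  # 90% interval, 0.05 in each tail

# Area between -z and +z should equal 1 - alpha
coverage_95 <- pnorm(z_95) - pnorm(-z_95)
coverage_90 <- pnorm(z_90) - pnorm(-z_90)

print(c(z_95, z_90))
print(c(coverage_95, coverage_90))
```

By symmetry the lower critical value is just the negative of the upper one, so a single qnorm call is enough.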
Normal Approximation to the Binomial Distribution (a continuous distribution approximating a discrete one)
Suppose a certain outcome exists in a population, with constant probability π. We will select a random sample of n individuals, so that the binary “Success vs. Failure” outcome of any individual is independent of the binary outcome of any other individual, i.e., n Bernoulli trials (e.g., coin tosses).
Discrete random variable X = # Successes in sample (0, 1, 2, 3, …, n)
P(Success) = π, P(Failure) = 1 – π
Then X is said to follow a Binomial distribution, written X ~ Bin(n, π), with “probability function” p(x) = C(n, x) π^x (1 – π)^(n – x), x = 0, 1, 2, …, n, where C(n, x) = n! / (x! (n – x)!).
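Example: X ~ Bin(100, .2)
P(X = 10), shown as an area in the original figure:
> dbinom(10, 100, .2)
[1] 0.00336282
P(X ≤ 10), shown as an area in the original figure:
> pbinom(10, 100, .2)
[1] 0.005696381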
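To connect these two commands to the binomial probability function, the sketch below recomputes both values by hand. It assumes the standard binomial formula C(n, x) p^x (1 – p)^(n – x); p_success and the other variable names are illustrative.

```r
# Example values: n = 100 trials, success probability 0.2, x = 10 successes
n <- 100
p_success <- 0.2
x <- 10

# P(X = 10): built-in probability function vs. the formula written out
pmf_builtin <- dbinom(x, n, p_success)
pmf_formula <- choose(n, x) * p_success^x * (1 - p_success)^(n - x)

# P(X <= 10): built-in cdf vs. summing the probability function over 0..10
cdf_builtin <- pbinom(x, n, p_success)
cdf_summed <- sum(dbinom(0:x, n, p_success))

print(c(pmf_builtin, pmf_formula))
print(c(cdf_builtin, cdf_summed))
```

dbinom gives the probability of exactly x successes, while pbinom accumulates those probabilities from 0 up to x.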
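Therefore, if…
• X ~ Bin(n, π) with nπ ≥ 15 and n(1 – π) ≥ 15,
• then… X is approximately N(nπ, √(nπ(1 – π))).
That is… the “Sampling Distribution” of the sample proportion X / n is approximately N(π, √(π(1 – π) / n)).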
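A quick numerical comparison shows how close the approximation is. The sketch below assumes the standard normal approximation, with mean n·p and standard deviation sqrt(n·p·(1 – p)), and reuses n = 100 and success probability 0.2; the cutoffs and variable names are chosen here for illustration only.

```r
# Example values: n = 100 trials, success probability 0.2
n <- 100
p_success <- 0.2

# Rule-of-thumb conditions for using the approximation
conditions_met <- (n * p_success >= 15) && (n * (1 - p_success) >= 15)

# Mean and standard deviation of the approximating normal distribution
approx_mean <- n * p_success
approx_sd <- sqrt(n * p_success * (1 - p_success))

# Compare exact binomial and approximate normal values of P(X <= cutoff)
cutoffs <- c(10, 15, 20, 25, 30)
exact <- pbinom(cutoffs, n, p_success)
approx <- pnorm(cutoffs, mean = approx_mean, sd = approx_sd)

print(conditions_met)
print(data.frame(cutoffs, exact, approx, difference = exact - approx))
```

The remaining differences come from replacing a discrete count with a continuous curve, and they tend to be largest near the center of the distribution.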
Classical Continuous Probability Distributions
• Normal distribution
• Log-Normal ~ X is not normally distributed (e.g., skewed), but Y = “logarithm of X” is normally distributed
• Student’s t-distribution ~ Similar to the normal distribution, more flexible
• F-distribution ~ Used when comparing multiple group means
• Chi-squared distribution ~ Used extensively in categorical data analysis
• Others for specialized applications ~ Gamma, Beta, Weibull…