400 likes | 420 Views
Chapter 2: Modeling Distributions of Data. Section 2.2 Normal Distributions. The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE. Normal Distributions. How is this done mathematically?. Symmetrical bell-shaped ( unimodal ) density curve Above the horizontal axis
E N D
Chapter 2: Modeling Distributions of Data Section 2.2 Normal Distributions The Practice of Statistics, 4th edition - For AP* STARNES, YATES, MOORE
Normal Distributions How is this done mathematically? • Symmetrical bell-shaped (unimodal) density curve • Above the horizontal axis • N(m, s) • The transition points occur at m+s • Probability is calculated by finding the area under the curve • As sincreases, the curve flattens & spreads out • As sdecreases, the curve gets taller and thinner • good descriptions for some distributions of real data
Normal distributions occur frequently. • Length of newborn child • Height • Weight • ACT or SAT scores • Intelligence • Number of typing errors • Chemical processes
Normal Distributions • Normal Distributions • One particularly important class of density curves are the Normal curves, which describe Normal distributions. • All Normal curves are symmetric, single-peaked, and bell-shaped • A Specific Normal curve is described by giving its mean µ and standard deviation σ. Two Normal curves, showing the mean µ and standard deviation σ.
s s A B 6 Do these two normal curves have the same mean? If so, what is it? Which normal curve has a standard deviation of 3? Which normal curve has a standard deviation of 1? YES B A
The 68-95-99.7 Rule - the Empirical rule Although there are many Normal curves, they all have properties in common. • Definition:The 68-95-99.7 Rule (“The Empirical Rule”) • In the Normal distribution with mean µ and standard deviation σ: • Approximately 68% of the observations fall within σ of µ. • Approximately 95% of the observations fall within 2σ of µ. • Approximately 99.7% of the observations fall within 3σ of µ.
68% 71 Suppose that the height of male students at SHS is normally distributed with a mean of 71 inches and standard deviation of 2.5 inches. What is the probability that the height of a randomly selected male student is more than 73.5 inches? 1 - .68 = .32 P(X > 73.5) = 0.16
Purchases at a supermarket on a busy Saturday afternoon fall into a roughly normal distribution with mean $55 and standard deviation $18. On the normal curve below, mark the values for the mean and 1, 2, and 3 standard deviations above and below the mean. Then answer the questions. • How much do the middle 68% of customers purchase? • b) How much do the middle 95% of customers purchase? • c) What percentage of people spends more than $73? • d) What percentage of people spends less than $19? • e) A purchase of $91 corresponds to what percentile of spending?
Since the year 1800, records have been kept on the snowfall in the Philadelphia area which follow roughly a normal distribution. The average snowfall has been 59.9 inches with a standard deviation of 18.6 inches. On the normal curve below, mark the values for the mean and 1, 2, and 3 standard deviations above and below the mean. Then answer the questions. • What percentage of winters get less than 78.5 inches? b) What percentage of winters get more than 22.7 inches? c) How much snowfall do the middle 95% of winters get? d) How much snowfall do the middle 99.7% of winters get? e) A snowfall of 41.3 inches corresponds to what percentile of snowfall?
Example, p. 113 Normal Distributions The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55). • Sketch the Normal density curve for this distribution. • What percent of ITBS vocabulary scores are less than 3.74? • What percent of the scores are between 5.29 and 9.94?
Standard Normal Distribution • A normal distribution with mean 0 and standard deviation 1, is called the standard (or standardized) normal distribution.
Normal Distributions • The Standard Normal Distribution • All Normal distributions are the same if we measure in units of size σ from the mean µ as center. Definition: The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable has the standard Normal distribution, N(0,1).
Column labeled 0.06 Row labeled 0.4 P(z < 0.46) = 0.6772 Using the Normal Tables • Find P(z < 0.46)
Column labeled 0.04 P(z < -2.74) = 0.0031 Row labeled -2.7 Using the Normal Tables • Find P(z < -2.74)
Example, p. 117 Normal Distributions • Finding Areas Under the Standard Normal Curve Find the proportion of observations from the standard Normal distribution that are between -1.25 and 0.81. Can you find the same proportion using a different approach? 1 - (0.1056+0.2090) = 1 – 0.3146 = 0.6854
P(z < 1.83) = 0.9664 (b) P(z > 1.83) = 1 – P(z < 1.83) = 1 – 0.9664 = 0.0336 Sample Calculations Using the Standard Normal Distribution Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfy each of the following:
(d) P(z > -1.83) = 1 – P(z < -1.83) = 1 – 0.0336= 0.9664 c) P(z < -1.83) = 0.0336 Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfies each of the following:
Symmetry Property • Notice from the preceding examples it becomes obvious that • P(z > z*) = P(z < -z*) P(z > -2.18) = P(z < 2.18) = 0.9854
P(Z<2.34)=0.9904 P(Z<-1.37)=0.0853 P(-1.37 < z < 2.34) = 0.9904 - 0.0853 = 0.9051 Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfies -1.37 < z < 2.34, that is find P(-1.37 < z < 2.34).
P(Z<1.61)=0.9463 P(Z<.54)=0.7054 P(0.54 < z < 1.61) = 0.9463 - 0.7054 = 0.2409 Using the standard normal tables, find the proportion of observations (z values) from a standard normal distribution that satisfies 0.54 < z < 1.61, that is find P(0.54 < z < 1.61).
Using the standard normal tables, in each of the following, find the z values that satisfy : (a) The point z with 98% of the observations falling below it. The closest entry in the table to 0.9800 is 0.9798 corresponding to a z value of 2.05
Using the standard normal tables, in each of the following, find the z values that satisfy : (b) The point z with 90% of the observations falling above it. The closest entry in the table to 0.1000 is 0.1003 corresponding to a z value of -1.28
Will my calculator do any of this normal stuff? • Normalpdf– use for graphing ONLY • Normalcdf – will find probability of area from lower bound to upper bound • Invnorm (inverse normal) – will find z-score for probability
Strategies for finding probabilities or proportions in normal distributions • State the probability statement • Draw a picture • Calculate the z-score • Look up the probability (proportion) in the table
Normalcdf(-999, 220, 200, 15)= .9088 The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours? Write the probability statement Draw & shade the curve P(X < 220) = .9082 Look up z-score in table or calculator Calculate z-score
The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours? P(X>220) = 1 - .9082 = .0918 Normalcdf(220, 999, 200, 15)= .0912
The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%? Look up in table 0.95 to find z- score P(X > ?) = .05 .95 .05 1.645 invNorm(.95, 200, 15) = 224.673
The heights of the female students at SHS are normally distributed with a mean of 65 inches. What is the standard deviation of this distribution if 18.5% of the female students are shorter than 63 inches? What is the z-score for the 63? P(X < 63) = .185 -0.9 63
When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards? Normal Distributions • Normal Distribution Calculations Normalcdf(.13, 2.63, 0, 1)= .4440 Normalcdf(305, 325, 304, 8)= .4459 Using Table A, we can find the area to the left of z=2.63 and the area to the left of z=0.13. 0.9957 – 0.5517 = 0.4440. About 44% of Tiger’s drives travel between 305 and 325 yards.
Ways to Assess Normality Use graphs (dotplots, boxplots, or histograms) Normal probability (quantile) plot
Normal Distributions • Assessing Normality • The Normal distributions provide good models for some distributions of real data. Many statistical inference procedures are based on the assumption that the population is approximately Normally distributed. Consequently, we need a strategy for assessing Normality. • Plot the data. • Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped. • Check whether the data follow the 68-95-99.7 rule. • Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to the 68%, 95%, and 99.7% targets for a Normal distribution.
Normal Probability (Quantile) plots The observation (x) is plotted against known normal z-scores If the points on the quantile plot lie close to a straight line, then the data is normally distributed Deviations on the quantile plot indicate nonnormal data Points far away from the plot indicate outliers Vertical stacks of points (repeated observations of the same number) is calledgranularity
Why are these regions not the same width? Consider a random sample with n = 5. To find the appropriate z-scores for a sample of size 5, divide the standard normal curve into 5 equal-area regions.
These would be the z-scores (from the standard normal curve) that we would use to plot our data against. Why is the median not in the “middle” of each region? Consider a random sample with n = 5. Next – find the median z-score for each region. -1.28 0 1.28 -.524 .524
Normal Scores Widths of Contact Windows Suppose we have the following observations of widths of contact windows in integrated circuit chips: 3.21 2.49 2.94 4.38 4.02 3.62 3.30 2.85 3.34 3.81 Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set & so on What should happen if our data set is normally distributed? Let’s construct a normal probability plot. The values of the normal scores depend on the sample size n. The normal scores when n = 10 are below: -1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539
Are these approximately normally distributed? 50 48 54 47 51 52 46 53 52 51 48 48 54 55 57 45 53 50 47 49 50 56 53 52 What is this called? The normal probability plot is approximately linear, so these data are approximately normal. Both the histogram & boxplot are approximately symmetrical, so these data are approximately normal.
Normal Distributions • Normal Probability Plots • Most software packages can construct Normal probability plots. These plots are constructed by plotting each observation in a data set against its corresponding percentile’s z-score. Interpreting Normal Probability Plots If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.