Chapter 6: What’s Normal? Stats in Your World BOCK & MARIANO
Ch. 6: What’s Normal? Learning Objectives After this chapter, you should be able to… • MEASURE position using percentiles • INTERPRET cumulative relative frequency graphs • MEASURE position using z-scores • TRANSFORM data • DEFINE and DESCRIBE density curves
Ch. 6: What’s Normal? Learning Objectives (cont’d) • DESCRIBE and APPLY the 68-95-99.7 Rule • DESCRIBE the standard Normal distribution • PERFORM Normal distribution calculations • ASSESS Normality
Describing Location in a Distribution • Measuring Position: Percentiles • One way to describe the location of a value in a distribution is to tell what percent of observations are less than it. Definition: The pth percentile of a distribution is the value with p percent of the observations less than it. Example: Deja earned a score of 86 on her test. How did she perform relative to the rest of the class? The stemplot shows the 25 test scores:
6 | 7
7 | 2334
7 | 5777899
8 | 00123334
8 | 569
9 | 03
Her score was greater than 21 of the 25 observations. Since 21 of the 25, or 84%, of the scores are below hers, Deja is at the 84th percentile in the class’s test score distribution. If two observations have the same value, they are at the same percentile: calculate the percent of the values in the distribution that are below both values. Percentiles should be whole numbers, so if you get a decimal, round to the nearest integer. • What percentile is the person who earned a 72? • What percentile is the person who earned a 93? • What is the percentile of the two students who earned an 80?
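The percentile count for Deja can be checked with a few lines of Python. This is only a sketch: the list `scores` is read off the stemplot (stem = tens digit, leaf = ones digit), and the helper name `percentile_of` is made up for illustration.

```python
# Test scores read off the stemplot (stem = tens digit, leaf = ones digit).
scores = [67,
          72, 73, 73, 74,
          75, 77, 77, 77, 78, 79, 79,
          80, 80, 81, 82, 83, 83, 83, 84,
          85, 86, 89,
          90, 93]

def percentile_of(value, data):
    """Percent of observations strictly below `value`, rounded to a whole number."""
    below = sum(1 for x in data if x < value)
    return round(100 * below / len(data))

print(percentile_of(86, scores))  # 84, matching the slide
```

The same function can be used to check your answers for the scores of 72, 93, and 80.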
Describing Location in a Distribution • Measuring Position: Percentiles Just knowing that Deja is 6 points above average doesn’t tell you much about her location in the distribution. Depending on the spread of the distribution, Deja might be just barely above average or really far above average. This is why we need to incorporate a measure of spread (standard deviation) to have a really good understanding of how far above the mean she is.
Describing Location in a Distribution • Measuring Position: Percentiles • The stemplot below shows the number of wins for each of the 30 Major League Baseball teams in 2009. • Find the percentiles for the following teams: • The Colorado Rockies, who won 92 games. • The New York Yankees, who won 103 games. • The Kansas City Royals and Cleveland Indians, who both won 65 games.
Describing Location in a Distribution • Measuring Position: z-Scores • A z-score tells us how many standard deviations from the mean an observation falls, and in what direction. Definition: If x is an observation from a distribution that has known mean and standard deviation, the standardized value of x is: z = (x – mean) / (standard deviation). A standardized value is often called a z-score. Deja earned a score of 86 on her test. The class mean is 80 and the standard deviation is 6.07. What is her standardized score?
Describing Location in a Distribution • Using z-scores for Comparison We can use z-scores to compare the position of individuals in different distributions. Example: Deja earned a score of 86 on her statistics test. The class mean was 80 and the standard deviation was 6.07. She earned a score of 82 on her chemistry test. The chemistry scores had a fairly symmetric distribution with a mean of 76 and a standard deviation of 4. On which test did Deja perform better relative to the rest of her class?
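A minimal sketch of the comparison, using the usual standardization z = (x – mean)/(standard deviation). The function name `z_score` is just an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

stats_z = z_score(86, mean=80, sd=6.07)  # statistics test
chem_z = z_score(82, mean=76, sd=4)      # chemistry test

print(round(stats_z, 2), round(chem_z, 2))  # 0.99 1.5
```

Both scores are 6 points above the class mean, but the chemistry score sits 1.5 standard deviations above its mean versus about 0.99 for statistics, so the chemistry result is the stronger one relative to the class.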
Describing Location in a Distribution • Using z-scores for Comparison We can use z-scores to compare the position of individuals in different distributions. • Now go back to Slide 6 and find and interpret the z-scores for the following teams: • The New York Yankees with 103 wins. • The New York Mets, with 70 wins.
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING Ms. Raskin’s Statistics class recorded their heights on a dotplot. • Find and interpret the z-score of the student who is 65 inches tall. • Find and interpret the z-score of the student who is 74 inches tall. • The student who is 76 inches tall is on the basketball team. His height translates to a z-score of -0.85 in the team’s height distribution. The standard deviation of basketball team members is 3.5 inches. What is the mean of the team members’ heights?
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING On one test your class achieved an average grade of 80 with a standard deviation of 8 points. • If you got a 96, what is your z-score? • Your best friend got a 76. What is her z-score? • What test grade has a z-score of +1.5? • Ms. Raskin calls home whenever a student’s z-score is worse than -2.0. What grade earns that phone call?
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING In the last chapter, we made boxplots for the number of calories and the fiber content in 23 kinds of Kellogg’s cereals. Now think about the sugar content. Those cereals average 7.6 grams of sugar per serving with a standard deviation of 4.5 grams. • Find the z-scores for the following cereals and describe what the z-score tells you about that cereal: • Frosted Flakes: 11g of sugar • Apple Jacks: 14g of sugar • Crispix: 3g of sugar • The z-score for Honey Smacks’ sugar content is a very high 3.87! How many grams of sugar are in one serving? • Product 19 is very low in sugar, with a z-score of -0.8. How many grams of sugar are in a serving of this cereal?
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING The calorie content for 23 varieties of Kellogg’s cereals averages 109 calories per serving with a standard deviation of 22.2 calories. The mean fiber content for these cereals is 2.7 grams per serving with a standard deviation of 3.2 grams. A serving of Kellogg’s All-Bran with Extra Fiber has a very low 50 calories and a very high 14 grams of fiber. Which is more remarkable – the calorie content or the fiber content? Explain.
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING Ms. Raskin has just announced that the lower of your two test scores will be dropped (JK!) You got a 90 on Test 1 and an 80 on Test 2. You’re all set to drop the 80 until she announces that she grades on a curve. She standardizes the scores in order to decide which is the lower one. If the mean on the first test was 88 with a standard deviation of 4 and the mean on the second was 75 with a standard deviation of 5, which one will be dropped? Is this fair?
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING The men’s combined skiing event in the winter Olympics consists of two races: a downhill and a slalom. Times for the two events are added together, and the skier with the lowest total time wins. In the 2006 Winter Olympics, the mean slalom time was 94.2714 seconds with a standard deviation of 1.8356 seconds. Ted Ligety of the United States, who won the gold medal with a combined time of 189.35 seconds, skied the slalom in 87.93 seconds and the downhill in 101.42 seconds. On which race did he do better compared with the competition?
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING A single-season home run record for Major League Baseball has been set just 3 times since Babe Ruth hit 60 home runs in 1927. In an absolute sense, Barry Bonds had the best performance of these four players, since he hit the most home runs in a single season. However, in a relative sense, this may not be true. Baseball historians suggest that hitting a home run has been easier in some eras than others. This is due to many factors, including quality of batters, quality of pitchers, hardness of the baseball, dimensions of the parks, and possible use of performance-enhancing drugs. To make a fair comparison, we should see how these performances rate relative to those of other hitters during the same year.
Describing Location in a Distribution • CHECK YOUR UNDERSTANDING Compute the standardized scores for each performance. Which player had the most outstanding performance relative to his peers?
Normal Distributions • That z-score joint: • A z-score gives us an indication of how unusual a value is because it tells us how far it is from the mean. • A data value that sits right at the mean has a z-score of 0. • A z-score of 1 means that the data value is 1 standard deviation above the mean. • A z-score of -1 means that the data value is 1 standard deviation below the mean.
Normal Distributions • What happens when a z-score is REALLY BIG? • How far from 0 does a z-score have to be to be interesting or unusual? There is no universal standard, but the farther a z-score is from 0 (in either direction), the more unusual it is. There is, however, a model that shows up over and over in Statistics: the Normal model. “All models are wrong, but some are useful!” - George Box, statistician
Normal Distributions • Normal Distributions • You may have heard of “bell-shaped curves”. Statisticians call them Normal models. Normal models are appropriate for distributions whose shapes are unimodal and roughly symmetric. • All Normal curves are symmetric, single-peaked, and bell-shaped. • A specific Normal curve is described by giving its mean µ and standard deviation σ. [Figure: two Normal curves, showing the mean µ and standard deviation σ.]
Normal Distributions • Normal Distributions • Definition: • A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ. • The mean of a Normal distribution is the center of the symmetric Normal curve. • The standard deviation is the distance from the center to the change-of-curvature points on either side. • We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ). Normal distributions are good descriptions for some distributions of real data (test scores, characteristics of biological populations, etc.). Normal distributions are good approximations of the results of many kinds of chance outcomes. Many statistical inference procedures are based on Normal distributions.
The 68-95-99.7 Rule Normal Distributions Although there are many Normal curves, they all have properties in common. • Definition: The 68-95-99.7 Rule • In the Normal distribution with mean µ and standard deviation σ: • Approximately 68% of the observations fall within 1σ of µ. • Approximately 95% of the observations fall within 2σ of µ. • Approximately 99.7% of the observations fall within 3σ of µ.
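The three percentages in the rule are rounded areas under the standard Normal curve. As a quick check (a sketch using Python's standard-library `statistics.NormalDist`, not something from the slides), the areas within 1, 2, and 3 standard deviations of the mean can be computed directly:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
areas = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sd: {area:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

So "68-95-99.7" is a convenient rounding, which is why the rule says "approximately".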
Example Normal Distributions The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55). • Sketch the Normal density curve for this distribution. • What percent of ITBS vocabulary scores are less than 3.74? • What percent of the scores are between 5.29 and 9.94?
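One way to reason through this example is to express each cutoff as a whole number of standard deviations from the mean and then combine the 68-95-99.7 areas by symmetry. The sketch below does that arithmetic; the helper names are illustrative, and the areas are the rule's approximations rather than exact values.

```python
mu, sigma = 6.84, 1.55  # N(6.84, 1.55) from the example

# Approximate percent of observations within k standard deviations (the rule).
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}

def sds_from_mean(x):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def percent_in_one_tail(k):
    """Percent beyond k standard deviations on ONE side, by symmetry."""
    return (100 - WITHIN[k]) / 2

# 3.74 is 2 standard deviations below the mean, so about 2.5% of scores are lower.
print(round(sds_from_mean(3.74), 2), percent_in_one_tail(2))  # -2.0 2.5

# 5.29 is 1 sd below and 9.94 is 2 sd above: remove both tails from 100%.
between = 100 - percent_in_one_tail(1) - percent_in_one_tail(2)
print(between)  # 81.5
```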
Example Normal Distributions The distribution of heights of young women aged 18 – 24 is approximately Normal: N(64.5, 2.5). • Sketch the Normal density curve for this distribution. • What percent of young women have heights greater than 67 inches? • What percent of young women have heights between 67 and 72 inches?
CHECK YOUR UNDERSTANDING Normal Distributions • The average clean-up time for a crew of a medium-sized firm is 84.0 hours and the standard deviation is 6.8 hours. Assuming a Normal distribution, within what time interval will 95% of the clean-up times fall? • Over a long period of time, a farmer notes that the eggs produced by his chickens have a mean weight of 60g and a standard deviation of 15g. If eggs are classified by weight and “small” eggs are those having a weight of less than 45g, what percent of the farmer’s eggs will be classified as “small”? • Eggs are classified as “jumbo” if their weight is 90g or more. What percent of the farmer’s eggs will be “jumbo”?
CHECK YOUR UNDERSTANDING Normal Distributions • As a group, the Dutch are among the tallest people in the world. The average Dutch man is 184cm tall – just over 6 feet! The standard deviation of men’s heights is about 8cm. Assuming the distribution is approximately Normal, use the 68-95-99.7 rule to sketch a model for the heights of Dutch men. Label the axis clearly and indicate appropriate percentages. Based on this model, what percentage of all Dutch men should be over 2 meters (6’6”) tall?
CHECK YOUR UNDERSTANDING Normal Distributions • Let’s say it takes you 20 minutes, on average, to drive to school, with a standard deviation of 2 minutes. Suppose a Normal model is appropriate for the distributions of driving times. Based on the 68-95-99.7 Rule: • About how often will it take you between 18 and 22 minutes to get to school? • How often will you arrive at school in less than 22 minutes? • How often will it take you more than 24 minutes?
CHECK YOUR UNDERSTANDING Normal Distributions • In the 2006 Winter Olympics men’s combined event, Jean-Baptiste Grange of France skied the slalom in 88.46 seconds – about 1 standard deviation faster than the mean. If a Normal model is useful in describing slalom times, about how many of the 35 skiers finishing the event would you expect to have skied the slalom faster than Jean-Baptiste?
CHECK YOUR UNDERSTANDING Normal Distributions • Scores on the Wechsler Adult Intelligence Scale for the 20-34 year old age group are normally distributed with a mean of 110 and standard deviation of 25. Use the 68-95-99.7 rule to determine what percent of people score… • Above 110 • Above 150 • Below 85 • Below 185
CHECK YOUR UNDERSTANDING Normal Distributions • EPA fuel economy estimates for automobiles tested recently predicted a mean of 24.8mpg and a standard deviation of 6.2mpg for highway driving. Assume a Normal model to determine… • The range of gas mileage for the central 68% of cars • The percent of autos getting more than 31mpg • The percent of autos getting between 31 and 37.2mpg • Why you shouldn’t use these numbers to predict in-town gas mileage.
Normal Distributions • The Standard Normal Distribution • All Normal distributions are the same if we measure in units of size σ from the mean µ as center. We can standardize these data by changing to z-scores: z = (x - µ)/ σ. If the variable we standardize has a Normal distribution, then so does the new variable z. This new distribution is called the Standard Normal Distribution.
Normal Distributions • The Standard Normal Distribution • All Normal distributions are the same if we measure in units of size σ from the mean µ as center. Definition: The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable z = (x - µ)/σ has the standard Normal distribution, N(0,1).
Normal Distributions • The Standard Normal Table Because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table. • Definition: The Standard Normal Table • Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z. Suppose we want to find the proportion of observations from the standard Normal distribution that are less than 0.81. We can use Table A: P(z < 0.81) = 0.7910
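If Table A is not at hand, its entries can be reproduced numerically. The sketch below uses the standard identity relating the Normal cumulative area to the error function, Φ(z) = ½(1 + erf(z/√2)); the function name `phi` is an illustrative choice.

```python
import math

def phi(z):
    """Area under the standard Normal curve to the left of z (a Table A entry)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(f"{phi(0.81):.4f}")  # 0.7910, the Table A entry for z = 0.81
```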
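Example Normal Distributions • Finding Areas Under the Standard Normal Curve Find the proportion of observations from the standard Normal distribution that are between -1.25 and 0.81. The area to the left of z = 0.81 is 0.7910 and the area to the left of z = -1.25 is 0.1056, so the area between them is 0.7910 – 0.1056 = 0.6854. Can you find the same proportion using a different approach? Subtract both tails from the total area: 1 – (0.1056 + 0.2090) = 1 – 0.3146 = 0.6854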
Example Normal Distributions • Finding Areas Under the Standard Normal Curve Find the proportion of observations from the standard Normal distribution that are greater than -1.78.
Example Normal Distributions • Finding Areas Under the Standard Normal Curve Find the proportion of observations in a Normal distribution that are more than 1.53 standard deviations above the mean. The table entry 0.9370 is for the area to the left of z = 1.53. The area to the right of z = 1.53 is 1 – 0.9370 = 0.0630.
Example Normal Distributions • Finding Areas Under the Standard Normal Curve Find the proportion of observations in a Normal distribution that are between -0.58 and 1.79. Area to the left of z = 1.79 is 0.9633. Area to the left of z = -0.58 is 0.2810. Area between z = -0.58 and z = 1.79 is 0.9633 – 0.2810 = 0.6823.
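The area examples on these slides all reduce to three moves: an area to the left (the table entry), an area to the right (1 minus the table entry), and an area between two z-scores (a difference of table entries). A small sketch with Python's standard-library `NormalDist` standing in for Table A; the helper names are illustrative:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def area_left(z):
    return Z.cdf(z)

def area_right(z):
    return 1 - Z.cdf(z)

def area_between(z_low, z_high):
    return Z.cdf(z_high) - Z.cdf(z_low)

print(f"{area_between(-1.25, 0.81):.4f}")  # 0.6854
print(f"{area_right(-1.78):.4f}")          # 0.9625
print(f"{area_right(1.53):.4f}")           # 0.0630
print(f"{area_between(-0.58, 1.79):.4f}")  # 0.6823
```

Answers computed this way can differ from table-based answers in the last decimal place, because the table rounds each entry to four decimals before subtracting.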
CHECK YOUR UNDERSTANDING Normal Distributions • Remember those tall Dutch men? Their mean height was 184cm and the standard deviation was 8cm. Answer each of these questions by sketching a Normal model, shading the appropriate area, finding the z-scores, and using the table to determine the percentage. • What percent of Dutch men should be less than 190cm tall? • What percent of Dutch men should be between 170 and 180cm tall? • What fraction of Dutch men should be over 198cm tall?
Normal Distributions • Working Backward: Finding z-scores from Percentiles Sometimes we start with areas and need to find the corresponding z-score or even the original data value. What z-score represents the first quartile in a Normal Model?
Normal Distributions • Working Backward: Finding z-scores from Percentiles For each, sketch the standard Normal distribution, shade the area described, and find the z-score cutpoints. • The lowest 40% of the distribution • The highest 30% of the distribution • The highest 2% of the distribution • The middle 30% of the distribution
Normal Distributions • Working Backward: Finding z-scores from Percentiles To find a z-score, and then a data value, from a percentile: • Look for the percentile (as a decimal) in the body of the z-table and read off the corresponding z-score. • Plug the z-score, the mean, and the standard deviation into the formula: z = (x – μ)/σ • Solve for x.
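Reading the table "backward" is an inverse lookup: given an area to the left, find the z-score. As a sketch, Python's standard-library `NormalDist.inv_cdf` does the same job; the wrapper name `z_for_percentile` is made up for illustration.

```python
from statistics import NormalDist

def z_for_percentile(p):
    """z-score that has proportion p (0 < p < 1) of the standard Normal curve to its left."""
    return NormalDist().inv_cdf(p)

# First quartile: 25% of the area lies to the left.
print(round(z_for_percentile(0.25), 2))  # -0.67
```

For an area described from the right (for example "the highest 30%"), convert it to an area on the left first (here 0.70) before looking it up.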
Normal Distributions • Going Back to those Dutch men Let’s think about the Normal model for the heights of Dutch men one more time. Remember that their mean height was 184cm and the standard deviation was 8cm. Answer each of these questions by sketching a Normal model, shading the appropriate area, finding the cutpoint z-score, and then determining the cutpoint height. How tall are the tallest 10% of all Dutch men? How tall are the shortest 20% of Dutch men? How tall are the middle 50% of Dutch men?
Normal Distributions • Normal Distribution Calculations How to Solve Problems Involving Normal Distributions • State: Express the problem in terms of the observed variable x. • Plan: Draw a picture of the distribution and shade the area of interest under the curve. • Do: Perform calculations. • Standardize x to restate the problem in terms of a standard Normal variable z. • Use Table A and the fact that the total area under the curve is 1 to find the required area under the standard Normal curve. • Conclude: Write your conclusion in the context of the problem.
Normal Distributions • Normal Distribution Calculations When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards? Standardizing gives z = (305 – 304)/8 = 0.13 and z = (325 – 304)/8 = 2.63. Using Table A, the area to the left of z = 2.63 is 0.9957 and the area to the left of z = 0.13 is 0.5517, so the area between them is 0.9957 – 0.5517 = 0.4440. About 44% of Tiger’s drives travel between 305 and 325 yards.
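The Tiger Woods calculation can be cross-checked in software. The sketch below computes the area twice: once directly on N(304, 8), and once the Table A way, using the slide's rounded z-scores 0.13 and 2.63. The variable names are illustrative.

```python
from statistics import NormalDist

drives = NormalDist(mu=304, sigma=8)

# Direct calculation on the original scale (no rounding of z).
direct = drives.cdf(325) - drives.cdf(305)

# Table A style: standardize, round z to two decimals as on the slide, then look up.
std_normal = NormalDist()
table_style = std_normal.cdf(2.63) - std_normal.cdf(0.13)

print(f"direct: {direct:.3f}, table-style: {table_style:.3f}")
```

The two results differ slightly (roughly 0.446 versus 0.444) only because the table method rounds each z-score to two decimal places; the conclusion of "about 44%" to "about 45%" is the same either way.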
Normal Distributions • Normal Distribution Calculations In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour on his first serves: N(115, 6). About what proportion of his first serves would you expect to exceed 120mph? State: Let x = the speed of Nadal’s first serve. The variable x has a Normal distribution with μ = 115 and σ = 6. We want the proportion of first serves with x ≥ 120. Plan: Sketch the Normal curve and shade the area to the right of x = 120 (z = 0.83). Do: Standardize: z = (120 – 115)/6 = 0.83. Looking up a z-score of 0.83 shows us that the area less than z = 0.83 is 0.7967. This means that the area to the right of z = 0.83 is 1 – 0.7967 = 0.2033. Conclude: About 20% of Nadal’s first serves will travel more than 120mph.
Normal Distributions • Normal Distribution Calculations In the 2008 Wimbledon tennis tournament, Rafael Nadal averaged 115 miles per hour on his first serves: N(115, 6). About what proportion of his first serves are between 100 and 110mph? State: Let x = the speed of Nadal’s first serve. The variable x has a Normal distribution with μ = 115 and σ = 6. We want the proportion of first serves with 100 ≤ x ≤ 110. Plan: Sketch the Normal curve and shade the area between x = 100 (z = -2.50) and x = 110 (z = -0.83). Do: Looking up a z-score of -2.50 shows us that the area less than z = -2.50 is 0.0062. Looking up a z-score of -0.83 shows us that the area less than z = -0.83 is 0.2033. Thus, the area between z = -2.50 and z = -0.83 is 0.2033 – 0.0062 = 0.1971. Conclude: About 20% of Nadal’s first serves will travel between 100 and 110mph.
Normal Distributions • Normal Distribution Calculations: Cholesterol in Teenage Boys High levels of cholesterol in the blood increase the risk of heart disease. For 14-year-old boys, the distribution of blood cholesterol is approximately Normal: N(170, 30). What is the first quartile of the distribution of blood cholesterol?
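A sketch of the working-backward steps for this problem: find the z-score with 25% of the area to its left, then unstandardize with x = μ + zσ. It uses Python's standard-library `NormalDist` in place of Table A; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 170, 30  # N(170, 30), cholesterol in mg/dL

# Step 1: z-score with 25% of the standard Normal area to its left.
z_q1 = NormalDist().inv_cdf(0.25)

# Step 2: solve z = (x - mu)/sigma for x.
x_q1 = mu + z_q1 * sigma

print(round(z_q1, 2), round(x_q1, 1))  # -0.67 149.8
```

With the table value z = -0.67 the answer is 170 + (-0.67)(30) = 149.9 mg/dL; the small difference comes from rounding z to two decimals.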
CHECK YOUR UNDERSTANDING Normal Distributions High levels of cholesterol in the blood increase the risk of heart disease. For 14-year-old boys, the distribution of blood cholesterol is approximately Normal: N(170, 30). Cholesterol levels above 240mg/dL may require medical attention. What percent of 14-year-old boys have more than 240mg/dL of cholesterol? What percent of 14-year-old boys have blood cholesterol between 200 and 240 mg/dL?
Normal Distributions • Assessing Normality • The Normal distributions provide good models for some distributions of real data. Many statistical inference procedures are based on the assumption that the population is approximately Normally distributed. Consequently, we need a strategy for assessing Normality. • Plot the data. • Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped. • Check whether the data follow the 68-95-99.7 rule. • Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to the 68%, 95%, and 99.7% targets for a Normal distribution.
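The 68-95-99.7 check is easy to automate. The sketch below applies it to the 25 test scores from the percentile example earlier in the chapter (an illustrative choice of data, not part of this slide), using the sample mean and sample standard deviation.

```python
from statistics import mean, stdev

# The 25 test scores from the earlier stemplot example.
scores = [67, 72, 73, 73, 74, 75, 77, 77, 77, 78, 79, 79,
          80, 80, 81, 82, 83, 83, 83, 84, 85, 86, 89, 90, 93]

def percent_within(data, k):
    """Percent of observations within k sample standard deviations of the mean."""
    m, s = mean(data), stdev(data)
    inside = sum(1 for x in data if abs(x - m) <= k * s)
    return 100 * inside / len(data)

for k in (1, 2, 3):
    print(k, percent_within(scores, k))
# 1 72.0
# 2 92.0
# 3 100.0
```

With only 25 observations, 72%, 92%, and 100% are reasonably close to the 68%, 95%, and 99.7% targets, so this check gives no strong evidence against Normality. It should be read together with a plot of the data.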
Normal Distributions • Normal Probability Plots • Most software packages can construct Normal probability plots. These plots are constructed by plotting each observation in a data set against its corresponding percentile’s z-score. Interpreting Normal Probability Plots: If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are approximately Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
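To make the construction concrete, here is a sketch that computes the points of a Normal probability plot without any plotting library. It assumes one common convention for an observation's percentile, (i – 0.5)/n for the i-th smallest of n values; software packages differ in this detail. The data values are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Return (z, x) pairs: each sorted observation paired with the z-score
    of its percentile. Assumes the plotting position (i - 0.5)/n."""
    n = len(data)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(sorted(data), start=1):
        p = (i - 0.5) / n  # assumed convention; always strictly between 0 and 1
        points.append((std_normal.inv_cdf(p), x))
    return points

# Hypothetical data, for illustration only.
heights = [61, 63, 64, 64, 65, 66, 66, 67, 69, 72]
for z, x in normal_plot_points(heights):
    print(f"{z:6.2f}  {x}")
```

Plotting x against z and checking how closely the points follow a straight line is the visual test described on the slide.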