Understanding Bell-Shaped Curves in Data Analysis

Learn about the significance of bell-shaped curves in data analysis and how they are used to describe various populations. Explore thought-provoking questions and gain insights into standardized scores, percentiles, and the normal distribution.

Presentation Transcript


  1. Lecture 6 Bell Shaped Curves

  2. Thought Question 1: The heights of adult women in the United States follow, at least approximately, a bell-shaped curve. What do you think this means?

  3. Thought Question 2: What does it mean to say that a man’s weight is in the 30th percentile for all adult males?

  4. Thought Question 3: A “standardized score” is simply the number of standard deviations an individual falls above or below the mean for the whole group. Male heights have a mean of 70 inches and a standard deviation of 3 inches. Female heights have a mean of 65 inches and a standard deviation of 2 ½ inches. Thus, a man who is 73 inches tall has a standardized score of 1. What is the standardized score corresponding to your own height?

  5. Thought Question 4: Data sets consisting of physical measurements (heights, weights, lengths of bones, and so on) for adults of the same species and sex tend to follow a similar pattern. The pattern is that most individuals are clumped around the average, with numbers decreasing the farther values are from the average in either direction. Describe what shape a histogram of such measurements would have.

  6. 8.1 Populations, Frequency Curves, and Proportions Move from pictures and shapes of a set of data to … Pictures and shapes for populations of measurements.

  7. Frequency Curves Smoothed-out histogram by connecting tops of rectangles with smooth curve. Frequency curve for population of British male heights. The measurements follow a normal distribution (or a bell-shaped or Gaussian curve). Note: Height of curve set so area under entire curve is 1.

  8. Frequency Curves Not all frequency curves are bell-shaped! Frequency curve for population of dollar amounts of car insurance damage claims. The measurements follow a right skewed distribution. Majority of claims were below $5,000, but there were occasionally a few extremely high claims.

  9. Proportions Recall: Total area under frequency curve = 1 for 100%. Key: Proportion of population of measurements falling in a certain range = area under curve over that range. Mean British Height is 68.25 inches. Area to the right of the mean is 0.50. So about half of all British men are 68.25 inches or taller. Tables will provide other areas under normal curves.

  10. 8.2 The Pervasiveness of Normal Curves Many populations of measurements follow approximately a normal curve: • Physical measurements within a homogeneous population – heights of male adults. • Standard academic tests given to a large group – SAT scores.

  11. Normal Distribution Probability Probability is area under curve!

  12. Normal Distribution • The height of a normal density curve at any point x is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
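
The density formula and the "area under the entire curve is 1" property can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names, the trapezoid-rule integration, and the integration range of 8 standard deviations are illustrative assumptions.

```python
import math

def normal_density(x, mean, sd):
    """Height of the normal curve at x for the given mean and standard deviation."""
    coeff = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * sd ** 2))

def area_under_curve(lo, hi, mean, sd, steps=10_000):
    """Trapezoid-rule approximation of the area under the curve between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (normal_density(lo, mean, sd) + normal_density(hi, mean, sd))
    for i in range(1, steps):
        total += normal_density(lo + i * width, mean, sd)
    return total * width

# Standard normal curve (mean 0, standard deviation 1), chosen only for illustration.
peak = normal_density(0.0, 0.0, 1.0)                # height of the curve at the mean
total_area = area_under_curve(-8.0, 8.0, 0.0, 1.0)  # should be very close to 1
right_half = area_under_curve(0.0, 8.0, 0.0, 1.0)   # should be very close to 0.5

print(f"peak={peak:.4f} total={total_area:.6f} right_half={right_half:.6f}")
```

The area to the right of the mean comes out at about 0.5 whatever mean and standard deviation are passed in, which is the point made on the Proportions slide.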

  13. Importance of Normal Distribution • 1. Describes Many Random Processes or Continuous Phenomena • 2. Basis for Classical Statistical Inference

  14. Examples with approximate Normal distributions • Height • Weight • IQ scores • Standardized test scores • Body temperature • Repeated measurement of same quantity • These distributions which are like generalised relative frequency histograms can take many different shapes, some symmetrical some skewed. • There is one shape however that crops up all through the natural world and that is …

  15. The Normal Distribution is Symmetric. • There are many different Normal curves; some are fat, some are thin. • Some are centred at 0, some at 1, some at 5, etc. • Each normal curve can be uniquely identified by two parameters. • The Mean and the Standard Deviation • Once you know the mean and the standard deviation for a Normal curve, it is possible to draw the curve. • Normal curves are centred at the Mean, and the Standard Deviation describes how spread out they are.

  16. A Normal Frequency Curve for the Population of SAT scores

  17. The area under a Normal curve to the left of the mean is .5. This indicates that the probability that something which is normally distributed is less than its mean is .5. • The area under the curve to the left of any point A on the X axis represents the probability that a Normal variable is less than A.

  18. 8.3 Percentiles and Standardized Scores Your percentile = the percentage of the population that falls below you. • Finding percentiles for normal curves requires: • Your own value. • The mean for the population of values. • The standard deviation for the population. Then any bell curve can be standardized so one table can be used to find percentiles.

  19. Percentiles • Example: Have you ever wondered what percentage of the population (of your gender) is taller than you are? • Your percentile in a population represents the position of your measurement in comparison with everyone else’s. • It gives the percentage of the population that fall below you. For example, if you are in the 98th percentile, it means that 98% of the population falls below you and only 2% is above you. • Your percentile value is easy to find if the population of values has an approximate bell shape. • Although there are an unlimited number of potential bell-shaped curves, each one can be completely determined once you know the mean and standard deviation of the population. • In addition, each curve can be “standardized” in a way such that the same table can be used to find percentiles for any of them.

  20. Infinite Number of Tables Normal distributions differ by mean & standard deviation. Each distribution would require its own table. That’s an infinite number!

  21. Standardize the Normal Distribution [Figure: a normal distribution converted to the standardized normal distribution] One table!

  22. The standardized score is often called the z-score. Once you know the z-score for an observed value, you can easily find the percentile corresponding to the observed value by using the table that gives the percentiles for a normal distribution with mean 0 and standard deviation 1. A normal curve with a mean of 0 and a standard deviation of 1 is called a standard normal curve. It is the curve that results when any normal curve is converted to standardized scores.
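
The claim that one standard normal table serves every normal curve can be illustrated with a short sketch. It uses Python's `statistics.NormalDist` as a stand-in for a printed table, which is an assumption of this example rather than something the slides use; the function names are also hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_below_direct(x, mean, sd):
    """Area to the left of x under the normal curve with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)

def area_below_standardized(x, mean, sd):
    """Same area, found by converting x to a z-score and using only the standard normal curve."""
    z = (x - mean) / sd
    return standard_normal.cdf(z)

# Three different normal curves from the slides; each value sits 1 standard deviation above its mean.
examples = [(116, 100, 16), (73, 70, 3), (600, 500, 100)]
for x, mean, sd in examples:
    direct = area_below_direct(x, mean, sd)
    via_z = area_below_standardized(x, mean, sd)
    print(f"x={x}: direct={direct:.4f}, standardized={via_z:.4f}")
```

All three cases give the same area because each observed value has the same z-score, so a single table indexed by z is enough.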

  23. Standardized Scores Standardized Score (standard score or z-score): (observed value – mean) / standard deviation. IQ scores have a normal distribution with a mean of 100 and a standard deviation of 16. • Suppose your IQ score was 116. • Standardized score = (116 – 100)/16 = +1 • Your IQ is 1 standard deviation above the mean. • Suppose your IQ score was 84. • Standardized score = (84 – 100)/16 = –1 • Your IQ is 1 standard deviation below the mean. A normal curve with mean = 0 and standard deviation = 1 is called a standard normal curve.
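
The standardized-score arithmetic can be written as a one-line function. This is a minimal sketch; the function name `standardized_score` is an illustrative choice, and the inputs are the IQ and male-height figures quoted in the slides.

```python
def standardized_score(observed, mean, sd):
    """Number of standard deviations the observed value lies above (+) or below (-) the mean."""
    return (observed - mean) / sd

z_high = standardized_score(116, 100, 16)  # IQ of 116 -> +1.0
z_low = standardized_score(84, 100, 16)    # IQ of 84  -> -1.0
z_man = standardized_score(73, 70, 3)      # 73-inch man -> +1.0

print(z_high, z_low, z_man)
```

The sign is kept on purpose: a negative result means the value is below the mean.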

  24. Table 8.1: Proportions and Percentiles for Standard Normal Scores

  25. Finding a Percentile from an observed value: • Find the standardized score = (observed value – mean)/s.d., where s.d. = standard deviation. Don’t forget to keep the plus or minus sign. • Look up the percentile in Table 8.1. • Suppose your IQ score was 116. • Standardized score = (116 – 100)/16 = +1 • Your IQ is 1 standard deviation above the mean. • From Table 8.1 you would be at the 84th percentile. • Your IQ would be higher than that of 84% of the population.
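
The two steps above can be sketched in code, assuming Python's `statistics.NormalDist` is used in place of looking up Table 8.1. The function name is hypothetical, and the software result carries more decimal places than the printed table.

```python
from statistics import NormalDist

def percentile_from_value(observed, mean, sd):
    """Step 1: standardize. Step 2: area to the left of z under the standard normal curve."""
    z = (observed - mean) / sd
    return 100.0 * NormalDist().cdf(z)

iq_percentile = percentile_from_value(116, 100, 16)
print(f"{iq_percentile:.1f}")  # close to the 84th percentile read from the table
```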

  26. Finding an Observed Value from a Percentile: • Look up the percentile in Table 8.1 and find the corresponding standardized score. • Compute observed value = mean + (standardized score)(s.d.), where s.d. = standard deviation. Example 1: Tragically Low IQ “Jury urges mercy for mother who killed baby. … The mother had an IQ lower than 98 percent of the population.” (Scotsman, March 8, 1994, p. 2) • Mother was in the 2nd percentile. • Table 8.1 gives her standardized score = –2.05, or 2.05 standard deviations below the mean of 100. • Her IQ = 100 + (–2.05)(16) = 100 – 32.8 = 67.2 or about 67.
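
The reverse lookup can be sketched the same way. This example assumes `statistics.NormalDist.inv_cdf` as a substitute for reading Table 8.1 backwards; the function name is illustrative.

```python
from statistics import NormalDist

def value_from_percentile(percentile, mean, sd):
    """Find the z-score for the percentile, then convert it back to an observed value."""
    z = NormalDist().inv_cdf(percentile / 100.0)
    return mean + z * sd

z_2nd = NormalDist().inv_cdf(0.02)             # about -2.05, matching the table value
mother_iq = value_from_percentile(2, 100, 16)  # about 67

print(f"z={z_2nd:.2f} IQ={mother_iq:.0f}")
```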

  27. The Standard Normal Table: 8.1 • Table 8.1 is a table of areas under the standard normal density curve. The table entry for each value z is the area under the curve to the left of z.

  28. The Standard Normal Table: 8.1 • Table 8.1 can be used to find the proportion of observations of a variable which fall to the left of a specific value z if the variable follows a normal distribution.
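
A table of this kind can be generated rather than looked up. The sketch below is not a reproduction of Table 8.1 (whose exact rows are not shown in this transcript); the z-values chosen, the function name, and the use of `statistics.NormalDist` are assumptions made for illustration.

```python
from statistics import NormalDist

def standard_normal_table(z_values):
    """Map each z to the area under the standard normal curve to the left of z."""
    std = NormalDist()
    return {z: std.cdf(z) for z in z_values}

# z from -3.0 to +3.0 in steps of 0.5 (an arbitrary grid for this sketch).
z_values = [i / 2 for i in range(-6, 7)]
table = standard_normal_table(z_values)

for z, proportion in table.items():
    print(f"z = {z:+.1f}   proportion below = {proportion:.4f}   percentile = {100 * proportion:.1f}")
```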

  29. Example 2: Calibrating Your GRE Score GRE Exams between 10/1/89 and 9/30/92 had mean verbal score of 497 and a standard deviation of 115. (ETS, 1993) • Suppose your score was 650 and scores were bell-shaped. • Standardized score = (650 – 497)/115 = +1.33. • Table 8.1, z = 1.33 is between the 90th and 91st percentile. • Your score was higher than about 90% of the population.

  30. Example 3: Removing Moles Company Molegon: remove unwanted moles from gardens. Weights of moles are approximately normal with a mean of 150 grams and a standard deviation of 56 grams. Only moles between 68 and 211 grams can be legally caught. • Standardized score = (68 – 150)/56 = –1.46, and Standardized score = (211 – 150)/56 = +1.09. • Table 8.1: 86% weigh 211 or less; 7% weigh 68 or less. • About 86% – 7% = 79% are within the legal limits.
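
The mole calculation (area below the upper limit minus area below the lower limit) can be sketched as follows. `statistics.NormalDist` stands in for Table 8.1 here, which is an assumption of the example, and `proportion_between` is a hypothetical helper name.

```python
from statistics import NormalDist

def proportion_between(low, high, mean, sd):
    """Proportion of a normal population lying between low and high."""
    std = NormalDist()
    z_low = (low - mean) / sd    # standardized score of the lower limit
    z_high = (high - mean) / sd  # standardized score of the upper limit
    return std.cdf(z_high) - std.cdf(z_low)

legal_share = proportion_between(68, 211, 150, 56)
print(f"{legal_share:.2f}")  # about 0.79, in line with the 86% - 7% table result
```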

  31. Standardizing Example [Figure: a normal distribution and the corresponding standardized normal distribution]

  32. Some Examples Suppose it is known that verbal SAT scores are normally distributed with a mean of 500 and a standard deviation of 100. Find the proportion of the population of SAT scores that are less than or equal to 600. First we need to find the standardized score: z-score = (observed value - mean)/(standard deviation) = (600 - 500)/100 = +1. From Table 8.1 we see that a z-score of +1 is the 84th percentile, and the proportion of population SAT scores that are less than or equal to 600 is 0.84.

  33. SAT SCORES

  34. Standardized Scores (Z-Scores)

  35. Estimate the proportion of population SAT scores that are greater than 400. First, we need to find the standardized score: z-score = (400 - 500)/100 = -1. From Table 8.1 we see that 16% of population values have a z-score less than or equal to -1 (or equivalently, 16% of population values have an observed score less than 400). However, we are interested in the proportion of the population with scores GREATER than 400. proportion ABOVE 400 = 1 - proportion BELOW 400 = 1 – 0.16 = 0.84

  36. Estimate the proportion of population SAT scores that are between 400 and 600. An observed value of 400 has a z-score of -1 and represents the 16th percentile (proportion below z = -1 is 0.16). An observed value of 600 has a z-score of +1 and represents the 84th percentile (proportion below z = +1 is 0.84). Let’s draw a picture….

  37. So the proportion with scores between 400 and 600 = Proportion below 600 – Proportion below 400 = 0.84 - 0.16 = 0.68

  38. Find an SAT score such that 70% of the population had SAT scores less than or equal to this number (i.e., estimate the 70th percentile of the population). First we need to find the z-score that corresponds to the 70th percentile. From Table 8.1 we see that this z-score is +0.52. Next we need to find the observed value (from the z-score): Observed value = mean + (z-score)*(standard deviation) = 500 + 0.52*100 = 552
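
The four SAT calculations (below 600, above 400, between 400 and 600, and the 70th percentile) can be reproduced in a few lines. This sketch assumes `statistics.NormalDist` parameterized directly with the SAT mean and standard deviation, so no manual standardizing is needed; the slides themselves work through Table 8.1, and the results differ only in rounding.

```python
from statistics import NormalDist

# Verbal SAT scores as described in the slides: mean 500, standard deviation 100.
sat = NormalDist(mu=500, sigma=100)

below_600 = sat.cdf(600)                       # proportion at or below 600
above_400 = 1.0 - sat.cdf(400)                 # complement of the proportion below 400
between_400_600 = sat.cdf(600) - sat.cdf(400)  # difference of two "below" areas
percentile_70 = sat.inv_cdf(0.70)              # score with 70% of the population below it

print(f"below 600: {below_600:.2f}")
print(f"above 400: {above_400:.2f}")
print(f"between:   {between_400_600:.2f}")
print(f"70th pct:  {percentile_70:.0f}")
```

Working on the unstandardized curve gives the same areas as standardizing first, since the software does the conversion internally.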

  39. 8.4 z-Scores and Familiar Intervals Empirical Rule For any normal curve, approximately … • 68% of the values fall within 1 standard deviation of the mean in either direction • 95% of the values fall within 2 standard deviations of the mean in either direction • 99.7% of the values fall within 3 standard deviations of the mean in either direction A measurement would be an extreme outlier if it fell more than 3 s.d. above or below the mean.
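
The three percentages in the Empirical Rule can be checked against the standard normal curve. A minimal sketch, assuming `statistics.NormalDist`; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    std = NormalDist()
    return std.cdf(k) - std.cdf(-k)

coverage = {k: share_within(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} s.d.: {100 * share:.1f}%")
```

The computed shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.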

  40. The 68-95-99.7 Rule

  41. The Empirical Rule Applet • http://www.stat.sc.edu/~west/applets/empiricalrule.html

  42. Heights of Adult Women Since adult women in U.S. have a mean height of 65 inches with a s.d. of 2.5 inches and heights are bell-shaped, approximately … • 68% of adult women are between 62.5 and 67.5 inches, • 95% of adult women are between 60 and 70 inches, • 99.7% of adult women are between 57.5 and 72.5 inches.
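
The interval endpoints follow directly from mean ± k standard deviations. The sketch below computes them for the women's height figures given above; the function name and the dictionary layout are illustrative choices.

```python
def empirical_rule_intervals(mean, sd):
    """Return the 68%, 95%, and 99.7% intervals as (low, high) pairs."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }

intervals = empirical_rule_intervals(65, 2.5)
for label, (low, high) in intervals.items():
    print(f"{label} of adult women are between {low} and {high} inches")
```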

  43. For Those Who Like Formulas

  44. Example • In Tombstone, Arizona Territory people used Colt .45 revolvers. However people used different ammunition. • Wyatt Earp knew that his brothers and Doc Holliday were the only ones in the territory who used Colt .45s with Winchester ammunition. • The Earp brothers conducted tests on many different combinations of weapons and ammunition. They found that the dataset of observations produced by the combination of Colt .45 with Winchester shells showed a Mean velocity of 936 feet/second and a Standard Deviation of 10 feet/second.

  45. The measurements were taken at a distance of 15 feet from the gun. • When Wyatt examined the body of a cowboy shot in the back in cold blood he concluded that he was shot at a distance of 15 feet and that the velocity of the bullet at impact was 1,000 feet/second. • The dastardly Ike Clanton claimed that this cowboy was shot by the Earp brothers or Doc Holliday. Was Wyatt able to clear his good name using the Empirical Rule?

  46. The distribution of this bullet velocity data should be approximately bell-shaped. This implies that the empirical rule should give a good estimation of the percentages of the data within each interval.

  47. This table quite clearly demonstrates that since the bullet velocity in the shooting was 1000 ft/sec and since this lies more than 6 Standard Deviations away from the mean the probability is extremely high that the Earps were not responsible for this shooting. • This is especially evident from looking at the column showing percentages from the empirical rule. • Practically 100% of bullet velocities should be between 896 and 976 ft/sec, that is, within 4 standard deviations of the mean.
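
The reasoning in this example can be sketched numerically. The code assumes bullet velocities are exactly normal, whereas the slides only say approximately bell-shaped, so the tail probability should be read as an order-of-magnitude illustration; the variable names are hypothetical.

```python
import math

mean_velocity = 936.0       # ft/sec, Colt .45 with Winchester shells
sd_velocity = 10.0          # ft/sec
observed_velocity = 1000.0  # ft/sec, velocity at impact in the shooting

# Standardized score of the observed velocity.
z = (observed_velocity - mean_velocity) / sd_velocity

# Upper-tail area beyond z under the standard normal curve, via the complementary error function.
upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))

# Velocity bands at 1 to 4 standard deviations either side of the mean.
bands = [(k, mean_velocity - k * sd_velocity, mean_velocity + k * sd_velocity)
         for k in range(1, 5)]

print(f"z = {z:.1f}, upper-tail probability = {upper_tail:.2e}")
for k, low, high in bands:
    print(f"within {k} s.d.: {low:.0f} to {high:.0f} ft/sec")
```

A velocity 6.4 standard deviations above the mean falls far outside even the 4-standard-deviation band of 896 to 976 ft/sec.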

  48. ExampleP(3.8 X 5) Normal Distribution Standardized Normal Distribution .0478 Shaded area exaggerated
