Learn why knowing variation is important when analyzing a dataset, interpret Normal curves in terms of center and variation, and compare values using z-scores. Explore examples of interpreting scores and analyzing gestational time data.
Learning Objectives By the end of this lecture, you should be able to: • Discuss with an example why it is important to know the variation when analyzing a dataset • Interpret a series of Normal curves relative to each other in terms of their center and variation • Compare values from different datasets by comparing their z-scores
Thoughts on variation continued • Let’s take a moment to think about spread (again)… • Suppose you score 12 out of 15 on a test. • Great score? • Good score? • Average score? • Poor score? • Terrible score? • Answer: You can’t tell! I hope you’d agree that you’d at least need the mean in order to interpret how good a score this was. • Okay then, so suppose I tell you that the mean was 11 / 15. Now answer the same question: Is 12/15 a Great score, Good score, Average score, Poor score, or Terrible score? • Answer: You STILL can’t tell! While you could say that it is somewhat better than average, you really have no way of knowing whether it is approximately average, good, or great.
Thoughts on variation continued • Recall the question: Suppose we are told that the mean was 11 / 15. If someone scores 12/15 is this a: • Great score? • Good score? • Average score? • Poor score? • Terrible score? Discussion: What’s missing from this interpretation is a measure of spread. Suppose I told you that of the 500 students who took this test, the vast majority scored between 10.5 and 11.5. In this case, you’d suspect that a score of 12 was, in fact, quite good, but you couldn’t put a number on it. • KEY POINT: In order to properly interpret any score (of a Normal distribution), we simply cannot ignore the standard deviation!!! • Suppose the standard deviation was 0.5. In this case, a score of 12 is two standard deviations above the mean (z = +2). This would be a score at about the 98th percentile, which is a great result. • Suppose the standard deviation was 2. In that case, your z-score is +0.5 and you are at about the 69th percentile, which is good, but not fantastic. • In other words, without knowing anything about the variation, we simply do not know the story!
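If you’d like to check those two scenarios yourself, here is a minimal sketch. It assumes the test scores are Normally distributed; the helper names `z_score` and `percentile_of` are made up for this illustration, while `NormalDist` comes from Python’s standard `statistics` module.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def percentile_of(x, mean, sd):
    """Percent of a Normal(mean, sd) population that falls below x."""
    return 100 * NormalDist(mean, sd).cdf(x)

score, mean = 12, 11  # the score and the mean from the example
for sd in (0.5, 2):   # the two standard deviations considered
    z = z_score(score, mean, sd)
    pct = percentile_of(score, mean, sd)
    print(f"sd={sd}: z={z:+.1f}, percentile={pct:.1f}")
```

The same score of 12 lands near the 98th percentile when the standard deviation is 0.5, but only near the 69th when it is 2.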
What’s different? What’s the same? In the first group of curves, the means are the same (μ = 15) but the standard deviations are different (σ = 2, 4, and 6). In the second group, the means are different (μ = 10, 15, and 20) while the standard deviations are the same (σ = 3).
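One way to see what the standard deviation does to the shape of a Normal curve is to compute the height of its peak. The sketch below uses the means and standard deviations quoted for the two groups of curves; the variable names are just for illustration.

```python
from statistics import NormalDist

# Same mean, different standard deviations: a larger spread gives a lower, wider curve.
mean = 15
peaks_same_mean = {}
for sd in (2, 4, 6):
    peaks_same_mean[sd] = NormalDist(mean, sd).pdf(mean)  # density at the center
    print(f"mean={mean}, sd={sd}: peak height={peaks_same_mean[sd]:.4f}")

# Different means, same standard deviation: the curve slides along the axis,
# but its shape (and therefore its peak height) does not change.
peaks_same_sd = [NormalDist(m, 3).pdf(m) for m in (10, 15, 20)]
print(peaks_same_sd)
```

Because the total area under every Normal curve is 1, a narrower curve has to be taller.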
Another extremely useful thing about working with normally distributed data is that we can compare apples and oranges! That is, because we can convert any observation into a z-score, we can then answer questions to compare seemingly non-comparable distributions.
SAT vs ACT • Question: Suppose that student A scores 1140 on their SAT, and student B scores 18.2 on their ACT. You are an admissions counselor and you need to make a decision based exclusively on their test score. Can you use this data to decide? • Answer: If you can convert these numbers to their corresponding z-scores, then absolutely! To do so, you would, of course, need to know the mean and standard deviation of the two exams. This information is routinely provided by the testing services. • E.g. If student A had a z-score of +1, that means he was in the 84th percentile for the SAT. If student B had a z-score of +1.3, that means that he was in the 90th percentile. So even though they took completely different exams, you do have a way of comparing them!
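Once you have the z-scores, turning them into percentiles is a one-line lookup on the standard Normal curve. This sketch starts from the two z-scores quoted in the example (+1 and +1.3) and does not assume any particular mean or standard deviation for either exam; `percentile_from_z` is a name made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percentile_from_z(z):
    """Percent of a Normal population that falls below the given z-score."""
    return 100 * standard_normal.cdf(z)

z_student_a = 1.0  # z-score quoted for the SAT student
z_student_b = 1.3  # z-score quoted for the ACT student

print(f"Student A: {percentile_from_z(z_student_a):.1f}th percentile")
print(f"Student B: {percentile_from_z(z_student_b):.1f}th percentile")
```

The raw scores never enter the comparison; only each student’s position relative to their own exam’s distribution matters.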
Example: Gestation time in malnourished mothers A study was done in which the gestation time of mothers in a poor neighborhood was measured. While there were free prenatal vitamins available, there was a great deal of misinformation about proper prenatal nutrition. The gestation time of this group can be seen on the light-blue curve below. Over the next couple of years, a public health project was implemented at local health-care institutions in which women were also provided with nutritional counseling and healthier food. The results of a study after the nutritional program was implemented are summarized on the orange graph below. Try to interpret the results in your own words…. (Figure: two Normal curves. Light blue, before the program: μ = 250, σ = 20. Orange, after the program: μ = 266, σ = 15.)
Example: Gestation time in malnourished mothers • Try to interpret the results in your own words…. • The mean gestational time improved from about 250 days to 266 days. • In addition to the mean improving, more women had gestation times close to the mean (the peak of the orange curve is higher than the peak of the blue curve). • There was more consistency in the “better nutrition” group: the spread of the orange distribution is narrower. (While you can simply eyeball it, you can also quantify it with the standard deviation: 15 days versus 20 days.) Don’t feel bad if you didn’t automatically ‘get’ all these facts. That’s why we do examples here! Your goal should be to begin making these kinds of interpretations on your own.
Example: Gestation time in malnourished mothers A commonly accepted minimum for an ideal gestational period is about 240 days. How might we quantify the improvement that occurred between the group that did not receive counseling and the group that did? Instead of waiting for me to answer, try to come up with it on your own. I.e. STOP and THINK about it for a moment… Answer: The best way would be to look at the percentage of women who reached the target of 240 days in each group.
In the group without nutritional counseling (vitamins only), what percent of mothers failed to carry their babies at least 240 days? μ = 250, σ = 20, x = 240, so z = (x - μ) / σ = (240 - 250) / 20 = -0.5. The area to the left of z = -0.5 is about 0.3085. Vitamins only: About 31% of women failed to reach the target length of 240 days.
Nutritional counseling and better food μ = 266, σ = 15, x = 240, so z = (x - μ) / σ = (240 - 266) / 15 ≈ -1.73. The area to the left of z = -1.73 is about 0.042. Nutritional assistance program: Only about 4% of women failed to carry their babies 240 days! Conclusion: Compared to vitamin supplements alone, vitamins and better food resulted in a much smaller percentage of women with pregnancy terms below 240 days, or about 8 months (4% vs. 31%).
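The two percentages can be reproduced without a Normal table. A minimal sketch, assuming gestation time in each group is Normally distributed with the means and standard deviations quoted in the example; the dictionary keys are just labels chosen for this illustration.

```python
from statistics import NormalDist

target_days = 240  # minimum gestational period used in the example

groups = {
    "vitamins only": NormalDist(250, 20),        # before the program
    "nutritional program": NormalDist(266, 15),  # after the program
}

# Fraction of pregnancies in each group that fall short of the target.
short_fraction = {name: dist.cdf(target_days) for name, dist in groups.items()}

for name, fraction in short_fraction.items():
    print(f"{name}: {100 * fraction:.1f}% below {target_days} days")
```

The software uses the unrounded z-score for the second group, so its answer may differ from the table lookup in the third decimal place; both round to about 4%.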
Going in the other direction… Stats teachers love this!!... We may instead need to determine the observed value (or range of values) that corresponds to a given proportion/area under the curve. As an example, let’s calculate the z-score corresponding to an area of 1.25%. To do this calculation, we need to go backward. That is, we start with the normal table: • we first find the desired area/proportion in the body of the table, • we then read the corresponding z-value from the left column and top row. • Now that we know the z-score, we can calculate the observed value, ‘x’, from x = μ + zσ. For an area of 1.25% (0.0125) to the left, the z-value is -2.24.
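The backward table lookup has a software equivalent: the inverse of the cumulative distribution function. Here is a small sketch using `NormalDist.inv_cdf` from Python’s standard library; the variable names are just for illustration.

```python
from statistics import NormalDist

area_to_left = 0.0125  # 1.25% of the area lies to the left of the z-value we want

# inv_cdf goes "backward": from an area under the curve to the z-score.
z = NormalDist().inv_cdf(area_to_left)
print(f"z = {z:.2f}")
```

Rounded to two decimals, this matches the value read from the table.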
Example: How long are the longest 75% of pregnancies when mothers in the neighborhood are entered in the “better food” program? Answer: This is another case where we start with an area, and need to calculate an observation value, ‘x’. μ = 266, σ = 15, upper area 75%. An upper area of 75% means the area to the left is 25% (0.25), and the z-value for an area of 0.25 to the left is about -0.67. Then x = μ + zσ = 266 + (-0.67)(15) ≈ 256. Conclusion: The 75% longest pregnancies in this group are about 256 days or longer.
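Here is the same calculation as a short sketch. It assumes gestation time in the “better food” group is Normally distributed with the mean of 266 days and standard deviation of 15 days quoted in the example; the variable names are made up for this illustration.

```python
from statistics import NormalDist

better_food = NormalDist(266, 15)  # mean and standard deviation from the example

upper_area = 0.75              # we want the longest 75% of pregnancies
lower_area = 1 - upper_area    # tables and inv_cdf work with the area to the LEFT

# Step 1: area -> z-score on the standard Normal curve.
z = NormalDist().inv_cdf(lower_area)

# Step 2: z-score -> observed value, x = mean + z * sd.
cutoff_days = better_food.mean + z * better_food.stdev

print(f"z = {z:.2f}, cutoff = {cutoff_days:.0f} days")
```

The key step is converting the upper area into an area to the left before doing the lookup, since that is what the table (and `inv_cdf`) expects.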