Psych 5500/6500. Standard Deviations, Standard Scores, and Areas Under the Normal Curve. Fall, 2008. Standard Deviation.
Psych 5500/6500 Standard Deviations, Standard Scores, and Areas Under the Normal Curve Fall, 2008
Standard Deviation The standard deviation is the square root of the variance. It is a measure of variability, which means the greater the standard deviation of a group of scores, the more the scores differed from each other.
Example Sample A: Y = 6, 7, 8, 9 S=1.12 Sample B: Y = 5, 9, 15, 18 S=5.07 Sample B has more variability among its scores, and this is reflected by it having a larger standard deviation.
Caveat You still need to look at the data, however, to see if the standard deviation is a good measure of variability (remember it can be affected by one score far from the mean). • Sample A: Y= 5, 7, 9, 11, 13 S=2.83 • Sample B: Y = 9, 9, 9, 9, 20 S=4.4
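The S values in the two preceding examples can be checked with a short script. This is a minimal sketch in Python. It assumes S is computed by dividing the sum of squared deviations by N (not N - 1), since that is the convention that reproduces the values shown on the slides. The function name `slide_sd` is made up for illustration.

```python
from statistics import pstdev

def slide_sd(scores):
    """Square root of the variance, dividing by N (assumed slide convention)."""
    return pstdev(scores)

examples = {
    "Example, Sample A": [6, 7, 8, 9],
    "Example, Sample B": [5, 9, 15, 18],
    "Caveat, Sample A": [5, 7, 9, 11, 13],
    "Caveat, Sample B": [9, 9, 9, 9, 20],
}

for label, scores in examples.items():
    print(f"{label}: S = {slide_sd(scores):.2f}")
```

In the Caveat pair, the single score of 20 is enough to give Sample B the larger S even though its other four scores are identical.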
Evaluating the Size of the Standard Deviation So far we have looked at comparing the standard deviations of two samples to see which sample had greater variability. What does knowing the variance and standard deviation of one sample tell us?
Example We have sampled some people from a population and measured the weight of each person in pounds. S² = 400 S = 20 What does that tell us? Is that a large or small amount of variability?
Variance and Standard Deviation Are Affected by Scale If we had measured the same people in ounces rather than pounds, then S² = 102,400 and S = 320 (compared to S² = 400 and S = 20 when measured in pounds). Even though the people’s weights didn’t change, the use of a different scale greatly affected the variance and standard deviation. Conclusion: simply knowing the standard deviation of a sample doesn’t tell us whether the scores differed a lot unless we also know the measurement scale being used.
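The pounds-to-ounces arithmetic follows from a general rule: multiplying every score by a constant c multiplies the standard deviation by c and the variance by c². The sketch below illustrates this with c = 16 (ounces per pound). The four weights are hypothetical values chosen for the illustration, not data from the slides, and the divide-by-N convention is assumed.

```python
from statistics import pstdev, pvariance

OUNCES_PER_POUND = 16

# Hypothetical weights in pounds (illustrative only)
weights_lb = [150, 170, 190, 210]
weights_oz = [w * OUNCES_PER_POUND for w in weights_lb]

sd_ratio = pstdev(weights_oz) / pstdev(weights_lb)
var_ratio = pvariance(weights_oz) / pvariance(weights_lb)

print(f"SD ratio (oz / lb):       {sd_ratio:.1f}")   # equals 16
print(f"Variance ratio (oz / lb): {var_ratio:.1f}")  # equals 16 squared

# The slide's numbers follow the same rule
print(20 * OUNCES_PER_POUND, 400 * OUNCES_PER_POUND ** 2)
```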
Is Variability ‘Bad’? The variability of your population is whatever it is. Theoretically there is nothing ‘bad’ about a variable having a large amount of variability. Pragmatically, however, the larger the variability the easier it is to get a nonrepresentative sample, and thus the harder it is to make firm conclusions about the population from which you sampled. This will be seen to influence the statistical analyses we perform.
What the Standard Deviation Can Tell You About Your Data Under some circumstances, knowing the value of the standard deviation can give you quite useful and specific knowledge about your data. This occurs when your data are ‘normally distributed’. A formal definition of ‘normally distributed’ is given in the next slide. A less exact definition is that a graph of normally distributed data gives us a bell-shaped curve.
The Normal Curve Note that all of the elements in the formula for computing Y are constants except for ‘σ’ and ‘μ’. Thus knowing the values of those two elements completely determines the curve.
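For reference, the normal curve is conventionally written as follows. The notation is an assumption, since the slide's own symbols are not reproduced in the text: f(y) is the height of the curve at score y, and π and e are mathematical constants, so only μ and σ remain to be specified.

```latex
f(y) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(y-\mu)^{2}}{2\sigma^{2}}}
```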
Why Focus on the Normal Curve? Many statistical procedures are based upon the assumption that our data will be normally distributed. We will be examining the reasonableness of this assumption, the consequences of it not being true, and what to do if it isn’t true, as the semester progresses.
Assuming ‘Normality’ • There is an abundance of empirical data indicating that the distribution of scores is often approximately normal. • It is often possible to transform nonnormal data into a nearly normal distribution. • The Central Limit Theorem leads us to expect normal distributions in many cases.
Central Limit Theorem If the scores of interest are the result of the sum (or mean) of several independent nonnormal measures, then the distribution of the scores will approximate the normal distribution. The greater the number of measures that go into each score, the more closely the distribution of those scores will approximate the normal distribution.
Population of Outcomes of Rolling One Six-Sided Die Each outcome (1 – 6) has an equal chance of occurring. This rectangular shaped distribution is not ‘normally distributed’.
Population of Sums of Rolling Two Six-Sided Dice Note that the distribution is starting to look more like a normal distribution.
Population of Means of Rolling Two Six-Sided Dice Of course, this has the same shape as the distribution of sums.
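The dice populations described above can be enumerated exactly rather than simulated. The following is a minimal sketch; the function name `sum_distribution` and the text histogram are illustrative choices, not part of the course materials.

```python
from collections import Counter
from itertools import product

def sum_distribution(n_dice, sides=6):
    """Exact probability of each possible sum when rolling n_dice fair dice."""
    faces = range(1, sides + 1)
    counts = Counter(sum(roll) for roll in product(faces, repeat=n_dice))
    total = sides ** n_dice
    return {s: c / total for s, c in sorted(counts.items())}

one_die = sum_distribution(1)
two_dice = sum_distribution(2)
three_dice = sum_distribution(3)

for name, dist in [("one die", one_die), ("two dice", two_dice), ("three dice", three_dice)]:
    print(name)
    for total, p in dist.items():
        # One '#' per percentage point, as a crude bar chart
        print(f"  {total:2d} {'#' * round(100 * p)}")
```

One die gives a flat (rectangular) distribution, two dice give a triangle peaking at 7, and three dice already look noticeably bell-shaped. Dividing each sum by the number of dice turns sums into means without changing the shape.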
Standard Deviations and the Normal Curve For all normal curves: (34.1 x 2) = 68.2% of the scores fall within 1 std dev of the mean. (34.1+13.6) x 2 = 95.4% within 2 std devs of the mean. (34.1+13.6+2.1) x 2 = 99.6% within 3 std devs of the mean.* * We would get 99.74 if we hadn’t rounded 34.1 and 13.6 to one decimal place.
You sample from a population that is normally distributed and that has a mean of 80 and a standard deviation of 5. What does that tell you? Approximately 68% of the scores fall between 75 and 85 (80 ± 5). Approximately 95% of the scores fall between 70 and 90 (80 ± 10). Over 99% of the scores fall between 65 and 95 (80 ± 15).
When the data are normally distributed Approximately 68% of the scores fall within one standard deviation of the mean. Approximately 95% fall within two standard deviations of the mean. Over 99% fall within three standard deviations of the mean. The further the data are from being ‘normal’ the less accurate these percentages are, but they often give you at least some idea of the spread of the scores.
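These percentages can be reproduced from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is an illustrative assumption, and the mean of 80 and standard deviation of 5 come from the example above.

```python
from statistics import NormalDist

def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal population within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    low, high = 80 - 5 * k, 80 + 5 * k
    pct = 100 * proportion_within(k, mu=80, sigma=5)
    print(f"within {k} SD ({low} to {high}): {pct:.2f}%")
```

The result does not depend on the particular mean and standard deviation, which is why the same three percentages apply to every normal curve.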
‘Eyeballing’ Standard Deviations Now for something that will impress your friends and make you the life of the party. You should be able to look at a curve and estimate its standard deviation.
A population is given below. To estimate its standard deviation, divide the horizontal axis into six equally wide areas (three on each side of the mean).
It looks like the standard deviation must be around 18 (any guess between 15 and 25 would not be too bad).
Segue to Standard Scores Here are two normal curves. Note that when the distribution widens out (i.e. has greater variance), the standard deviations spread out to match. In both cases, for example, 34.1% of the scores fall between the mean and one standard deviation above the mean.
Look at a score of 140 on the top curve: it is more than 2 standard deviations above the mean, and very few scores fall above it. Now look at a score of 140 on the bottom curve: it is a little more than 1 standard deviation above the mean, and while still impressive it is not as unusually high (compared to the other scores) as in the top curve.
Standard Scores Standard scores tell us how many standard deviations above or below the mean a particular ‘raw’ score falls. We will use ‘z’ to stand for a standard score. During the semester we will be looking at a variety of apparently different formulas, but they all have the same basic idea: z = (Y - μ)/σ, the distance of a score from the mean divided by the standard deviation.
Example 1 Say you want to find the standard score for a raw score of 90 in the distribution above, which has a mean of μ=70 and a standard deviation of σ=18. First estimate z by looking at the curve: a score of 90 is a little more than 1 standard deviation above the mean. Now compute it: z = (Y - μ)/σ, so z = (90-70)/18 = 1.11
Example 2 Say you want to find the standard score for a raw score of 35 in the distribution above, which has a mean of μ=70 and a standard deviation of σ=18. First estimate z by looking at the curve: a score of 35 is not quite 2 standard deviations below the mean. Now compute it: z = (Y - μ)/σ, so z = (35-70)/18 = -1.94
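The two computations above can be written as a one-line function. This is a small sketch; the function name `z_score` is chosen here for illustration.

```python
def z_score(raw, mean, sd):
    """Standard score: how many standard deviations raw lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw - mean) / sd

print(f"Example 1: z = {z_score(90, 70, 18):.2f}")
print(f"Example 2: z = {z_score(35, 70, 18):.2f}")
```

The sign carries the direction: a positive z is above the mean and a negative z is below it.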
Standard Scores (cont.) Standard scores are useful in that they tell us how a score compares to other scores in its group. Look again at the normal curve: a z score of -1 (one standard deviation below the mean) is fairly low compared to the other scores, but a z score of -3 would be extremely low.
Standard scores are ‘standard’ in that they allow us to compare scores from different groups (i.e. compare apples and oranges). Say Timmy comes home with a ‘raw’ score of Y=120 on a math test, and a ‘raw’ score of Y=60 on an English test. On which test did he do better? There is not enough information. But if we knew that his standard score on the math test was z=0.8 we would know that his score was above the mean but that quite a few students did better. And if we knew that his standard score on the English test was z=2.5 we would know that he did REALLY well on that test. So now we know he did better on the English test than on the math test.
Finding Areas Under the Normal Curve To find what proportion of scores fall in certain areas of the normal curve you can use either the Normal Distribution Table provided in the Critical Values Tables area of the Course Materials page or the Normal Distribution Tool in the Oakley Stat Tools link on the Course Materials page. The tool is easier but the table is what will be available in an exam.
Example 1: What proportion and percent of the scores fall between the mean (z=0) and z=1.33 (i.e. 0 ≤ z ≤ 1.33)? Look up z=1.33 in the table, then go over to Column A to get your answer. Answer: .4082 or 40.82%
Example 2: What proportion and percent of the scores fall at or above z=1.33 (i.e. 1.33 ≤ z)? Answer: .0918 or 9.18%. Note: you could also have computed this by taking .5000-.4082 (see previous slide).
Negative z values Example 3: The table doesn't bother to give negative values of z. Because the curve is symmetrical, the area (z ≤ -1.33) is the same as the area on the other side of the graph (1.33 ≤ z). In other words, ignore the negative sign and just look up the value of z in the table.
Example 4 Example 4: Now that we are at this point, I will ask what proportion and what percent of the curve falls between -2.27 ≤ z ≤ 2.27 (i.e. falls within 2.27 standard deviations of the mean in either direction). Answer: .4884 + .4884 = .9768 or 97.68%
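The table lookups in Examples 1 through 4 can be cross-checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library. The two function names are hypothetical labels for the two kinds of area (mean to z, and beyond z), and using `abs(z)` mirrors the advice to ignore the negative sign.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def area_mean_to_z(z):
    """Area between the mean (z = 0) and z; symmetric, so the sign of z is ignored."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5

def area_beyond_z(z):
    """Area in the tail beyond z, on whichever side of the mean z lies."""
    return 1.0 - STANDARD_NORMAL.cdf(abs(z))

print(f"Example 1: {area_mean_to_z(1.33):.4f}")      # 0 <= z <= 1.33
print(f"Example 2: {area_beyond_z(1.33):.4f}")       # 1.33 <= z
print(f"Example 3: {area_beyond_z(-1.33):.4f}")      # z <= -1.33
print(f"Example 4: {2 * area_mean_to_z(2.27):.4f}")  # -2.27 <= z <= 2.27
```

This is useful for checking your work, but remember that only the table will be available in an exam.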
Example 5 Example 5: Now we will move to the following type of question: You sample from a population that is normally distributed, has a mean of 50 and a standard deviation of 16. Question: what proportion and percent of the scores will fall within ±8 of the mean (i.e. 42 ≤ Y ≤ 58)? Step 1: draw the curve, then shade in the area in question.
Example 5 (cont.) Step 2: the only way to proceed is to change the scores into z scores so that you can use the table. Find the standard score for Y=58: z = (58-50)/16 = 0.5. Find the standard score for Y=42: z = (42-50)/16 = -0.5.
Example 5 (cont.) Now that we have changed the question from 42 ≤ Y ≤ 58 to -0.5 ≤ z ≤ 0.5 we can answer the question. Answer: .1915 + .1915 = .3830 or 38.30%
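The whole of Example 5 can be sketched in a few lines: convert both raw scores to z scores, then find the area between them. The variable names are illustrative, and `statistics.NormalDist` stands in for the printed table.

```python
from statistics import NormalDist

mu, sigma = 50, 16   # population mean and standard deviation from Example 5
low, high = 42, 58   # raw-score limits of the shaded area

# Step 2: change the raw scores into z scores
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma

# Area under the standard normal curve between the two z scores
standard_normal = NormalDist()
proportion = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"z scores: {z_low} to {z_high}")
print(f"proportion = {proportion:.4f} ({100 * proportion:.2f}%)")
```

The computed proportion is about .3829, which differs from the table answer of .3830 only in the fourth decimal place because each table entry is rounded before the two are added.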