Review: what is a distribution?
• Arrangement of all cases in a sample or population along a single variable, according to their score or value
Dispersion
• How scores or values arrange themselves around the mean
• If most scores cluster about the mean, the shape of the distribution is peaked
• As scores become more dispersed, the distribution’s shape flattens
• In social science research many variables of interest tend to disperse “normally” or near-normally
• “Normal”? What’s that?
Normal distributions
• In large random samples and in populations, the scores of many variables of interest to social scientists are “normally” (or near-normally) distributed.
• Normal distributions have certain characteristics
• They are unimodal and symmetrical: shapes on both sides of the mean are identical
• 68.26 percent of the area “under” the curve, meaning 68.26 percent of the cases, falls within one “standard deviation” (+/- 1 SD) of the mean
• The fact that a distribution is “normal” does NOT imply that the mean is of any particular value. All it implies is that scores distribute themselves around the mean “normally”.
• Means depend on the data. In a normal distribution the mean could be any value.
• By definition, the standard score at the mean of a normal distribution is zero: the mean lies zero standard deviations away from itself.
[Figure: normal curve with the center labeled “Mean (whatever it is)” and “Standard deviation (always 0 at the mean)”]
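The "68.26 percent within one standard deviation" figure can be checked numerically. The sketch below is only an illustration, not part of the original slides: the function name `proportion_within` is hypothetical, and it relies on the standard relationship between the normal curve and the error function in Python's `math` module.

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal distribution within +/- k standard deviations of the mean."""
    # For a normal curve, the area between -k and +k SDs equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

# Area within one standard deviation of the mean, as a percentage.
print(round(proportion_within(1) * 100, 2))  # 68.27
```

Printed z-tables usually list .3413 for each side of the mean, which is where 68.26 comes from; the direct computation rounds to 68.27. The difference is rounding only.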
Measures of dispersion (distances between scores)
• Average deviation: $\frac{\sum |x - \bar{x}|}{n}$
  • Average distance between the mean and the values (scores) for each case
  • Uses absolute distances (no + or -)
  • Affected by extreme scores
• Variance ($s^2$): $s^2 = \frac{\sum (x - \bar{x})^2}{n}$ (use $n - 1$ for small samples)
• Standard deviation ($s$): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ (use $n - 1$ for small samples)
  • Square root of the variance
  • Expresses dispersion in units of equal size for that particular distribution
  • Less affected by extreme scores than the variance
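The three measures above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure required in class: the function names, the `small_sample` flag (which switches the divisor from n to n - 1), and the eight scores are all hypothetical.

```python
import math

def average_deviation(scores):
    """Mean absolute distance between each score and the mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)

def variance(scores, small_sample=False):
    """Sum of squared deviations divided by n (or n - 1 for small samples)."""
    mean = sum(scores) / len(scores)
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    divisor = len(scores) - 1 if small_sample else len(scores)
    return sum_of_squares / divisor

def standard_deviation(scores, small_sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(scores, small_sample))

# Hypothetical scores, chosen only so the arithmetic comes out evenly.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(average_deviation(scores))   # 1.5
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```

Passing `small_sample=True` divides by n - 1 instead, which gives a slightly larger variance and standard deviation for the same scores.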
How does “standard deviation” work?
13 officers scored on number of tickets written in one week
Officer A: 1 ticket
Officers B – I: 2 tickets each
Officer J: 1 ticket
Officers K & L: 2 tickets each
Officer M: 1 ticket
[Figure: frequency histogram of number of tickets for officers A through M, marked at 2.13 (-1 SD), 4.46 (mean) and 6.79 (+1 SD)]
Mean = 4.46, SD = 2.33
• In a normal distribution about 68 percent of the cases fall within 1 SD of the mean: 13 × .68 = 8.84, or about 9 cases
• Here 7 cases fall within 1 SD of the mean, so the cases are more dispersed than in a normal distribution
• Most officers wrote different numbers of tickets. Scores do not “cluster.”
13 officers scored on number of tickets written in one week
Officers A – C: 1 ticket each
Officers D – I: 3 tickets each
Officers J & K: 2 tickets each
Officers L & M: 1 ticket each
[Figure: frequency histogram of number of tickets for officers A through M, marked at 2.59 (-1 SD), 4.69 (mean) and 6.79 (+1 SD)]
Mean = 4.69, SD = 2.1
• In a normal distribution about 68 percent of the cases fall within 1 SD of the mean: 13 × .68 = 8.84, or about 9 cases
• Here 9 cases do fall within 1 SD of the mean, so the cases “cluster” around the mean
• Most officers wrote close to the same number of tickets.
Variability exercise
Random sample of patrol officers, each scored 1-5 on a cynicism scale

Sample 1 (n=10)
Officer  Score  Mean  Diff.  Sq.
1        3      2.9     .1    .01
2        3      2.9     .1    .01
3        3      2.9     .1    .01
4        3      2.9     .1    .01
5        3      2.9     .1    .01
6        3      2.9     .1    .01
7        3      2.9     .1    .01
8        1      2.9   -1.9   3.61
9        2      2.9    -.9    .81
10       5      2.9    2.1   4.41
Sum                          8.90
Variance (sum of squares / n-1): s² = .99
Standard deviation (sq. root of variance): s = .99

[Figure: plot of the Sample 1 scores. This is not an acceptable graph; it’s only to illustrate dispersion.]
Another random sample of patrol officers, each scored 1-5 on a cynicism scale

Sample 2 (n=10)
Officer  Score  Mean  Diff.  Sq.
1        2      ___   ___    ___
2        1      ___   ___    ___
3        1      ___   ___    ___
4        2      ___   ___    ___
5        3      ___   ___    ___
6        3      ___   ___    ___
7        3      ___   ___    ___
8        3      ___   ___    ___
9        4      ___   ___    ___
10       2      ___   ___    ___
Sum                          ____
Variance: s² = ____
Standard deviation: s = ____

Compute ...
Two random samples of patrol officers, each scored 1-5 on a cynicism scale

Sample 1 (n=10)
Officer  Score  Mean  Diff.  Sq.
1        3      2.9     .1    .01
2        3      2.9     .1    .01
3        3      2.9     .1    .01
4        3      2.9     .1    .01
5        3      2.9     .1    .01
6        3      2.9     .1    .01
7        3      2.9     .1    .01
8        1      2.9   -1.9   3.61
9        2      2.9    -.9    .81
10       5      2.9    2.1   4.41
Sum                          8.90
Variance (sum of squares / n-1): s² = .99
Standard deviation (sq. root of variance): s = .99

Sample 2 (n=10)
Officer  Score  Mean  Diff.  Sq.
1        2      2.4    -.4    .16
2        1      2.4   -1.4   1.96
3        1      2.4   -1.4   1.96
4        2      2.4    -.4    .16
5        3      2.4     .6    .36
6        3      2.4     .6    .36
7        3      2.4     .6    .36
8        3      2.4     .6    .36
9        4      2.4    1.6   2.56
10       2      2.4    -.4    .16
Sum                          8.40
Variance (sum of squares / n-1): s² = .93
Standard deviation (sq. root of variance): s = .97

[Figure: plots of the Sample 1 and Sample 2 scores. These are not acceptable graphs; they’re only to illustrate dispersion.]
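The worksheet columns (Mean, Diff., Sq., Sum, variance, standard deviation) can be reproduced step by step. The sketch below uses the two samples of cynicism scores from the slides; the function name `dispersion_worksheet` and the dictionary it returns are hypothetical conveniences, and the code is a cross-check rather than a substitute for the manual procedure.

```python
import math

def dispersion_worksheet(scores):
    """Follow the worksheet columns: mean, differences, squares, sum, s2, s."""
    n = len(scores)
    mean = sum(scores) / n
    diffs = [x - mean for x in scores]      # "Diff." column
    squares = [d ** 2 for d in diffs]       # "Sq." column
    sum_of_squares = sum(squares)           # "Sum" row
    variance = sum_of_squares / (n - 1)     # small sample, so divide by n - 1
    sd = math.sqrt(variance)                # square root of the variance
    return {"mean": mean, "sum_of_squares": sum_of_squares,
            "variance": variance, "sd": sd}

# Scores as listed in the slides.
sample_1 = [3, 3, 3, 3, 3, 3, 3, 1, 2, 5]
sample_2 = [2, 1, 1, 2, 3, 3, 3, 3, 4, 2]

for name, scores in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    result = dispersion_worksheet(scores)
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Rounded to two decimals, the results agree with the worksheet values: sums of squares of 8.90 and 8.40, variances of .99 and .93, and standard deviations of .99 and .97.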
z-score (a “standard” score)
• If the distribution of a variable (e.g., number of arrests) is approximately normal, we can estimate where any score would fall in relation to the mean.
• We first convert the sample score into a z-score using the sample standard deviation
[Figure: normal curve with z-scores marked from -3 to +3]
We then look up the z-score in a table. It gives the proportion of cases in the distribution…
• Between a case and the mean
• Beyond the case, away from the mean (left for negative z’s, right for positive z’s)
• Z-scores can be used to identify the percentile bracket into which a case falls (e.g., bottom ten percent)
• Since z-scores are standardized like percentages, they can be used to compare samples
• The z-table indicates the proportion of the area under the curve (the proportion of scores) between the mean and any z-score, and the proportion of the area beyond that score (to the left or right)
• In a normal distribution 95 percent of all z-scores fall between +/- 1.96
• In a normal distribution 5 percent of all z-scores fall beyond +/- 1.96. These are the rare/unusual cases.
[Figure: normal curve showing the proportion of area “under the curve” where cases lie: .475 on each side of the mean between z = -1.96 and z = +1.96 (95 percent of scores), and .025 in each tail beyond +/- 1.96 (2½ percent each), for 100 percent of scores in all]
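The two columns of a z-table can be approximated directly from the normal curve. The sketch below is an illustration only: the helper names `normal_cdf` and `z_table_row` are hypothetical, and the exam still calls for a printed table.

```python
import math

def normal_cdf(z: float) -> float:
    """Proportion of a normal distribution that lies below the given z-score."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_table_row(z: float):
    """Return (area between the mean and z, area beyond z away from the mean)."""
    between = abs(normal_cdf(z) - 0.5)
    beyond = 0.5 - between
    return between, beyond

between, beyond = z_table_row(1.96)
print(round(between, 3), round(beyond, 3))  # 0.475 0.025
```

Because the curve is symmetrical, `z_table_row(-1.96)` returns the same two proportions, with "beyond" now referring to the left tail. Multiplying `normal_cdf(z)` by 100 gives the percentile of a case.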
Variability exercise
Sample of twenty officers drawn from the Anywhere police department, each measured for number of arrests
[Figure: frequency histogram of number of arrests (0 to 6) for the twenty officers]
• Number of arrests is presumably normally distributed in the population of officers, meaning the whole police department. Why?
• Variable: number of arrests
• Unit of analysis: officers
• Case: one officer
Assignment
• Compute the sample standard deviation
• Obtain the z-score for 0, 1, 2, 3, 4, 5 and 6 arrests: $z = \frac{x - \bar{x}}{s}$
NOTE: There are only seven values: 0, 1, 2, 3, 4, 5, 6. You only need to compute their statistics once.
[Figure: histogram of number of officers (1 to 6) by number of arrests (0 to 6), with z-scores from -2 to +2 marked along the horizontal axis. Read the slide from left to right.]
• In a perfectly normal distribution Jay would be in the bottom two percent!
• In a perfectly normal distribution Dudley would be in the top two percent!
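The arrests exercise can be sketched as follows. The actual frequencies appear only in the slide's histogram and are not recoverable here, so the `frequencies` dictionary below is HYPOTHETICAL: it was chosen only to be symmetric and to total twenty officers. The function name is likewise an assumption.

```python
import math

# HYPOTHETICAL frequencies (arrests -> number of officers); not the slide's data.
frequencies = {0: 1, 1: 2, 2: 4, 3: 6, 4: 4, 5: 2, 6: 1}

def z_scores_from_frequencies(freq):
    """Return the mean, sample SD, and a z-score for each distinct value."""
    n = sum(freq.values())
    mean = sum(value * count for value, count in freq.items()) / n
    sum_of_squares = sum(count * (value - mean) ** 2 for value, count in freq.items())
    s = math.sqrt(sum_of_squares / (n - 1))  # small sample, so divide by n - 1
    # Each distinct value needs its z-score computed only once.
    z = {value: (value - mean) / s for value in freq}
    return mean, s, z

mean, s, z = z_scores_from_frequencies(frequencies)
for arrests in sorted(z):
    print(arrests, round(z[arrests], 2))
```

With these made-up counts the lowest and highest values land about two standard deviations from the mean. A z-table puts roughly two percent of a normal distribution beyond that point on each side.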
Exam information
• You must bring a z-table and a regular, non-scientific calculator with no functions beyond a square root key.
• You need to understand the concept of a distribution.
• You will be given data and asked to create graph(s) depicting the distribution of a single variable.
• You will compute basic statistics, including mean, median, mode, standard deviation and z-score. All computations must be shown on the answer sheet.
• You will be given the formulas for variance (s²) and z. You must use and display the procedure described in the slides and practiced in class for manually calculating variance (s²) and standard deviation (s).
• You will use the z-table to calculate where cases from a given sample would fall in a normal distribution.
• This is a relatively brief exam. You will have one hour to complete it. We will then take a break and move on to the next topic.