Name that tune. Song title? Performer(s)?
Descriptive Statistics “Finding New Information” 4/5/2010
Standard Deviation σ = SQRT(Σ(X - µ)²/N) (Does that give you a headache?)
USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. • David Letterman
Statistics: The only science that enables different experts using the same figures to draw different conclusions. • Evan Esar (1899 - 1995), US humorist
Scales • The data we collect can be represented on one of FOUR types of scales: • Nominal • Ordinal • Interval • Ratio • “Scale” in the sense that an individual score is placed at some point along a continuum.
Nominal Scale • Describe something by giving it a name. (Name – Nominal. Get it?) • Mutually exclusive categories. • For example: • Gender: 1 = Female, 2 = Male • Marital status: 1 = single, 2 = married, 3 = divorced, 4 = widowed • Make of car: 1 = Ford, 2 = Chevy . . . • The numbers are just names.
Ordinal Scale • An ordered set of objects. • But no implication about the relative SIZE of the steps. • Example: • The 50 states in order of population: • 1 = California • 2 = Texas • 3 = New York • . . . 50 = Wyoming
Interval Scale • Ordered, like an ordinal scale. • Plus there are equal intervals between each pair of scores. • With Interval data, we can calculate means (averages). • However, the zero point is arbitrary. • Examples: • Temperature in Fahrenheit or Centigrade. • IQ scores
Ratio Scale • Interval scale, plus an absolute zero. • Examples: • Distance, weight, height, time (but not years – e.g., the year 2002 isn’t “twice” 1001).
Scales (cont’d.) It’s possible to measure the same attribute on different scales. Say, for instance, your midterm test. I could: • Give you a “1” if you don’t finish, and a “2” if you finish. • “1” for highest grade in class, “2” for second highest grade, . . . . • “1” for first quarter of the class, “2” for second quarter of the class, . . . • Raw test score (100, 99, . . . .). • (NOTE: A score of 100 doesn’t mean the person “knows” twice as much as a person who scores 50; he/she just gets twice the score.)
Earlier . . . • We learned about frequency distributions. • I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data. • There’s another, even shorter-hand way.
Measures of Central Tendency • Mode • Most frequent score (or scores – a distribution can have multiple modes) • Median • “Middle score” • 50th percentile • Mean - µ (“mu”) • “Arithmetic average” • ΣX/N
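As a small illustration of the three measures, here is a sketch using Python's standard `statistics` module. The scores are made up for this example and do not come from the lecture.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration (not from the lecture).
scores = [2, 3, 3, 5, 7, 7, 8]

modes = multimode(scores)   # every most-frequent score; this set has two modes
middle = median(scores)     # the middle score of the sorted list (50th percentile)
mu = mean(scores)           # sum of the scores divided by N

print("modes:", modes)
print("median:", middle)
print("mean:", mu)
```

Here the median and the mean happen to coincide at 5, while the distribution has two modes (3 and 7).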
More quiz questions about measures of central tendency 4 – True or false: In a normal distribution (bell curve), the mode, median, and mean are all the same? __True __False 5 – (This one is tricky.) If the mode=mean=median, then the distribution is necessarily a bell curve? __True __False 6 – I have a distribution of 10 scores. There was an error, and really the highest score is 5 points HIGHER than previously thought. a) What does this do to the mode? __ Increases it __Decreases it __Nothing __Can’t tell b) What does this do to the median? __ Increases it __Decreases it __Nothing __Can’t tell c) What does this do to the mean? __ Increases it __Decreases it __Nothing __Can’t tell 7 – Which of the following must be an actual score from the distribution? a) Mean b) Median c) Mode d) None of the above
OK, so which do we use? • Means allow further arithmetic/statistical manipulation. But . . . • It depends on: • The type of scale of your data • Can’t use means with nominal or ordinal scale data • With nominal data, must use mode • The distribution of your data • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). • The question you want to answer • “Most popular score” vs. “middle score” vs. “middle of the see-saw” • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
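The point about distributions bounded at one end can be sketched with a salary example. The figures below are hypothetical, chosen only to show how a single large value moves the mean but not the median.

```python
from statistics import mean, median

# Hypothetical annual salaries in dollars; one very large value at the top.
salaries = [30_000, 35_000, 40_000, 45_000, 500_000]

avg = mean(salaries)     # pulled upward by the single large salary
mid = median(salaries)   # the middle salary; does not care how large the top one is

print("mean:", avg)
print("median:", mid)
```

Four of the five people earn well under the mean of 130,000, which is why the median (40,000) is usually the more informative summary for data like this.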
Have sidled up to SHAPES of distributions • Symmetrical • Skewed – positive and negative • Flat
Why . . . • . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff? • “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).
Didja hear the one about . . . • the Aggies who were on a march and came to a river? The Aggie captain asked the farmer how deep the river was. • “Oh, it averages two feet deep.” • All the Aggies drowned.
Note . . . • We started with a bunch of specific scores. • We put them in order. • We drew their distribution. • Now we can report their central tendency. • So, we’ve moved AWAY from specifics, to a summary. But with Central Tendency, alone, we’ve ignored the specifics altogether. • Note MANY distributions could have a particular central tendency! • If we went back to ALL the specifics, we’d be back at square one.
Measures of Dispersion • Range • Semi-interquartile range • Standard deviation • σ (sigma)
Range • Highest score minus the lowest score. • Like the mode . . . • Easy to calculate • Potentially misleading • Doesn’t take EVERY score into account. • What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of Central Tendency. • ‘Cause MANY different distributions of scores can have the same central tendency! • “Standard Deviation” -- σ = SQRT(Σ(X - µ)²/N)
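To see why the range can mislead, consider this sketch with two made-up sets of scores. The helper name `score_range` is introduced here for illustration; `pstdev` is the standard library's population standard deviation.

```python
from statistics import mean, pstdev

def score_range(scores):
    """Highest score minus lowest score; every other score is ignored."""
    return max(scores) - min(scores)

# Two hypothetical distributions with the same mean and the same range.
a = [0, 5, 5, 5, 10]     # most scores sit at the center
b = [0, 0, 5, 10, 10]    # most scores sit at the extremes

print(mean(a), mean(b))                # same central tendency
print(score_range(a), score_range(b))  # same range
print(pstdev(a), pstdev(b))            # different spread around the mean
```

Both sets have a mean of 5 and a range of 10, yet the scores in `b` lie farther from the mean on average, which only a measure that uses every score can show.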
Let’s do a short example • What if I asked four undergraduates how many cars they’ve owned in their lives and I got the following answers: 1 1 1 1 • There would be NO variance. σ = 0. • But what if the answers were 0 0 1 3 What’s the mode? Median? Mean? • Go with mean. • So, how much do the actual scores deviate from the mean?
So . . . • Add up all the deviations and we should have a feel for how dispersed, how spread, how deviant, our distribution is. • Let’s calculate the Standard Deviation. • As always, start inside the parentheses. • Σ(X - µ)
Damn! • For the answers 0 0 1 3 (mean = 1), the deviations add up to zero: (-1) + (-1) + 0 + 2 = 0. • OK, let’s try it on another set of numbers.
Damn! (cont’d.) • OK, let’s try it on a smaller set of numbers.
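The frustration has a simple cause: deviations from the mean always sum to zero, whatever the scores are, because the negative deviations exactly cancel the positive ones. A minimal sketch using the four answers from the car example (0 0 1 3):

```python
# The four answers from the car-ownership example.
answers = [0, 0, 1, 3]

mu = sum(answers) / len(answers)          # mean of the answers
deviations = [x - mu for x in answers]    # each score minus the mean
total = sum(deviations)                   # Σ(X - µ)

print("mean:", mu)
print("deviations:", deviations)
print("sum of deviations:", total)
```

Because the total is zero for any data set, Σ(X - µ) by itself says nothing about spread; the signs have to be removed first.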
OK . . . • . . . so mathematicians at this point do one of two things. • Take the absolute value or square ‘em. • We square ‘em. Σ(X - µ)²
Standard Deviation (cont’d.) • Then take the average of the squared deviations. Σ(X - µ)²/N • Remember, dividing by N was the way we took the average of the original scores. • 10/4 = 2.5. • But this number is so BIG!
OK . . . • . . . take the square root (to make up for squaring the deviations earlier). • σ = SQRT(Σ(X - µ)²/N) • SQRT(2.5) = 1.58 • Now this doesn’t give you a headache, right? • I said “right”?
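The whole calculation can be sketched in a few lines of Python. The function name `population_sd` is made up for this example. Note that the car answers 0 0 1 3 give 6/4 = 1.5 rather than the 10/4 = 2.5 on the slide, which presumably was worked from a table of scores not reproduced in this text. The second data set below is hypothetical: it is simply one set of four scores that happens to reproduce the slide's figures.

```python
from math import sqrt
from statistics import pstdev

def population_sd(scores):
    """σ = SQRT(Σ(X - µ)²/N), computed step by step."""
    n = len(scores)
    mu = sum(scores) / n                        # mean
    squared = [(x - mu) ** 2 for x in scores]   # squared deviations
    return sqrt(sum(squared) / n)               # root of their average

answers = [0, 0, 1, 3]              # the car example: squared deviations sum to 6
sigma = population_sd(answers)
print(round(sigma, 2))              # square root of 6/4

hypothetical = [0, 1, 3, 4]         # assumed data: squared deviations sum to 10
print(round(population_sd(hypothetical), 2))   # square root of 10/4

# Cross-check against the standard library's population standard deviation.
print(abs(sigma - pstdev(answers)) < 1e-12)
```

Dividing by N, as the slides do, gives the population standard deviation; `statistics.pstdev` follows the same convention, while `statistics.stdev` divides by N - 1 instead.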
We need . . . • A measure of spread that is NOT sensitive to every little score, just as median is not. • SIQR: Semi-interquartile range. • (Q3 – Q1)/2
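A sketch of the SIQR calculation, assuming Python's `statistics.quantiles` with its default settings for the quartiles. Textbooks use several slightly different quartile conventions, so hand-calculated values for Q1 and Q3 may differ a little. The scores and the function name `siqr` are invented for illustration.

```python
from statistics import quantiles

def siqr(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    q1, _q2, q3 = quantiles(scores, n=4)   # the three quartile cut points
    return (q3 - q1) / 2

# Hypothetical scores with one extreme value at the top.
scores = [1, 2, 4, 5, 7, 9, 30]
more_extreme = [1, 2, 4, 5, 7, 9, 300]     # same scores, far larger top value

print(siqr(scores))
print(siqr(more_extreme))
```

Both calls give the same result, because Q1 and Q3 depend only on the scores near the quarter points, not on how far out the extremes lie.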
Practice Problems • I’ll send you some, tonight.
http://highered.mcgraw-hill.com/sites/0072494468/student_view0/statistics_primer.html • Click on Statistics Primer.
References • Hinton, P. R. Statistics explained. • Shaughnessy, Zechmeister, and Zechmeister. Experimental methods in psychology.