Name that tune. Song title? Performer(s)?
Descriptive Statistics “Finding New Information” 3/23/2011
Standard Deviation σ = SQRT(Σ(X - µ)²/N) (Does that give you a headache?)
Statistics: The only science that enables different experts using the same figures to draw different conclusions. • Evan Esar (1899 - 1995), US humorist
USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. • David Letterman
The last 2 lectures . . . • . . . we’ve been talking about the scientific method. • When you conduct an experiment, at some point you’ll have some data. • “Statistics” is the field of study that addresses how we deal with, manipulate, and interpret those data.
How to talk about a set of numbers • We can list ’em. • Can get WAY unwieldy. • Plus hard to make any sense out of them. • First step – put ’em in order. • Second step – • Graph ’em, and/or • Calculate percentiles/deciles
Frequency Distributions - Histograms • # of pets ever owned: 13 2 1 4 0 1 3 0 5 1 • Put ’em in order: 0 0 1 1 1 2 3 4 5 13
Frequency Distribution
Raw Scores (in order): 0 0 1 1 1 2 3 4 5 13
Raw Score   Freq   Cumulative Freq
0           2      2
1           3      5
2           1      6
3           1      7
4           1      8
5           1      9
13          1      10
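As a check on the table, here is a minimal Python sketch that builds the same frequency and cumulative-frequency columns from the unsorted pet counts. The function name `frequency_distribution` is an illustrative choice, not something from the lecture.

```python
from collections import Counter
from itertools import accumulate

def frequency_distribution(scores):
    """Return (raw score, frequency, cumulative frequency) rows, lowest score first."""
    counts = Counter(scores)                 # how many times each raw score occurs
    values = sorted(counts)                  # distinct raw scores, in order
    freqs = [counts[v] for v in values]
    cumulative = list(accumulate(freqs))     # running total of the frequencies
    return list(zip(values, freqs, cumulative))

pets = [13, 2, 1, 4, 0, 1, 3, 0, 5, 1]       # "# of pets ever owned"
print("Raw Score  Freq  Cumulative Freq")
for value, freq, cum in frequency_distribution(pets):
    print(f"{value:>9}  {freq:>4}  {cum:>15}")
```

The last cumulative frequency always equals N, the number of scores (10 here).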
Percentiles • LOCATION of 25th percentile: • X.25 = (N+1)(.25) • LOCATION of 50th percentile: • X.50 = (N+1)(.50) • LOCATION of 75th percentile: • X.75 = (N+1)(.75) • Example: If we had 10 scores, • the 25th percentile would be the (11)(.25) = 2.75th score, or three-quarters of the way between the 2nd and 3rd scores. • The 50th percentile would be the (11)(.50) = 5.5th score, or half way between the 5th and 6th scores.
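The (N+1) location rule can be sketched in Python as below. The linear interpolation between neighboring scores follows the "part way between" wording on the slide; clamping locations that fall before the first or after the last score is an added assumption the slide does not address. The function name `percentile` is hypothetical.

```python
import math

def percentile(scores, p):
    """Score at the (N + 1) * p location, interpolating between neighboring scores.

    p is a proportion, e.g. 0.25 for the 25th percentile. Locations before the
    first score or after the last are clamped to the ends (an assumption).
    """
    ordered = sorted(scores)
    n = len(ordered)
    location = min(max((n + 1) * p, 1), n)   # 1-based location, e.g. 2.75
    lower = math.floor(location)             # e.g. the 2nd score
    if lower == n:
        return ordered[-1]
    fraction = location - lower              # e.g. 0.75 of the way to the next score
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]
for p in (0.25, 0.50, 0.75):
    print(f"{p:.2f}: location {(len(pets) + 1) * p:.2f} -> {percentile(pets, p)}")
```

With the ten pet counts, the 25th, 50th, and 75th percentiles come out to 0.75, 1.5, and 4.25. Other software may use a different percentile convention and report slightly different values.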
Note . . . • With an odd number of scores, the 50th percentile will be an actual score: • Raw Scores (in order): 0 0 1 1 1 2 3 4 5 13 100 • 50th percentile = (N+1)(.50) = (12)(.50) = 6th score = 2.
Earlier . . . • We learned about frequency distributions. • I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data. • There’s another, even shorter-hand way.
Measures of Central Tendency • Mode • Most frequent score (or scores – a distribution can have multiple modes) • Median • “Middle score” • 50th percentile • Mean - µ (“mu”) • “Arithmetic average” • ΣX/N
Measures of Central Tendency • Let’s calculate some “averages.” • Here’s a distribution of scores: 2 2 5 • Mode? Median? Mean?
Measures of Central Tendency • Let’s calculate some “averages.” • Here’s a distribution of scores: 0 0 0 1 1 10 • Mode? Median? Mean?
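To check your answers on the two "averages" slides, this Python sketch computes all three measures, using the standard-library `statistics` module for the mode and median (`multimode` needs Python 3.8 or later) and ΣX/N for the mean. `central_tendency` is a hypothetical helper name.

```python
import statistics

def central_tendency(scores):
    """Return (modes, median, mean) for a list of scores."""
    modes = statistics.multimode(scores)     # every most-frequent score
    median = statistics.median(scores)       # middle score / 50th percentile
    mean = sum(scores) / len(scores)         # ΣX / N
    return modes, median, mean

for scores in ([2, 2, 5], [0, 0, 0, 1, 1, 10]):
    modes, median, mean = central_tendency(scores)
    print(scores, "mode:", modes, "median:", median, "mean:", mean)
```

For 2 2 5 the sketch reports mode 2, median 2, mean 3.0; for 0 0 0 1 1 10 it reports mode 0, median 0.5, mean 2.0, so the three "averages" can disagree noticeably.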
A quiz about averages 1 – If one score in a distribution changes, will the mode change? __Yes __No __Maybe 2 – How about the median? __Yes __No __Maybe 3 – How about the mean? __Yes __No __Maybe 4 – True or false: In a normal distribution (bell curve), the mode, median, and mean are all the same? __True __False
More quiz questions about measures of central tendency 5 – (This one is tricky.) If the mode=mean=median, then the distribution is necessarily a bell curve? __True __False 6 – I have a distribution of 10 scores. There was an error, and really the highest score is 5 points HIGHER than previously thought. a) What does this do to the mode? __Increases it __Decreases it __Nothing __Can’t tell b) What does this do to the median? __Increases it __Decreases it __Nothing __Can’t tell c) What does this do to the mean? __Increases it __Decreases it __Nothing __Can’t tell 7 – Which of the following must be an actual score from the distribution? a) Mean b) Median c) Mode d) None of the above
OK, so which do we use? • Means allow further arithmetic/statistical manipulation. But . . . • It depends on: • The type of data • Can’t use means with nominal or ordinal scale data (more on that on Monday) • With nominal data, must use the mode • The distribution of your data • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). • The question you want to answer • “Most popular score” vs. “middle score” vs. “middle of the see-saw” • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
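The advice about medians for distributions bounded at one end can be illustrated with scores from the earlier slides. This sketch compares the ten pet counts with the eleven-score list that adds 100: one extreme score pulls the mean from 3.00 to about 11.82, while the median only moves from 1.5 to 2. The helper name `mean_and_median` is illustrative.

```python
import statistics

def mean_and_median(scores):
    """Return (mean, median): ΣX / N and the middle score."""
    return sum(scores) / len(scores), statistics.median(scores)

pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]   # ten scores from the pets example
with_outlier = pets + [100]              # the eleven-score list with 100 added

for label, scores in (("10 scores", pets), ("with 100", with_outlier)):
    mean, median = mean_and_median(scores)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

Salaries behave the same way: a few very large values stretch the upper end and drag the mean upward, which is why the median is often reported instead.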
Have sidled up to SHAPES of distributions • Symmetrical • Skewed – positive and negative • Flat
Why . . . • . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff? • “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).
Didja hear the one about . . . • the Aggies who were on a march and came to a river? • The Aggie captain asked a farmer how deep the river was. • “Oh, it averages two feet deep.” • All the Aggies drowned.
Note . . . • We started with a bunch of specific scores. • We put them in order. • We drew their distribution. • Now we can report their central tendency. • So, we’ve moved AWAY from specifics, to a summary. But with Central Tendency, alone, we’ve ignored the specifics altogether. • Why isn’t a Measure of Central Tendency, alone, satisfactory? • Note MANY distributions could have a particular central tendency! • If we went back to ALL the specifics, we’d be back at square one.
Measures of Dispersion (or Spread) • Range • Semi-interquartile range • Standard deviation • σ (sigma)
Range • The highest score minus the lowest score. • Like the mode . . . • Easy to calculate • Potentially misleading • Doesn’t take EVERY score into account. • What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of Central Tendency. • ’Cause MANY different distributions of scores can have the same central tendency! • “Standard Deviation”: σ = SQRT(Σ(X - µ)²/N)
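A small sketch of why one central-tendency number is not enough: the two sets of car-ownership answers from the short example (1 1 1 1 and 0 0 1 3) have the same mean, 1.0, but ranges of 0 and 3. The helper names `score_range` and `mean` are illustrative.

```python
def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

def mean(scores):
    """ΣX / N."""
    return sum(scores) / len(scores)

same_answers = [1, 1, 1, 1]      # everyone owned one car
spread_answers = [0, 0, 1, 3]    # same mean, more spread

for scores in (same_answers, spread_answers):
    print(scores, "mean:", mean(scores), "range:", score_range(scores))
```

The range depends only on the two extreme scores, so it says nothing about how the scores in between are arranged.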
Let’s do a short example • What if I asked four undergraduates how many cars they’ve owned in their lives and I got the following answers: 1 1 1 1 • There would be NO variance. σ = 0. • But what if the answers were 0 0 1 3 What’s the mode? Median? Mean? • Go with mean. • So, how much do the actual scores deviate from the mean?
So . . . • Add up all the deviations and we should have a feel for how dispersed, how spread, how deviant, our distribution is. • Let’s calculate the Standard Deviation. • As always, start inside the parentheses. • Σ(X - µ)
Damn! • Σ(X - µ) came out to zero. • OK, let’s try it on another set of numbers.
Damn! (cont’d.) • OK, let’s try it on a smaller set of numbers.
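Whatever set of numbers you try, Σ(X - µ) comes out to zero: Σ(X - µ) = ΣX - Nµ, and since µ = ΣX/N, Nµ is just ΣX. The positive and negative deviations cancel exactly. This Python sketch shows it for the car-ownership answers and the pet counts; `deviations` is an illustrative helper name.

```python
def deviations(scores):
    """Each score minus the mean: X - µ."""
    mu = sum(scores) / len(scores)
    return [x - mu for x in scores]

for scores in ([0, 0, 1, 3], [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]):
    devs = deviations(scores)
    print(scores, "deviations:", devs, "sum:", sum(devs))
```

With data whose mean is not a round number, floating-point rounding may leave a tiny nonzero sum (on the order of 1e-16) instead of an exact 0.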
OK . . . • . . . so mathematicians at this point do one of two things. • Take the absolute value or square ’em. • We square ’em. Σ(X - µ)²
Standard Deviation (cont’d.) • Then take the average of the squared deviations. Σ(X - µ)²/N • Remember, dividing by N was the way we took the average of the original scores. • 10/4 = 2.5. • But this number is so BIG!
OK . . . • . . . take the square root (to make up for squaring the deviations earlier). • σ = SQRT(Σ(X - µ)²/N) • SQRT(2.5) = 1.58 • Now this doesn’t give you a headache, right? • I said “right”?
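The square, average, square-root recipe translates directly into Python. This sketch uses the car-ownership answers 0 0 1 3; their squared deviations sum to 6, so σ = SQRT(6/4) ≈ 1.22. That differs from the 10/4 worked on the slides, which may refer to a different set of numbers. `population_sd` is an illustrative name; `statistics.pstdev` is the standard-library function that also divides by N.

```python
import math
import statistics

def population_sd(scores):
    """σ = SQRT(Σ(X - µ)² / N), following the steps on the slides."""
    n = len(scores)
    mu = sum(scores) / n                             # µ = ΣX / N
    squared_devs = [(x - mu) ** 2 for x in scores]   # (X - µ)²
    variance = sum(squared_devs) / n                 # average squared deviation
    return math.sqrt(variance)                       # undo the squaring

answers = [0, 0, 1, 3]
print(population_sd(answers))         # about 1.22
print(statistics.pstdev(answers))     # library cross-check; should agree
print(population_sd([1, 1, 1, 1]))    # 0.0: no variance
```

`statistics.stdev` divides by N - 1 instead (the sample standard deviation), so it gives a larger value than the N-based formula used on these slides.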
We need . . . • A measure of spread that is NOT sensitive to every little score, just as the median is not. • SIQR: Semi-interquartile range. • (Q3 – Q1)/2, where Q1 and Q3 are the 25th and 75th percentiles
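A sketch of the SIQR built on the (N+1)p percentile locations from the Percentiles slide (the interpolation and end-clamping are assumptions, as before). The second list is hypothetical: the pet counts with the highest score changed from 13 to 100. The range jumps from 13 to 100, while the SIQR stays at 1.75 because Q1 and Q3 do not depend on the extreme score. All function names are illustrative.

```python
import math

def percentile(scores, p):
    """Score at location (N + 1) * p, interpolating between neighbors (clamped to the ends)."""
    ordered = sorted(scores)
    n = len(ordered)
    location = min(max((n + 1) * p, 1), n)
    lower = math.floor(location)
    if lower == n:
        return ordered[-1]
    fraction = location - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def siqr(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    return (percentile(scores, 0.75) - percentile(scores, 0.25)) / 2

def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)

pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]
extreme = [0, 0, 1, 1, 1, 2, 3, 4, 5, 100]   # hypothetical: highest score changed to 100

for scores in (pets, extreme):
    print("range:", score_range(scores), "SIQR:", siqr(scores))
```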
Practice Problems • I’ll send you some tonight.
http://highered.mcgraw-hill.com/sites/0072494468/student_view0/statistics_primer.html • Click on Statistics Primer.