INF 397C Introduction to Research in Information Studies Fall, 2005 Day 2
Standard Deviation • σ = SQRT(Σ(X - µ)²/N) • (Does that give you a headache?)
USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. • David Letterman
Statistics: The only science that enables different experts using the same figures to draw different conclusions. • Evan Esar (1899 - 1995), US humorist
Critical Skepticism • Remember the Rabbit Pie example from last week? • The “critical consumer” of statistics asked, “What do you mean by ‘50/50’?”
Remember . . . • I do NOT want you to become cynical. • Not all “media bias” (nor bad research) is intentional. • Just be sensible, critical, skeptical. • As you “consume” statistics, ask some questions . . .
Ask yourself . . . • Who says so? (A Zest commercial is unlikely to tell you that Irish Spring is best.) • How does he/she know? (That Zest is “the best soap for you.”) • What’s missing? (One year, 33% of female grad students at Johns Hopkins married faculty.) • Did somebody change the subject? (“Camrys are bigger than Accords.” “Accords are bigger than Camrys.”) • Does it make sense? (“Study in NYC: Working woman with family needed $40.13/week for adequate support.”)
What were . . . • . . . some claims you all heard this week?
Last week . . . • We learned about frequency distributions. • I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data. • And I asserted there’s another, even shorter-hand way.
Measures of Central Tendency • Mode • Most frequent score (or scores – a distribution can have multiple modes) • Median • “Middle score” • 50th percentile • Mean - µ (“mu”) • “Arithmetic average” • ΣX/N
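The three measures on this slide can be tried out in a few lines. This is only a sketch: the scores are invented for illustration (they are not the class data), and it assumes Python 3.8 or later for `statistics.multimode`.

```python
from statistics import mean, median, multimode

# Hypothetical scores, made up for this example (not the class data).
scores = [2, 3, 3, 5, 7, 10]

# Mode: the most frequent score. multimode returns a list because a
# distribution can have more than one mode.
modes = multimode(scores)

# Median: the "middle score" (50th percentile). With an even number of
# scores it is the average of the two middle ones, here (3 + 5) / 2.
middle = median(scores)

# Mean: the arithmetic average, sum of X divided by N, here 30 / 6.
mu = mean(scores)

# A second made-up distribution with two modes.
two_modes = multimode([1, 1, 2, 2, 3])

print("mode(s):", modes)
print("median:", middle)
print("mean:", mu)
print("bimodal example:", two_modes)
```

Note that the three measures disagree even on this tiny set (3, 4, and 5), which is why the choice among them matters.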
OK, so which do we use? • Means allow further arithmetic/statistical manipulation. But . . . • It depends on: • The type of scale of your data • Can’t use means with nominal or ordinal scale data • With nominal data, must use mode • The distribution of your data • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). (Look at our “Number of MLB games” distribution.) • The question you want to answer • “Most popular score” vs. “middle score” vs. “middle of the see-saw” • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
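To see why medians tend to be preferred for distributions bounded at one end, here is a small sketch with made-up salary figures (an assumption for illustration only, not real data). One very large salary pulls the mean far above what most people in the list earn, while the median barely notices it.

```python
from statistics import mean, median

# Hypothetical annual salaries, invented for this example.
# Salaries cannot go below zero but have no upper limit.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 250_000]

salary_mean = mean(salaries)      # "middle of the see-saw"
salary_median = median(salaries)  # "middle score"

# How many people earn less than the mean?
below_mean = sum(1 for s in salaries if s < salary_mean)

print("mean:", round(salary_mean, 2))
print("median:", salary_median)
print("earning less than the mean:", below_mean, "of", len(salaries))
```

Both numbers are "technically correct"; which one is meaningful depends on the question being asked.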
Have sidled up to SHAPES of distributions • Symmetrical • Skewed – positive and negative • Flat
Why . . . • . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff? • “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).
Note . . . • We started with a bunch of specific scores. • We put them in order. • We drew their distribution. • Now we can report their central tendency. • So, we’ve moved AWAY from specifics, to a summary. But with Central Tendency, alone, we’ve ignored the specifics altogether. • Note MANY distributions could have a particular central tendency! • If we went back to ALL the specifics, we’d be back at square one.
Measures of Dispersion • Range • Semi-interquartile range • Standard deviation • σ (sigma)
Range • Highest score minus the lowest score. • Like the mode . . . • Easy to calculate • Potentially misleading • Doesn’t take EVERY score into account. • What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of Central Tendency. • ’Cause MANY different distributions of scores can have the same central tendency! • “Standard Deviation”
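A short sketch of both points on this slide, using invented scores: two distributions can share a mean while being spread very differently, and the range depends on only two scores, so a single extreme value can make it misleading. The helper name `score_range` is made up for this example.

```python
from statistics import mean

def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Two hypothetical distributions with the same mean (5).
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

# Same as `tight` except for one extreme score.
one_outlier = [4, 5, 5, 5, 60]

print("means:", mean(tight), mean(wide))
print("ranges:", score_range(tight), score_range(wide))
print("range with one extreme score:", score_range(one_outlier))
```

The range uses only the highest and lowest scores, so it says nothing about how the scores in between are arranged.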
Back to our data – MLB games • Let’s take just the men in this class • xls spreadsheet. • Measures of central tendency. • Go with mean. • So, how much do the actual scores deviate from the mean?
So . . . • Add up all the deviations and we should have a feel for how dispersed, how spread, how deviant, our distribution is. • Let’s calculate the Standard Deviation. • As always, start inside the parentheses. • Σ(X - µ)
Damn! • OK, let’s try it on a smaller set of numbers.
Damn! (cont’d.) • OK, let’s try it on a smaller set of numbers.
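The snag with simply adding up the deviations is that the scores above the mean and the scores below it cancel exactly, so Σ(X - µ) comes out to zero for any set of numbers. A minimal sketch with a small invented set (not the class spreadsheet):

```python
# Hypothetical small set of scores, invented for this example.
scores = [2, 4, 6, 8, 10]

mu = sum(scores) / len(scores)          # mean = 30 / 5 = 6.0

# Deviation of each score from the mean: X - mu.
deviations = [x - mu for x in scores]   # [-4.0, -2.0, 0.0, 2.0, 4.0]

# Negative and positive deviations cancel each other out.
total = sum(deviations)

print("mean:", mu)
print("deviations:", deviations)
print("sum of deviations:", total)
```

With scores that are not whole numbers, floating-point rounding can leave a tiny non-zero remainder, but mathematically the sum is always zero.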
OK . . . • . . . so mathematicians at this point do one of two things. • Take the absolute value or square ’em. • We square ’em. Σ(X - µ)²
Standard Deviation (cont’d.) • Then take the average of the squared deviations. Σ(X - µ)²/N • But this number is so BIG!
OK . . . • . . . take the square root (to make up for squaring the deviations earlier). • σ = SQRT(Σ(X - µ)²/N) • Now this doesn’t give you a headache, right? • I said “right”?
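The whole recipe, step by step, on a small invented set of scores (not the class data). The check against `statistics.pstdev` assumes the population formula from the slides, which divides by N; `statistics.stdev` divides by N - 1 instead and would give a different answer.

```python
import math
from statistics import pstdev

# Hypothetical scores, invented for this example.
scores = [2, 4, 6, 8, 10]
n = len(scores)

# Step 1: the mean, mu.
mu = sum(scores) / n                           # 6.0

# Step 2: square each deviation so negatives and positives stop cancelling.
squared_devs = [(x - mu) ** 2 for x in scores]  # [16.0, 4.0, 0.0, 4.0, 16.0]

# Step 3: average the squared deviations.
mean_squared_dev = sum(squared_devs) / n        # 40 / 5 = 8.0

# Step 4: take the square root to get back to the original units.
sigma = math.sqrt(mean_squared_dev)

print("sum of squared deviations:", sum(squared_devs))
print("average squared deviation:", mean_squared_dev)
print("sigma:", round(sigma, 4))
print("matches statistics.pstdev:", math.isclose(sigma, pstdev(scores)))
```

Step 3 gives 8.0, which is in squared units; the square root in step 4 brings it back to the scale of the original scores.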
We need . . . • A measure of spread that is NOT sensitive to every little score, just as median is not. • SIQR: Semi-interquartile range. • (Q3 – Q1)/2
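A sketch of (Q3 – Q1)/2 in code, with invented scores. Textbooks and software define quartiles in slightly different ways; this example assumes the "inclusive" method of `statistics.quantiles` (Python 3.8 or later), so other conventions may give slightly different values. The function name `semi_interquartile_range` is made up for this example.

```python
from statistics import quantiles

def semi_interquartile_range(scores):
    """SIQR = (Q3 - Q1) / 2, using the 'inclusive' quartile method."""
    q1, _q2, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

# Hypothetical scores, invented for this example.
plain = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Same scores, but the highest one is replaced by an extreme value.
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

siqr_plain = semi_interquartile_range(plain)
siqr_outlier = semi_interquartile_range(with_outlier)

print("SIQR without outlier:", siqr_plain)
print("SIQR with outlier:", siqr_outlier)
print("ranges:", max(plain) - min(plain), max(with_outlier) - min(with_outlier))
```

The extreme score changes the range dramatically but leaves the SIQR untouched, in the same way that it would leave the median untouched.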
Who wants to guess . . . • . . . What I think is the most important sentence in S, Z, & Z (2003), Chapter 2?
p. 19 • Penultimate paragraph, first sentence: • “If differences in the dependent variable are to be interpreted unambiguously as a result of the different independent variable conditions, proper control techniques must be used.”
http://highered.mcgraw-hill.com/sites/0072494468/student_view0/statistics_primer.html • Click on Statistics Primer.
Homework • LOTS of reading. See syllabus. • Send a table/graph/chart that you’ve read this past week. Send email by noon, Friday, 9/16/2005. See you next week.