INF 397C Introduction to Research in Information Studies Fall, 2005 Day 2


Standard Deviation $\sigma = \sqrt{\sum (X - \mu)^2 / N}$ (Does that give you a headache?)

USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. • David Letterman

Statistics: The only science that enables different experts using the same figures to draw different conclusions. • Evan Esar (1899 - 1995), US humorist

Scales (last week)

Critical Skepticism • Remember the Rabbit Pie example from last week? • The “critical consumer” of statistics asked, “What do you mean by ‘50/50’?”

Remember . . . • I do NOT want you to become cynical. • Not all “media bias” (nor bad research) is intentional. • Just be sensible, critical, skeptical. • As you “consume” statistics, ask some questions . . .

Ask yourself. . . • Who says so? (A Zest commercial is unlikely to tell you that Irish Spring is best.) • How does he/she know? (That Zest is “the best soap for you.”) • What’s missing? (One year, 33% of female grad students at Johns Hopkins married faculty.) • Did somebody change the subject? (“Camrys are bigger than Accords.” “Accords are bigger than Camrys.”) • Does it make sense? (“Study in NYC: Working woman with family needed $40.13/week for adequate support.”)

What were . . . • . . . some claims you all heard this week?

Last week . . . • We learned about frequency distributions. • I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data. • And I asserted there’s another, even shorter-hand way.
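A frequency distribution is just a count of how often each score occurs. A minimal Python sketch, using made-up scores rather than the class data, tallies the counts with `collections.Counter` and prints a crude sideways text histogram (including empty bins, so gaps in the data stay visible):

```python
from collections import Counter

# Hypothetical scores (e.g., number of MLB games attended); illustrative only.
scores = [0, 1, 1, 2, 3, 3, 3, 5]

# Frequency distribution: score -> number of times it occurs.
freq = Counter(scores)

# Sideways text histogram: one row per possible score from min to max,
# one '*' per occurrence. Counter returns 0 for scores that never occur.
for value in range(min(scores), max(scores) + 1):
    print(f"{value:>3} | {'*' * freq[value]}")
```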

Measures of Central Tendency • Mode • Most frequent score (or scores – a distribution can have multiple modes) • Median • “Middle score” • 50th percentile • Mean - µ (“mu”) • “Arithmetic average” • ΣX/N
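The three measures can be computed with Python's standard `statistics` module. This is a sketch on hypothetical scores, not the class spreadsheet; `multimode` returns every most-frequent score, since a distribution can have more than one mode.

```python
from statistics import mean, median, multimode

# Hypothetical scores; not the class data.
scores = [0, 1, 1, 2, 3, 3, 3, 5, 8, 14]

print("mode(s):", multimode(scores))  # most frequent score(s): [3]
print("median:", median(scores))      # middle score; with N even, average of the two middle scores
print("mean:", mean(scores))          # sum(X) / N = 40 / 10
```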

OK, so which do we use? • Means allow further arithmetic/statistical manipulation. But . . . • It depends on: • The type of scale of your data • Can’t use means with nominal or ordinal scale data • With nominal data, must use mode • The distribution of your data • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). (Look at our “Number of MLB games” distribution.) • The question you want to answer • “Most popular score” vs. “middle score” vs. “middle of the see-saw” • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
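How the scale type constrains the choice can be sketched in Python on hypothetical data. Nominal labels have no order, so only the mode makes sense; ordinal ratings are ordered but their gaps are not known to be equal, so a median is reported instead of a mean. Using `median_low` (which always returns an actual rating rather than averaging two categories) is a judgment call on our part, not a rule from the slides.

```python
from statistics import median_low, multimode

# Hypothetical nominal data: unordered categories, so use the mode.
majors = ["LIS", "HCI", "LIS", "Archives", "LIS", "HCI"]
print("mode:", multimode(majors))

# Hypothetical ordinal data: Likert ratings (1 = strongly disagree ... 5 = strongly agree).
# median_low() returns the lower of the two middle values when N is even,
# so the answer is always a real category on the scale.
ratings = [1, 2, 2, 3, 4, 5]
print("median:", median_low(ratings))
```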

Scales (which measure of CT?)

Mean – “see-saw” (from Tal, 2001)

Have sidled up to SHAPES of distributions • Symmetrical • Skewed – positive and negative • Flat

“Pulling up the mean”
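A small Python illustration of a positively skewed distribution, using hypothetical salaries (in $1000s). Salary is bounded at zero but not above, so one very large value drags the mean far to the right while the median barely moves:

```python
from statistics import mean, median

# Hypothetical salaries in $1000s.
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]  # one very high earner

print(mean(salaries), median(salaries))          # 35 and 35
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 95.8; median is 36.5
```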

Why . . . • . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff? • “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).

Note . . . • We started with a bunch of specific scores. • We put them in order. • We drew their distribution. • Now we can report their central tendency. • So, we’ve moved AWAY from specifics, to a summary. But with Central Tendency, alone, we’ve ignored the specifics altogether. • Note MANY distributions could have a particular central tendency! • If we went back to ALL the specifics, we’d be back at square one.

Measures of Dispersion • Range • Semi-interquartile range • Standard deviation • σ (sigma)

Range • Highest score minus the lowest score. • Like the mode . . . • Easy to calculate • Potentially misleading • Doesn’t take EVERY score into account. • What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of Central Tendency. • ’Cause MANY different distributions of scores can have the same central tendency! • “Standard Deviation”
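A sketch of both points in Python, on hypothetical scores. All three sets have a mean of 5; `tight` and `a` differ in range, while `a` and `b` share the same mean AND the same range even though `b` piles its scores at the extremes, which is exactly what the range cannot see. The helper name `score_range` is our own.

```python
from statistics import mean

def score_range(xs):
    """Range: highest score minus lowest score (hypothetical helper)."""
    return max(xs) - min(xs)

# Hypothetical distributions, all with a mean of 5.
tight = [4, 5, 5, 5, 6]
a = [0, 5, 5, 5, 10]   # scores cluster in the middle
b = [0, 0, 5, 10, 10]  # scores pile up at the extremes

for name, xs in [("tight", tight), ("a", a), ("b", b)]:
    print(name, "mean:", mean(xs), "range:", score_range(xs))
```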

Back to our data – MLB games • Let’s take just the men in this class • xls spreadsheet. • Measures of central tendency. • Go with mean. • So, how much do the actual scores deviate from the mean?

So . . . • Add up all the deviations and we should have a feel for how dispersed, how spread out, how deviant our distribution is. • Let’s calculate the Standard Deviation. • As always, start inside the parentheses. • $\sum (X - \mu)$

Damn! • The deviations add up to zero: the positive ones exactly cancel the negative ones. • OK, let’s try it on a smaller set of numbers.

Damn! (cont’d.) • OK, let’s try it on a smaller set of numbers.
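Why the plain sum of deviations is useless: because $\mu = \sum X / N$, for any set of scores $\sum (X - \mu) = \sum X - N\mu = 0$. A quick Python check on a small, made-up set of scores (with non-integer data the computed sum may come out as a tiny nonzero number because of floating-point rounding):

```python
from statistics import mean

xs = [2, 4, 6, 8]                   # hypothetical small set of scores
mu = mean(xs)                       # 20 / 4 = 5
deviations = [x - mu for x in xs]   # -3, -1, 1, 3
print(deviations, sum(deviations))  # positives and negatives cancel: sum is 0
```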

OK . . . • . . . so mathematicians at this point do one of two things. • Take the absolute value or square ’em. • We square ’em: $\sum (X - \mu)^2$
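Both fixes remove the minus signs, so the deviations can no longer cancel. A Python sketch on the same hypothetical four scores shows each option:

```python
xs = [2, 4, 6, 8]        # hypothetical scores
mu = sum(xs) / len(xs)   # 5.0

abs_devs = [abs(x - mu) for x in xs]   # 3, 1, 1, 3
sq_devs = [(x - mu) ** 2 for x in xs]  # 9, 1, 1, 9

print(sum(abs_devs))  # 8.0: sum of absolute deviations
print(sum(sq_devs))   # 20.0: sum of squared deviations
```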

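Standard Deviation (cont’d.) • Then take the average of the squared deviations (this average is the variance, $\sigma^2$): $\sum (X - \mu)^2 / N$ • But this number is so BIG! It is also in squared units, not the units of the original scores.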
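Averaging the squared deviations gives the population variance. A Python sketch on the same hypothetical scores computes it by hand and compares it with the standard library's `pvariance`, which also divides by N:

```python
from statistics import pvariance

xs = [2, 4, 6, 8]  # hypothetical scores
n = len(xs)
mu = sum(xs) / n

# Sum of squared deviations divided by N: 20 / 4
variance = sum((x - mu) ** 2 for x in xs) / n

print(variance, pvariance(xs))  # both 5 (in squared units)
```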

OK . . . • . . . take the square root (to make up for squaring the deviations earlier). • $\sigma = \sqrt{\sum (X - \mu)^2 / N}$ • Now this doesn’t give you a headache, right? • I said “right”?
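The whole formula in a few lines of Python, as a sketch on hypothetical scores (the function name `population_sd` is ours). Note that the slide divides by N, the population standard deviation; `statistics.pstdev` does the same, while `statistics.stdev` divides by N − 1 (the sample standard deviation), as Excel's STDEV does, as opposed to STDEVP. Check which one a spreadsheet or textbook is using before comparing numbers.

```python
import math
from statistics import pstdev, stdev

def population_sd(xs):
    """sigma = sqrt(sum((X - mu)^2) / N), dividing by N as on the slide."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / n)

xs = [2, 4, 6, 8]          # hypothetical scores
print(population_sd(xs))   # sqrt(5), about 2.236
print(pstdev(xs))          # standard library, divides by N: same value
print(stdev(xs))           # divides by N - 1 (sample SD): a bit larger
```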

Hmmm . . .

We need . . . • A measure of spread that is NOT sensitive to every little score, just as the median is not. • SIQR: Semi-interquartile range. • $(Q_3 - Q_1)/2$, where $Q_1$ and $Q_3$ are the first and third quartiles (the 25th and 75th percentiles).
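A Python sketch of the SIQR next to the standard deviation, on hypothetical scores. Quartiles have several textbook conventions; this uses `statistics.quantiles(..., method="inclusive")`, and other software may report slightly different Q1 and Q3. Replacing one score with an extreme value leaves the SIQR unchanged while the standard deviation balloons:

```python
from statistics import pstdev, quantiles

def siqr(xs):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    return (q3 - q1) / 2

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]         # hypothetical
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # one extreme score

print(siqr(scores), siqr(with_outlier))      # both 2.0
print(pstdev(scores), pstdev(with_outlier))  # about 2.58 vs. about 26.96
```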

To summarize

Practice Problems

Who wants to guess . . . • . . . what I think is the most important sentence in S, Z, & Z (2003), Chapter 2?

p. 19 • Penultimate paragraph, first sentence: • “If differences in the dependent variable are to be interpreted unambiguously as a result of the different independent variable conditions, proper control techniques must be used.”

http://highered.mcgraw-hill.com/sites/0072494468/student_view0/statistics_primer.html • Click on Statistics Primer.

Homework • LOTS of reading. See syllabus. • Send a table/graph/chart that you’ve read this past week. Send email by noon, Friday, 9/16/2005. See you next week.
