Measurement in Psychology I: RELIABILITY
Lawrence R. Gordon
Do you support the civil union legislation? • What are some of the ways in which you can ask this question? • How do you measure the response (operational definitions)?
Levels of Measurement • Nominal scales • giving names to data, putting into categories • Examples: sex, race labels; baseball uniform numbers • Ordinal scales • numbers give order but not distance • Examples: mailbox numbers; class rankings
Levels of Measurement (cont.) • Interval scales • numbers indicate order and distance (they are separated by equal distances or intervals) • Example: Fahrenheit temperature • Ratio scales • numbers indicate order, distance, AND have a true zero point (zero = there isn’t any) • Examples: height; weight; miles per hour; time
Levels of Measurement ExampleAuto race which started at 2 pm
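The auto-race example can be made concrete with a small sketch. The race data below are invented for illustration: car numbers are nominal, finishing place is ordinal, clock time of finish is interval (the 2 pm start makes the zero point arbitrary), and elapsed time is ratio (a true zero, so ratios are meaningful).

```python
# Hypothetical finishers in a race that started at 2:00 pm.
# Each level of measurement supports different statements.
finishers = [
    {"car_number": 42, "place": 1, "clock_finish_h": 15.5, "elapsed_min": 90.0},
    {"car_number": 7,  "place": 2, "clock_finish_h": 16.0, "elapsed_min": 120.0},
]
a, b = finishers

# Nominal: car numbers only name the cars -- arithmetic on them is meaningless.
assert a["car_number"] != b["car_number"]          # same/different is all we can say

# Ordinal: place gives order but not distance (1st vs 2nd says nothing about margin).
assert a["place"] < b["place"]

# Interval: clock time has order AND distance, but an arbitrary zero,
# so differences are meaningful while ratios are not.
gap_h = b["clock_finish_h"] - a["clock_finish_h"]  # a 0.5-hour gap is meaningful
# b["clock_finish_h"] / a["clock_finish_h"] would NOT mean "took ~3% longer".

# Ratio: elapsed time has a true zero, so ratios are meaningful.
ratio = a["elapsed_min"] / b["elapsed_min"]        # winner took 75% as long
print(gap_h, ratio)
```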
Closed vs. Open Responses • Closed responses (a.k.a. forced choice) • Examples (rate civil union support on a scale of 1 to 9) • Advantages • you know what the responses will be (or what they should be!) because of restrictions on choice • (relatively) easy to evaluate empirically • yields data that answer the question exactly as you asked it • coding usually not necessary
Closed vs. Open Responses • Closed responses (a.k.a. forced choice) • Disadvantages • may not be sensitive enough to get some interesting information • will not give you as clear an indication of what participants think/feel/report • “Do you agree that same-sex couples should have the right to marry/civil union?” 1 2 3 4 5 6 7 8 9 (1 = Completely Disagree … 9 = Completely Agree)
Closed vs. Open Responses • Open responses (a.k.a. free response) • Examples (Do you support the civil union legislation? Why?) • Example from the survey used the first day? • “Please describe yourself in 12 words or less” • more on this in a bit... • Advantages • gives any answer participant wants • not restricted by choices
Closed vs. Open Responses • Open responses (cont.) • Disadvantages • have to code to empirically evaluate (time intensive, need to find people who will do it) • reliability issues!
Reliability • Consistency (stays the same) • Repeatable (get the same results again and again) • Measures need to be reliable to be good measures • Now, some nitty-gritty...
Reliability (cont.) • Measuring closed responses • you don’t need to put things into categories • reliable over time (do you get the same answers again and again?) • if the answers vary greatly from one time of measurement to the next, the measurement is not reliable
Reliability (cont.) • Measuring closed responses (cont.) • scales (sets of questions designed to measure something) need to be given multiple times, or in multiple forms, and the answers must remain similar for the scale to be reliable • Example (personality scale?) • Types of reliability • Stability (“test-retest reliability”) • Equivalence (“parallel forms reliability”) • Consistency (“split-half reliability”) • Homogeneity (“internal consistency reliability”)
Reliability Quick Example • Any test, scale, or inventory with items: e.g., a 50-item test, scored 0–50:

Examinee    Form A 9/4   Form A 9/25   Form B 9/4   Odd (A 9/4)   Even (A 9/4)
1 George        27            35            33           15            12
2 Alice         49            46            40           30            19
3 Mary          30            35            27           13            17
4 Larry         16            10            19            7             9
5 Linda         27            24            20           10            17
6 Doug          40            42            48           22            18
7 Chuck         21            18            35           10            11
8 Judy          42            39            35           19            23

• Test-retest: Form A 9/4 vs. Form A 9/25 (r = .92) → Stability
• Parallel forms: Form A vs. Form B, both 9/4 (r = .69) → Equivalence
• Cross form: Form A 9/25 vs. Form B 9/4 (r = .72) → Stability & Equivalence
• Split-half: Odd vs. Even items, Form A 9/4 (r = .79) → Consistency
• Alpha reliability: no example shown (requires item-level data from all 50 items) → Internal consistency
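The quoted correlations can be reproduced from the eight examinees' scores. A sketch in Python follows; one point worth flagging is that the slide's split-half value of .79 matches the odd/even correlation only after the standard Spearman-Brown correction to full test length, so that step is included here explicitly.

```python
# Scores transcribed from the slide's table (8 examinees, 50-item test).
form_a_sep4  = [27, 49, 30, 16, 27, 40, 21, 42]
form_a_sep25 = [35, 46, 35, 10, 24, 42, 18, 39]
form_b_sep4  = [33, 40, 27, 19, 20, 48, 35, 35]
odd_a_sep4   = [15, 30, 13,  7, 10, 22, 10, 19]   # odd items, Form A 9/4
even_a_sep4  = [12, 19, 17,  9, 17, 18, 11, 23]   # even items, Form A 9/4

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r_stability   = pearson_r(form_a_sep4, form_a_sep25)  # test-retest
r_equivalence = pearson_r(form_a_sep4, form_b_sep4)   # parallel forms
r_cross       = pearson_r(form_a_sep25, form_b_sep4)  # stability & equivalence

# Split-half: correlate the odd-item and even-item half-scores, then apply the
# Spearman-Brown correction to estimate reliability at full (50-item) length.
r_half       = pearson_r(odd_a_sep4, even_a_sep4)
r_split_half = 2 * r_half / (1 + r_half)

for name, r in [("test-retest", r_stability), ("parallel forms", r_equivalence),
                ("cross-form", r_cross), ("split-half (S-B)", r_split_half)]:
    print(f"{name}: r = {r:.2f}")
# -> .92, .69, .72, and .79, matching the slide's quoted values
```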
Reliability (cont.) • Measuring open responses • Will often code into categories (Examples) • How do you assess reliability?
Reliability (cont.) • Measuring open responses (cont.) • Does everyone put the response into the same category? If yes, you have good inter-coder reliability • more specific operational definitions will increase this reliability • Coding personality responses into categories • Using positive, negative, and neutral descriptors
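One common way to quantify "does everyone put the response into the same category?" is Cohen's kappa, which corrects raw agreement for the agreement two coders would reach by chance. A minimal sketch: the positive/negative/neutral categories come from the slide, but the codes below are invented for illustration.

```python
from collections import Counter

# Hypothetical category codes assigned by two coders to the same ten
# open responses (pos = positive, neg = negative, neu = neutral descriptor).
coder1 = ["pos", "pos", "neg", "neu", "pos", "neg", "neu", "pos", "neg", "pos"]
coder2 = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neg", "pos"]

def cohens_kappa(c1, c2):
    """Chance-corrected inter-coder agreement for categorical codes."""
    n = len(c1)
    observed = sum(a == b for a, b in zip(c1, c2)) / n
    f1, f2 = Counter(c1), Counter(c2)
    # Agreement expected if both coders assigned categories at their base rates.
    expected = sum(f1[cat] * f2[cat] for cat in f1) / n ** 2
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa(coder1, coder2), 2))  # -> 0.67
```

Here raw agreement is 8/10 = .80, but kappa drops to .67 once chance agreement is removed, which is why kappa is usually preferred over raw percent agreement for category coding.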
Reliability (cont.) • Measuring behavioral responses through observation • special cases of open response; you can’t really control what participants do • coding and/or rating what you observe • reliability of ratings (inter-rater reliability: do all raters agree on the rating?) • need to be very clear on operational definitions • Baggage claim study (Scherer & Ceschi, 2000)
Assessing Reliability • Steps • decide on operational definitions of your variables and scale(s) of measurement • train your coders/raters, answer questions, and alleviate confusion • do the coding and rating • compare responses • were the measurements reliable?
Reliability Exercise • Measuring your personality • Looking for “big” traits • defining big traits and training coders • The Big Five Personality Factors 1. Open to Experience (O) vs. Closed to Experience (NO) 2. Conscientious (C) vs. Nonconscientious (NC) 3. Extraverted (E) vs. Introverted (NE) 4. Agreeable (A) vs. Unagreeable (NA) 5. Neurotic (N) vs. Nonneurotic (NN) • Which one best fits the description? • Do the coding!
Reliability Exercise • Measuring your personality • Looking for “big” traits • compare responses to other coders • intercoder reliability • List number on which you agreed • List number on which you disagreed • Calculate the percentages • were the measurements reliable?
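The exercise's tally can be sketched directly: list the items on which the two coders agreed, the items on which they disagreed, and compute the percentage. The Big Five codes below are invented for illustration.

```python
# Two coders each assign one Big Five code (O, C, E, A, N) to the same
# ten hypothetical self-descriptions.
coder_you     = ["E", "A", "O", "C", "N", "E", "A", "O", "C", "E"]
coder_partner = ["E", "A", "O", "N", "N", "E", "C", "O", "C", "E"]

pairs     = list(zip(coder_you, coder_partner))
agreed    = [i + 1 for i, (a, b) in enumerate(pairs) if a == b]  # item numbers
disagreed = [i + 1 for i, (a, b) in enumerate(pairs) if a != b]
pct       = 100 * len(agreed) / len(pairs)

print("agreed on items:", agreed)        # -> [1, 2, 3, 5, 6, 8, 9, 10]
print("disagreed on items:", disagreed)  # -> [4, 7]
print(f"percent agreement: {pct:.0f}%")  # -> 80%
```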
And for next time…is reliability enough? • If your measurement is reliable, does that mean that it is good? • Does being reliable make your measurement valid?