Reliability Psych 395 - DeShon
How Do We Judge Psychological Measures? • Two concepts: Reliability and Validity • Reliability: How consistent is the assessment over time, items or raters? How reproducible are the measurements? How much measurement error is involved? • Validity: How well does an assessment measure what it is supposed to measure? How accurate is our assessment? • An assessment is valid if it measures what it purports to measure.
Questions We Might Ask … • How strongly is payroll associated with winning percentage? • How strongly is payroll associated with making the playoffs? • How can we answer these?
Option 2: Quantify the Association with the Correlation Coefficient
The Correlation Coefficient • Credited to Karl Pearson (1896) • Measures the degree of linear association between two variables. • Ranges from -1.0 to 1.0 • Sign refers to direction • Negative: As X increases Y decreases • Positive: As X increases Y increases
One Formula • Symbolized by r • Covariance of X and Y divided by the product of the SDs of X and Y: r = covXY / (sX × sY)
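As a sketch of that definition, the function below computes r from raw paired scores. The function name `pearson_r` and the toy data are hypothetical, not from the slides. It uses the sample covariance (n − 1 denominator) so that it matches `statistics.stdev`; any consistent choice of denominator gives the same r because it cancels in the ratio.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson r: covariance of x and y divided by the product of their SDs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sample covariance with an n - 1 denominator, matching statistics.stdev
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov_xy / (stdev(x) * stdev(y))

# Hypothetical toy data (not the slides' data)
payroll = [40, 60, 80, 100, 120]
win_pct = [.45, .48, .50, .55, .53]
print(f"r = {pearson_r(payroll, win_pct):.2f}")
```

Python 3.10 and later also ship `statistics.correlation`, which should return the same value for these inputs.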
Calculation of r for Payroll (X) and Winning Percentage (Y) • covXY = 1.13 • sX = 34.23 • sY = .07
Calculation of r for Payroll (X) and Making Post-Season (Y) • Y coded so that 1=Playoffs 0=No • covXY = 8.24 • sX = 34.23 • sY = .45
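Plugging the summary statistics from the two payroll slides into r = covXY / (sX × sY) gives the sketch below. Because sY is reported to only two decimals, treat the results as approximate. When Y is coded 0/1, as on the post-season slide, the same formula yields what is usually called the point-biserial correlation. The helper name `r_from_summary` is hypothetical.

```python
def r_from_summary(cov_xy, s_x, s_y):
    """r = covXY / (sX * sY), computed from reported summary statistics."""
    return cov_xy / (s_x * s_y)

# Payroll (X) with winning percentage (Y)
r_win = r_from_summary(1.13, 34.23, .07)
# Payroll (X) with making the post-season (Y coded 1 = playoffs, 0 = no)
r_playoffs = r_from_summary(8.24, 34.23, .45)

print(f"r (winning %)   = {r_win:.2f}")       # prints 0.47
print(f"r (post-season) = {r_playoffs:.2f}")  # prints 0.53
```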
Commonly Used Rule of Thumb • +/- .10 is Small • +/- .30 is Medium • +/- .50 is Large • Use these with care. These guidelines only provide a loose framework for thinking about the size of correlations. • Sources: Cohen (1988) and Kline (2004)
Classical Test Theory • X = T + E, where X = Observed Score, T = True Score, and E = Error Score
Consider the Construct of Self-Esteem • Global self-esteem reflects a person’s overall evaluation of value and worth. • William James (1890) argued that self-esteem was the result of an individual’s perceived successes divided by their pretensions • Rosenberg (1965) defined global self-esteem as an individual’s overall judgment of adequacy • We can’t directly observe self-esteem
Measuring Self-Esteem • We can ask people questions that reflect individual differences in self-esteem. • “I feel that I have a number of good qualities” • “I see myself as a person with high self-esteem” • We assume that a “hidden” self-esteem variable causes people to respond to these questions. • We do not want to assume that these items are perfect indicators of an individual’s level of self-esteem.
Classical Test Theory Assumptions • True scores and errors are uncorrelated (independent). • Errors across people average to zero. • Across repeated measurements, a person’s average score is approximately equal to his/her true score.
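Thinking about Total Variability • If X = T + E, then var(X) = var(T) + var(E) + 2cov(T, E). • Because true scores and errors are uncorrelated, cov(T, E) = 0, so var(X) = var(T) + var(E).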
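A quick simulation can make these assumptions concrete. The sketch below assumes, purely for illustration, true scores with variance 9 and independent errors with variance 1 (these values and the seed are not from the slides). It checks that var(X) comes out close to var(T) + var(E), that errors average to about zero, and that one hypothetical person's average over many measurements lands near his/her true score.

```python
import random
from statistics import mean, pvariance

random.seed(395)  # arbitrary seed so the run is reproducible

N = 20_000
# Assumed population values, for illustration only: var(T) = 9, var(E) = 1
true_scores = [random.gauss(50, 3) for _ in range(N)]    # T: mean 50, SD 3
errors = [random.gauss(0, 1) for _ in range(N)]          # E: drawn independently of T
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

var_t, var_e, var_x = pvariance(true_scores), pvariance(errors), pvariance(observed)
print(f"var(T) + var(E) = {var_t + var_e:.2f}")
print(f"var(X)          = {var_x:.2f}")
print(f"mean(E)         = {mean(errors):.3f}")  # errors average to about zero

# One hypothetical person measured many times: the average approaches the true score
person_true = 55.0
repeated = [person_true + random.gauss(0, 1) for _ in range(1_000)]
print(f"mean of repeated measurements = {mean(repeated):.2f} (true score {person_true})")
```

The two variance lines differ only by twice the sample covariance of T and E, which is near zero here because the errors were generated independently of the true scores.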
Reliability Coefficients • Reliability coefficients reflect the ratio of true-score variance to observed-score variance: var(T) / var(X). • Because var(T) cannot exceed var(X), reliabilities range from 0.0 (no true-score variance) to 1.0 (all true-score variance).
Classic Definition of Reliability • The ratio of true score variance to total score variance. • Test 1: Total Variance = 10; True Score Variance = 9. • Test 2: Total Variance = 20; True Score Variance = 15. • Which Test is More Reliable?
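Applying the classic definition to the two tests is a single division each. The sketch below (the helper name `reliability` is hypothetical) shows that Test 1 comes out higher even though Test 2 has more true-score variance in absolute terms: what matters is the proportion of total variance that is true-score variance.

```python
def reliability(true_var, total_var):
    """Classic definition: true-score variance divided by total (observed) variance."""
    return true_var / total_var

test1 = reliability(9, 10)   # Test 1: 9 / 10
test2 = reliability(15, 20)  # Test 2: 15 / 20
print(test1, test2)  # prints 0.9 0.75
```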
Reliability • More technically: To what extent do observed scores reflect true scores? • How consistent is the assessment?
Three Kinds of Reliability • Internal Consistency (Content) • Random error affects responses to items on an assessment • Test-Retest (Time) • The construct stays the same. However, random errors vary from one occasion to the next. • Inter-Rater (Observer Biases)
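For test-retest reliability, a minimal sketch under classical test theory assumptions: the true score T stays fixed between occasions and each occasion adds a fresh, independent random error. Under those assumptions the correlation between the two occasions estimates var(T) / var(X). The variances (9 and 1), the seed, and the helper `corr` are illustrative assumptions, not values from the slides.

```python
import random
from math import sqrt

random.seed(2024)  # arbitrary seed for reproducibility
N = 20_000
# Assumed values: var(T) = 9 and var(E) = 1, so var(T) / var(X) = 9 / 10 = 0.9
T = [random.gauss(50, 3) for _ in range(N)]
time1 = [t + random.gauss(0, 1) for t in T]  # occasion 1: true score + new error
time2 = [t + random.gauss(0, 1) for t in T]  # occasion 2: same true score, new error

def corr(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

retest_r = corr(time1, time2)
print(f"test-retest r = {retest_r:.3f}  (assumed reliability 0.9)")
```

If the construct itself changed between occasions, the correlation would also reflect that change, which is why the slide stresses that the construct stays the same.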
Internal Consistency • Use a 5-item measure of Self-Esteem. • 1. I feel that I am a person of worth, at least on an equal basis with others. • 2. I feel that I have a number of good qualities. • 3. All in all, I am inclined to feel that I am a failure. • 4. I am able to do things as well as most other people. • 5. I feel I do not have much to be proud of. • Response Options (1 = Strongly Disagree to 5 = Strongly Agree)
Summary Statistics of the Inter-Item Correlations • Average: .42 • Standard Deviation: .13 • Minimum: .25 (Items 4 & 5) • Maximum: .70 (Items 1 & 2) • Standardized Alpha = .78 • Alpha is an index of how strongly the items on a measure are associated with each other.
Coefficient Alpha (α) • α = k·r̄ / [1 + (k − 1)·r̄] • You need (1) the number of items (called k) and (2) the average inter-item correlation (r̄). • This formula yields the standardized alpha.
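The standardized-alpha formula needs only k and the average inter-item correlation, so it is easy to check against the numbers reported in this deck. The sketch below (function name hypothetical) reproduces the 5-item result (average r = .42, alpha = .78) and the 10-item result (average r = .46, alpha = .89).

```python
def standardized_alpha(k, avg_r):
    """Standardized coefficient alpha from k items and the average inter-item correlation."""
    if k < 2:
        raise ValueError("alpha needs at least two items")
    return (k * avg_r) / (1 + (k - 1) * avg_r)

print(round(standardized_alpha(5, .42), 2))   # 5-item example: prints 0.78
print(round(standardized_alpha(10, .46), 2))  # 10-item example: prints 0.89
```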
Coefficient Alpha versus Split-Half Reliability Estimates… • Split-Half Reliability: Divide the items on the assessment into two halves and then correlate the two halves. • Problem: Estimates fluctuate depending on which items get split into which halves. • Alpha is the average of all possible split-half reliabilities.
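To see the fluctuation, the sketch below simulates hypothetical data for 10 items (nothing here comes from the slides' data) and computes a split-half estimate for every way of dividing the items into two halves of five. One assumption to flag: each split-half estimate uses the Flanagan–Rulon formula, 2[1 − (var(A) + var(B)) / var(A + B)], rather than a raw correlation between the halves. With that formula, the mean over all splits equals raw (unstandardized) coefficient alpha exactly, which is one precise sense in which alpha is "the average of all possible split-half reliabilities." The slides report standardized alpha, which is based on correlations rather than variances, so the two versions will generally differ somewhat.

```python
import random
from itertools import combinations
from statistics import pvariance

random.seed(7)  # arbitrary seed
N_PEOPLE, K = 500, 10

# Hypothetical data: each item = person's true score + item-specific noise.
# Noise SD grows with the item index so that different splits give different answers.
true = [random.gauss(0, 1) for _ in range(N_PEOPLE)]
items = [[t + random.gauss(0, 0.5 + 0.15 * j) for j in range(K)] for t in true]

def half_total(person, half):
    """Sum of one person's scores on the items in `half`."""
    return sum(person[j] for j in half)

def flanagan_rulon(half_a, half_b):
    """Split-half estimate 2 * [1 - (var(A) + var(B)) / var(A + B)]."""
    a = [half_total(p, half_a) for p in items]
    b = [half_total(p, half_b) for p in items]
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (pvariance(a) + pvariance(b)) / pvariance(total))

def cronbach_alpha():
    """Raw alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    item_vars = [pvariance([p[j] for p in items]) for j in range(K)]
    total_var = pvariance([sum(p) for p in items])
    return K / (K - 1) * (1 - sum(item_vars) / total_var)

odd_even = flanagan_rulon(range(0, K, 2), range(1, K, 2))
first_last = flanagan_rulon(range(0, K // 2), range(K // 2, K))

# Every way to split 10 items into two halves of 5, each split counted once
splits = []
for half_a in combinations(range(K), K // 2):
    if 0 in half_a:  # keep item 0 in half A so A|B and B|A are not both counted
        half_b = [j for j in range(K) if j not in half_a]
        splits.append(flanagan_rulon(half_a, half_b))

mean_split = sum(splits) / len(splits)
alpha = cronbach_alpha()
print(f"odd/even split:      {odd_even:.3f}")
print(f"first/last split:    {first_last:.3f}")
print(f"range over {len(splits)} splits: {min(splits):.3f} to {max(splits):.3f}")
print(f"mean of all splits:  {mean_split:.3f}")
print(f"coefficient alpha:   {alpha:.3f}")
```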
Real Results • 10-Item Measure of Self-Esteem for 451 women. • Correlate the average of the odd-numbered items with the average of the even-numbered items: r = .79 • Correlate the average of the first five items with the average of the last five items: r = .67 • Average Inter-Item r = .46 • Standardized Alpha = .89
Caveats about Coefficient Alpha • Recall what goes into the alpha calculation: • Number of items • Average inter-item correlation • There are at least two things to think about when considering Coefficient Alpha… • Length of the Assessment • Dimensionality
Hold the average inter-item correlation constant (e.g., .420) but increase the number of items….
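Holding the average inter-item correlation at .42 and varying only k in the standardized-alpha formula shows the length effect directly. The particular item counts below are arbitrary choices for illustration; the point is that alpha keeps climbing toward 1.0 even though the items are no more strongly related to each other.

```python
def standardized_alpha(k, avg_r):
    """Standardized coefficient alpha from k items and the average inter-item correlation."""
    return (k * avg_r) / (1 + (k - 1) * avg_r)

AVG_R = .42  # average inter-item correlation held fixed
alphas = {k: standardized_alpha(k, AVG_R) for k in (2, 5, 10, 20, 40)}
for k, a in alphas.items():
    print(f"k = {k:2d} items -> standardized alpha = {a:.2f}")
```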