
Reliability

Reliability. Psych 395 - DeShon. How Do We Judge Psychological Measures? Two concepts: Reliability and Validity. Reliability: How consistent is the assessment over time, items, or raters? How reproducible are the measurements? How much measurement error is involved?



Presentation Transcript


  1. Reliability Psych 395 - DeShon

  2. How Do We Judge Psychological Measures? • Two concepts: Reliability and Validity • Reliability: How consistent is the assessment over time, items or raters? How reproducible are the measurements? How much measurement error is involved? • Validity: How well does an assessment measure what it is supposed to measure? How accurate is our assessment? • An assessment is valid if it measures what it purports to measure.

  3. Correlation Review

  4. Some Data from the 2005 Baseball Season

  5. Questions We Might Ask … • How strongly is payroll associated with winning percentage? • How strongly is payroll associated with making the playoffs? • How can we answer these?

  6. Option 1: Plot the Data

  7. Option 2: Quantify the Association with the Correlation Coefficient

  8. The Correlation Coefficient • Credited to Karl Pearson (1896) • Measures the degree of linear association between two variables. • Ranges from -1.0 to 1.0 • Sign refers to direction • Negative: As X increases Y decreases • Positive: As X increases Y increases

  9. One Formula • Symbolized by r • Covariance of X and Y divided by the product of the SDs of X and Y: $r = \frac{\mathrm{cov}_{XY}}{s_X \, s_Y}$

  10. Calculation of r for Payroll (X) and Winning Percentage (Y) • covXY = 1.13 • sX = 34.23 • sY = .07

  11. Calculation of r for Payroll (X) and Making Post-Season (Y) • Y coded so that 1=Playoffs 0=No • covXY = 8.24 • sX = 34.23 • sY = .45
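The two calculations on slides 10 and 11 can be reproduced directly from the formula on slide 9. This is a minimal sketch: the function name `pearson_r` is ours, and because the slide values are rounded (for example sY = .07), the results are approximate.

```python
def pearson_r(cov_xy, sd_x, sd_y):
    """Correlation = covariance divided by the product of the two SDs."""
    return cov_xy / (sd_x * sd_y)

# Slide 10: payroll (X) and winning percentage (Y)
r_winning = pearson_r(cov_xy=1.13, sd_x=34.23, sd_y=0.07)

# Slide 11: payroll (X) and making the post-season (Y, coded 1 = playoffs, 0 = no)
r_playoffs = pearson_r(cov_xy=8.24, sd_x=34.23, sd_y=0.45)

print(f"payroll vs. winning percentage:  r = {r_winning:.2f}")
print(f"payroll vs. making the playoffs: r = {r_playoffs:.2f}")
```

With the rounded inputs from the slides, both correlations come out at roughly .5.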

  12. Examples of Correlations • Source: Meyer et al. (2001)

  13. Commonly Used Rule of Thumb • +/- .10 is Small • +/- .30 is Medium • +/- .50 is Large • Use these with care. These guidelines provide only a loose framework for thinking about the size of correlations • Sources: Cohen (1988) and Kline (2004)

  14. Now Back to Reliability

  15. Classical Test Theory • X = T + E, where X = Observed Score, T = True Score, and E = Error Score

  16. Consider the Construct of Self-Esteem • Global self-esteem reflects a person’s overall evaluation of value and worth. • William James (1890) argued that self-esteem was the result of an individual’s perceived successes divided by their pretensions • Rosenberg (1965) defined global self-esteem as an individual’s overall judgment of adequacy • We can’t directly observe self-esteem

  17. Measuring Self-Esteem • We can ask people questions that reflect individual differences in self-esteem. • “I feel that I have a number of good qualities” • “I see myself as a person with high self-esteem” • We assume that a “hidden” self-esteem variable causes people to respond to these questions. • We do not want to assume that these items are perfect indicators of an individual’s level of self-esteem.

  18. Classical Test Theory • X = T + E, where X = Observed Score, T = True Score, and E = Error Score

  19. Classical Test Theory Assumptions • True scores and errors are uncorrelated (independent) • Errors across people average to zero • Across repeated measurements, a person’s average score is approximately equal to his/her true score.

  20. Thinking about Total Variability • If X = T + E, and true scores and errors are uncorrelated, then: var(X) = var(T) + var(E)
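A small simulation can make the variance decomposition concrete. This is a sketch on made-up data, not data from the course: the true-score and error distributions, the sample size, and the seed are all assumptions chosen for illustration.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

random.seed(395)  # arbitrary seed so the run is repeatable
n = 100_000

# Hypothetical scores: true scores with variance near 9, errors with variance near 1.
true_scores = [random.gauss(50, 3) for _ in range(n)]
errors = [random.gauss(0, 1) for _ in range(n)]  # drawn independently of the true scores
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

var_t = variance(true_scores)
var_e = variance(errors)
var_x = variance(observed)

print(f"var(T) + var(E) = {var_t + var_e:.2f}")
print(f"var(X)          = {var_x:.2f}")
print(f"var(T) / var(X) = {var_t / var_x:.2f}")
```

Because the simulated errors are generated independently of the true scores, the two printed variances should agree closely; any small gap is sampling noise.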

  21. Reliability Coefficients • Reliability coefficients reflect the ratio of true-score variance to observed-score variance. • Therefore, reliabilities range from 0.0 (no true-score variance) to 1.0 (all true-score variance).

  22. Classic Definition of Reliability • The ratio of true score variance to total score variance. • Test 1: Total Variance = 10; True Score Variance = 9. • Test 2: Total Variance = 20; True Score Variance = 15. • Which Test is More Reliable?
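The question on slide 22 can be worked through with the definition itself. A minimal sketch, with a helper name (`reliability`) of our own choosing:

```python
def reliability(true_score_variance, total_variance):
    """Reliability = true-score variance divided by total (observed) variance."""
    return true_score_variance / total_variance

test_1 = reliability(true_score_variance=9, total_variance=10)
test_2 = reliability(true_score_variance=15, total_variance=20)

print(f"Test 1 reliability: {test_1:.2f}")
print(f"Test 2 reliability: {test_2:.2f}")
```

Test 2 has more true-score variance in absolute terms, but Test 1 has the larger proportion (.90 versus .75), so Test 1 is the more reliable of the two.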

  23. Reliability • More technical: To what extent do observed scores reflect true scores? • How consistent is the assessment?

  24. Three Kinds of Reliability • Internal Consistency (Content) • Random error affects responses to items on an assessment • Test-Retest (Time) • The construct stays the same. However, random errors vary from one occasion to the next. • Inter-Rater (Observer Biases)

  25. Internal Consistency • Use a 5-item measure of Self-Esteem. • 1. I feel that I am a person of worth, at least on an equal basis with others. • 2. I feel that I have a number of good qualities. • 3. All in all, I am inclined to feel that I am a failure. • 4. I am able to do things as well as most other people. • 5. I feel I do not have much to be proud of. • Response Options (1 = Strongly Disagree to 5 = Strongly Agree)

  26. Internal Consistency

  27. Correlate All Items (N = 450)

  28. Summary Statistics of those Correlations • Average: .42 • Standard Deviation: .13 • Minimum: .25 (Items 4 & 5) • Maximum: .70 (Items 1 & 2) • Standardized Alpha = .78 • Alpha is an index of how strongly the items on a measure are associated with each other.

  29. Coefficient Alpha

  30. Coefficient Alpha () Where you need (1) the # of items (called k) and (2) the average inter-item correlation. This formula yields the standardized alpha.

  31. Coefficient Alpha versus Split-half reliability Estimates… • Split-Half Reliability – Divide the items on the assessment into 2 halves and then correlate the two halves. • Problem: Estimates fluctuate depending on what items get split into which halves. • Alpha is the average of all possible split-half reliabilities.
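The split-half problem described above can be illustrated on simulated responses. Everything in this sketch is hypothetical: the 10 items, the 200 respondents, the noise level, and the seed are assumptions, and the helper names are ours. It correlates the two half-scores for several different ways of splitting the items.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

def split_half_r(responses, half_a, half_b):
    """Correlate the average of the items in half_a with the average in half_b."""
    score_a = [sum(row[i] for i in half_a) / len(half_a) for row in responses]
    score_b = [sum(row[i] for i in half_b) / len(half_b) for row in responses]
    return pearson(score_a, score_b)

random.seed(31)  # arbitrary seed so the run is repeatable
k, n = 10, 200   # hypothetical: 10 items, 200 respondents

# Each simulated response = the person's true score + item-specific random error.
responses = []
for _ in range(n):
    true_score = random.gauss(0, 1)
    responses.append([true_score + random.gauss(0, 1.2) for _ in range(k)])

odd_even = split_half_r(responses, [0, 2, 4, 6, 8], [1, 3, 5, 7, 9])
first_last = split_half_r(responses, [0, 1, 2, 3, 4], [5, 6, 7, 8, 9])

# Twenty random ways of dividing the same 10 items into two halves.
items = list(range(k))
estimates = []
for _ in range(20):
    random.shuffle(items)
    estimates.append(split_half_r(responses, items[:5], items[5:]))

print(f"odd vs. even items:        r = {odd_even:.3f}")
print(f"first five vs. last five:  r = {first_last:.3f}")
print(f"20 random splits: min r = {min(estimates):.3f}, max r = {max(estimates):.3f}")
```

The same responses give a different estimate for each split, which is the motivation for a single summary index such as alpha.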

  32. Sample Matrix

  33. Real Results • 10-Item Measure of Self-Esteem for 451 women. • Correlate the average of the odd-numbered items with the average of the even-numbered items: r = .79 • Correlate the average of the first five items with the average of the last five items: r = .67 • Average Inter-Item r = .46 • Standardized Alpha = .89

  34. Caveats about Coefficient Alpha …. • Recall – what goes into the Alpha calculation: • Number of items • Average inter-item correlation • There are at least two things to think about when considering Coefficient Alpha… • Length of the Assessment • Dimensionality

  35. Pay Attention to the Length of the Assessment.

  36. Constant average inter-item correlation (e.g. .420) but increase the number of items….
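The effect of test length can be sketched by holding the average inter-item correlation at .42 and varying the number of items in the standardized-alpha formula, k·r̄ / (1 + (k − 1)·r̄). The item counts below are illustrative choices of ours, not necessarily the ones shown on the original slide.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from the number of items (k) and the
    average inter-item correlation (mean_r)."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

MEAN_R = 0.42                      # constant average inter-item correlation
item_counts = [2, 5, 10, 20, 40]   # hypothetical test lengths

alphas = [standardized_alpha(k, MEAN_R) for k in item_counts]

for k, alpha in zip(item_counts, alphas):
    print(f"{k:>2} items: alpha = {alpha:.2f}")
```

Alpha rises with every added item even though the items are no more strongly related to each other, so a high alpha on a long assessment says less about item quality than the same alpha on a short one.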
