Understanding Reliability and Validity in Measurement Studies

Learn about reliability (consistency, repeatability), validity, objectivity, and key methods for determining them in norm-referenced measurement. Explore examples, techniques, and common terms.

richardwatt
Presentation Transcript


  1. Chapter 6 Norm-Referenced Measurement

  2. Topics for Discussion • Reliability: consistency, repeatability • Validity: truthfulness • Objectivity: inter-rater reliability

  3. Observed, Error, and True Scores: Observed Score = True Score + Error Score

  4. Reliability: Reliability is the proportion of observed score variance that is true score variance.

  5. Table 6-1 Systolic Blood Pressure Recordings for 10 Subjects (Observed BP = True BP + Error BP)

     Subject         Observed BP   True BP   Error BP
     1               103           105       -2
     2               117           115       +2
     3               116           120       -4
     4               123           125       -2
     5               127           125       +2
     6               125           125        0
     7               135           125       +10
     8               126           130       -4
     9               133           135       -2
     10              145           145        0
     Sum (Σ)         1250          1250       0
     Mean (M)        125.0         125.0      0
     Variance (s²)   133.6         116.7      16.9
     SD (s)          11.6          10.8       4.1
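As a check on Table 6-1 and on the definition in slide 4, here is a minimal Python sketch (standard library only) that rebuilds the error scores, recomputes the variances with the n − 1 denominator (the convention that reproduces the table's values), and takes reliability as true-score variance divided by observed-score variance. The variable names are illustrative.

```python
from statistics import mean, variance

# Table 6-1: systolic blood pressure for 10 subjects
observed = [103, 117, 116, 123, 127, 125, 135, 126, 133, 145]
true_scores = [105, 115, 120, 125, 125, 125, 125, 130, 135, 145]
# Error score = observed score - true score
error = [o - t for o, t in zip(observed, true_scores)]

# statistics.variance divides by n - 1, matching the table's s² row
var_obs = variance(observed)
var_true = variance(true_scores)
var_err = variance(error)

# Reliability = proportion of observed variance that is true variance
reliability = var_true / var_obs

print(f"means: {mean(observed):.1f} {mean(true_scores):.1f} {mean(error):.1f}")
print(f"variances: {var_obs:.1f} {var_true:.1f} {var_err:.1f}")
print(f"reliability = {reliability:.3f}")
```

The ratio comes out near .87. Observed variance equals true plus error variance exactly here because the table's true and error scores happen to be uncorrelated; with real data true scores are never known, which is why reliability is usually estimated from correlations between trials instead.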

  6. Interclass Reliability: Pearson product-moment correlation • Test-retest • Equivalence • Split halves

  7. Table 6-2 Sit-up Performance for 10 Subjects

     Subject         Trial 1   Trial 2
     1               45        49
     2               38        36
     3               54        50
     4               38        38
     5               47        49
     6               39        38
     7               39        43
     8               42        43
     9               29        30
     10              42        42
     Sum (Σ)         413       418
     Mean            41.3      41.8
     SD (s)          6.6       6.5
     Variance (s²)   43.6      41.7

     rxx' = .927
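The interclass (test-retest) coefficient in Table 6-2 is an ordinary Pearson product-moment correlation between the two trials. A small sketch that computes it from deviation scores; the helper name `pearson_r` is made up for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - mx) ** 2 for a in x)                    # sum of squares, x
    syy = sum((b - my) ** 2 for b in y)                    # sum of squares, y
    return sxy / sqrt(sxx * syy)

# Table 6-2: sit-ups on two trials (test-retest)
trial1 = [45, 38, 54, 38, 47, 39, 39, 42, 29, 42]
trial2 = [49, 36, 50, 38, 49, 38, 43, 43, 30, 42]

r = pearson_r(trial1, trial2)
print(f"rxx' = {r:.3f}")
```

This prints 0.927, matching the slide.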

  8. Spearman-Brown Prophecy Formula: k = the number of items I WANT to estimate the reliability for, divided by the number of items I HAVE a reliability for

  9. Table 6-3 Odd and Even Scores for 10 Subjects

     Subject         Odd       Even
     1               12        13
     2               9         11
     3               10        8
     4               9         6
     5               11        8
     6               7         10
     7               9         9
     8               12        10
     9               5         4
     10              8         7
     Sum (Σ)         92        86
     Mean            9.2       8.6
     SD (s)          2.2       2.6
     Variance (s²)   4.8       6.7

     rxx' = .639
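The odd-even correlation in Table 6-3 describes a test only half as long as the real one, so a split-half estimate is usually stepped up with the Spearman-Brown prophecy formula. This sketch assumes its standard form, r_kk = k·r11 / (1 + (k − 1)·r11), with k = 2; the formula itself did not survive in this transcript, but this form reproduces Table 6-4. The function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r11, k):
    """Predicted reliability when test length is multiplied by k."""
    return k * r11 / (1 + (k - 1) * r11)

# Table 6-3: totals on the odd- and even-numbered items
odd = [12, 9, 10, 9, 11, 7, 9, 12, 5, 8]
even = [13, 11, 8, 6, 8, 10, 9, 10, 4, 7]

r_half = pearson_r(odd, even)          # reliability of a half-length test
r_full = spearman_brown(r_half, k=2)   # step up to the full-length test
print(f"split-half r = {r_half:.3f}, Spearman-Brown estimate = {r_full:.3f}")
```

The split-half r of .639 rises to about .78 for the full-length test.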

  10. Table 6-4 Values of rkk From Spearman-Brown Prophecy Formula

                  k (change in test length)
     r11     .25   .50   1.5   2.0   3.0   4.0   5.0
     .10     .03   .05   .14   .18   .25   .31   .36
     .22     .07   .12   .30   .36   .46   .53   .59
     .40     .14   .25   .50   .57   .67   .73   .77
     .50     .20   .33   .60   .67   .75   .80   .83
     .60     .27   .43   .69   .75   .82   .86   .88
     .68     .35   .52   .76   .81   .86   .89   .91
     .80     .50   .67   .86   .89   .92   .94   .95
     .92     .74   .85   .95   .96   .97   .98   .98
     .96     .86   .92   .97   .98   .99   .99   .99
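The entries of Table 6-4 can be regenerated from the standard Spearman-Brown form, which is a quick way to check a value or extend the table to other test lengths. A sketch, assuming that form:

```python
def spearman_brown(r11, k):
    """Predicted reliability when test length is multiplied by k."""
    return k * r11 / (1 + (k - 1) * r11)

# Column and row headings of Table 6-4
ks = [0.25, 0.50, 1.5, 2.0, 3.0, 4.0, 5.0]
r11_values = [0.10, 0.22, 0.40, 0.50, 0.60, 0.68, 0.80, 0.92, 0.96]

# One row of predicted r_kk values (rounded to 2 places) per starting r11
table = {r11: [round(spearman_brown(r11, k), 2) for k in ks] for r11 in r11_values}

print("r11   " + " ".join(f"{k:5.2f}" for k in ks))
for r11, row in table.items():
    print(f"{r11:.2f}  " + " ".join(f"{v:5.2f}" for v in row))
```

Values below k = 1 show how much reliability is lost by shortening a test; values above 1 show the diminishing gains from lengthening it.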

  11. Table 6-5 Effect of a Constant Change in Measures

     Subject         Trial 1   Trial 2
     1               15        25
     2               17        27
     3               10        20
     4               20        30
     5               23        33
     6               26        36
     7               27        37
     8               30        40
     9               32        42
     10              33        43
     Sum (Σ)         233       333
     Mean            23.3      33.3
     SD (s)          7.7       7.7
     Variance (s²)   59.1      59.1

     rxx' = 1.00
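Table 6-5 appears to illustrate a weakness of interclass reliability: Pearson r works on deviations from each trial's own mean, so adding the same constant to every Trial 2 score changes nothing. The sketch confirms that every subject gained exactly 10 while r stays at 1.00; `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Table 6-5: every Trial 2 score is its Trial 1 score plus a constant
trial1 = [15, 17, 10, 20, 23, 26, 27, 30, 32, 33]
trial2 = [25, 27, 20, 30, 33, 36, 37, 40, 42, 43]

shift = [b - a for a, b in zip(trial1, trial2)]
r = pearson_r(trial1, trial2)
print(f"changes observed: {sorted(set(shift))}")
print(f"rxx' = {r:.2f}")
```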

  12. Intraclass Reliability: ANOVA model • Cronbach's alpha coefficient

  13. Intraclass (ANOVA) Reliabilities: Common terms you will encounter • Alpha reliability • Kuder-Richardson Formula 20 (KR20) • Kuder-Richardson Formula 21 (KR21) • ANOVA reliabilities

  14. Table 6-6 Calculating the Alpha Coefficient

     Subject   Trial 1   Trial 2   Trial 3   Total
     1         3         5         3         11
     2         2         2         2         6
     3         6         5         3         14
     4         5         3         5         13
     5         3         4         4         11
     ΣX        19        19        17        55
     ΣX²       83        79        63        643
     s²        2.70      1.70      1.30      9.50

  15. Calculating the Alpha Coefficient
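The worked calculation for this slide did not survive extraction. A sketch using the standard Cronbach's alpha formula, α = k/(k − 1) · (1 − Σs²_trial / s²_total), with the sample (n − 1) variances shown in Table 6-6, so that α = 3/2 · (1 − 5.70/9.50) = .60. The function name is hypothetical.

```python
from statistics import variance

# Table 6-6: rows = subjects, columns = trials
scores = [
    [3, 5, 3],
    [2, 2, 2],
    [6, 5, 3],
    [5, 3, 5],
    [3, 4, 4],
]

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of trial variances / variance of total scores)."""
    k = len(rows[0])                                    # number of trials
    trial_vars = [variance(col) for col in zip(*rows)]  # s² for each trial
    total_var = variance([sum(row) for row in rows])    # s² of subject totals
    return k / (k - 1) * (1 - sum(trial_vars) / total_var)

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```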

  16. Index of Reliability: The theoretical correlation between observed scores and true scores
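Under the classical model the index of reliability equals the square root of the reliability coefficient, provided true and error scores are uncorrelated. Table 6-1 is a rare case where the true scores are given, so this sketch computes the observed-true correlation directly and compares it with √(116.7/133.6); both come out at about .93. The helper name is illustrative.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Table 6-1: observed and (known) true systolic BP scores
observed = [103, 117, 116, 123, 127, 125, 135, 126, 133, 145]
true_scores = [105, 115, 120, 125, 125, 125, 125, 130, 135, 145]

reliability = variance(true_scores) / variance(observed)
index_direct = pearson_r(observed, true_scores)   # correlation of observed with true
index_formula = sqrt(reliability)                 # square root of reliability
print(f"r(observed, true) = {index_direct:.3f}, sqrt(rxx') = {index_formula:.3f}")
```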

  17. Table 6-7 Student Scores on a 10-Item Multiple-Choice Quiz

                 Items
     Subject     1  2  3  4  5  6  7  8  9  10   Total
     1           1  1  1  1  1  1  0  1  0  1    8
     2           0  1  0  1  1  0  1  0  1  1    6
     3           0  1  1  0  1  1  0  1  0  0    5
     4           1  0  0  0  1  0  1  1  0  0    4
     5           0  0  0  1  0  1  0  1  0  0    3
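Table 6-7 is the kind of right/wrong data that KR20 and KR21 (slide 13) are designed for. This sketch assumes the usual textbook forms: KR20 = k/(k − 1) · (1 − Σpq / s²), where p is the proportion answering an item correctly and q = 1 − p, and KR21, which replaces Σpq with M(k − M)/k and so assumes equally difficult items. Textbooks differ on whether s² divides by n or n − 1; this sketch uses n throughout so that KR20 matches Cronbach's alpha on the same 0/1 data. With only five students that choice visibly changes the result, so treat the printed values as illustrative.

```python
# Table 6-7: rows = subjects, columns = items (1 = correct, 0 = incorrect)
items = [
    [1, 1, 1, 1, 1, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 0, 1, 0, 0],
]

n = len(items)        # number of students
k = len(items[0])     # number of items
totals = [sum(row) for row in items]
mean_total = sum(totals) / n
# Divide-by-n variance of total scores, consistent with p*q below
var_total = sum((t - mean_total) ** 2 for t in totals) / n

# p = proportion answering each item correctly, q = 1 - p
p = [sum(col) / n for col in zip(*items)]
sum_pq = sum(pi * (1 - pi) for pi in p)

kr20 = k / (k - 1) * (1 - sum_pq / var_total)
# KR21 needs only the mean and variance of the total scores
kr21 = k / (k - 1) * (1 - mean_total * (k - mean_total) / (k * var_total))
print(f"KR20 = {kr20:.2f}, KR21 = {kr21:.2f}")
```

Under this convention KR20 is about .30 and KR21 about .17; KR21 is lower here because the items in this quiz differ in difficulty.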

  18. Standard Error of Measurement: Reflects the degree to which a person's observed score fluctuates as a result of errors of measurement
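A common textbook form is SEM = s·√(1 − rxx'). A minimal sketch applying it to the Table 6-1 summary values; there it reproduces the standard deviation of the error scores (4.1), which is exactly what the SEM is meant to estimate. The ±1 SEM band for subject 7 is an illustration that assumes roughly normal errors; the function name is hypothetical.

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SEM = s * sqrt(1 - rxx')."""
    return sd * sqrt(1 - reliability)

# Summary values taken from Table 6-1
var_observed, var_true, var_error = 133.6, 116.7, 16.9
r_xx = var_true / var_observed          # reliability, about .87

sem_bp = sem(sqrt(var_observed), r_xx)
print(f"SEM = {sem_bp:.2f}; SD of error scores = {sqrt(var_error):.2f}")

# Rough +/- 1 SEM band (about 68% if errors are normally distributed)
observed_score = 135                    # subject 7's observed BP
low, high = observed_score - sem_bp, observed_score + sem_bp
print(f"{observed_score} +/- 1 SEM: {low:.1f} to {high:.1f}")
```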

  19. Factors Affecting Test Reliability • 1) Fatigue • 2) Practice • 3) Subject variability • 4) Time between testing • 5) Circumstances surrounding the testing periods • 6) Appropriate difficulty for testing subjects • 7) Precision of measurement • 8) Environmental conditions

  20. Decline in Reliability for the Harvard Alumni Activity Survey as the Time Between Testing Periods Increases (plot; x-axis: months between test and retest)

  21. Validity Types • Content-related validity • Criterion-related validity (statistical or correlational): concurrent, predictive • Construct-related validity

  22. Standard Error of Estimate (also called the Standard Error of Prediction)

  23. Standard Errors: SE of Measurement vs. SE of Estimate
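The two standard errors answer different questions: the SEM describes how much a person's observed score fluctuates (a reliability idea), while the SEE describes how far actual criterion scores fall from the scores predicted by a regression (a validity idea). This sketch assumes the usual textbook forms, SEM = s·√(1 − rxx') and SEE = s_y·√(1 − r_xy²), and plugs Table 6-2's Trial 2 SD and trial correlation into both purely to compare the arithmetic; it is not a validity analysis, and the function names are hypothetical.

```python
from math import sqrt

def sem(sd, r_xx):
    """Standard error of measurement: s * sqrt(1 - rxx')."""
    return sd * sqrt(1 - r_xx)

def see(sd_y, r_xy):
    """Standard error of estimate (prediction): s_y * sqrt(1 - r_xy**2)."""
    return sd_y * sqrt(1 - r_xy ** 2)

# Illustrative inputs borrowed from Table 6-2 (Trial 2 SD and the trial correlation)
sd, r = 6.5, 0.927

sem_val = sem(sd, r)
see_val = see(sd, r)
print(f"SEM = {sem_val:.2f}, SEE = {see_val:.2f}")
```

For the same s and a positive r below 1, the SEE is always the larger of the two, because 1 − r² = (1 − r)(1 + r) exceeds 1 − r.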

  24. Methods of Obtaining a Criterion Measure • Actual participation (e.g., golf, archery) • Perform the criterion: a known valid criterion (e.g., treadmill performance) • Expert judges: a panel of judges • Tournament participation: round robin • Known valid test

  25. Table 6-8 Correlation Matrix for Development of a Golf Skill Test (From Green et al., 1987). What are these? Concurrent validity coefficients.

  26. Table 6-9 Concurrent Validity Coefficients for Golf Test • 2-item battery (middle distance shot, pitch shot): .72 • 3-item battery (middle distance shot, pitch shot, long putt): .76 • 4-item battery (middle distance shot, pitch shot, long putt, chip shot): .77

  27. Correlations Between IQs of Related or Unrelated Children as a Function of Genetic Similarity and Similarity of Environment • Identical twins - reared together .88 • Identical twins - reared apart .75 • Fraternal twins - same sex .53 • Fraternal twins - opposite sex .53 • Siblings - reared together .49 • Siblings - reared apart .46 • Parent with child .52 • Foster parent with child .19 • Unrelated - reared together .16 From Glass & Stanley, 1970, p. 119

  28. Figure 6.1 Diagram of Validity and Reliability Terms

  29. Interpreting the “r” you obtain

  30. Concurrent Validity: This square represents the variance in performance of a skill (e.g., golf).

  31. Concurrent Validity: The different colors and patterns represent different parts of a skills test battery used to measure the criterion (e.g., golf).

  32. Concurrent Validity Error: The orange color represents ERROR, or unexplained variance in the criterion (e.g., golf).
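The variance-square slides show the criterion variance a battery accounts for (colored areas) and the part it leaves unexplained (orange). A common way to put numbers on those areas is the coefficient of determination, r²; the sketch applies it, as an assumption about how the squares are scaled, to the Table 6-9 concurrent validity coefficients.

```python
# Concurrent validity coefficients from Table 6-9
batteries = {"2-item": 0.72, "3-item": 0.76, "4-item": 0.77}

shares = {}
for name, r in batteries.items():
    explained = r ** 2                      # coefficient of determination
    shares[name] = (explained, 1 - explained)
    print(f"{name}: explained {explained:.0%}, unexplained (error) {1 - explained:.0%}")
```

Adding the long putt raises explained variance from about 52% to 58%, while adding the chip shot gains only about one more percentage point, which is the validity-versus-testing-time trade-off the following slides ask about.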

  33. Concurrent Validity: Consider the concurrent validity of the four possible skills test batteries shown (A, B, C, D).

  34. Concurrent Validity (A, B, C, D): Which test battery would you be LEAST likely to use? Why? D: it has the MOST error and requires 4 tests to be administered.

  35. Concurrent Validity (A, B, C, D): Which test battery would you be MOST likely to use? Why? C: it has the LEAST error, but it requires 3 tests to be administered.

  36. Concurrent Validity (A, B, C, D): Which test battery would you use if you are limited in time? A or B: they require only 1 or 2 tests to be administered, but you lose some validity.

  37. Interpret these correlations with the criterion. What are these? Concurrent validity coefficients.

  38. Interpret these correlations. What are these? Reliability coefficients.

  39. Interpret these correlations. What is this? An objectivity coefficient.

  40. Example of Reliability Study (Rikli et al., RQES, 1992)

                         Grade
     Distance   Gender   K     1     2     3     4
     1/2 mile   M        .77   .74   .75   .74   .79
                F        .73   .77   .76   .67   .47
     3/4 mile   M        .48   .54   .83   .89   .85
                F        .58   .64   .68   .83   .80
     1 mile     M        .53   .56   .70   .84   .87
                F        .39   .54   .71   .90   .85

  41. SPSS Examples
