
Chapter 3 Reliability and Objectivity


Presentation Transcript


  1. Chapter 3 Reliability and Objectivity

  2. Chapter 3 Outline • Selecting a Criterion Score • Types of Reliability • Reliability Theory • Estimating Reliability – Intraclass R • Spearman-Brown Prophecy Formula • Standard Error of Measurement • Objectivity • Reliability of Criterion-referenced Tests • Reliability of Difference Scores

  3. Objectivity • Interrater Reliability • Agreement of competent judges about the value of a measure.

  4. Reliability • Dependability of scores • Consistency • Degree to which a test is free from measurement error.

  5. Selecting a Criterion Score • Criterion score – the measure used to indicate a person’s ability. • Can be based on the mean score or the best score. • Mean Score – average of all trials. • Usually a more reliable estimate of a person’s true ability. • Best Score – optimal score a person achieves on any one trial. • May be used when the criterion score is to be used as an indicator of maximum possible performance.

  6. Potential Methods to Select a Criterion Score • Mean of all trials. • Best score of all trials. • Mean of selected trials based on trials on which group scored best. • Mean of selected trials based on trials on which individual scored best (i.e., omit outliers). Appropriate method to use depends on the situation.
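
A minimal sketch of the first two selection methods, assuming hypothetical trial data in a NumPy array (higher scores assumed to be better):

```python
import numpy as np

# Hypothetical trial scores: rows = people, columns = trials
trials = np.array([[12, 14, 13],
                   [18, 17, 20],
                   [15, 15, 16]])

mean_score = trials.mean(axis=1)   # mean of all trials for each person
best_score = trials.max(axis=1)    # best trial for each person (higher assumed better)

print(mean_score)   # ≈ [13.0, 18.33, 15.33]
print(best_score)   # [14 20 16]
```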

  7. Norm-referenced Test • Designed to reflect individual differences.

  8. In Norm-referenced Framework • Reliability - ability to detect reliable differences between subjects.

  9. Types of Reliability • Stability • Internal Consistency

  10. Stability (Test-retest) Reliability • Each subject is measured with same instrument on two or more different days. • Scores are then correlated. • An intraclass correlation should be used.

  11. Internal Consistency Reliability • Consistent rate of scoring throughout a test or from trial to trial. • All trials are administered in a single day. • Trial scores are then correlated. • An intraclass correlation should be used.

  12. Sources of Measurement Error • Lack of agreement among raters (i.e., objectivity). • Lack of consistent performance by person. • Failure of instrument to measure consistently. • Failure of tester to follow standardized procedures.

  13. Reliability Theory • X = T + E (Observed score = True score + Error) • σ²X = σ²t + σ²e (Observed score variance = True score variance + Error variance) • Reliability = σ²t ÷ σ²X • Reliability = (σ²X − σ²e) ÷ σ²X
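
A quick numeric check of these identities, using made-up variance components:

```python
# Made-up variance components: σ²t = 80, σ²e = 20
true_var, error_var = 80.0, 20.0
observed_var = true_var + error_var                # σ²X = σ²t + σ²e

reliability = true_var / observed_var              # σ²t ÷ σ²X
reliability_check = (observed_var - error_var) / observed_var

print(reliability, reliability_check)              # 0.8 0.8
```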

  14. Reliability depends on: • Decreasing measurement error • Detecting individual differences among people • ability to discriminate among different ability levels

  15. Reliability • Ranges from 0 to 1.00 • When R = 0, there is no reliability. • When R = 1.00, there is maximum reliability.

  16. Reliability from Intraclass R • ANOVA is used to partition the variance of a set of scores. • Parts of the variance are used to calculate the intraclass R.

  17. Estimating Reliability • Intraclass correlation from one-way ANOVA: • R = (MSA – MSW) ÷ MSA • MSA = Mean square among subjects (also called between subjects) • MSW = Mean square within subjects • Mean square = variance estimate • This represents reliability of the mean test score for each person.
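
A minimal sketch of the one-way calculation in Python with hypothetical data (rows are people, columns are trials); the mean squares correspond to the MSA and MSW terms above:

```python
import numpy as np

# Hypothetical data: rows = people, columns = trials
X = np.array([[9., 8., 9.],
              [5., 6., 5.],
              [7., 7., 8.],
              [3., 4., 3.]])
n, k = X.shape
grand_mean = X.mean()

# One-way ANOVA partition of the total variance
ss_total = ((X - grand_mean) ** 2).sum()
ss_among = k * ((X.mean(axis=1) - grand_mean) ** 2).sum()
ss_within = ss_total - ss_among

ms_among = ss_among / (n - 1)            # MSA, df = n - 1
ms_within = ss_within / (n * (k - 1))    # MSW, df = n(k - 1)

R = (ms_among - ms_within) / ms_among    # reliability of each person's mean score
print(round(R, 3))                       # ≈ 0.98 for this data
```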

  18. Sample SPSS One-way Reliability Analysis

  19. Estimating Reliability • Intraclass correlation from two-way ANOVA: • R = (MSA – MSR) ÷ MSA • MSA = Mean square among subjects (also called between subjects) • MSR = Mean square residual • Mean square = variance estimate • Used when trial-to-trial variance is not considered measurement error (e.g., Likert-type scale).
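
The same hypothetical data run through the two-way partition, where trial-to-trial variance is pulled out of the error term:

```python
import numpy as np

# Same hypothetical data as the one-way sketch
X = np.array([[9., 8., 9.],
              [5., 6., 5.],
              [7., 7., 8.],
              [3., 4., 3.]])
n, k = X.shape
grand_mean = X.mean()

ss_total = ((X - grand_mean) ** 2).sum()
ss_among = k * ((X.mean(axis=1) - grand_mean) ** 2).sum()
ss_trials = n * ((X.mean(axis=0) - grand_mean) ** 2).sum()
ss_residual = ss_total - ss_among - ss_trials

ms_among = ss_among / (n - 1)                       # MSA
ms_residual = ss_residual / ((n - 1) * (k - 1))     # MSR

R = (ms_among - ms_residual) / ms_among
print(round(R, 3))                                  # ≈ 0.974 for this data
```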

  20. Sample SPSS Two-way Reliability Analysis

  21. What is acceptable reliability? • Depends on: • age • gender • experience of people tested • size of reliability coefficients others have obtained • number of days or trials • stability vs. internal consistency coefficient

  22. What is acceptable reliability? • Most physical measures are stable from day-to-day. • Expect test-retest Rxx between .80 and .95. • Expect lower Rxx for tests with an accuracy component (e.g., .70). • For written tests, want Rxx > .70. • For psychological instruments, want Rxx > .70. • Critical issue: time interval between the 2 test sessions for stability reliability estimates. 1 to 3 days apart for physical measures is usually appropriate.

  23. Factors Affecting Reliability • Type of test. • Maximum effort test: expect Rxx of about .80 • Accuracy type test: expect Rxx of about .70 • Psychological inventories: expect Rxx of about .70 • Range of ability. • Rxx is higher for heterogeneous groups than for homogeneous groups. • Test length. • Longer test, higher Rxx

  24. Factors Affecting Reliability • Scoring accuracy. • Person administering test must be competent. • Test difficulty. • Test must discriminate among ability levels. • Test environment, organization, and instructions. • favorable to good performance, motivated to do well, ready to be tested, know what to expect.

  25. Factors Affecting Reliability • Fatigue • decreases Rxx • Practice trials • increase Rxx

  26. Coefficient Alpha • AKA Cronbach’s alpha • Most widely used with attitude instruments • Same as two-way intraclass R through ANOVA • An estimate of Rxx of a criterion score that is the sum of trial scores in one day

  27. Coefficient Alpha Rα = [K ÷ (K − 1)] × [(S²x − ΣS²trials) ÷ S²x] • K = # of trials or items • S²x = variance for criterion score (sum of all trials) • ΣS²trials = sum of variances for all trials
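
A minimal sketch of coefficient alpha with hypothetical trial scores and sample variances (ddof=1):

```python
import numpy as np

# Hypothetical data: rows = people, columns = trials (or items)
X = np.array([[4., 5., 4.],
              [2., 3., 3.],
              [5., 5., 4.],
              [1., 2., 2.]])
k = X.shape[1]

criterion = X.sum(axis=1)                     # criterion score = sum of trial scores
s2_x = criterion.var(ddof=1)                  # S²x
sum_s2_trials = X.var(axis=0, ddof=1).sum()   # ΣS²trials

alpha = (k / (k - 1)) * ((s2_x - sum_s2_trials) / s2_x)
print(round(alpha, 3))                        # ≈ 0.958 for this data
```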

  28. Kuder-Richardson (KR) • Estimate of internal consistency reliability by determining how all items on a test relate to the total test. • KR formulas 20 and 21 are typically used to estimate Rxx of knowledge tests. • Used with dichotomous items (scored as right or wrong). • KR20 = coefficient alpha

  29. KR20 • KR20 = [K ÷ (K − 1)] × [(S²x − Σpq) ÷ S²x] • K = # of trials or items • S²x = variance of scores • p = percentage answering item right • q = percentage answering item wrong • Σpq = sum of pq products for all K items

  30. KR20 Example

Item    p     q     pq
1      .50   .50   .2500
2      .25   .75   .1875
3      .80   .20   .1600
4      .90   .10   .0900
                   Σpq = 0.6875

If Mean = 2.45 and SD = 1.2 (so S²x = 1.44), what is KR20?
KR20 = (4 ÷ 3) × (1.44 − 0.6875) ÷ 1.44
KR20 = .70
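
A quick check of this worked example:

```python
p = [0.50, 0.25, 0.80, 0.90]                     # proportion right on each item
q = [1 - pi for pi in p]                         # proportion wrong on each item
sum_pq = sum(pi * qi for pi, qi in zip(p, q))    # 0.6875

k, s2_x = 4, 1.2 ** 2                            # 4 items, SD = 1.2, so S²x = 1.44
kr20 = (k / (k - 1)) * ((s2_x - sum_pq) / s2_x)
print(round(kr20, 2))                            # 0.7
```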

  31. KR21 • If all test items are assumed to be equally difficult, KR20 can be simplified to KR21 • KR21 = [(K × S²) − (Mean × (K − Mean))] ÷ [(K − 1) × S²] • K = # of trials or items • S² = variance of test • Mean = mean of test
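
And KR21 on the same numbers; because the item difficulties above are not equal, KR21 comes out lower than KR20:

```python
k, s2, mean = 4, 1.44, 2.45
kr21 = ((k * s2) - (mean * (k - mean))) / ((k - 1) * s2)
print(round(kr21, 2))   # ≈ 0.45, lower than KR20 because item difficulty varies
```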

  32. Equivalence Reliability (Parallel Forms) • Two equivalent forms of a test are administered to same subjects. • Scores on the two forms are then correlated.

  33. Spearman-Brown Prophecy formula • Used to estimate rxx of a test that is changed in length. • rkk = (k x r11) ÷ [1 + (k - 1)(r11)] • k = number of times test is changed in length. • k = (# trials want) ÷ (# trials have) • r11 = reliability of test you’re starting with • Spearman-Brown formula will give an estimate of maximum reliability that can be expected (upper bound estimate).
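
A minimal sketch, with hypothetical numbers (a 6-trial test with r11 = .70 lengthened to 12 trials, so k = 2):

```python
def spearman_brown(r11, k):
    """Estimated reliability after changing test length by a factor of k."""
    return (k * r11) / (1 + (k - 1) * r11)

# Hypothetical: a 6-trial test with r11 = .70 doubled in length
print(round(spearman_brown(0.70, 12 / 6), 3))   # ≈ 0.824
```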

  34. Standard Error of Measurement (SEM) • Degree you expect a test score to vary due to measurement error. • Standard deviation of a test score. • SEM = Sx × √(1 − Rxx) • Sx = standard deviation of group • Rxx = reliability coefficient • Small SEM indicates high reliability

  35. SEM • Example: written test with Sx = 5 and Rxx = .88 • SEM = 5 × √(1 − .88) = 5 × √.12 = 1.73 • Confidence intervals: 68% X ± 1.00 (SEM); 95% X ± 1.96 (SEM) • If X = 23: 23 + 1.73 = 24.73 and 23 − 1.73 = 21.27 • 68% confident the true score is between 21.27 and 24.73
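
The same example as a short script:

```python
import math

s_x, r_xx = 5, 0.88
sem = s_x * math.sqrt(1 - r_xx)            # ≈ 1.73

x = 23
ci_68 = (x - sem, x + sem)                 # ≈ (21.27, 24.73)
ci_95 = (x - 1.96 * sem, x + 1.96 * sem)   # ≈ (19.61, 26.39)
print(round(sem, 2), ci_68, ci_95)
```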

  36. Objectivity (Rater Reliability) • Degree of agreement between raters. • Depends on: • clarity of scoring system. • degree to which judge can assign scores accurately. • If test is highly objective, objectivity is obvious and rarely calculated. • As subjectivity increases, test developer should report estimate of objectivity.

  37. Two Types of Objectivity: • Intrajudge objectivity • consistency in scoring when a test user scores the same test two or more times. • Interjudge objectivity • consistency between two or more independent judgments of the same performance. • Calculate objectivity like reliability, but substitute judges' scores for trials.

  38. Criterion-referenced Test • A test used to classify a person as proficient or nonproficient (pass or fail).

  39. In Criterion-referenced Framework: • Reliability - defined as consistency of classification.

  40. Reliability of Criterion-referenced Test Scores • To estimate reliability, a double-classification or contingency table is formed.

  41. Contingency Table (Double-classification Table)

                  Day 2
                Pass   Fail
  Day 1  Pass     A      B
         Fail     C      D

  42. Proportion of Agreement (Pa) • Most popular way to estimate Rxx of CRT. • Pa = (A + D) ÷ (A + B + C + D) • Pa does not take into account that some consistent classifications could happen by chance.

  43. Example for calculating Pa

                  Day 2
                Pass   Fail
  Day 1  Pass    45     12
         Fail     8     35

  44.
                  Day 2
                Pass   Fail
  Day 1  Pass    45     12
         Fail     8     35

Pa = (A + D) ÷ (A + B + C + D)
Pa = (45 + 35) ÷ (45 + 12 + 8 + 35)
Pa = 80 ÷ 100 = .80
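
The same calculation as a short sketch:

```python
a, b, c, d = 45, 12, 8, 35         # cells from the contingency table above
pa = (a + d) / (a + b + c + d)
print(pa)                          # 0.8
```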

  45. Kappa Coefficient (K) • Estimate of CRT Rxx with a correction for chance agreements. K = (Pa − Pc) ÷ (1 − Pc) • Pa = Proportion of Agreement • Pc = Proportion of Agreement expected by chance • Pc = [(A+B)(A+C) + (C+D)(B+D)] ÷ (A+B+C+D)²

  46. Example for calculating K

                  Day 2
                Pass   Fail
  Day 1  Pass    45     12
         Fail     8     35

  47.
                  Day 2
                Pass   Fail
  Day 1  Pass    45     12
         Fail     8     35

• K = (Pa − Pc) ÷ (1 − Pc)
• Pa = .80

  48.
                  Day 2
                Pass   Fail
  Day 1  Pass    45     12
         Fail     8     35

Pc = [(A+B)(A+C) + (C+D)(B+D)] ÷ (A+B+C+D)²
Pc = [(45+12)(45+8) + (8+35)(12+35)] ÷ (100)²
Pc = [(57)(53) + (43)(47)] ÷ 10,000 = 5,042 ÷ 10,000
Pc = .5042

  49. Kappa (K) • K = (Pa - Pc) ÷ (1 - Pc) • K = (.80 - .5042) ÷ (1 - .5042) • K = .597
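
The whole kappa calculation for this table, as a sketch:

```python
a, b, c, d = 45, 12, 8, 35
n = a + b + c + d

pa = (a + d) / n                                          # 0.80
pc = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2     # 0.5042

kappa = (pa - pc) / (1 - pc)
print(round(kappa, 3))                                    # 0.597
```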

  50. Modified Kappa (Kq) • Kq may be more appropriate than K when proportion of people passing a criterion-referenced test is not predetermined. • Most situations in exercise science do not predetermine the number of people who will pass.
