
Introduction to Measurement Theory Liu Xiaoling The Department of Psychology ECNU



  1. Introduction to Measurement Theory Liu Xiaoling The Department of Psychology ECNU Email: xlliu@psy.ecnu.edu.cn

  2. Chapter 5 Reliability. §1 Theory of Reliability • Interpretation of Reliability: Reliability refers to the degree of consistency or reproducibility of measurements (or test scores).

  3. Qualified Reliability Coefficients for Types of Tests: ability or aptitude tests and achievement tests, .90 and above; personality, interest, value, and attitude tests, .80 and above.

  4. EXAMPLES • Stanford-Binet Fifth Edition: full-scale IQ (23 age ranges), .97-.98; test-retest reliability coefficients for verbal and nonverbal subtests, from the high .7's to the low .9's. • WISC-IV: split-half reliability for full-scale IQ, .97. • WAIS-III: average split-half reliability, .98 for full-scale IQ, .97 for verbal IQ, .94 for performance IQ. • Thurstone's Attitude Scale: .80-.90. • Rosenberg's Self-Esteem Scale (1965): α .77-.88; test-retest, .85.

  5. Errors: Inconsistent and Inaccurate Effects. Error refers to the inconsistent and inaccurate effects caused by variable factors that are unrelated to the objective of measurement. Three types: random, systematic, sampling.

  6. Random Error. An error due to chance alone, randomly distributed around the objective value. Random errors reduce both the consistency and the accuracy of the test scores.

  7. Systematic Error. An error in data that is regular and repeatable, due to improper collection or statistical treatment of the data. Systematic errors do not result in inconsistent measurement, but they do cause inaccuracy.

  8. Sampling Error. Deviations of the summary values yielded by samples from the values yielded by the entire population.

  9. Classical True Score Theory. Founders: Charles Spearman (1904, 1907, 1913); J. P. Guilford (1936). Assumptions: • One formula: X = T + E, where X is an individual's observed score, E is the random error score (error of measurement), and T is the individual's true score.

  10. CONCEPTION: True score. CTT assumes that each person has a true score that would be obtained if there were no errors in measurement. INTERPRETATION: the average of all the observed scores obtained over an infinite number of repeated testings with the same test.

  11. TABLE 5.1 One Measure Data

  12. Three Principles. 1. The mean of the error scores for a population of examinees is zero. 2. The correlation between true and error scores for a population of examinees is zero. 3. The correlation between error scores from two independent testings is zero.

  13. Reliability Coefficient. The reliability coefficient can be defined as the correlation between scores on parallel test forms: r_tt = r_XX′ (5.1). Mathematical definition: the reliability coefficient is the ratio of true-score variance to observed-score variance.

  14. r_tt = S_T² / S_X² = 1 − S_E² / S_X² (5.2), since S_X² = S_T² + S_E². As S_E increases, r_tt decreases if S_T does not vary.
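The variance-ratio definition in formula 5.2 can be illustrated with a small simulation of X = T + E. This is a minimal sketch with hypothetical population parameters (true scores with SD 15, random errors with SD 5), not data from the text:

```python
import random

random.seed(42)

# Simulate X = T + E for a large examinee population (hypothetical parameters).
true_scores = [random.gauss(100, 15) for _ in range(100_000)]   # T
observed = [t + random.gauss(0, 5) for t in true_scores]        # X = T + E

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Formula 5.2: reliability = true-score variance / observed-score variance.
r_tt = variance(true_scores) / variance(observed)

# Theoretical value here: 15^2 / (15^2 + 5^2) = 225 / 250 = 0.9.
print(round(r_tt, 2))
```

With these parameters the simulated ratio lands close to the theoretical .90, showing how larger error variance (S_E) would drag r_tt down while S_T stays fixed.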

  15. §2 Sources of Random Errors • Sources from Tests • Sources from Test Administration and Scoring • Sources from Examinees

  16. Sources from Tests. Item sampling lacks representativeness. Item format is improper. Item difficulty is too high or too low. Item wording is unclear. The test time limit is too short.

  17. Sources from Test Administration and Scoring. Test conditions are poor. The examiner affects examinees' performance. Unexpected disturbances occur. Scoring isn't objective; counting is inaccurate.

  18. Sources from Examinees. Motivation for the test. Negative emotions (e.g., anxiety). Health. Learning, development, and education. Test-taking experience.

  19. §3 Estimating the Reliability Coefficient. Test-retest reliability (coefficient of stability). Alternate-forms reliability (coefficient of equivalence). Coefficients of internal consistency. Scorer reliability (inter-rater consistency).

  20. Test-retest Reliability. Also called the coefficient of stability: the correlation between test scores obtained by administering the same form of a test on two separate occasions to the same examinee group. Test → interval → retest, with the same examinees.

  21. Review: Correlation. Figure 5.1 Scatter Plots for Two Variates

  22. Formula for estimating reliability (Pearson product-moment correlation coefficient): r_xy = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)] (5.3), where X is the test score, Y the retest score, and N the sample size.

  23. Application Example. A subjective well-being scale was administered to 10 high school students; half a year later, they were tested with the same scale again. Estimate the reliability of the scale. Table 5.2

  24. Computing statistics Answer:

  25. Transform of formula 5.3: r_xy = (ΣXY/N − X̄·Ȳ) / (S_X·S_Y) (5.4), where X̄ is the mean of the first test scores, Ȳ the mean of the retest scores, S_X the standard deviation of the first test scores, and S_Y the SD of the retest scores.
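The raw-score form (5.3) and the mean/SD form (5.4) are algebraically equivalent, which can be checked directly. The scores below are hypothetical illustrative values, not the Table 5.2 data:

```python
import math

# Hypothetical test (X) and retest (Y) scores for 10 examinees.
X = [14, 12, 15, 10, 13, 11, 16, 12, 14, 13]
Y = [15, 11, 14, 10, 14, 12, 15, 13, 13, 12]
N = len(X)

# Formula 5.3: raw-score form of the Pearson correlation.
sxy = sum(x * y for x, y in zip(X, Y))
sx, sy = sum(X), sum(Y)
sx2, sy2 = sum(x * x for x in X), sum(y * y for y in Y)
r_53 = (N * sxy - sx * sy) / math.sqrt((N * sx2 - sx ** 2) * (N * sy2 - sy ** 2))

# Formula 5.4: equivalent form using means and (population) standard deviations.
mx, my = sx / N, sy / N
sdx = math.sqrt(sx2 / N - mx ** 2)
sdy = math.sqrt(sy2 / N - my ** 2)
r_54 = (sxy / N - mx * my) / (sdx * sdy)

print(round(r_53, 4), round(r_54, 4))  # the two forms agree
```

Either form yields the same test-retest reliability estimate; 5.4 is convenient when means and SDs are already tabulated.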

  26. Quality of test-retest reliability: • Estimates the consistency of tests across a time interval. • Sources of error: stability of the trait measured; individual differences in development, education, learning, training, memory, etc.; unexpected disturbances during test administration.

  27. Alternate-Forms Reliability. Also called equivalent or parallel forms reliability: the correlation between the test scores obtained by administering the alternate or equivalent forms of the test separately to the same examinees on one occasion. Form Ⅰ → immediately → Form Ⅱ, the same examinees.

  28. Application Example. Two alternate forms of a creative ability test were administered to ten seventh-grade students one morning. Table 5.3 shows the test results. Estimate the reliability of this test. Table 5.3

  29. Answer: apply formula 5.4 to the data in Table 5.3.

  30. Exercise 1. Use formulas 5.3 and 5.4 independently to estimate the reliability coefficient for the data in the following table.

  31. How can the effect of the order in which forms are administered be eliminated? Method: First, divide the group of examinees into two parallel groups. Second, group one receives Form Ⅰ of the test, and group two receives Form Ⅱ. Third, after a short interval, group one receives Form Ⅱ, and group two receives Form Ⅰ. Finally, compute the correlation between all the examinees' scores on the two forms of the test.

  32. Sources of Error • Whether the two forms of the test are parallel or equivalent: consistency of content sampling, item format, item quantity, item difficulty, and the SDs and means of the two forms. • Fluctuations in the individual examinee's state, including emotions, test motivation, health, etc. • Other unexpected disturbances.

  33. Coefficient of Stability and Equivalence. The correlation between the two sets of observed scores when the two alternate test forms are administered on two separate occasions to the same examinees. Form Ⅰ → interval → Form Ⅱ, the same examinees.

  34. Coefficients of Internal Consistency When examinees perform consistently across items within a test, the test is said to have item homogeneity. Internal consistency coefficient is an index of both item content homogeneity and item quality.

  35. Quality: one administration of a single form of the test. Error sources: content sampling; fluctuations in the individual examinee's state, including emotions, test motivation, health, etc.

  36. Split-Half Reliability. Procedure: to get the split-half reliability, the test developer administers the test to a group of examinees; then divides the items into two subtests, each half the length of the original test; then computes the correlation between the two halves of the test.

  37. Methods to divide the test into two parallel halves: 1. Assign all odd-numbered items to half 1 and all even-numbered items to half 2. 2. Rank-order the items by difficulty level based on the examinees' responses; then apply method 1. 3. Randomly assign items to the two halves. 4. Assign items to the half-test forms so that the forms are "matched" in content.

  38. Table 5.4 Illustrative Data for Split-half Reliability Estimation

  39. Employ formula 5.3 to compute r_hh. Attention: this r_hh actually gives the reliability of only a half-test. That is, it underestimates the reliability coefficient for the full-length test.

  40. Employ the Spearman-Brown formula to correct r_hh: r_tt = 2r_hh / (1 + r_hh) (5.5)

  41. Spearman-Brown general formula: r_kk = k·r / (1 + (k − 1)·r) (5.6), where r_kk is the estimated coefficient, r is the obtained coefficient, and k is the number of times the test is lengthened or shortened.
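The whole split-half procedure, odd-even split, half-test correlation, then the Spearman-Brown correction, can be sketched as below. The 0/1 response matrix is hypothetical, not the Table 5.4 data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation (population form), as in formula 5.3/5.4."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sdx * sdy)

def spearman_brown(r, k):
    """Formula 5.6: estimated reliability of a test lengthened k times."""
    return k * r / (1 + (k - 1) * r)

# Hypothetical item responses: rows = examinees, columns = items.
responses = [
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
]

# Odd-even split (method 1 above): half scores per examinee.
odd = [sum(row[0::2]) for row in responses]
even = [sum(row[1::2]) for row in responses]

r_hh = pearson(odd, even)          # half-test reliability
r_tt = spearman_brown(r_hh, 2)     # formula 5.5 is the k = 2 special case
print(round(r_hh, 3), round(r_tt, 3))
```

Note that the corrected r_tt exceeds r_hh whenever r_hh is positive, which is exactly the underestimation the slide above warns about.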

  42. Kuder-Richardson Reliability (Kuder & Richardson, 1937). For dichotomously scored items, Kuder-Richardson formula 20 (KR20): r_KR20 = (k / (k − 1)) · (1 − Σp_i·q_i / S_X²) (5.7), where k is the number of items, S_X² the total test variance, p_i the proportion of examinees who pass item i, and q_i = 1 − p_i the proportion who do not pass it.
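A minimal sketch of the KR20 computation on a hypothetical 0/1 response matrix (illustrative values only, not the table data from the text):

```python
# KR20 (formula 5.7) for dichotomously scored items.
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
]
k = len(responses[0])                       # number of items
n = len(responses)                          # number of examinees
totals = [sum(row) for row in responses]    # each examinee's total score

mean_total = sum(totals) / n
s_x2 = sum((t - mean_total) ** 2 for t in totals) / n   # total test variance

# Sum of p_i * q_i over items: p_i = proportion passing item i, q_i = 1 - p_i.
pq = 0.0
for i in range(k):
    p = sum(row[i] for row in responses) / n
    pq += p * (1 - p)

kr20 = (k / (k - 1)) * (1 - pq / s_x2)
print(round(kr20, 3))
```

The Σp_i·q_i term is the sum of item variances for 0/1 items, which is what ties KR20 to the more general coefficient alpha below.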

  43. Employ the data in table 5.3,

  44. Coefficient Alpha (α) (Cronbach, 1951): α = (k / (k − 1)) · (1 − ΣS_i² / S_X²) (5.8), where k is the number of items, S_X² the total test variance, and S_i² the variance of item i.
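Coefficient alpha generalizes KR20 to items that are not dichotomously scored. A minimal sketch on hypothetical essay-item scores (illustrative values, not the Exercise 2 data):

```python
# Cronbach's alpha (formula 5.8) for polytomously scored items.
scores = [      # rows = examinees, columns = essay items scored 0-10
    [8, 7, 9, 6],
    [5, 6, 5, 4],
    [9, 8, 9, 9],
    [3, 4, 2, 3],
    [7, 6, 8, 7],
]
n = len(scores)
k = len(scores[0])

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

totals = [sum(row) for row in scores]
s_x2 = variance(totals)                                        # total test variance
item_vars = [variance([row[i] for row in scores]) for i in range(k)]

alpha = (k / (k - 1)) * (1 - sum(item_vars) / s_x2)
print(round(alpha, 3))
```

Exercise 2 above is solved the same way: plug the four given item variances and the total variance of 100 directly into formula 5.8.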

  45. Exercise 2. Suppose that examinees have been tested on four essay items on which possible scores range from 0 to 10 points, with item variances , , , . If the total score variance is 100, estimate the reliability of the test.

  46. Scorer Reliability (Inter-rater Consistency). When a sample of test items is independently scored by two or more scorers or raters, each examinee has several test scores, so there is a need to measure the consistency of the scores over different scorers.

  47. Methods: 1. The correlation between the two sets of scores from two scorers (Pearson correlation; Spearman rank correlation). 2. Kendall coefficient of concordance.

  48. Kendall coefficient of concordance: W = S / [K²(N³ − N)/12], with S = Σ(R_i − R̄)² (5.9), where K is the number of scorers, N the number of examinees, and R_i the sum of ranks for each examinee over all scorers.
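A minimal sketch of formula 5.9 with hypothetical rankings from three scorers over six examinees (illustrative values, not the Table 5.5 data):

```python
# Kendall's coefficient of concordance (formula 5.9).
# ranks[j][i] is the rank scorer j assigns to examinee i (1 = best, no ties).
ranks = [
    [1, 2, 3, 4, 5, 6],   # scorer 1
    [2, 1, 3, 4, 6, 5],   # scorer 2
    [1, 3, 2, 4, 5, 6],   # scorer 3
]
K = len(ranks)        # number of scorers
N = len(ranks[0])     # number of examinees

# R_i: sum of ranks for each examinee over all scorers.
R = [sum(ranks[j][i] for j in range(K)) for i in range(N)]
R_bar = sum(R) / N
S = sum((r - R_bar) ** 2 for r in R)

# W ranges from 0 (no agreement) to 1 (perfect agreement among scorers).
W = S / (K ** 2 * (N ** 3 - N) / 12)
print(round(W, 3))
```

Because the three hypothetical scorers rank the examinees almost identically, W comes out close to 1; with independent (random) rankings it would approach 0.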

  49. Table 5.5 Scores of 6 Essays for 6 Examinees. Employ formula 5.9 to compute the scorer reliability. Key: .95
