Psy 425 Tests & Measurements Furr & Bacharach Chapter 5 Conceptual Basis for Reliability
True Scores? • Do scores on a test accurately reflect real psychological differences? • Assessment of reliability • Detecting the ability of a test to accurately reflect real differences
Classical Test Theory (CTT) • Conceptual basis of reliability • Outlines procedures for estimating the reliability of psychological measures
CTT • True differences vs. measurement error • A test’s reliability reflects the extent to which the differences in respondents’ test scores are a function of their true psychological differences, as opposed to measurement error…
Reliability • Not all or none • Is on a continuum • A test may be more or less reliable
Theoretical • Reliability is a theoretical notion • Not directly observable • Can only estimate the reliability
Derivation of Reliability Estimate • Estimate is derived based on three factors: • Observed scores • True scores • Measurement error
Observed Scores • Values obtained from measurement of some characteristic of an individual
True Scores • Real, true amounts of that characteristic
Reliability • Extent to which observed scores are consistent with true scores, as opposed to other, often unknown, characteristics of the test and its administration
Measurement Error • “Other” characteristics that contribute to differences in observed scores • These characteristics create inconsistencies between observed scores and true scores
Sources of Measurement Error? Can all sources be accounted for?
Accurate Measurement? • Factors can obscure observed scores… • Measurement of physical properties… (e.g., height & weight?) • Measurement of psychological attributes… (e.g., post-partum depression?)
What sources of error might contribute to scores on a test of depression (i.e., inflate or deflate observed scores relative to true scores)? • Interpretation of written items • Incorrect recording of answers • Secondary gain? • Defensive or avoidant? • Psychological mindedness? • Cultural factors?
Test reliability depends on… • Extent to which differences in test scores can be attributed to real inter- or intra- individual differences • AND • Extent to which such differences are a function of measurement error
CTT • Person’s observed score on a test is a function of that person’s true score, plus error: Xo = Xt + Xe (observed score = true score + error score)
Fundamental Theoretical Assumption of CTT • Observed scores on a psychological measure are determined by respondents’ true scores and by measurement error
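As a minimal sketch of this assumption, the Python below builds observed scores for four hypothetical respondents by adding an error score to each true score. The numbers and variable names are invented for illustration; in practice, true scores and error scores are never directly observable.

```python
# Hypothetical true scores and error scores for four respondents.
# These values are made up for illustration; real true scores are unobservable.
true_scores = [40, 40, 60, 60]
error_scores = [-5, 5, -5, 5]

# CTT assumption: observed score = true score + measurement error.
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]

for t, e, o in zip(true_scores, error_scores, observed_scores):
    print(f"true={t:>3}  error={e:>+3}  observed={o:>3}")
```

Two respondents with the same true score (40) end up with different observed scores (35 and 45) purely because of error.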
Random Error • Inflation and deflation caused by error are independent of individuals’ true levels of the psychological attribute being measured… • Interpretation of written items • Incorrect recording of answers • Secondary gain? • Defensive or avoidant? • Psychological mindedness? • Cultural factors?
Important consequences of assumption of random error: • Error cancels itself out across respondents • Error scores are uncorrelated with true scores
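The two consequences above can be checked with a small simulation. This is only a sketch under stated assumptions: true scores and errors are drawn from normal distributions with arbitrary means and standard deviations, the seed is arbitrary, and `pearson_r` is a helper written for this example.

```python
import random
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation computed from population moments."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

rng = random.Random(425)  # fixed, arbitrary seed so the sketch is repeatable
n = 10_000                # number of simulated respondents (assumed)

# Assumed distributions: true scores ~ N(50, 10), errors ~ N(0, 5).
true_scores = [rng.gauss(50, 10) for _ in range(n)]
# Random error: drawn independently of each respondent's true score.
error_scores = [rng.gauss(0, 5) for _ in range(n)]

mean_error = mean(error_scores)                        # should be close to 0
r_true_error = pearson_r(true_scores, error_scores)    # should be close to 0

print(f"mean error across respondents: {mean_error:.3f}")
print(f"correlation of true and error scores: {r_true_error:.3f}")
```

With many respondents, positive and negative errors roughly cancel, and knowing a respondent's true score tells you essentially nothing about their error score.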
Size of reliability coefficient • Test’s reliability • Varies between 0 and 1 • Larger values = greater psychometric quality • As value increases, a greater proportion of the differences among observed scores can be attributed to differences among true scores
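One standard way CTT expresses this proportion is as the ratio of true-score variance to observed-score variance. The sketch below applies that ratio to made-up scores for four hypothetical respondents; the function name `reliability` and all the data are assumptions for illustration, and the calculation is only possible here because the example pretends the true scores are known.

```python
from statistics import pvariance

def reliability(true_scores, observed_scores):
    """Proportion of observed-score variance that is true-score variance."""
    return pvariance(true_scores) / pvariance(observed_scores)

# Hypothetical data: errors average zero and are uncorrelated with true scores.
true_scores = [40, 40, 60, 60]
error_scores = [-5, 5, -5, 5]
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]

rxx = reliability(true_scores, observed_scores)
print(f"Rxx = {rxx:.2f}")

# Boundary cases: no error at all, and no true differences at all.
rxx_no_error = reliability(true_scores, true_scores)
rxx_no_true_differences = reliability([50, 50, 50, 50], [45, 55, 45, 55])
print(f"no error: {rxx_no_error:.2f}, no true differences: {rxx_no_true_differences:.2f}")
```

In the last case every observed difference is error, which is what a reliability of 0 means.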
Good vs. poor test reliability • No clear cutoff • In social science research, .70 to .80 is satisfactory • Less than that, marginal to poor • What about test reliability = 0; is the test at all useful? What about .43?
Improving reliability… • Original test: Rxx = .48 • Improved test: Rxx = .74
Error variance • Small error variance = respondents’ scores are only slightly affected by measurement error
Index of reliability • “index of reliability” = unsquared correlation between observed and true scores • USUALLY – referring to coefficient of reliability, or R²
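The relation between the two quantities can be illustrated with a short sketch: the index is the correlation between observed and true scores, and squaring it gives the coefficient. The four respondents' scores and the `pearson_r` helper are hypothetical, chosen so that errors are uncorrelated with true scores.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation computed from population moments."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical data (true scores are unobservable in practice).
true_scores = [40, 40, 60, 60]
error_scores = [-5, 5, -5, 5]
observed_scores = [t + e for t, e in zip(true_scores, error_scores)]

# Index of reliability: unsquared correlation of observed with true scores.
index_of_reliability = pearson_r(observed_scores, true_scores)
# Coefficient of reliability: the square of that correlation.
coefficient_of_reliability = index_of_reliability ** 2

print(f"index of reliability:       {index_of_reliability:.3f}")
print(f"coefficient of reliability: {coefficient_of_reliability:.3f}")
```

Because the index is a square root of a value between 0 and 1, it is never smaller than the coefficient, so it matters which of the two a reported "reliability" refers to.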
Reliability and Standard Error of Measurement • Standard deviation of error scores • Represents average size of error scores • The greater the average difference between observed scores and true scores, the less reliable the test • Closely linked to reliability: large sem → poor Rxx • If Rxx = 1, then sem = 0
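The link between sem and Rxx is usually written in CTT as sem = so × √(1 − Rxx), where so is the standard deviation of observed scores. That formula is not shown on the slide, so treat the sketch below as an illustration under that assumption; the function name and the observed standard deviation of 10 are hypothetical, while .48 and .74 are the two Rxx values from the "Improving reliability" slide.

```python
from math import sqrt

def standard_error_of_measurement(observed_sd, rxx):
    """sem = observed SD * sqrt(1 - Rxx), the usual CTT expression."""
    if not 0.0 <= rxx <= 1.0:
        raise ValueError("Rxx must lie between 0 and 1")
    return observed_sd * sqrt(1.0 - rxx)

observed_sd = 10.0  # assumed standard deviation of observed scores

sem_low_rxx = standard_error_of_measurement(observed_sd, 0.48)
sem_high_rxx = standard_error_of_measurement(observed_sd, 0.74)
sem_perfect = standard_error_of_measurement(observed_sd, 1.0)

print(f"Rxx = .48 -> sem = {sem_low_rxx:.2f}")
print(f"Rxx = .74 -> sem = {sem_high_rxx:.2f}")
print(f"Rxx = 1   -> sem = {sem_perfect:.2f}")
```

For a fixed spread of observed scores, the higher the reliability, the smaller the typical error, and a perfectly reliable test has no error at all.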