
Class 5: Additional Psychometric Characteristics: Validity and Bias, Responsiveness, Sensitivity to Change
October 16, 2008
Anita L. Stewart, Institute for Health & Aging, University of California, San Francisco

Presentation Transcript


  1. Class 5: Additional Psychometric Characteristics: Validity and Bias, Responsiveness, Sensitivity to Change. October 16, 2008. Anita L. Stewart, Institute for Health & Aging, University of California, San Francisco

  2. Overview • Validity • Including bias • How bias affects validity • Responsiveness, sensitivity to change • Meaningfulness of change

  3. Validity • Does a measure (or instrument) measure what it is supposed to measure? • And…Does a measure NOT measure what it is NOT supposed to measure?

  4. Valid Scale? No! • There is no such thing as a “valid” scale • We accumulate “evidence” of validity in a variety of populations in which it has been tested • Similar to reliability

  5. Validation of Measures is an Iterative, Lengthy Process • Accumulation of evidence • Different samples • Longitudinal designs

  6. Types of Measurement Validity • Content • Criterion • Construct • Convergent • Discriminant • Convergent/discriminant • All can be concurrent or predictive

  7. Content Validity: • Relevant when writing items • Extent to which a set of items represents the defined concept

  8. Relevance of Content Validity to Selecting Measures • “Conceptual adequacy” • Does the “candidate” measure adequately represent the concept YOU intend to measure?

  9. Content Validity Appropriate at Two Levels • Battery or instrument: Are all relevant domains represented in the instrument? • Measure: Are all aspects of a defined concept represented in the items of a scale?

  10. Example of Content Validity of Instrument • You are studying health-related quality of life (HRQL) in clinical depression • Your HRQL concept includes sleep problems, ability to work, and social functioning • SF-36 - a candidate • Missing sleep problems

  11. Types of Measurement Validity • Content • Criterion • Construct • Convergent • Discriminant • Convergent/discriminant • All can be concurrent or predictive

  12. Criterion Validity • How well a measure correlates with another measure considered to be an accepted standard (criterion) • Can be • Concurrent • Predictive

  13. Criterion Validity of Self-reported Health Care Utilization • Compare self-report with “objective” data (computer records of utilization) • # MD visits past 6 months (self-report) correlated .64 with computer records • # hospitalizations past 6 months (self-report) correlated .74 with computer records Ritter PL et al, J Clin Epid, 2001;54:136-141

  14. Criterion Validity of Screening Measure • Develop depression screening tool to identify persons likely to have disorder • Do clinical assessment only on those who screen “likely” • Criterion validity • Extent to which the screening tool detects (predicts) those with disorder • sensitivity and specificity, ROC curves
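The sensitivity/specificity idea can be sketched with a small confusion matrix. The counts below are invented for illustration, not from any study cited here:

```python
# Hypothetical example: sensitivity and specificity of a depression
# screening tool against a clinical-assessment criterion.

def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # proportion of true cases the screen detects
    specificity = tn / (tn + fp)  # proportion of non-cases the screen rules out
    return sensitivity, specificity

# Suppose 100 people: 20 have the disorder (16 screen positive),
# 80 do not (8 screen positive anyway).
sens, spec = sensitivity_specificity(tp=16, fp=8, fn=4, tn=72)
print(round(sens, 2), round(spec, 2))  # 0.8 0.9
```

Varying the screening cutoff and re-computing these two quantities at each cutoff is what traces out an ROC curve.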

  15. Criterion Validity of Measure to Predict Outcome • If goal is to predict health or other outcome • Extent to which the measure predicts the outcome • Example: Develop self-reported war-related stress measure to identify vets at risk of PTSD • How well does it predict subsequent PTSD (Vogt et al., 2004, readings)

  16. Interpreting Validity Coefficients • Magnitude and conformity to hypothesis are important, not statistical significance • Nunnally: rarely exceed .30 to .40 which may be adequate (1994, p. 99) • McDowell and Newell: typically between 0.40 and 0.60 (1996, p. 36) • Max correlation between 2 measures = square root of product of reliabilities • 2 scales with .70 reliabilities, max correlation .70 • Correlation of .60 would be “high”
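The ceiling that reliability places on a validity coefficient can be computed directly; a minimal sketch of the square-root-of-product rule quoted above:

```python
from math import sqrt

def max_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation between two measures,
    given their reliabilities (attenuation logic)."""
    return sqrt(rel_x * rel_y)

# Two scales, each with reliability .70:
print(round(max_correlation(0.70, 0.70), 2))  # 0.7 -- so an observed r of .60 is "high"
```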

  17. Types of Measurement Validity • Content • Criterion • Construct • Convergent • Discriminant • Convergent/discriminant • All can be concurrent or predictive

  18. Construct Validity Basics • Does measure relate to other measures in hypothesized ways? • Do measures “behave as expected”? • 3-step process • State hypothesis: direction and magnitude • Calculate correlations • Do results confirm hypothesis?
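The three-step process can be sketched in a few lines; the scores below are invented purely for illustration:

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Step 1: hypothesis -- depression scores should correlate positively
# and substantially (say r >= .30) with a psychosocial-problems measure.
depression = [10, 14, 8, 20, 16, 12, 18, 9]   # hypothetical scores
problems   = [ 3,  5, 2,  7,  6,  4,  6, 2]

# Step 2: calculate the correlation.
r = pearson_r(depression, problems)

# Step 3: do results confirm the hypothesis (direction and magnitude)?
print(round(r, 2), r >= 0.30)
```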

  19. Source of Hypotheses in Construct Validity • Prior literature in which associations between constructs have been observed • e.g., other samples, with other measures of constructs you are testing • Theory, that specifies how constructs should be related • Clinical experience

  20. Who Tests for Validity? • When measure is being developed, investigators should test construct validity • As measure is applied, results of other studies provide information that can be used as evidence of construct validity

  21. Types of Measurement Validity • Content • Criterion • Construct • Convergent • Discriminant • Convergent/discriminant • All can be concurrent or predictive

  22. Convergent Validity • Hypotheses stated as expected direction and magnitude of correlations • “We expect X measure of depression to be positively and moderately correlated with two measures of psychosocial problems” • The higher the depression, the higher the level of problems on both measures

  23. Testing Validity of Expectations Regarding Aging Measure • Hypothesis 1: ERA-38 total score would correlate moderately with ADLS, PCS, MCS, depression, comorbidity, and age • Hypothesis 2: Functional independence scale would show strongest associations with ADLs, PCS, and comorbidity Sarkisian CA et al. Gerontologist. 2002;42:534

  24. Testing Validity of Expectations Regarding Aging Measure • Hypothesis 1: ERA-38 total score would correlate moderately with ADLS, PCS, MCS, depression, comorbidity, and age (convergent) • Hypothesis 2: Functional independence scale would show strongest associations with ADLs, PCS, and comorbidity Sarkisian CA et al. Gerontologist. 2002;42:534

  25. ERA-38 Convergent Validity Results: Hypothesis 1

  26. ERA-38: Non-Supporting Convergent Validity Results

  27. Types of Measurement Validity • Content • Criterion • Construct • Convergent • Discriminant • Convergent/discriminant • All can be concurrent or predictive

  28. Discriminant Validity: Known Groups • Does the measure distinguish between groups known to differ in concept being measured? • Tests for mean differences between groups

  29. Example of a Known Groups Validity Hypothesis • Among three groups: • General population • Patients visiting providers • Patients in a public health clinic • Hypothesis: scores on functioning and well-being measures will be the best in a general population and the worst in patients in a public health clinic

  30. Mean Scores on MOS 20-item Short Form in Three Groups

                          General      MOS       Public health
                          population   patients  patients
      Physical function       91          78          50
      Role function           88          78          39
      Mental health           78          73          59
      Health perceptions      74          63          41

  Bindman AB et al., Med Care 1990;28:1142
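A known-groups test like this usually comes down to a comparison of group means. A minimal sketch, with invented scores and a simple Welch t statistic (not the published analysis):

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for a difference in two group means
    (degrees of freedom omitted for brevity)."""
    na, nb = len(a), len(b)
    se = (variance(a) / na + variance(b) / nb) ** 0.5
    return (mean(a) - mean(b)) / se

# Invented physical-function scores (0-100) for two groups that
# should differ if the measure has known-groups validity:
general_pop   = [95, 90, 88, 92, 85, 96]
clinic_sample = [55, 48, 60, 42, 50, 45]

t = welch_t(general_pop, clinic_sample)
print(round(t, 1))  # a large t: the groups differ as hypothesized
```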

  31. PedsQL Known Groups Validity • Hypothesis: PedsQL scores would be lower in children with a chronic health condition than without JW Varni et al. PedsQL™ 4.0: Reliability and Validity of the Pediatric Quality of Life Inventory™ …, Med Care, 2001;39:800-812.

  32. Types of Measurement Validity • Content • Criterion • Construct • Convergent • Discriminant • Convergent/discriminant • All can be concurrent or predictive

  33. Convergent/Discriminant Validity • Does a measure correlate less strongly with measures it is NOT expected to relate to than with measures it IS expected to relate to? • The extent to which the pattern of correlations conforms to hypothesis is confirmation of construct validity

  34. Basis for Convergent/Discriminant Hypotheses • All measures of health will correlate to some extent • Hypothesis is of relative magnitude

  35. Example of Convergent/Discriminant Validity Hypothesis • Expected pattern of relationships: • A measure of physical functioning is “hypothesized” to be more highly related to a measure of mobility than to a measure of depression

  36. Example of Convergent/Discriminant Validity Evidence Pearson correlation: Mobility Depression Physical functioning .57 .25
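Checking such a pattern amounts to comparing correlation magnitudes. A minimal helper; the .57/.25 values echo the physical-functioning example:

```python
def pattern_supported(r_convergent, r_discriminant):
    """True when the convergent correlation exceeds the discriminant one
    in absolute value -- the pattern convergent/discriminant validation
    looks for."""
    return abs(r_convergent) > abs(r_discriminant)

# Physical functioning vs. mobility (.57) and vs. depression (.25):
print(pattern_supported(0.57, 0.25))  # True
```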

  37. Testing Validity of Expectations Regarding Aging Measure • Hypothesis 1: ERA-38 total score would correlate moderately with ADLS, PCS, MCS, depression, comorbidity, and age (convergent) • Hypothesis 2: Functional independence scale would show strongest associations with ADLs, PCS, and comorbidity (convergent/discriminant) Sarkisian CA et al. Gerontologist. 2002;42:534

  38. ERA-38 Convergent/Discriminant Validity Results: Hypothesis 2

  39. ERA-38: Non-Supporting Validity Results

  40. Construct Validity Thoughts: Lee Sechrest • There is no point at which construct validity is established • It can only be established incrementally • Our attempts to measure constructs help us better understand and revise these constructs Sechrest L, Health Serv Res, 2005;40(5 part II), 1596

  41. Construct Validity Thoughts: Lee Sechrest (cont) • “An impression of construct validity emerges from examining a variety of empirical results that together make a compelling case for the assertion of construct validity”

  42. Construct Validity Thoughts: Lee Sechrest (cont) • Because of the wide range of constructs in the social sciences, many of which cannot be exactly defined… • …once measures are developed and in use, we must continue efforts to understand them and their relationships to other measured variables.

  43. Overview • Validity • Including bias • Responsiveness, sensitivity to change • Meaningfulness of change

  44. Components of an Individual’s Observed Item Score (from Class 3) • Observed item score = true score + random error + systematic error

  45. Random versus Systematic Error • Observed item score = true score + random error + systematic error • Random error is relevant to reliability; systematic error is relevant to validity
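A small simulation (invented numbers) shows why the two error components matter differently: mean-zero random error washes out of the average, while systematic error shifts it.

```python
import random

random.seed(0)

true_score = 50.0
n = 10_000

# Random error: mean-zero noise -- scatters scores but leaves the mean intact.
random_only = [true_score + random.gauss(0, 5) for _ in range(n)]

# Systematic error (bias): a constant shift -- every score is 4 points high.
biased = [true_score + random.gauss(0, 5) + 4 for _ in range(n)]

print(round(sum(random_only) / n))  # about 50: random error averages out
print(round(sum(biased) / n))       # about 54: bias shifts the observed mean
```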

  46. Bias is Systematic Error • Affects validity of scores • If scores contain systematic error, cannot know the “true” mean score • Will obtain an observed score that is either systematically higher or lower than the “true” score

  47. “Bias” or “Systematic Error”? • Bias implies that the direction of the error is known • Systematic error is direction-neutral • The same error applies to the entire sample

  48. Sources of “Bias” in Observed Scores of Individuals • Respondent • Socially desirable responding • Acquiescent response bias • Cultural beliefs (e.g., not reporting distress) • Halo effects • Observer • Belief that respondent is ill • Instrument

  49. Socially Desirable Responding • Tendency to respond in socially desirable ways to present oneself favorably • Observed score is consistently lower or higher than true score in the direction of a more socially acceptable score

  50. Socially Desirable Response Set – Looking “good” • After coming up with an answer to a question, respondent “screens” the answer • Will this make the person like me less? • May “edit” their answer to be more desirable • Example: a woman has 2 drinks of alcohol a day, but responds that she drinks a few times a week • Systematic underreporting of “risk” behavior
