More Practical Metrics for Standardizing Health Outcomes in Effectiveness Research

John E. Ware, Jr., PhD, Professor and Chief, Division of Measurement Sciences, Department of Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA
Track A - Patient Reported Outcome Measurement and Comparative Effectiveness
Research to Reform: Achieving Health System Change, AHRQ 2009 Annual Conference, Bethesda, MD, September 13-16, 2009

What is the Relationship Between Health Care Expenditures & Outcomes? [Figure. Axes: Health Outcome vs. Expenditures for Health Care ($)]

Health Insurance Experiment Revealed: More Health Care is Not Always Better [Figure: the “Flat of the Curve”. Axes: Health Outcome vs. Expenditures for Health Care ($)]

When the Same Outcome Costs More, Payers & Consumers Want to Pay Less [Figure. Axes: Health Outcome vs. Expenditures for Health Care ($)]

Who is Most Vulnerable with Aggressive Cost Containment?
Health Insurance Experiment (HIE) (1974-1981); Medical Outcomes Study (MOS) (1986-1990)
Most vulnerable in the MOS: • Chronically ill • Elderly • Poor • Non-white
[Figure: Health Decline under Cost Containment, with labels Well, Well off, Young. Axis: Expenditures for Health Care ($)]

4-Year Physical Health Outcomes Favored FFS > HMO for Chronically-Ill Medicare in the MOS
Note: these percentages, better & worse, would be only about 5% if due to measurement error alone.
Source: Ware, Bayliss, Rogers et al., JAMA, 1996; 276:1039-1047

When Outcomes Vary at the Same Price, Payers & Consumers Want the Best Outcomes [Figure. Axes: Health Outcome vs. Expenditures for Health Care ($)]

To Compare Health Care Effectiveness We Need Health Outcomes “Rulers” [Figure: a ruler marked 1 to 7 for scoring outcomes as Better, Same or Worse. Axes: Health Outcome vs. Expenditures for Health Care ($)]

Continuum of Disease-specific and Generic Health Measures
(1) Clinical Markers (2) Specific Symptoms (3) Impact of Disease-specific Problems (4) Generic Functioning, Well-being and Evaluation
Adapted from: Wilson and Cleary, JAMA, 1995; Ware, Annual Rev. Pub. Health, 1995

Continuum of Disease-specific and Generic Health Measures
(1) Clinical Markers: Spirometry
(2) Specific Symptoms: Shortness of Breath. “Over the last 4 weeks I have had shortness of breath”: Almost every day / Several days a week / A few days a month / Not at all
(3) Impact of Disease-specific Problems
(4) Generic Functioning, Well-being and Evaluation
Adapted from: Wilson and Cleary, JAMA, 1995; Ware, Annual Rev. Pub. Health, 1995

Continuum of Disease-specific and Generic Health Measures
(1) Clinical Markers: Spirometry
(2) Specific Symptoms: Shortness of Breath. “Over the last 4 weeks I have had shortness of breath”: Almost every day / Several days a week / A few days a month / Not at all
(3) Impact of Disease-specific Problems: Respiratory-specific. “How much did your lung/respiratory problems limit your usual activities or enjoyment of everyday life?”: Not at all / A little / Moderately / Extremely
(4) Generic Functioning, Well-being and Evaluation
Adapted from: Wilson and Cleary, JAMA, 1995; Ware, Annual Rev. Pub. Health, 1995

Continuum of Disease-specific and Generic Health Measures
(1) Clinical Markers: Spirometry
(2) Specific Symptoms: Shortness of Breath. “Over the last 4 weeks I have had shortness of breath”: Almost every day / Several days a week / A few days a month / Not at all
(3) Impact of Disease-specific Problems: Respiratory-specific. “How much did your lung/respiratory problems limit your usual activities or enjoyment of everyday life?”: Not at all / A little / Moderately / Extremely
(4) Generic Functioning, Well-being and Evaluation: Generic. “In general, would you say your health is…”: Excellent / Very good / Good / Fair / Poor
Adapted from: Wilson and Cleary, JAMA, 1995; Ware, Annual Rev. Pub. Health, 1995

There is More to the Continuum
(1) Clinical Markers (2) Specific Symptoms (3) Impact of Disease-specific Problems (4) Generic Functioning, Well-being and Evaluation

Prediction and Risk Management: PROs are among the Best Predictors
Health-Related QOL (HR-QOL): (3) Impact of Disease-specific Problems (4) Generic Functioning, Well-being and Evaluation
Outcomes predicted: • Future health • Inpatient expenditures • Outpatient expenditures • Job loss • Response to treatment • Return to work • Work productivity • Mortality

What Do We Need for Comparative Effectiveness Research?
• Outcomes that matter to patients • Practical measures • Coverage of a wide range • Greater precision • Comparability of scores • Ease of interpretation
Health concepts shown: • Physical activity limitations • Symptoms of psychological distress • Physical well-being • Life satisfaction • Emotional behavior • Role disability due to physical problems • Psychological well-being • General health perceptions • Physical mobility • Role disability due to emotional problems • Satisfaction with physical condition • Social activities with friends/relatives

Content of Widely-Used Patient-Reported Outcome Measures
Psychometric measures: SIP, HIE, NHP, COOP, DUKE, MOS FWBP, MOS SF-36, PROMIS
Utility-related measures: QWB, EUROQOL, HUI, SF-6D
Concepts, with the number of measures marked as covering each:
• Physical functioning: 12
• Social functioning: 11
• Role functioning: 11
• Psychological distress: 11
• Health perceptions (general): 6
• Pain (bodily): 10
• Energy/fatigue: 8
• Psychological well-being: 5
• Sleep: 4
• Cognitive functioning: 4
• Quality of life: 3
• Reported health transition: 3
Abbreviations:
SIP = Sickness Impact Profile (1976)
HIE = Health Insurance Experiment surveys (1979)
NHP = Nottingham Health Profile (1980)
QLI = Quality of Life Index (1981)
COOP = Dartmouth Function Charts (1987)
DUKE = Duke Health Profile (1990)
MOS FWBP = MOS Functioning and Well-Being Profile (1992)
MOS SF-36 = MOS 36-Item Short-Form Health Survey (1992)
PROMIS = Patient Reported Outcomes Measurement Information System
QWB = Quality of Well-Being Scale (1973)
EUROQOL = European Quality of Life Index (1990)
HUI = Health Utility Index (1996)
SF-6D = SF-36 Utility Index (Brazier, 2002)
Source: Adapted from Ware, 1995


What Do We Need for Comparative Effectiveness Research? • Outcomes that matter to patients • Practical measures • Coverage of a wide range • Greater precision • Comparability of scores • Ease of interpretation [Slide callout: “Ceiling Effect”]

A Practical Solution in 1999: Computerized Dynamic Health Assessment
IRT/CAT will spawn a new generation of static tools
[Figure, two panels plotted against the Criterion Score, N = 1016 each. Dynamic 5-Item Headache Pain Measure: r = 0.938. Skewed 5-Item Headache Pain Measure: r = 0.536, with a “Ceiling Effect” (No Disability) spanning 3 SD units.]
Source: Ware JE, Jr, et al. Med Care. 2000;38:1173-82.

What Do We Need for Comparative Effectiveness Research? • Outcomes that matter to patients • Practical measures • Coverage of a wide range • Greater precision • Comparability of scores • Ease of interpretation [Slide callout: Criterion VAS]


Practical Solution in 2000: Cross-Calibration of Headache Pain Disability Measures
Theta (θ) [Best Possible Estimate]

Scales         20   30   40   50   60   70
HDI            16   43   73   91   98  100
HIMQ           74   53   31   17    8    2
MIDAS          58   28    5    1    0    0
MSQ            31   53   79   92   96   99
DYNHA-5 (+)    23   32   41   51   58   66

Note: Direction of scoring shown with arrows
Source: Ware, Bjorner & Kosinski, Medical Care, 2000
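A cross-calibration table like the one above can act as a crosswalk: a score on one scale is mapped to theta and then to the equivalent score on another scale. The sketch below is only an illustration, not the authors' IRT procedure. It uses the MSQ and DYNHA-5 rows from the slide and assumes that straight-line interpolation between the tabulated theta points is acceptable. The function names `interp` and `crosswalk` are hypothetical.

```python
from bisect import bisect_right

# Values copied from the slide's cross-calibration table.
THETA = [20, 30, 40, 50, 60, 70]
MSQ = [31, 53, 79, 92, 96, 99]
DYNHA5 = [23, 32, 41, 51, 58, 66]


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("value outside the tabulated range")
    # Index of the right-hand end of the segment that contains x.
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def crosswalk(score, from_scale, to_scale):
    """Map a score to theta, then to the other scale (assumed linear between points)."""
    theta = interp(score, from_scale, THETA)
    return interp(theta, THETA, to_scale)


print(crosswalk(79, MSQ, DYNHA5))  # a tabulated point: theta 40
print(crosswalk(66, MSQ, DYNHA5))  # halfway between theta 30 and 40
```

Scales that decrease as theta increases (HIMQ and MIDAS in the table) would need both lists reversed before calling `interp`, since it expects increasing input values.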

We Need the Health Equivalent of a Two-Sided Tape Measure (52 centimeters = 20.5 inches) and Public-Private Partnerships That Meet the Needs of Research and Business

What Do We Need for Comparative Effectiveness Research? • Outcomes that matter to patients • Practical measures • Coverage of a wide range • Greater precision • Comparability of scores • Ease of interpretation [Slide callout: What do the results mean?]

PRO Validation Must be Comprehensive
Measures In Question are compared with the Gold Standard and with Other Measures & Methods.
Causes: • Diagnosis • Disease severity • Responders • Treatments
Consequences: • Work productivity • Costs of care • Mortality • Self-evaluated health
Adapted from: Ware JE, Jr. and Keller SD: Interpreting general health measures, in: Quality of Life and Pharmacoeconomics in Clinical Trials. Philadelphia, PA: Lippincott-Raven Publishers; 1995: Chapter 47.

What Do Differences in Treatment Effectiveness Mean?
[Figure: Physical Component Summary (PCS) scale marked at 30, 40 and 50, locating Congestive Heart Failure, Chronic Lung Disease, Diabetes Type II, Average Adult, Average Well Adult, Asthma Before Rx and Asthma After Rx]
Treatment: • 50% reduction in disease burden • 33% reduction in hospitalization • Substantial increase in work productivity • Subsequent cost savings

Matching Methods to Applications: “Choosing the Right Horse for the Course” • Population monitoring • Group-level outcomes monitoring • Patient-level measurement/management

Matching Methods to Applications
[Figure: three rulers scored from 1 (Most Functionally Impaired) to 7, ranging from Noisy Individual Classification to Very Accurate Individual Classification. Methods: Single-Item, Multi-Item Scale, “Item Pool” (CAT Dynamic). Applications: Population Monitoring, Group-Level Outcomes Monitoring, Patient-Level Management.]

Solutions • Improved psychometrics (Item response theory – IRT) • Computerized adaptive testing (CAT) software • The Internet (and other connectivity) Business Week. November 26, 2001.

First, Construct Better Metrics • Comprehensive Item “Pools” • IRT Cross Calibration of Items
Physical Functioning (PF), % @ Ceiling:
• 1980 “PF Ruler” (ADL, SIP, FIM): > 75% @ Ceiling
• 1990 “PF Ruler” (PF-10): > 30% @ Ceiling
• 2008 “PF Ruler” (NEW PF): < 3% @ Ceiling
Source: Business Week 11/26/01
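The "% @ Ceiling" figure on this slide is the share of respondents who obtain the best possible score, so the measure cannot tell them apart or register further improvement. A minimal sketch of that calculation follows. The function name and the sample scores are hypothetical and are not data from the presentation.

```python
def percent_at_ceiling(scores, max_score):
    """Percentage of respondents scoring at the top of the scale."""
    if not scores:
        raise ValueError("no scores supplied")
    at_ceiling = sum(1 for s in scores if s >= max_score)
    return 100.0 * at_ceiling / len(scores)


# Hypothetical scores on a 0-100 physical functioning scale.
sample = [100, 100, 85, 70, 100, 95, 100, 60, 100, 90]
print(percent_at_ceiling(sample, max_score=100))  # 5 of 10 respondents
```

The same count applied to the lowest possible score gives the percentage at the floor.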

Precision Varies Across “Static” and Dynamic Forms and Across Score Levels
[Figure, Rheumatoid Arthritis: Standard Error (0 to 6.0) and the corresponding Reliability (0.75, 0.90, 0.95) plotted against Physical Function (PF) score (0 to 80, Mean = 50) for PF-1 (“Static”), PF-2 (“Static”), PF-10 (“Static”), PF CAT-10 and the PF “Criterion” (Item Bank)]
Source: Rose M, Bjorner JB, Becker J, Fries JF and Ware JE. Evaluation of a preliminary physical function item bank supported expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). Journal of Clinical Epidemiology, 2008, 61, 17-33.
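The figure pairs a standard error axis with reliability values of 0.75, 0.90 and 0.95. Under the classical relation reliability = 1 - (SE / SD)^2, and assuming the score metric has a standard deviation of 10 (an assumption here; the slide states only that the mean is 50), those reliabilities correspond to standard errors of about 5.0, 3.2 and 2.2, which matches where they sit on the axis. A short sketch with hypothetical function names:

```python
import math

SCORE_SD = 10.0  # assumed SD of the score metric; the slide gives only Mean = 50


def reliability_from_se(se, sd=SCORE_SD):
    """Classical relation: reliability = 1 - (SE / SD)^2."""
    return 1.0 - (se / sd) ** 2


def se_from_reliability(reliability, sd=SCORE_SD):
    """Inverse relation: SE = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


for rel in (0.75, 0.90, 0.95):
    print(rel, round(se_from_reliability(rel), 2))
```

Read this way, a form whose standard error stays below roughly 3.2 points across the score range keeps reliability at or above 0.90.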

2nd Solution: Assess Health Dynamically [Figure: CAT narrows in on the ruler, with the callout “Patient scores here”] CAT = Computerized Adaptive Testing
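The idea behind computerized adaptive testing can be sketched in a few lines: after each answer the score estimate is updated, and the next question is the one that is most informative at the current estimate. The example below is a toy illustration, not the software described in the talk. It assumes a Rasch (one-parameter) model with yes/no items, a standard normal prior, and a five-item bank whose wording and difficulties are invented for the example.

```python
import math

# Hypothetical item bank: item wording -> difficulty on a z-score-like scale.
ITEM_BANK = {
    "walk one block": -2.0,
    "climb one flight of stairs": -1.0,
    "carry groceries": 0.0,
    "walk more than a mile": 1.0,
    "run a short distance": 2.0,
}
GRID = [g / 10 for g in range(-40, 41)]  # theta values from -4.0 to 4.0


def p_yes(theta, b):
    """Rasch probability of answering yes to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def information(theta, b):
    """Fisher information of a Rasch item at theta."""
    p = p_yes(theta, b)
    return p * (1.0 - p)


def eap(responses):
    """Posterior mean and SD of theta on the grid, standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for b, said_yes in responses:
            w *= p_yes(t, b) if said_yes else 1.0 - p_yes(t, b)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer_fn, max_items=4, se_target=0.5):
    """Ask the most informative remaining item until a stopping rule is met."""
    remaining = dict(ITEM_BANK)
    responses, asked = [], []
    theta, se = eap(responses)
    while remaining and len(asked) < max_items and se > se_target:
        name = max(remaining, key=lambda n: information(theta, remaining[n]))
        b = remaining.pop(name)
        responses.append((b, answer_fn(name, b)))
        asked.append(name)
        theta, se = eap(responses)
    return theta, se, asked


# Simulated respondent who can do every activity easier than difficulty 1.5.
theta, se, asked = run_cat(lambda name, b: b < 1.5)
print(asked, round(theta, 2), round(se, 2))
```

With a bank this small the loop stops on the item limit rather than the precision target; a real item bank would hold many more items, and typically polytomous ones.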

What are the Advantages of Dynamic Assessments? • More accurate risk screening • Reliable enough to monitor individual outcomes • Brevity of a short form – 90% reduction in respondent burden • Elimination of “ceiling” & “floor” effects • Can be administered using various data collection technologies • Markedly reduced data collection costs • Monitor data quality in real time

3rd Solution: The Internet
www.asthmacontroltest.com / www.amIhealthy.com
Reference, Headache Impact: Bayliss MS, Dewey JE, Cady R, et al. A study of the feasibility of Internet administration of a computerized health survey: The Headache Impact Test (HIT). Quality of Life Research. 2003;12:953-961.
Reference, Asthma Control: Nathan RA, Sorkness CA, Kosinski M, et al. Development of the Asthma Control Test: A survey for assessing asthma control. Journal of Allergy and Clinical Immunology. 2004;113:59-65.

Conclusions • Patient-reported outcomes (PROs) are very useful • Standardization of concepts & metrics is enabling comparisons across treatments & settings • Increasingly widespread use proves that more practical tools will be adopted • Promising technological advances include: item response theory (IRT), computerized adaptive testing (CAT) and Internet-based data capture
