Working Paper No.10, 21 November 2005
STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)
WORLD HEALTH ORGANIZATION (WHO)
Joint UNECE/WHO/Eurostat Meeting on the Measurement of Health Status (Budapest, Hungary, 14-16 November 2005)
Session 5: Invited paper
Can secondary analysis teach us on best practice of universal QoL measurement? Arguments and (some) Evidence
Prof. Gouke J Bonsel MPH MD PhD
Public Health Methods / Obstetrics
Academic Medical Centre - University of Amsterdam
Agenda
• Comparative secondary analysis: wanted?
• Goals of measurement
  • Contents
  • Process
• C2A
  • Quantitative: validity
  • Qualitative: Q/D vignette
  • Quantitative: coverage/refinement
General belief: many issues can be resolved by data
Comparative secondary analysis (C2A)
• > 2 crude datasets with
  • known questionnaire + codification rules
  • known population (at least vs. general)
  • sharing > 1 intended concept
  • sufficient common question/response types
  • sufficient language commonalities
• special cases
  • 1 questionnaire, n populations
  • n questionnaires, > 1 population
Comparative secondary analysis: types
• quantitative, analytical content-driven methods, with and without external criterion
• quantitative, descriptive (technical) performance methods
• qualitative, semantics
• qualitative comparison of response form and other operational features
All head-to-head analyses assume some aspects to be constant over the units to be compared.
Goals of QoL measurement: CONTENTS
• Intrinsic goals of health systems, WHO (+EU?)
  • Health (DALE-like; class): level, distribution
  • Responsiveness: level, distribution
  • Fairness of financing: distribution
• Washington
  • Monitoring population health [health level]
  • Care provision [responsiveness + level]
  • Equal pursuit [health + responsiveness distribution]
• External goals (GJB)
  • Employment, autonomy, reproduction
Goals of QoL measurement: CONTENTS
• Health state measurement (per domain)
  • multi-item classical test Q (mQ): no
  • ordinal classification (class): yes
  • cf. Item Response Theory (IRT) calibrated: perhaps
• Suitability for index development
  • in general: perhaps
  • to compose QALY/DALY estimates: yes (but do not tell)
• Projection from the WHO mission to existing instruments and accepted classifications
Goals of QoL measurement: PROCESS
• Efficient elaboration
• Reliable elaboration
• Universality of acceptance
• Flexibility of mode of administration
• Low price, low burden
• Fancy appearance
Some remarks (1)
• Domains
  • normal is absence of dys[...]; avoid the ‘better than normal’ discussion (concept: health is positive; item: "happy" instead of "downhearted"). Think of playing music: there is no better than playing on the beat
  • ALL external criteria, except ease of measurement and peace of mind, imply about equal space for physical and psychological domains; less (but not none) for social
  • projection onto WHO mission, WHO classifications, other instruments: ex post or ex ante
  • beware of the conceptual unidimensionality artefact and of interpreting empirical correlation as redundancy; neither classification nor IRT ‘requires’ empirical independence
Some remarks (2)
• Domains & items & time
  • (pattern over) time is an essential conceptual component; recall technicalities are of minor consideration
  • all concepts are continuous over time, but some state changes appear as events, episodes or chronic states, or can only be defined on a (restricted) activity (= event) basis; hence frequency and intensity are to some extent semantic convention
• Consequences
  • time can emerge in the preamble, the item and the response; uniformity over the questionnaire is essential, as people ignore preambles
  • the empirical (pattern over) time therefore decides between ‘frequency’ and ‘intensity’, but on average both are relevant
  • experience tells that virtually all domains have day-to-day fluctuations; if unstandardized, the response refers to the best condition
  • graphical tools are useful if the item is unidimensional; so far academic
Some remarks (3)
• Items / response
  • the burden of 3 domains × 6 responses is smaller than that of 6 domains × 3 responses
  • distributional economy is ignored: 2 levels is not best, subjective scale experience does not apply, and filtering assumes an error-free, context-free threshold judgment (cf. Shannon’s methodology)
  • equalizing semantics across young/old, men/women, rich/poor, nationality or culture standardizes rather than exposes differences: desired?
  • contextual aspects are often ignored, as is suitability for translation
  • reliability information (across time, observers, mode of administration) is scarce
C2A: Quantitative head-to-head validity
• With external criterion
  • domain-specific consequences or etiology, and personal characteristics, with a prespecified relation; strength of association (preferably RR)
  • examples
    • psychological domain: use of specific care, suicide; preceding life events
    • mobility domain: use of physiotherapy, aids; fracture in the preceding period
    • cognitive domain: age
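A minimal sketch of the "strength of association (preferably RR)" step. The counts are hypothetical, and classifying a respondent as "with problem" when any problem is reported on the domain is an assumption for illustration. In a head-to-head comparison the same criterion would be computed for the corresponding domain of each instrument in the same sample.

```python
def relative_risk(events_problem: int, n_problem: int,
                  events_no_problem: int, n_no_problem: int) -> float:
    """Risk of the external criterion among respondents reporting a problem on
    the domain, divided by the risk among respondents reporting no problem."""
    if n_problem <= 0 or n_no_problem <= 0:
        raise ValueError("group sizes must be positive")
    risk_problem = events_problem / n_problem
    risk_no_problem = events_no_problem / n_no_problem
    if risk_no_problem == 0:
        raise ZeroDivisionError("no criterion events in the reference group")
    return risk_problem / risk_no_problem


if __name__ == "__main__":
    # Hypothetical 2x2 counts: physiotherapy use by mobility-domain response,
    # for two instruments administered to the same sample.
    rr_a = relative_risk(30, 100, 10, 200)  # instrument A
    rr_b = relative_risk(25, 120, 15, 180)  # instrument B
    print(f"RR A = {rr_a:.2f}, RR B = {rr_b:.2f}")
```

Under a prespecified positive relation, the instrument whose domain yields the larger RR discriminates the criterion more sharply in that sample.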
C2A: Quantitative head-to-head validity
• Without external criterion
  • domain relations with prespecified patterns; strongly dependent on the population (random if about healthy); comparison is difficult if the scale type differs (mQ, class, IRT)
  • special case if a measure is contained as anchor
  • examples
    • psychological domains vs. physical domains
    • all domains vs. HUI-Ambulation or EQ-Mobility
C2A: Quantitative head-to-head validity
• Without internal cutpoint calibration information
  • domainwise IRT analysis
• With internal cutpoint calibration information (vignettes)
  • domainwise CHOPIT-like analysis
Calibration: difficult but essential, ALSO in countries
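CHOPIT itself is a parametric (ordered-probit) model and too long for a sketch. The nonparametric vignette recode presented alongside it by King, Murray, Salomon and Tandon (2004) is swapped in here because it shows the core idea compactly: each respondent's own vignette ratings act as internal cutpoints for the self-rating. A minimal sketch, assuming higher ratings mean more severe problems and vignettes listed from least to most severe; tied or misordered vignette ratings, which that method treats as interval-valued, are simply rejected here.

```python
def vignette_recode(self_rating: int, vignette_ratings: list[int]) -> int:
    """Nonparametric vignette recode: position of the self-rating relative to
    the respondent's own ratings of J vignettes, returned as C in 1..2J+1.

    Assumes higher ratings mean more severe problems and that vignettes are
    listed from least to most severe, so their ratings must strictly increase.
    """
    if any(a >= b for a, b in zip(vignette_ratings, vignette_ratings[1:])):
        raise ValueError("tied or misordered vignette ratings: C is not a single value")
    c = 1
    for z in vignette_ratings:
        if self_rating < z:
            return c        # below this vignette
        if self_rating == z:
            return c + 1    # equal to this vignette
        c += 2              # above this vignette: move to the next one
    return c                # above the most severe vignette: 2J + 1


if __name__ == "__main__":
    # Two hypothetical respondents both rate themselves at level 3 but use
    # the response scale differently when rating the same two vignettes.
    print(vignette_recode(3, [2, 4]))  # respondent A
    print(vignette_recode(3, [3, 5]))  # respondent B
```

The same raw self-rating lands at different calibrated positions once each respondent's own cutpoints are taken into account, which is the comparability problem that vignette calibration addresses, including across countries.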
C2A: Qualitative head-to-head
• Suitability to compose vignettes (timeless states, annual profiles) to arrive at Q/D values
  • self-reflective domain terms
  • linguistic (non-numerical), objective response mode
  • clear-cut time aspect
  • ‘uniformity’ of terms, categories and time across domains
C2A: Quantitative head-to-head: efficiency
• Source: investigations supporting an increase in the number of levels of EQ-5D-3L (‘HUI-fication’)
• No methods available to demonstrate the benefit of more refinement
• Method: Shannon’s informativity measure, a non-parametric (desirable) quantifier. Source: US study, http://www.ahrq.gov/rice/ and Med Care 2005;43:203-20 and 221-28
C2A example: EQ-5D, HUI2 and HUI3 dimensions with number of levels and number of unique permutations defined by the full descriptive system. Common dimensions are shaded grey.
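The number of unique permutations of a full descriptive system is the product of its level counts. A minimal sketch; the level counts below are the commonly published ones for each system and are an assumption here, since the slide's table did not survive extraction.

```python
from math import prod

# Levels per dimension as commonly published for each descriptive system
# (assumption: not taken from the slide, whose table was lost).
SYSTEMS: dict[str, dict[str, int]] = {
    "EQ-5D-3L": {"mobility": 3, "self-care": 3, "usual activities": 3,
                 "pain/discomfort": 3, "anxiety/depression": 3},
    "HUI2": {"sensation": 4, "mobility": 5, "emotion": 5, "cognition": 4,
             "self-care": 4, "pain": 5, "fertility": 3},
    "HUI3": {"vision": 6, "hearing": 6, "speech": 5, "ambulation": 6,
             "dexterity": 6, "emotion": 5, "cognition": 6, "pain": 5},
}


def unique_states(levels: dict[str, int]) -> int:
    """Number of distinct health states = product of the level counts."""
    return prod(levels.values())


if __name__ == "__main__":
    for name, levels in SYSTEMS.items():
        print(f"{name}: {len(levels)} dimensions, {unique_states(levels):,} states")
```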
Absolute and % distribution of responses, EQ-5D, HUI2 & HUI3 (N = 3691)
From the number of potential categories and the observed frequencies we can compute Shannon numbers: the more equally distributed the responses, the more information, the better the reliability and the better the sensitivity.
H’ and J’ with skewed and rectangular distributions in a 3-level vs. a 5-level system. Shannon numbers are cardinal.
If the system is extended but the potential categories are not occupied, then the absolute Shannon index H’ stays the same and the relative Shannon index J’ is lower.
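The Shannon numbers can be sketched as follows, assuming the usual definitions H’ = -Σ p_i log2 p_i over the observed categories and J’ = H’ / log2 k, with k the number of potential categories (log base 2 is an assumption; J’ does not depend on the base). The frequencies are hypothetical. The sketch reproduces the slide claims: a rectangular distribution scores higher than a skewed one, and adding unoccupied categories leaves H’ unchanged while lowering J’.

```python
from math import log2


def shannon(counts: list[int], n_categories: int) -> tuple[float, float]:
    """Return (H', J') for response frequencies over n_categories potential
    categories. Empty categories add nothing to H' but still raise k."""
    total = sum(counts)
    if total <= 0 or n_categories < 2 or len(counts) > n_categories:
        raise ValueError("need observations and at least two categories")
    h = -sum(c / total * log2(c / total) for c in counts if c > 0)
    return h, h / log2(n_categories)


if __name__ == "__main__":
    skewed_3l = [240, 45, 15]          # hypothetical, e.g. a healthy sample
    rectangular_3l = [100, 100, 100]
    extended_5l = skewed_3l + [0, 0]   # 5 levels offered, 2 never used

    for label, counts, k in [("skewed 3L", skewed_3l, 3),
                             ("rectangular 3L", rectangular_3l, 3),
                             ("skewed, extended to 5L", extended_5l, 5)]:
        h, j = shannon(counts, k)
        print(f"{label}: H' = {h:.3f}, J' = {j:.3f}")
```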
Shannon’s Absolute Index (H’) and Evenness Index (J’) for the Common Domains of EQ-5D, HUI2 & HUI3.
Conclusions: C2A efficiency by Shannon
• Head-to-head comparison tools allow choices on information gain by extension or recalibration
• Non-parametric, which is an advantage as it is independent of cutpoint (re)estimation
• In healthy or ambulatory diseased populations, EQ-5D-3L equals the HUIs for common domains
• To be combined with differential cutpoint evaluation and reliability!
Straightforwardly applicable for C2A to WHO/EU data if similar population or experimentation
Reliability
• Systematic information to select item/response
  • domain^response × time (retest)
  • domain^response × respondent (interobserver)
  • domain^response × administration (retest)
• EQ-5D: 3, 4 or 5 levels
• Experiment on a representative panel under controlled conditions comparing 3L, 5L and RS
  • error, ‘filling the space’ and reliability
The task: classify/rate ‘self’ and disease vignettes. Response = 3L, 5L, or horizontal unanchored VAS
Inconsistencies between 3L and 5L responses by dimension, all 15 health vignettes (N = 82): no error increase from 3L to 5L
Inter-observer reliability, 3L vs. 5L, 15 vignettes: 5L much better!
Test-retest reliability for respondents’ own health (3-week interval), ICC: 5L best!
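The slide does not say which ICC form was used. A minimal sketch of the one-way random-effects, single-measure ICC(1,1), assuming each respondent rated own health on k occasions (here test and retest); two-way forms such as ICC(2,1) would need an extra occasion term. The ratings are hypothetical.

```python
from typing import Sequence


def icc_1_1(ratings: Sequence[Sequence[float]]) -> float:
    """One-way random-effects, single-measure ICC(1,1).

    ratings[i] holds the k repeated ratings (e.g. test, retest) of subject i.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two subjects")
    k = len(ratings[0])
    if k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    subject_means = [sum(r) / k for r in ratings]
    # Between-subject and within-subject sums of squares
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2
                    for r, m in zip(ratings, subject_means) for x in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variance in the ratings")
    return (ms_between - ms_within) / denominator


if __name__ == "__main__":
    # Hypothetical test-retest levels for five respondents, one dimension
    three_level = [[1, 1], [2, 1], [2, 2], [1, 2], [3, 3]]
    five_level = [[1, 1], [3, 2], [3, 3], [2, 2], [5, 5]]
    print(f"ICC 3L = {icc_1_1(three_level):.2f}")
    print(f"ICC 5L = {icc_1_1(five_level):.2f}")
```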
Average 3Lrs, 5Lrs and RS mean values by dimension, all diseases and self-reported health. 3L and 5L values are linearly transformed to the RS scale range (0-100)
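A minimal sketch of the linear transformation of an ordinal level onto the 0-100 RS range. Which end of the RS corresponds to level 1 is not stated on the slide; mapping level 1 to 0 is an assumption, and the value is reversed with 100 minus the result if the RS is anchored the other way round.

```python
def to_rs_scale(level: int, n_levels: int) -> float:
    """Linearly map an ordinal level 1..n_levels onto the 0-100 RS range.

    Assumption: level 1 maps to 0 and the highest level to 100; use
    100 - to_rs_scale(...) if the RS is anchored the other way round.
    """
    if n_levels < 2 or not 1 <= level <= n_levels:
        raise ValueError("level must lie in 1..n_levels, with n_levels >= 2")
    return (level - 1) * 100 / (n_levels - 1)


if __name__ == "__main__":
    # The middle 3L level and the middle 5L level land on the same RS value
    print(to_rs_scale(2, 3), to_rs_scale(3, 5))
```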
Indirect and direct quantification of level terms (n = 1230). Midway = 1/3 rate rule
Shannon’s index (H’) and Shannon’s Evenness index (J’) values for 3L and 5L. Comparison by dimension
Conclusions: C2A reliability of response terms
• Balance of 3 vs. 5 in favour of 5 (after WHO choice)
  • error increase low
  • reliability better
  • Shannon rises (much)
• Fairly easy to investigate if a great number of respondents is available
• C2A if multiple response formats exist for 1 domain
C2A of other process goals
• Universality of acceptance
  • quantitative and qualitative C2A, depending on codes for non-response
• Flexibility of mode of administration
  • qualitative comparison only
• Fancy appearance
  • qualitative comparison only
• Low price, low burden
  • quantitatively possible, but who cares?
Recommendations
• Comprehensive checklist for C2A
  • starting from structured, agreed content goals and process/technical goals
  • distinguishing between quantitative (incl. Shannon) and qualitative research, and what remains!
  • specify models, techniques and success
• DATA can SOLVE debates; INTERESTING CHOICES remain