Working Paper No. 13
21 November 2005
STATISTICAL COMMISSION and UN ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS
STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)
WORLD HEALTH ORGANIZATION (WHO)
Joint UNECE/WHO/Eurostat Meeting on the Measurement of Health Status (Budapest, Hungary, 14-16 November 2005)
Session 3
Comments on WP # 3. Discussant: Ian McDowell, University of Ottawa, Canada
Clarify Purpose: Description or evaluation? Design implications of each…
Descriptive:
• Broad ranging. Goal = to classify groups
• Themes of interest to people in general (“quality of life”, etc); issues of public concern
• To debate: Emphasize modifiable themes?
• To debate: profile rather than index?
Evaluative:
• Content tailored to intervention; usually not comprehensive
• Needs to be sensitive to change produced by particular intervention
• Focused & fine-grained: select indicators that sample densely from relevant level of severity; unidimensional → emphasis on summary score
Discussion point: does proposed instrument need to serve as an evaluative measure?
Purpose, Performance and Capacity
[Diagram. Labels: Analytic purposes; Descriptive purposes; Potential; Unmet needs; Capacity (with any aids); Environment; Needs that have been met; Current picture; Performance; Capacity (without aids)]
Parsimony, Sensitivity & Specificity These are in tension! Need for brevity implies: • If goal is to have broad coverage of domains (descriptive measure), there can only be few items in each • To achieve breadth within a domain in few items, we need to use generic items (e.g., the infamous “can you cut your toenails?”) • This can achieve sensitivity as a screen, but at cost of low specificity: cannot classify type of condition • Will also lose interpretability and unidimensionality • Point #38: the WP discussion of physical function illustrates choice between measuring overall, vs. specific functions. Do we care whether it’s knee pain, or muscle weakness, or balance that limits walking ability?
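The trade-off described on this slide can be made concrete with a small calculation. The following is a minimal sketch using invented counts (not data from the working paper): a generic mobility item is treated as a screen for one specific target condition, and the function name `screen_accuracy` is hypothetical.

```python
def screen_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) for a yes/no screening item."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Invented counts: the generic item flags most people who have the target
# condition, but also many people limited for other reasons.
sens, spec = screen_accuracy(true_pos=90, false_neg=10,
                             true_neg=600, false_pos=300)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these illustrative numbers the item catches most true cases but cannot say which condition is responsible, which is the sense of "low specificity" used above.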
Unidimensionality (point #11) • IRT goal of unidimensionality is hard to apply in many areas of health measurement. Some topics are hierarchical; symptoms of depression (e.g.) are not, so in IRT analyses, depression or anxiety scales often do not meet unidimensionality criterion • Unidimensionality is chiefly important for clinical interpretation & maybe evaluation; not the issue here. Surveys focus on how bad it is, not what it is • If instrument will be scored as an index, the issue of unidimensionality becomes irrelevant as all the items are combined and it’s impossible to visualize the person’s disability anyway • There is an inherent tension between using generic, screening-type items (e.g., IADLs) and unidimensionality • Many functions involve more than one body system (e.g., recognizing a face across street), so are not unidimensional
The Time Frame Debate
• WP 1 says “present”; WP 3 much broader (& varied)
• If sample is large, could use “yesterday” to get prevalence, but will not tell incidence, or duration of condition
• Duration requires additional questions, as does change
• Width of time window not very important: average is just calculated over a shorter or longer time
• Suggest one week (to capture week-ends, etc) or else “yesterday” (as today is incomplete)
[Diagram: sampling window, with cases labelled A, B and C. Problem! Change only captured if additional questions asked, so can’t distinguish A from B]
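The point that a window average hides change can be sketched numerically. The daily scores below are hypothetical, and the labels A, B and C are illustrative only; they are not taken from the original diagram.

```python
from statistics import mean

# Hypothetical daily limitation scores (0 = none, 10 = severe) across a
# 7-day sampling window.
trajectories = {
    "A": [2, 3, 4, 5, 6, 7, 8],  # getting worse
    "B": [8, 7, 6, 5, 4, 3, 2],  # getting better
    "C": [5, 5, 5, 5, 5, 5, 5],  # stable
}

# What a single "on average over the past week" item would record.
week_averages = {name: mean(days) for name, days in trajectories.items()}

# What an additional change question would have to capture:
# last day minus first day of the window.
changes = {name: days[-1] - days[0] for name, days in trajectories.items()}

for name in trajectories:
    print(name, "average:", week_averages[name], "change:", changes[name])
```

All three hypothetical respondents report the same weekly average, so only the extra change item separates them.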
Time Window & Response Shift
• (Point #13) Larger time windows, and phrasing in terms of “usual” can face issue of response shift (recalibration of person’s view of what is “normal”)
• “Usual” phrasing seems most problematic: may miss chronic disabilities (cf. criticism of GHQ); cannot record incidence, maybe not even prevalence
[Diagram: Response shift. Curves: actual trajectory; perception of “usual” function. Note: typical delay varies according to a range of factors]
Continuous States vs. Episodic Events • Mobility limitations often endure. By contrast, pain, anxiety or marital disputes are commonly episodic • Averaging over broad time-window can be an issue for the episodic events (point #15), because • Averaging episodes raises issue of frequency vs. intensity of events (see next slide) • In general, time & averaging is less of an issue for capacity than for performance, because capacity is enduring, performance may fluctuate • However, the notion of capacity is hard to apply to pain, anxiety and depression (in which wording a question in capacity terms tends to approximate performance)
Combining Severity & Frequency (e.g., anxiety questions: point 76; pain, point 97)
• Risk of trying to do too much. The problem of summarizing frequency & severity grows with increasing length of retrospection. If “yesterday” is used, you need only ask about severity
• The term “level” (“How would you describe your level of anxiety?”) is unclear: presumably some combination of severity & frequency of episodes, but how does respondent combine these?
• Options. PhD level: “We want you to judge the overall amount of pain, considering both intensity and frequency, you have experienced …” Simpler: “How bad was your pain?” Mild, moderate, severe…
[Diagram: contrasting patterns (“versus”) plotted against time]
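The ambiguity of an overall "level" can be illustrated with two invented respondents. This is a sketch only: the daily scores, the 0-10 scale and the `summarize` helper are assumptions for illustration, not part of any of the working papers.

```python
# Hypothetical daily pain intensity over one week (0 = pain-free day).
rare_severe = [0, 0, 8, 0, 0, 0, 6]
frequent_mild = [2, 2, 2, 2, 2, 2, 2]


def summarize(days):
    """Split a week of daily scores into frequency, intensity and overall mean."""
    episodes = [score for score in days if score > 0]
    frequency = len(episodes)  # number of days with any pain
    mean_intensity = sum(episodes) / frequency if episodes else 0.0
    overall = sum(days) / len(days)  # one possible meaning of "level"
    return {"frequency": frequency,
            "mean_intensity": mean_intensity,
            "overall": overall}


print(summarize(rare_severe))
print(summarize(frequent_mild))
```

Both respondents end up with the same overall weekly mean although one has two severe days and the other seven mild ones, so a single "level" answer cannot tell them apart. Asking only about "yesterday" sidesteps the problem because only severity remains to be reported.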
Response options: Frequency vs. Difficulty (point # 30) • For chronic conditions, evidently intensity responses are more appropriate • For fluctuating conditions (insomnia, depression), frequency seems most appropriate • If brief recall periods, use intensity responses • For longer-term recall, use frequency • Also, need to decide on relative vs. absolute responses. E.g., “do you have difficulty keeping up with people your own age?” • Likewise, do we specify “level ground” for walking, or “where you live.” The first is close to disability and may not be relevant to them, the second (handicap) will be relevant but may make direct comparisons difficult
Discuss Structure of Overall Instrument • Can it be made dynamic? Item banking; tailored responses; computer administration or using skip patterns. Some examples: • Cella: http://outcomes.cancer.gov/conference/irt/cella_et_al.pdf • www.amIhealthy.com • Ware JE et al. Item banking and the improvement of health status measures. Quality of Life Newsletter 2004; Fall (Special Issue):2-5. • Bjorner JB et al. Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales. Qual Life Res 2003; 12:981-1002
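One way a dynamic, item-banked instrument can choose its next question is to pick the unasked item that is most informative at the respondent's current estimated level. The sketch below assumes a two-parameter logistic IRT model, which the slides do not specify; the item names and parameter values are invented, and `next_item` is a hypothetical helper, not the method of any of the cited systems.

```python
import math


def prob_endorse(theta, a, b):
    """2PL probability that a person at level theta endorses the item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical bank: item name -> (discrimination a, difficulty b).
item_bank = {
    "walk_100m": (1.5, -1.0),
    "climb_stairs": (1.2, 0.0),
    "run_1km": (1.8, 1.5),
}


def next_item(theta, bank, asked):
    """Return the not-yet-asked item with maximum information at theta."""
    candidates = {name: params for name, params in bank.items()
                  if name not in asked}
    return max(candidates,
               key=lambda name: item_information(theta, *candidates[name]))


print(next_item(0.0, item_bank, asked=set()))
print(next_item(-1.0, item_bank, asked={"walk_100m"}))
```

In a full adaptive administration the estimate of theta would be updated after every answer; that step is omitted here to keep the sketch short. Skip patterns in a paper questionnaire approximate the same idea with fixed branching.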
Reference for upper level of function • Best possible function • Compared to your potential • Compared to average person of your age • Without difficulty • To adjust for age or not?
Prosthetics, Analgesics, etc. (points 20-25) Rocks & hard places… • Without aids approximates impairment; with aids = disability • But this distinction is hard to make in ICF: ‘activity’ and ‘participation’ both sound like performance rather than capacity • Not quite clear why eye glasses are singled out for inclusion, while walking sticks apparently are not • Asking an amputee about mobility without his prosthesis seems artificial (point #21) • Likewise, if they are taking effective analgesics, it’s hard for them to report pain without (points #24 & 25) • If purpose is to indicate health states in this nation, suggest the approach of “using any aids you normally use.” • Suggest not relying on use of analgesics as way to indicate severity (point #22), because availability will vary greatly
Visual Analogue Scales • In clinical settings, VAS, NRS pain ratings intercorrelate highly. Verbal scales correlate with both, but less closely • VAS is visual, so implies use of paper & pencil • If used in telephone format, VAS reduces to a NRS, so why not just use NRS? • Less educated and older patients appear to find NRS easier than VAS, so these have been endorsed for use in cancer trials (Moinpour et al., J Natl Cancer Inst 1989; 81:485-495) • The FLIC began with VAS, but changed to 6-pt NRS • However, the VAS can be very responsive (e.g., Hagen et al, J Rheumatol 1999; 26:1474-1480). But do we need responsiveness? • Many alternative formats, including graphic rating scale (Dalton et al, Cancer Nurs 1998; 21:46-49) or box scale (Jensen et al, Clin J Pain 1998; 14:343-349). See also Cella & Perry, Psychol Rep 1986; 59:827-833, and Scott & Huskisson, Pain 1976; 2:175-184.
Anxiety & Depression • Trying to discriminate between these may focus attention on the trees rather than the forest • Unitary theory sees A & D as expressions of the same pathology; the opposing perspective sees them as fundamentally different, while the compromise is to view them as having common roots but different expressions (Brown et al, J Abnorm Psychol 1998; 107:179-192). • Anxiety suggests arousal and an attempt to cope with a situation; depression suggests lack of arousal and withdrawal: the NE and SE quadrants of the diagram (next slide) • An anxious person might say “That terrible event is not my fault but it may happen again, and I may not be able to cope with it but I’ve got to be ready to try.” A depressed person might say “That terrible event may happen again and I won’t be able to cope with it, and it’s probably my fault anyway so there’s really nothing I can do.” (Barlow DH. The nature of anxiety: anxiety, depression, and emotional disorders. In: Rapee RM, Barlow DH, eds. Chronic anxiety: generalized anxiety disorder and mixed anxiety-depression. New York: Guilford, 1991: 1-28)
A circumplex model of affect
[Diagram: circular arrangement of eight affect poles, listed here as opposing pairs]
• High positive affect (active, elated, excited) vs. Low positive affect (sluggish, dull, drowsy)
• High negative affect (distressed, fearful, hostile) vs. Low negative affect (relaxed, calm, placid)
• Pleasantness (content, happy, satisfied) vs. Unpleasantness (sad, lonely, withdrawn)
• Strong engagement (aroused, astonished, concerned) vs. Disengagement (inactive, still, quiet)
Overlaid labels: Anxiety; Depression
Emotions & Affect: scattered thoughts • How to fit affect within capacity / performance distinction? Many anxiety questions use either state or performance wordings (“How severe was your anxiety?” or “Did anxiety limit your daily activities?”) • Why try to distinguish anxiety and depression? • Not completely clear why we need both positive and negative affect (point #68): if time frame correctly chosen, they should not be orthogonal • A phrase such as “upset or distressed” may capture general affect quite well • Stress may also be pertinent: cf. DASS of Lovibond (Manual for the Depression Anxiety Stress Scales. Sydney: Psychology Foundation, 1995)