513-611A STUDY DESIGN AND ANALYSIS I Sept 29, 2003 Development and Validation of a health status measure Susan Stock, MD MSc FRCPC Institut national de santé publique du Québec Direction de santé publique de Montreal-Centre Dept of Epidemiology, Biostatistics and Occupational Health, McGill
Plan • Types of Health Status Measures • Steps in the development of a health status measure • Steps in the development of the Neck and Upper Limb Index • Steps in the validation of a health status measure • Steps in the validation of the Neck and Upper Limb Index Susan Stock : Developing & Validating Health Status Measures
Health status measure • A health outcome questionnaire that quantifies symptoms, function, feelings and/or behaviour directly from the respondent to measure overall health status (generic instrument) or disorder-specific health status • Vary in scope • Activities of daily living ("ADL", e.g. self care, mobility) • Functional status: measure capacity or performance of physical functioning, e.g. household tasks, work, recreational activities • Health-related "quality of life" instruments: measure not only physical functioning but also psychological, social and role functioning
Health status measures • allow patient/subject to identify impact of a disorder or health problem on his/her life across many dimensions based on his/her experience rather than the interpretation of a health care professional • Useful in a wide range of studies and clinical contexts: • In studies of aetiology, prevalence and prognostic factors they can be incorporated into case definitions that distinguish according to severity • In intervention studies and health services research they can be used as the primary outcome to demonstrate change over time in health status
Development of Health Status Measures: references • Streiner DL, Norman GR. Health measurement scales. A practical guide to their development and use. Second edition. New York: Oxford University Press, 1995: 28-53 • Guyatt GH, Bombardier C, Tugwell PX. Measuring disease-specific quality of life in clinical trials. CMAJ 1986; 134: 889-895 • Guyatt GH, Jaeschke R, Feeny DH, Patrick DL. Measurement in clinical trials: Choosing the right approach. • Juniper EF, Guyatt GH, Jaeschke R. How to develop and validate a new health-related quality of life instrument. • In Spilker B (ed), Quality of Life and Pharmacoeconomics in Clinical Trials, Second edition. Lippincott-Raven Publishers, Philadelphia, 1996
Neck and Upper Limb Index (NULI) Health-related quality of life instrument: • specific to neck and upper extremity musculoskeletal disorders • capable of measuring changes within subjects over time in intervention studies • capable of distinguishing between subjects (i.e., assess severity) in prognostic, prevalence or etiologic studies • applicable to both French and English speaking populations in Canada, and • practical and easy to use in clinical settings
Neck and Upper Limb Index (NULI) • In order to develop an instrument that was equally appropriate to the two major cultural and linguistic groups in Canada • Conducted two separate studies with similar protocols for item reduction and selection and subsequent validation • one in an Ontario English-speaking population • the other in a Quebec French-speaking population
Steps in development of a health status measure • Search for appropriate existing measure! If none available: • Identify domains of interest • Generate potential items • Refine items and pre-test • Choose appropriate response scale(s) for the items • Carry out item reduction and item selection strategies
Steps in development of the NULI • Identification of domains of interest • Generation of potential items • Item refinement and pre-testing • English item reduction and selection study • French translation of potential items • French item reduction and selection study • Comparison of English and French results • Selection of 20 items appropriate for both populations • Reliability and validity testing of the final 20-item instrument in both English and French populations
Domain • a dimension of life potentially affected by the disorder or health problem in question • e.g. self care, household responsibilities, work, social life, sexual life, mood, self esteem, transportation, recreation, sleep, financial impact of disorder, iatrogenic effect of evaluation or treatment
Identifying domains & generating items • Strategies for identifying the most appropriate domains of interest and for generating potential items are aimed at optimizing content validity • the extent to which the measurement incorporates all the relevant content or domains of the phenomenon under study
NULI: Identifying domains & generating items • review of relevant literature (rheumatology, rehabilitation, orthopaedics, back pain) • review of existing health status instruments identified by bibliographic search and contact with experts • clinical experience of investigators • survey of 30 clinicians • interviews with 33 worker-patients who presented with neck and/or upper limb disorders in five clinical occupational health settings
Evaluating content validity of existing instruments • Identify relevant domains for the concept of interest and evaluate whether instruments measure these domains adequately • Identify number or proportion of items in each instrument that are not relevant to the concept you wish to measure • Ref: Stock SR, Cole DC, Tugwell P. Review of applicability of existing functional status measures to the study of workers with musculoskeletal disorders of the neck and upper limb. Am J Ind Med 1996; 29: 679-688
An example of evaluation of content validity • Table: Distribution of items among domains for selected musculoskeletal functional status instruments
NULI: Identifying domains & generating items • 80 questions in 8 domains identified through investigator clinical experience, existing instruments and literature • 52 additional items and 2 domains generated by clinician survey • 48 additional items and 2 more domains identified by patient interviews • Total of 12 domains identified
Item refinement • Redundant items eliminated • Pool of approximately 150 items with 7-30 items per domain • Wording of items • Literacy editor to ensure Grade 6 language • “Applicability”: Screening question developed for activity-related items to evaluate whether the item was applicable or relevant to the subject (work, household and family responsibilities, transportation/driving, recreation, and social activities; sexual life) • Vacuuming, shovelling snow • Sports activities
Item refinement: Choice of response scale • Response scale: 7-point numbered scale with verbal anchors • Maximize reliability: reliability of a scale rises rapidly as the number of divisions increases to seven and then rises more slowly until there are 11 points (Streiner and Norman 1995, Nunnally and Wilson 1975, Nishisato and Torii 1985)
Response scale: number of points on a scale • Loss of test re-test reliability: • 7-10 categories: little reduction of reliability • 5 categories reduces reliability by 12% • 2 categories reduces reliability by 35% • Optimum number of points recommended: (5 to) 7 categories (Reference: Streiner and Norman 1995, Chap 4) • Treating rating scales as interval data statistically will result in less measurement error when there are more items
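A rough simulation can illustrate why fewer response categories lower test re-test reliability. The sketch below is not from the slides: it assumes normally distributed true scores, independent measurement noise on each administration, and equal-width categories over a fixed range. The function names and parameter values are hypothetical, and the simulation is not meant to reproduce the percentages quoted above.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def categorize(value, k, lo=-3.0, hi=3.0):
    """Map a continuous score to one of k equal-width categories (1..k)."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * k)
    return min(idx, k - 1) + 1


def simulated_retest_reliability(k, n=5000, noise_sd=0.5, seed=0):
    """Correlation between two noisy administrations scored on a k-point scale."""
    rng = random.Random(seed)  # fixed seed: same underlying scores for every k
    true_scores = [rng.gauss(0, 1) for _ in range(n)]
    first = [categorize(t + rng.gauss(0, noise_sd), k) for t in true_scores]
    second = [categorize(t + rng.gauss(0, noise_sd), k) for t in true_scores]
    return pearson(first, second)


if __name__ == "__main__":
    for points in (2, 5, 7, 11):
        print(points, round(simulated_retest_reliability(points), 3))
```

Under these assumptions the 2-point scale should show a clearly lower correlation than the 7-point scale, while the gain from 7 to 11 points should be small.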
Scaling: number of points on a scale • Potential sources of error when there are few points on a scale: • Uncertainty, confusion of respondents • Reduction in reliability • Loss of efficiency of the instrument • More subjects needed to show an effect (S Suissa J Clin Epidemiol 1991, 44: 241-8) • Lower correlation with other measures (Hunter & Schmidt 1990, J Applied Psychol 75:334-49)
Pre-test • Pre-test in 10 clients with musculoskeletal disorders of neck or upper extremity in a vocational rehabilitation clinic • To identify questions that are unclear, ambiguous, difficult to understand or inappropriate • Revise items following pre-test
Inter-rater reliability testing • inter-rater reliability study of revised potential items • English study conducted on 38 worker-patients with neck and upper limb disorders in four clinical settings prior to the item selection study; French inter-rater reliability study was conducted with 16 worker-patients • 2 raters interviewed each patient on the same day, at 2-4 hour intervals • Following the second interview, feedback was sought from respondents to further identify any ambiguous items or those difficult to understand • ICC (intraclass correlations) calculated for the mean of items in each domain and for each individual item. • Items with low inter-rater reliability (ICC<0.7) identified and source of difficulty reviewed with the interviewers. • Items were reformulated where indicated.
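The slides do not say which form of the ICC was used. As a minimal sketch, assuming a one-way random-effects ICC for single ratings and a hypothetical data layout (one pair of ratings per patient for each item), the item-level screening at ICC<0.7 could look like this. The item names and ratings are invented for illustration.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    ratings: list with one entry per subject, each a list of k ratings
    (here k = 2, one rating per interviewer).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    subject_means = [sum(r) / k for r in ratings]
    # Between-subject and within-subject mean squares from a one-way ANOVA
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, subject_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def flag_low_reliability(item_ratings, threshold=0.7):
    """Return the names of items whose ICC falls below the threshold."""
    return [name for name, r in item_ratings.items() if icc_oneway(r) < threshold]


# Hypothetical 7-point ratings from two raters for five patients
example = {
    "clear_item": [[5, 5], [2, 3], [7, 6], [4, 4], [1, 1]],
    "ambiguous_item": [[5, 2], [2, 6], [7, 3], [4, 4], [1, 5]],
}

if __name__ == "__main__":
    print(flag_low_reliability(example))
```

Flagged items would then be reviewed with the interviewers, as described above, rather than dropped automatically.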
Interviewer training • 3-5-day training sessions for interviewers • to be familiar with content of questions, use of scales • to teach appropriate standardised technique • interviewers trained to probe in a non-directive, non-biasing fashion, and be interpersonally neutral • feedback on tape-recorded interviews • role-playing of interviews with potentially difficult subjects
Interviewer training • To reduce bias and random error and ensure strict adherence to research protocol • Inform re purpose of study, type of data to be gathered, how results will be used • Familiarize with questionnaire, understand every item • How to handle first meeting with respondent, techniques for building rapport • How to answer questions commonly asked by respondents • Confidentiality procedures • When and how to probe • How to ask questions • How to record responses • Checking the questionnaire • How to end interviews • How to deal with special situations (angry, tearful, or verbose respondents)
Item reduction studies Study procedure: • Pre and post-treatment administration of 170 potential items and validating measures to 119 English-speaking Ontario workers and to 93 French-speaking Quebec workers with neck or upper limb disorders recruited from occupational and physiotherapy clinics • 7-30 specific items in each of the 12 domains including a global question about the overall impact of the disorder on that domain • An additional administration 3-7 days after the initial administration for test re-test reliability • Subjects rank ordered the 12 domains according to the relative importance of the impact of their musculoskeletal disorder on these dimensions of their lives
NULI Item reduction Objective of item reduction: • To identify and omit items that were irrelevant, had poor test re-test reliability, discriminated poorly or were unresponsive to change
Criteria for item reduction • Applicability of activity-related items • Eliminate items applicable to fewer than 80% of the study population • Eliminate items applicable to fewer than 70% of men or fewer than 70% of women • e.g. vacuuming was applicable to 49% of men and 83% of women • Shovelling snow was not applicable to 82% of women • Reproducibility • Eliminate items with Pearson correlation coefficient ≤ 0.5
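The applicability rule can be expressed as a small filter. This is a sketch under one reading of the criterion (an item is kept only if it is applicable to at least 80% of the whole study population and to at least 70% of each sex); the function name is hypothetical, and the overall proportion used for vacuuming is an invented value because the slides report only the figures by sex.

```python
def passes_applicability(overall, men, women,
                         overall_min=0.80, by_sex_min=0.70):
    """Return True if an activity-related item meets the applicability criteria.

    overall, men, women: proportions of subjects (0-1) who said the
    activity was applicable to them.
    """
    return overall >= overall_min and men >= by_sex_min and women >= by_sex_min


# Vacuuming: 49% of men and 83% of women (from the slide);
# the overall proportion of 0.66 is a hypothetical placeholder.
vacuuming_kept = passes_applicability(overall=0.66, men=0.49, women=0.83)

if __name__ == "__main__":
    print("keep vacuuming item:", vacuuming_kept)
```

With the proportions reported for men alone, the vacuuming item fails the by-sex threshold whatever its overall applicability.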
Criteria for item reduction • Internal consistency • Eliminate items with correlation ≤ 0.3 between item score and: (1) mean of all items in the domain without that item; (2) the global question score for the domain • Responsiveness to change • Eliminate items with correlation ≤ 0.3 between the item's residual change score (pre-treatment to post-treatment) and the residual change score of the domain global question • Discriminative ability • Eliminate items with a skewness statistic greater than 2 times its standard error
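The skewness criterion for discriminative ability can be sketched as follows. The slides do not give the formulas, so this assumes the usual adjusted sample skewness and its standard error as reported by common statistics packages, and it tests the absolute value so that both floor and ceiling effects are caught. The function names and example scores are hypothetical.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness (the G1 statistic)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def skewness_standard_error(n):
    """Standard error of the adjusted sample skewness for sample size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def discriminates_poorly(values):
    """Flag an item whose skewness exceeds twice its standard error."""
    return abs(sample_skewness(values)) > 2 * skewness_standard_error(len(values))


# Hypothetical 7-point item scores
evenly_spread = [1, 2, 3, 4, 5, 6, 7] * 7   # responses use the whole scale
floor_effect = [1] * 45 + [7] * 5           # almost everyone answers 1

if __name__ == "__main__":
    print(discriminates_poorly(evenly_spread), discriminates_poorly(floor_effect))
```

An item on which nearly all respondents give the same extreme answer cannot distinguish between subjects, which is what the skewness check is meant to detect.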
Measuring change • Problem with change scores: regression to the mean (tendency of outlying scores to return to the mean) • By chance, low pre-test scores will tend to be higher on post-test and high pre-test scores will tend to be lower on post-test • Possible solution: residual change scores
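A residual change score is commonly computed by regressing the post-test score on the pre-test score and taking each subject's residual, so that the part of the change predictable from the baseline is removed. The sketch below assumes a simple least-squares line; the function name and the example scores are hypothetical.

```python
def residual_change_scores(pre, post):
    """Residuals from the least-squares regression of post-test on pre-test."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = mean_post - slope * mean_pre
    # Observed post-test score minus the score predicted from the pre-test
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


# Hypothetical pre- and post-treatment scores for six subjects
pre_scores = [2, 3, 4, 5, 6, 7]
post_scores = [3, 3, 5, 4, 6, 5]

if __name__ == "__main__":
    print([round(r, 2) for r in residual_change_scores(pre_scores, post_scores)])
```

By construction the residuals are uncorrelated with the pre-test scores, which is why they are less affected by regression to the mean than raw change scores.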
Selection of final domains • Selection of domains: relative impact and importance study subjects attributed to each domain • mean score of the global question for each domain and • domain rankings • calculated for each study population as well as by gender • committee of co-investigators reviewed these data and, through consensus discussion, arrived at a choice of priority domains and the number of items of each domain the final instrument should include • Selection among remaining items
Comparison of Global Question Mean Scores for Each Domain between Quebec and Ontario Study Populations (bar chart; y-axis: mean global question score, 1 to 7)
Domain: Ontario / Québec
Work: 5.4 / 5.3
Sleep: 4.9 / 5.2
Recreation: 4.0 / 4.3
Mood: 3.9 / 3.8
Housework: 3.7 / 4.0
Self esteem: 3.5 / 4.0
Self care: 3.3 / 3.4
Financial: 3.0 / 2.0
Driving: 2.9 / 3.0
Sex life: 2.9 / 4.2
Iatrogenic: 2.5 / 3.2
Social: 2.1 / 2.6
Comparison of Mean Ranking for Each Domain between Quebec and Ontario Study Populations (bar chart; y-axis: 13 minus mean rank, 1 to 12)
Domain: Ontario / Quebec
Work: 10.8 / 10.3
Household/family: 8.4 / 8.1
Sleep: 8.1 / 8.6
Mood: 6.9 / 6.3
Recreation: 6.5 / 7.2
Financial: 6.5 / 4.2
Self care: 6.5 / 6.7
Driving: 5.9 / 6.3
Self esteem: 5.7 / 5.1
Iatrogenic: 4.8 / 5.9
Social: 4.7 / 4.8
Sex life: 3.7 / 4.5
Selection of remaining items • Selection of the most responsive and most discriminating items that covered the priority domains • Number of items that would result in an instrument that takes no more than 5-10 minutes to complete (version 1 = 35 items; version 2 = 20 items) • Choices among items with similar responsiveness and discriminative ability were based on the clinical judgement of the co-investigator research committee
Translation into French • double reverse parallel translation method (Vallerand 1989) • translation into French of the English questionnaire by two independent translators (versions A and B) • the two French versions (versions A and B) translated into English by two different translators (versions C and D) • versions C and D compared to the original English version by a committee composed of three bilingual study researchers (two francophones, one anglophone) and discrepancies resolved through consensus to arrive at a revised French translation, version E • version E pre-tested on 16 francophone workers with neck or upper extremity disorders to identify items that were ambiguous or difficult to understand • results of the pre-test reviewed by the research translation committee and a final French version of the questionnaire agreed upon (version F)
Criteria for acceptance of a French formulation • meaning of the French version was as close as possible to the English one • the simplest term would be selected (in order to be understandable at a Grade 6 or lower reading level) • French syntax would be respected • the terms most commonly used in current Quebec French would be selected
Comparison of English and French item reduction results • Compare demographic profile of the 2 populations • compare English and French subjects’ mean responses for the global question of each domain by t-test for univariate analyses and multiple regression analyses controlling for sex, age, income and duration of symptoms • compare English and French subjects’ mean ranking scores for each domain by Wilcoxon rank-sum test for univariate analyses and by partial Spearman correlations between the mean ranking score of each domain and the study group status (i.e., English or French study group) controlling for sex, age, income and duration of symptoms
Comparison of Ontario and Quebec study populations The Quebec study population was more likely to be female (p=.02), have had symptoms > 6 months (p=.001), still be at work (p=.02) and less likely to be on WCB benefits (p=0.0001)
Comparison of mean rank of each domain between English and French study subjects: univariate analyses
Correlation of study status (English or French) to mean domain ranking controlling for age, gender, income and duration of symptoms
Multiple regression for each domain to assess whether study status (English or French) was a predictor of the mean score of the domain global question when controlling for age, gender, income and duration of symptoms • Note 1: A positive coefficient indicates that French study subjects had significantly higher mean global scores than English subjects for that domain • Note 2: A negative coefficient indicates that English study subjects had significantly higher mean global scores than French subjects for that domain
Synthesis of English-French comparisons • Sexual life: • Statistically significant differences in mean ranking and mean domain global score but clinically insignificant difference in ranking • Domain did not meet applicability criteria • Financial impact/iatrogenic effects: • Statistically significant differences in mean ranking and mean domain global score probably reflecting differences in proportion of subjects off work and differences in clinical treatment program • Overall no major differences in mean domain rankings or mean domain scores or in results of individual item reduction • A single instrument could be developed for both populations
Final instrument • 20 items: • 4 work • 7 physical activities (self care, domestic responsibilities, leisure) • 6 psychosocial (mood, self esteem, social role function) • 2 sleep • 1 iatrogenic
Validation of a health status measure • Internal consistency • Reproducibility (test re-test reliability) • Validity • Content • Criterion or convergent • Construct • Predictive • Responsiveness to change
Measures of internal consistency • Cronbach alpha (0.0-1.0) • An estimate of the correlation between the total score across a series of items from a rating scale and the total score that would have been obtained had a comparable series of items been employed • Inter-item correlations • Item-total correlations (total ± item) • Correlation of item to mean of items (mean ± item) • Split half reliability (items randomly divided and 2 sub-scales correlated)
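Cronbach's alpha is usually computed from the item variances and the variance of the total score, as k/(k-1) × (1 − sum of item variances / variance of totals), where k is the number of items. The sketch below uses that standard formula with sample variances; the function name, data layout and example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, each holding one item's scores for the same
    respondents in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores on three items for five respondents
example_items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

if __name__ == "__main__":
    print(round(cronbach_alpha(example_items), 3))
```

Items that rise and fall together across respondents make the total-score variance large relative to the item variances, which pushes alpha towards 1.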
Reliability • Test re-test reliability: the stability exhibited when a measurement is repeated under identical conditions • calculation of the intra-class correlation (ICC) for two administrations of the index, 3-7 days apart in 99 Ontario subjects and 33 Quebec subjects • Internal consistency: intercorrelation between items of a scale meant to measure the same concept • Cronbach’s alpha calculated for 119 Ontario subjects and 93 Quebec subjects present at the initial pre-treatment administration of the questionnaires
Ways of improving reproducibility • Increase the number of items in a test or measurement scale • Increase the number of response choices for each item • Reduce inter-observer variation (training of interviewers, standardised protocol) • Reduce ambiguity in questions
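The effect of adding items is often estimated with the Spearman-Brown prophecy formula, which is not named in the slides but is the standard way to predict reliability when a scale is lengthened with comparable items. A minimal sketch, with a hypothetical function name:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a scale is lengthened by length_factor.

    reliability: reliability of the current scale (0-1).
    length_factor: new number of items divided by current number of items.
    Assumes the added items are comparable to the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Doubling a scale whose reliability is 0.5
    print(round(spearman_brown(0.5, 2), 3))
```

The formula also shows diminishing returns: each further doubling adds less reliability, which matters when the instrument must stay short enough to complete in a few minutes.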
Validity • An expression of the degree to which a measurement measures what it purports to measure (Last) • Is the scale measuring what it was intended to measure?
How do we demonstrate validity? • Subjective judgement by “experts”: • Face validity: the extent to which, on the face of it, the measurement appears to be assessing the desired qualities • Content validity: the extent to which the measurement incorporates all the relevant content or domains of the phenomenon under study
How do we demonstrate validity? • Criterion validity: extent to which the measurement correlates with an external criterion (preferably a "gold standard")