
Understanding the Validity and Variability of Back Pain Assessments

Explore the importance of observer variability in back pain assessments, discussing reliability, validity, and measurement methods. Learn how to reduce discrepancies for more accurate results in clinical practice and research.



Presentation Transcript


  1. THE FOLLOWING LECTURE HAS BEEN APPROVED FOR ALL STUDENTS BY BIRMINGHAM CITY UNIVERSITY health.bcu.ac.uk/craigjackson This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought-provoking and challenging. Any issues raised in the lecture may require the viewer to engage in further thought, insight, reflection or critical evaluation.

  2. Validity & Variability of Back Pain Assessments Dr. Craig Jackson, Senior Lecturer in Psychology, School of Psychology, Faculty of Education, Law & Social Science, BCU

  3. Who is Observing What? The validity of any observation depends upon who is observing whom (cf. Heisenberg’s uncertainty principle, 1927)

  4. Content • Assessment Criteria • Validity • Reliability • Low Back Pain Assessments • Appropriateness & Feasibility • Between-Observer Variability and Consistency • The Future: Mathematical Models? Validity without Psychology?

  5. Variability Specificity of Defined Field + Repeatable Measurement = Valid Measures (S + R = V). The problem of between-observer variation remains, e.g. a GP eliciting signs in respiratory disease, a neurologist evaluating a diagnosis of multiple sclerosis, a geriatrician assessing stroke rehab., an anaesthetist determining fitness for operation. 1. Judgements might be made differently by other observers 2. Judgements might be made differently by the same observer on repeated occasions

  6. Between-Observer Variability Variation between observers can seriously compromise research / clinical findings. Worst example: patients with condition A all examined by Dr X, patients with condition B all examined by Dr Y. Could one observer examine all patients? Often not possible / practical.

  7. Examples of Between-Observer Variability Diagnostic classification for multiple sclerosis for 149 patients by two clinicians (observers):

                          Neurologist B
     Neurologist A    Certain  Probable  Possible  Doubtful   Total   Proportion
     Certain             38        5         0         1        44       0.30
     Probable            33       11         3         0        47       0.32
     Possible            10       14         5         6        35       0.23
     Doubtful             3        7         3        10        23       0.15
     Total               84       37        11        17       149
     Proportion         0.56     0.25      0.07      0.11
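
A common way to quantify the agreement shown in a cross-classification like this is Cohen's kappa (the statistic is not named on the slide; it is offered here purely as an illustration). A minimal sketch, using the counts from the table above:

```python
import numpy as np

# Cross-classification of the 149 MS patients by the two neurologists
# (rows = Neurologist A, columns = Neurologist B), categories in the order
# Certain, Probable, Possible, Doubtful, taken from the slide above.
table = np.array([
    [38,  5, 0,  1],
    [33, 11, 3,  0],
    [10, 14, 5,  6],
    [ 3,  7, 3, 10],
], dtype=float)

n = table.sum()
observed = np.trace(table) / n                                    # agreement on the diagonal
expected = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2   # agreement expected by chance
kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}, kappa = {kappa:.2f}")
```

On these counts the two neurologists agree on roughly 43% of patients, and kappa falls to about 0.21 once chance agreement is removed, which is the point the slide makes about between-observer variability.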

  8. Examples of Between-Observer Variability Circum-corneal hyperaemia (scored 0, 1, 2, 3, 4) in 9 patients by four ophthalmologists:

     Ophthalmologist    Patient:  1  2  3  4  5  6  7  8  9
     A                            3  2  2  2  2  2  1  2  2
     B                            3  3  2  2  3  2  2  2  2
     C                            4  4  3  3  4  4  2  3  3
     D                            3  3  1  2  2  2  1  3  3

     • Systematic error by observer C - consistently higher
     • Observer B sticks to mid-ranges
     • No patient on whom there is total agreement
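
One simple way to see the systematic error noted above is to compare each observer's mean score across the nine patients. A minimal sketch, assuming the scores in the table above:

```python
import numpy as np

# Circum-corneal hyperaemia scores (0-4) from the table above:
# rows = ophthalmologists A-D, columns = patients 1-9.
scores = np.array([
    [3, 2, 2, 2, 2, 2, 1, 2, 2],   # A
    [3, 3, 2, 2, 3, 2, 2, 2, 2],   # B
    [4, 4, 3, 3, 4, 4, 2, 3, 3],   # C
    [3, 3, 1, 2, 2, 2, 1, 3, 3],   # D
])

# A consistently higher mean (observer C) points to systematic rather than
# random disagreement; a narrow set of codes used (observer B) shows mid-range sticking.
for name, row in zip("ABCD", scores):
    print(f"Observer {name}: mean = {row.mean():.2f}, codes used = {sorted(set(row))}")

# Patients on whom all four observers give the same score (none in this data).
total_agreement = np.flatnonzero((scores == scores[0]).all(axis=0)) + 1
print("Patients with total agreement:", total_agreement)
```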

  9. Examples of Between-Observer Variability Iris hyperaemia (scored 0, 1, 2, 3, 4) in 9 patients by four ophthalmologists:

     Ophthalmologist    Patient:  1  2  3  4  5  6  7  8  9
     A                            1  0  0  0  0  3  1  2  0
     B                            1  0  1  4  1  1  1  1  4
     C                            4  0  0  0  0  4  0  4  9
     D                            3  3  1  1  2  2  1  2  2

     • Observer C uses only extremes of scale
     • Observer C introduces a spurious code (a 9, outside the 0-4 scale)
     • Observer D avoids extreme codes
     • Only 2 cases with a difference of 1 between highest and lowest scores

  10. Reducing Between-Observer Variability • Use an expert panel / reference library to evaluate all procedures • Compare rival observation methods in small pilot studies • Suspect the observer at all times - how may s/he be biased? • Train observers / assessors • Standardised techniques & judgement criteria • Consider the severity of disagreements • Randomise patients out to multiple observers / multiple observations • Appoint an external assessor

  11. Assessment Criteria With any assessment – observation, questionnaire or equipment – we ask: • Utility: is it useful? • Reliability: is it dependable? • Validity: does it do what it is supposed to? • Sensitivity: can it identify patients with a condition? • Specificity: can it identify those that do not have the condition? • Responsiveness: can it measure differences over time?

  12. Purpose of Assessment Instruments Is the purpose of the instrument clearly stated? Is it discriminative? Is it evaluative? Is it prognostic? Which population is it appropriate for? Clinical Working Research / Epidemiological

  13. Reliability & Validity Validity: the degree to which an instrument measures what it is intended to measure; the approximate truth of inferences regarding causal relationships. Reliability: the degree of consistency of a measure; the degree to which a test is free of random error. A measure that produces consistent results is said to have high reliability. Reliability is a necessary but insufficient condition for validity.

  14. Validity in Research

  15. Validity Poor repeatability of examination implies poor validity. How repeatable are results: on two (or more) occasions by the same observer (temporal stability)? On repeated occasions by different observers? Applies equally to clinical practice and research. A clinical sign carries no information if it is assessed differently when re-examined.

  16. Measures for Clinical Use • Questionnaires • General health status • Pain • Functional status • Patient satisfaction • Physiological outcomes • Utilization measures • Cost measures • Mathematical Modelling

  17. Face Validity Are items measured in a sensible way? How specific are the questions? Do questions have a specific time frame / frame of reference? Are questions performance related? (do you do it?) Are questions capacity related? (can you do it?) How is the index scored? Weighting of items?

  18. Content Validity Content validity is concerned with “representativeness”. Are all relevant dimensions of functionality included? (Subjective.) Was the method for choosing items appropriate? Draw an inference from test scores to a larger domain of functionality, e.g. the abilities covered by the test items should be representative of the larger domain of abilities and function.

  19. Construct Validity What is the bigger concept that the assessment is trying to measure? (The “theoretical construct”.) Does the assessment perform satisfactorily when compared with other measures? Is that concept a real one? e.g. does specific local pain prevent general functioning? Measured by the correlation between the intended independent variable (back health) and a proxy independent variable (specific test performance) that is actually used.

  20. Construct Validity For example: a company physician wants to study the relationship between general back health and job performance. However, the physician may not be able to administer a comprehensive back health test to every worker. In this case, s/he can use a proxy variable, such as “performance on a specific functional test”, as an indirect indicator of back health. Administer the proxy test AND the comprehensive back test to a portion of workers. If a strong correlation between general back health and the specific test is found, the proxy test can be used with the larger group because its construct validity is established.
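
A minimal sketch of the proxy-validation step described on the slide, assuming hypothetical scores for the subgroup of workers who took both tests; Pearson's r is used as the correlation measure, since the slide does not name one:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the subgroup of workers who took BOTH tests
# (variable names and values are illustrative, not from the lecture).
comprehensive_back_health = np.array([72, 65, 80, 55, 90, 60, 75, 68, 85, 58])
proxy_functional_test     = np.array([70, 60, 82, 50, 88, 63, 71, 66, 80, 55])

# Correlate the proxy test with the comprehensive measure; a strong,
# significant correlation supports using the proxy with the wider workforce.
r, p_value = stats.pearsonr(comprehensive_back_health, proxy_functional_test)
print(f"Pearson r = {r:.2f}, p = {p_value:.4f}")
```

How strong the correlation must be before the proxy is accepted remains a judgement call; the slide only requires that it be strong.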

  21. Criterion Validity Drawing an inference from specific test scores to general performance. Criterion validity is about prediction rather than explanation: prediction is concerned with non-causal dependence, whereas explanation pertains to causal or logical dependence. E.g. one can predict the weather based on the height of mercury inside a barometer, but one cannot use the behaviour of the mercury height to explain why the weather changes.

  22. Responsive to Change Is measure sensitive enough to detect clinically relevant change? Essential for evaluative measurements

  23. Examples: Pain Perception • Visual Analogue Scales: reliable and valid (Jensen & Karoly 1993) • Advantages over other pain assessment methods (Scott & Huskisson 1976, Price et al. 1994) • Quadruple Visual Analogue Scales – 4 specific factors (Von Korff et al. 1992): CURRENT pain level, AVERAGE or TYPICAL pain level, pain level at its BEST, pain level at its WORST • The 4 ratings are averaged and multiplied by 10 to give the TOTAL SCORE (range 0 – 100)
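
A minimal sketch of the Quadruple VAS scoring rule described above, assuming each of the four items is rated 0-10 (consistent with the stated 0-100 total); the function name and example ratings are illustrative:

```python
def quadruple_vas_score(current, average, best, worst):
    """Total score (0-100): mean of the four 0-10 VAS ratings, multiplied by 10."""
    ratings = [current, average, best, worst]
    if not all(0 <= r <= 10 for r in ratings):
        raise ValueError("Each VAS rating must lie between 0 and 10")
    return sum(ratings) / len(ratings) * 10

# Example: current 6, typical 5, best 3, worst 8  ->  total score 55.0
print(quadruple_vas_score(current=6, average=5, best=3, worst=8))
```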

  24. Condition-Specific Assessment – Low Back Pain • 40+ low back functional questionnaires exist • 5 identified as “gold standard” (Kopec & Esdaile, 1995) • 1. Sickness Impact Profile (Bergner et al. 1981) • 2. Roland-Morris Disability Questionnaire (Roland and Morris, 1983) • 3. Oswestry Low Back Pain Disability Questionnaire (Fairbank et al. 1980) • 4. Million Visual Analogue Scale (Million et al. 1982) • 5. Waddell Disability Index (Waddell, 1984)

  25. Condition-Specific Assessment – Low Back Pain • 2 of the “gold standards” (Kopec & Esdaile, 1995) • 1. Roland-Morris Disability Questionnaire (Roland and Morris, 1983) • 2. Oswestry Low Back Pain Disability Questionnaire (Fairbank et al. 1980) • + Quebec Back Pain Disability Scale (Kopec et al. 1995)

  26. Roland Morris Disability Questionnaire (RMQ) Purpose: acute and chronic populations of low back pain sufferers; an evaluative measure in clinical trials. Face Validity: + 24 Yes/No questions + Moderate specificity + “Today” is the frame of reference + Performance related + Double negatives + “Yes” responses are counted, giving a score out of 24. Content Validity: mobility, dressing / grooming, work, standing, sleeping, mood, recreation, appetite

  27. Roland Morris Disability Questionnaire (RMQ) • “The best single study of assessing short-term outcomes of primary care patients with low back pain” (Von Korff & Saunders, 1996) • Scores greater than 13 = significant disability associated with an unfavorable outcome (Von Korff & Saunders, 1996) • Any change of less than 4 points is both too small to matter and too small to be reliable (Stratford et al. 1996)
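
A minimal sketch of RMQ scoring and the two interpretation rules quoted above; the function names are illustrative, and the 24 item responses are assumed to arrive as booleans (True = "yes"):

```python
def rmq_score(responses):
    """Roland-Morris score: number of 'yes' answers across the 24 items (0-24)."""
    if len(responses) != 24:
        raise ValueError("The RMQ has 24 items")
    return sum(bool(r) for r in responses)

def interpret(baseline_score, follow_up_score):
    """Apply the two rules cited on the slide (Von Korff & Saunders 1996; Stratford et al. 1996)."""
    notes = []
    if baseline_score > 13:
        notes.append("significant disability, associated with an unfavorable outcome")
    if abs(baseline_score - follow_up_score) < 4:
        notes.append("change of fewer than 4 points: too small to matter or to be reliable")
    return notes

# Example: a patient scoring 16 at baseline and 14 at follow-up.
print(interpret(16, 14))
```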

  28. Oswestry Disability Questionnaire (revised) Purpose: acute and chronic populations of low back pain sufferers; discriminates between chronic and acute low back pain; an evaluative measure in clinical trials; used to predict different rates of improvement. Face Validity: + Items measured 0 – 5 by degree of difficulty + Very specific questions + No specific frame of reference + Capacity related questions + Score by summing all items, expressed as a percentage score. Content Validity: pain intensity, personal care, lifting, walking, sitting, standing, sleeping, sex / social life, travelling

  29. Oswestry Disability Questionnaire (revised) Content Validity: omits bending, kneeling, twisting, turning, emotional state and sudden movement. The “sex life” item reduced response rates (Hudson-Cook et al. 1989). Scoring issues: 11% is a cut-off score (Erhard et al. 1994). 0 – 20% Minimal Disability; 20 – 40% Moderate Disability; 40 – 60% Severe Disability; 60 – 80% Crippled; 80 – 100% Bed Bound or Exaggerating (Stratford et al. 1988)
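
A minimal sketch of Oswestry scoring as described on the two slides above: each section is scored 0-5, the sum is expressed as a percentage of the maximum possible, and the percentage is mapped to the bands listed. Skipping unanswered sections is a common convention for this questionnaire, assumed here rather than stated on the slide:

```python
def oswestry_percentage(section_scores):
    """Percentage score: sum of answered sections (0-5 each) over the maximum possible."""
    answered = [s for s in section_scores if s is not None]   # None = section left blank
    if not answered or any(not 0 <= s <= 5 for s in answered):
        raise ValueError("Section scores must be 0-5, or None if unanswered")
    return 100 * sum(answered) / (5 * len(answered))

def disability_band(pct):
    """Map a percentage score to the bands on the slide (Stratford et al. 1988)."""
    bands = [(20, "Minimal Disability"), (40, "Moderate Disability"),
             (60, "Severe Disability"), (80, "Crippled"),
             (100, "Bed Bound or Exaggerating")]
    for upper, label in bands:
        if pct <= upper:
            return label

# Example: ten sections, one left unanswered.
scores = [2, 3, 1, 4, 2, 3, None, 2, 1, 3]
pct = oswestry_percentage(scores)
print(f"{pct:.0f}% -> {disability_band(pct)}")
```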

  30. Quebec Back Pain Disability Scale Purpose: Acute and Chronic population of low back pain sufferers Assess level of functional disability Designed as discriminative, evaluative and predictive Face Validity: + Response on rating scale 0 - 5 + Very specific questions + “Today” as frame of reference + Performance related questions + Score by summing all items = percentage score Content Validity: Mobility Travelling Sleeping Sitting Standing Running Lifting Bending

  31. Quebec Back Pain Disability Scale Content Validity: Omits: twisting turning emotional state sudden movement sex life

  32. Reliability Has test-retest reliability been established? Is the measure reproducible on repeated use with a stable patient? Internal consistency: do items correlate with each other? Alpha (reliability score), lowest – highest: Roland-Morris Disability Questionnaire 0.89 – 0.93; Oswestry Disability Questionnaire 0.77 – 0.93; Waddell Disability Index 0.76; Quebec Back Pain Disability Scale 0.95
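
The alpha values quoted above are Cronbach's alpha, the standard internal-consistency statistic for multi-item questionnaires. A minimal sketch of how it is computed, on a hypothetical respondents-by-items score matrix rather than the lecture's own data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                               # number of items
    item_variances = items.var(axis=0, ddof=1).sum() # sum of individual item variances
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical responses: 6 patients answering 4 items scored 0-5.
responses = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [0, 1, 0, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```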

  33. Back Performance Scale (BPS) 5 tests of sagittal-plane mobility: A) Sock test B) Pick-up test C) Fingertip-to-floor test D) Roll-up test E) Lift test. Sum the scores to obtain a performance measure of mobility-related activities. Objectives: develop a sum scale, discriminative ability, sensitivity to change (Strand et al. 2002)

  34. Back Performance Scale (BPS) – Evaluation of . . . Correlations among 5 tests of sagittal-plane mobility: Correlations among 5 tests and BPS total: Cronbach Alpha (reliability): Sum Scores Discrimination: Responsiveness:

  35. Back Performance Scale (BPS) – Evaluation of . . . Correlations among the 5 tests of sagittal-plane mobility: ranged from 0.27 – 0.50. Correlations between the 5 tests and the BPS total: ranged from 0.63 – 0.73. Cronbach Alpha (reliability): 0.73. Sum Scores Discrimination: higher scores in patients not returning to work; higher scores in patients with back pain rather than MSD. Responsiveness: effect size high (1.33) for patients who returned to work; effect size low (0.31) for patients who had not returned to work
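
The slide does not say which effect-size statistic Strand et al. (2002) used; a common choice for responsiveness is the mean change divided by the standard deviation of baseline scores, sketched below on hypothetical BPS data:

```python
import numpy as np

def responsiveness_effect_size(baseline, follow_up):
    """Effect size for change: mean change divided by the SD of baseline scores.

    One common definition only; treat it as illustrative rather than the
    statistic actually used in the study cited on the slide.
    """
    baseline = np.asarray(baseline, dtype=float)
    follow_up = np.asarray(follow_up, dtype=float)
    change = baseline - follow_up    # assuming lower BPS scores mean better performance
    return change.mean() / baseline.std(ddof=1)

# Hypothetical BPS sum scores before and after treatment for 6 patients.
before = np.array([9, 11, 8, 12, 10, 7])
after  = np.array([4,  6, 5,  7,  6, 4])
print(f"effect size = {responsiveness_effect_size(before, after):.2f}")
```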

  36. Back Performance Scale (BPS) – Evaluation of . . . 1. BPS sum more responsive than separate tests 2. Measures aspects of performance of clinical importance to back pain 3. Quick, simple and cheap to administer 4. No costly equipment Future research: Could tests with lateral bending and twisting be added? Could twisting / lateral tests replace any of the sagittal bending tests?

  37. Yellow Flags of Low Back Pain • Indicative of long term chronicity and disability • Negative attitude – back pain is harmful and disabling • Fear avoidance • Reduced activity • Expects passive treatment to be better than active treatment • Tendency to low morale, depression and social withdrawal • Social / Financial problems • Should these psychosocial aspects be included in assessment scale? • What validity does any scale have when omitting these constructs?

  38. Appropriateness / Feasibility Is the administration format suitable? Is the time taken to complete the questionnaire appropriate? Are the questions easy to understand? Are the questions acceptable to the patient? Clinical relevance?

  39. Mathematical Models Leg length differences and MSDs. Two measurement methods: 1) Direct measurement / observation (MRI, ultrasonics) 2) Regression equations: early stages, not cost-effective, complementary at present; physiologically valid but requires physiological uniformity, so not valid with clinical populations (Ashford & Marlbrook, 2003)
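
A minimal sketch of the regression-equation idea: fit a linear model that predicts leg length difference from easily obtained measurements, then use the equation in place of direct imaging. The predictors, values and coefficients are entirely hypothetical; the actual equations from Ashford & Marlbrook (2003) are not given on the slide:

```python
import numpy as np

# Hypothetical training data: imaging-derived leg length differences (mm)
# and two easily measured predictors (all values illustrative only).
pelvic_tilt_deg  = np.array([1.0, 2.5, 0.5, 3.0, 1.5, 2.0, 0.8, 2.8])
standing_asym_mm = np.array([2.0, 6.0, 1.0, 8.0, 3.5, 5.0, 1.5, 7.0])
measured_lld_mm  = np.array([3.0, 8.0, 1.5, 10.0, 5.0, 7.0, 2.0, 9.5])

# Fit an ordinary least-squares regression equation with numpy.
X = np.column_stack([np.ones_like(pelvic_tilt_deg), pelvic_tilt_deg, standing_asym_mm])
coeffs, *_ = np.linalg.lstsq(X, measured_lld_mm, rcond=None)
print("coefficients (intercept, tilt, asymmetry):", np.round(coeffs, 2))

# Predict for a new patient without imaging - valid only while the model's
# assumption of physiological uniformity holds, as the slide cautions.
new_patient = np.array([1.0, 1.8, 4.0])
print("predicted leg length difference (mm):", round(float(new_patient @ coeffs), 1))
```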

  40. Summary of Reliability & Validity • There can be validity without reliability • Reliability is an aspect of construct validity - as assessment becomes less standardized, distinctions between reliability and validity blur • In many situations assessors are not trained to agree on a common set of criteria and standards • Inconsistency in performance across tasks does not invalidate the assessment • Rather it becomes an empirical puzzle to be solved by searching for a more comprehensive interpretation • Initial disagreement does not invalidate any assessment - it provides impetus for dialogue • (Moss, 1994)

  41. Implications for Back Pain Assessments • Development of a single valid universal test may be pointless • No Grand Unifying Theory of measurements • If something were easy to measure validly, it would have been done by now • Functional assessments seem alive and well (for now) • Functional assessments must develop and include psychosocial aspects
