Outcome Measures in Psychiatric Research Vaughan Bell School of Psychology, Cardiff University
Why bother ? • The theory can be complex, the subject matter dry and the work hard. • But, knowing how to create and assess measures opens up an important skill set. • Critically assess research • Assess the way patients (or you!) are measured • Design and implement your own research • Deploy / assess empirical methods in patient care
Outline • Part 1 • What is outcome ? • Types of measure • What attributes does a good measure need ? • Possible confounding factors and caveats. • Ethical issues • Part 2 • Workshop on creating / assessing measures
What is Outcome ? • In the schizophrenia literature (Brekke et al, 1993), outcome is classified in three ways: • Clinical outcome: signs, symptoms, service utilisation. • Functional outcome: social, vocational, independent living. • Subjective outcome: patient’s experiences of outcome. • It is important to be clear on what sort of outcome you want to assess and why.
Types of Measure • Clinical metrics: length of hospital stay, readmissions, use of PRN medication etc. • Psychometric scales: quantify some aspect of psychological function or behaviour. • Self-report • Structured / semi-structured interview • Multi-rater measures: multiple people are asked to rate the same material – perhaps with one of the methods above. • Tasks: e.g. experimental / neuropsychological tests
Purpose • Screening tool: Determines whether a symptom or psychological trait is present. • Aid to diagnosis: typically increases the reliability of ‘bedside’ diagnoses. • Quantification: allows symptoms or traits to be quantified by intensity, duration etc. • Dimensional scale: measures an attitude or trait through its range in the population.
Attributes of a Good Measure • The two essential features of a good measure are: • Reliability: measures consistently. • Validity: measures what it is designed for. • Each of these has various sub-categories that need to be fulfilled.
Reliability • Reliability must be established first, as validity relies on it. • i.e. A reliable scale could measure nonsense, but do so consistently… • …but it is impossible for a measure to be inconsistent and measure what it is designed for.
Reliability • Common forms… • For psychometric scales: • Internal reliability – “are similar items answered in similar ways ?” • Test-retest reliability – “does the test produce similar results when used on the same people on different occasions ?” • For multi-rater observational measures: • Inter-rater reliability – “does the measure produce similar results when used by different observers ?”
Internal Reliability “Are similar items answered in similar ways ?” • Typically tested with Cronbach’s Alpha. • An alpha above 0.7 is usually considered satisfactory (Kline, 1993) • If a measure has multiple independent factors, they may need to be tested separately.
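As a rough illustration of the internal reliability check described above, Cronbach's Alpha can be computed from a table of item scores. This is a minimal sketch using the standard formula; the function name `cronbach_alpha` and the example responses are made up for illustration and do not come from the slides.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                      # number of items in the scale
    items = list(zip(*scores))              # one tuple of scores per item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical responses: 5 people answering a 4-item scale (1-5 ratings).
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, satisfactory by the 0.7 rule of thumb: {alpha > 0.7}")
```

If the scale has several independent factors, the same function would be applied to each factor's items separately.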
Test-Retest Reliability “Does the test produce similar results when used with the same people on different occasions ?” • Typically tested with Pearson correlation. • Results from first occasion are correlated with results from second occasion. • Correlation should be above 0.8 (Kline, 1993) • Assumes that whatever is being measured is stable between occasions. • Therefore, this is usually tested on non-clinical groups.
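The test-retest check can be sketched as a plain Pearson correlation between the two occasions. The helper `pearson_r` and the scores below are illustrative assumptions, not data from the talk.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five people on two occasions.
first_occasion = [12, 18, 25, 31, 40]
second_occasion = [14, 17, 27, 30, 38]

r = pearson_r(first_occasion, second_occasion)
print(f"test-retest r = {r:.2f}, above the 0.8 guideline: {r > 0.8}")
```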
Inter-rater Reliability “Does the measure produce similar results when used by different observers ?” • Typically tested with Cohen’s Kappa. • Cohen’s Kappa controls for problems with directly comparing multiple raters’ scores. • e.g. a rater who consistently scores five points more than the other will correlate perfectly with them despite the two never agreeing. • A two-point rating (e.g. symptom present / absent) will show about 50% agreement just by chance if both raters use each category equally often.
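The point about correlation hiding disagreement can be checked directly. In this small sketch (the scores and the `pearson_r` helper are hypothetical), one rater always scores five points higher than the other, so the correlation is perfect while the raters never give the same score.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings: rater B always scores five points more than rater A.
rater_a = [10, 14, 18, 22, 26]
rater_b = [score + 5 for score in rater_a]

correlation = pearson_r(rater_a, rater_b)
exact_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"correlation = {correlation:.2f}, exact agreement = {exact_agreement:.0%}")
```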
Cohen’s Kappa • Kappa values and level of agreement between raters (Landis and Koch, 1977): • Fair: 0.21 - 0.40 • Moderate: 0.41 - 0.60 • Substantial: 0.61 - 0.80 • Almost perfect: 0.81 - 1.00
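A minimal sketch of how Kappa corrects observed agreement for chance agreement, followed by a lookup against the Landis and Koch bands listed above. The function names and the ratings are assumptions for illustration; values below 0.21 are simply reported as "below fair" because the slide does not name those bands.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category,
    # given how often each rater uses each category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

def agreement_level(kappa):
    """Label a kappa value using the bands quoted on the slide."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "below fair"

# Hypothetical present (1) / absent (0) ratings of ten patients by two raters.
rater_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f} ({agreement_level(kappa)})")
```

Here the raters agree on 8 of 10 cases, but because roughly half of that agreement would be expected by chance, the Kappa value is noticeably lower than the raw 80%.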
Validity • Face validity – “does the perception of the measure influence the outcome ?” • Content validity – “does the measure cover everything it needs to cover ?” • Construct validity – “do the results of the measure agree with what theory predicts ?” • Criterion validity – “does the measure fulfil expected criteria ?” – usually the performance of a certain group • Incremental validity – “does it measure anything new ?”
Face Validity “does the perception of the measure influence the outcome ?” • If participants or testers misperceive the nature of the measure, it may affect the results. • e.g. if someone takes a verbal memory test but thinks it is a creative thinking test, it may look like they are confabulating. • Similarly, asking ‘are you depressed ?’ may have good face validity for depression • …but asking ‘are you deluded ?’ has poor face validity for delusions.
Content Validity “does the measure cover everything it needs to cover ?” • i.e. is it comprehensive ? • If a measure of anxiety asks only about social anxiety, it doesn’t have good content validity. • This is usually assessed by comparing the measure with, or generating it from: • Known phenomena • Literature reviews
Construct Validity “do the results of the measure agree with what theory predicts ?” • There are two ways of assessing this: • Convergence – correlates with measures of things known to be associated • Divergence – negatively correlates with things known to be mutually exclusive. • e.g. a good measure of depression should correlate with a measure of low mood • …but negatively correlate with a measure of self-esteem.
Construct Validity Or, a good measure of anxiety should predict performance, as per the Yerkes-Dodson Law (1908)
Criterion Validity “does the measure fulfil certain criteria ?” • Often, this can be the same as construct validity (e.g. correlates with similar measures) • It is often tested by asking members of a certain group to take the test… • …who are known to have high levels of the attribute being measured. • e.g. people with psychosis should score higher than the general population on a good measure of anomalous perceptual experience.
Incremental Validity “does it measure anything new ?” • An assessment of what the measure adds to the ‘toolkit’ of psychological assessment. • If it measures exactly the same as something else, in the same way… • …there may not be any point in developing it.
Wider Validity Issues • Rarely tackled in the textbooks, but it is important to assess how the validity tests were carried out. • Has validity been established for: • different cultural groups ? • different ages ? • someone with a disability ? • etc.
Other Ongoing Changes • Particularly in the clinical environment, there may be a number of difficult-to-disentangle influences. • Children present a particular challenge as it might be difficult to separate the effects of: • Cognitive development • Psychopathology • Treatment • Task engagement / motivation
Covariates • One way of controlling for this is to use covariates in statistical analysis, particularly ANOVA. • e.g. I want to see if my Special Brain Power Training™ boosts children’s intelligence. • I give class A the training and see if they score more highly on an IQ test than class B. • However, class A are, on average, older, so they are likely to do better anyway. • I introduce age as a covariate into the analysis, to cancel out its effects and make a fairer comparison.
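The idea behind adding age as a covariate can be sketched with simple regression. The example below is a simplified illustration, not a full ANOVA with significance tests: it removes the part of both the scores and the group membership that age explains, then estimates the training effect from what is left. All numbers are invented, and the scores are built without noise so that the true training effect (3 points) is known in advance.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y predicted from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

def residuals(x, y):
    """What is left of y after removing the part predicted by x."""
    slope, intercept = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

# Hypothetical data: class A (trained, group = 1) is older than class B (group = 0).
ages = [10, 11, 11, 12, 8, 9, 9, 10]
group = [1, 1, 1, 1, 0, 0, 0, 0]
# Made-up scores: 2 points per year of age plus a true training effect of 3.
scores = [80 + 2 * age + 3 * g for age, g in zip(ages, group)]

class_a = [s for s, g in zip(scores, group) if g == 1]
class_b = [s for s, g in zip(scores, group) if g == 0]
raw_difference = sum(class_a) / len(class_a) - sum(class_b) / len(class_b)

# Adjust both the scores and the group indicator for age, then compare.
adjusted_effect, _ = fit_line(residuals(ages, group), residuals(ages, scores))

print(f"raw difference = {raw_difference:.1f}, age-adjusted effect = {adjusted_effect:.1f}")
```

The raw comparison overstates the effect of training because it also contains the age difference between the classes; the adjusted estimate recovers the effect that was built into the data.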
Confounding Factors • Other factors can sometimes be more difficult to deal with: • Floor / ceiling effects: Where the test is so hard or easy that it is not possible to differentiate between participants. • Therefore, it is important to pilot the measure, and compare performance with norms. • Emotional impact: Testing can be stressful for healthy individuals. • For patients, especially so, particularly if they see themselves as ‘failing’ the tests.
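When piloting a measure, one simple way to look for floor or ceiling effects is to count how many participants land on the lowest or highest possible score. This is a minimal sketch with invented pilot data; the function name is hypothetical, and what proportion counts as a problem is a judgement call rather than something stated in the slides.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Proportion of participants at the minimum and maximum possible score."""
    n = len(scores)
    at_floor = sum(s == min_score for s in scores) / n
    at_ceiling = sum(s == max_score for s in scores) / n
    return at_floor, at_ceiling

# Hypothetical pilot results on a test scored from 0 to 20.
pilot_scores = [20, 20, 19, 20, 18, 20, 20, 17, 20, 20]

floor, ceiling = floor_ceiling_proportions(pilot_scores, min_score=0, max_score=20)
print(f"at floor: {floor:.0%}, at ceiling: {ceiling:.0%}")
```

In this made-up pilot most participants score the maximum, so the test would be too easy to differentiate between them.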
Ethical Issues • There are distinct ethical (and potentially legal) implications when using such tests, e.g.: • Are you using the measure for clinical decisions ? • If so, are you qualified and competent to develop / deploy / interpret the measure ? • If not, do you have adequate supervision from someone who is ?
Ethical Issues • Is the measure being used for research? • If so, has the research been given ethical approval? • Are you asking for informed consent ? • Does the patient know this is not part of their standard care ? • Are the results being kept private or anonymised ?
Conclusions • The purpose of measurement and the type of measure are crucial. • Measures need to be reliable and valid. • You need to be aware of possible confounding factors. • You need to be clear whether the context is research or clinical… • …and know and abide by the ethics of each.