3rd national conference on research on and development of insight into coercion in psychiatry


Presentation Transcript


  1. 3rd national conference on research on and development of insight into coercion in psychiatry. Holmen Fjordhotell, 5 April 2011. ”This instrument has been thoroughly validated and has good psychometric properties.” 1) What does that mean? 2) Is Cronbach’s alpha the ultimate validity criterion? Dag Hofoss, PhD, Professor, Institute of Health and Society, University of Oslo; Institute of Community Medicine, University of Tromsø

  2. Some checkpoints for the psychometric evaluation of questionnaires: Do the items reflect (separately and together) some theoretical model? Did we arrange focus group discussions on possible items (with patients? with experienced researchers?)? Was the questionnaire tested on different populations, including one which resembles yours? Is the questionnaire too long (indications: number of pages, response rate)? Were questions constructed ”lege artis”? Too many missings? Not enough variance? Test-retest correlation/ICC (> .7)? Do items have concurrent validity? Predictive validity (above what is provided by the other questions)? Interobserver reliability (Cohen’s kappa > .6)? Data at all factorable (Bartlett’s p < .05, KMO > .7)? Did EFA provide meaningful factors? Do different extraction/rotation methods provide the same factors? Did the factors reflect (”explain”/”account for”) at least 60 % of the variance in the variables? Internal consistency of factors OK (Cronbach’s alpha > .7)? Item-to-own-factor correlations > item-to-other-factor correlations? Questionnaire purged of questions with low communality? Model consistent with data (the model’s ”datatillatelighet”, its permissibility given the data)? (χ2/df < 2.5, p > .05, pclose > .05, AGFI > .9, RMSEA < .05, Hoelter .05 > 200, etc.) FAQ: ”Is all this really necessary? All we wanted to do was identify the important questions, and formulate them according to the rules of writing good questions.” A: ”Psychometrics is not just about writing good questions.”

  3. Reliability • 1) Stability = little variation (in the scoring of a stable characteristic) from one measurement to the next. Examine by test-retest correlation: Pearson’s r (problem: M1 may be perfectly correlated with M2, but systematically higher). Therefore: examine instead by ICC (Intraclass Correlation Coefficient). In both cases: > 0.7 • 2) Internal consistency = All items in a compound instrument should address the same underlying concept (different aspects, but of the same dimension). Examine by Cronbach’s alpha (coefficient alpha) > 0.7 (see below) • 3) Equivalence = Interrater reliability (agreement between two scorers). Examine by Cohen’s kappa (κ) > 0.60 (see below)
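
A minimal Python sketch of the test-retest check (not from the slides), assuming the pingouin package; subjects, occasions and scores are invented:

import pandas as pd
import pingouin as pg

# Long format: one row per (subject, measurement occasion).
df = pd.DataFrame({
    "subject":  [1, 1, 2, 2, 3, 3, 4, 4],
    "occasion": ["t1", "t2"] * 4,
    "score":    [12, 13, 18, 17, 9, 10, 15, 15],
})

icc = pg.intraclass_corr(data=df, targets="subject",
                         raters="occasion", ratings="score")
print(icc[["Type", "Description", "ICC"]])  # look for ICC > 0.7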

  4. Checking the internal consistency of factors (Cronbach’s alpha (α)): α = N · r̄ / [1 + r̄(N − 1)], where N = number of items and r̄ = the average of the correlations between the pairs of variables. Example with N = 4 items (six pairs): r̄ = (0.77 + 0.13 + 0.99 + 0.78 + 0.76 + 0.70)/6 = 4.13/6 = 0.69, so α = 4 · 0.69/[1 + 0.69 · 3] = 2.76/[1 + 2.07] = 2.76/3.07 = .90
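
The slide uses the standardized (correlation-based) alpha; a few lines of Python reproduce the worked example (the covariance-based alpha reported by statistics packages can differ slightly):

def cronbach_alpha_std(pairwise_r, n_items):
    """Standardized alpha: N * r_avg / (1 + (N - 1) * r_avg)."""
    r_avg = sum(pairwise_r) / len(pairwise_r)
    return n_items * r_avg / (1 + (n_items - 1) * r_avg)

# The slide's example: 4 items, hence 6 pairwise correlations.
r = [0.77, 0.13, 0.99, 0.78, 0.76, 0.70]
print(round(cronbach_alpha_std(r, 4), 2))  # -> 0.9, as on the slide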

  5. Checking coder agreement (Cohen’s kappa (κ): perfect agreement = 1). NB: here ”unweighted” only = here, large disagreements do not reduce kappa more than small disagreements. After 3 months, 350 low back pain patients – and their doctors – were asked how much the operation had helped: ”Very much”, ”Much”, ”A little” or ”Not at all”. Of the patients, 195 + 118 = 313 reported ”(Very) much”, 313/350 = 89 %. Of the doctors, 204 + 96 = 300 reported ”(Very) much”, 300/350 = 86 %. But how strong was the doctor-patient agreement?

  6. Full agreement (same answer) in 150 + 48 + 14 + 9 cases = 221. 221/350 = .631. But part of this agreement may be accidental (cases of agreement would appear even if the patient and doctor scores were uncorrelated). Solution: adjust for random agreement = calculate Cohen’s kappa (κ). (Large disagreements ought to reduce kappa more than small disagreements – if coder #1 codes ”1” and coder #2 codes ”4”, the disagreement is larger than where coder #1 codes ”1” and coder #2 codes ”3”. Solution: calculate weighted kappa (κ_w). (Beyond the scope of today’s presentation))

  7. κ = strength of coder agreement after adjustment for random agreement. Calculated from p_obs and p_exp. The formula is: κ = (p_obs − p_exp) / (1 − p_exp). p_obs = per cent of cases on the agreement table’s diagonal (= cases where patient and doctor passed identical judgments on the operation effect). p_exp = per cent of cases on the agreement table’s diagonal if both coders coded at random. The larger the difference between p_obs and p_exp, the larger the real agreement between the coders.

  8. The formula is: κ = (p_obs − p_exp) / (1 − p_exp). p_obs says which fraction was on the diagonal: (150 + 48 + 14 + 9)/350 = .631. Calculation of p_exp: next slide.

  9. p_exp says which fraction would have been on the diagonal in the case of no doctor-patient correlation. The diagonal has four cells: ”150”, ”48”, ”14” and ”9”. With random coding, the 150 would have been 58.29 % of 195 = 113.67. With random coding, the 48 would have been 27.43 % of 118 = 32.37. With random coding, the 14 would have been 11.14 % of 26 = 2.90. With random coding, the 9 would have been 3.14 % of 11 = 0.35. Sum: 149.29. p_exp = 149.29/350 = 0.427 (with no correlation (and given these marginal distributions), one would have expected to find 42.7 % of the cases on the diagonal)

  10. (Unweighted) kappa, then, is: κ = (p_obs − p_exp)/(1 − p_exp) = (0.631 − 0.427)/(1 − 0.427) = 0.204/0.573 = 0.356. Is 0.356 a lot or rather little? Altman (p 404) grades the strength of agreement by the value of κ: > .80 very good; .61–.80 good; .41–.60 moderate (read: too little); .21–.40 fair (a British understatement); < .20 poor (read: way too little). So 0.356 is too little: patients and their doctors often didn’t agree (the red cells in the slide’s table).
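
Unweighted kappa needs only the diagonal and the marginal totals, so the example can be checked in a few lines of Python. The off-diagonal cells are not in the transcript, and the doctors' two smallest marginals are inferred from slide 9's percentages:

N = 350
diagonal       = [150, 48, 14, 9]    # identical patient/doctor answers
patient_totals = [195, 118, 26, 11]  # row marginals, from the slides
doctor_totals  = [204, 96, 39, 11]   # column marginals; 39 and 11 inferred

p_obs = sum(diagonal) / N
p_exp = sum(p * d for p, d in zip(patient_totals, doctor_totals)) / N ** 2
kappa = (p_obs - p_exp) / (1 - p_exp)
print(round(p_obs, 3), round(p_exp, 3), round(kappa, 3))  # 0.631 0.426 0.357
# The slide's 0.427/0.356 differ in the third decimal because it rounds the
# expected cell counts before summing.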

  11. Validity • 1) Face validity: That it is ”clear/obvious” that the instrument measures what was to be measured (contains the items that should be there, and none that should not; no clearly irrelevant items) • 2) Content validity (the slightly more professional version of face validity): That it is clear to professionals that the instrument measures what was to be measured (Polit & Beck’s Content Validity Index (CVI): a) experts individually score each proposed item on a 1–4 scale; retain items which 80 % of the experts or more score ”3” or ”4”; b) calculate the average score of item groups (factors): factors whose average score exceeds .90 have ”excellent content validity”) • 3) Criterion-related validity = That the measurement result correlates highly with some external criterion, e.g. wards which scored higher on ”Patient Safety Culture” actually have fewer ”Adverse Events”. Concurrent criterion validity: fewer events now; predictive: fewer events next year (strongly motivated now to quit smoking = turned non-smoker next year)
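
The CVI procedure under point 2 is mechanical enough to sketch in Python; the expert ratings below are invented, and the exact averaging convention varies between authors:

# ratings[item] = the 1-4 relevance scores given by five experts (invented).
ratings = {
    "item1": [4, 3, 4, 4, 3],
    "item2": [2, 3, 2, 4, 2],
    "item3": [4, 4, 3, 4, 4],
}

# Item-level CVI: the share of experts rating the item 3 or 4; retain >= 0.80.
i_cvi = {item: sum(r >= 3 for r in rs) / len(rs) for item, rs in ratings.items()}
retained = [item for item, cvi in i_cvi.items() if cvi >= 0.80]

# Scale-level CVI: average over the retained items; "excellent" above 0.90.
s_cvi = sum(i_cvi[item] for item in retained) / len(retained)
print(i_cvi, retained, round(s_cvi, 2))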

  12. Validity • 4) Construct validity (”an evidence-building enterprise … lots of work” (Polit & Beck, Nursing Research, p 462)) • A: Known-groups technique: We constructed an instrument to measure fear of giving birth, and scored pregnant women by it. If those who had had traumatic birth experiences before scored higher on fear of giving birth than mothers who had had no problems during labour or with the child, our instrument has construct validity • B: Hypothesized-relationships technique: ”Our institute believes/has built a theory that fear of giving birth is not just another manifestation of general fearfulness, but a particular type of fear that can appear in persons who are not generally fearful. If the scores provided by our fear-of-giving-birth scale correlated strongly with the scores of an instrument known to provide valid measurement of general anxiety, we would worry about the construct validity of our new instrument.”

  13. Find the factor structure (exploratory factor analysis (EFA – well: an unnecessary step (says I – but many think otherwise))). Is the data set at all ”factorable”? Does the questionnaire reflect factors (underlying dimensions, latent variables)? If not: questionnaire construction is only about the selection and writing of single questions (but that’s rarely the situation!) 1: Bartlett’s test of sphericity 2: Kaiser-Meyer-Olkin’s measure of sampling adequacy 3: Are the identified factors meaningful? 1: Bartlett tests whether ”the correlation matrix is an identity matrix” (= Is no variable correlated to any other variable = Are all diagonal values 1, and all off-diagonal values 0?). Criterion: p < .05 (good news: it always is)

  14. 2: Kaiser-Meyer-Olkin (KMO). Are variables linked in larger groups than dyads? (Or – not good for the factor idea – are pairs of correlated variables not linked to the other variables?) Rule of thumb: KMO > .90 marvellous; .80–.89 meritorious; .70–.79 middling; .60–.69 mediocre; .50–.59 miserable; < .50 unacceptable (read: unfactorable variables) (SPSS manual: Professional Statistics 6.0, p 53)
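
Both factorability checks are available in the Python factor_analyzer package. A sketch on simulated data (the data are invented; only the two function names come from the package):

import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity, calculate_kmo)

# Simulated responses: six items driven by one common factor plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(200, 1))
df = pd.DataFrame(common + rng.normal(scale=0.5, size=(200, 6)),
                  columns=[f"item{i}" for i in range(1, 7)])

chi2, p = calculate_bartlett_sphericity(df)  # want p < .05
_, kmo_total = calculate_kmo(df)             # want KMO > .70
print(f"Bartlett chi2 = {chi2:.1f}, p = {p:.4f}; KMO = {kmo_total:.2f}")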

  15. Are there meaningful factors here (= which can be named)? Why are female doctors not promoted as often/quickly as male doctors? How important (Unimportant (1) – Very important (4)) are these explanations: 1. Females lack the support of their superiors 2. Positions of leadership are not easily combined with housework and family obligations 3. Females lack female role models in positions of leadership 4. Subordinates don't want female superiors 5. Females lack self-confidence 6. Females are "invisible", and therefore not counted among those eligible for promotion 7. Females lack the necessary competence for positions of leadership 8. Male employers are unwilling to award women positions of leadership 9. Females do not want leadership responsibilities 10. Females have fewer "contacts" within the system 11. Men are often better suited than women for positions of leadership 12. Females are more interested than males in having direct contact with patients. 4: Communalities. The checklist, then: factor structure (exploratory factor analysis), the factor model’s consistency with the data (GFI: Goodness of Fit Indices – and there are many of them), the internal consistency of single factors (α: Cronbach’s alpha), item-to-own-factor correlation, predictive validity. ”Is all this really necessary?” Oh yes!

  16. Rotated factor loadings:
                                           F1    F2    F3    F4
  No support from superiors               .75   .19  -.07   .09
  Female superiors unwanted               .69   .08   .26   .07
  Male employers reluctant                .81   .01   .00   .08
  Women fewer "contacts"                  .64   .26  -.00   .12   (alpha = .89)
  Women no self-confidence                .22   .80   .04   .02
  No female role models                   .40   .56   .01   .05
  Women are "invisible"                   .55   .56  -.01  -.03
  Women do not want leadership resp      -.11   .62   .26   .35   (alpha = .90)
  Women not competent                     .01   .20   .73   .03
  Men better suited                       .07  -.06   .83   .04   (alpha = .89)
  Women prefer housework and family       .03   .04  -.05   .82
  Women prefer to work with patients      .20   .11   .12   .61   (alpha = .71)
  F1 = ”Kept down”, F2 = ”Don’t dare”, F3 = ”Incompetent”, F4 = ”Don’t want to/want another kind of career”

  17. Should there have been more/fewer factors? A question of the cumulative fraction of reflected variance. Rule of thumb: Accept factors until 60 % of the variance in the variables in the data set is ”explained”. (Rule of thumb #2: Accept all factors whose eigenvalue exceeds 1.) (#3: choose freely the number of factors to be retained (in the SPSS command box).) There are always ”as many factors as variables”. But: some links between factors and variables may be so unimportant that one may disregard them. (Disregarding strong links/arrows = disregarding information = reducing the fraction of variance reflected by the (retained) factors.) [Path diagram: variables V1–V4 each linked to factors F1 and F2]
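
Both rules of thumb can be read off a fitted EFA. A sketch with factor_analyzer, reusing the simulated df from the factorability example (two factors requested purely for illustration):

from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(df)

eigenvalues, _ = fa.get_eigenvalues()
print("Eigenvalues:", eigenvalues.round(2))      # rule #2: count those > 1

_, _, cum_var = fa.get_factor_variance()
print("Cumulative variance:", cum_var.round(2))  # rule #1: aim for >= 0.60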

  18. Should variables (items) be deleted? A question of ”communality”: communality = the fraction of the item’s variance that is explained by the factors. (Too little communality = too much ”uniqueness” to ”belong to a factor”.) Communalities: Var1 .695, Var2 .779, Var3 .807, Var4 .271 (Variable 4 a candidate for removal from the EFA), Var5 .828, Var6 .567. ”Too little”-criterion: no criterion whatsoever (not even a rule of thumb) = re-run the EFA without the low-communality item, and see what happens (to the cumulative % of variance explained, to the number of suggested factors, and to item re-grouping)
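
The slide's advice ("re-run and see what happens") translates directly. A sketch continuing from the fitted FactorAnalyzer above; the 0.30 cutoff is arbitrary, since, as the slide says, there is no agreed criterion:

communalities = dict(zip(df.columns, fa.get_communalities().round(3)))
low = [item for item, h2 in communalities.items() if h2 < 0.30]
print(communalities, "-> candidates for removal:", low)

# Re-run the EFA without the low-communality items and compare.
if low:
    fa2 = FactorAnalyzer(n_factors=2, rotation="varimax")
    fa2.fit(df.drop(columns=low))
    print("New cumulative variance:", fa2.get_factor_variance()[2].round(2))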

  19. ”Delete an interesting question? For such a technical reason?” Yes, that happens frequently – when a question has too much uniqueness, it may be a good idea to delete it = When some ”important” question is not included, be aware that it may have been there earlier in the questionnaire’s developmental history (Dinosaurs were there – but at some point in time the Lord recalled them)

  20. ”But deleting this important question threatens the content validity of the questionnaire?” 1) We may well be wrong about the face validity issue (happens frequently) 2) Face validity is at the bottom of the validity criterion hierarchy On the other hand, there are questions which are so interesting in their own right that they should not be removed ”for psychometric reasons” Most questionnaires contain such questions Keep in mind, though, that some of them may be there not for their intrinsic importance, but because they are somebody’s baby

  21. Other technical reasons for deleting good questions include: 1) The item did not add to the questionnaire’s diagnostic sensitivity/specificity 2) The item did not add to the questionnaire’s prognostic power 3) The item correlated perfectly with another item (= a) one of the two was redundant, b) collinearity) 4) The item damaged the model’s goodness-of-fit (”We know it is you, Fredo”. But couldn’t the system have been saved without taking him out?)

  22. Check item-to-own-factor correlations. Criterion: Each item (question) should be more strongly correlated to the factor it belongs to than to any other factor (if not, it probably didn’t really belong to that factor). ”Correlate with a factor”? = EFA adds new variables to the file (F1, F2, and so on), on which each patient has a score. Also, each patient has a score on each item – and the patient’s variable score may correlate strongly or weakly with his/her factor score
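
A sketch of the item-to-own-factor check, continuing from the fitted FactorAnalyzer fa and DataFrame df above; transform() returns the factor scores the slide describes as new variables in the file:

scores = pd.DataFrame(fa.transform(df), index=df.index,
                      columns=["F1", "F2"])
for item in df.columns:
    r = {f: round(scores[f].corr(df[item]), 2) for f in scores.columns}
    print(item, r)  # the own-factor correlation should be the largest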

  23. Check that the factor structure does not vary by the method of factor extraction or the type of rotation. Factor extraction methods: principal components, principal axis factoring, unweighted least squares, GLS, maximum likelihood, the alpha method, image factoring, correspondence analysis. Types of rotation: varimax/quartimax/equamax; orthogonal/oblique. (Feinschmecker details, beyond the scope of this presentation – never mind for now)
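
One way to run this robustness check in Python, again reusing df from the earlier sketches; factor_analyzer exposes some, though not all, of the extraction methods and rotations the slide lists:

import numpy as np
from factor_analyzer import FactorAnalyzer

# Which factor does each item load highest on, per method and rotation?
for method in ("principal", "minres", "ml"):
    for rotation in ("varimax", "oblimin"):
        fa = FactorAnalyzer(n_factors=2, method=method, rotation=rotation)
        fa.fit(df)
        grouping = np.abs(fa.loadings_).argmax(axis=1)
        print(method, rotation, grouping)
# Factor order and sign may differ between runs; what should be stable is
# the partition of the items into groups.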

  24. Rotation: In the unrotated solution, all four variables have high values on both factors (= difficult to name the factors). [Plot: v1–v4 in the unrotated Factor 1 × Factor 2 space] Rotation = who said the x-axis must run parallel to the floor?

  25. Rotated solution. [Plot: the same v1–v4 in the rotated Factor 1 × Factor 2 space] Rotation -> F1 = v3 and v4 (v1 & v2 ≈ 0 on F1), F2 = v1 and v2 (v3 & v4 ≈ 0 on F2). Type of rotation = who said the axis angle must be 90 degrees? (Oblique rotation is OK)

  26. Confirmatory Factor Analysis (CFA). The problem with EFA is that EFA always produces factors = the machine suggests how many factors ”exist in the data” = no theoretical considerations. If EFA produces the expected factors: fine – but no guarantee: no formal test of the model’s fit to the data (its ”datatillatelighet”) = still no confirmatory factor analysis. CFA = first specify (draw) the model, then use CFA to check how well it fits the data. Any model implies a set of bivariate correlations (from the arrows in the model it follows that some variables correlate highly (those which belong to the same factor), while other variables are weakly correlated (those belonging to different factors)) = CFA reconstructs the correlations from the hypothesized model, and calculates how much they differ from the observed correlations (those which were calculated from the data). There is always a discrepancy; no model is perfectly correct. But how big is it? Huge discrepancy: model not acceptable. But what if the discrepancy is small (read: the correlations are not badly reproduced)?

  27. A small discrepancy = two possible interpretations: 1) The model is wrong 2) The model is correct; what is wrong is the data (”just a sample”: random sample ≠ representative sample). A small discrepancy (between the correlations observed and those implied by the model) = the model may be correct (the true discrepancy is 0). How likely is that? CFA programs (AMOS, LISREL) produce that p-value! Criterion for acceptable fit of model to data: p > .05. (”Above .05”? Yes, we want the probability to be high that the model is correct, i.e. that the discrepancy between the observed correlations and those implied by the model is zero.) If so, the model is permitted by the data. Several Goodness of Fit Indices exist – not all produce p-values: χ2 (p > .05 if χ2 approximates the model’s number of degrees of freedom), Cmin/df (should not much exceed 2.5), pclose (should exceed .05), AGFI (Adjusted Goodness of Fit Index, should exceed .9), RMSEA (Root Mean Square Error of Approximation, should not exceed .05)
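
AMOS and LISREL are the programs the slide names; the free Python package semopy does the same job. A sketch assuming a two-factor model over the simulated items from earlier (whose items are really one-dimensional, so expect the two factors to correlate strongly; the calc_stats column names are as of semopy 2 and may vary by version):

import semopy

# Measurement model in semopy's lavaan-like syntax.
desc = """
F1 =~ item1 + item2 + item3
F2 =~ item4 + item5 + item6
"""
model = semopy.Model(desc)
model.fit(df)  # df: one column per item, one row per respondent

stats = semopy.calc_stats(model)
# Compare with the slide's criteria: chi2/df < 2.5, RMSEA < .05, AGFI > .9 ...
print(stats[["DoF", "chi2", "chi2 p-value", "RMSEA", "GFI", "AGFI"]])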

  28. SAQ factor structure. GFIs: χ2 = 1466.9 (df = 568), χ2/df = 2.583 (p < .001), pclose = .89, AGFI = .871, RMSEA = .048

  29. Alpha coefficients for the factors: Teamwork Climate .68, Safety Climate .76, Stress Recognition .82, Perceptions of Hospital Management .82, Perceptions of Unit Management .84, Working Conditions .71, Job Satisfaction .85. Item-to-own-factor r’s: OK (all higher than the item-to-other-factor correlations). Test-retest r’s: Teamwork Climate 0.72, Safety Climate 0.75, Stress Recognition 0.54, Perceptions of Hospital Management 0.44, Perceptions of Unit Management 0.71, Working Conditions 0.75, Job Satisfaction 0.71. Conclusion 1: The model fits the data; the questionnaire has ”not perfect, but acceptable, psychometric properties”. Conclusion 2: ”Construct validity”: concepts and observed variables may be linked as suggested by the model (i.e. the model behind the Johns Hopkins SAQ (which had been CFA-confirmed in the US)). Conclusion 3: Criterion validity: hospital units with higher SAQ scores had fewer adverse events

  30. Rule #1 for how to construct your questionnaire: Avoid ”home-mades” = they don’t get published. Instead use existing questionnaires: the Binet IQ test, the MSCEIT (Mayer-Salovey-Caruso Emotional Intelligence Test), the MMPI (Minnesota Multiphasic Personality Inventory), the HSCL-25 (Hopkins Symptom Check List 25), the SOC-13 (Antonovsky’s Sense of Coherence 13), the SF-36 (Short Form 36), the SAQ, and so on. New problem: translate/re-translate. Also check the psychometric properties of ”validated” questionnaires (in particular after a first translation into another language) – reliability and validity are always relative to some set of data; coefficients are ”estimated on the basis of this set of empirical data” (”Valid for Californian housewives in 1978” = ?). The [validity and] reliability of an instrument is a property not of the instrument itself, but of the instrument when administered to a certain sample under certain conditions

  31. Papers given in evidence that the MPCS has been thoroughly validated and has good psychometric properties. Exhibit A: Interpersonal Relations Scale – Abbreviated. (No reference given) Exhibit B: Wertheimer A. A philosophical examination of coercion for mental health issues. Behavioral Sciences and the Law 1993; 11: 239-58. Exhibit C: Lidz C, Hoge S, Gardner W, Bennett N, Monahan J, Mulvey E, Roth L. Perceived coercion in mental hospital admission. Pressures and process. Arch Gen Psychiatry 1995; 52 (Dec): 1034-9. Exhibit D: Gardner W, Hoge S, Bennett N, Roth L, Lidz C, Monahan J, Mulvey E. Two scales for measuring patients’ perceptions of coercion during mental hospital admission. Behavioral Sciences and the Law 1993; 11: 307-21

  32. (Instructions to the jury): ”You have seen the evidence, and here’s how you should sum it up …” Exhibit A: Interpersonal Relations Scale – Abbreviated. Disregard: irrelevant to the case (questionnaire only, no psychometrics). Exhibit B: Wertheimer A. A philosophical examination of coercion for mental health issues. Disregard: an excellent discussion of the concept ”coercion”, but no reference to a questionnaire/instrument

  33. Exhibit C: Lidz C, Hoge S, Gardner W, Bennett N, Monahan J, Mulvey E, Roth L. Perceived coercion in mental hospital admission. Refers to a previous article – Monahan J, Hoge S, Lidz C et al. Int J Law Psychiatry 1995; 18: 1-15 – which reported psychometric analyses of four questions (”the MacArthur Perceived Coercion Scale”) and of the Procedural Justice Scale (six questions): a principal components analysis yielded a single factor with an eigenvalue greater than 1, accounting for 64 % of the variance in these questions. Procedural Justice was associated with Perceived Coercion

  34. Exhibit D: Gardner W, Hoge S, Bennett N, Roth L, Lidz C, Monahan J, Mulvey E. Two scales for measuring patients’ perceptions of coercion during mental hospital admission [the MacArthur Admission Experience Interview (AEI), the MacArthur Admission Experience Survey (AES, questionnaire)]. AEI (structured 30-minute interview): Four perceived coercion questions (= the AEI MacArthur Perceived Coercion Scale, AEI MPCS): Influence: ”What had more influence on you being admitted: what you wanted or what other people wanted?”, Control: ”How much control did you have over whether you were admitted?”, Choice: ”Overall, would you say you chose to be admitted or someone made you be admitted?”, Freedom: ”Once you were at the emergency room, how free did you feel to do what you wanted about being admitted to the hospital?” AES (104 questions, later reduced to 41 and then to 15): Five perceived coercion items (= the AES MacArthur Perceived Coercion Scale, AES MPCS): Influence: ”I had more influence than anyone else on whether I came into hospital”, Control: ”I had a lot of control over whether I went into the hospital”, Choice: ”I chose to come into the hospital”, Freedom: ”I felt free to do what I wanted about coming into the hospital”, and Idea: ”It was my idea to come into the hospital”

  35. Two samples (N = 161, N = 50) = different settings, both with voluntary as well as coerced admissions (useful for comparisons!). A moderate missing-values problem (= acceptable questions (the AES OK too?)):
             Influence   Control   Choice   Freedom   Idea
  AEI            3 %        3 %      0 %       4 %      –
  AES           11 %       11 %     11 %      12 %     12 %
  In sample #1 missing values were imputed (by Rubin’s multiple imputation method (note: always report the imputation method!)), five complete data sets were analyzed, and the results were averaged (and compared to the results from an analysis of the data set with listwise deletion: not very different)

  36. Latent variables (factors) identified by correspondence analysis (”a close relative of principal components analysis”): in both samples (only) one underlying dimension (”in both samples the eigenvalue of a possible second factor was not large enough to indicate the existence of a second factor”). The loadings of the AES variables on the dimension, by site and sample, give the same results, and do not identify any variable as not belonging to the factor:
              Sample 1 (VA)   Sample 1 (PA)   Sample 2
  Influence        .73             .77           .53 (OK?)
  Control          .83             .74           .80
  Choice           .82             .80           .90
  Freedom          .76             .75           .83
  Idea             .77             .77           .86

  37. ”Internally consistent responses (Cronbach’s alpha?) in both samples and for both formats” (AES, AEI) – and also when survey was administered alone and right after the interview (= patients’ perception of coercion can be measured by the less expensive survey (instead of by the interview)) AES Perceived Coercion Scores correlated strongly with AEI Perceived Coercion Scores (concurrent construct validity (correlations not presented))

  38. Verdict On the basis of the presented evidence: Reliability well documented Validity? To quote authors: ”Internal consistency is not validity … Nor are we claiming that these scales have discriminant validity (to discriminate patients’ perception of coercion from their perception of other noxious qualities of the mental hospital)”
