
Competent Use Of Actuarials Requires Understanding Sample-Wise Variations In Both Recidivism And Test Accuracy

Richard Wollert, Ph.D., Diane Lytton, Ph.D., Jacqueline Waggoner, Ed.D., Marc Goulet, Ph.D.

Available at http://richardwollert.com

Presentation Transcript


  1. Competent Use Of Actuarials Requires Understanding Sample-Wise Variations In Both Recidivism And Test Accuracy. Richard Wollert, Ph.D., Diane Lytton, Ph.D., Jacqueline Waggoner, Ed.D., Marc Goulet, Ph.D. Available at http://richardwollert.com. 2005 ATSA Convention, Nov 16-19, Salt Lake City

  2. In A 2004 Article In Sexual Abuse, Doren Compared The 5-Year Score-Wise Recidivism Rates For The Construction Samples Of The RRASOR And Static-99 With A Number Of Generalization Samples. Notes on Abbreviations: • Score-wise Recidivism = SWR = Rate for a given point total • Construction Sample = CS = Developmental Sample • Generalization Sample = GS = A Comparison Sample • Base Rate = BR = A Sample's Overall Recidivism Rate

  3. The Purpose Of These Comparisons • “To discover the degree to which the risk percentages for each instrument score replicate across different samples and different underlying base rates” (p. 27).

  4. As A First Step In This Study, Many Data Sets Were Obtained From Different Sources • The data sets, or generalization samples, reported the number of recidivists and non-recidivists at each test score.

  5. Two Procedures Were Used To Combine The Data From the GSs • 1. Recidivism data for all GSs were pooled into a single “mega-sample” that was stratified by test scores. • Samples with BRs below that of the CS were not differentiated from samples with higher BRs. • 2. The data from the GSs were combined to form 8 “semi-overlapping groups” that varied in their overall recidivism rates (from about 6% to 29%). • These were also stratified by test scores.
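
A minimal sketch of the pooling step, assuming each sample is recorded as a mapping from test score to (recidivist, non-recidivist) counts; the function names and data layout are illustrative, not taken from the original study.

    from collections import defaultdict

    def pool_samples(samples):
        """Pool several generalization samples into one mega-sample,
        keeping the data stratified by test score.

        samples: list of dicts mapping score -> (n_recidivists, n_nonrecidivists)
        """
        pooled = defaultdict(lambda: [0, 0])
        for sample in samples:
            for score, (recid, nonrecid) in sample.items():
                pooled[score][0] += recid
                pooled[score][1] += nonrecid
        return {score: tuple(counts) for score, counts in pooled.items()}

    def base_rate(sample):
        """Overall recidivism rate (BR) for one sample."""
        recidivists = sum(r for r, _ in sample.values())
        total = sum(r + n for r, n in sample.values())
        return recidivists / total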

  6. The Table Below Shows How GSs Were Combined To Form 8 “Semi-Overlapping” Groups

  7. The Data Were Analyzed Using Two Chi-Square Designs • 1. The recidivism rate for each test score in the CS was compared with the rate for the corresponding score that was derived from the mega-sample. • 2. The recidivism rate for each test score in the CS was compared with the rate for the corresponding score from each of the overlapping samples.
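
A sketch of one cell of the first design, assuming each comparison is a 2 x 2 chi-square on recidivist/non-recidivist counts for a single score (CS vs. mega-sample); the counts below are placeholders, not figures from the article.

    from scipy.stats import chi2_contingency

    # Hypothetical frequencies at one test score: [recidivists, non-recidivists]
    construction_counts = [12, 88]    # placeholder CS counts
    mega_sample_counts = [30, 170]    # placeholder pooled GS counts

    chi2, p_value, dof, expected = chi2_contingency(
        [construction_counts, mega_sample_counts]
    )
    print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")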

  8. Here Is A Format For Organizing RRASOR Data In The First Analysis. Two “Summary” Experience Tables Are Contrasted At Each Of Six Levels.

  9. Here Is A Format For Organizing RRASOR Data For The Second Analysis. Two Summary Experience Tables Were Again Contrasted.

  10. Seven Other Sets Of Tables Like The One Above Were Also Part Of The Second Analysis Because Data Were Combined To Make 8 Groups. Here Is The Last One.

  11. Findings • One significant difference in 13 tests was found when the SWR rates from the mega-sample were compared against those from the CS. • Relatively few significant differences were observed when the recidivism rates for the overlapping groups with overall base rates ranging from 9% to 21% were compared against those from the CS.

  12. A Number of Claims Were Based On These Patterns Of Non-significant Findings • Every 5-year SWR rate from each of the CSs was “replicated” (p. 33) in the GSs. • The SWR rates “remained essentially unchanged … through a range of plus or minus 6% around a center point” (p. 33). • For the RRASOR the center point was 13%. • For Static-99 it was 15%.

  13. Some Guidelines For Evaluators Who Administer The RRASOR and Static-99 Were Also Formulated On The Basis Of These Findings • When using the RRASOR, they can always assign the SWR rates for the CS because no meaningful differences in SWR rates were found between the CS and groups with differing BRs. • With Static-99 they should determine if an offender is from a parent population with a very high or low BR (because some differences were found in these regions). • It was recommended that the rate for the CS be assigned where the BR for the parent population ranges from 9-21%.

  14. The Author Also Claimed His Results Provided Empirical Evidence That SWR Rates Don’t Always Fluctuate When The BR For One Sample Differs From Another • In particular, he stated that “although it may have been believed that a sample’s underlying base rate could effect (sic) the interpretation of the actuarial instruments’ scores, that belief was found largely not supported (in my analysis) … the argument has become significantly weaker that an unknown sample recidivism base rate affects the interpretability of actuarial scores” (p. 34, Stability of the Interpretive Risk Percentages for the RRASOR and Static-99, 2004, Sexual Abuse, 16, 25-36).

  15. Some Evaluators May Be Tempted To Justify SVP Civil Commitment Recommendations On The Basis Of This Article. Here Is One Possible Train Of Logic. • Defendant Jones has a high RRASOR score. • The BR for the parent population from which defendant Jones was drawn may be lower than that for the RRASOR CS. • Doren has shown that SWR rates for high RRASOR scores are the same even when the BR for one sample is lower than that of another. • The recidivism rate for high scorers in the CS is therefore applicable to Jones.

  16. Experts Who Rely On This Argument Run The Risk Of Providing Information That Is Misleading • Why? Because the research in question contains many methodological and conceptual flaws. • We will discuss only one of these flaws today, but we believe it is so fundamental and devastating that it invalidates the findings, conclusions, and interpretations reported in the article of concern.

  17. The Flaw Is This: The Original Research Question Was Posed Too Narrowly To Fully Address The Issue Of Replication • From the stated purpose and the article’s context, it is apparent that replication was conceived of as simply the stability of recidivism rates for each score over different summary experience tables.

  18. Score-Wise Recidivism Is Defined By A Math Formula, However. A Variation On Bayes’s Theorem, The Formula Is E = PT / (PT + QF) • P = The base rate for those with test scores that fall within a specified range of scores. • The range could include all scores (Case “A”: scores 0-6+ on Static-99) or a subset of scores (Case “B”: scores of only 4-6+). • Q = The non-recidivism rate, which is always 1 - P. • T = The true positive fraction: The % of recidivists with high scores in a specified range of scores. • Case A: # recidivists with 6+ scores / # recidivists with 0-6+ scores • Case B: # recidivists with 6+ scores / # recidivists with 4-6+ scores • F = The false positive fraction: The % of non-recidivists with high scores in a range of scores.

  19. Using Bayes’s Theorem To Calculate E For Case A (0-6+): E = (.180 x .256) / ((.180 x .256) + (.820 x .089)) = .39

  20. Using Bayes’s Theorem To Calculate E For Case B (4-6+): E = (.315 x .371) / ((.315 x .371) + (.685 x .286)) = .374
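
A minimal sketch of the calculation on slides 18-20: swr() implements E = PT / (PT + QF), and the values below are the Case A and Case B figures shown above.

    def swr(p, t, f):
        """Score-wise recidivism rate E = PT / (PT + QF), where Q = 1 - P."""
        q = 1.0 - p
        return (p * t) / ((p * t) + (q * f))

    # Case A (scores 0-6+ on Static-99): P = .180, T = .256, F = .089
    print(round(swr(0.180, 0.256, 0.089), 2))    # 0.39

    # Case B (scores 4-6+ only): P = .315, T = .371, F = .286
    print(round(swr(0.315, 0.371, 0.286), 3))    # 0.374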

  21. Several Principles May Be Deduced From E = PT / (PT + QF) • 1. Each score-wise rate reported in a summary experience table is a function of several variables (P, T, and F) that constitute an underlying (and rarely disseminated) “component” experience table. • 2. Samples may have similar score-wise recidivism rates, but differ with respect to P, T, or F (see slide 22). • 3. A score-wise recidivism rate is truly replicated only when the associated values of P, T, and F from different experience tables are replicated (also see slide 22).

  22. Variations In P, T, and F May Be Found For Samples With Similar SWR Rates: An Example Using Static-99 Data (Note: E Is Obtained By Applying Bayes’s Theorem)

  23. Other Principles • 4. If T and F are stable, the recidivism rate will change only if P changes. • 5. If P and Q are stable, the score-wise recidivism rate will change as a function of changes in the “likelihood” ratio of T/F.
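
A short illustration of principle 4: holding T and F fixed at the Case B values from slide 20 (an assumption made only for illustration), the score-wise rate E moves with the base rate P.

    def swr(p, t, f):
        """E = PT / (PT + QF), with Q = 1 - P."""
        q = 1.0 - p
        return (p * t) / ((p * t) + (q * f))

    T, F = 0.371, 0.286    # held constant, so the likelihood ratio T/F stays near 1.3
    for p in (0.06, 0.13, 0.21, 0.29):
        print(f"base rate {p:.2f} -> score-wise rate {swr(p, T, F):.2f}")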

  24. These Mathematical Facts Hold Important Implications For The Research Being Analyzed • Recall that the author concluded that the score-wise recidivism rates for samples with different overall recidivism rates did not differ from one another. • Assuming that acceptance of the null hypothesis is justified, this can mean only one thing. • The likelihood (T/F) ratio changed from one sample to another. • Mossman, who has published many articles on ROC analysis and Bayes’s theorem, made the same point about Doren’s research in an article that has been accepted for publication in Sexual Abuse.

  25. We Tested This Hypothesis After Obtaining The Frequency Data Analyzed In The Original Study • Adopting 5 as a high score on the RRASOR, likelihood ratios were calculated for the construction sample and for all generalization samples where this was possible. • It was impossible to define LRs for 3 of 10 samples. • Adopting 6 as a high score, equivalent calculations were undertaken for the Static-99 construction sample and for all generalization samples.
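
A sketch of the likelihood-ratio calculation, assuming the frequency data give, for each sample, the number of recidivists and non-recidivists at or above the chosen high score and in the full score range; the function and argument names are illustrative.

    def likelihood_ratio(recid_high, recid_total, nonrecid_high, nonrecid_total):
        """LR = T / F, where T is the fraction of recidivists at or above the
        high score and F is the corresponding fraction of non-recidivists."""
        # Undefined when a sample has no non-recidivists at or above the high score.
        t = recid_high / recid_total
        f = nonrecid_high / nonrecid_total
        return t / f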

  26. Other Steps Of The Re-analysis • Upper and lower 95% confidence intervals (α = .05) were established for the LRs from the RRASOR and Static-99. • The LRs for the generalization samples were plotted against the confidence intervals for the construction sample. • Data for other scores were not analyzed because recidivism rates for lower scores were correlated with recidivism rates for maximum scores.
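
One standard way to put a 95% confidence interval around a likelihood ratio is the log method (e.g., Simel, Samsa, & Matchar, 1991); the sketch below assumes 2 x 2 counts of the kind used above and is not necessarily the exact procedure followed in the re-analysis.

    import math

    def lr_confidence_interval(recid_high, recid_total,
                               nonrecid_high, nonrecid_total, z=1.96):
        """Point estimate and 95% CI for LR = T / F via the log method."""
        t = recid_high / recid_total
        f = nonrecid_high / nonrecid_total
        lr = t / f
        se_log = math.sqrt(1 / recid_high - 1 / recid_total
                           + 1 / nonrecid_high - 1 / nonrecid_total)
        lower = math.exp(math.log(lr) - z * se_log)
        upper = math.exp(math.log(lr) + z * se_log)
        return lr, lower, upper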

  27. All Likelihood Ratios From The RRASOR Generalization Samples Were Significantly Different From The Likelihood Ratio For The RRASOR Construction Sample

  28. The Likelihood Ratios In 6 Of 7 Generalization Samples Were Significantly Different From The Likelihood Ratio For The Static-99 Construction Sample

  29. Correlational Analyses Indicated That Test Accuracy Decreased As Base Rates Increased • RRASOR LRs with sample-wise base rates: r = -.52 (n = 8; p = .17). • Static-99 LRs with sample-wise base rates: r = -.86 (n = 8; p < .01).
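
Once the per-sample base rates and likelihood ratios have been tabulated, this step can be reproduced with a standard Pearson correlation; the helper below is a sketch that expects the actual sample values as input.

    from scipy.stats import pearsonr

    def lr_base_rate_correlation(base_rates, likelihood_ratios):
        """Pearson r (and two-tailed p) between sample-wise BRs and LRs."""
        r, p_value = pearsonr(base_rates, likelihood_ratios)
        return r, p_value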

  30. Implications Of This Re-analysis For The Research Under Consideration • Score-wise recidivism rates were not replicated in the criticized research because similarities in rates were an artifact of offsetting fluctuations in likelihood ratios. • Characterizing the principle that score-wise recidivism rates vary with base rate differences as a “belief” is misleading. As long as F and T are stable, it is a mathematical fact. • Proposing guidelines for evaluators to follow that conflict with Bayes’s theorem is potentially harmful because of the increase in prediction errors that this may occasion.

  31. Practice Implications • Variations in detection indicia and base rates raise doubts about the applicability of published SWR rates for RRASOR and Static-99 to local populations. • Agencies that use these tests should consider re-norming them on local populations. • The correlational analyses suggest that these tests are most inaccurate for populations that are of greatest concern because of their high recidivism rates. • When using these tests, examiners should disclose their assumptions about P, T, and F, and present data that support their assumptions.

  32. Research Implications • Current data on representative and large samples would facilitate meaningful replication research. • Test developers might improve accuracy by investigating factors that produce fluctuations in likelihood ratios. • Why is accuracy so diminished in groups with high base rates? • Component experience tables should be compiled to accompany summary tables. These tables should include frequency data for true positives, true negatives, false positives, and false negatives. Associated Bayesian values should also be included. Each table should describe subjects and sampling methods.
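
A sketch of what one row of a component experience table might contain, assuming frequency counts of true/false positives and negatives at a given cutoff score; the class and field names are illustrative, not a format prescribed in the original.

    from dataclasses import dataclass

    @dataclass
    class ComponentRow:
        """One cutoff score's entry in a component experience table."""
        score: int
        true_positives: int       # recidivists at or above the score
        false_negatives: int      # recidivists below the score
        false_positives: int      # non-recidivists at or above the score
        true_negatives: int       # non-recidivists below the score

        @property
        def p(self):    # base rate
            recidivists = self.true_positives + self.false_negatives
            total = recidivists + self.false_positives + self.true_negatives
            return recidivists / total

        @property
        def t(self):    # true positive fraction
            return self.true_positives / (self.true_positives + self.false_negatives)

        @property
        def f(self):    # false positive fraction
            return self.false_positives / (self.false_positives + self.true_negatives)

        @property
        def e(self):    # score-wise recidivism rate via Bayes's theorem
            return (self.p * self.t) / ((self.p * self.t) + ((1.0 - self.p) * self.f))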

  33. Component Experience Tables Should Be Compiled To Accompany Summary Experience Tables
