The Mismeasure of Group Differences in the Law and the Social and Medical Sciences

The Mismeasure of Group Differences in the Law and the Social and Medical Sciences American University Department of Mathematics and Statistics Colloquium Washington, DC, Sept. 25, 2012 James P. Scanlan Attorney at Law Washington, DC jps@jpscanlan.com

Key Points One: Standard measures of differences between outcome rates (proportions) are problematic for appraising the comparative situation of groups reflected by a pair of rates because – for reasons inherent in the underlying risk distributions – each measure tends to be affected by the prevalence of an outcome. Most notably, the rarer an outcome, (a) the greater tends to be the relative difference in experiencing it and (b) the smaller tends to be the relative differences in avoiding it. Absolute differences and odds ratios tend also to be affected by the prevalence of an outcome, though in a more complicated way than the two relative difference. Two: Efforts to appraise differences in the circumstances of two groups reflected by a pair of outcome rates in the law and the social and medical sciences have been universally undermined by failure to recognize the way the chosen measures tend to be affected by the prevalence of an outcome.

Key Points (cont’d) Three: There exists only one answer to the question of whether differences in the circumstances of advantaged and disadvantaged groups reflected by outcome rates have increased or decreased or are larger in one setting than another. Four: That answer can be divined, albeit imperfectly, by deriving from pairs of outcome rates the difference between means of the underlying risk distributions.

Clarifications • Issues raised here affect every effort to appraise the size of the difference between the circumstances of two groups reflected by a pair of outcome rates, including whether such difference should be deemed large or small. • They do not affect conclusions about whether a disparity exists, save when conclusions are influenced by misperceptions about the comparative size of differences between rates. • Of course, the same features of normal distributions underlying the patterns described here are implicated in efforts to adjust for factors that may explain differences. See Underadjustment Issues page. But that issue is not addressed here.

Caveat One • Do not be distracted by fact that one commonly finds departures from the patterns I describe here. Observed patterns are invariably functions of (a) the strength of the forces causing the difference and (b) the prevalence-related /distributionally-driven forces described here. • Society’s interest is in (a). • Only with an understanding of (b) can one discover (a).

Caveat Two • Do not be distracted by the fact that distributions may not be normal. • That may complicate efforts to interpret differences by use of a theoretically sound measure. • But it is no basis for relying on standard measures as if the patterns described here did not exist.

Caveat Three • Do not think that presenting relative and absolute differences (or even both of the two relative differences and absolute differences) by any means addresses the issues raised here. • The fundamental problem is that none of the measures is statistically sound.

Caveat Four • Do not find the points made here “interesting.” • Find them incorrect and disregard them or find them correct and radically alter the way you examine differences in outcome rates.

Two Types of Errors from Failing to Master the Patterns Described Here • Substantive: Misinterpretations of whether differences in circumstances of advantaged and disadvantaged groups have changed over time or otherwise are larger in one setting than another. • Inferential: Drawing erroneous conclusions about other things based on misperceptions about the comparative size of differences

Table 1: Explanation of Measures By References to Test Passage/Failure For some nuances see Semantic Issues, Times Higher, and Percentage Points pages.

Interpretive Rule 1 (IR1): The Two Relative Differences(Heuristic Rule X (HRX), Scanlan’s Rule) The rarer an outcome the greater tends to be the relative difference in experiencing it and the smaller tends to be the relative difference in avoiding it.

Table 2: Simplified Illustration of Effects of Lowering Test Cutoff (National Law Journal 2012, Recorder 2012)

Figure 1: Figurative Simplified Illustration of Lowering Test Cutoff

Fig. 2. Ratios of (1) DG Fail Rate to AG Fail Rate and (2) AG Pass Rate to DG Pass Rate at Various Cutoff Points Defined by AG Fail Rate

Other Illustrative Data • Income Illustrations • NHANES Illustrations • Framingham Illustrations • Life Table Illustration • Credit Score Illustrations

IR1 Implications – General (1) • Test pass/test fail (proficiency/non-proficiency) • Poverty/non-poverty (Feminization of Poverty) • Mortality/survival (Mortality and Survival) • Immunization/no immunization (Immunization Disparities) • Hypertensive/normal (NHANES Illustrations, ICHPS 2008) • Low folate/adequate folate (NHANES Illustrations, Comment on Dowd IJE 2008) • Loan rejection/loan approval (Lending Disparities) • Expulsion/retention (Discipline Disparities, Los Angeles SWPBS)

IR1 Implications – General (2) • Less discriminatory alternatives (Discipline Disparities (B-D), Disparate Impact, Less Discriminatory Alternative – Substantive)) • Lending Issues • Performance/retention standards • Disqualifying criteria (arrest/convictions/bad credit) • Mandatory sentencing (three-strikes etc.) • Discretion/review • Four-Fifths Rule

IR1 Implications – Subpopulations • Racial differences in infant mortality among highly-educated (“Race and Mortality”) • SES (racial) differences among different racial (SES) groups (“Perils of Provocative Statistics”) • Occupational differences in mortality among British Civil Servants (Whitehall Studies) • Racial and socioeconomic differences in mortality among younger age groups (Life Tables Illustrations) • Racial differences in mortgage rejection rates among high income applicants (Disparities – High Income) • Racial differences in completion/non-completion rates at elite universities • Suburban discipline disparities (Suburban Disparities) • Nordic health disparities

Corollary 1 to IR1 • As an outcome changes in overall prevalence, the group with a lower baseline rate for experiencing the outcome will tend to undergo a larger proportionate change in the rate of experiencing the outcome while the other group will tend to experience a larger proportion change in the opposite outcome. • E.g., in simplified illustration of lowering test cutoff (Table 2), fail rate decreased 75% for AG and 64.9% for DG; pass rate increased 18.8% for AG and 38.1% for DG.

Implications of Corollary 1 to IR 1 • Effects of reductions/increases in poverty • Effects of lowering/raising cutoffs (improving performance) • Effects of improving health outcomes • Explanatory theories: “diffusion of innovation,” “inverse equity hypothesis” (Explanatory Theories) • Effects of chronic conditions on self-rated health (Reporting Heterogeneity, Comment on Delpierre BMC Pub Hlth 2012) • Subgroup effects (Subgroup Effects, Illogical Premises)

Corollary 2 to IR1 • When the prevalence of an outcome declines, the group most susceptible to the outcome will tend to comprise both (a) a larger proportion of those continuing to experience the outcome and (b) a larger (sic) proportion of those no longer experiencing the outcome. (Feminization of Poverty, Table 1 of Chance 2006)

Fig. 3. Proportion DG Comprises of (1) Persons Who Fail and (2) Persons Who Pass at Various Cutoff Points Defined by AG Fail Rate

Implications of Corollary 2 to IR1 • Feminization of Poverty • Racial impact of Proposition 48 • Any discussion of the proportion a group comprise of persons experiencing some adverse outcome (addressed infra)

Absolute Differences/Odds Ratios • Absolute differences and differences measured by odds ratios are unaffected by whether one examines the favorable or the adverse outcome. • But an effective indicator must remain constant when there occurs a change in overall prevalence akin to that effected by lowering a test cutoff. • Absolute differences and odds ratios tend also to be affected by the prevalence of an outcome but in a more complicate way than the two relative differences.

Interpretive Rule 2(IR 2): Absolute Differences/Odds Ratios • AD Formulation A: As the prevalence of an outcome changes, absolute differences between rates tend to change in the same direction as the smaller of the two relative differences (save when one group’s rate of experiencing either outcome is above 50% and the other group’s rate is below 50%). • AD Formulation B: As an outcome goes from being rare to being universal, absolute differences tend to increase to the point where the first group’s rate reaches 50%; behave inconsistently until the second group’s rate reaches 50%; then decline. • As the prevalence of an outcome changes, differences measured by odds ratios tend to change in the opposite direction of absolute differences.

Figure 4: Two Normal Distributions

Fig. 5: Ratios of (1) DG Fail Rate to AG Fail Rate, (2) AG Pass Rate to DG Pass Rate, (3) DG Failure Odds to AG Failure Odds; and (4) Absolute Difference Between Rates Zone A ●

Fig. 6. Ratios of (1) Black to White Rates of Falling Below Percentages of Poverty Line, (2) White to Black Rates of Falling Above the Percentage, (3) Black to White Odds of Falling Below the Percentage: and (4)Absolute Differences Between Rates ●

Fig. 7. Ratios of (1) Black to White Rates of Falling Above Various Systolic Blood Pressure Levels, (2) White to Black Rates of Falling below the Level, (3) Black to White Odds of Falling Above the Level; and (4) Absolute Difference Between Rates (NHANES 1999-2000, 2001-2002, Men 45-64) ●

Implications of IR2 (1) • As uncommon procedures (e.g., coronary artery bypass grafting, knee replacement) increase, absolute differences tend to increase; as common procedures (e.g., mammography) increase, absolute differences tend to decrease. (APHA 2007, Comments on Vaccarino etc. NJEM 2005, Schneider JAMA 2001, Trivedi JAMA 2006 (2007), Sequist Arch Int Med 2006, McWilliams Ann Int Med 2009) • As procedures go from being uncommon to being very common absolute differences tend to increase then decrease. • Increased proficiency in more difficult subjects will tend to increase absolute differences, while increased proficiency in easier subjects will tend to reduce absolute differences. (Educational Disparities)

Implications of IR2 (2) • For outcomes or settings with generally low rates, higher rates tend to be associated with larger absolute differences; for outcomes or settings with generally high rates, higher rates tend to be associated with lower absolute difference. (Between Group Variance, Comment on Baicker Hlth Aff 2004) • Pay for Performance Issues (addressed infra).

References - Articles • “Can We Actually Measure Health Disparities?,” Chance 2006 • “Race and Mortality,” Society 2000 • “Divining Difference,” Chance 1994 • “The Perils of Provocative Statistics,” Public Interest 1991 • “’Feminization of Poverty’ is Misunderstood,” Plain Dealer 1987 • [“Illusions of Job Segregation,” Public Interest 1988]

References – Conference Presentation • “The Misinterpretation of Health Inequalities in the United Kingdom,” British Society for Population Studies (2006) • “Measurement Problems in the National Healthcare Disparities Report,” American Public Health Association (2007) • “ Can We Actually Measure Health Disparities,” 7th International Conference for Health Policy Statistics (2008) • “Measuring Healthcare Disparities,” 3rd North American Congress of Epidemiology (2011) • “Perverse Perceptions of the Impact of Pay for Performance on Healthcare Disparities,” 9th International Conferences for Health Policy Statistics 2011

References – Web Pages • Measuring Health Disparities Page (MHD), especially Section E.7 (Consensus/Nonconcensus) • Scanlan’s Rule Page (SR) • Mortality and Survival • Lending Disparities, Discipline Disparities, Educational Disparities, Immunization Disparities , Disparate Impact, Feminization of Poverty • Subgroup Effects Subpage (SR) • Institutional Correspondence Subpage of (MHD) , especially DOJ Letter and Harvard Letter

Things to Follow (1) • Show that value judgments are not involved in a choice of measure of health or other disparities and that there can be only valid interpretation of whether a disparity is larger in one setting than another. • Show a theoretically sound measure of the difference reflected by a pair of rates. • Show the implications of the NCHS decision to measure disparities in terms of relative differences in adverse outcomes and the general disarray in health disparities research.

Things to Follow (2) • Show the illogic of the rate ratio as a measure of association and as a benchmark in subgroup analysis and show a sound method for calculating the number needed to treat. • Explain the mistaken perceptions of the impact of pay for performance on healthcare disparities and Massachusetts’ misguided actions based on one of those perceptions. • Address the inverse relationship between (a) proportion DG comprises of those experiencing an adverse outcome and (b) proportion of DG experiencing the adverse outcome.

Table 3: Illustration of Appraisals of the Comparative Degree of Employer Bias Using Different Measures of Disparities in Selection/Rejection (as an illustration that choice of measure does not involve a value judgment and that all standard measures are unsound) • Parenthetical numbers reflect rankings from most to least discriminatory employer according the particular measure

Table 3a: Illustration of Appraisals of the Comparative Degree of Employer Bias Using Different Measures of Disparities in Selection/Rejection: Answer to which is most biased. • Which employer is in fact most biased? They are all the same. Each row reflects the half standard deviation between means underlying Tables 1 and 2 and Figures 1 through 5. • Moreover, there is no rational argument that they are different.

Additional Factors Supporting Point of Table 3 • Exploring reason for changes in disparities or for why one disparities is larger than another. • Drawing inferences about other things on the basis of appraisals of the comparative size of disparities or effects.

Table 4. Summary of Directions of Changes in MeasuresComment on Escarce and McGuire APHA 2004 Authors had three plausible (and possibly correct) explanations for generally declining relative differences in receipt of uncommon cardio procedures during time of general increases. Researchers relying on relative differences in adverse outcome or absolute differences presumably would seek different reasons for the perceived increases in disparities.

A Sound Measure of Disparity • Implied in Table 3 • Derive from a pair of rates the difference between the means of the underlying, hypothesized normal distributions (in terms of percentage of a standard deviation). • EES for estimated effect size • Solutions sub-page of MHD • Probit (Chester Ittner Bliss 1934)

Table 5. Illustration of Meaning of Various Ratios at Different Prevalence Levels

Problems with the Solution • Always practical issues (we do not really know the shape of the underlying distributions) • Sometimes fundamental issues (e.g., where we know distributions are not normal because they are truncated portions of larger distributions). (Cohort Considerations,Life Tables Illustrations, Credit Score Illustrations, Comment on Boström and Rosen Scan J Pub Health 2003) • Irreducible minimum issues (Irreducible Minimums)

Notwithstanding, the problems the approach remains vastly superior to reliance on any of the standard measures. • And how else, for example, would we be able to divine that the degrees of bias reflected by the actions of the employers in Table 3 are basically the same?

Table 6. Illustration of Problematic Nature of Representational Comparisons

Explanation of Table 6 • Employment discrimination cases and various other matters (e.g., racial profiling analyses) are commonly based on comparisons of the proportion a group comprises of a pool and the proportion it comprises of persons experiencing an outcome. • We can derive the rate ratios from the two proportions, as reflected in the final column. • But we need the actual rates in order to derive the EES and determine which setting reflects the greater difference in the forces underlying the observed patterns.

Immunization Disparities: Table 7: Changes in Total and Black Rates of Pneumococcal and Influenza Vaccination Rates, 1989-1995 (HHS Progress Review: Black Americans, Oct. 26, 1998)

Key Government Approaches to Disparities Measurement • National Center for Health Statistics (NCHS) (Health People 2010, 2020 etc.) • relative differences in adverse outcomes • Agency for Healthcare Research and Quality (AHRQ)(National Healthcare Disparities Report) • whichever relative difference (favorable or adverse) is larger • Centers for Disease Control and Prevention (CDC) (Jan. 2011 Health Disparities and Inequalities Report) • absolute differences between rates (mainly)

Immunization Disparities: Table 7: Changes in Total and Black Rates of Pneumococcal and Influenza Vaccination Rates, 1989-1995 (HHS Progress Review: Black Americans, Oct. 26, 1998)

Table 8: Illustration Based on Morita et. al. (Pediatrics 2008) Data on Black and White Hepatitis Vaccination Rates Pre and Post School-Entry Vaccination Requirement (see Comment on Morita)

The Mismeasure of Group Differences in the Law and the Social and Medical Sciences