440 likes | 721 Views
The most dangerous equation. Howard Wainer National Board of Medical Examiners. The danger equation. P(Y) = P(Y|x=1) P(x=1) + P(Y|x=0) P(x=0) Where Y = The event of looking like an idiot, x = 1 The event of knowing the equation in question x = 0 The event of not knowing the equation .
E N D
The most dangerous equation Howard Wainer National Board of Medical Examiners
The danger equation P(Y) = P(Y|x=1) P(x=1) + P(Y|x=0) P(x=0) Where Y = The event of looking like an idiot, x = 1 The event of knowing the equation in question x = 0 The event of not knowing the equation. This equation makes explicit the two aspects of what constitutes a dangerous equation.
There are two obvious pieces: • Some equations are most dangerous if you know them (the first term on the right side), • Others are dangerous if you do not (the second term).
I will not explore the dangers an equation might hold because the secrets within its bounds open doors behind which lie terrible peril. Few would disagree that the obvious winner in this contest would be Einstein’s iconic equation E = MC2 (1) for it provides a measure of the enormous energy hidden within ordinary matter.
This is not the direction I wish to pursue. Instead I am interested in equations that unleash their danger, not when we know about them, but rather when we do not; Equations that allow us to understand things clearly, but their absence leaves us dangerously ignorant.
Graphically, we can place all equations on a scale like that depicted below:
There are many plausible nominees for the title of “Most Dangerous” For example, economists might, with ample evidence, pick Kelley’s equation
See for example, Baumol et al (1989), Carnevale (1999), King (1934), Mills (1924), Secrist (1933), Sharpe (1985), and Williamson (1991) who interpreted regression toward the mean as having economic causes rather than merely reflecting the uncertainty of prediction. For more details of its misuse see Friedman (1992), Hotelling (1933), Stigler (1997), and Wainer & Brown (2007).
Others might suggest that even more have been led astray through their ignorance of Bayes’ Theorem P(X|Y) = P(Y|X) P(X)/P(Y) (3) See the Nobel Prize winning work of Kahnemann & Tversky (e.g. Kahneman, 2002) who provide empirical evidence of how human decision-making can go wildly wrong because this equation is not built into human intuition.
Others (Mosteller & Tukey, 1977) have suggested that the linear model Y = b1x1+ b2x2 + …+ bnxn (4) might be considered a prime candidate because: 1) It suggests that b1 tells about the relationship between y and x1, which is not the case, yet many interpret it in that way. 2) It encourages fallacious causal interpretation (if I change x1 by 1 unit then y will change by b1 units). 3) It encourages fallacious interpretation even by those who think they are being careful. ("I can't know the value of the coefficient, but surely its sign tells whether increasing x1 will increase or decrease y.") 4) It is badly non-robust, but rarely diagnosed appropriately, so many models are misleading.
And5) It assumes linearity at a high-dimensional level, which is hard to check. But even at the bivariate and trivariate levels, often the data are not linear, but not checked.6) When applied to observational data (as it almost always is), it is difficult to know whether an appropriate set of predictors has been selected -- and if we have an inappropriate set, our interpretations are questionable. 7) It collapses with little warning when computations are degenerate.It is dangerous, ironically, because it can be the most useful model for the widest variety of data when wielded with caution, wisdom, and much interaction between the analyst and the computer program.
Although these are all worthy competitors I believe that the championship title must go to De Moivre’s (1730) equation: (5)
I arrived at this conclusion for three reasons related to: (i) The extreme length of time that ignorance of it has caused confusion, (ii) The wide breadth of areas of application that have been misled, and (iii) The seriousness of the consequences that such ignorance has caused.
In the balance of this talk I will describe five very different situations in which ignorance of De Moivre’s equation has led to billions of dollars of loss over centuries yielding untold hardship; these are but a small sampling, there are many more.
Example 1. The Trial of the Pyx: six centuries of misunderstanding
In 1150, a century after the Battle of Hastings, it was recognized that the king could not just print money and assign it to have any value he chose. Instead the coinage’s value must be intrinsic, based on the amount of precious materials in its make-up. And so standards were set for the weight of gold in coins – a guinea should weigh 128 grains (there are 360 grains in an ounce). It was recognized, even then, that coinage methods were too imprecise to insist that all coins be exactly equal in weight, so instead the king and the barons, who supplied the London Mint (an independent organization) with gold, insisted that coins when tested in the aggregate (say 100 at a time) conform to the regulated size plus or minus some allowance for variability (1/400th of the weight) which for one guinea would be 0.32 grains and so, for the aggregate, 32 grains). Obviously, they assumed that variability increased proportionally to the number of coins and not to its square root. This deeper understanding lay almost 600 years in the future with De Moivre’s (1730) exploration of the binomial distribution (Stigler, 1999, 367-368).
The costs of making errors are of two types. If the average of all the coins was too light the barons were being cheated, for there would be extra gold left over after minting the agreed number of coins. This kind of error is easily detected and, if found, the director of the Mint would suffer grievous punishment. But if the variability were too great it would mean that there would be an unacceptably large number of too heavy coins produced that could be melted down and recast with the extra gold going into the pockets of the minter. By erroneously allowing too much variability it meant that the Mint could stay within the bounds specified and still make extra money by collecting the heavier-than-average coins and reprocessing them. The fact that this error was able to continue for almost 600 years provides strong support for De Moivre’s equation to be considered a strong candidate for the title of most dangerous equation.
Figure 3. Age adjusted kidney cancer rates for all US counties in 1980-1989 shown as a function of the log of the county population.
Example 3.The small schools movement: Billions for increasing variance In the late 1990s The Bill and Melinda Gates Foundation began supporting small schools on a broad-ranging, intensive, national basis. By 2001, the Foundation had given grants to education projects totaling approximately $1.7 billion. They have since been joined in support for smaller schools by the Annenberg Foundation, the Carnegie Corporation, the Center for Collaborative Education, the Center for School Change, Harvard’s Change Leadership Group, Open Society Institute, Pew Charitable Trusts, and the U.S. Department of Education’s Smaller Learning Communities Program. The availability of such large amounts of money to implement a smaller schools policy yielded a concomitant increase in the pressure to do so, with programs to splinter large schools into smaller ones being proposed and implemented broadly (e.g. New York City, Los Angeles, Chicago and Seattle).
Obviously the hope was that smaller schools will yield higher achievement, or E(achievement|small) > E(achievement|big) But what was the supporting evidence? It was found that when one looks at high performing schools one is apt to see an unrepresentatively large proportion of smaller schools. Or, stated mathematically, that P(small|high achievement) is greater than anticipated by the number of small schools. Unfortunately, the latter does not imply the former.
5th grade math scores by school size in Pennsylvania in 2005.Small schools are overabundant at both extremes.
In these data from Pennsylvania we see that 12% of the highest performing schools are from the 3% smallest (a 400% over-presentation). But alas there is a similar over-representation among the worst performing schools. The regression line is flat – in 5th grade at least, school size has no effect.
In high school the story is different There is still an over-representation of small schools at both extremes, but now the regression line has a strong positive slope. Small schools perform worse!
On October 26, 2005 , after expenditures of over $1.7 billion, Lynn Thompson, in an article in The Seattle Times reported that: “The Gates Foundation announced last week it is moving away from its emphasis on converting large high schools into smaller ones and instead giving grants to specially selected school districts with a track record of academic improvement and effective leadership.”
This point of view was amplified in a study by Schneider, Wysse & Keesler, that was reported in an article by Debra Viadero in Education Week (June 7, 2006) in which after a careful analysis of matched students in schools of varying sizes Ms. Schneider concluded, “I’m afraid we have done a terrible disservice to kids.”
Example 4. The Safest Cities to Drive in In the June 18, 2006 issue of the New York Times (News of the Week in Review, page 2) there was a short article that listed the ten safest US cities and the ten most unsafe based on an automobile insurance company statistic “average number of years between accidents”. The cities were drawn from the 200 largest cities in the US. It should come as no surprise that a list of the ten safest cities or the ten most dangerous cities have no overlap with the ten largest cities.
Example 5. Sex differences “It does appear that on many, many, different human attributes – height, weight, propensity for criminality, overall IQ, mathematical ability, scientific ability – there is relatively clear evidence that whatever the difference in means – which can be debated – there is a difference in standard deviation/variability of a male and female population. And it is true with respect to attributes that are and are not plausibly, culturally determined.” Lawrence Summers (2005)
Note that the ratio of standard deviations was 1.10 in Project Talent(a 1960 study of 73,000 15 year olds). The reduction in the difference in variability over the intervening 40 years may be progress or may just reflect a difference in the character of the tests.
In discussing Lawrence Summers’ remarks Christiane Nüsslein-Volhard, the 1995 Nobel Laureate in Physiology/Medicine, said, “He missed the point. In mathematics and science, there is no difference in the intelligence of men and women. The difference in genes between men and women is simply the Y chromosome, which has nothing to do with intelligence.” (Dreifus, July 4, 2006)
Is Professor Nüsslein-Volhard’s syllogism: 1. There are no differences in intelligence between men and women. 2. Men have a Y and women don't. 3. Therefore nothing about intelligence can be carried on Y. Or is it: 1. There is nothing about intelligence carried on Y (what evidence is there on this?) . 2. The only genetic difference between the sexes is men have a Y and women do not, 3. Therefore there are no differences in intelligence between men and women.
The first argument seems circular, so let us assume that she meant the latter and I am just ignorant of the evidence she refers implicitly to in statement (1). If so perhaps it is Professor Nüsslein-Volhard who missed the point here. The Y chromosome is not the only difference between the sexes!
Summers’ point was that when we look at either extreme of an ability distribution we will see more of the group that has greater variation. Any mental trait that is conveyed on the x-chromosome will have larger variability among males than females, for females have two x-chromosomes to only one for males. Thus, from De Moivre’s equation, we would expect, ceteris paribus, about 40% more variability among males than females. The fact that we see less than ten percent greater variation in NAEP demands the existence other modes of transmission.
Obviously there must be major causes of high-level performance that are not carried on the x-chromosome, and indeed some causes are not genetic. But it suggests that for some skills between 10% and 25% of the increased variability is likely to have had its genesis on the x-chromosome.
This view gained further support in studies by Arthur Arnold and Eric Vilain of UCLA that were reported by Nicholas Wade of the New York Times on April 10, 2007. He wrote, “It so happens that an unusually large number of brain-related genes are situated on the X chromosome. The sudden emergence of the X and Y chromosomes in brain function has caught the attention of evolutionary biologists. Since men have only one X chromosome, natural selection can speedily promote any advantageous mutation that arises in one of the X’s genes. So if those picky women should be looking for smartness in prospective male partners, that might explain why so many brain-related genes ended up on the X.”
He goes on to conclude, “Greater male variance means that although average IQ is identical in men and women, there are fewer average men and more at both extremes. Women’s care in selecting mates, combined with the fast selection made possible by men’s lack of backup copies of X-related genes, may have driven the divergence between male and female brains.”
Conclusions Humans don’t fully comprehend the effect that variation, and especially differential variation, has on what we observe. Daniel Kahneman’s 2002 Nobel Prize was for his studies on intuitive judgment; he showed that humans don’t intuitively “know’ that smaller hospitals would have greater variability in the proportion of male to female births.
But such inability is not limited to humans making judgments in psychology experiments. • Routinely small hospitals are singled out for special accolades because of their exemplary performance only to slip toward average in subsequent years. Explanations typically abound that discuss how their notoriety has overloaded their capacity. • Similarly small mutual funds are recommended, post hoc, by Wall Street analysts only to have their subsequent performance disappoint investors.
The list goes on and on, adding evidence and support to my nomination of De Moivre’s equation as the most dangerous of them all.