
THE EARLY HISTORY OF BAYESIAN STATISTICS Tom Leonard

REFERENCES: A personal history of Bayesian Statistics (2014) Wiley Interdisciplinary Reviews, Comput Stat, 6:80-115 with link to remaining chapters (from 1972) on my website www.thomashoskynsleonard.co.uk Refers to technical material in my book


Presentation Transcript


  1. REFERENCES: A personal history of Bayesian Statistics (2014), Wiley Interdisciplinary Reviews: Comput Stat, 6:80-115, with link to remaining chapters (from 1972) on my website www.thomashoskynsleonard.co.uk. Refers to technical material in my book Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers (1999, with John S.J. Hsu), Cambridge University Press. See also my academic life story The Life of a Bayesian Boy, self-published on my website. Slides prepared by Thomas Tallis. THE EARLY HISTORY OF BAYESIAN STATISTICS, Tom Leonard

  2. Among competing (plausible) hypotheses, the hypothesis with the fewest assumptions should be selected (WILLIAM OF OCKHAM). In other words: keep things simple, and cut out extraneous information. FOR EXAMPLE: Use parameter-parsimonious sampling models which depend on small numbers of unknown parameters (e.g. those which minimise AIC or DIC). Contrasts with: ‘A model should be as big as an elephant’ (Leonard ‘Jimmie’ Savage, 1954; Lindley, 1983). Agrees with: ‘The greater the amount of information the less you actually know’ (Toby Mitchell, c. 1980). Related to: E.T. Jaynes’ extremely valuable idea (1957 and 1968) of choosing the ‘maximum entropy’ prior distribution when only p summaries of the prior information are specified. OCCAM’S RAZOR (William of Ockham, c. 1287-1347)

  3. Pascal and Fermat. Blaise Pascal (1623-1662) formulated ‘Pascal’s Wager’ by reference to the notion of subjective probability. Pascal corresponded with Pierre de Fermat about the potential development of probability theory. In 1654, Pascal and de Fermat (1601 or 1607-1665) together solved the problem of ‘points’ or ‘division of stakes’. In 1657, Christiaan Huygens discussed the Pascal-de Fermat debate in De ratiociniis in ludo aleae.

  4. Daniel Bernoulli (1700-1782): Swiss physicist, doctor and mathematician. Formalised the subjective view of probability, decision making and risk. Introduced the concept of EXPECTED UTILITY in 1738 in a historic paper published in St Petersburg. Used the ST PETERSBURG PARADOX to justify maximising expected utility (the expected reward from the specified betting scheme is infinite, but most punters would only want to place a small bet on the outcome because of the high probability of a low return).
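The slide does not spell out the betting scheme, so the sketch below assumes the standard textbook version of the St Petersburg game: a fair coin is tossed until the first head, and a head on toss k pays 2^k. The function names and the cap on the number of tosses are illustrative assumptions. It shows that the expected reward grows without bound as the cap is raised, while the expected logarithmic utility (the utility form usually associated with Bernoulli's 1738 paper) settles at a finite value.

```python
import math


def truncated_expected_reward(max_tosses):
    """Expected payoff when the game is capped at max_tosses tosses.

    Assumed scheme: first head on toss k (probability 2**-k) pays 2**k.
    Each term contributes exactly 1, so the sum equals max_tosses.
    """
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_tosses + 1))


def truncated_expected_log_utility(max_tosses):
    """Expected log-utility of the payoff under the same capped game."""
    return sum((0.5 ** k) * math.log(2 ** k) for k in range(1, max_tosses + 1))


if __name__ == "__main__":
    for cap in (10, 20, 40):
        print(cap, truncated_expected_reward(cap), truncated_expected_log_utility(cap))
```

Raising the cap increases the expected reward linearly, whereas the expected log-utility converges to 2 ln 2, which is one way to see why a punter who maximises expected utility would stake only a modest amount.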

  5. Educated (from age 12) at the University of Edinburgh between 1723 and 1725. Sceptical views about causality in his 1739-41 trilogy. Questionable cause fallacy: the false assumption that correlation proves causality. Subjective probability discussed in Ch. 6 of his 1748 book. Author of the is-ought problem, or Hume’s guillotine: there is a significant difference between descriptive statements (about what is) and prescriptive statements (about what ought to be), and it is not obvious how to get from descriptive statements to prescriptive ones. Hume’s Law: You can’t derive an ought from an is. David Hume F.R.S.E (1711-1776)

  6. ‘A midget on the shoulders of giants like Hume and Huygens’ (Tom Leonard, 2014). Studied for the Presbyterian ministry at the University of Edinburgh between 1719 and about 1722. Probably derived the continuous version of ‘Bayes’ Theorem’ during the 1740s while a wealthy, well-connected minister in Tunbridge Wells, with a serious demeanour and happy disposition. The Notebook of Thomas Bayes (1747-1760) contains a section on probabilities. In his tract In defence of Isaac Newton (1736, printed by John Noon), sold for a shilling, Bayes writes: ‘To suspect Isaac Newton of the mean design of seeking reputation among the ignorant by venting unintelligible notions, and defending them by artful cunning and cunning artistry, is what no man is capable of doing.’ Rev. Thomas Bayes (1701-1761)

  7. Moral philosopher, inductive thinker, and political activist in support of the American Revolution. In 1763, Richard Price posthumously published Bayes’ paper ‘An Essay towards solving a Problem in the Doctrine of Chances’ in the Philosophical Transactions of the Royal Society of London. Bayes solved a complicated ‘ball tossing problem’ involving n non-independent trials and with applications in life assurance. His mathematical solution was brilliant, but counterintuitive. Rev. Richard Price F.R.S. (1723-1791)

  8. He posed this as a special case of an obscurely worded GENERAL PROBLEM: Given the number of times (n) an unknown event has happened and failed, REQUIRED the chance that the probability (ξ) of its happening in a single trial lies somewhere between any two degrees of probability that can be named. A further special case (n=50 independent Bernoulli trials; see Bayes’ Appendix): If you fail to win a lottery on n=50 occasions, with equal chance ξ of winning on each occasion, then what is the chance that your probability ξ of winning it on the 51st attempt lies between 0.001 and 0.01?
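As a rough illustration of the lottery question, the sketch below assumes a uniform prior for ξ on (0,1), which is the prior Bayes used in his 1763 paper. With n failures and no wins the likelihood is (1-ξ)^n, so the posterior distribution function at x is 1-(1-x)^(n+1). The function names are illustrative assumptions.

```python
def posterior_cdf_after_failures(x, n_failures):
    """Posterior P(xi <= x) after n_failures losses and no wins.

    Assumes a uniform prior on (0, 1), so the posterior density is
    (n + 1) * (1 - xi) ** n and its integral from 0 to x is
    1 - (1 - x) ** (n + 1).
    """
    return 1.0 - (1.0 - x) ** (n_failures + 1)


def interval_probability(lower, upper, n_failures):
    """Posterior probability that xi lies between lower and upper."""
    return (posterior_cdf_after_failures(upper, n_failures)
            - posterior_cdf_after_failures(lower, n_failures))


if __name__ == "__main__":
    # Lottery case from the slide: 50 losses, interval (0.001, 0.01).
    print(interval_probability(0.001, 0.01, 50))
```

Under these assumptions the answer comes out at roughly 0.35; a different prior would give a different figure.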

  9. VERY SPECIAL CASE (n=1): If a mother’s first baby is a girl, then what is the chance that the probability ξ that her second baby is a boy lies between 0.5 and 1? Note that probability (girl on first birth, given ξ) = 1-ξ. Therefore the LIKELIHOOD FUNCTION OF ξ is L(ξ, given girl on first birth) = 1-ξ for 0<ξ<1. In general, the likelihood of the unknown parameters is the assumed sampling density or probability mass function of the observations, but expressed as a function of the unknown parameters, given the observations actually observed. A young Bayesette

  10. Initiated the ‘Savageous’ philosophy of Bayesian Statistics. THE BAYESIAN PARADIGM: Posterior information = Prior information + Sampling information. ($$$) A Bayesian is somebody who tries to represent his prior information about ξ by a probability distribution on ξ. BAYES’ THEOREM (continuous case): POSTERIOR DENSITY = K × PRIOR DENSITY × LIKELIHOOD, where K can be calculated by noting that the posterior density integrates to unity across the parameter space. However, in his 1763 paper, Bayes assumed a uniform prior distribution on (0,1) for ξ, in which case POSTERIOR DENSITY = K × LIKELIHOOD. LEONARD ‘JIMMIE’ SAVAGE (1917-1971)
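The continuous form of Bayes' theorem stated in words above can be written out as follows. The symbols π for densities and y for the observations are notational assumptions not used on the slide; ξ, K and L follow the slide.

```latex
\pi(\xi \mid y) \;=\; K \,\pi(\xi)\, L(\xi \mid y),
\qquad
K^{-1} \;=\; \int \pi(\xi)\, L(\xi \mid y)\, d\xi .
% Uniform prior on (0,1): \pi(\xi) = 1, so
\pi(\xi \mid y) \;=\; \frac{L(\xi \mid y)}{\int_0^1 L(u \mid y)\, du},
\qquad 0 < \xi < 1 .
```

The second line is simply the first with the prior density set to one, which is why the posterior is proportional to the likelihood in Bayes' 1763 setting.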

  11. POSTERIOR DENSITY OF ξ. In the preceding very special case, the posterior density of ξ, given girl on first birth, = 2(1-ξ) (0<ξ<1) (*), since the likelihood 1-ξ integrates to 1/2 over (0,1) and so K=2. Posterior mean of ξ = predictive probability that next baby is a boy = 1/3, and P(0.5<ξ<1, given girl on first birth) = 1/4. If the first n babies are girls, then the predictive probability that the next baby is a boy is 1/(n+2). (Figure: posterior density plotted against ξ.)
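The figures quoted on this slide can be checked with exact rational arithmetic. The sketch below assumes the uniform prior on (0,1) and the likelihood (1-ξ)^n for n girls in a row, and uses the Beta function for positive integer arguments; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def beta_function(a, b):
    """Exact Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def posterior_mean_after_girls(n_girls):
    """Posterior mean of xi (probability of a boy) after n_girls girls.

    The integral of xi * (1 - xi)**n over (0, 1) is B(2, n + 1) and the
    normalising integral of (1 - xi)**n is B(1, n + 1).
    """
    return beta_function(2, n_girls + 1) / beta_function(1, n_girls + 1)


def posterior_interval_probability(lower, upper, n_girls):
    """Posterior P(lower < xi < upper) after n_girls girls.

    Integrating (n + 1) * (1 - xi)**n gives
    (1 - lower)**(n + 1) - (1 - upper)**(n + 1).
    """
    lower, upper = Fraction(lower), Fraction(upper)
    return (1 - lower) ** (n_girls + 1) - (1 - upper) ** (n_girls + 1)


if __name__ == "__main__":
    print(posterior_mean_after_girls(1))                          # 1/3
    print(posterior_interval_probability(Fraction(1, 2), 1, 1))   # 1/4
```

Calling `posterior_mean_after_girls(n)` for other values of n reproduces the general predictive probability 1/(n+2).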

  12. French astronomer, mathematician, and politician. Minister in Napoleon’s government. FOUNDING FATHER OF BAYESIAN STATISTICS AND DATA ANALYSIS. In 1774, his Memoir on the Probability of the Causes of Events included a Bayesian analysis of the causes of events. In 1812, his Analytic Theory of Probabilities contained a number of detailed statistical analyses. He introduced a general version of Bayes’ theorem that includes the discrete and multiparameter cases, and applied it to ANALYZE DATA in celestial mechanics, MEDICAL STATISTICS, reliability and jurisprudence. Developed LAPLACE’S APPROXIMATION to multi-dimensional integrals, and LAPLACE TRANSFORMATIONS (moment generating functions). Le Marquis Pierre-Simon de Laplace (1749-1827)

  13. Scottish moral philosopher and leading political economist. The Wealth of Nations, 1776. Rejected the idea that demand must be related to utility, i.e. that the more useful a thing is, and the more satisfaction it gives, the more people would be willing to pay for it. THE PARADOX OF DIAMONDS AND WATER: Water is necessary for life, and yet very cheap; diamonds have little utility, and are yet very costly. Smith thereby concluded that willingness to pay is not related to utility. Adam Smith proposed using interval bounds for probabilities, rather than precisely specified subjective probabilities. Adam Smith (1723-1790)

  14. British philosopher, jurist and social reformer. Regarded by some as the father of modern utilitarianism, and by others, in the context of banking, insurance, and speculation, as the founder of the subjectivist, Bayesian approach to decision making. (Bentham’s approach to subjective probability is an earlier version of the exact, linear approach recommended as being rational by Tversky and Kahneman.) Introduction to Principles of Morals and Legislation, 1780. GREATEST HAPPINESS PRINCIPLE: It is the greatest happiness of the greatest number which is the principle of right or wrong. Classification of 12 pains and 14 pleasures by which we may test the happiness factor of any action. Formalised a set of criteria for measuring the extent of pain or pleasure that any decision will create. Reviewed the concept of punishment, and whether a particular punishment will create more pain or pleasure for society. Bentham applied similar ideas to monetary economics. Jeremy Bentham (1748-1832)

  15. Anglo-Indian mathematician, statistician and spiritualist. Appointed to Chair of Mathematics at University of London (later UCL) in 1828. See his Essay on Probabilities (1838). De Morgan further developed Bayes’s and Laplace’s approach to INVERSE PROBABILITY: posterior probabilities when the prior distribution is uniform. This is somewhat arbitrary, e.g. a uniform prior for a non-linear transformation of the parameter will give a different posterior. Uniform priors over a continuous unbounded parameter space are improper, but can, though not always, yield meaningful proper posteriors. De Morgan sought to justify the uniform prior by Laplace’s Principle of Insufficient Reason. Augustus De Morgan (1806-71)
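The arbitrariness of the uniform prior can be seen in the one-birth example used earlier in these slides (one girl observed, ξ the probability of a boy). The sketch below compares a uniform prior on ξ with a uniform prior on ξ squared; the choice of the squared transformation and the function name are illustrative assumptions. A uniform prior on ξ² implies the density 2ξ on ξ, that is a Beta(2,1) prior, whereas a uniform prior on ξ is Beta(1,1).

```python
from fractions import Fraction


def beta_posterior_mean(prior_a, prior_b, successes, failures):
    """Posterior mean of xi under a Beta(prior_a, prior_b) prior.

    Bernoulli data update the Beta parameters by adding the observed
    successes to prior_a and the observed failures to prior_b.
    """
    a = prior_a + successes
    b = prior_b + failures
    return Fraction(a, a + b)


# One girl observed: zero "successes" (boys) and one "failure".
mean_uniform_on_xi = beta_posterior_mean(1, 1, successes=0, failures=1)
mean_uniform_on_xi_squared = beta_posterior_mean(2, 1, successes=0, failures=1)

if __name__ == "__main__":
    print(mean_uniform_on_xi)           # 1/3
    print(mean_uniform_on_xi_squared)   # 1/2
```

Both priors claim to express ignorance, yet the same single observation gives posterior means of 1/3 and 1/2 respectively.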

  16. Florence Nightingale (1820-1910), nurse and statistician. For the remainder of the 19th century: (A) Many statistical scientists (e.g. Gauss, Edgeworth, Galton) thought in Bayesian terms. (B) Inverse probabilities remained the main methodology for statistical inference. Fisher dabbled with them in the early 20th century and discarded them because of the arbitrariness in the choice of uniform prior. (C) Emphasis seemed to shift somewhat to numerical and graphical summaries of data, e.g. the London cholera epidemic map (1832) and the Crimean War (Florence Nightingale, e.g. pie charts).

  17. English geneticist, statistician and polymath, a truly great man of science. In 1877 built a machine called the GALTON QUINCUNX. Used simulations while attempting to calculate the posterior distribution. Galton encouraged the use of Bayes’ Theorem. Informative conjugate analysis for the normal distribution was developed around that time. Sir Francis Galton (1822-1911)

  18. American philosopher, logician, mathematician and scientist. ‘The father of pragmatism’. Emphasised that objective statistical conclusions can only be hoped for if the data result from a randomised experiment. Was the first scientist to elicit subjective probabilities in experimental psychology. Charles Sanders Peirce (1839-1914)

  19. French military officer. 1894 TRIAL OF THE MILLENNIUM: Dreyfus tried for treason. Bizarrely justified subjective ‘probability’ of forgery. Falsely convicted of transmitting military secrets to Germany. Probability related to possible coincidences concerning frequencies of symbols in the code. ‘SIMILAR PROBLEMS OCCUR TODAY WHENEVER STATISTICAL EVIDENCE AND SUBJECTIVE PROBABILITIES ARE INTRODUCED INTO EVIDENCE’ (David H. Kaye, Minnesota Law Review, 2007). O.J. Simpson murder case, Adams rape case, Sally Clark cot death case. See also D.H. Kaye (2010), DNA identification and the threat to civil liberties, Yale University Press. Alfred Dreyfus (9 October 1859 – 12 July 1935)

  20. British mathematician, philosopher and economist. His 1926 papers on subjective probability and utility were encouraged by the economist John Maynard Keynes. His work on subjective probability and its elicitation satisfied Charles Peirce’s empirical test. Used by experimental psychologists and recognised in 1944 by Von Neumann and Morgenstern in their book The Theory of Games and Economic Behaviour. Famously used utility theory to judge ‘how much of its wealth a nation should spend’. Close friend of the philosopher Ludwig Wittgenstein, whose works he translated. ‘Never stay up on the barren heights of cleverness, but come down into the green valleys of silliness.’ Frank Ramsey (1903-1930)

  21. Highly eccentric English statistician, evolutionary biologist, geneticist and eugenicist. One of the chief architects of the neo-Darwinian synthesis. Galton Professor of Eugenics at UCL (1933-43). Argued with Karl Pearson, e.g. about who should teach which course. Dabbled with Bayesian inference and inverse probability, then argued vehemently against it because of its dependence on the prior, e.g. the choice of ‘vague’ so-called ignorance prior. Introduced FIDUCIAL INFERENCE in a paper in Annals of Eugenics (1935). Disputed by Neyman and shown by Lindley in 1958 to violate Kolmogorov’s addition laws of probability. Sir Ronald Fisher (1890-1962)

  22. Baron Keynes of Tilton, Cambridge economist. Employed expected utility in 1936 in Chapter 12 of The General Theory of Employment, Interest and Money. Keynesian economics has fundamentally affected the theory and practice of modern macroeconomics, and influenced the policies of governments until about 1979, when the ideas of Milton Friedman, who also used expected utility, took over. John Maynard Keynes (1883-1946)

  23. Cambridge-based mathematician, statistician, geologist and astronomer. The Theory of Probability (1939) foreshadowed the Anglo-American Bayesian revival of the 1960s, led by Rudolf Kalman, Raiffa and Schlaifer, Mosteller and Wallace, Box and Tiao, John Aitchison F.R.S.E and Dennis Lindley. INCLUDED: Invariance priors: vague priors which refer to the determinant of Fisher’s Information and yield posterior distributions which are invariant under non-linear transformations of the parameters. Approximate Bayes intervals (also approximate confidence intervals) centred on the maximum likelihood estimate, which also refer to the likelihood dispersion. Sir Harold Jeffreys F.R.S. (1891-1989)
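The invariance prior described above is usually written as follows. The notation (θ for a general parameter, φ = g(θ) for a one-to-one transformation, I for Fisher's information) is an assumption for illustration; the Bernoulli example uses the probability ξ from the earlier slides.

```latex
% Invariance prior: proportional to the square root of the determinant
% of Fisher's information
\pi(\theta) \;\propto\; \bigl|\, I(\theta) \,\bigr|^{1/2} .
% Scalar case, one-to-one transformation \phi = g(\theta):
I(\phi) \;=\; I(\theta) \left( \frac{d\theta}{d\phi} \right)^{2}
\;\;\Longrightarrow\;\;
\bigl| I(\phi) \bigr|^{1/2} \;=\; \bigl| I(\theta) \bigr|^{1/2} \left| \frac{d\theta}{d\phi} \right| ,
% which is exactly the change-of-variables rule for a density.
% Bernoulli trials with success probability \xi:
I(\xi) \;=\; \frac{1}{\xi (1 - \xi)}
\;\;\Longrightarrow\;\;
\pi(\xi) \;\propto\; \xi^{-1/2} (1 - \xi)^{-1/2}, \qquad 0 < \xi < 1 .
```

In the Bernoulli case the resulting prior is the Beta(1/2, 1/2) distribution, in contrast to the uniform Beta(1, 1) prior assumed by Bayes.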

  24. Pre-eminent Russian mathematician and probabilist. Introduced the concept of Bayesian sufficiency in his paper on the statistical estimation of the law of Gauss in 1942 in the URSS Bulletin of the Academy of Sciences. Kolmogorov’s Extension Theorem constrains us to only defining our probability distributions on measurable subsets of the parameter space or sample space (i.e. those which are elements of an appropriate sigma-field, such as a Borel field). Andrey Kolmogorov (1903-1987)

  25. Alan Turing (1912-1954) and Irving Jack Good (1916-2009). Alan Turing: Gay icon and martyr, father of machine intelligence, modern computer science and artificial intelligence. Also the father of modern Bayesian applied statistics. Jack Good: cryptanalyst, mathematician, statistician and philosopher. While solving the Nazi codes at Bletchley Park, Turing and Good used various pioneering, effectively Bayesian procedures, including: • Empirical alternatives to Bayes factors as measures of evidence • Effectively Bayesian sequential analysis and decision-tree analysis • Shrinkage estimators for multinomial cell probabilities, which smooth the relative frequencies of the letters in the German code towards a common value.
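The slide does not give the formula Turing and Good used, so the following is only a generic sketch of a shrinkage estimator for multinomial cell probabilities: additive smoothing, which is the posterior mean under a symmetric Dirichlet prior. The function name, the `prior_weight` parameter and the letter counts are hypothetical and chosen purely for illustration.

```python
def shrink_towards_common_value(counts, prior_weight):
    """Smooth relative frequencies towards the common value 1/k.

    Each estimate is (count + prior_weight / k) / (total + prior_weight),
    a weighted average of the raw relative frequency and 1/k.
    prior_weight is an assumed tuning constant: larger values shrink
    the estimates harder towards 1/k.
    """
    k = len(counts)
    total = sum(counts)
    return [(c + prior_weight / k) / (total + prior_weight) for c in counts]


if __name__ == "__main__":
    hypothetical_letter_counts = [8, 2, 0, 0]   # made-up data
    print(shrink_towards_common_value(hypothetical_letter_counts, 4.0))
```

Unlike the raw relative frequencies, the smoothed estimates give letters with zero observed count a small positive probability while still summing to one.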

  26. Thomas Tallis (1988-NotDeadYet). Adam Empirius Logan. "If Bayesians live to be a hundred they think they've got it made. Very few people die past that age." If we deduce that knowledge comes from irrationality and out of rationality comes rationality then we must also deduce that most of our conventional knowledge derives from the senses and that every rational saying is a pragmatic lie (Adam Logan, Farewell Halcyon Days, 2013)
