30 Years. The History of. Keysteps of Computational Statistics. Wilfried Grossmann, University of Vienna, Austria Michael G. Schimek, Medical University of Graz, Austria Peter Paul Sint, Austrian Academy of Sciences, Vienna. 1974.


  1. 30 Years The History of Keysteps of Computational Statistics Wilfried Grossmann, University of Vienna, Austria Michael G. Schimek, Medical University of Graz, Austria Peter Paul Sint, Austrian Academy of Sciences, Vienna

  2. 1974 Department of Statistics and Informatics, University of Vienna Peter Paul, a „senior“ Assistant Professor

  3. Department of Statistics and Informatics, University of Vienna 1974 A few years after Wilfried, a „junior“ Assistant Professor

  4. University of Vienna A few years after 1974 Gerhart Bruckmann Michael, a first-year student

  5. Outline of Presentation The Beginning of COMPSTAT • Early statistical computing • The institutional environment • The first symposium and the Compstat Society Developments in Computational Statistics (CS) • CS and statistical theory • CS and algorithms • CS and computer science • CS and application The COMPSTAT Symposia

  6. The Beginning of COMPSTAT

  7. Early Computational Statistics • The Beginnings in Vienna • Institute of Statistics • Part of the Law Faculty - S. Sagoroff - Leipzig/Sofia/USA/Berlin//Vienna - Energy Balances • first Computer: first generation machine • Paid for by Rockefeller-Foundation 1960 • Arrival of the ‚Electronic Brain‘ 1st generation • Never again similar enthusiasm • Institute of Advanced Studies - Ford Institute • Statistical machines - card counting - >2nd generation • Replaced by IBM /360-44 - 3rd gen. SSP / SPSS • Computing Center

  8. Statistics-Computational One year Biostatistics department Oxford University Still: Not strongly integrated in international statistical community - Main contacts ISI: Central Statistical Office, Sagoroff 1973 ISI-session in Vienna - emphasis on applications - computational methods rare Bring statisticians with our interests to Vienna Encouragement by publisher Arnulf Liebing /Physica/ What is specific to our department? Concept of Computational Statistics - Johannes Gordesch (Math) - Peter Paul Sint (Physics)

  9. First COMPSTAT Call • COMPSTAT 1974 • Gerhart Bruckmann • - Local fame as analyst of voting results during election nights • Leopold Schmetterer (successor of Sagoroff) • - Internationally known Mathematical Statistician • (Franz Ferschl, incoming professor of statistics, new editor of Metrika - added as an editor by the publisher)

  10. S. Sagoroff and M. Tantilov

  11. First COMPSTAT Editors

  12. Preface of the first Proceedings

  13. Logic of the Logo

  14. J. Gordesch at Compstat76 Berlin

  15. Coming of Age • International from the start • Compstat Society since Berlin • Leiden NL 1978 Integration into IASC • Edinburgh GB 1980 - Toulouse F 1982 • Eastern Europe needed Politics ISI-IASC • Local Projects redirected: Prague 1984 • Rome I 1986 - Copenhagen 1988 DK • Dubrovnik YU 1990 - Neuchâtel CH 1992

  16. Prague 1984

  17. Developments in Computational Statistics

  18. Computational Statistics • What is Computational Statistics? • A question raised many times inside the community at the end of the 1980s and the beginning of the 1990s

  19. Computational Statistics • Working definition (A. Westlake) Computational Statistics is related to the advance of statistical theory and methods through the use of computational methods. This includes both the use of computation to explore the impact of theories and methods, and the development of algorithms to make these ideas available to users

  20. Computational Statistics Statistical Theory Numerical Analysis Algorithms Seminumerical Algorithms Computational Statistics Modelling Computer Science Applications Statistical Software

  21. Computational Statistics and Statistical Theory • The statistical journey in the 20th century • The Theory Era • The Methodology Era

  22. Computational Statistics and Statistical Theory • The statistical journey in the 20th century • B. Efron: Statistics in the 20th century is a journey between three poles: • Applications • Mathematics • Computation

  23. Computational Statistics and Statistical Theory • The Theory Era (Pearson, Neyman, Fisher, Wald) • From models for solving practical problems towards a mathematical decision theoretic framework • Based on optimality principles • Application is based on computations feasible for paper and pencil or mechanical computing devices

  24. Computational Statistics and Statistical Theory • Modelling Era (1) • Tukey’s paper about the future of data analysis (1962) as a turning point from mathematics towards computation • Confirmatory versus exploratory analysis • Dynamics of data analysis • “Robustness” • Importance of Graphics

  25. Computational Statistics and Statistical Theory • Modelling Era (2) • Important developments in the modelling era • Nonparametric and Robust Methods • Kaplan-Meier and Proportional Hazards • Logistic Regression and GLM • Jackknife and Bootstrap • EM and MCMC • Empirical Bayes and James-Stein Estimation

  26. Computational Statistics and Statistical Theory • Modelling Era (3) • The modelling era is characterized by a strong interplay between statistical theory and computational statistics • The computer as a workbench for statistical experiments (going back to J. von Neumann and S. Ulam) • Passive usage: Studying the feasibility of statistical theory by simulation • Active usage: Obtaining results which cannot be computed by conventional numerical algorithms

  27. Computational Statistics and Statistical Theory • COMPSTAT was probably not always at the frontier of these developments, but the programs and the proceedings reflect quite well the dynamics of the subject in the Modelling Era
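
The "active usage" named on this slide can be illustrated with the bootstrap, one of the Modelling Era developments listed earlier. The following is only a minimal sketch, not taken from the presentation: the function name `bootstrap_se`, the exponential test sample, and all parameter values are assumptions chosen for illustration. It estimates the standard error of a sample median, a quantity for which no simple closed-form expression is available.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        # Resample n observations with replacement from the observed data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(stat(resample))
    # The spread of the replicated statistic approximates its standard error.
    return statistics.stdev(replicates)


# Illustrative data: 50 draws from an exponential distribution (an assumption).
data_rng = random.Random(42)
sample = [data_rng.expovariate(1.0) for _ in range(50)]
se_median = bootstrap_se(sample)
print(f"median = {statistics.median(sample):.3f}, bootstrap SE = {se_median:.3f}")
```

Wrapping the same function in an outer loop over many simulated samples would turn this into the "passive usage": checking by simulation how well the bootstrap standard error behaves.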

  28. Computational Statistics and Algorithms • Numerical Algorithms • Matrix Computation, Optimization • Random Numbers / Monte Carlo • Semi-numerical Algorithms • Sorting, Searching, Combinatorial Methods, Graph Theoretic Algorithms,… • Graphical Algorithms • Symbolic Computation (?) • Mathematical vs. Statistical Modelling

  29. Computational Statistics and Algorithms • Statistics and Numerical Algorithms (1) • Fast Fourier Transform (Tukey) • Recursive Algorithms and Filtering (Kalman Filter) (Neither topic seems to be a core topic in computational statistics)

  30. Computational Statistics and Algorithms • Statistics and Numerical Algorithms (2) • Adaptation of optimization techniques (e.g. scoring methods) • Behaviour of optimization methods in statistical context (numerical convergence vs. stochastic convergence concepts) Implicit Consideration at COMPSTAT
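
As a sketch of what "scoring methods" means here, the code below applies Fisher scoring to a logistic regression with one predictor. It is an illustration under stated assumptions, not material from the presentation: the function name `fisher_scoring_logistic` and the toy data `x_obs`, `y_obs` are hypothetical. Each iteration solves a 2x2 linear system built from the score vector and the expected information matrix.

```python
import math


def fisher_scoring_logistic(x, y, max_iter=25, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x by Fisher scoring (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        u0 = u1 = 0.0          # score vector (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # expected (Fisher) information matrix
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            u0 += yi - p
            u1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Scoring step: solve information * delta = score for the 2x2 case.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        # Numerical convergence: stop once the update is negligible.
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Toy data (an assumption); the classes overlap, so a finite estimate exists.
x_obs = [0, 1, 2, 3, 4, 5, 6, 7]
y_obs = [0, 0, 1, 0, 1, 1, 0, 1]
b0_hat, b1_hat = fisher_scoring_logistic(x_obs, y_obs)
print(f"intercept = {b0_hat:.4f}, slope = {b1_hat:.4f}")
```

The stopping rule concerns numerical convergence of the iteration only. Whether the resulting estimate is close to the true parameter is a question of stochastic convergence, which depends on the sample and not on the optimizer.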

  31. Computational Statistics and Algorithms • Statistics and Random Numbers / Monte Carlo • Generation of Random numbers was (and is) probably more a topic of mathematics (number theory) and computer science • In the beginning of COMPSTAT there was also some connection to simulation • Genuine application of Monte Carlo Methods in connection with new developments of statistical theory (e.g. MCMC)
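
For readers unfamiliar with MCMC, a random-walk Metropolis sampler is one of its simplest instances. The sketch below is an illustration only and is not drawn from the presentation: the function name `metropolis`, the standard normal target, the step size, and the burn-in length are all assumptions. The target density needs to be known only up to a normalizing constant.

```python
import math
import random
import statistics


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler (hypothetical helper)."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        chain.append(x)
    return chain


def log_std_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


chain = metropolis(log_std_normal, start=0.0, n_samples=21000, step=2.0)
draws = chain[1000:]  # discard an assumed burn-in of 1000 iterations
print(f"mean = {statistics.fmean(draws):.3f}, variance = {statistics.variance(draws):.3f}")
```

The sample mean and variance should be close to 0 and 1 for this target, although successive draws are correlated, so the effective sample size is smaller than the chain length.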

  32. Computational Statistics and Algorithms • Statistics and semi-numerical algorithms • Applications in context of nonparametric statistics and analysis of tabular data • Feasibility of conditional inference for logistic models • New developments on the borderline between statistics and computer science • Data Mining as a new statistical modelling paradigm COMPSTAT was open towards these developments and integrated them into the program

  33. Computational Statistics and Algorithms • Statistics and Graphical Algorithms • Development rather complementary to the developments of computer science, • Important issues (L. Wilkinson): • Graphics are not only a tool for displaying results but rather a tool for perceiving relationships • Dynamic graphics as important tool for data analysis • Graphics are a means of model formalization reflecting quantitative and qualitative traits of its variables Represented quite well at COMPSTAT

  34. Computational Statistics and Algorithms • Mathematical vs. Statistical Modelling • Emphasis on different methods (e.g. Differential Equations) • Different modelling environments (J. Nelder) • Data structures in statistics • Exploratory nature of statistical analysis (statistical analysis cycle) • Competence of users

  35. Computational Statistics and Computer Science • Developments in Statistical Software • Development of Statistical Languages • Developments in Statistical Database Management

  36. Computational Statistics and Computer Science • Developments in Statistical Software (1) • From numerical subroutines towards statistical packages • Main goals: • Taking into account the peculiarities of statistical data analysis • Usage of actual hardware developments

  37. Computational Statistics and Computer Science • Developments in Statistical Software (2) • COMPSTAT was from the beginning onwards an important forum for the development of statistical software • The proceedings in the beginning of the eighties show numerous software developments for specific statistical models • There was always some tension in connection with presentation of commercial software developments and the scientific character of the conference

  38. Computational Statistics and Computer Science • Development of Statistical Languages (1) • GLIM was probably the first genuine statistical modelling language • Present at COMPSTAT from the very beginning

  39. Computational Statistics and Computer Science • Development of Statistical Languages (2) • The S language set up a new paradigm for computing which is of interest also outside statistical applications • Contribution in Computer Science honoured by the ACM Software System Award for J. Chambers Although it started as early as 1976, it took a long time to enter the COMPSTAT community

  40. Computational Statistics and Computer Science • Development of Statistical Languages (3) • R gained popularity inside COMPSTAT rather quickly due to free availability and effective organisation of CRAN • Omegahat: An umbrella for open source projects in computational statistics covering not only statistical computation but also other important aspects in distributed computing

  41. Computational Statistics and Computer Science • Development of Statistical Languages (4) • XLISP-Stat as proof of concept (in particular for animated graphics) • XploRe as Java based production system

  42. Computational Statistics and Computer Science • Statistical Data Base Management • Main challenge is appropriate usage of the developments in database technology in statistical context • Combination of statistical data structures and statistical processing activities with conceptual data models • Representation of tabular data • Metadata as a tool to capture the complexity of statistical data A small but active group inside the COMPSTAT community from the very beginning

  43. Computational Statistics and Applications • Challenges for Computational Statistics Rather independent from application area • Data • Data capture • Data structures • Data size • Analysis Process • Analysis strategies • The role of the statistician in the computer age

  44. Computational Statistics and Applications • Data challenges (1) • Contributions towards data challenges occur occasionally at COMPSTAT • Actual problems • Data capture • Data capture tools are rather a side branch of computational statistics and more connected to official statistics • Data streams are a new challenge that has so far attracted little attention in the computational statistics community

  45. Computational Statistics and Applications • Data challenges (2) • Data structures • New problems (e.g. in connection with data mining) raise questions with respect to the applicability of the basic statistical analysis paradigm (population, sample, measurement process) • Data size • Handling huge datasets At the moment, none of these challenges seems to be a core topic of computational statistics

  46. Computational Statistics and Applications • Analysis process • Analysis strategies • The question of formalization of analysis strategies was a hot topic at the COMPSTAT conferences at the end of the 1980s, but there was limited success • The role of statisticians in the computer age • Is progress in computational statistics an enabler for statisticians, or does it lead towards a de-skilling of the statistical profession?

  47. The COMPSTAT Symposia

  48. A full set of COMPSTAT proceedings (one statistical outlier removed) Do you see the CSDA volumes in the background? Here they are!

  49. The COMPSTAT Symposia I

  50. The COMPSTAT Symposia II
