
Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics

Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics. Tom LaBone. Savannah River Chapter of the Health Physics Society, Aiken, SC, April 15, 2011. “There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain.





Presentation Transcript


  1. Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics. Tom LaBone, Savannah River Chapter of the Health Physics Society, Aiken, SC, April 15, 2011

  2. “There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain “It is easy to lie with statistics.” “It is hard to tell the truth without statistics.” Andrejs Dunkels

  3. Today • Informal, mostly apocryphal discussion of • what statistics really is, • who practices statistics and how they do it, and • why all of this is important to you as a health physicist • Main message of talk • A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics) • Everyone in the room should become a statistician (of sorts) • No math is used in this presentation and no health physicists were harmed during its preparation

  4. Health Physics and Statistics • Some HP “stat” books I used in school • G. F. Knoll Radiation Detection and Measurement 1st Edition 1979 • J. Shapiro Radiation Protection 1st Edition 1972 • H. Cember Introduction to Health Physics 1st Edition 1969 • R. D. Evans The Atomic Nucleus 1955 • P. R. Bevington Data Reduction and Error Analysis for the Physical Sciences 1st Edition 1969 • Statistics was a tool, a “wrench to turn a nut” • Is that all it is?

  5. What is Statistics? “Humans are good, she knew, at discerning subtle patterns that are really there, but equally so at imagining them when they are altogether absent.” Carl Sagan in Contact

  6. Signals and Noise • Useful information comes to us in the form of signals that form distinct patterns • The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal

  7. Seeing Patterns • In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist • That noise in the grass – is it just the wind or is it a lion? • So, we as a species got very good at seeing patterns, even in the absence of a signal

  8. Apophenia • Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data • What do you see below?

  9. Face on Mars Viking 1 Orbiter Mars Global Surveyor

  10. Face in Food, et cetera

  11. Face in Data

  12. Statistics is … • … a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong • … a very practical, decision oriented methodology developed to tame our natural tendency to be Apopheniacs • … based on the idea that variability and noise are natural and unavoidable • … a relatively modern science that is actively evolving • especially since cheap, powerful computers became available
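
The phrase "a known probability of being wrong" can be made concrete with a small simulation. The sketch below is in Python rather than any of the packages named in the talk, and every number in it is an assumption chosen for illustration: noise-only measurements are drawn from a standard normal distribution, and the decision threshold of 1.645 standard deviations corresponds to a nominal 5% one-sided false-alarm rate. The function name `false_alarm_rate` is hypothetical.

```python
import random


def false_alarm_rate(threshold, n_trials=100_000, seed=1):
    """Fraction of noise-only measurements that exceed the decision threshold."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    alarms = sum(1 for _ in range(n_trials) if rng.gauss(0.0, 1.0) > threshold)
    return alarms / n_trials


# Assumed threshold: 1.645 sigma, the one-sided 5% point of a standard normal.
rate = false_alarm_rate(1.645)
print(f"Observed false-alarm rate: {rate:.3f} (nominal 0.050)")
```

With no signal present at all, roughly one measurement in twenty still crosses the threshold. The decision rule does not remove that error; it only fixes how often it happens.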

  13. Really, What is Statistics? “Statistics is concerned with collecting, analyzing, and interpreting data in the best possible way, where the meaning of “best” depends on the particular circumstances of the practical situation” Chris Chatfield Problem Solving: A Statistician’s Guide

  14. Exploratory Data Analysis • Look at data (usually with graphics) and use our ability to see patterns in the data to • Suggest hypotheses to test • Assess validity of assumptions on which statistical inference will be based • Support the selection of appropriate inferential tests • Suggest ideas for further data collection

  15. Air Filters Fecal Samples

  16. Confirmatory Data Analysis • Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion • Is the material on the filters the same material that is in the fecal samples? • Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?

  17. Fecal Samples 2 95% CI = (1.33, 1.46)

  18. Data Dredging • Are the two Pu-239 to Am-241 ratios the same? • If this question was asked before we saw the data we can proceed with the test to answer it • If this question was inspired by the data then we should not test the same data to get the answer • Referred to as data snooping, data dredging, etc. • Cancer clusters
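
One way to see why data-inspired questions are risky is to count how often pure noise produces at least one "significant" result when many comparisons are available to choose from, as happens when a map is scanned for cancer clusters. The Python sketch below is an illustration under stated assumptions, not an analysis from the talk: each study makes 20 independent comparisons, every comparison is noise only (a standard normal z-score), and "significant" means |z| > 1.96. The function name `chance_of_spurious_hit` is hypothetical.

```python
import random


def chance_of_spurious_hit(n_comparisons=20, z_crit=1.96, n_studies=20_000, seed=3):
    """Fraction of noise-only studies in which at least one comparison looks significant."""
    rng = random.Random(seed)
    studies_with_hit = 0
    for _ in range(n_studies):
        # Every z-score is pure noise: there is no real effect anywhere.
        if any(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(n_comparisons)):
            studies_with_hit += 1
    return studies_with_hit / n_studies


simulated = chance_of_spurious_hit()
expected = 1.0 - 0.95 ** 20  # chance that at least one of 20 independent 5% tests fires
print(f"simulated: {simulated:.3f}, expected: {expected:.3f}")
```

If the comparison that gets tested is the one that caught the eye, the nominal 5% error rate no longer describes the procedure that was actually followed.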

  19. Statistical Method • Define the problem • Formulate your questions in such a way that unambiguous answers are possible • Collect data • Collect data capable of answering your question • Analyze the data • Present the results • in terms your audience can understand

  20. Define the Problem “An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” John Tukey “It is better to solve the right problem the wrong way than to solve the wrong problem the right way.” Richard Hamming

  21. Data Collection • Collect data that are capable of answering the question asked (Data Quality Objectives) • Designed experiments • Observational studies • Sampling • You select samples from a population in order to make inferences about the population

  22. GIGO • The collection of data is often the most time-consuming and expensive part of a study • Reverend Bayes and all of his horses can’t fix a bum dataset

  23. Analyze the Data • All statistical procedures have assumptions • In practice, the assumptions of any given statistical procedure are violated to some degree • Can the validity of the assumptions be verified? • Can the validity of the answer be verified? • How robust is your statistical procedure to violations of its assumptions? • Simple approximate solutions you can understand may be better than complex exact solutions that you can’t • Augment standard statistical analyses with simulations
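
As a sketch of what "augment standard statistical analyses with simulations" might look like, the Python example below checks how often a textbook 95% Student t interval for the mean actually contains the true mean. It compares the case where the normality assumption holds with a case where it is violated. All settings are assumptions made for illustration: samples of size 10, standard normal data in one case, and lognormal data (log-scale mean 0, log-scale standard deviation 1) in the other. The critical value 2.262 is the two-sided 95% t value for 9 degrees of freedom.

```python
import math
import random
import statistics

SAMPLE_SIZE = 10
T_CRIT_9DF = 2.262  # two-sided 95% Student t critical value, 9 degrees of freedom


def t_interval(sample):
    """Textbook 95% confidence interval for the mean of a sample of size 10."""
    mean = statistics.fmean(sample)
    half_width = T_CRIT_9DF * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - half_width, mean + half_width


def coverage(draw, true_mean, n_trials=4000, seed=2):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        low, high = t_interval([draw(rng) for _ in range(SAMPLE_SIZE)])
        hits += low <= true_mean <= high
    return hits / n_trials


# Assumption holds: normally distributed data with true mean 0.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
# Assumption violated: skewed lognormal data with true mean exp(0.5).
lognormal_cov = coverage(lambda rng: rng.lognormvariate(0.0, 1.0), true_mean=math.exp(0.5))

print(f"normal data:    {normal_cov:.3f}")
print(f"lognormal data: {lognormal_cov:.3f}")
```

The nominal figure is 95% in both cases. Comparing the two printed values shows how much of that guarantee survives when the data are skewed, which is the robustness question raised on the slide.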

  24. Present Results • Technical answer versus the functional answer • “the null hypothesis is not rejected” • technically “not rejected” ≠ “accepted” • functionally “not rejected” = “accepted” • Statistical significance and practical significance • Apply “so what” test to your answers
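
The gap between statistical and practical significance can be shown with a short calculation. The Python sketch below uses assumed numbers only: two groups whose true means differ by 0.01 standard deviations, compared with a two-sample z-test at two different sample sizes. The function name `two_sample_z` is hypothetical, and the p-value uses the normal approximation.

```python
import math


def two_sample_z(diff, sd, n_per_group):
    """z statistic and two-sided normal p-value for a difference in group means."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail area of a standard normal
    return z, p


# Same tiny difference (0.01 sd), very different sample sizes.
z_small, p_small = two_sample_z(diff=0.01, sd=1.0, n_per_group=100)
z_large, p_large = two_sample_z(diff=0.01, sd=1.0, n_per_group=1_000_000)

print(f"n = 100 per group:       z = {z_small:.2f}, p = {p_small:.3g}")
print(f"n = 1,000,000 per group: z = {z_large:.2f}, p = {p_large:.3g}")
```

The difference is equally small in both cases. Only the sample size changed, so a very small p-value by itself does not pass the "so what" test.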

  25. What is a Statistician? “Powerful spirits should only be called by the master himself” Goethe The Sorcerer's Apprentice

  26. What is a Statistician? • Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician • However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics • Key difference between the sorcerer and his apprentice • Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics • Statistics is vast in scope and detail, and the apprentice does not know what he does not know “It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.” Mark Twain

  27. The Sorcerer’s Apprentice • We may not be statisticians, but we are clearly doing statistics, often without adult supervision • Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control “Should I refuse a good dinner simply because I do not understand the processes of digestion?” Oliver Heaviside [On being criticized for using formal mathematical manipulations without understanding how they worked]

  28. How We Can be Better Statisticians • Master the basics • Learn the language • Play with your data • Use better software • Perform reproducible work • Consult with a real statistician

  29. Master the Basics Khan Academy http://www.khanacademy.org/

  30. Statistics MS/Certificate Distance Programs • University of South Carolina • Colorado State University • Texas A&M University • Penn State University

  31. Concepts and Terminology • Specialized Concepts • Population versus sample for example • Statistics has a very precise language all its own • “the null hypothesis is not rejected” • “not rejected” ≠ “accepted” • Questions and answers are not right unless you use the proper language to convey the proper concept • some statisticians can be intolerant of laymen who misuse the language of statistics • Learn to phrase questions and interpret answers properly

  32. Exploratory Statistics • Learn to play with your data and see if it is trying to tell you something new • Study graphs of your data “There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.” John Tukey

  33. Software used for Statistics • I use the following software for statistical calculations (in order of usage) • R • Minitab • SAS • Spreadsheet (e.g., MS Excel, Gnumeric) • There are many others

  34. Spreadsheets (Excel) • What some people can do in Excel is nothing short of amazing (but should they be doing it?) • Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle • Spreadsheet Addiction by Patrick Burns • http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html • Problems with spreadsheet implementation • Excel has a long history of doing bad stats • Problems with spreadsheet paradigm • Reproducible science

  35. http://www.msnbc.msn.com/id/21033161/from/RS.1/ 9/28/2007 M. G. Almiron et al. On the Numerical Accuracy of Spreadsheets, Journal of Statistical Software (34) 4, 2010
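
To give a feel for the kind of numerical accuracy problem at issue, the Python sketch below compares two algebraically equivalent ways of computing a sample variance in double-precision arithmetic. It is an illustration of the general hazard only: it does not reproduce the tests in the cited paper and makes no claim about the algorithm used by any particular spreadsheet. The data are made up (four values near one billion, whose exact sample variance is 30), and the function names are hypothetical.

```python
def variance_one_pass(values):
    """'Calculator' formula: subtracts two huge, nearly equal sums (cancellation-prone)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def variance_two_pass(values):
    """Subtract the mean first, then sum the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Made-up data: large offset, small spread. Deviations from the mean are -6, -3, 3, 6.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

one_pass = variance_one_pass(data)
two_pass = variance_two_pass(data)
print(f"one-pass formula: {one_pass}")
print(f"two-pass formula: {two_pass}")
```

Both formulas are correct on paper. In floating point the one-pass version loses the answer to rounding, because the squares of the data are too large to be stored exactly.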

  36. Reproducible Research • Reproducible research refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. necessary for reproduction of the results Raw Data Data Massaging Calculations Plots and Tables Final Paper
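
The chain from raw data to final results can be sketched as a single script in which every step is code, so rerunning it regenerates the same numbers. The Python example below is a minimal illustration under loud assumptions: the "raw data" are synthetic values from a seeded random generator standing in for a real data file, one value is deliberately marked missing, and all function names are hypothetical.

```python
import hashlib
import json
import random
import statistics


def load_raw_data(seed=2011):
    """Stand-in for reading a raw data file: 50 synthetic values, one marked missing."""
    rng = random.Random(seed)
    raw = [round(rng.gauss(10.0, 2.0), 3) for _ in range(50)]
    raw[rng.randrange(50)] = None  # simulate one missing result
    return raw


def clean(raw):
    """Data massaging step: drop missing results, in code rather than by hand."""
    return [x for x in raw if x is not None]


def calculate(cleaned):
    """Calculation step: the summary statistics that would go into a table."""
    return {
        "n": len(cleaned),
        "mean": round(statistics.fmean(cleaned), 4),
        "stdev": round(statistics.stdev(cleaned), 4),
    }


def run_pipeline(seed=2011):
    raw = load_raw_data(seed)
    results = calculate(clean(raw))
    # A checksum of the raw data lets a reader confirm they started from the same input.
    results["raw_sha256"] = hashlib.sha256(json.dumps(raw).encode()).hexdigest()
    return results


report = run_pipeline()
print(json.dumps(report, indent=2))
```

Because no step depends on manual edits, anyone with the script and the data can regenerate the report and compare checksums, which is much harder to guarantee when the same steps are done by hand in a spreadsheet.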

  37. The R Project for Statistical Computing • R is a language and environment for statistical computing and graphics • R is available as Free Software under the terms of the GNU General Public License in source code form • It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS • Download from http://www.r-project.org/

  38. Advantages of R • Command line interface rather than a GUI • Promotes reproducible statistics • Open source • Flexible licensing • Availability of source code for peer review • Bugs are public knowledge and are fixed quickly • New tests and methods tend to appear first in R • Many dozens of recently published books devoted to R • Free (and very good) community support available

  39. Consult with a Statistician • If you are going to involve a statistician, do it at the study design and data collection phases • If not, at least estimate how much it will cost to collect the data all over again • Anybody can analyze compelling data “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.” Sir Ronald Fisher

  40. Twisted Answers to Crooked Questions • As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis • In such situations we base our decisions on professional judgment, often augmented with “statistics” • We must not fool ourselves about what we are doing • … of all the wrong answers we have to choose from, this one is the best • We have no right to expect a statistician to endorse such mischief

  41. The Apprentice Should Beware of … • The Management Prior • Being bamboozled by other people’s statistics • “The only right way to do this is X [insert statistical method here]” • Being seduced by complexity

  42. Statistics in the Workplace: Musings of a Sorcerer's Apprentice Presentation to USC Stat Club March 26, 2009 • Main message • A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed • Everyone in the room should become a health physicist (I had no takers)
