Lies, Damned Lies, and Health Physics Some Random Comments About Statistics in Health Physics. Tom LaBone. Savannah River Chapter of the Health Physics Society Aiken, SC April 15, 2011. “There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain.
“There are three kinds of lies: lies, damned lies, and statistics.” Mark Twain “It is easy to lie with statistics.” “It is hard to tell the truth without statistics.” Andrejs Dunkels
Today • Informal, mostly apocryphal discussion of • what statistics really is, • who practices statistics and how they do it, and • why all of this is important to you as a health physicist • Main message of talk • A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics) • Everyone in the room should become a statistician (of sorts) • No math is used in this presentation and no health physicists were harmed during its preparation
Health Physics and Statistics • Some HP “stat” books I used in school • G. F. Knoll Radiation Detection and Measurement 1st Edition 1979 • J. Shapiro Radiation Protection 1st Edition 1972 • H. Cember Introduction to Health Physics 1st Edition 1969 • R. D. Evans The Atomic Nucleus 1955 • P. R. Bevington Data Reduction and Error Analysis for the Physical Sciences 1st Edition 1969 • Statistics was a tool, a “wrench to turn a nut” • Is that all it is?
What is Statistics? “Humans are good, she knew, at discerning subtle patterns that are really there, but equally so at imagining them when they are altogether absent.” Carl Sagan in Contact
Signals and Noise • Useful information comes to us in the form of signals that form distinct patterns • The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal
Seeing Patterns • In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist • That noise in the grass – is it just the wind or is it a lion? • So, we as a species got very good at seeing patterns, even in the absence of a signal
Apophenia • Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data • What do you see below?
Face on Mars [Images: Viking 1 Orbiter; Mars Global Surveyor]
Statistics is … • … a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong • … a very practical, decision-oriented methodology developed to tame our natural tendency to be apopheniacs • … based on the idea that variability and noise are natural and unavoidable • … a relatively modern science that is actively evolving • especially since cheap, powerful computers became available
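The phrase “a known probability of being wrong” can be checked by simulation. The sketch below is an illustration and not part of the original talk: it assumes a two-sided z-test at the conventional 5% level, standard normal noise, and a hypothetical sample size of 25, and counts how often pure noise is mistaken for a signal.

```python
import math
import random

def false_alarm_rate(n_trials=20000, n=25, z_crit=1.96, seed=1):
    """Fraction of noise-only samples that a two-sided z-test flags as a signal."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_trials):
        # Pure noise: there is no signal, so the true mean is 0 and sigma is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)
        if abs(z) > z_crit:
            false_alarms += 1
    return false_alarms / n_trials

rate = false_alarm_rate()
print(f"Noise mistaken for signal in {rate:.1%} of trials (design value: 5%)")
```

The test is wrong about noise at close to the rate it was designed for, which is the sense in which the error probability is “known”.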
Really, What is Statistics? “Statistics is concerned with collecting, analyzing, and interpreting data in the best possible way, where the meaning of “best” depends on the particular circumstances of the practical situation” Chris Chatfield Problem Solving: A Statistician’s Guide
Exploratory Data Analysis • Look at data (usually with graphics) and use our ability to see patterns in the data to • Suggest hypotheses to test • Assess validity of assumptions on which statistical inference will be based • Support the selection of appropriate inferential tests • Suggest ideas for further data collection
[Figures: Air Filters; Fecal Samples]
Confirmatory Data Analysis • Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion • Is the material on the filters the same material that is in the fecal samples? • Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?
[Figure: Fecal Samples 2, 95% CI = (1.33, 1.46)]
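One simple way to produce a confidence interval of the kind quoted on this slide is a Student's t interval for the mean ratio. The sketch below is only an illustration: the eight ratios and the reference value are made up, not the talk's data, and treating the air-filter ratio as a fixed number is a simplifying assumption.

```python
import math
import statistics

# Made-up Pu-239 to Am-241 ratios for eight hypothetical samples.
ratios = [2.1, 1.9, 2.3, 2.0, 1.8, 2.2, 2.0, 2.1]

# Two-sided 95% critical value of Student's t with 7 degrees of freedom.
T_CRIT_DF7 = 2.365

def t_interval(values, t_crit):
    """Confidence interval for the mean: mean +/- t_crit * standard error."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean - t_crit * std_err, mean + t_crit * std_err

low, high = t_interval(ratios, T_CRIT_DF7)
reference_ratio = 2.5  # hypothetical ratio taken from an air filter

print(f"95% CI for the mean ratio: ({low:.2f}, {high:.2f})")
print("Reference ratio inside the interval:", low <= reference_ratio <= high)
```

A reference value that falls outside the interval is evidence that the two materials differ by more than random noise would explain. A fuller analysis would also account for the uncertainty in the air-filter ratio.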
Data Dredging • Are the two Pu-239 to Am-241 ratios the same? • If this question was asked before we saw the data, we can proceed with the test to answer it • If this question was inspired by the data, then we should not test the same data to get the answer • This practice is referred to as data snooping, data dredging, etc. • Example: cancer clusters
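Why testing a question inspired by the data is risky can be shown with a small simulation. The sketch below is an illustration and not part of the talk: it assumes 20 independent noise-only comparisons, each tested at the 5% level, and asks how often the most striking one looks “significant”.

```python
import random

def dredging_rate(n_comparisons=20, n_trials=5000, z_crit=1.96, seed=2):
    """Fraction of trials where the most extreme of many null tests looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # With no real effect, each test statistic is standard normal noise.
        z_values = [rng.gauss(0.0, 1.0) for _ in range(n_comparisons)]
        # Dredging: look at everything, then test whichever result stands out.
        if max(abs(z) for z in z_values) > z_crit:
            hits += 1
    return hits / n_trials

simulated = dredging_rate()
expected = 1 - 0.95 ** 20  # chance that at least one of 20 independent tests fires

print(f"Simulated: {simulated:.2f}, calculated: {expected:.2f}")
```

Under these assumptions the nominal 5% error rate grows to roughly 64%. Scanning many locations for an unusual number of cases and then testing the one that stands out follows the same pattern.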
Statistical Method • Define the problem • Formulate your questions in such a way that unambiguous answers are possible • Collect data • Collect data capable of answering your question • Analyze the data • Present the results • in terms your audience can understand
Define the Problem “An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” John Tukey “It is better to solve the right problem the wrong way than to solve the wrong problem the right way.” Richard Hamming
Data Collection • Collect data that are capable of answering the question asked (Data Quality Objectives) • Designed experiments • Observational studies • Sampling • You select samples from a population in order to make inferences about the population
GIGO • The collection of data is often the most time-consuming and expensive part of a study • Reverend Bayes and all of his horses can’t fix a bum dataset
Analyze the Data • All statistical procedures have assumptions • In practice, the assumptions of any given statistical procedure are violated to some degree • Can the validity of the assumptions be verified? • Can the validity of the answer be verified? • How robust is your statistical procedure to violations of its assumptions? • Simple approximate solutions you can understand may be better than complex exact solutions that you can’t • Augment standard statistical analyses with simulations
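The last bullet can be made concrete. The sketch below is a minimal illustration under stated assumptions, not a recommended procedure: it checks how often a nominal 95% Student's t interval from 10 observations covers the true mean, first when the normality assumption holds and then when the data are strongly skewed (lognormal). The sample size, distributions, and trial count are all hypothetical choices.

```python
import math
import random

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
T_CRIT_DF9 = 2.262

def coverage(draw, true_mean, n=10, n_trials=10000, seed=3):
    """Fraction of simulated t intervals that contain the true mean."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_trials):
        x = [draw(rng) for _ in range(n)]
        mean = sum(x) / n
        var = sum((v - mean) ** 2 for v in x) / (n - 1)
        half_width = T_CRIT_DF9 * math.sqrt(var / n)
        if abs(mean - true_mean) <= half_width:
            covered += 1
    return covered / n_trials

# Assumption satisfied: normally distributed data.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
# Assumption violated: lognormal data, whose true mean is exp(0.5).
skewed_cov = coverage(lambda rng: rng.lognormvariate(0.0, 1.0),
                      true_mean=math.exp(0.5))

print(f"Coverage with normal data:    {normal_cov:.3f}")
print(f"Coverage with lognormal data: {skewed_cov:.3f}")
```

If coverage on the skewed data falls well short of 95%, the procedure is not robust to that violation at this sample size, and the simulation shows how large the shortfall is.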
Present Results • Technical answer versus the functional answer • “the null hypothesis is not rejected” • technically “not rejected” ≠ “accepted” • functionally “not rejected” = “accepted” • Statistical significance and practical significance • Apply “so what” test to your answers
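The gap between statistical and practical significance shows up clearly with a large sample. In the sketch below every number is hypothetical: a true shift of 0.02 standard deviations, a sample of 200,000 values, and a threshold of 0.5 standard deviations as the smallest effect anyone would act on.

```python
import math
import random

rng = random.Random(4)
n = 200_000
true_shift = 0.02          # hypothetical true effect, in standard deviations
practical_threshold = 0.5  # hypothetical smallest effect worth acting on

sample = [rng.gauss(true_shift, 1.0) for _ in range(n)]
mean = sum(sample) / n
z = mean * math.sqrt(n)    # z statistic for H0: mean = 0, with sigma known to be 1

statistically_significant = abs(z) > 1.96
practically_significant = abs(mean) > practical_threshold

print(f"Estimated shift = {mean:.3f}, z = {z:.1f}")
print("Statistically significant:", statistically_significant)
print("Passes the 'so what' test:", practically_significant)
```

With enough data almost any nonzero effect becomes statistically significant, so the size of the effect still has to be judged against what matters in practice.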
What is a Statistician? “Powerful spirits should only be called by the master himself” Goethe The Sorcerer's Apprentice
What is a Statistician? • Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician • However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics • Key difference between the sorcerer and his apprentice • Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics • Statistics is vast in scope and detail, and the apprentice does not know what he does not know “It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.” Mark Twain
The Sorcerer’s Apprentice • We may not be statisticians, but we are clearly doing statistics, often without adult supervision • Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control “Should I refuse a good dinner simply because I do not understand the processes of digestion?” Oliver Heaviside [On being criticized for using formal mathematical manipulations without understanding how they worked]
How We Can be Better Statisticians • Master the basics • Learn the language • Play with your data • Use better software • Perform reproducible work • Consult with a real statistician
Master the Basics Khan Academy http://www.khanacademy.org/
Statistics MS/Certificate Distance Programs • University of South Carolina • Colorado State University • Texas A&M University • Penn State University
Concepts and Terminology • Specialized Concepts • Population versus sample for example • Statistics has a very precise language all its own • “the null hypothesis is not rejected” • “not rejected” ≠ “accepted” • Questions and answers are not right unless you use the proper language to convey the proper concept • some statisticians can be intolerant of laymen who misuse the language of statistics • Learn to phrase questions and interpret answers properly
Exploratory Statistics • Learn to play with your data and see if it is trying to tell you something new • Study graphs of your data “There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.” John Tukey
Software used for Statistics • I use the following software for statistical calculations (in order of usage) • R • Minitab • SAS • Spreadsheet (e.g., MS Excel, Gnumeric) • There are many others
Spreadsheets (Excel) • What some people can do in Excel is nothing short of amazing (but should they be doing it?) • Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle • Spreadsheet Addiction by Patrick Burns • http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html • Problems with spreadsheet implementation • Excel has a long history of doing bad stats • Problems with spreadsheet paradigm • Reproducible science
Sources: http://www.msnbc.msn.com/id/21033161/from/RS.1/ (9/28/2007) • M. G. Almiron et al., “On the Numerical Accuracy of Spreadsheets,” Journal of Statistical Software 34(4), 2010
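One well-known way for numerical accuracy to fail in any software is the one-pass “textbook” variance formula, which subtracts two nearly equal large numbers. The sketch below is a generic illustration with made-up data. It makes no claim about which formula a particular spreadsheet uses.

```python
# Four made-up values with a large common offset; their true sample variance is 30.
data = [1_000_000_004.0, 1_000_000_007.0, 1_000_000_013.0, 1_000_000_016.0]

def one_pass_variance(x):
    """Textbook formula: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(x)
    sum_x = sum(x)
    sum_sq = sum(v * v for v in x)
    # The two terms agree to about 17 digits, so their difference is mostly rounding error.
    return (sum_sq - sum_x * sum_x / n) / (n - 1)

def two_pass_variance(x):
    """Subtract the mean first, then sum the squared deviations."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)

naive = one_pass_variance(data)
stable = two_pass_variance(data)

print("One-pass formula:", naive)
print("Two-pass formula:", stable)
```

Both functions are algebraically identical, yet only the two-pass version recovers the correct answer in double-precision arithmetic. That is why the implementation matters as much as the formula.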
Reproducible Research • Reproducible research refers to the idea that the ultimate product of research is the paper together with the full computational environment (code, data, etc.) needed to reproduce its results • Workflow: Raw Data → Data Massaging → Calculations → Plots and Tables → Final Paper
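The chain from raw data to final paper can be sketched as a single script in which every step is a function, so that rerunning the script regenerates the reported numbers. Everything below is hypothetical: the function names, the simulated “raw data”, and the fixed seed stand in for a real data file and a real analysis.

```python
import random
import statistics

def collect_raw_data(seed, n=12):
    """Stand-in for reading raw data: simulated results with some missing entries."""
    rng = random.Random(seed)
    return [None if rng.random() < 0.2 else rng.gauss(100.0, 10.0) for _ in range(n)]

def massage(raw):
    """Data massaging: drop missing entries, with the rule written down in code."""
    return [v for v in raw if v is not None]

def calculate(clean):
    """Calculations: sample size, mean, and standard error of the mean."""
    std_err = statistics.stdev(clean) / len(clean) ** 0.5
    return {"n": len(clean), "mean": statistics.mean(clean), "std_err": std_err}

def make_table(results):
    """Plots and tables: format the numbers that would appear in the paper."""
    return (f"n={results['n']}  mean={results['mean']:.2f}  "
            f"std_err={results['std_err']:.2f}")

def run_analysis(seed=42):
    return make_table(calculate(massage(collect_raw_data(seed))))

print(run_analysis())
```

Because no step is done by hand, anyone with the script and the data gets the same table, and every data-cleaning decision is visible for review.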
The R Project for Statistical Computing • R is a language and environment for statistical computing and graphics • R is available as Free Software under the terms of the GNU General Public License in source code form • It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS • Download from http://www.r-project.org/
Advantages of R • Command line interface rather than a GUI • Promotes reproducible statistics • Open source • Flexible licensing • Availability of source code for peer review • Bugs are public knowledge and are fixed quickly • New tests and methods tend to appear first in R • Many dozens of recently published books devoted to R • Free (and very good) community support available
Consult with a Statistician • If you are going to involve a statistician, do it at the study design and data collection phases • If not, at least estimate how much it will cost to collect the data all over again • Anybody can analyze compelling data “To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.” Sir Ronald Fisher
Twisted Answers to Crooked Questions • As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis • In such situations we base our decisions on professional judgment, often augmented with “statistics” • We must not fool ourselves about what we are doing • … of all the wrong answers we have to choose from, this one is the best • We have no right to expect a statistician to endorse such mischief
The Apprentice Should Beware of … • The Management Prior • Being bamboozled by other people’s statistics • “The only right way to do this is X [insert statistical method here]” • Being seduced by complexity
Statistics in the Workplace: Musings of a Sorcerer's Apprentice Presentation to USC Stat Club March 26, 2009 • Main message • A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed • Everyone in the room should become a health physicist (I had no takers)