Integrating Ethics into Graduate Training in the Environment Sciences Series, Unit 3: Ethical Aspects of Data Analysis. AUTHORS: KLAUS KELLER and LOUISE MILTICH, Department of Geosciences, The Pennsylvania State University
With input from Nancy Tuana, Ken Davis, Jim Shortle, Michelle Stickler, Don Brown, and Erich Schienke.
Guiding Questions • What are potential ethical questions arising in data analysis? • What are the “rules of the game”? • Do research publications follow these rules? • Where to go for guidance?
What are potential ethical questions arising in data analysis? • What are the impacts of potential errors in data-analysis results on the outcome of a decision? • Type I error • Type II error • Overconfident projections • Biased projections • How to deal with the illusion of objectivity? • How to communicate potential overconfidence? • How to formulate the null hypothesis? • When is the analysis “done” and ready for submission? • What to do if the data are insufficient for a formal and robust hypothesis test?
What can go wrong while testing a hypothesis? • Type I error: • The effect is noise, but we declare a significant connection. • The null hypothesis is rejected when it is actually true. • “False positive”. • Scientists typically design statistical tests with a low probability of a Type I error (e.g., “p < 0.05”). • Type II error: • The effect is real, but we do not detect a significant connection. • The null hypothesis is not rejected (“accepted”) when it is actually false. • “False negative”. • Optimal (or Bayesian) decision theory: • Design the strategy based on the relative costs of Type I and Type II errors. • Example: A hurricane is predicted to arrive in Miami with p = 0.2. Should you take action? • Maximize the expected utility of the decision, given your posterior probabilities.
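The hurricane question can be framed as a simple cost-loss decision. The sketch below is an illustration only: the costs (`COST_OF_ACTION`, `LOSS_IF_UNPROTECTED`) are made-up numbers, and it assumes that acting fully prevents the damage. It shows how the decision follows from the expected losses of a false alarm and a miss rather than from a fixed significance convention; it is not a method taken from this unit.

```python
"""Sketch of the hurricane example as a cost-loss decision problem."""

# Hypothetical costs in arbitrary units (not from the source slides).
COST_OF_ACTION = 10.0        # paid whenever we act, e.g., preparing or evacuating
LOSS_IF_UNPROTECTED = 100.0  # damage if the hurricane hits and we did not act


def expected_losses(p_hit, cost=COST_OF_ACTION, loss=LOSS_IF_UNPROTECTED):
    """Expected loss of each action, given the posterior probability of a hit.

    Assumes acting fully prevents the damage. Acting when the storm misses is a
    "false alarm" (Type I-like); waiting when it hits is a "miss" (Type II-like).
    """
    return {
        "act": cost,           # pay the cost of action regardless of the outcome
        "wait": p_hit * loss,  # risk the full loss with probability p_hit
    }


def best_action(p_hit, cost=COST_OF_ACTION, loss=LOSS_IF_UNPROTECTED):
    """Pick the action with the lowest expected loss (highest expected utility)."""
    losses = expected_losses(p_hit, cost, loss)
    return min(losses, key=losses.get)


if __name__ == "__main__":
    p = 0.2  # forecast probability from the example
    print(expected_losses(p))
    print("decision:", best_action(p))
    # Acting pays off whenever p_hit exceeds the cost/loss ratio (0.1 here).
    print("break-even probability:", COST_OF_ACTION / LOSS_IF_UNPROTECTED)
```

With these hypothetical numbers, a 20% chance of a hit already justifies acting because the cost/loss ratio is 0.1. With a cost of 50 and the same loss, the same forecast would favor waiting: the threshold comes from the relative costs, not from a conventional 5% level.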
American Statistical Association Ethical Guidelines for Statistical Practice “Statisticians should: • present their findings and interpretations honestly and objectively; • avoid untrue, deceptive, or undocumented statements; • disclose any financial or other interests that may affect, or appear to affect, their professional statements.” http://www.tcnj.edu/~asaethic/asagui.html
American Statistical Association Ethical Guidelines for Statistical Practice “Statisticians should: • delineate the boundaries of the inquiry as well as the boundaries of the statistical inferences which can be derived from it; • emphasize that statistical analysis may be an essential component of an inquiry and should be acknowledged in the same manner as other essential components; • be prepared to document data sources used in an inquiry, known inaccuracies in the data, and steps taken to correct or refine the data, statistical procedures applied to the data, and the assumptions required for their application; • make the data available for analysis by other responsible parties with appropriate safeguards for privacy concerns; • recognize that the selection of a statistical procedure may to some extent be a matter of judgment and that other statisticians may select alternative procedures; • direct any criticism of a statistical inquiry to the inquiry itself and not to the individuals conducting it”. http://www.tcnj.edu/~asaethic/asagui.html
What is overconfidence? • Estimates with artificially tight confidence bounds are overconfident. • Overconfidence in subjective assessments and model predictions is common. [Figure: error in the recommended values for the electron mass versus year of publication; Henrion and Fischhoff (1986).]
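One way to check for overconfidence is to ask how often reported intervals contain the value later accepted as correct. The sketch below uses synthetic data only (not the electron-mass values) and assumes normally distributed errors; the function name `coverage` and its parameters are hypothetical. Calibrated 95% intervals should contain the true value about 95% of the time, while intervals that are too narrow fall well short.

```python
"""Sketch: checking whether nominal 95% intervals are overconfident."""
import random

Z95 = 1.96  # two-sided 95% quantile of the standard normal


def coverage(n_trials, true_sd, reported_sd, seed=1):
    """Fraction of nominal 95% intervals that contain the true value.

    The true value is 0. Each trial draws an estimate whose actual error has
    standard deviation `true_sd`, while the reported interval is built from
    `reported_sd`. Calibrated intervals cover about 95% of the time.
    """
    rng = random.Random(seed)
    half_width = Z95 * reported_sd
    hits = 0
    for _ in range(n_trials):
        estimate = rng.gauss(0.0, true_sd)
        if abs(estimate - 0.0) <= half_width:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("calibrated   :", coverage(20_000, true_sd=1.0, reported_sd=1.0))
    print("overconfident:", coverage(20_000, true_sd=1.0, reported_sd=0.5))
```

In this setup, intervals half as wide as they should be contain the true value only about two-thirds of the time, even though they are labeled "95%".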
What are key sources of overconfidence? • Neglecting autocorrelation effects. • Undersampling the unresolved variability (i.e., out-of-range projections). • Assuming unimodal probability density functions. • Neglecting model representation errors. • Considering only a subset of the parametric uncertainty. • Neglecting structural model uncertainty. Are current climate projections overconfident?
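The first source listed, neglecting autocorrelation, can be illustrated with a synthetic AR(1) series. The sketch below assumes an AR(1) process with a hypothetical coefficient `rho = 0.8` and compares the naive standard error of the mean (which treats samples as independent) with the actual spread of the mean across many simulated replicates. The effective-sample-size formula used is the standard large-sample approximation for AR(1), not something taken from the slides.

```python
"""Sketch: neglecting AR(1) autocorrelation understates the uncertainty of a mean."""
import math
import random
import statistics


def ar1_series(n, rho, rng):
    """Zero-mean AR(1) series x_t = rho * x_{t-1} + e_t, with e_t ~ N(0, 1)."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho**2))  # draw x_0 from the stationary distribution
    series = []
    for _ in range(n):
        series.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return series


def naive_standard_error(series):
    """Standard error of the mean that (wrongly) assumes independent samples."""
    return statistics.stdev(series) / math.sqrt(len(series))


def effective_sample_size(n, rho):
    """Large-n approximation of the effective sample size for an AR(1) process."""
    return n * (1.0 - rho) / (1.0 + rho)


def compare(n=200, rho=0.8, replicates=1000, seed=0):
    """Return (average naive SE, actual spread of the sample mean across replicates)."""
    rng = random.Random(seed)
    means, naive = [], []
    for _ in range(replicates):
        series = ar1_series(n, rho, rng)
        means.append(statistics.fmean(series))
        naive.append(naive_standard_error(series))
    return statistics.fmean(naive), statistics.stdev(means)


if __name__ == "__main__":
    naive_se, actual_se = compare()
    print(f"naive SE  ~ {naive_se:.3f}")
    print(f"actual SE ~ {actual_se:.3f}")
    print(f"effective sample size ~ {effective_sample_size(200, 0.8):.0f} of 200")
```

With `rho = 0.8`, the 200 correlated values carry roughly as much information about the mean as about 22 independent ones, so the naive standard error is too small by a factor of about three.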
Morita et al. (2001). The fact that the range of CO2 emission projections has widened over time is consistent with the hypothesis that previous projections have been overconfident. “The 40 scenarios cover the full range of GHG [..] emissions consistent with the underlying range of driving forces from scenario literature” [Nakicenovic et al., 2000, p. 46].
Past carbon cycle projections that neglect the uncertainty in historic land-use CO2 emissions are likely overconfident. [Figure legend: analyses adopting a single estimate of land-use CO2 emissions vs. analyses accounting for uncertainty about which estimate of land-use CO2 emissions is most likely, given observational constraints; Miltich, Ricciuto, and Keller (2007).]
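Why does adopting a single estimate narrow the projection? If each alternative land-use emission estimate leads to a somewhat different projection, averaging over the estimates (weighted by how well each agrees with observations) adds the spread between estimates to the spread within each one (the law of total variance). The sketch below uses entirely hypothetical means, spreads, and weights; the `Projection` type and `mixture_mean_sd` function are illustrative names, not the method of Miltich, Ricciuto, and Keller (2007).

```python
"""Sketch: conditioning on one land-use emission estimate vs. averaging over several."""
import math
from typing import NamedTuple


class Projection(NamedTuple):
    """Projection conditioned on one (hypothetical) land-use emission estimate."""
    mean: float    # projected quantity, arbitrary units
    sd: float      # spread from the other uncertain parameters
    weight: float  # probability that this estimate is the right one, given observations


def mixture_mean_sd(components):
    """Mean and standard deviation of a weighted mixture (law of total variance)."""
    total = sum(c.weight for c in components)
    mean = sum(c.weight * c.mean for c in components) / total
    second_moment = sum(c.weight * (c.sd**2 + c.mean**2) for c in components) / total
    return mean, math.sqrt(second_moment - mean**2)


# Hypothetical numbers for illustration only.
ESTIMATES = [
    Projection(mean=1.0, sd=0.2, weight=0.3),
    Projection(mean=1.5, sd=0.2, weight=0.5),
    Projection(mean=2.0, sd=0.2, weight=0.2),
]

if __name__ == "__main__":
    m, s = mixture_mean_sd(ESTIMATES)
    print(f"single estimate: mean={ESTIMATES[1].mean}, sd={ESTIMATES[1].sd}")
    print(f"averaged over estimates: mean={m:.3f}, sd={s:.3f}")
```

Here the averaged projection has a standard deviation of about 0.40, roughly twice that of any single-estimate projection (0.2), even though each component is equally precise.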
When might overconfidence result in biased decision analyses? • Designing risk management strategies in the face of threshold responses requires sound probabilistic information. • Overconfident climate projections may underestimate the risks of low-probability, high-impact events.
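A small calculation shows how strongly tail risks depend on the width of a projection. The sketch assumes a normal projection with a hypothetical mean and threshold (both in arbitrary units) and compares the probability of crossing the threshold under a narrow (overconfident) and a wider standard deviation. The normal shape is itself an assumption; as noted earlier, assuming unimodal distributions can be another source of overconfidence.

```python
"""Sketch: an overconfident (too narrow) projection can hide threshold risks."""
import math


def exceedance_probability(mean, sd, threshold):
    """P(X > threshold) for a normal projection X ~ N(mean, sd**2)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical projection of some climate variable and a hypothetical threshold.
    MEAN, THRESHOLD = 2.0, 4.0
    for label, sd in [("overconfident", 0.5), ("wider", 1.0)]:
        p = exceedance_probability(MEAN, sd, THRESHOLD)
        print(f"{label:>13}: sd={sd}, P(exceed threshold)={p:.2e}")
```

Halving the standard deviation lowers the computed exceedance probability from about 2% to about 3 in 100,000, which could make a costly threshold crossing look negligible in a decision analysis.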
Where to go for guidance? • ASA Ethical Guidelines for Statistical Practice, published by the American Statistical Association: http://www.tcnj.edu/~asaethic/asagui.html • The Online Ethics Center for Engineering and Science: http://onlineethics.org/index.html • Your mentors and peers.
Discussion Questions / Checklist • Should one submit a manuscript that may well be wrong and that could detrimentally affect the policy process? • How do you define “detrimental”? • When and how is it appropriate to exclude “data outliers”? • Are the potential sources of bias clearly flagged? • Is the sensitivity to the choice of analyzed data sufficiently explained? • Does the discussion adopt a specific value judgment about what is a “significant” result? • Are there ethical issues in performing a “classic” p < 0.05 hypothesis test?
Reading Materials • Miltich, L. I., D. M. Ricciuto, and K. Keller. 2007. Which estimate of historic land use CO2 emissions makes most sense given atmospheric and oceanic CO2 observations? In preparation for Environmental Research Letters. http://www.geosc.psu.edu/~kkeller/wp/ • Keller, K., L. I. Miltich, A. Robinson, and R. S. J. Tol. 2007. How overconfident are current projections of carbon dioxide emissions? Working Paper FNU-124, Research Unit Sustainability and Global Change, Hamburg University. http://ideas.repec.org/s/sgc/wpaper.html • Berger, J. O., and D. A. Berry. 1988. Statistical analysis and the illusion of objectivity. American Scientist 76 (2):159-165. • Cohen, J. 1994. The earth is round (p < .05). American Psychologist 49 (12):997-1003. • Lipton, P. 2005. Testing hypotheses: Prediction and prejudice. Science 307 (5707):219-221.