The Long Way from α-Control to Validity Proper: Problems with the Current Debate on Bad Practices Klaus Fiedler (University of Heidelberg)
Preview
• Innuendo: Politician X was not seen in the red-light district. We did not violate the rules of good science.
• Unfortunate confusion of two fundamentally different issues: (1) lack of replicability and (2) cheating … leading to a cannibalistic and discouraging debate
• Non-replicability – both a deficit and an asset of science … a precondition for all scientific creativity
• Reducing methodology to statistical significance testing
• What is needed is a new methodology that is sensitive to the logic of scientific discovery, within which NHST is only a subordinate tool.
Diagnosis continued …
• The new compliance culture … surveillance in science …
• … that primarily aims at minimizing α-errors in publication decisions. Without pertinent evidence, it is pretended that α-errors are the most expensive and consequential. It is hardly realized that β-errors may be the more serious enemy of science.
• Should new ideas, before they are established and tested powerfully, really not be published and shared with other scientists?
• Does it make sense to maximize the replicability of findings that lack validity? … The fiction of exact replicability
• Is replicability an issue peculiar to behavioral science … psychology … social psychology … the priming paradigm?
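To make the α–β trade-off concrete, here is a minimal numerical sketch (my own illustration, not from the talk): for a fixed design, tightening α mechanically inflates β. The effect size d = 0.4 and n = 30 per group are arbitrary assumptions.

```python
# Illustrative sketch: tightening alpha inflates beta (Type II error) when the
# design is held fixed. Effect size d = 0.4 and n = 30 per group are arbitrary
# assumptions chosen purely for illustration.
from scipy.stats import norm

d, n = 0.4, 30                        # assumed standardized effect, group size
se = (2 / n) ** 0.5                   # SE of the mean difference in d units
for alpha in (0.05, 0.01, 0.005, 0.001):
    z_crit = norm.ppf(1 - alpha / 2)          # two-sided critical value
    power = 1 - norm.cdf(z_crit - d / se)     # normal-approximation power
    print(f"alpha = {alpha:<6}  beta = {1 - power:.2f}")
```

Under these (arbitrary) numbers, cutting α from .05 to .001 raises β from roughly .66 to above .95 – the invisible error grows as the visible one shrinks.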
George Kelly (1955): „The creative cycle“ – an evolutionary perspective on scientific growth
[Diagram: a cycle alternating between loosening and tightening]
• Evolution: loosening = random variation (recombination, mutations); tightening = strict selection
• Scientific progress: loosening = context of discovery (hypothesis creation); tightening = context of justification (hypothesis testing)
Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. The Quarterly Journal of Experimental Psychology, 12, 129-140.
Please try to predict the next element in the series, revealed one step at a time: 2 … 4 … 8 … 16 … 32 …
The candidate rules form a nested hypothesis space, from specific to general:
Y = 2^N ⊂ Y = superlinear increase ⊂ Y = increasing numbers ⊂ Y = numbers ⊂ Y = any symbols
Every element of the series confirms all of these rules at once.
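A small sketch (the rule formalizations are my own) of why confirming observations cannot discriminate among these nested rules: the visible series is consistent with all of them simultaneously, so only a potentially disconfirming test (e.g., proposing 2 4 7) has diagnostic value.

```python
# Sketch of the Wason (1960) point: the observed series confirms every rule in
# a nested family at once, so positive tests cannot tell the rules apart.
series = [2, 4, 8, 16, 32]
steps = list(zip(series, series[1:]))
gaps = [b - a for a, b in steps]

rules = {
    "Y = 2^N (doubling)":   all(b == 2 * a for a, b in steps),
    "superlinear increase": all(g2 > g1 for g1, g2 in zip(gaps, gaps[1:])),
    "increasing numbers":   all(b > a for a, b in steps),
    "any numbers":          all(isinstance(x, (int, float)) for x in series),
}
for rule, ok in rules.items():
    print(f"{rule:<22} consistent with the data: {ok}")   # all True
```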
[Diagram: „Mortality salience“ surrounded by alternative active ingredients – salience of an existential topic, mere priming of incompleteness, and reminders of uncontrolled forces]
What does this mean?
• Don't overestimate the role of statistical tests in this process of scientific discovery
• Statistical tests are nothing but a useful heuristic cue for screening potentially interesting findings
• Significance tests (whether Fisherian or Bayesian) do not allow us to make inferences about the validity or invalidity of a theory
• Proper (i.e., conservative) statistical tests have hardly ever led to fundamentally new insights or progress
• Statistically tested hypotheses are based on arbitrary operationalizations of the independent and dependent variables
Wason (1966) once more … on Popper's lesson
"If p, then q" does not inform any inferences of the form "If not-p, then ?". Likewise, "If p, then q" has no strong implications for "If q, then ?".
If q is actually borne out in a study guided by "If p, then q", this does not mean that the theory is valid, because causes other than p might have been present that actually produced q. By the same token, if a study guided by the theory "If p, then q" failed to produce q, this does not invalidate the theory, because the impact of p on q might have been overridden by other causal factors working in opposite directions.
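A truth-table sketch of the same point: the material conditional "if p then q" excludes only the single case p-and-not-q; observing q, or observing not-p, leaves the conditional undecided.

```python
# Truth-table sketch of Popper's/Wason's lesson: "if p then q" rules out only
# (p and not q). Finding q does not establish p (other causes may have
# produced q), and not-p cases say nothing about the conditional at all.
from itertools import product

for p, q in product([True, False], repeat=2):
    holds = (not p) or q                     # material conditional p -> q
    print(f"p={str(p):<5} q={str(q):<5}  (p -> q) holds: {holds}")
```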
Don't join the witch hunt! – Let's not talk about psychology … social psychology … priming. Let's talk about the very best products … awarded with a Nobel Prize … prospect theory.
[Figure: the prospect-theory value function – subjective value plotted over losses and gains]
Choose between:
A: 1000 € with p = .09
B: 100 € with p = .90
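Note that both options have the same expected value (1000 × .09 = 100 × .90 = 90 €), so any systematic preference must reflect how amounts and probabilities are subjectively transformed. A minimal sketch, using the value and probability-weighting functions with the Tversky & Kahneman (1992) parameter estimates (α = 0.88, γ = 0.61) purely as illustrative assumptions:

```python
# Sketch: gambles A and B have identical expected value, yet prospect theory's
# probability weighting assigns them different subjective values. Parameters
# alpha = 0.88 and gamma = 0.61 are the Tversky & Kahneman (1992) estimates,
# used here only for illustration.
def weight(p, gamma=0.61):
    # inverse-S probability weighting function (Tversky & Kahneman, 1992)
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

def value(x, alpha=0.88):
    return x**alpha          # value function for gains

for name, amount, p in [("A", 1000, 0.09), ("B", 100, 0.90)]:
    ev = amount * p
    pt = value(amount) * weight(p)
    print(f"{name}: EV = {ev:.0f} EUR, prospect-theory value = {pt:.1f}")
```

With these parameters the small probability .09 is overweighted, so the EV-equivalent gambles come apart subjectively.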
Prospect theory (Kahneman & Tversky, 1979)
Risk-averse vs. risk-seeking decisions – moderated by:
• Type of decision task: choice vs. pricing
• Subtle linguistic and conversational cues
• Decisions based on description vs. experience
Another „classic“: The Stroop effect
Stroop interference – moderated by:
• Trial type: color match vs. non-match
• Statistical contingencies in the design
• Affective cues (Kuhl & Kazén)
• Cognitive load (Dany Algom)
Hardly contestable priming results
Semantic and evaluative priming facilitation – moderated by:
• Trial type: related vs. unrelated prime-target pairs
• Affective states: positive vs. negative mood
• Contingency or pseudo-contingency in the design
• Attention to primes and encoding style
• Measure Y = f(X)
• Measure Y = f(X | X′, X′′, X′′′ …)
• Measure Y = f(X | X′, X′′, X′′′ … C1, C2, C3, C4, …)
• The list of relevant other causes (X′, X′′, X′′′ …) and boundary conditions (C1, C2, C3, …) can be very long
• They can be subtle and hard to recognize
• Scientific progress comes from these sources of non-replicability!!!
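A minimal simulation (all parameters are arbitrary assumptions) of how such hidden boundary conditions defeat "exact" replication: two labs run the same procedure, but an unmeasured context variable C differs between them, so the observed effect sizes diverge without any fraud or lack of power.

```python
# Illustrative simulation: the effect of X on Y is moderated by an unmeasured
# boundary condition C. Two labs run the "same" experiment but sample
# different C contexts, so their effect sizes diverge. All numbers arbitrary.
import numpy as np

rng = np.random.default_rng(42)
n = 200

def run_lab(c_mean):
    x = rng.integers(0, 2, n)                 # treatment vs. control
    c = rng.normal(c_mean, 0.5, n)            # hidden boundary condition
    y = (0.8 * c) * x + rng.normal(0, 1, n)   # effect of X scales with C
    d = (y[x == 1].mean() - y[x == 0].mean()) / y.std(ddof=1)
    return d

print(f"Lab 1 (C high): d = {run_lab(1.0):.2f}")   # sizable effect
print(f"Lab 2 (C low):  d = {run_lab(0.1):.2f}")   # near-null 'replication'
```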
Michael I. Norton (Harvard Business School), Jeana H. Frost (Boston University), & Dan Ariely (Massachusetts Institute of Technology). Less Is More: The Lure of Ambiguity, or Why Familiarity Breeds Contempt. JPSP, 2007, 92, 97-105.
Participants saw either 4, 6, 8, or 10 traits that had been randomly drawn from the set of 28 and then rated how much they thought they would like the individual described by these traits.
The 28 traits were taken from Asch (1946), Edwards and Weary (1993), and Pavelchak (1989): ambitious, boring, bright, critical, cultured, deliberate, dependable, emotional, enthusiastic, idealistic, imaginative, impulsive, individualistic, industrious, intelligent, level-headed, methodical, observant, open-minded, opinionated, polite, reliable, resourceful, self-disciplined, sensitive, stubborn, studious, and talkative.
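A minimal sketch of this stimulus-sampling procedure as described above; the function name and structure are my own, and no rating model is implied.

```python
# Sketch of the Norton, Frost & Ariely (2007) stimulus sampling as described
# on the slide: each participant sees k traits (k in {4, 6, 8, 10}) drawn at
# random from the pool of 28 traits.
import random

TRAITS = [
    "ambitious", "boring", "bright", "critical", "cultured", "deliberate",
    "dependable", "emotional", "enthusiastic", "idealistic", "imaginative",
    "impulsive", "individualistic", "industrious", "intelligent",
    "level-headed", "methodical", "observant", "open-minded", "opinionated",
    "polite", "reliable", "resourceful", "self-disciplined", "sensitive",
    "stubborn", "studious", "talkative",
]  # the 28 traits listed above

def sample_profile(k, rng=random):
    """Draw k traits at random, as in the described procedure."""
    assert k in (4, 6, 8, 10)
    return rng.sample(TRAITS, k)

print(sample_profile(4))
```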
Implications
• Both sets of studies were not underpowered
• A deliberate and careful attempt was made to run an exact replication … but with divergent results!!!
• What does this mean? Is it a case for Bem? Was the random generator not functioning well? Were some of the researchers cheating?
• The answer is a three-fold NO! And the explanation is that exact replication is a fiction
• There are always subtle other causes and boundary conditions, and unraveling these boundary conditions is the most challenging and creative part of science …
• It won't be done by any statistical procedure!!!
What is really needed, instead, is:
• Distinction between idealized theories and their application to natural task settings
• Theories that explicate boundary conditions
• Theories that are explicit concerning the constraints they impose on the empirical world
• Theoretical frameworks that are organized from general to specific and that let us understand the latitude for boundary conditions and alternative causes
• Open controversies that are not hidden in the closet of journal reviewing and that render replications likely and pragmatically meaningful
Convergent percentage estimates for ten bad practices studied by John et al. (2012), when completely different statistical reference sets should result in highly discrepant estimates:
• Falsifying data
• Falsely claiming that results are unaffected by demographics
• Claiming to have predicted an unexpected result
• Excluding data after looking at the impact of doing so
• Selectively reporting studies that "worked"
• Rounding down p values
• Stopping data collection after achieving the desired result
• Failing to report all conditions
• Collecting more data after seeing whether results were significant
• Failing to report all dependent measures
There is another issue that I am a bit concerned about, which is reflected in … a stream of related papers that have been published in the past couple of years. Many of these papers, while unquestionably pointing out to important methodological concerns, portray the average researcher as if he/she is intentionally trying to deceive the scientific community. Put differently, implicitly these papers suggest that the starting point should be mistrust in our colleagues unless they can unequivocally demonstrate that they are trustworthy.
The publication bias, prejudice against the null hypothesis, cherry picking are by all means poor and undesired habits that hinder science and we should do our best to get rid of them. However, most of these questionable practices are not done necessarily with a conscious intention of deception. It is more the case that many researchers are not aware of the consequences of these malpractices. This is fundamentally different from what researchers like Stapel and Smeesters have done.
I know I am touching on a sensitive issue, but I still want to believe that most of my colleagues have some fundamental integrity. If our starting point will be that the other is untrustworthy unless he/she can prove that they are trustworthy we will not get far. As noted by Elster, trust (even if not always justified) is the necessary lubrication for the working of a community (including the scientific one). … authors … know what cost benefit analysis is all about. If I try to assess all the cumbersome procedures and precautions that are proposed in order to discourage the use of unwarranted research practices, I wonder sometimes whether we may not be paying a too high price.
Thanks for listening to my talk …
… and sorry for being more concerned with validity than with reliability … or social desirability
[Path diagrams: four roles of a third variable Z in an X → Y effect – (a) Mediator: X → Z → Y; (b) Spurious mediator: a correlated Z′ does the causal work; (c) Correlate: Z merely correlates with the X → Y process; (d) Manipulation check: Z reflects the manipulation X]
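A small simulation of panel (b), the spurious-mediator case (all coefficients are arbitrary illustration choices): the true mediator is Z′, the measured Z is a mere correlate, yet a regression-based (Baron–Kenny style) analysis still yields a sizable "indirect effect" through Z.

```python
# Sketch of the "spurious mediator" problem in panel (b): the true mediator is
# Z', while the measured Z only correlates with Z'. A regression-based
# analysis of X, Z, Y nonetheless makes Z look like a mediator.
import numpy as np

rng = np.random.default_rng(7)
n = 5000
x = rng.normal(size=n)
z_true = 0.7 * x + rng.normal(size=n)      # Z' -- the real mediator
y = 0.7 * z_true + rng.normal(size=n)      # X affects Y only via Z'
z = 0.8 * z_true + rng.normal(size=n)      # measured Z: a mere correlate

a = np.polyfit(x, z, 1)[0]                 # X -> Z path
b = np.linalg.lstsq(                       # Z -> Y path, controlling for X
    np.column_stack([x, z, np.ones(n)]), y, rcond=None
)[0][1]
print(f"apparent indirect effect via Z: a*b = {a * b:.2f}")   # clearly nonzero
```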
[Plots: z statistic of the mediation test as a function of r_zy and of r_xy, for r_xz = 0.3, 0.5, and 0.7; horizontal reference line at z = 1.96 (significant)]