Analyzing Experimental Data
Shouldn’t Design Be Enough? • Use of manipulation, control, & randomization should be enough – shouldn’t the data speak for themselves? • Problem – theories are no longer simple “a causes b” hypothesis tests. • Complexity of theory means optimal design can be computationally difficult. • Material presented here on optimal design is mainly about how to design an experiment to test a “complete” theory – one that can be expressed as a likelihood function for the observed data & estimated by maximum likelihood.
On Optimal Experiment Size • Issues ignored in analysis – data generated from experiments can usually be used beyond the experiment itself & for currently unimagined purposes. • There is also a fixed cost to running an experiment that should be considered; the analysis covers only marginal costs.
Sources for Material on Optimal Design • El-Gamal, Mahmoud A., Richard D. McKelvey, & Thomas R. Palfrey, “Computational Issues in the Statistical Design and Analysis of Experimental Games,” International Journal of Supercomputer Applications, vol. 7, no. 3, fall 1993, 189-200. • El-Gamal, Mahmoud A. & Thomas R. Palfrey, “Economical Experiments: Bayesian Efficient Experimental Design,” International Journal of Game Theory, 1996, 25:495-517.
Standard FTT Design • Researcher decides ex ante (w/o any statistical analysis) on an interesting experiment to run. • Then attempts to collect as much data as financially feasible, weighing non-statistical considerations: • importance of experiment as piece of research agenda, • how high payoffs need to be to make subjects interested in making optimal decisions, • length of time one can keep subjects in laboratory, etc.
Standard FTT Design • After data are collected, researcher approaches them the same way as non-experimental data – hypothesizing form of data generating process (either structural or reduced form), & proceeding with estimation & hypothesis testing. • However, as noted earlier, “point” predictions in most FTT fail, because most game theoretic models suffer from the zero likelihood problem. • It is possible to observe data that the model predicts could never happen (e.g. subjects choosing strictly dominated strategies).
Zero Likelihood Problem • Most game theoretic experiments ignore this issue & concentrate on relationship predictions, using standard hypothesis testing techniques. • But ideally, better to devise a theoretical model that allows for statistical estimation of a likelihood function – true structural estimation. • Zero likelihood problem means a failure of theory; the theory should be corrected. • For decision-theoretic situations this is not that difficult: just add some random error to the individual’s behavior (what we implicitly do in regression equations explaining individual behavior).
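The decision-theoretic fix in the last bullet can be illustrated with a minimal sketch. The uniform "tremble" error model, the error rate of 0.2, and the eight observed choices are all hypothetical and chosen only for illustration; they are not taken from the slides or the cited papers.

```python
import math

def log_likelihood(choices, predicted, error_rate=0.0, n_actions=2):
    """Log-likelihood of observed choices when the model predicts one action.

    Assumed error model: with probability `error_rate` the subject picks
    uniformly at random among all actions; otherwise the predicted action
    is chosen.
    """
    total = 0.0
    for choice in choices:
        prob = error_rate / n_actions
        if choice == predicted:
            prob += 1.0 - error_rate
        if prob == 0.0:
            return float("-inf")  # zero likelihood: the data "cannot happen"
        total += math.log(prob)
    return total

# Hypothetical data: action 0 is predicted, action 1 is strictly dominated.
observed = [0, 0, 1, 0, 0, 0, 1, 0]

# Pure point prediction: two dominated choices make the likelihood zero.
point_prediction = log_likelihood(observed, predicted=0, error_rate=0.0)

# Same prediction plus random error: the likelihood is now well defined.
with_error = log_likelihood(observed, predicted=0, error_rate=0.2)
```

With no error term the log-likelihood is minus infinity as soon as one dominated choice appears, so nothing can be estimated; with the error term the same data yield a finite value that can be maximized over the error rate.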
Zero Likelihood Problem • In games, adding random error adds huge computational complexity to solving them. • One solution is the Quantal Response Equilibrium (QRE) concept devised by McKelvey & Palfrey. • McKelvey, R.D. and Palfrey, T.R. (1995). “Quantal Response Equilibria in Normal Form Games.” Games and Economic Behavior. 7, 6–38.
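As a rough illustration of the QRE idea, the sketch below approximates a logit QRE for a 2x2 game. The logit form, the damped fixed-point iteration, the prisoner's dilemma payoffs, and the precision parameter `lam = 0.5` are assumptions made for this example; the iteration is a simple stand-in and not the solution method used by McKelvey & Palfrey.

```python
import math

def logit_response(expected_payoffs, lam):
    """Logit choice probabilities: P(a) proportional to exp(lam * payoff)."""
    top = max(expected_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(lam * (u - top)) for u in expected_payoffs]
    total = sum(weights)
    return [w / total for w in weights]

def logit_qre_2x2(row_payoffs, col_payoffs, lam, iterations=2000, step=0.1):
    """Approximate a logit QRE of a 2x2 game by damped fixed-point iteration.

    row_payoffs[i][j] and col_payoffs[i][j] are the payoffs when row plays i
    and column plays j. Returns (P(row plays 0), P(column plays 0)).
    Convergence is not guaranteed for large lam.
    """
    p, q = 0.6, 0.4  # arbitrary interior starting point
    for _ in range(iterations):
        row_eu = [q * row_payoffs[i][0] + (1 - q) * row_payoffs[i][1]
                  for i in range(2)]
        col_eu = [p * col_payoffs[0][j] + (1 - p) * col_payoffs[1][j]
                  for j in range(2)]
        p += step * (logit_response(row_eu, lam)[0] - p)
        q += step * (logit_response(col_eu, lam)[0] - q)
    return p, q

# Hypothetical prisoner's dilemma: action 0 = cooperate (strictly dominated),
# action 1 = defect.
row = [[3, 0], [5, 1]]
col = [[3, 5], [0, 1]]
p_coop, q_coop = logit_qre_2x2(row, col, lam=0.5)
```

Under these assumptions the dominated action (cooperate) is played with positive probability, though less than half the time, so observing it no longer produces a zero likelihood.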
Problems w/ Standard FTT Design • While QRE or something similar (see other references in Methods & Models) can solve zero likelihood problem, still disconnect between theoretical purpose of experiments & standard design of FTT experiments. • Experiments are costly. • Some experimental designs may discriminate between models so poorly as to render them useless. • Why not attempt to get most bang (statistical result) for bucks . . .
Optimal Experimental Design: Step 1 • Start with 2 game theoretic models that both solve the zero likelihood problem & can be estimated by maximum likelihood. • Class of experiments parameterized by some design vector, denoted $\gamma$ here. • Designs typically correspond to payoff structures. • Can, from the likelihood functions, compute the Kullback-Leibler information number, which measures how informative a given design is expected to be if a given model were correct.
Kullback-Leibler Information # • Let $X$ be the space of all possible data sets under all designs proposed. • Denote a typical data set by $x$. • Let the likelihoods of a given data set $x \in X$ under design $\gamma$ for each of $n$ competing models be $l_1(x;\gamma), l_2(x;\gamma), \ldots, l_n(x;\gamma)$.
Kullback-Leibler Information # • Given a collection of priors on models $1, 2, \ldots, n$, say $p_1, p_2, \ldots, p_n$, we can define for each model the Kullback-Leibler information number, measuring how informative a given design is expected to be if that model were correct.
Kullback-Leibler Information # • Information number of model 1 under design $\gamma$ is:
Kullback-Leibler Information # • Design that maximizes expected separation between model 1 & the other $n - 1$ models, if model 1 were indeed the correct model, is
Kullback-Leibler Information # • If want to maximize overall informativeness of design, weight information numbers by the prior on each of the models & choose
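The formulas for these three slides are not present in the text. One form consistent with the definitions above (likelihoods $l_i(x;\gamma)$, priors $p_i$) and with the standard Kullback-Leibler divergence is sketched below. The design symbol $\gamma$, the integral over $X$, and the renormalized rival weights $p_i/(1-p_1)$ are assumptions, so the published papers should be checked for the exact expressions.

```latex
% Information number of model 1 under design \gamma (assumed form):
I(1;\gamma) = \int_X l_1(x;\gamma)\,
  \log\frac{l_1(x;\gamma)}{\sum_{i=2}^{n}\frac{p_i}{1-p_1}\,l_i(x;\gamma)}\,dx

% Design that best separates model 1 from its rivals:
\gamma_1^{*} = \arg\max_{\gamma}\; I(1;\gamma)

% Design that maximizes overall (prior-weighted) informativeness:
\gamma^{*} = \arg\max_{\gamma}\; \sum_{i=1}^{n} p_i\, I(i;\gamma)
```

For discrete data the integral becomes a sum over the possible data sets $x$.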
Optimal Experimental Design: Step 1 • El-Gamal, McKelvey, & Palfrey give an algorithm for this maximization problem in “Computational Issues . . .” • Required a Cray supercomputer in the early 1990s . . . • The example in the paper is pretty complex – it would be nice to translate it to a more political science context, but this may be computationally too difficult. • Most FTT & PTT still use an ad hoc approach on payoffs (including the Caltech group).
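To make the maximization concrete, here is a toy sketch for the simplest possible case: a finite list of candidate designs and discrete outcomes. It uses plain exhaustive search, not the algorithm from “Computational Issues . . .”, and the design names, likelihood tables, and priors are hypothetical. The information number is computed against the prior-weighted mix of rival models, which is an assumed form.

```python
import math

def information_number(model, likelihoods, priors):
    """KL information of `model` against the prior-weighted mix of its rivals.

    likelihoods[i][x] is the probability model i assigns to outcome x under
    one fixed design; every entry must be positive (no zero likelihoods).
    """
    rest = 1.0 - priors[model]
    info = 0.0
    for x, own in enumerate(likelihoods[model]):
        rivals = sum(priors[j] / rest * likelihoods[j][x]
                     for j in range(len(priors)) if j != model)
        info += own * math.log(own / rivals)
    return info

def overall_information(likelihoods, priors):
    """Prior-weighted sum of the information numbers of all models."""
    return sum(p * information_number(i, likelihoods, priors)
               for i, p in enumerate(priors))

def best_design(designs, priors):
    """Exhaustive search over a finite set of candidate designs."""
    return max(designs,
               key=lambda name: overall_information(designs[name], priors))

# Hypothetical candidate designs: two models, two possible outcomes each.
designs = {
    "low_stakes":  [[0.55, 0.45], [0.50, 0.50]],
    "high_stakes": [[0.80, 0.20], [0.30, 0.70]],
}
priors = [0.5, 0.5]
chosen = best_design(designs, priors)
```

Here the design under which the two models disagree most about the data is selected. With realistic games each likelihood evaluation requires solving the game, which is where the computational cost noted above comes from.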
Optimal Experimental Design: Step 2 • Still a problem – how many observations are needed? • Need a stopping rule . . . • Assume a design has been chosen & there are two rival models. • EMP (El-Gamal, McKelvey, & Palfrey) propose a stopping rule that belongs to the family of Wald’s sequential probability ratio tests (SPRTs).
Optimal Experimental Design: Step 2 • Assume loss of selecting the correct model is 0 & loss of selecting the wrong model is K. • SPRT has property of minimizing expected sample size (& hence expected cost of set of experiments) in class of all tests with same type I & type II error probabilities.
Optimal Experimental Design: Step 2 • SPRTs take following form: • Continue sampling until likelihood ratio between two models crosses one of two boundaries. • If upper boundary is crossed, accept model whose likelihood appears in numerator of likelihood ratio, & if lower boundary is crossed, accept other model.
Optimal Experimental Design: Step 2 • A number of approximations to compute optimal stopping boundaries have been proposed. • Most popular involve using Wald’s approximation of type I & type II error & expected stopping time.
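The stopping rule can be sketched as follows, using Wald's classical approximate boundaries, lower boundary β/(1−α) and upper boundary (1−β)/α. The binary-outcome models, their probabilities (0.7 and 0.4), the 5% error targets, and the simulated subjects are hypothetical. This is a generic SPRT, not the specific rule proposed in the cited papers.

```python
import math
import random

def wald_boundaries(alpha, beta):
    """Wald's approximate SPRT boundaries for target error rates.

    alpha: probability of accepting model 1 when model 2 is correct.
    beta:  probability of accepting model 2 when model 1 is correct.
    Returns (lower, upper) bounds on the likelihood ratio of model 1 to 2.
    """
    return beta / (1.0 - alpha), (1.0 - beta) / alpha

def sprt(draw, lik1, lik2, alpha=0.05, beta=0.05, max_obs=10_000):
    """Sample until the likelihood ratio of model 1 to model 2 exits the bounds.

    draw() returns one new observation; lik1(x) and lik2(x) are the models'
    strictly positive likelihoods for a single observation.
    Returns (accepted model or None, number of observations used).
    """
    lower, upper = wald_boundaries(alpha, beta)
    log_ratio = 0.0
    for n in range(1, max_obs + 1):
        x = draw()
        log_ratio += math.log(lik1(x)) - math.log(lik2(x))
        if log_ratio >= math.log(upper):
            return 1, n  # upper boundary crossed: accept model 1
        if log_ratio <= math.log(lower):
            return 2, n  # lower boundary crossed: accept model 2
    return None, max_obs

# Hypothetical models for a binary outcome x.
def lik1(x):
    return 0.7 if x == 1 else 0.3

def lik2(x):
    return 0.4 if x == 1 else 0.6

# Simulated subjects who behave as model 1 predicts.
rng = random.Random(0)
accepted, n_used = sprt(lambda: 1 if rng.random() < 0.7 else 0, lik1, lik2)
```

The number of observations is not fixed in advance: sampling stops as soon as the accumulated evidence favors one model strongly enough, which is the source of the savings in expected sample size.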
Optimal Experimental Design: Step 2 • Berger suggests a further approximation based on cost per experiment c being much smaller than loss of selecting wrong model K. • Resulting rule sets the boundaries A & B. • Let $p_1$ be the prior on model 1 being correct & $I(i)$ the information number for model $i$ as calculated above.
Optimal Experimental Design: Step 2 • Stop & accept model 1 if likelihood ratio of model 1 to model 2 is greater than B, stop & accept model 2 if likelihood ratio is less than A, & continue sampling otherwise.
Optimal Experimental Design: Step 2 • Again, rarely do experimentalists do any of this. • Why?
Are Ethical Issues in Experiments Different? • Academics “mess” with real world all the time • As teachers – can influence students’ view of world – ethics of teaching rational choice • As researchers publishing work – newspapers pick up conclusions, general public believes them . . . • As policy advisors – policy makers actually solicit advice under assumption that academics are more knowledgeable; principal-agent issues.
How Are Experiments Different? • Transparency of motives • Even if not being deceptive, usually not “honest” about motives • When deception is involved, potentially more serious.
How Are Experiments Different? • Effect of experiment? • In lab experiments with “paid” subjects, can argue there is “less” of an effect than when people are influenced for other reasons (i.e. to please the teacher, or because they see the advisor as having more information) • However, not so true in field experiments – informed consent issues
How Are Experiments Different? • Inducing of real decisions • In lab or field, experimenter induces subjects to make real decisions they might not make if experimenter did not intervene • Students can tell teacher what s/he wants to hear, but don’t necessarily have to alter real world behavior. • Same for readers of journal articles or policy makers who solicit advice.
How Are Experiments Different? • Inducing of real decisions • However, this is not an option for a subject in an experiment. • Moreover, in making these decisions subjects may learn something about themselves they do not like (defecting in PD, unacknowledged racist preferences)
Should We Experiment? • Consider Cost/benefit approach • Theoretically if benefits > costs, worth doing.
Problems with Benefits • Benefits are theoretical – we posit that experiments will reveal important information based on priors, but could be wrong, even if design is “optimal” • Benefits in politics are “debatable” – i.e. there is no consensus about many of the things we study – debate over normative implications of political processes means perceived benefits depend on normative preferences of researcher.
Problems with Benefits • Politics involves real political outcomes & to the extent experiments involve the real world of politics, benefits may mean winners & losers in the real world.
Problems with Cost/Benefit Approach • Costs also difficult to measure or perceive – as discussed above. • Alternative approach – • Decide a priori some costs not worth bearing regardless of benefit • Provide transparency as much as possible both in lab & field • Acknowledge & incorporate potential disagreement over theoretical benefits
The Future • EITM: Experimental Implications of Theoretical Models • What Should Be the Relationship between Experimental and Theoretical Work in Political Science?
Experimental Work within Political Science, The Present • Some small # of researchers do game theory testing (FTT for formal theory testing) • Larger # do political psychology (usually trained like/as social psychologists) (PTT for psychological theory testing) • Newer atheoretical experimental work (ATT for anti theory testing)
Experimental Work within Political Science, The Present • Very small percentage of overall empirical research in political science • Modal political science department does not have an experimentalist & if they have one it is a PTT type
Experimental Work within Political Science, The Present • In fact, FTT work is stagnant (maybe declining within discipline?) • While PTT is focusing on ways to increase external validity (moving the lab to the mall, making experimental situation in lab more realistic, CATI survey experiments) • & ATT work is expanding
Experimental Political Science outside the Discipline, The Present • # of experimental economists look at political science related questions (FTT) • Some psychologists (usually more “communications” scholars) do political science questions (PTT) • Occasionally gets in political science journals but not generally read or cited within political science
In Contrast, within Experimental Economics • While there are a few with psychology backgrounds, they take as given the common economics model & the role of game theory, so PTT in economics looks much closer to FTT, with more communication between groups (PTT in minority) • Modal economics department still doesn’t have an experimental economist – but top departments are hiring them & tend to hire FTT types • No ATT that I know of . . .
Why the Difference? Theory 1 – Political Science is Economics Delayed • Evidence in favor: • FT work in PS expanding to new problems, becoming important in areas not “colonized” like the presidency, comparative, etc. • More FT training in graduate programs
Why the Difference? Theory 1 – Political Science is Economics Delayed • But several observations against: • Demand, renewed, that FT work be FTT work – general & pervasive unwillingness to accept FT without additional T • FTT experimental work has not increased (maybe declined) & more externally valid PTT & ATT are increasing (not at high rates, but not declining like FTT)
Theory 2 – Political Science has its own Path • Traditionally, political scientists have emphasized context of data studied unlike psychologists or many economists • Most models are different (more institutional detail) • Much more emphasis on empirical relevance of models • Emphasis on non-experimental data over experimental data because of “external validity”
Theory 2 – Political Science has its own Path – Implications • No perceived need for political science as a discipline to replace game theoretical economists or psychologists, both of whom study human behavior in general senses • In fact, some political scientists doing FT work seek publication in economics journals & economists as collaborators (similar behavior among Political Psychologists) • Experimental work that has more perceived external validity is always viewed as more useful to political scientists
Problems • Knowledge trapped – few political scientists know or understand developments in FTT & PTT outside of discipline. • For example, even though grad students may have FT training, that training rarely refers to FTT experimental work. • Methods classes may present some experimental work, but typically PTT & by political scientists • Persistent misunderstanding of the flexibility of FTT experimental work to address external validity issues.
FTT & External Validity Issues • False assumption of many political scientists • Only way to deal with real political world is to use either non-experimental data, field experimental data or highly realistic PTT generated data. • But when working with FTT can add, in a controlled fashion, external validity. • Means that FTT experimental work in political science needs to be different from that in economics.
Proposed Game Plan for FTT Experimental Political Science • Take an existing popular formal theory in political science – Baron/Ferejohn Legislative Bargaining game • Traditional experimental economics approach – strip to bare bones & test basic behavior predicted (combination of ultimatum & dictator games) • Still need this type of work • But need to take this & build in institutional & subject pool detail (on gradual basis) so can see where disconnects between theory & empirical world matter
Proposed Game Plan for FTT Experimental Political Science • Requires a team approach over an individualistic or even small group approach • Interactions with PTT – incorporating some of methods of PTT that increase external validity to testing explicit FT