Experimental Methods in Social Ecological Systems Juan-Camilo Cárdenas Universidad de los Andes Jim Murphy University of Alaska Anchorage
Agenda – Day 1 • Noon – 12:15 Welcome, introductions • 12:15 – 1:15 Play Game #1 (CPR: 1 species vs. 4 species) • 1:15 – 2:00 Debrief Game #1 and other results from the field • 2:00 – 2:15 Break • 2:15 – 3:15 Game #2 (Beans game) • 3:15 – 4:00 Debrief Game #2 • 4:00 – 4:15 Break • 4:15 – 5:00 Basics of experimental design • Homework for Day 2: Think of an interesting question or problem to work on in groups tomorrow
Agenda – Day 2 • 8:30 – 9:15 Designing and running experiments in the field • 9:15 – 10:15 Classwork: work in groups solving experimental design problems • 10:15 – 10:30 Break • 10:30 – 11:15 Discussion of group solutions • 11:15 – noon Begin designing your own experiment (form groups based on best ideas proposed) • Noon – 1:00 Lunch • 1:00 – 1:30 Continue designing your own experiment (work in groups) • 1:30 – 2:30 Present designs • 2:30 – 3:00 Feedback: how could we make this workshop better?
Materials online • We will create a web site with materials from the workshop. • Please give us your email address (write neatly!!) and we will send you a link when it is ready.
Types of experiments 1. “Speaking to Theorists” • Test a theory or discriminate between theories • Compare theoretical predictions with experimental observations • Does non-cooperative game theory accurately predict aggregate behavior in an unregulated CPR? • Explore the causes of a theory’s failure • If what you observe in the lab differs from theory, try to figure out why. • Communication increases cooperation in a CPR even though it is “cheap talk” • Why? • Is my experiment designed correctly? • What caused the failure? • Theory stress tests (boundary experiments)
Types of experiments (cont.) 2. “Searching for Facts” • Establish empirical regularities as a basis for new theory • In most sciences, new theories are often preceded by much observation. • “I keep noticing this. What’s going on here?” • The Double Auction • Years of experimental data showed its efficiency even though no formal models had been developed to explain why this was the case. • Behavioral Economics • Many experiments have identified anomalies, but no theory has yet been developed to explain them.
Types of experiments (cont.) 3. “Whispering in the Ears of Princes” • Evaluate policy proposals • Alternative institutions for auctioning emissions permits • Allocating space shuttle resources • Test bed for new institutions • Electric power markets • Water markets • Pollution permits • FCC spectrum licenses
Baseline “static” CPR game • Common pool resource experiment • Social dilemma • Individual vs group interests • Benefits to cooperation • Incentives to not cooperate • Field experiments in rural Colombia • Groups of 5 people • Decide how much to extract/harvest from a shared natural resource
Subjects choose a level of extraction, 0 – 8 [Payoff table, ranging from low harvest levels (“conservative”) to high harvest levels]
Social optimum: All choose 1. Nash equilibrium: All choose 6.
Comment on payoff tables • The early CPR experiments typically used payoff tables. • We don’t live in a world of payoff tables • Frames how a person should think about the game • A lot of numbers, hard to read • Too abstract?? • More recent CPR experiments use richer ecological contexts • e.g., managing a fishery is different from managing an irrigation system
Objective • To explore interaction between: • Formal regulations imposed on a community to conserve local natural resources • Informal non-binding verbal agreements to do the same.
Possible 2x3 factorial design • Groups of N=5 participants • Play 10 rounds of one of the 6 treatments • Enforcement • Individual harvest quota = 1 (Social optimum) • Exogenous probability of audit • Fine (per unit violation) if caught exceeding quota • Participants paid based on cumulative earnings in all 10 rounds These 2 treatments have been conducted ad nauseam. Are they necessary?
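As a minimal sketch of what the six cells of a 2x3 design look like, the snippet below crosses a communication factor with an enforcement factor. The level names ("none", "low", "medium") and the short labels are assumptions inferred from the treatment labels used elsewhere in this deck (Comm only, Low, Low + Comm, Med, Med + Comm); the function and variable names are illustrative only.

```python
from itertools import product

# Assumed factor levels; labels follow the treatment names used in the deck.
communication = ["no", "yes"]
enforcement = ["none", "low", "medium"]

def label(comm, enf):
    """Build a short treatment label such as 'Low + Comm'."""
    parts = []
    if enf != "none":
        parts.append({"low": "Low", "medium": "Med"}[enf])
    if comm == "yes":
        parts.append("Comm")
    if not parts:
        return "Baseline"
    if parts == ["Comm"]:
        return "Comm only"
    return " + ".join(parts)

# Every combination of the two factors is one treatment cell.
treatments = [label(c, e) for c, e in product(communication, enforcement)]
print(len(treatments), treatments)
```

Adding a third factor with two levels would double the number of cells, which is why factorial designs become expensive quickly.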
Baselines and replication • Replication • In any experimental science, it is important for key results to be replicated to test robustness • Link to previous research. Is your sample unique? • Baseline or control group • The baseline treatment also gives us a basis for evaluating what the effects are of each treatment • In any experimental study, it is crucial to think carefully about the relevant control!
Alternative design • Stage 1 – Baseline CPR (5 rounds) • Stage 2 – one of the 5 remaining treatments (5 rounds) • Comm only • Low • Low + Comm • Med • Med + Comm • Advantage – Having all groups play Stage 1 baseline facilitates a clean comparison across groups. • Disadvantage – fewer rounds of the Stage 2 treatments. Enough time to converge?? • Disadvantage(?) – All stage 2 decisions conditioned upon having already played a baseline
Optimal sample size • Groups of N=5 participants • How many groups per treatment cell?
John List’s notes on sample size • Also see: John A. List, Sally Sadoff, and Mathis Wagner, “So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design,” Experimental Economics (2011), 14:439–457.
Some Design Insights A. 0 (control) / 1 (treatment), equal outcome variances B. 0/1 treatment, unequal outcome variances C. Treatment Intensity—no longer binary D. Clusters
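Some Design Rules of Thumb for Differences in Between-Subject Experiments • Assume that $X_0 \sim N(\mu_0, \sigma_0^2)$ and $X_1 \sim N(\mu_1, \sigma_1^2)$, and that the minimum detectable effect is $\mu_1 - \mu_0 = \delta$. • $H_0: \mu_0 = \mu_1$ and $H_1: \mu_1 - \mu_0 = \delta$. • With $n_0$ and $n_1$ observations in the control and treatment cells, we need the difference in sample means $\bar{X}_1 - \bar{X}_0$ to satisfy: 1. Significance level (probability of Type I error) = $\alpha$: $\frac{\bar{X}_1 - \bar{X}_0}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} = t_{\alpha/2}$ 2. Power (1 – probability of Type II error) = $1 - \beta$: $\frac{(\bar{X}_1 - \bar{X}_0) - \delta}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} = -t_{\beta}$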
Power A. Our usual approach stems from the standard regression model: under a true null, what is the probability of observing the coefficient that we observed? B. Power calculations ask a different question: if the alternative hypothesis is true, what is the probability that the estimated coefficient lies outside the 95% confidence interval defined under the null?
Sample Sizes for Differences in Means (Equal Variances) • Solving equations 1 and 2 assuming equal variances $\sigma_0^2 = \sigma_1^2 = \sigma^2$ and equal cell sizes $n_0 = n_1 = n$: $n = 2(t_{\alpha/2} + t_{\beta})^2 (\sigma/\delta)^2$ • Note that the necessary sample size • Increases rapidly with the desired significance level ($t_{\alpha/2}$) and power ($t_{\beta}$). • Increases proportionally with the variance of outcomes ($\sigma^2$). • Decreases in inverse proportion to the square of the minimum detectable effect size ($\delta$). • Sample size depends on the ratio of effect size to standard deviation. Hence, effect sizes can just as easily be expressed in standard deviations.
Standard is to use α = 0.05 and have power of 0.80 (β = 0.20). • So if we want to detect a one-standard-deviation change using the standard approach, we would need: • $n = 2(1.96 + 0.84)^2 \times 1^2 = 15.68$ observations in each cell • A ½ std. dev. change is detectable with 4 × 15.68 ≈ 63 observations per cell • n = 30 seems to be the magic number in many experimental studies: ~0.70 std. dev. change.
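The arithmetic on this slide can be checked with a short sketch. It assumes the equal-variance, equal-cell-size formula n = 2(t_{α/2} + t_β)²(σ/δ)² and uses normal critical values in place of t values, which is an approximation; the function name `n_per_cell` is illustrative.

```python
from statistics import NormalDist

def n_per_cell(effect_in_sd, alpha=0.05, power=0.80):
    """Observations per cell for a two-sided test with equal variances.

    effect_in_sd is the minimum detectable effect divided by the
    standard deviation of outcomes (delta / sigma).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # roughly 1.96 when alpha = 0.05
    z_beta = z.inv_cdf(power)           # roughly 0.84 when power = 0.80
    return 2 * (z_alpha + z_beta) ** 2 / effect_in_sd ** 2

# One-standard-deviation and half-standard-deviation effects.
for effect in (1.0, 0.5):
    print(effect, round(n_per_cell(effect), 1))
```

Halving the detectable effect quadruples the required sample, because the effect size enters the formula squared.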
Sample Size “Rules of Thumb”: • Assuming α = 0.05 and β = 0.20 requires n subjects: • α = 0.05 and β = 0.05 requires 1.65 × n • α = 0.01 and β = 0.20 requires 1.49 × n • α = 0.01 and β = 0.05 requires 2.27 × n
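These multipliers follow from the ratio of (t_{α/2} + t_β)² terms. The sketch below recomputes them with normal critical values (an approximation), so the results should land close to the slide's figures, with small differences from rounding. The function name `relative_n` is illustrative.

```python
from statistics import NormalDist

def relative_n(alpha, beta, base_alpha=0.05, base_beta=0.20):
    """Sample size relative to the alpha = 0.05, beta = 0.20 baseline."""
    z = NormalDist()

    def scale(a, b):
        # (z_{a/2} + z_b)^2, the part of the formula that depends on a and b
        return (z.inv_cdf(1 - a / 2) + z.inv_cdf(1 - b)) ** 2

    return scale(alpha, beta) / scale(base_alpha, base_beta)

for a, b in [(0.05, 0.05), (0.01, 0.20), (0.01, 0.05)]:
    print(f"alpha={a}, beta={b}: {relative_n(a, b):.2f} x n")
```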
Example from a recent undergrad research project • Local homeless shelter was conducting a fundraising campaign. • They asked us to replicate List’s study about the effects of matching contributions. • The shelter wanted the same 4 treatments as in List: • No match, 1:1, 2:1, and 3:1 to test whether high match ratios would increase contributions. • Local oil company agreed to donate up to $5000 to provide a match for money donated.
Fundraising example • The shelter had funds to send out 16,000 letters to high income women in Anchorage who had never donated before. • Expected response rate was about 3 to 4% (n ≈ 480–640) • Question: How many treatments should we run, if we expect about 500 responses? • They said a “meaningful” treatment effect would be ~$25. • Standard deviation from previous campaigns was ~$100.
Sample size • With only 500 expected responses, we could only conduct 2 treatments.
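The reasoning behind "only 2 treatments" can be sketched as follows, assuming the equal-variance formula n = 2(t_{α/2} + t_β)²(σ/δ)² with α = 0.05, power 0.80, and normal critical values as an approximation. The $25 effect, $100 standard deviation, and 500 responses come from the fundraising example; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()
alpha, power = 0.05, 0.80
delta = 25.0              # "meaningful" treatment effect in dollars
sigma = 100.0             # standard deviation from previous campaigns
expected_responses = 500  # anticipated number of donors

# Required observations per treatment cell.
n_cell = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 * (sigma / delta) ** 2

# How many cells of that size the expected responses can fill.
cells_supported = expected_responses / n_cell

print(round(n_cell), round(cells_supported, 2))
```

An effect of one quarter of a standard deviation needs roughly 250 responses per cell, so about 500 responses cover approximately two cells rather than the four treatments the shelter asked for.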
Sample Sizes for Differences in Means (Unequal Variances) • Another rule of thumb: if the outcome variances are not equal, the ratio of the optimal proportions of the total sample in the control and treatment groups equals the ratio of their standard deviations: $n_0 / n_1 = \sigma_0 / \sigma_1$. • Example: Communication tends to reduce the variance, so perhaps use fewer groups in this treatment.
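A minimal sketch of this allocation rule, splitting a fixed total so that the cell sizes are proportional to the standard deviations. The function name `allocate` and the numbers in the example are hypothetical, chosen only to illustrate the rule.

```python
def allocate(total, sd_control, sd_treatment):
    """Split a total sample so n_control / n_treatment = sd_control / sd_treatment."""
    share_control = sd_control / (sd_control + sd_treatment)
    n_control = round(total * share_control)
    return n_control, total - n_control

# Hypothetical case: the communication treatment is assumed to have
# half the standard deviation of the control.
print(allocate(60, sd_control=2.0, sd_treatment=1.0))
```

With equal standard deviations the rule reduces to an even split.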
Treatment levels • How many levels of enforcement do we need? Do we need 3 levels of enforcement?
What about Treatment Levels? • Assume that you are interested in understanding the intensity of treatment : • Level of enforcement (e.g., audit probability) • Assume that the outcome variance is equal across various cells. • How should you allocate the sample if audit probability could be between 0-1? • For simplicity, say X=25%, 50%, or 75% • Assume that you have 1000 subjects available.
Reconsider what we are doing: Y = XB + e • One goal in this case is to derive the most precise estimate of B by using exogenous variation in X. • Recall that the variance of the estimate of B is var(e) / (n · var(X)), so its standard error is the square root of that ratio.
Rules of Thumb • Linear effect: ½ of sample @ X=25%, 0 @ X=50%, ½ @ X=75% • Quadratic effect: ¼ @ X=25%, ½ @ X=50%, ¼ @ X=75% • Intuition: The test for a quadratic effect compares the mean of the outcomes at the extremes to the mean of the outcome at the midpoint
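To see why the linear rule puts the whole sample at the extremes, the sketch below compares var(X) across three allocations of 1000 subjects. It relies on the simple-regression result that the variance of the slope estimate is var(e) / (n · var(X)), so a larger var(X) means a more precise estimate. The "equal thirds" design is added here only as a comparison and is not from the slides; all names are illustrative.

```python
def var_x(allocation):
    """Variance of X implied by a mapping {treatment level: share of sample}."""
    mean = sum(x * w for x, w in allocation.items())
    return sum(w * (x - mean) ** 2 for x, w in allocation.items())

designs = {
    "extremes (1/2, 0, 1/2)": {0.25: 0.5, 0.50: 0.0, 0.75: 0.5},
    "equal thirds": {0.25: 1 / 3, 0.50: 1 / 3, 0.75: 1 / 3},
    "quadratic rule (1/4, 1/2, 1/4)": {0.25: 0.25, 0.50: 0.5, 0.75: 0.25},
}

n = 1000
for name, alloc in designs.items():
    v = var_x(alloc)
    # Slope variance up to the common factor var(e), which is the same
    # across designs under the equal-variance assumption.
    print(f"{name}: var(X) = {v:.4f}, slope variance factor = {1 / (n * v):.4f}")
```

The extremes-only allocation maximizes var(X), but it cannot detect curvature, which is why the quadratic rule moves half the sample to the midpoint.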
Intra-cluster Correlation • What happens when the level of randomization differs from the unit of observation? Think of randomization at the village level, or at the store level, while outcomes are observed at the individual level. • Classic example: comparing two textbooks. • Randomization over classrooms • Observations at individual level Another Example: • To test robustness of results, you may want to conduct the experiments in multiple communities. • How do you allocate treatments across communities, especially if the number of participants per village is small? • In our Colombian enforcement study, we replicated the entire design in three regions. • In a separate CPR experiment in Russia, we visited 3 communities in one region. Each treatment was conducted once in each community. • We are assuming that the differences across communities are small. • Cannot make cross-community comparisons
Intracluster Correlation • Real Sample Size (RSS) = mk / CE • m = number of subjects in a cluster • k = number of clusters • CE = 1 + ρ(m − 1) • ρ = intracluster correlation coefficient = $s_B^2 / (s_B^2 + s_W^2)$ • $s_B^2$ = variance between clusters • $s_W^2$ = variance within clusters
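A small sketch of the Real Sample Size formula shows how quickly clustering erodes the effective number of observations. The group size and number of groups echo the 5-person groups used in this workshop, but the ρ values are hypothetical and chosen only for illustration; the function name is an assumption.

```python
def real_sample_size(m, k, rho):
    """RSS = m*k / CE, where CE = 1 + rho*(m - 1).

    m: subjects per cluster, k: number of clusters,
    rho: intracluster correlation coefficient (between 0 and 1).
    """
    ce = 1 + rho * (m - 1)  # the slide's CE term
    return m * k / ce

# Hypothetical illustration: 6 groups of 5 people under three values of rho.
for rho in (0.0, 0.2, 1.0):
    print(rho, round(real_sample_size(m=5, k=6, rho=rho), 2))
```

With ρ = 0 every individual counts as an independent observation; with ρ = 1 the effective sample is just the number of clusters.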
Randomized factorial design • Advantages • Independence among the factor variables • Can explore interactions between factors • Disadvantages • Number of treatments grows quickly with increase in number of factors or levels within a factor • Example: Conduct experiment in multiple communities and use community as a treatment variable
Fractional factorial design • Say we want to add informal sanctions with a 3:1 ratio • I can pay $3 to reduce your earnings by $1 • 1 new “factor” with 2 “levels” • To run all combinations would require 2x2x2 = 8 treatments • Assume optimal sample size per cell is 6 groups of 5 people (30 total per cell) • 8 treatments x 30 people/cell = 240 people • Assume you can only recruit about half that (~120) • You could run only 3 groups per cell (15 people) – lose power/significance • Solution: conduct a balanced subset of treatments
Fractional factorial design • If you are considering this approach, there are a few different design options depending upon the effects you want to capture, number of treatments, etc. • This is just one example! [Table of selected treatment combinations across three factors: Communication, Sanctions, External Enforcement]
Fractional factorial design • Advantage: dramatically reduces the number of trials • Disadvantage: achieves balance by systematically confounding some direct effects with some interactions. • It may not be serious, but you will lose the ability to analyze all of the different possible interactions.
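The confounding trade-off can be illustrated with a standard half fraction of a 2x2x2 design. This is a sketch of one common construction (defining relation I = ABC) and is not necessarily the design shown in the workshop; the -1/+1 coding and variable names are assumptions for illustration.

```python
from itertools import product

factors = ["Communication", "Sanctions", "ExternalEnforcement"]

# Full 2x2x2 design, coding each factor as -1 (absent) or +1 (present).
full = list(product((-1, 1), repeat=3))

# Half fraction with defining relation I = ABC:
# keep only the runs whose three codes multiply to +1.
half = [run for run in full if run[0] * run[1] * run[2] == 1]

for run in half:
    print(dict(zip(factors, run)))

# Balance: each factor appears at each level equally often.
balanced = all(sum(run[i] for run in half) == 0 for i in range(3))

# Aliasing: in every retained run the third column equals the product of
# the first two, so that main effect cannot be separated from the
# two-way interaction of the other factors.
aliased = all(c == a * b for a, b, c in half)

print("balanced:", balanced)
print("ExternalEnforcement aliased with Communication x Sanctions:", aliased)
```

The four retained treatments halve the recruiting burden, at the cost of being unable to distinguish each main effect from the interaction of the other two factors.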
Nuisance Variables • Other factors of little or no primary interest that can also affect decisions. These nuisance effects could be significant. • Common examples • Gender, age, nationality (most socio-economic variables) • Selection bias • Recruitment -- open to whoever shows up vs random selection • Experience • Participated in previous experiments • Learning • Concern in multi-round experiments • Non-experiment interactions • People talking before an experiment while waiting to start • In a community, people may hear about experiment from others
Confounded variables • Confounding occurs when the effects of two independent variables are intertwined so that you cannot determine which of the variables is responsible for the observed effect. • Example: • What are some potential confounds when comparing the Baseline with Low?
Another design approach • If trying to identify factors that influence decisions, try adding them one at a time. • Imposing a fine for non-compliance differs from the baseline CPR in multiple ways. Possible confounds: • FRAME • The simple existence of a quota may send a signal about expected behavior, independent of any audits or fines. • GUILT = FRAME + audit • Getting audited may generate feelings of guilt because the individual is privately reminded about anti-social choices • FINE = FRAME + GUILT (audit) + fine for violations • Are people responding to the expected penalty? Or are they responding to the frame from the quota?
3 Sources of variability • conditions of interest (wanted) • measurement error (unwanted) • People can make mistakes, misunderstand instructions, typos • experimental material and process (unwanted) • No two people are identical, and their responses to the same situation may not be the same, even if your theory predicts otherwise.
Design in a nutshell • Isolate the effects of interest • Control what you can • Randomize the rest