Writing with Data: Incorporating Statistics Into Causal Research

Statlab Workshop, Spring 2011. Brian Fried and Kevin Callender.

Presentation Transcript


  1. Writing with Data: Incorporating Statistics Into Causal Research Statlab Workshop Spring 2011 Brian Fried and Kevin Callender

  2. Outline of Workshop Part I: Causation and Statistics • What is Causation? Correlation? • Why Statistics? • Threats to Inference Part II: Gathering and Using Data • Gathering Data • Managing Data Part III: Writing with Statistics • A General Outline, with an example

  3. Causation vs. Correlation …correlation Causation…

  4. Why Statistics Probabilistic Relationships (see previous graph) Multivariate Relationships: We can analyze the relationships between multiple variables at the same time (e.g. education, age, gender, income, ... -> voting). What is a regression?
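
One way to make the "What is a regression?" question concrete is a minimal sketch of a one-predictor least-squares fit. The variable names and the numbers below are hypothetical illustrations, not data from the workshop.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariation of x and y divided by variation of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: years of education and a turnout score
education = [8, 10, 12, 14, 16]
turnout = [0.30, 0.38, 0.46, 0.54, 0.62]

intercept, slope = simple_ols(education, turnout)
print(f"turnout = {intercept:.2f} + {slope:.2f} * education")
```

The slope describes how the outcome changes, on average, with a one-unit change in the predictor; on its own it says nothing about whether that relationship is causal.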

  5. Threats to Inference • Endogeneity (vs. exogeneity of errors) • Autocorrelation (time series) • Homo/Heteroskedasticity • Internal vs. external validity Addressing these threats is probably the most important step in research design; advanced techniques can often compensate.
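
As one illustration of checking for a threat listed above, the sketch below computes the Durbin-Watson statistic, a common diagnostic for first-order autocorrelation in time-series residuals. The slides do not name this test; it is swapped in here as an example, and the residual series are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals.
    Values near 2 suggest no first-order autocorrelation, values near 0
    positive autocorrelation, values near 4 negative autocorrelation."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals from a time-series regression
trending = [1.0, 0.8, 0.6, 0.2, -0.2, -0.6, -0.8, -1.0]        # drifts slowly
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]     # flips sign

print(round(durbin_watson(trending), 2))
print(round(durbin_watson(alternating), 2))
```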

  6. Part II: Data Think about analyses early! (Ideal vs. Possible) What’s Possible? What’s Convincing? • Experimental Ideal • Practical Data Limitations • Collecting Your Own Data • Using Other Data Some data sources: • Statlab Webpage (http://statlab.stat.yale.edu) • Advisors/Professional Contacts • Yale StatCat (http://ssrs.yale.edu/statcat/) • ICPSR (http://www.icpsr.umich.edu) • Reference Librarian (Julie Linden)

  7. (Quant.) Data Types and Uses Dependent Variable (response, outcome, criterion) Independent Variables (explanatory or predictor variables) Control / Confounding Variables Categorical and Continuous Variables Remember: Types of variables we choose determine the statistics we use Qualitative knowledge always helps!

  8. Once You’ve Found or Collected Your Data Download the data and documentation • StatTransfer (Statlab) Determine data file type • Probably a text file (.txt, .dat, .raw) Converting text & delimited files Choose a statistical software program
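
Converting a delimited text file can be done with dedicated software, but a rough sketch in Python's standard `csv` module shows the idea. The tab delimiter and the column names are assumptions chosen for illustration.

```python
import csv
import io

def convert_delimited(text, in_delim="\t", out_delim=","):
    """Re-write delimited text with a different delimiter,
    trimming stray whitespace around each cell."""
    reader = csv.reader(io.StringIO(text), delimiter=in_delim)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_delim, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()

# Hypothetical tab-delimited contents of a .txt or .dat file
raw = "municipality\tcoverage\nA\t0.42\nB\t0.57\n"

converted = convert_delimited(raw)
print(converted)
```

For a real file, the same function could be applied to the contents read from disk; always check the documentation for the actual delimiter and missing-value codes first.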

  9. Managing your data Back up all Master Data Files Codebook • Merging Data • Adding variables, cases, computing new variables Keep a roadmap • Keep a log of all analyses with what you have done • Save syntax files
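
The merging, computing, and logging steps above might look like the following sketch. The datasets, the `muni_id` key, and the variable names are hypothetical.

```python
def merge_on_key(left, right, key):
    """Inner-join two lists of dict rows on a shared key column."""
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:  # keep only cases present in both files
            merged.append({**row, **match})
    return merged

# Hypothetical master files (never edited in place)
coverage = [
    {"muni_id": 1, "families_covered": 120},
    {"muni_id": 2, "families_covered": 300},
    {"muni_id": 3, "families_covered": 50},
]
census = [
    {"muni_id": 1, "poor_families": 200},
    {"muni_id": 2, "poor_families": 400},
]

log = []  # the "roadmap": one entry per step taken

merged = merge_on_key(coverage, census, "muni_id")
log.append(
    f"merged coverage+census on muni_id: "
    f"{len(merged)} of {len(coverage)} rows matched"
)

# Compute a new variable from existing ones
for row in merged:
    row["coverage_rate"] = row["families_covered"] / row["poor_families"]
log.append("computed coverage_rate = families_covered / poor_families")

for entry in log:
    print(entry)
```

Logging how many rows matched makes it easy to notice, later on, that a merge silently dropped cases.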

  10. Syntax Files What are they? Text files used to enter commands in bulk Why? You will make mistakes and need to make changes How do I know what to write? The program’s manual provides the underlying commands

  11. Part III: Writing Introduction Theory (Lit Review) Data Description Analysis/Results Conclusion

  12. Introduction Question What is the question you want to answer? Why should we care? Hypothesis Succinctly state your claim Context & Summary

  13. Motivation An Illustrative Example: Bolsa Familia Are politics becoming more programmatic in Brazil? Is Bolsa Familia, a conditional cash transfer (CCT) program that benefits a quarter of Brazil’s population, programmatic?

  14. Programa Bolsa Família – key facts An Illustrative Example: Bolsa Familia Conditional cash transfer (CCT) program, launched in October 2003. This was not the first CCT program in Brazil; some existing programs (like Bolsa Escola) were incorporated into Bolsa Familia. Benefits families with per capita income below US$78. 12 million poor families (almost 50 million people) currently receive support in all 5,564 Brazilian municipalities; Size of stipend: between US$13 and US$114, depending on the family’s size and poverty level. Average amount: US$54 per family 2009 Budget: US$ 10.5 billion (0.4% of Brazil’s GDP)

  15. Theory/Lit. Review What does existing theory say? • What do you believe? • Position yourself within theoretical debates. Identify Testable Hypotheses Choose Method Best Suited to Testing Your Hypothesis Do you need statistics after all? • Quantitative v Qualitative research

  16. Research Question An Illustrative Example: Bolsa Familia Do political criteria explain the variation in Bolsa Familia’s coverage across municipalities? There are theoretical (Cox and McCubbins 1986, Dixit and Londregan 1996, Lindbeck and Weibull 1987) and empirical (Ames 1987, Levitt and Snyder 1995, Schady 2000, Dahlberg and Johansson 2002, Stokes 2004, Kitschelt 2010) reasons to believe that political spending is often targeted, especially given Brazil’s history with clientelism and pork.

  17. How do politicians target? An Illustrative Example: Bolsa Familia “Core” “Swing” Mobilization

  18. Descriptive Statistics Variables Dependent Variable(s) Independent Variable(s) Important Control Variable(s) Graphs Summary Statistics on Key Variables Number, Mean, Minimum, Maximum, Standard Deviation Cross-Tabs
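
The summary statistics and cross-tabs listed above can be sketched with Python's standard library. All values and category labels below are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

def summarize(values):
    """Return N, mean, minimum, maximum and sample standard deviation."""
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "sd": stdev(values),  # sample SD (divides by n - 1)
    }

def crosstab(rows, row_var, col_var):
    """Count cases for every (row_var, col_var) combination."""
    return Counter((r[row_var], r[col_var]) for r in rows)

# Hypothetical key variable
coverage = [0.2, 0.4, 0.6, 0.8]
stats = summarize(coverage)
print(stats)

# Hypothetical categorical variables
municipalities = [
    {"region": "North", "mayor": "incumbent"},
    {"region": "North", "mayor": "opposition"},
    {"region": "South", "mayor": "incumbent"},
    {"region": "North", "mayor": "incumbent"},
]
table = crosstab(municipalities, "region", "mayor")
print(table)
```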

  19. Descriptive Statistics An Illustrative Example: Bolsa Familia

  20. An Illustrative Example: Bolsa Familia Key Variables

  21. Descriptive Statistics An Illustrative Example: Bolsa Familia

  22. So, how do I analyze my data? Correlational design • Correlation allows you to quantify relationships between variables (r, r-squared) • Correlation, partial correlation • Regression allows you to predict scores on one variable from subjects' scores on one or more other variables Group differences • t-test & ANOVA • Chi-square for categorical and frequency data Significance vs. effect size Simulations
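
To illustrate the difference between quantifying a relationship, testing a group difference, and reporting an effect size, here is a small sketch. It uses Pearson's r, an equal-variance two-sample t statistic, and Cohen's d; the choice of these particular formulas and all of the data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pooled_sd(g1, g2):
    """Pooled standard deviation of two groups (equal-variance assumption)."""
    n1, n2 = len(g1), len(g2)
    return sqrt(
        ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    )

def t_statistic(g1, g2):
    """Two-sample t statistic: mean difference over its standard error."""
    se = pooled_sd(g1, g2) * sqrt(1 / len(g1) + 1 / len(g2))
    return (mean(g1) - mean(g2)) / se

def cohens_d(g1, g2):
    """Effect size: mean difference in pooled-SD units."""
    return (mean(g1) - mean(g2)) / pooled_sd(g1, g2)

# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r-squared = {r ** 2:.3f}")

treated = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
t = t_statistic(treated, control)
d = cohens_d(treated, control)
print(f"t = {t:.3f}, Cohen's d = {d:.3f}")
```

The t statistic grows with sample size while Cohen's d does not, which is one reason to report both significance and effect size.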

  23. Methods of Analysis (Empirical Strategy) We discussed this in Part I, but one generally devotes a section to explaining how one will identify a causal relationship prior to the results section. $\text{Coverage} = \beta_0 + \beta_1(\text{political criteria}) + \beta_X X + e$
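
A specification of this form, with one variable of interest and a control, can be estimated by ordinary least squares. The sketch below solves the normal equations with the standard library only; it is not the authors' estimation code, and the data are hypothetical and noise-free so that the known coefficients are recovered exactly.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares coefficients [b0, b1, ...] for y on the predictors
    in rows; a constant term is added automatically."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical municipalities: (political criterion, control variable)
rows = [(0.2, 0.5), (0.4, 0.3), (0.6, 0.6), (0.8, 0.2), (0.5, 0.4)]
# Outcome generated from known coefficients, with no error term
coverage = [0.1 + 0.5 * v + 0.3 * p for v, p in rows]

b0, b1, b2 = ols(rows, coverage)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

In practice a statistical package would also report standard errors; the point here is only how the coefficient on the political criterion is estimated while holding the control constant.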

  24. Results: Explaining Coverage in 2009 An Illustrative Example: Bolsa Familia

  25. Effect of Standard Deviation Shift of Explanatory Variables on Coverage in 2009

  26. Robustness Identify Threats to Inference! (Do I have any?)

  27. Robustness Check: Relationship between Coverage in 2004 and Prior Elections

  28. Putting Output into a Paper Cut and Paste Graphs Cut and Paste into Word Processing document Save as .jpeg or .tif file Tables Cut and Paste Format in Word Processing document Import into Excel, format, and then place in Word

  29. More Advanced Analysis Multivariate techniques are only a start; they do help to account for confounding factors, allow for testing change over time and more complex hypotheses… (See: Tabachnick & Fidell, Using Multivariate Statistics) • Be honest about your abilities. • Ask for help. • You are best off including techniques that you fully understand, but it may be worth learning something new!

  30. Take Away Messages • Begin by thinking about what question interests you. • Look for data and consider appropriate methods; identify what hypotheses are actually testable. • Design and run analysis; keep a codebook/syntax files! • Back up data. • Ask for help, especially when choosing a method, and seek feedback on research design. • Research and writing are an iterative process.
