MPP Stats Bootcamp
W (12:00-1:50), Weeks 6-9
Instructor: Dr. Alison Johnston (Alison.Johnston@oregonstate.edu), http://oregonstate.edu/cla/polisci/alison-johnston
SOC 516 Class Tutor: Daniel Hauser (hauserd@onid.orst.edu)
Bootcamp Outline
• Week 6: An introduction to quantitative research
   • Dependent and independent variables
   • Units of Analysis
   • Samples and Populations
• Week 7: Descriptive Statistics
   • Means, medians, standard deviations, and standard errors
   • Cross tabs/Contingency tables
• Week 8: Significance testing
   • T-statistics
   • Means and difference-in-means testing
   • Confidence intervals
• Week 9: Data Management
Why Should We Care About Quantitative Research?
• Research design and policy analysis revolve around explaining the influence of an exogenous independent variable on a dependent variable
   • Causality vs. correlation
• Statistical inference (quantitative research) is one of two ways a researcher can assess how an independent variable influences a dependent variable
   • Requires a large sample (generally n > 50 to 100)
   • Requires that variables can be quantified or codified
Quantitative Research in Research Design
• The core building blocks of a research paper include:
   • An introduction
      • Explains why your topic is relevant and why your analysis is innovative
   • Literature review
      • Explains what has, and more importantly, what has NOT been said about your topic/dependent variable
   • Theoretical section
      • Outlines a hypothesis which predicts how your independent variable of interest influences the dependent variable
      • Helps to distinguish the causal mechanism
   • Methodological section
      • BRING IN THE QUANTS!
      • Qualitative methods are good too…
   • Conclusion
What You Need to Conduct a Quantitative Research Design
• A dependent variable (an outcome which is to be explained)
• Independent variable(s)
   • IVs of interest: Independent variables which your hypothesis is based upon
   • Controls: Independent variables which influence your dependent variable, but which you are less interested in
• A unit of analysis (the entity across which you examine the relationship between your IV and DV)
   • Individuals, households, cities, states, counties, countries, etc.
Quantitative Analysis: Dependent and Independent Variables
• There are three scales on which we can measure dependent/independent variables
• Interval scale: Data is a “real” number with a meaningful quantitative value
   • e.g. income, weight, height, etc.
   • Can be continuous or discrete (counts)
• Nominal scale: Data is assigned a numerical value that has no quantitative meaning AND no natural order
   • e.g. coding of political party, gender, ethnicity, etc.
• Ordinal scale: Data is assigned a numerical value that has no quantitative meaning BUT has a natural order
   • e.g. coding of attitudes/satisfaction on Likert scales
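The three measurement scales can be illustrated with a small Python sketch (the bootcamp itself uses STATA). The survey responses, the party codes, and the helper `summarize` are all hypothetical. The sketch follows one common convention: means for interval data, medians for ordinal data (order is meaningful, spacing is not), and only the mode for nominal data (the codes are labels).

```python
from statistics import mean, median, mode

# Hypothetical survey responses (illustrative values, not real data).
# Interval scale: income in dollars, a "real" number.
income = [32000, 45000, 51000, 28000, 60000]

# Nominal scale: party codes are labels only; 1 is not "less" than 3.
PARTY_CODES = {"Democrat": 1, "Republican": 2, "Independent": 3}
party = [PARTY_CODES[p] for p in
         ["Democrat", "Independent", "Democrat", "Republican", "Democrat"]]

# Ordinal scale: a 5-point Likert item, ordered but not evenly spaced.
LIKERT = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
          "Agree": 4, "Strongly agree": 5}
satisfaction = [LIKERT[s] for s in
                ["Agree", "Neutral", "Strongly agree", "Agree", "Disagree"]]

def summarize(values, scale):
    """Return the summary statistics that are meaningful for a given scale."""
    if scale == "interval":
        return {"mean": mean(values), "median": median(values)}
    if scale == "ordinal":
        # Order is meaningful, distances are not, so report the median.
        return {"median": median(values)}
    if scale == "nominal":
        # Only category frequency is meaningful, so report the mode.
        return {"mode": mode(values)}
    raise ValueError(f"unknown scale: {scale}")

print(summarize(income, "interval"))      # {'mean': 43200, 'median': 45000}
print(summarize(party, "nominal"))        # {'mode': 1}
print(summarize(satisfaction, "ordinal")) # {'median': 4}
```

Averaging the party codes would run without error but produce a meaningless number, which is why the scale, not the software, decides which statistic to report.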
Quantitative Research methods that you will be exposed to in 524 and 516
Quantitative Research Methods that you can learn in electives
Units of Analysis and Data Collection: Trying to Avoid Bias at All Costs
• To examine quantitative relationships between an independent and dependent variable, we need data
• In an ideal world, we want to collect all data within a population, the entire group that we are interested in
• In the real world, we may be limited in collecting data for an entire population and may have to rely upon a random sample
   • Problems with bias
Data Collection: Populations or Samples?
• Whether you are collecting data for an entire population or for a sample (a representative subset of the population) depends upon your unit of analysis
   • Countries?
   • States?
   • Towns/Cities?
   • Individuals/Households?
• For aggregate units of analysis (e.g. countries, US states) it may be possible to capture the entire population, although we cannot be certain that it is correctly measured
Data Collection: Populations
• (Some) organizations with data on aggregate units of analysis
• Countries:
   • United Nations
   • World Bank
   • International Monetary Fund
   • Organization for Economic Cooperation and Development (developed countries)
• US States:
   • US Census
   • Various US Government Departments
   • National Center for Education Statistics
Data Collection: Samples
• For micro-level units of analysis (e.g. towns/cities, individuals/households) it may be impractical or impossible to collect data for the entire population
• Accurate statistical inference relies upon the selection of a random sample which is representative of the population
• We want to avoid three sampling biases which may skew our results and cause us to draw incorrect conclusions
   • Selection bias
   • Survivor bias
   • Nonresponse bias
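A short Python simulation (a stand-in for the STATA labs) shows why a random sample matters and how nonresponse bias skews an estimate. Everything here is assumed for illustration: the population is simulated incomes, and the response rule (higher earners answer more often) is a hypothetical mechanism, not real survey behavior.

```python
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 household incomes (simulated, not real data).
population = [rng.gauss(50_000, 15_000) for _ in range(100_000)]
true_mean = fmean(population)  # the population parameter we want to estimate

# Simple random sample: every household has the same chance of selection.
srs = rng.sample(population, 1_000)

# Assumed nonresponse mechanism: households earning 50,000 or more answer
# the survey with probability 0.6, everyone else with probability 0.2.
def responds(income):
    return rng.random() < (0.6 if income >= 50_000 else 0.2)

contacted = rng.sample(population, 5_000)
respondents = [x for x in contacted if responds(x)]

print(f"population mean:      {true_mean:,.0f}")
print(f"random-sample mean:   {fmean(srs):,.0f}")
print(f"respondent-only mean: {fmean(respondents):,.0f}")
```

The random-sample mean lands close to the population mean, while the respondent-only mean overstates it because higher earners are overrepresented. Contacting more households would not fix this: the bias comes from who responds, not from how many.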
Data Collection: Samples
• If your unit of analysis is a micro-level measure, there are two methods of collecting a sample
• Use existing sample data (but beware of bias!)
   • US Census: Current Population Survey
   • World Values Survey
   • Surveys from NGOs/GOs/NPOs
• Create your own sample via a survey
   • Be ESPECIALLY wary of bias
Samples and Populations: Statistical Properties
• We use sample statistics (estimators) to estimate characteristics of a population (parameters) whose values are unknown
• We cannot say whether a sample estimator (e.g. a mean, standard error, beta coefficient, etc.) is equal to the population parameter
• BUT we can use probability theory to determine how likely it is that our sample will produce estimators close to our population’s parameters
   • Significance testing tells us how confident we can be that our population parameters are NOT equal to a certain value
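To make the estimator/parameter distinction concrete, here is a Python sketch in which, unlike real research, the population parameters are known because we set them (`MU`, `SIGMA`, and the sample sizes are hypothetical choices). It computes one sample mean with its estimated standard error, s / √n, then repeats the sampling many times to check how often the estimate lands within 2 standard errors of the true mean.

```python
import math
import random
from statistics import fmean

rng = random.Random(7)   # fixed seed so the sketch is reproducible
MU, SIGMA = 50.0, 10.0   # hypothetical population parameters (unknown in practice)
N, REPS = 400, 1_000     # sample size and number of repeated samples

def draw_sample(n):
    """Draw n independent observations from the hypothetical normal population."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]

def standard_error(sample):
    """Estimated standard error of the sample mean, s / sqrt(n)."""
    m = fmean(sample)
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (len(sample) - 1))
    return s / math.sqrt(len(sample))

# One sample: the estimator (sample mean) and its estimated standard error.
sample = draw_sample(N)
print(f"sample mean = {fmean(sample):.2f}, SE = {standard_error(sample):.2f}")

# Many samples: how often does the estimator land within 2 SEs of MU?
hits = 0
for _ in range(REPS):
    s = draw_sample(N)
    if abs(fmean(s) - MU) < 2 * standard_error(s):
        hits += 1
coverage = hits / REPS
print(f"share of samples within 2 SE of the true mean: {coverage:.3f}")
```

No single sample mean equals `MU` exactly, but roughly 95% of them fall within 2 standard errors of it. That probabilistic statement is what significance tests and confidence intervals build on.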
Probability Theory and Sampling Distributions
• The sampling distribution of a statistic, such as a mean, is the probability distribution of the values that statistic takes across all possible samples of a given size drawn from the population
   • Think of it as a data road map (histogram)
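A sampling distribution can be approximated by drawing many samples and plotting the statistic from each one. The Python sketch below assumes a hypothetical population of fair six-sided die rolls (mean 3.5), draws 2,000 samples of 30 rolls each, and prints a text histogram of the 2,000 sample means as a stand-in for a STATA histogram.

```python
import random
from collections import Counter
from statistics import fmean, pstdev

rng = random.Random(1)  # fixed seed so the sketch is reproducible
N, REPS = 30, 2_000     # observations per sample, number of samples drawn

def sample_mean(n):
    """Mean of n rolls of a fair six-sided die (hypothetical population, mean 3.5)."""
    return fmean([rng.randint(1, 6) for _ in range(n)])

# The sampling distribution: one sample mean per repeated sample.
means = [sample_mean(N) for _ in range(REPS)]

# Text histogram with bins of width 0.25; each '#' stands for 20 samples
# (counts are rounded down).
BIN_WIDTH = 0.25
bins = Counter(int(m // BIN_WIDTH) for m in means)
for b in sorted(bins):
    lo = b * BIN_WIDTH
    print(f"{lo:4.2f}-{lo + BIN_WIDTH:4.2f} | {'#' * (bins[b] // 20)}")

print(f"mean of sample means: {fmean(means):.3f}")
print(f"SD of sample means:   {pstdev(means):.3f}")
```

The histogram centers near 3.5. The population SD of a die roll is about 1.71, so the SD of the sample means should come out near 1.71 / √30 ≈ 0.31. That spread of the sampling distribution is what the standard error measures.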
Sampling Distributions: The Convenience of Normalcy
• Normal distributions are a central component of quantitative analysis
   • Provide a symmetric road map of sample data in relation to the mean
   • The 68%, 95%, and 99.7% rules: roughly 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean
• Necessary assumption for the t-statistic in significance testing and for the distribution of beta coefficients in OLS!
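The 68%, 95%, and 99.7% rules follow directly from the standard normal distribution. This Python sketch uses the standard library's `statistics.NormalDist` to compute the share of a normal distribution within k standard deviations of the mean. It also shows that the "95%" cut-off is more precisely ±1.96 standard deviations, the value used for 95% confidence intervals.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def share_within(k):
    """Probability that a normal draw falls within k SDs of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%

# The "95% rule" is more precisely +/-1.96 SDs:
print(f"z for 95%: {Z.inv_cdf(0.975):.2f}")  # z for 95%: 1.96
```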
A Mean who just wants to be Normal: The Central Limit Theorem
• We prefer that our sample and its estimators (e.g. sample means) be normally distributed, given the centrality of this type of distribution to empirical analysis
• If they are not, never fear, the Central Limit Theorem is here (at least for means)!
• Central Limit Theorem: Given a sample of independent and identically distributed random variables with a finite, non-zero standard deviation, the distribution of the sample mean approaches a normal distribution as the sample size (the number of observations in each sample) increases
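The Central Limit Theorem can be seen with random draws alone, no data required. The Python sketch below (a stand-in for the STATA random-draw exercise) assumes a clearly non-normal, right-skewed population: an exponential distribution with mean 1 and SD 1. For several sample sizes n it draws 2,000 samples, computes their means, and reports the skewness (0 for a normal distribution) and the SD of those means.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
REPS = 2_000               # number of samples drawn for each sample size

def skewness(xs):
    """Moment-based skewness: 0 for a symmetric (e.g. normal) distribution."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([(x - m) ** 3 for x in xs]) / s ** 3

def sample_means(n):
    """Means of REPS samples of size n from a right-skewed population.

    The population is exponential with mean 1 and SD 1, an assumption
    chosen only because it is clearly non-normal."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(REPS)]

results = {}
for n in (1, 5, 30, 100):
    means = sample_means(n)
    results[n] = (skewness(means), pstdev(means))
    print(f"n={n:>3}: skewness={results[n][0]:5.2f}  SD of means={results[n][1]:.3f}")
```

For this population, theory predicts a skewness of the sample mean of about 2 / √n and an SD of about 1 / √n. The output should therefore move from strongly skewed at n = 1 toward nearly symmetric by n = 100, even though the underlying draws never become normal. What drives the result is the size of each sample, not the number of samples drawn.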
STATA LAB EXERCISES
• How to open and load data into STATA
• How to create a log of one’s work
• How to codify and generate variables
• How to create a histogram which enables one to assess whether one’s data is normally distributed
• Conceptualizing the Central Limit Theorem via random draws (requires no data)