Methodological Workshop 1: Research Design. Yu Xie University of Michigan. Otis Dudley Duncan. “ But sociology is not like physics. Nothing but physics is like physics, because any understanding of the world that is like the physicist’s understanding becomes part of physics…”
Methodological Workshop 1: Research Design • Yu Xie • University of Michigan
Otis Dudley Duncan • “But sociology is not like physics. Nothing but physics is like physics, because any understanding of the world that is like the physicist’s understanding becomes part of physics…” • (Otis Dudley Duncan. 1984. Notes on Social Measurement. p.169)
First Principle of Social Science • Variability is the very essence of social science research. • “Variability Principle.” • We are interested in understanding how social outcomes vary across members in a human population and over time. • Mortality example.
Second Principle • Social grouping reduces such variability. • “Social Grouping Principle.” • We seek to understand patterns of “between-group” variations in social outcomes. • Mortality example.
Third Principle • Patterns of population variability vary with social context, which is often defined by time and space. • “Social Context Principle” • Patterns of between-group variations vary by social context. • Mortality example: is the education-mortality relationship reduced or eliminated through social policy?
Different “Regimes” of Variability • Social contexts are different from social groups in that the former are self-contained social systems with natural boundaries, for example by time and space. • Patterns of individual variability may be governed by “relationships” between individuals that are not reducible to individuals’ attributes. • Patterns of individual variability may be governed by macro-level conditions such as “social structure,” “political structure,” or “culture,” which may be discontinuous and fixed. • Collective action may lead to changes in macro-level conditions and human relationships, which are major sources of social change.
Population Thinking and Statistics • In typological thinking, deviations from the mean are nothing but “errors,” with the mean approaching the true cause. (Example: measurement of the speed of sound.) • In population thinking, deviations are the reality of substantive importance; the mean is a property of a population.
Two Views of Regression • Gaussian View (Typological Thinking): • Observed Data = Constant Model + Measurement Error • Example: $y_i = m + e_i$, where $m$ is a true constant. • Galtonian View (Population Thinking): • Observed Data = Systematic (between-group) Variability + Remaining (within-group) Variability • Example: $y_i = m + e_i$, where $m = E(Y)$, the expected value of $Y$ in the population.
Potential Biases in Regression Analysis • $Y_i = a + d_i D_i + e_i$ • There are two types of variability that may cause biases: • (1) Pre-treatment heterogeneity bias ($e_i$): if $\mathrm{corr}(e, D) \neq 0$, there is pre-treatment heterogeneity bias. • (2) Treatment-effect heterogeneity bias ($d_i$): if $\mathrm{corr}(d, D) \neq 0$, there is treatment-effect heterogeneity bias.
Comment • When the first form of heterogeneity bias is present, we may have a “spurious” causal effect. • “Omitted variable bias.” • “Correlation does not equal causation.” • Example: [Figure: path diagram with nodes D, Y, U, and e]
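The spurious effect described on this slide can be illustrated with a small simulation. This is only a sketch under assumed values: the function name, the coefficient 2.0 on U, and the rule that assigns D are all hypothetical choices, not part of the workshop material. An unobserved U raises both the chance of treatment and the outcome, so the naive treated-versus-untreated comparison is far from the true effect of zero.

```python
import numpy as np

def simulate_omitted_variable_bias(n=100_000, true_effect=0.0, seed=0):
    """Naive difference in means when an unobserved U drives both D and Y."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                            # unobserved confounder U
    d = (u + rng.normal(size=n) > 0).astype(float)    # U raises the chance of treatment
    # U sits in the error term, so corr(e, D) != 0 (assumed coefficient 2.0).
    y = 1.0 + true_effect * d + 2.0 * u + rng.normal(size=n)
    return y[d == 1].mean() - y[d == 0].mean()

naive_estimate = simulate_omitted_variable_bias()
print(f"true effect: 0.0, naive estimate: {naive_estimate:.2f}")
```

The naive estimate is clearly positive even though D has no effect on Y in this setup, which is the "correlation does not equal causation" point.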
Comment • The second form of heterogeneity bias may result from rational “anticipatory behavior.” • Problem of “self-selection.” • Example: [Figure: path diagram with nodes D, Y, U, and e]
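A hedged sketch of self-selection on anticipated gains: each person has an individual effect d_i and takes the treatment only when that effect exceeds a cost. The distribution of d_i, the cost of 1.0, and the function name are illustrative assumptions. The error term is independent of treatment here, so any gap between the naive estimate and the average effect comes from corr(d, D) alone.

```python
import numpy as np

def simulate_selection_on_gains(n=100_000, cost=1.0, seed=0):
    """Compare the naive estimate with the average effect under self-selection."""
    rng = np.random.default_rng(seed)
    d_i = rng.normal(loc=1.0, scale=1.0, size=n)  # individual effects, average about 1
    treated = d_i > cost                          # participate only if the gain exceeds the cost
    e = rng.normal(size=n)                        # independent of treatment by construction
    y = 5.0 + d_i * treated + e
    naive = y[treated].mean() - y[~treated].mean()
    return naive, d_i.mean()

naive, average_effect = simulate_selection_on_gains()
print(f"average effect: {average_effect:.2f}, naive estimate: {naive:.2f}")
```

In this setup the naive comparison recovers the average effect among those who chose treatment, which overstates the average effect in the whole population.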
Yu Xie’s “Fundamental Paradox in Social Science” • There is always variability at the individual level. • Causal inference is impossible at the individual level and thus always requires statistical analysis at the group level on the basis of some homogeneity assumption.
Key Difficulties of a Research Design • (1) How do we know that results based on your “comparison” are valid? • “Internal validity” • (2) How do we know that results based on your “comparison” hold true in other settings? • “External validity”
Research Design Possibilities • Social Experiments (Randomization) • Structural Approach • Multivariate Analysis (Social Grouping Principle) • Multi-level Analysis (Social Context Principle) • “Quasi-Experimental Designs” or “Natural Experiments”. • Instrumental Variables (Randomization) • Regression Discontinuity (Social Context Principle) • Utilizing Spatial Variation (Social Context Principle) • Utilizing Temporal Variation (Social Context Principle) • Clustering Design • Fixed Effects Model (Social Grouping Principle)
Three Key Features of a Good Paper • The harmonious trio: Theory, Design, and Evidence. All need to be in place. • A good theoretical/conceptual framework -> research question. • A good research design -> matching empirical data to the research question. • Good data analysis -> results that address the research question. • Tight integration of the three.
Why Focus on Small Topics? • Socratic method of inquiry in the western tradition. • True knowledge can stand harsh criticisms. • Many important, big questions are not researchable questions, such as value of life. • From small to big, accumulation of knowledge. • “Demographic tradition” under Duncan’s influence.
Experimental Approach • Experimental design eliminates both forms of heterogeneity bias. • Example: High/Scope Perry Preschool study conducted in Ypsilanti. • Manski and Garfinkel (1992): experimental designs suffer from shortcomings that are often overlooked. • Manski and Garfinkel refer to the experimental approach as “reduced-form.”
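As a rough illustration of why randomization removes both biases, the sketch below builds a population with pre-treatment heterogeneity (U) and heterogeneous effects that are correlated with U, then assigns treatment by coin flip. All numbers and names are assumptions chosen for the example.

```python
import numpy as np

def simulate_randomized_experiment(n=200_000, seed=0):
    """Difference in means under random assignment versus the average effect."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # pre-treatment heterogeneity
    d_i = 1.0 + 0.5 * u + rng.normal(size=n)     # heterogeneous effects, average about 1
    treated = rng.random(n) < 0.5                # coin flip, unrelated to u and d_i
    y = 2.0 * u + d_i * treated + rng.normal(size=n)
    estimate = y[treated].mean() - y[~treated].mean()
    return estimate, d_i.mean()

experimental_estimate, average_effect = simulate_randomized_experiment()
print(f"average effect: {average_effect:.2f}, experimental estimate: {experimental_estimate:.2f}")
```

Because assignment is independent of both U and d_i, the simple difference in means lands close to the average effect.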
Shortcomings of Experimental Approach • We cannot always extrapolate results from an experimental setting to a natural setting. • Thus, Manski and Garfinkel openly criticize experimental designs: "In fact, reduced-form experimental evaluation actually requires that a highly specific and suspect structural assumption hold: Individuals and organizations must respond in the same way to the experimental version of a program as they would to the actual version." (p. 17) • That is, experimental results may lack “external validity.”
Structural Approach • Manski and Garfinkel propose the "structural" approach as an alternative. • Definition: structural approach refers to statistical methods that model causal processes based on observational data. • Head Start example: control on SES, parental involvement, etc. • Requires strong social science theories.
Comparison of the Two Approaches • Advantages of Structural Approach: • Since it is conducted in a natural setting, its findings are directly relevant to the whole population. In contrast, results from an experimental design need to be extrapolated. • It is less costly. In contrast, experimental research is very expensive. • It builds upon and contributes to theory. In contrast, the reduced-form approach only yields simple answers to simple questions.
Advantages of Reduced-form Approach • Biases due to unobservables can be eliminated through randomization. • It requires fewer assumptions. • It does not require complicated statistical models that the public and government officials have difficulty understanding.
Beyond the Variability Principle • Use of social grouping principle allows us to better understand group-specific properties, i.e., between-group analyses. • Useful as a descriptive tool. No assumption is needed. • Application of Galtonian regression: • Regression = E(Y|X), X denotes group
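A minimal sketch of Galtonian regression as group means, E(Y|X), together with the split of total variability into between-group and within-group parts. The function name and the six data points are made up for illustration.

```python
import numpy as np

def decompose_variance(y, group):
    """Return group means E(Y|X) plus between-group and within-group variance."""
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(np.asarray(group), return_inverse=True)
    group_means = np.bincount(idx, weights=y) / np.bincount(idx)
    fitted = group_means[idx]                        # E(Y|X) for each observation
    between = np.mean((fitted - y.mean()) ** 2)      # systematic (between-group) part
    within = np.mean((y - fitted) ** 2)              # remaining (within-group) part
    return dict(zip(labels.tolist(), group_means.tolist())), between, within

# Hypothetical outcomes for two social groups.
y = [1.0, 3.0, 2.0, 8.0, 10.0, 9.0]
group = ["low", "low", "low", "high", "high", "high"]
means, between, within = decompose_variance(y, group)
print(means, between, within, np.var(y))
```

The between and within components add up to the total variance, so conditioning on the group can only leave the same or less variability within groups.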
Using Social Grouping to Control for Heterogeneity • Social grouping always reduces variability => implies within-group homogeneity. • We may assume that meaningful heterogeneity and endogeneity can be captured by social grouping (still wishful thinking). • Assumptions (comment 5) are more plausible after social grouping than before.
Multiple Regression • Change regression to: • $Y_i = a + d D_i + b' X_i + e_i$ • Interpretation of $d$: • Treatment effect within levels of X, or controlling for X. • [Figure: path diagram with nodes D, Y, X, and e]
Comment • For X to do this, it needs to be correlated with D (“correlation condition,” c1) and to affect Y (“relevance condition,” c2). • X should be pre-treatment, determining both D and Y structurally. • [Figure: path diagram with nodes D, Y, X, and e, with paths labeled c1 and c2]
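The role of the two conditions can be sketched with simulated data in which X is pre-treatment, raises the chance of D (c1), and affects Y (c2). The coefficients, the helper `ols`, and the true effect of 2.0 are assumptions made for this example.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients with an intercept as the first element."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                               # pre-treatment covariate X
d = (x + rng.normal(size=n) > 0).astype(float)       # c1: X is correlated with D
y = 1.0 + 2.0 * d + 3.0 * x + rng.normal(size=n)     # c2: X affects Y; true d = 2

d_without_x = ols(y, d)[1]       # Y regressed on D only
d_with_x = ols(y, d, x)[1]       # Y regressed on D and X
print(f"without X: {d_without_x:.2f}, controlling for X: {d_with_x:.2f}")
```

Omitting X inflates the coefficient on D, while including it brings the estimate back near the assumed value of 2.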
Examples: Quasi-Experiment Design Utilizing Spatial Variation • Certain policies are introduced in State A but not in State B. • States A and B are otherwise comparable. • Observe how outcome Y differs between State A and State B. • The pace of economic reforms in China differs greatly by region. • Associate regional variation in returns to education with regional variation in the depth of economic reforms.
Examples: Quasi-Experiment Design Utilizing Temporal Variation • Declining significance of race? • Examine temporal changes in SES differences by race • Hope to see a narrowing of racial gaps, particularly after the civil rights movement. • Effect of a new instructional method:
INSTRUMENTAL VARIABLES • WHAT ARE INSTRUMENTS? • Intuitively, instruments are variables that move around the probability of participation but do not affect outcomes other than through their effect on participation. • Put more statistically, instruments are variables that are correlated with the endogenous variable – in this context the treatment indicator – but not correlated with the unobservable in the outcome equation.
Instrumental-Variable Approach • Condition: the instrument Z affects Y only through X, meaning: • Z is correlated with Y but does not affect Y directly (called the “exclusion restriction”). • Z is also correlated with X, but not perfectly. • It is very hard to find a good Z. • [Figure: path diagram with nodes Y, X, Z, and U]
WHERE DO INSTRUMENTS COME FROM? • Theory combined with clever data collection • Ex: Lottery number of military enlistment (Angrist 1990) • Ex: distance as in Card (1995)
COMMON EFFECT IV EXAMPLE I • A training center serves two towns: the near town and the far town. • The impact of training on those who take it is 10, while the outcome in the absence of training is 100. • For those in the near town, the cost is zero for everyone. In the far town, for those with a car the cost is essentially zero; for those without one the cost is 10. • Assume that a random half of the eligible persons have a car and that there are 200 eligible persons in each town. • Assume also that everyone knows their cost of training and their benefits from training, and participates only when the benefits exceed the costs.
COMMON EFFECT IV EXAMPLE II • Let Z = 1 denote residence in the near town and Z = 0 denote residence in the far town. • Using our standard notation: • $\Pr(D=1 \mid Z=1) = 1$ • $\Pr(D=1 \mid Z=0) = 0.5$ • $E(Y \mid Z=1) = Y_C + d \Pr(D=1 \mid Z=1) = 100 + 10 \times 1.0 = 110$ • $E(Y \mid Z=0) = Y_C + d \Pr(D=1 \mid Z=0) = 100 + 10 \times 0.5 = 105$
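COMMON EFFECT IV EXAMPLE III • The IV estimator in this simple case is given by: $\hat{d}_{IV} = \dfrac{E(Y \mid Z=1) - E(Y \mid Z=0)}{\Pr(D=1 \mid Z=1) - \Pr(D=1 \mid Z=0)}$ • Inserting the numbers from the example into the formula gives: $\hat{d}_{IV} = \dfrac{110 - 105}{1.0 - 0.5} = 10$, which equals the impact of training.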
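The two-town example can be checked numerically. The sketch below builds the 400 eligible persons directly from the stated numbers, with exactly half of the far town owning a car, and applies the ratio of the difference in mean outcomes to the difference in participation rates. The function names are hypothetical.

```python
import numpy as np

IMPACT = 10.0      # impact of training on those who take it
BASELINE = 100.0   # outcome in the absence of training

def build_population():
    """Near town (Z=1): 200 persons at cost 0. Far town (Z=0): 100 at cost 0, 100 at cost 10."""
    z = np.repeat([1, 0, 0], [200, 100, 100])
    cost = np.repeat([0.0, 0.0, 10.0], [200, 100, 100])
    d = (IMPACT > cost).astype(int)       # participate only when benefits exceed costs
    y = BASELINE + IMPACT * d
    return z, d, y

def wald_estimator(z, d, y):
    """Difference in mean outcomes divided by difference in participation rates."""
    outcome_gap = y[z == 1].mean() - y[z == 0].mean()
    participation_gap = d[z == 1].mean() - d[z == 0].mean()
    return outcome_gap / participation_gap

z, d, y = build_population()
iv_estimate = wald_estimator(z, d, y)
print(y[z == 1].mean(), y[z == 0].mean(), iv_estimate)
```

The mean outcomes come out as 110 and 105, the participation rates as 1.0 and 0.5, and the ratio recovers the training impact of 10.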
A CONTINUOUS INSTRUMENT IN A COMMON EFFECT WORLD • The two-stage least squares estimator is commonly used in this case. • In the first stage, the endogenous variable (i.e., the treatment indicator) is regressed on all the exogenous variables, including the instrument. • The second-stage outcome equation regression then includes the predicted value of the endogenous variable rather than the endogenous variable itself. • Standard errors must be corrected to account for the first-stage estimation. Most software packages now do this for you (for example, the ivreg command in Stata, superseded by ivregress in later versions).
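The two stages can be sketched by hand with NumPy. This is an illustration on simulated data, not a substitute for a packaged routine: it returns only the point estimate, and the standard errors from a manual second stage would be wrong for the reason given on the slide. The data-generating values (common effect 10, instrument strength 0.8) are assumptions.

```python
import numpy as np

def two_stage_least_squares(y, d, z):
    """Point estimate of the effect of d on y, using z as the instrument."""
    n = len(y)
    exog = np.column_stack([np.ones(n), z])
    # First stage: regress the endogenous treatment on the exogenous variables.
    first, *_ = np.linalg.lstsq(exog, d, rcond=None)
    d_hat = exog @ first
    # Second stage: regress the outcome on the predicted treatment.
    second, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), d_hat]), y, rcond=None)
    return second[1]

rng = np.random.default_rng(0)
n = 200_000
u = rng.normal(size=n)                                     # unobservable in the outcome equation
z = rng.normal(size=n)                                     # continuous instrument, independent of u
d = (0.8 * z + u + rng.normal(size=n) > 0).astype(float)   # treatment depends on z and u
y = 100.0 + 10.0 * d + 5.0 * u + rng.normal(size=n)        # common effect of 10

ols_estimate = np.cov(d, y)[0, 1] / np.var(d, ddof=1)      # ignores the endogeneity of d
iv_estimate_2sls = two_stage_least_squares(y, d, z)
print(f"OLS: {ols_estimate:.2f}, 2SLS: {iv_estimate_2sls:.2f}")
```

In this setup the OLS slope is pushed well above 10 by the unobservable, while the two-stage estimate stays close to it.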
A Complication: When Treatment Effects are Heterogeneous • The IV estimator then estimates the Local Average Treatment Effect (LATE): the average treatment effect for those persons whose treatment status is affected by random assignment. • Also called the “principal stratification approach” (Angrist, Imbens, and Rubin 1996; Little and Yau 1998).
Classification of Compliance Status (R = assignment, T = treatment received; 0 = control, 1 = treatment) • R = 0, T = 0: compliers or never-takers • R = 0, T = 1: defiers or always-takers • R = 1, T = 0: defiers or never-takers • R = 1, T = 1: compliers or always-takers
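A hedged simulation of the compliance classification: each person is a complier, always-taker, or never-taker, with a different treatment effect per type. The sketch assumes there are no defiers (the usual monotonicity assumption), and the shares and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300_000

# Assumed shares: 50% compliers, 20% always-takers, 30% never-takers, no defiers.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
# Assumed treatment effects by type: compliers 2, always-takers 8, never-takers 0.
effect = np.select([types == "complier", types == "always"], [2.0, 8.0], default=0.0)

r = rng.integers(0, 2, size=n)                       # random assignment R
t = np.where(types == "always", 1,                   # always-takers take treatment regardless
             np.where(types == "never", 0, r))       # never-takers never do; compliers follow R
y = 50.0 + effect * t + rng.normal(size=n)

late = (y[r == 1].mean() - y[r == 0].mean()) / (t[r == 1].mean() - t[r == 0].mean())
average_effect = effect.mean()
print(f"IV estimate: {late:.2f}, population average effect: {average_effect:.2f}")
```

The IV ratio lands near the compliers' effect of 2 rather than the population average, since only compliers change treatment status in response to assignment.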
References • Angrist, Joshua D. 1990. “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records.” American Economic Review 80: 313-36. • Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. “Identification of Causal Effects Using Instrumental Variables.” Journal of the American Statistical Association 91(434): 444-455. • Card, David. 1995. “Using Geographic Variation in College Proximity to Estimate the Return to Schooling.” Pp. 201-222 in Aspects of Labour Market Behavior: Essays in Honour of John Vanderkamp, edited by Louis Christofides, E. Kenneth Grant, and Robert Swidinsky. Toronto: University of Toronto Press. • Little, Roderick J., and Linda H.Y. Yau. 1998. “Statistical Techniques for Analyzing Data from Prevention Trials: Treatment of No-shows Using Rubin's Causal Model.” Psychological Methods 3(2): 147-159. • Manski, Charles F., and Irwin Garfinkel. 1992. “Introduction.” Pp. 1-21 in Evaluating Welfare and Training Programs, edited by Charles F. Manski and Irwin Garfinkel. Cambridge, MA: Harvard University Press.