Data, Models and the Search for Exchangeability
Mark Hopkins, Department of Economics
Math Department Colloquium, Gettysburg College, April 14, 2005
“Torture the data, and they will confess…”
• Theory:
  • Is data mining a dirty word?
  • Statistics vs. econometrics and the role of the ex ante theory
  • Information extraction amounts to a conditioning problem
  • Conditioning: bias vs. variance, or a search for exchangeability…?
  • Propagating “model uncertainty” into our parameter estimates
  • Using new Bayesian statistical methods in econometrics
  • What do economists have to learn from statisticians?
• Application:
  • Why do some countries become rich faster than others?
Preliminaries: Recalling Bayes’ Rule
• Bayes’ Rule tells us how to update our beliefs about an event A given some data (knowledge that event B happened)
• Example: What is the probability that Saddam had weapons of mass destruction (WMD), given that none have been found (NF)?
• The answer depends both on the “strength of the data”, p(NF|WMD), and on one’s own (subjective) prior belief p(WMD)
• The statistician’s job is (should be) to help you update your own personal beliefs… all truth is “subjective” in a Bayesian world
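A worked form of the update in the WMD example (a reconstruction of the standard Bayes' Rule calculation; the original slide's exact numbers are not shown here):

```latex
% Bayes' Rule for the WMD example: posterior belief that Saddam had WMD,
% given that none were found (NF).
\[
  p(\mathrm{WMD}\mid \mathrm{NF})
  = \frac{p(\mathrm{NF}\mid \mathrm{WMD})\,p(\mathrm{WMD})}
         {p(\mathrm{NF}\mid \mathrm{WMD})\,p(\mathrm{WMD})
          + p(\mathrm{NF}\mid \neg\mathrm{WMD})\,p(\neg\mathrm{WMD})}
\]
% A skeptic with prior p(WMD) = 0.5 and a believer with p(WMD) = 0.9 reach
% different posteriors from the same likelihood p(NF|WMD): both the "strength
% of the data" and the prior matter.
```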
Prior beliefs modify our view of the information contained in “data”
Statistical Inference: A Review
• The goal: observe the world (gather data, D) and then draw conclusions and/or make predictions
• This requires a theory (or model, M) to organize relationships
• Mathematics (Probability Theory):
  • A statistical model is simply a probability distribution, p(D|M), where M = {θ, A} consists of
    • a set of structural assumptions (A), and
    • some vector (θ) parameterizing the probability distribution. This usually represents the “question of interest”: e.g. θ = {μ, σ²}
• Statistical inference:
  • “Drawing conclusions” refers to p(θ|D,A)
  • “Making predictions” refers to p(D_new|θ,A)
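A compact statement of the two inference targets above, written out in the standard Bayesian form (an illustrative reconstruction, not copied from the slide):

```latex
% "Drawing conclusions": the posterior for the parameter vector theta,
% conditional on the data D and the structural assumptions A.
\[
  p(\theta \mid D, A) \;\propto\; p(D \mid \theta, A)\, p(\theta \mid A)
\]
% "Making predictions": the posterior predictive distribution for new data,
% obtained by averaging the sampling model over the posterior for theta.
\[
  p(D_{\mathrm{new}} \mid D, A)
  \;=\; \int p(D_{\mathrm{new}} \mid \theta, A)\, p(\theta \mid D, A)\, d\theta
\]
```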
Estimating p(θ|D,A): Two Practical (& Related) Problems
#1: Inference about θ is conditional on model assumptions
• In practice, we don’t know the true structural assumptions (A)
• What do we know? Bayes’ Rule: p(M|D) ∝ p(D|M)p(M)
• Hypothesis testing can reject a model, but it can neither confirm it nor tell you the correct alternative!
• Statistics vs. econometrics: what role does the prior p(M) play?
  • Traditional statistics recognizes uncertainty about θ but not A.
  • Result: run a specification search for A, but pretend you didn’t!
#2: What if data are not drawn from the same distribution?
• Inference about θ is based on averaging repeated draws
• A fundamental statistical issue: “We are each a population of 1!”
• A methodological guide for “conditioning”: conditional exchangeability
The Conditioning Problem: A Familiar Example
• Data D = {X, Y}; we want to know the “effect of X on Y”
• We are interested in the regression (or conditional expectation function, C.E.F.): E[Y|X]
• Define the residual or “error” as ε = Y – E[Y|X]
• Familiar linear example: model M is E[Y|X] = β₀ + β₁X
  • so Y = β₀ + β₁X + ε
• Estimation / inference:
  • Estimation: find {β₀, β₁} that minimize some loss function L(ε)
  • Inference: conditional on our information set X, the errors ε must be exchangeable
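Under squared-error loss, the estimation step in this linear example is the familiar least-squares problem (a sketch of the standard formulation, not taken verbatim from the slide):

```latex
% Least-squares estimation of the linear model Y = beta_0 + beta_1 X + eps,
% i.e. the loss function L(eps) is the sum of squared residuals.
\[
  (\hat\beta_0, \hat\beta_1)
  \;=\; \arg\min_{\beta_0,\,\beta_1}\;
        \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2
\]
% Inference then treats the residuals eps_i = y_i - b0 - b1 x_i as
% (conditionally) exchangeable given the information set X.
```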
The Benefits of Using the Bayesian Approach of “Exchangeability”
• Classical (frequentist) “i.i.d.” vs. Bayesian “exchangeability”
• A foundation for statistical inference on population data
  • De Finetti’s Representation Theorem states: if a sample {X1, X2, …, Xn} is a subset of an infinite exchangeable sequence {X}, then it is “as if ” p(D|θ,A) exists, where θ ~ p(θ)
• Clarifies the goal of the conditioning / model search process
  • We are trying to achieve “anonymity” of regression residuals
• Clarifies the relationship between model search and prediction
  • What is the basis for using the past to make predictions of the future? … when the past and future are part of an exchangeable sequence!
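The representation theorem referenced above, in its standard textbook form (stated here for reference; the slide's exact notation is not reproduced):

```latex
% De Finetti's representation: for an infinitely exchangeable sequence
% X_1, X_2, ..., the joint distribution of any n observations can be written
% as an i.i.d. likelihood mixed over a prior on the parameter theta.
\[
  p(x_1, \dots, x_n)
  \;=\; \int \prod_{i=1}^{n} p(x_i \mid \theta)\; p(\theta)\, d\theta
\]
% "It is as if" the data were i.i.d. draws from p(x | theta), with theta
% itself drawn from the prior p(theta) -- i.e. the p(D | theta, A) and
% theta ~ p(theta) on this slide.
```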
Example of a Conditioning Problem: The Sources of Economic Growth
• Why have some countries grown rich faster than others?
• Data (D): growth rates (g) & assorted country characteristics (X)
  • Observations are countries (n ≈ 100)
• Ex ante theory: the Solow Model of Capital Accumulation
• The problem: what about other variables that may affect g?
  • Omitted-variable bias & “robustness” problems
  • D.o.F. problem: # theories > # observations … (plus multicollinearity!)
  • Specifying functional forms for variables like democracy, ethnic diversity
  • Population heterogeneity… Are France, Taiwan, and Sudan really all “draws from the same distribution”? Inference about σ²…?
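For concreteness, a typical Solow-style cross-country growth regression looks roughly like the following (a standard specification offered as an illustration, not the exact model used in the talk):

```latex
% A Solow-style growth regression: average growth g_i for country i as a
% function of initial income, the saving/investment rate, and population
% growth, plus a vector Z_i of additional candidate controls -- the source
% of the conditioning problem.
\[
  g_i \;=\; \beta_0 + \beta_1 \ln y_{i,0}
            + \beta_2 \ln s_i
            + \beta_3 \ln(n_i + \delta)
            + \gamma' Z_i + \varepsilon_i
\]
% The "other variables" problem: with roughly 100 countries and dozens of
% proposed elements of Z_i, the data cannot discriminate sharply among
% competing specifications.
```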
Exchangeability in Cross-Country Growth Regressions
• Inference requires conditional exchangeability
  • France, Taiwan, and Sudan are not exchangeable, but can we find an appropriate vector X such that the residuals g – E[g|X] are exchangeable?
  • Conditioning just boils down to a problem of model selection!
• The classical approach to model selection is “hypothesis testing”
  • However, the D.o.F. problem has led to upward “specification search”!
• In summary:
  • Two types of uncertainty: sampling (variance), model (bias)
  • Model selection usually involves an artful trade-off of bias vs. variance
  • However, classical methods do not propagate our model uncertainty into coefficient estimates
  • Can Bayesian statistics help us bring science to the art of selection?
The Growth Literature, Take 1: OLS estimates w/ controls & dummies
The Growth Literature, Take 2: “Explaining” Parameter Heterogeneity
• Tree Regressions
• Local Linear Regressions (Spline models)
• Varying Coefficient / Hierarchical Models
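A sketch of what the third approach looks like formally (a generic varying-coefficient / hierarchical specification, assumed here for illustration rather than taken from the talk):

```latex
% Varying-coefficient / hierarchical growth model: each country's slope
% vector beta_i may differ, but is tied to a common population distribution.
\begin{align*}
  g_i     &= x_i'\beta_i + \varepsilon_i        && \text{(country-level regression)}\\
  \beta_i &\sim N\bigl(\bar\beta,\ \Sigma_\beta\bigr) && \text{(parameter heterogeneity)}
\end{align*}
% Tree and spline regressions instead let the coefficients vary
% deterministically with observables (the splitting variables or spline knots).
```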
A Tree Regression
[Tree regression diagram: the sample is split successively on variables such as s60, EQINV, NONEQINV, laam, DEMOC65, FRAC, and lny60, with estimated average growth rates (roughly –0.007 to 0.053) at the terminal nodes.]
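An illustrative sketch of how such a tree regression can be fit. The splitting-variable names mirror those on the slide, but the file name, column names, and the use of scikit-learn are assumptions for the example; this is not the author's original code.

```python
# Sketch: tree regression for cross-country growth data (hypothetical file
# and column names; the original analysis likely used a different package).
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# One row per country: the growth rate plus candidate covariates.
data = pd.read_csv("growth_data.csv")
X = data[["s60", "EQINV", "NONEQINV", "laam", "DEMOC65", "FRAC", "lny60"]]
y = data["growth"]

# A shallow tree keeps the partition interpretable, as in the slide's diagram.
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=5)
tree.fit(X, y)

# Print the fitted splits and the mean growth rate in each terminal node.
print(export_text(tree, feature_names=list(X.columns)))
```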
Specification Searches
A specification search is a search for the mode of p(M|D)…
• Bayes’ Rule: p(M|D) ∝ p(D|M)p(M)
• Problem #1: How strong is your prior belief about M?
• Problem #2: Can you characterize your prior beliefs?
• Problem #3: Using the same data to find M and to estimate θ? Danger! Why?
• Problem #4: By conditioning inference on a single model M [not on p(M)], you are understating the uncertainty about your coefficient estimates!
Bayesian Model Averaging (BMA)
• An alternative to trying to find the single best model (i.e., the mode of p(M|D)) is to consider the entire distribution of specifications…
• Suppose you assign probability p(A_k) to each of K specifications; then average inference about θ over the specifications (see the formula below)
• Averaging over model space improves statistical inference
  • Coefficient estimates tend to have better predictive ability
  • Standard errors reflect model uncertainty as well as parametric uncertainty
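The averaging step referenced above, in its standard form (the slide's original formula is not reproduced in the text dump, so this is the textbook statement):

```latex
% Bayesian model averaging: the posterior for theta mixes the model-specific
% posteriors, weighted by each specification's posterior probability.
\[
  p(\theta \mid D) \;=\; \sum_{k=1}^{K} p(\theta \mid D, A_k)\, p(A_k \mid D),
  \qquad
  p(A_k \mid D) \;\propto\; p(D \mid A_k)\, p(A_k)
\]
% The BMA variance therefore has two pieces: the average within-model variance
% (parametric uncertainty) plus the spread of the model-specific estimates
% around the BMA mean (model uncertainty).
```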
Some nasty theoretical details
• Choosing the space of models and the model priors
  • Managing the summation in BMA can be tricky… with 12 possible covariates, there are 2¹² = 4,096 different models to combine!
  • “Occam’s Window”, suggested by Raftery (1994): eliminate larger and/or less probable models
  • MC³ techniques transit across model space: compute p(θ, A) from p(θ|A) and p(A|D)
• Computing the integral p(D|A) = ∫ p(D|θ,A) p(θ|A) dθ
  • This is done directly in MC³ techniques for BMA; otherwise…
  • …it can be approximated using p(D|θ̂_MLE, A)
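The approximation mentioned in the last bullet is usually the Laplace/BIC route (a standard result, stated here for reference rather than taken from the slide):

```latex
% BIC-style approximation to the integrated likelihood p(D|A): evaluate the
% likelihood at the MLE and penalize by the number of free parameters d_A.
\[
  \ln p(D \mid A) \;\approx\;
  \ln p\bigl(D \mid \hat\theta_{\mathrm{MLE}}, A\bigr)
  \;-\; \frac{d_A}{2}\,\ln n
\]
% Exponentiating and normalizing across the candidate models A_1, ..., A_K
% gives approximate posterior model weights p(A_k|D) for the BMA sum.
```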
Conclusions
• Standard statistical inference is conditional on the chosen model
• A data-driven model search is usually an unavoidable fact of life
  • The model must include an appropriate vector of controls (bias vs. variance)
  • The model should address parameter heterogeneity and functional form
  • A methodological guide for conditioning is exchangeability
• Of course, the very fact that we are searching for a model means we are really less certain about our estimates than we are stating…
• BMA techniques help to “propagate model uncertainty” into coefficient estimates and standard errors