Econometrics Chengyuan Yin School of Mathematics
Econometrics 24. Simulation Based Estimation
Settings • Conditional and unconditional log likelihoods • Likelihood function to be maximized contains unobservables • Integration techniques • Bayesian estimation • Prior times likelihood is intractable • How to obtain posterior means, which are open-form integrals • The problem in both cases is “…how to do the integration?”
Application - Innovation • Sample = 1,270 German Manufacturing Firms • Panel, 5 years, 1984-1988 • Response: Process or product innovation in the survey year? (yes or no) • Inputs: • Imports of products in the industry • Pressure from foreign direct investment • Other covariates • Model: Probit with common firm effects • (Irene Bertschek, doctoral thesis; Journal of Econometrics, 1998)
Likelihood Function • Joint conditional (on ui) density for obs. i. • Unconditional likelihood for observation i • How do we do the integration to get rid of the heterogeneity in the conditional likelihood?
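The two densities named on this slide (their equations were lost in extraction) can be sketched for a random effects probit; notation is assumed, with qit = 2yit − 1, F the standard normal cdf, and φ the standard normal density:

```latex
% Joint density for observation i, conditional on the common effect u_i
f(y_{i1},\dots,y_{iT}\mid \mathbf{X}_i,u_i)
   = \prod_{t=1}^{T} F\!\left[q_{it}\,(\mathbf{x}_{it}'\boldsymbol{\beta}+\sigma_u u_i)\right],
   \qquad q_{it}=2y_{it}-1

% Unconditional likelihood: integrate the heterogeneity out
L_i = \int_{-\infty}^{\infty}
      \prod_{t=1}^{T} F\!\left[q_{it}\,(\mathbf{x}_{it}'\boldsymbol{\beta}+\sigma_u u_i)\right]
      \phi(u_i)\, du_i
```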
Obtaining the Unconditional Likelihood • The Butler and Moffitt (1982) method is used by most current software • Quadrature • Works for normally distributed heterogeneity
Example: 8 Point Quadrature • Nodes for 8-point Hermite quadrature (use both signs, + and −): 0.381186990207322, 1.157193712446780, 1.981656756695843, 2.930637420257144 • Weights for 8-point Hermite quadrature: 0.661147012558200, 0.207802325814900, 0.017077983007410, 0.000199604072211
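A minimal check that these nodes and weights behave as advertised, using numpy's Gauss-Hermite routine (which returns the same 8 points) to integrate normal heterogeneity out of a probit probability; the values of b and s are illustrative, not taken from the application:

```python
import math
import numpy as np

# 8-point Gauss-Hermite nodes/weights (same values as listed above).
nodes, weights = np.polynomial.hermite.hermgauss(8)

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# E_u[Phi(b + s*u)] for u ~ N(0,1).  With the change of variable
# u = sqrt(2)*x, this is (1/sqrt(pi)) * sum_h w_h * Phi(b + s*sqrt(2)*x_h).
b, s = 0.5, 1.0  # illustrative values
approx = sum(w * Phi(b + s * math.sqrt(2.0) * x)
             for x, w in zip(nodes, weights)) / math.sqrt(math.pi)

# Closed form for this expectation: Phi(b / sqrt(1 + s^2))
exact = Phi(b / math.sqrt(1.0 + s * s))
print(approx, exact)
```

Only 8 nodes already reproduce the closed-form answer to several decimal places, which is why quadrature is the default for one-dimensional normal heterogeneity.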
Butler and Moffitt’s Approach Random Effects Log Likelihood Function
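A sketch of the quadrature approximation Butler and Moffitt apply to the random effects log likelihood, written with the Hermite nodes zh and weights wh from the previous slide (notation assumed, matching the probit setup above):

```latex
\log L \;\approx\; \sum_{i=1}^{N} \log
   \left\{ \frac{1}{\sqrt{\pi}} \sum_{h=1}^{H} w_h
   \prod_{t=1}^{T} F\!\left[q_{it}\left(\mathbf{x}_{it}'\boldsymbol{\beta}
        + \sqrt{2}\,\sigma_u z_h\right)\right] \right\}
```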
Quasi-Monte Carlo Integration Based on Halton Sequences • For example, using base p = 5, the integer r = 37 has base-5 digits b0 = 2, b1 = 2, and b2 = 1 (37 = 2 + 2·5 + 1·25). Then H37(5) = 2·5⁻¹ + 2·5⁻² + 1·5⁻³ = 0.4 + 0.08 + 0.008 = 0.488.
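A minimal sketch of the radical-inverse computation behind a single Halton value (the function name is mine): peel off the base-p digits of r and reflect them about the decimal point.

```python
def halton(r, p):
    """Radical-inverse (Halton) value of integer r in prime base p."""
    h, f = 0.0, 1.0 / p
    while r > 0:
        r, digit = divmod(r, p)  # peel off base-p digits b0, b1, ...
        h += digit * f           # digit bk contributes bk * p^-(k+1)
        f /= p
    return h

print(halton(37, 5))  # 37 = (1 2 2) base 5 -> 2/5 + 2/25 + 1/125 = 0.488
```

Running r = 1, 2, 3, … through this function generates the Halton sequence for base p, which fills the unit interval far more evenly than pseudo-random draws.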
Quadrature vs. Simulation • Computationally, comparably difficult • Numerically, essentially the same answer. MSL is consistent as the number of draws R increases (with N) • Advantages of simulation • Can integrate over any distribution, not just the normal • Can integrate over multiple random variables; quadrature is largely unable to do this • Models based on simulation are being extended in many directions • The simulation based estimator allows estimation of conditional means, essentially the same as Bayesian posterior means
Bayesian Estimators • “Random Parameters” • Models of Individual Heterogeneity • Random Effects: Consumer Brand Choice • Fixed Effects: Hospital Costs
Bayesian Estimation • Specification of conditional likelihood: f(data | parameters) • Specification of priors: g(parameters) • Posterior density of parameters: • Posterior mean = E[parameters|data]
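In symbols, with θ the parameters and g the prior, the posterior density and posterior mean referred to above are:

```latex
f(\boldsymbol{\theta}\mid \text{data})
  = \frac{f(\text{data}\mid\boldsymbol{\theta})\,g(\boldsymbol{\theta})}
         {\int f(\text{data}\mid\boldsymbol{\theta})\,g(\boldsymbol{\theta})\,d\boldsymbol{\theta}},
\qquad
E[\boldsymbol{\theta}\mid\text{data}]
  = \int \boldsymbol{\theta}\, f(\boldsymbol{\theta}\mid \text{data})\,d\boldsymbol{\theta}
```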
Computing Bayesian Estimators • First generation: Do the integration (math) • Contemporary - Simulation: • (1) Deduce the posterior • (2) Draw random samples from the posterior and compute the sample means and variances of the draws. (Relies on the law of large numbers.)
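The simulation recipe can be sketched with a toy case whose posterior is known in closed form (binomial data with a Beta prior gives a Beta posterior; the data and prior values are hypothetical): drawing from the posterior and averaging reproduces the analytic posterior mean, exactly as the law of large numbers promises.

```python
import random

rng = random.Random(0)
a, b = 1.0, 1.0            # uniform Beta(1,1) prior (assumed for illustration)
n, s = 50, 32              # hypothetical data: 32 successes in 50 trials

# Posterior for a binomial success probability is Beta(a + s, b + n - s).
draws = [rng.betavariate(a + s, b + n - s) for _ in range(100_000)]
post_mean = sum(draws) / len(draws)

analytic = (a + s) / (a + b + n)   # closed-form posterior mean
print(post_mean, analytic)
```

In realistic models the posterior has no closed form and cannot be drawn from directly, which is where the Gibbs sampler below comes in.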
Modeling Issues • As N → ∞, the likelihood dominates and the prior disappears, so Bayesian and Classical MLE converge. (Needs the mode of the posterior to converge to the mean.) • Priors • Diffuse: large variances imply little prior information. (NONINFORMATIVE) • INFORMATIVE priors – finite variances that appear in the posterior. “Taints” any final results.
A Random Effects Approach • Allenby and Rossi, “Marketing Models of Consumer Heterogeneity” • Discrete Choice Model – Brand Choice • “Hierarchical Bayes” • Multinomial Probit • Panel Data: Purchases of 4 brands of Ketchup
Bayesian Estimator • Joint posterior of all parameters in the hierarchy • The integral does not exist in closed form. • Estimate by averaging random samples from the joint posterior. • But the full joint posterior is not known in a form that can be sampled from directly.
Gibbs Sampling: • Target: Sample from f(x1, x2) = joint distribution • Joint distribution is unknown, or it is not possible to sample from it directly. • Assumed: f(x1|x2) and f(x2|x1) are both known, and samples can be drawn from both. • Gibbs sampling: Obtain one draw from (x1, x2) by cycling many times between x1|x2 and x2|x1. • Start x1,0 anywhere in the right range. • Draw x2,0 from x2|x1,0. • Return for x1,1 from x1|x2,0, and so on. • Several thousand cycles produce one draw • Repeat several thousand times to produce a sample • Average the draws to estimate the marginal means.
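The cycle above, sketched for a bivariate normal with correlation rho, where both conditionals are known normals (a standard textbook illustration, not the MNP sampler itself):

```python
import random

def gibbs(rho=0.7, burn_in=1000, n_draws=5000, thin=5, seed=42):
    """Gibbs draws from a bivariate normal with correlation rho.

    Conditionals: x1 | x2 ~ N(rho*x2, 1 - rho^2), and symmetrically.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0                   # start anywhere in the right range
    draws = []
    for t in range(burn_in + n_draws * thin):
        x1 = rng.gauss(rho * x2, sd)    # draw x1 | x2
        x2 = rng.gauss(rho * x1, sd)    # draw x2 | x1
        # Discard the burn-in cycles, then keep every `thin`-th pair.
        if t >= burn_in and (t - burn_in) % thin == 0:
            draws.append((x1, x2))
    return draws

draws = gibbs()
m1 = sum(d[0] for d in draws) / len(draws)  # estimates marginal mean of x1
m2 = sum(d[1] for d in draws) / len(draws)
print(m1, m2)  # both near the true marginal means of 0
```

Burn-in plays the role of the "several thousand cycles" on the slide, and averaging the retained draws estimates the marginal means.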
Gibbs Cycles for the MNP Model • Samples from the marginal posteriors
Results • Individual parameter vectors and disturbance variances • Individual estimates of choice probabilities • The same as the “random parameters model” with slightly different weights. • Allenby and Rossi call the classical method an “approximate Bayesian” approach. • (Greene calls the Bayesian estimator an “approximate random parameters model”) • Who’s right? • Bayesian layers on implausible priors and calls the results “exact.” • Classical is strongly parametric. • Neither is right – Both are right.
Comparison of Maximum Simulated Likelihood and Hierarchical Bayes • Ken Train: “A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit” • Mixed Logit
Stochastic Structure – Conditional Likelihood • Note the individual-specific parameter vector βi
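The conditional likelihood referred to here (its equation was lost in extraction) can be sketched as a logit with an individual-specific parameter vector; notation assumed:

```latex
% Conditional (on \beta_i) choice probability and likelihood for person i
P(j\mid \mathbf{x}_{it},\boldsymbol{\beta}_i)
  = \frac{\exp(\mathbf{x}_{itj}'\boldsymbol{\beta}_i)}
         {\sum_{m} \exp(\mathbf{x}_{itm}'\boldsymbol{\beta}_i)},
\qquad
L_i(\boldsymbol{\beta}_i)=\prod_{t=1}^{T_i} P(j_{it}\mid \mathbf{x}_{it},\boldsymbol{\beta}_i)

% Unconditional likelihood integrates over the mixing distribution of \beta_i;
% MSL replaces the integral with an average over R simulated draws \beta_{ir}
L_i = \int L_i(\boldsymbol{\beta}_i)\,f(\boldsymbol{\beta}_i)\,d\boldsymbol{\beta}_i
\;\approx\; \frac{1}{R}\sum_{r=1}^{R} L_i(\boldsymbol{\beta}_{ir})
```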
Application: Energy Suppliers • N=361 individuals, 2 to 12 hypothetical suppliers • X=[(1) fixed rates, (2) contract length, (3) local (0,1),(4) well known company (0,1), (5) offer TOD rates (0,1), (6) offer seasonal rates]
Reconciliation: A Theorem (Bernstein-Von Mises) • The posterior distribution converges to normal with covariance matrix equal to 1/N times the information matrix (same as classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.) • The posterior mean (empirical) converges to the mode of the likelihood function. Same as the MLE. A proper prior disappears asymptotically. • Asymptotic sampling distribution of the posterior mean is the same as that of the MLE.