  3. Settings • Conditional and unconditional log likelihoods • Likelihood function to be maximized contains unobservables • Integration techniques • Bayesian estimation • Prior times likelihood is intractable • How to obtain posterior means, which are open form integrals • The problem in both cases is “…how to do the integration?”

  4. A Conditional Log Likelihood

  5. Application - Innovation • Sample = 1,270 German Manufacturing Firms • Panel, 5 years, 1984-1988 • Response: Process or product innovation in the survey year? (yes or no) • Inputs: • Imports of products in the industry • Pressure from foreign direct investment • Other covariates • Model: Probit with common firm effects • (Irene Bertschek, doctoral thesis, Journal of Econometrics, 1998)

  6. Likelihood Function • Joint conditional (on ui) density for obs. i. • Unconditional likelihood for observation i • How do we do the integration to get rid of the heterogeneity in the conditional likelihood?

  7. Obtaining the Unconditional Likelihood • The Butler and Moffitt (1982) method is used by most current software • Quadrature • Works for normally distributed heterogeneity

  8. Hermite Quadrature

  9. Example: 8 Point Quadrature • Nodes for 8 point Hermite quadrature (use both signs, + and -): 0.381186990207322000, 1.15719371244677990, 1.98165675669584300, 2.93063742025714410 • Weights for 8 point Hermite quadrature: 0.661147012558199960, 0.20780232581489999, 0.0170779830074100010, 0.000199604072211400010
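As a quick check on how these nodes and weights are used, the sketch below mirrors the four positive nodes to get the full 8-point rule and approximates an expectation over a standard normal variable. The helper name `normal_expectation` is an assumption made for illustration. The change of variable u = sqrt(2)·z converts the normal density into the exp(-z^2) weight that Hermite quadrature expects, which leaves a factor of 1/sqrt(pi).

```python
import math

# Positive nodes and their weights, copied from the slide. The rule is
# symmetric, so each node is used with both signs and the same weight.
POS_NODES = [0.381186990207322, 1.1571937124467799,
             1.981656756695843, 2.9306374202571441]
POS_WEIGHTS = [0.66114701255819996, 0.20780232581489999,
               0.017077983007410001, 0.00019960407221140001]

NODES = [-z for z in POS_NODES] + POS_NODES
WEIGHTS = POS_WEIGHTS + POS_WEIGHTS


def normal_expectation(g):
    """Approximate E[g(u)] for u ~ N(0, 1) with the 8-point Hermite rule."""
    total = sum(w * g(math.sqrt(2.0) * z) for z, w in zip(NODES, WEIGHTS))
    return total / math.sqrt(math.pi)


# The weights sum to sqrt(pi), so a constant integrates to (about) 1,
# and the second moment of a standard normal is recovered as (about) 1.
total_mass = normal_expectation(lambda u: 1.0)
second_moment = normal_expectation(lambda u: u ** 2)
```

An 8-point rule integrates polynomials up to degree 15 exactly, so low-order moments come out correct up to the precision of the tabulated constants.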

  10. Butler and Moffitt’s Approach Random Effects Log Likelihood Function
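A minimal sketch of a random effects probit log likelihood evaluated by Hermite quadrature follows. It assumes the model y_it = 1[x_it'β + σ·u_i + ε_it > 0] with u_i and ε_it standard normal; the function names and the nested-list data layout are illustrative assumptions, not the notation of the original slide or of any particular package.

```python
import math

# 8-point Hermite rule: positive nodes mirrored, weights repeated.
_Z = [0.381186990207322, 1.1571937124467799,
      1.981656756695843, 2.9306374202571441]
_W = [0.66114701255819996, 0.20780232581489999,
      0.017077983007410001, 0.00019960407221140001]
NODES = [-z for z in _Z] + _Z
WEIGHTS = _W + _W


def norm_cdf(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def re_probit_loglik(beta, sigma, y, X):
    """Log likelihood with the random effect integrated out by quadrature.

    y[i][t] is 0 or 1; X[i][t] is the regressor list for firm i, period t.
    """
    loglik = 0.0
    for y_i, X_i in zip(y, X):
        L_i = 0.0
        for z, w in zip(NODES, WEIGHTS):
            u = math.sqrt(2.0) * z          # change of variable to N(0, 1)
            joint = 1.0                     # conditional (on u) joint density
            for y_it, x_it in zip(y_i, X_i):
                index = sum(b * x for b, x in zip(beta, x_it)) + sigma * u
                joint *= norm_cdf((2 * y_it - 1) * index)
            L_i += w * joint
        loglik += math.log(L_i / math.sqrt(math.pi))
    return loglik
```

With `sigma = 0` the heterogeneity drops out and the function reduces to the pooled probit log likelihood, which is a convenient sanity check before handing it to an optimizer.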

  11. Monte Carlo Integration

  12. The Simulated Log Likelihood
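The simulated log likelihood replaces the integral over the heterogeneity with an average over R random draws. The sketch below uses the same assumed random effects probit setup as a running example (standard normal u_i, scale σ); the names `make_draws` and `simulated_loglik` are hypothetical. The draws are generated once and reused at every parameter value, which keeps the simulated function smooth in the parameters.

```python
import math
import random


def norm_cdf(a):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def make_draws(n_individuals, R, seed=12345):
    """R standard normal draws per individual, fixed across iterations."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(R)]
            for _ in range(n_individuals)]


def simulated_loglik(beta, sigma, y, X, draws):
    """Simulated log likelihood: average the conditional likelihood over draws."""
    loglik = 0.0
    for y_i, X_i, u_i in zip(y, X, draws):
        total = 0.0
        for u in u_i:
            joint = 1.0
            for y_it, x_it in zip(y_i, X_i):
                index = sum(b * x for b, x in zip(beta, x_it)) + sigma * u
                joint *= norm_cdf((2 * y_it - 1) * index)
            total += joint
        loglik += math.log(total / len(u_i))
    return loglik


# Tiny illustrative data set: one individual, one period, y = 1.
draws = make_draws(n_individuals=1, R=2000)
sim_ll = simulated_loglik([0.0], 1.0, [[1]], [[[1.0]]], draws)
```

In this toy case the exact likelihood is 0.5, so `math.exp(sim_ll)` should land close to that value, with simulation error shrinking as R grows.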

  13. Quasi-Monte Carlo Integration Based on Halton Sequences • For example, using base p = 5, the integer r = 37 has b0 = 2, b1 = 2, and b2 = 1 (37 = 2 + 2×5 + 1×25). Then $H_{37}(5) = 2 \times 5^{-1} + 2 \times 5^{-2} + 1 \times 5^{-3} = 0.488$.
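The radical-inverse construction behind a Halton sequence can be sketched in a few lines. The function name `halton` is an assumption for illustration; it expands the integer r in the chosen base and reflects the digits about the radix point. The last lines show one common way to turn Halton points into standard normal draws through the inverse CDF.

```python
from statistics import NormalDist


def halton(r, base):
    """Radical inverse of the positive integer r in the given prime base."""
    h = 0.0
    scale = 1.0 / base
    while r > 0:
        r, digit = divmod(r, base)   # peel off digits b0, b1, b2, ...
        h += digit * scale           # digit b_k contributes b_k * base^-(k+1)
        scale /= base
    return h


# The slide's example: r = 37 in base 5, where 37 = 2 + 2*5 + 1*25.
h37 = halton(37, 5)

# First ten points of the base-5 sequence, then mapped to N(0, 1) draws.
uniform_draws = [halton(r, 5) for r in range(1, 11)]
normal_draws = [NormalDist().inv_cdf(h) for h in uniform_draws]
```

Because the points fill the unit interval more evenly than pseudo-random draws, far fewer of them are typically needed for a given level of integration accuracy.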

  14. Panel Data Estimation A Random Effects Probit Model

  15. Log Likelihood

  16. ($1.1707^2 / (1 + 1.1707^2) = 0.578$)

  17. Quadrature vs. Simulation • Computationally, comparably difficult • Numerically, essentially the same answer. MSL is consistent as the number of draws, R, increases • Advantages of simulation • Can integrate over any distribution, not just normal • Can integrate over multiple random variables. Quadrature is largely unable to do this. • Models based on simulation are being extended in many directions. • Simulation based estimator allows estimation of conditional means ⇒ essentially the same as Bayesian posterior means

  18. A Random Parameters Model

  19. Estimates of a Random Parameters Model

  20. RPM

  21. RPM

  22. Movie Model

  23. Parameter Heterogeneity

  24. Bayesian Estimators • “Random Parameters” • Models of Individual Heterogeneity • Random Effects: Consumer Brand Choice • Fixed Effects: Hospital Costs

  25. Bayesian Estimation • Specification of conditional likelihood: f(data | parameters) • Specification of priors: g(parameters) • Posterior density of parameters: • Posterior mean = E[parameters|data]

  26. The Marginal Density for the Data is Irrelevant

  27. Computing Bayesian Estimators • First generation: Do the integration (math) • Contemporary - Simulation: • (1) Deduce the posterior • (2) Draw random samples from the posterior and compute the sample means and variances of the samples. (Relies on the law of large numbers.)
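The two-step recipe (deduce the posterior, then average draws from it) can be illustrated with a deliberately simple case chosen here for convenience, not taken from the slides: Bernoulli data with a uniform Beta(1, 1) prior, for which the posterior is Beta(1 + successes, 1 + failures). The function name and the example counts are assumptions.

```python
import random


def posterior_mean_by_simulation(successes, trials, n_draws=20000, seed=0):
    """Estimate the posterior mean and variance by sampling the posterior."""
    rng = random.Random(seed)
    a = 1 + successes                 # Beta posterior parameters under a
    b = 1 + trials - successes        # uniform Beta(1, 1) prior (assumed)
    draws = [rng.betavariate(a, b) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, var


sim_mean, sim_var = posterior_mean_by_simulation(successes=7, trials=10)

# In this conjugate case the integral is available in closed form,
# so the simulation can be checked against it.
exact_mean = (1 + 7) / (2 + 10)
```

The simulated mean approaches the exact one as the number of draws grows, which is the law of large numbers argument the slide refers to.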

  28. Modeling Issues • As N → ∞, the likelihood dominates and the prior disappears ⇒ Bayesian and classical MLE converge. (Needs the mode of the posterior to converge to the mean.) • Priors • Diffuse ⇒ large variances imply little prior information. (NONINFORMATIVE) • INFORMATIVE priors – finite variances that appear in the posterior. “Taints” any final results.

  29. A Random Effects Approach • Allenby and Rossi, “Marketing Models of Consumer Heterogeneity” • Discrete Choice Model – Brand Choice • “Hierarchical Bayes” • Multinomial Probit • Panel Data: Purchases of 4 brands of Ketchup

  30. Structure

  31. Bayesian Priors

  32. Bayesian Estimator • Joint Posterior= • Integral does not exist in closed form. • Estimate by random samples from the joint posterior. • Full joint posterior is not known, so not possible to sample from the joint posterior.

  33. Gibbs Sampling: • Target: Sample from f(x1, x2) = joint distribution • Joint distribution is unknown or it is not possible to sample from the joint distribution. • Assumed: f(x1|x2) and f(x2|x1) both known and samples can be drawn from both. • Gibbs sampling: Obtain one draw from x1,x2 by many cycles between x1|x2 and x2|x1. • Start x1,0 anywhere in the right range. • Draw x2,0 from x2|x1,0. • Return to x1,1 from x1|x2,0 and so on. • Several thousand cycles produces a draw • Repeat several thousand times to produce a sample • Average the draws to estimate the marginal means.
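The cycling between x1|x2 and x2|x1 can be sketched for a case where both conditionals are known: a standard bivariate normal with correlation rho, where x1|x2 ~ N(rho·x2, 1 - rho^2) and symmetrically for x2|x1. This target is chosen for illustration only. The sketch also simplifies the scheme described above by running one long chain, discarding a burn-in period, and keeping every later cycle as a draw. The function name and default settings are assumptions.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_draws=20000, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x1, x2 = 0.0, 0.0                       # start anywhere in the right range
    draws = []
    for cycle in range(burn_in + n_draws):
        x1 = rng.gauss(rho * x2, cond_sd)   # draw x1 from x1 | x2
        x2 = rng.gauss(rho * x1, cond_sd)   # draw x2 from x2 | x1
        if cycle >= burn_in:
            draws.append((x1, x2))
    return draws


draws = gibbs_bivariate_normal(rho=0.6)

# Average the draws to estimate the marginal means (both are 0 here).
mean_x1 = sum(d[0] for d in draws) / len(draws)
mean_x2 = sum(d[1] for d in draws) / len(draws)
```

Successive draws from a single chain are autocorrelated, so the effective sample size is smaller than the number of retained cycles; thinning or independent restarts are common alternatives.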

  34. Gibbs Cycles for the MNP Model • Samples from the marginal posteriors

  35. Results • Individual parameter vectors and disturbance variances • Individual estimates of choice probabilities • The same as the “random parameters model” with slightly different weights. • Allenby and Rossi call the classical method an “approximate Bayesian” approach. • (Greene calls the Bayesian estimator an “approximate random parameters model”) • Who’s right? • Bayesian layers on implausible priors and calls the results “exact.” • Classical is strongly parametric. • Neither is right – Both are right.

  36. Comparison of Maximum Simulated Likelihood and Hierarchical Bayes • Ken Train: “A Comparison of Hierarchical Bayes and Maximum Simulated Likelihood for Mixed Logit” • Mixed Logit

  37. Stochastic Structure – Conditional Likelihood • Note individual specific parameter vector, β_i

  38. Classical Approach

  39. Bayesian Approach – Gibbs Sampling and Metropolis-Hastings

  40. Gibbs Sampling from Posteriors: b

  41. Gibbs Sampling from Posteriors: Ω

  42. Gibbs Sampling from Posteriors: β_i

  43. Metropolis – Hastings Method

  44. Metropolis-Hastings: A Draw of β_i
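A random-walk Metropolis-Hastings step for one individual's parameter can be sketched as follows. To stay short, the sketch assumes a scalar parameter, a binary logit likelihood in place of the multinomial mixed logit, and a normal population distribution with mean `b` and variance `omega` acting as the prior. All function names, the step size, and the toy data are assumptions for illustration.

```python
import math
import random


def log_posterior_kernel(beta_i, y_i, x_i, b, omega):
    """Log of (logit likelihood for person i) x N(b, omega) density, up to a constant."""
    loglik = 0.0
    for y, x in zip(y_i, x_i):
        v = beta_i * x
        loglik += y * v - math.log1p(math.exp(v))   # binary logit log probability
    log_prior = -0.5 * (beta_i - b) ** 2 / omega
    return loglik + log_prior


def mh_draws(y_i, x_i, b, omega, n_draws=20000, step=1.0, seed=1):
    """Random-walk Metropolis-Hastings draws of the individual parameter."""
    rng = random.Random(seed)
    beta = b                                          # start at the population mean
    current = log_posterior_kernel(beta, y_i, x_i, b, omega)
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = beta + step * rng.gauss(0.0, 1.0)  # symmetric proposal
        candidate = log_posterior_kernel(proposal, y_i, x_i, b, omega)
        # Accept with probability min(1, posterior ratio); the proposal
        # densities cancel because the random walk is symmetric.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)                            # a rejection repeats the old value
    return draws, accepted / n_draws


# Toy data: five choices that mostly favor the alternative with x = 1.
draws, accept_rate = mh_draws([1, 1, 1, 0, 1], [1.0] * 5, b=0.0, omega=1.0)
posterior_mean = sum(draws) / len(draws)
```

The step size controls the acceptance rate: steps that are too large are rarely accepted, while steps that are too small move slowly through the posterior, so it is usually tuned during the run.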

  45. Application: Energy Suppliers • N=361 individuals, 2 to 12 hypothetical suppliers • X=[(1) fixed rates, (2) contract length, (3) local (0,1),(4) well known company (0,1), (5) offer TOD rates (0,1), (6) offer seasonal rates]

  46. Estimates: Mean of Individual β_i

  47. Reconciliation: A Theorem (Bernstein-von Mises) • The posterior distribution converges to normal with covariance matrix equal to 1/N times the inverse of the information matrix (same as classical MLE). (The distribution that is converging is the posterior, not the sampling distribution of the estimator of the posterior mean.) • The posterior mean (empirical) converges to the mode of the likelihood function. Same as the MLE. A proper prior disappears asymptotically. • Asymptotic sampling distribution of the posterior mean is the same as that of the MLE.

