Applied Bayesian Inference for Agricultural Statisticians

Robert J. Tempelman, Department of Animal Science, Michigan State University

Outline of talk:
• Introduction
• Review of Likelihood Inference
• An Introduction to Bayesian Inference
• Empirical Bayes Inference
• The Bayesian Connection to Generalized Linear Mixed Models (GLMM)
• The Bayesian Revolution:
  • Markov Chain Monte Carlo (MCMC) methods
  • Metropolis-Hastings sampling
• Comparisons of Bayesian with conventional GLMM analyses of agricultural experiments
• Extensions of GLMM analyses of agricultural experiments using hierarchical Bayesian inference

Warning
• You won’t learn Bayesian data analysis in one day…and there is still a lot I don’t know.
• Some great resources for agricultural statisticians/data analysts: Sorensen and Gianola; Carlin and Louis; Gelman et al.

How did I get interested in Bayesian Statistics?
• 1986-1989: Master of Science in Animal Breeding, University of Guelph
• Additive and dominance genetic variation for milk yield in dairy cows:
  $y = Xb + Z_a u_a + Z_d u_d + e$, with $e_{n \times 1} \sim N(0, I\sigma^2_e)$, $u_{a\,(q \times 1)} \sim N(0, A\sigma^2_a)$, $u_{d\,(q \times 1)} \sim N(0, D\sigma^2_d)$
• $y$ is $n \times 1$, and $q > n$
• $A$, $D$: known correlation matrices
• $b$: fixed effects
• $u_a$, $u_d$: random effects
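As a worked equation, the marginal distribution implied by this model can be written out. This is a sketch that assumes $u_a$, $u_d$ and $e$ are mutually independent, which the slide does not state explicitly; $V$ is a symbol introduced here for the marginal covariance of $y$.

```latex
y \sim N\left(Xb,\; V\right), \qquad
V = Z_a A Z_a^{\top}\,\sigma^2_a + Z_d D Z_d^{\top}\,\sigma^2_d + I\,\sigma^2_e
```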

Inference issues
• What was known:
• $E(u \mid b = \mathrm{GLS}(b), \sigma^2_a, \sigma^2_d, \sigma^2_e, y)$ was the Best Linear Unbiased Predictor (BLUP) of $u$.
• But typically, we don’t know the variance components (VC): $\sigma^2_a$, $\sigma^2_d$, $\sigma^2_e$.
• Default: use REML (Restricted Maximum Likelihood) to estimate the VC.
• $E(u \mid b = \text{E-GLS}(b), \mathrm{REML}(\sigma^2_a, \sigma^2_d, \sigma^2_e), y)$ is the Empirical Best Linear Unbiased Predictor (E-BLUP).
• Use Henderson’s Mixed Model Equations to get this.
• What are the properties of E-BLUP and E-GLS based on REML estimates of the VC? …don’t ask, don’t tell.
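To make the role of Henderson’s Mixed Model Equations concrete, here is a minimal sketch for a much simpler case than the additive/dominance model: a one-way model with an overall mean and a single random effect. It assumes unrelated levels (the correlation matrix is the identity) and treats both variance components as known. The function names `solve` and `blup_one_way` and the data are hypothetical, chosen only for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def blup_one_way(groups, sigma2_u, sigma2_e):
    """Build and solve the mixed model equations for y_ij = mu + u_i + e_ij.

    groups: one list of observations per level of the random effect.
    Assumes u_i ~ N(0, sigma2_u) independently (identity correlation matrix)
    and known variance components.
    Returns (estimate of mu, list of predicted u_i).
    """
    q = len(groups)
    lam = sigma2_e / sigma2_u  # variance ratio added to the Z'Z diagonal
    lhs = [[0.0] * (q + 1) for _ in range(q + 1)]  # unknowns: [mu, u_1..u_q]
    rhs = [0.0] * (q + 1)
    lhs[0][0] = float(sum(len(g) for g in groups))  # X'X
    rhs[0] = float(sum(sum(g) for g in groups))     # X'y
    for i, g in enumerate(groups, start=1):
        lhs[0][i] = lhs[i][0] = float(len(g))       # X'Z and Z'X
        lhs[i][i] = len(g) + lam                    # Z'Z + I * lambda
        rhs[i] = float(sum(g))                      # Z'y
    sol = solve(lhs, rhs)
    return sol[0], sol[1:]


# Hypothetical data: three levels, three records each.
groups = [[10.0, 12.0, 11.0], [14.0, 15.0, 16.0], [9.0, 8.0, 10.0]]
mu_hat, u_hat = blup_one_way(groups, sigma2_u=4.0, sigma2_e=2.0)
print(mu_hat, u_hat)
```

In this balanced case the predictions are the group-mean deviations shrunk toward zero by the factor n / (n + lambda), where n is the number of records per level. Replacing the known variance components with REML estimates would give the empirical (E-BLUP) version, which this sketch does not attempt.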

A potentially even bigger inference issue
• Generalized linear mixed models (GLMM),
• i.e., for the analysis of non-normal data:
• binary, count, etc.
• Inference in GLMM analyses is often asymptotic (based on behavior in “large samples”) → even when the VC are known.
• What are the implications if the VC are unknown?

From last year’s KSU workshop (Walt Stroup)
• Generalized linear mixed models: what’s really important
• Probability distributions
• For non-normal data, the model equation form $Y = Xb + Zu + e$ is not useful…it’s counterproductive.
• Formal tests are based on asymptotic (“large sample”) approximations.
• Nice properties when n is “large”. When is n large enough?
• Quasi-likelihood (PROC GENMOD)…what’s that?
• “Vacuous” (Walt Stroup) for repeated measures specs → can’t even simulate the data generation process.
• Can we do better? I think so.
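The point about probability distributions can be illustrated by writing a GLMM as a data generation process rather than as “signal plus error”. The sketch below simulates binary data from a random-intercept logistic model; the logit link, the single random effect, and the name `simulate_binary_glmm` with its parameters are assumptions made for illustration and do not come from the workshop material.

```python
import math
import random


def simulate_binary_glmm(n_groups, n_per_group, intercept, sigma_u, seed=0):
    """Simulate (group, y) pairs from a random-intercept logistic model.

    Hypothetical model: u_g ~ N(0, sigma_u^2) on the logit scale, and
    y | u_g ~ Bernoulli(inverse_logit(intercept + u_g)).
    """
    rng = random.Random(seed)
    data = []
    for g in range(n_groups):
        u = rng.gauss(0.0, sigma_u)                       # random effect
        prob = 1.0 / (1.0 + math.exp(-(intercept + u)))   # inverse logit link
        for _ in range(n_per_group):
            y = 1 if rng.random() < prob else 0           # Bernoulli draw
            data.append((g, y))
    return data


sample = simulate_binary_glmm(n_groups=5, n_per_group=4,
                              intercept=0.0, sigma_u=1.0, seed=1)
print(sample[:4])
```

There is no additive residual term anywhere in the simulation: the variability of y comes entirely from the Bernoulli distribution and the random effect.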

Fall 1989: PhD program at the University of Illinois, here I come! (Journal of Animal Science, 1986)

My motivation for learning/understanding Bayesian statistics?
• Pragmatic, not philosophical.
• Animal breeders are incredibly eclectic…they just want to solve problems in animal genetics!
• “Physicists and engineers very often become immersed in the subject matter. In particular, they work hand in hand with neuroscientists and often become experimentalists themselves. Furthermore, engineers (and likewise computer scientists) are ambitious; when faced with problems, they tend to attack, sweeping aside impediments stemming from limited knowledge about the procedures that they apply.”
• From “What Is Statistics?” by Brown and Kass (2009) in The American Statistician.
• This is also the culture of statistical genetics/genomics/animal breeding…and is the culture of data analysts.

Bayesian statistics
• Why the fuss…its philosophy is so messy?
• We’ve been doing things OK already…right?
• What’s wrong with our current toolkit?
• Linear mixed models (LMM):
  • Nothing really, under classical assumptions.
• Generalized linear mixed models (GLMM):
  • Depends on the distribution…binary is the worst to deal with.
• Nonlinear mixed models (NLMM):
  • Not much wrong if classical assumptions hold and n is “large enough”.
  • Won’t address in this workshop.

The real issues?
• 1. Asymptotics
  • Likelihood inference is often based on approximations.
  • “Large n” is really about more than sample size.
  • Depends on p.
  • Depends on the data distribution (e.g., binary vs. continuous).
  • Depends on model complexity (i.e., design).
• 2. Flexibility
  • Can we go beyond the (G)(N)LMM?
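A small calculation shows how a “large sample” approximation can fail for binary data. The sketch below is not a GLMM: it uses the simplest stand-in, a single binomial proportion, and computes the exact coverage of the usual Wald 95% interval by summing binomial probabilities. The function names and the choice of n and p are illustrative assumptions.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed on the log scale (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def wald_coverage(n, p, z=1.959964):
    """Exact probability that p_hat +/- z * sqrt(p_hat (1 - p_hat) / n) covers p."""
    coverage = 0.0
    for k in range(n + 1):
        p_hat = k / n
        half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            coverage += binom_pmf(k, n, p)
    return coverage


for n in (10, 100, 1000):
    print(n, round(wald_coverage(n, 0.1), 3))
```

With p = 0.1, the nominal 95% interval covers the true value far less often than advertised at n = 10, and only approaches its nominal level as n grows. How large n must be depends on p, which is one concrete sense in which “large n” is about more than sample size.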
