From inference to modelling to algorithms and back again
Kerrie Mengersen, QUT Brisbane
MCQMC 2012
Acknowledgements: BRAG
Bayesian methods and models + fast computation + applications in environment, health, biology, industry
So what’s the problem?
Inferential need → Model → Algorithm
Matchmaking 101: Inferential need ↔ Model
Study 1: Presence/absence models — Sama Low Choy, Mark Stanaway
From inference to model
Observations and data:
• Visual inspection symptoms
• Presence/absence data
• Space and time
• Dynamic invasion process (growth, spread)
Inference:
• Map the probability of extent over time
• At a scale useful for managing trade and eradication
• Currently an informal, qualitative approach; a hierarchical Bayesian model formalises the information
Hierarchical Bayesian model for plant pest spread
• Data model: Pr(data | incursion process and data parameters) — how the data are observed given the underlying pest extent
• Process model: Pr(incursion process | process parameters) — potential extent given epidemiology/ecology
• Parameter model: Pr(data and process parameters) — prior distributions describing uncertainty in detectability, exposure, growth, …
The posterior distribution of the incursion process (and parameters) is related to the prior distribution and data by
Pr(process, parameters | data) ∝ Pr(data | process, parameters) Pr(process | parameters) Pr(parameters)
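A minimal sketch of this three-layer factorisation, assuming a toy presence/absence setting with independent sites, Beta priors, and binomial detection (all illustrative assumptions, not the model used in the study):

```python
import numpy as np
from scipy.stats import beta, bernoulli, binom

def log_posterior(occupied, detect_prob, exposure_rate, detections, n_visits):
    """Unnormalised log Pr(process, parameters | data)."""
    # Parameter model: Beta priors encode uncertainty in detectability and exposure
    lp = beta.logpdf(detect_prob, 2.0, 8.0) + beta.logpdf(exposure_rate, 1.0, 9.0)
    # Process model: each site colonised independently with probability exposure_rate
    lp += bernoulli.logpmf(occupied, exposure_rate).sum()
    # Data model: detections are only possible at occupied sites
    lp += binom.logpmf(detections, n_visits, occupied * detect_prob).sum()
    return lp

# Toy data: 5 sites, 10 inspection visits each, detections at sites 0 and 3
detections = np.array([3, 0, 0, 1, 0])
occupied = np.array([1, 0, 0, 1, 0])
print(log_posterior(occupied, detect_prob=0.3, exposure_rate=0.2,
                    detections=detections, n_visits=10))
```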
Early warning surveillance
• Priors based on emergency plant pest characteristics: exposure rate for colonisation probability; spread rates to link sites together for spatial analysis
• Add surveillance data
• Posterior evaluation: modest reduction in area freedom; large reduction in estimated extent; residual “risk” maps to target surveillance
Observation parameter estimates
Taking the invasion process into account:
• Hosts
• Host suitability
• Inspector efficiency
Identify the contribution of each.
Study 2: Mixture models — Clair Alston
From inference to model
What proportions of the sheep carcase are muscle, fat and bone?
• Finite mixture model: y_i ~ Σ_j λ_j N(μ_j, σ_j²)
• Include spatial information
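For instance, a finite normal mixture can be fitted to simulated “tissue intensity” data with scikit-learn; the three components and all numbers below are illustrative assumptions, not the study’s data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(-50, 15, 500),    # "fat" intensities
                    rng.normal(40, 20, 1200),    # "muscle"
                    rng.normal(300, 60, 300)])   # "bone"
gm = GaussianMixture(n_components=3, random_state=0).fit(y.reshape(-1, 1))
print("weights (tissue proportions):", np.round(gm.weights_, 3))
print("component means:", np.round(gm.means_.ravel(), 1))
```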
Study 3: State space models — Nicole White
Parkinson’s disease (PD) symptom data
• Current methods for PD subtype classification rely on a few criteria and do not permit uncertainty in subgroup membership.
• Alternative: a finite mixture model (equivalent to a latent class analysis for multivariate categorical outcomes)
• Symptom data: duration of diagnosis, early-onset PD, gender, handedness, side of onset
From inference to model
y_ij: subject i’s response to item j
1. Define a finite mixture model based on patient responses to Bernoulli and multinomial questions.
2. Describe subgroups with respect to explanatory variables.
3. Obtain each patient’s probability of class membership.
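A minimal sketch of steps 1 and 3 for binary items, using EM for a two-class Bernoulli mixture (the data and settings are illustrative assumptions, and EM stands in for the Bayesian fit used in the study):

```python
import numpy as np

rng = np.random.default_rng(1)
n, J, K = 300, 5, 2
true_theta = np.array([[0.9, 0.8, 0.7, 0.2, 0.1],
                       [0.1, 0.2, 0.3, 0.8, 0.9]])  # item probs per class
z = rng.integers(0, K, n)                            # hidden class labels
y = rng.binomial(1, true_theta[z])                   # n x J binary responses

pi = np.full(K, 1.0 / K)                             # class weights
theta = rng.uniform(0.3, 0.7, size=(K, J))           # item response probs
for _ in range(200):
    # E-step: responsibility = posterior probability of class membership
    log_post = np.log(pi) + y @ np.log(theta).T + (1 - y) @ np.log(1 - theta).T
    log_post -= log_post.max(axis=1, keepdims=True)
    resp = np.exp(log_post)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update class weights and item probabilities
    pi = resp.mean(axis=0)
    theta = np.clip((resp.T @ y) / resp.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)

print("class weights:", np.round(pi, 3))
print("P(class | y) for subject 0:", np.round(resp[0], 3))
```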
Inferential aims
• Identify spikes and assign them to an unknown number of source neurons
• Compare clusters between segments within a recording, and between recordings at different locations (three depths) in the brain
Microelectrode recordings
Each recording was divided into 2.5-second segments; discriminating features were found via PCA.
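A minimal sketch of the feature-extraction step, assuming a matrix of aligned spike waveforms (illustrative data, not the study’s recordings):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
spikes = rng.normal(size=(500, 48))         # 500 aligned spike waveforms
features = PCA(n_components=3).fit_transform(spikes)
print(features.shape)                       # (500, 3): inputs to clustering
```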
From inference to model: DP model
y_i | θ_i ~ p(y_i | θ_i), θ_i ~ G, G ~ DP(α, G_0)
For P principal components, y_i = (y_i1, …, y_iP) ~ MVN(μ, Σ)
G_0 = p(μ | Σ) p(Σ)
α ~ Ga(2, 2)
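A minimal sketch of the DP prior via a truncated stick-breaking construction, with a standard normal base measure standing in for G_0 = p(μ|Σ)p(Σ) (truncation level and α are illustrative assumptions):

```python
import numpy as np

# Truncated stick-breaking draw of G ~ DP(alpha, G0), here with G0 = N(0, 1)
rng = np.random.default_rng(3)
alpha, T = 2.0, 50                      # concentration and truncation level
v = rng.beta(1.0, alpha, size=T)        # stick-breaking fractions
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # atom weights
atoms = rng.normal(size=T)              # atom locations drawn from G0
theta = rng.choice(atoms, size=1000, p=w / w.sum())        # theta_i ~ G (approx.)
print("distinct atoms among 1000 draws:", np.unique(theta).size)
```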
Study 4: Spatial dynamic factor models — Chris Strickland, Ian Turner
What can we learn about land use from MODIS data?
Differentiating land use with the SDFM
• The 1st factor influences the temporal dynamics in the right half of the image (woodlands)
• The 3rd factor influences the left half of the image (grasslands)
[Figure panels: 1st trend component, 2nd trend component, common cyclical component]
Matchmaking 101: Inferential need ↔ Model — smart models
Smart models Tailoring Generalisation Blocking Reparametrisation Reformulation
Example 1: Generalisation — Judith Rousseau
Mixtures are great, but how do we choose k?
Propose an overfitted model (k > k0), where the true density is f_0(x) = Σ_{j=1,…,k0} p_j^0 g_{γ_j^0}(x).
Non-identifiable! All values θ = (p_1^0, …, p_{k0}^0, 0, γ_1^0, …, γ_{k0}^0, γ) with arbitrary γ, and all values θ = (p_1^0, …, p_j, …, p_{k0}^0, p_{k+1}, γ_1^0, …, γ_{k0}^0, γ_j^0) with p_j + p_{k+1} = p_j^0, fit equally well.
So what?
• The multiplicity of possible solutions means the MLE does not have stable asymptotic behaviour.
• This does not matter when f_θ is the main object of interest, but it does if we want to recover θ.
• It thus becomes crucial to know whether the posterior distribution under overfitted mixtures gives interpretable results.
Possible alternatives to avoid overfitting
Frühwirth-Schnatter (2006): either one of the component weights is zero or two of the component parameters are equal.
• Choose priors that bound the posterior away from the unidentifiability sets.
• Choose priors that induce shrinkage for elements of the component parameters.
Problem: we may not be able to fit the true model.
Our result — Assumptions:
• L1 consistency of the posterior
• The model g is three times differentiable, regular, and integrable
• The prior on Θ is continuous and positive, and the prior on (p_1, …, p_k) satisfies π(p) ∝ p_1^{α_1−1} ⋯ p_k^{α_k−1}
Our result - 1
• If max(α_j, j ≤ k) < d/2, where d = dim(γ), then asymptotically the posterior f(θ|x) concentrates on the subset of parameters for which f_θ = f_0, so k − k0 components have weight 0.
• The reason for this stable behaviour, as opposed to the unstable behaviour of the maximum likelihood estimator, is that integrating out the parameter acts as a penalisation: the posterior essentially puts mass on the sparsest way to approximate the true density.
Our result - 2
• In contrast, if min(α_j, j ≤ k) > d/2 and k > k0, then two or more components will tend to merge, each with non-negligible weight. This leads to less stable behaviour.
• In the intermediate case, min(α_j, j ≤ k) ≤ d/2 ≤ max(α_j, j ≤ k), the situation varies depending on the α_j’s and on the difference between k and k0.
Implications: model dimension
• When d/2 > max(α_j, j = 1,…,k), the quantity d·k0 + k0 − 1 + Σ_{j ≥ k0+1} α_j appears as an effective dimension of the model.
• This differs from the number of parameters, d·k + k − 1, and from other notions of “effective number of parameters”.
• Similar results are obtained in other situations.
Illustration: y_i ~ N(0,1); fitted model G = p N_2(μ_1, Σ_1) + (1−p) N_2(μ_2, Σ_2) with Σ_j diagonal; d = 3; α_1 = α_2 = 1 < d/2.
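A quick illustration of the weight-collapse behaviour, using scikit-learn’s variational Bayesian mixture rather than the posterior asymptotics above (the k = 5 overfit and all settings are illustrative assumptions, not the exact setting on the slide):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(4)
y = rng.normal(size=(2000, 2))             # true model: a single component
bgm = BayesianGaussianMixture(
    n_components=5,                        # deliberately overfitted (k > k0)
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=0.1,        # small Dirichlet alpha_j
    max_iter=500, random_state=0,
).fit(y)
print(np.round(bgm.weights_, 3))           # superfluous weights near zero
```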
Conclusions
• The result validates the use of Bayesian estimation in mixture models with too many components.
• It is one of the few examples where the prior can have an asymptotic impact, even at first order (consistency), and where choosing a less informative prior leads to better results.
• It also shows that the penalisation effect of integrating out the parameter, as in the Bayesian framework, is useful not only in model choice or testing contexts but also in estimation contexts.
Example 2: Empirical likelihoods — Christian Robert
• Sometimes the likelihood of the data is not completely known or cannot be computed in a manageable time (e.g. population genetic models, hidden Markov models, dynamic models), so traditional tools based on stochastic simulation (e.g. regular MCMC) are unavailable or unreliable.
• E.g. the biosecurity spread model above.
Model alternative: ELvIS
• Define the parameters of interest as functionals of the cdf F (e.g. moments of F), then use Importance Sampling via the Empirical Likelihood (EL).
• Select the F that maximises the likelihood of the data under the moment constraint.
• Given a constraint of the form E_F(h(Y)) = θ, the EL is defined as L_EL(θ|y) = max_F ∏_{i=1,…,n} {F(y_i) − F(y_{i−1})}.
• For example, in the 1-D case with θ = E(Y), the empirical likelihood at θ is the maximum of ∏_i p_i under the constraint Σ_{i=1,…,n} p_i y_i = θ.
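A minimal sketch of this 1-D empirical likelihood for a mean constraint, solving for the Lagrange multiplier numerically (the toy data are an illustrative assumption):

```python
import numpy as np
from scipy.optimize import brentq

def log_el(theta, y):
    """Log empirical likelihood under the mean constraint sum_i p_i y_i = theta.

    Maximising sum_i log p_i subject to sum_i p_i = 1 and the constraint gives
    p_i = 1 / (n (1 + lam (y_i - theta))), with the multiplier lam the root of
    sum_i (y_i - theta) / (1 + lam (y_i - theta)) = 0.
    """
    n, c = len(y), y - theta
    if c.min() >= 0 or c.max() <= 0:
        return -np.inf            # theta outside the convex hull of the data
    g = lambda lam: np.sum(c / (1.0 + lam * c))
    lam = brentq(g, -1.0 / c.max() + 1e-8, -1.0 / c.min() - 1e-8)
    return -np.sum(np.log(n * (1.0 + lam * c)))

rng = np.random.default_rng(5)
y = rng.normal(1.0, 1.0, 200)
print(log_el(1.0, y), log_el(1.3, y))  # EL peaks near the sample mean
```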
Quantile distributions
• A quantile distribution is defined by a closed-form quantile function F^{−1}(p; θ) and generally has no closed-form density function.
• Properties: very flexible, very fast to simulate (simple inversion of a uniform draw).
• Examples: the 3/4/5-parameter Tukey lambda distribution and its generalisations; the Burr family; the g-and-k and g-and-h distributions.
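For example, the g-and-k distribution can be simulated by inverting its closed-form quantile function (with the usual convention c = 0.8); the parameter values below echo those used later in this talk:

```python
import numpy as np
from scipy.stats import norm

def gandk_quantile(u, A, B, g, k, c=0.8):
    """Closed-form g-and-k quantile function, evaluated at u in (0, 1)."""
    z = norm.ppf(u)
    # (1 - exp(-g z)) / (1 + exp(-g z)) written as tanh(g z / 2)
    return A + B * (1 + c * np.tanh(g * z / 2.0)) * (1 + z**2) ** k * z

rng = np.random.default_rng(6)
y = gandk_quantile(rng.uniform(size=100000), A=3, B=2, g=1, k=0.5)
print(np.quantile(y, [0.25, 0.5, 0.75]))   # fast simulation by inversion
```

With (A, B, g, k) = (0, 1, 0, 0) the same function returns A + Bz, i.e. standard normal draws.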
Methods for estimating a quantile distribution
• MLE using a numerical approximation to the likelihood
• Moment matching
• Generalised bootstrap
• Location- and scale-free functionals
• Percentile matching
• Quantile matching
• ABC
• Sequential MC approaches for multivariate extensions of the g-and-k
ELvIS in practice
Two values of θ = (A, B, g, k):
• θ = (0, 1, 0, 0): the standard normal distribution
• θ = (3, 2, 1, 0.5): Allingham’s choice
Two priors for θ:
• U(0,5)^4
• A ~ U(−5,5), B ~ U(0,5), g ~ U(−5,5), k ~ U(−1,1)
Two sample sizes: n = 100 and n = 1000
Matchmaking 101: Model ↔ Algorithm
A wealth of algorithms! MC (Monte Carlo), MCMC (Markov chain Monte Carlo), IS (importance sampling), SMC (sequential Monte Carlo), ABC (approximate Bayesian computation), QMC (quasi-Monte Carlo), VB (variational Bayes)
From model to algorithm — Chris Strickland
Models:
• Logistic regression
• Non-Gaussian state space models
• Spatial dynamic factor models
Evaluate:
• Computation time
• Maximum bias
• Standard deviation (sd)
• Inefficiency factor (IF)
• Accuracy rate
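As one example of these criteria, a minimal sketch of the inefficiency factor IF = 1 + 2 Σ_k ρ_k for an MCMC chain (the AR(1) toy chain and the fixed autocorrelation window are illustrative assumptions):

```python
import numpy as np

def inefficiency_factor(chain, max_lag=100):
    """IF = 1 + 2 * sum of autocorrelations, with a fixed-window estimate."""
    x = chain - chain.mean()
    var = x @ x / len(x)
    rho = [x[:-k] @ x[k:] / (len(x) * var) for k in range(1, max_lag + 1)]
    return 1.0 + 2.0 * float(np.sum(rho))

# Toy AR(1) chain with phi = 0.9: true IF = (1 + phi) / (1 - phi) = 19
rng = np.random.default_rng(7)
phi, n = 0.9, 50_000
e = rng.normal(size=n)
chain = np.zeros(n)
for t in range(1, n):
    chain[t] = phi * chain[t - 1] + e[t]
print(round(inefficiency_factor(chain), 1))
```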