Towards Likelihood Free Inference

Tony Pettitt, QUT, Brisbane, a.pettitt@qut.edu.au. Joint work with Rob Reeves.

Outline
• Some problems with intractable likelihoods.
• Monte Carlo methods and inference.
• Normalizing constant/partition function.
• Likelihood free Markov chain Monte Carlo.
• Approximating hierarchical model.
• Indirect inference and likelihood free MCMC.
• Conclusions.

Stochastic models (Riley et al, 2003)
Macroparasite within a host: a juvenile worm grows to adulthood in a cat, and the host fights back with immunity.
The number of Juveniles (J), the number of Adults (A) and the amount of Immunity (I), all integer, evolve through time according to a Markov process with unknown parameters, eg
• Juvenile → Adult: rate of maturation
• Immunity changes with time
• Juveniles die due to Immunity
Moment closure approximations for the distribution of (J, A, I) are limited to restricted parameter values.

Numerical computation of the distribution of (J, A, I) is limited by small maximum values of J, A, I. The process can be simulated easily. Data: J at t = 0 and A at time t (sacrifice of the cat), replicated with several cats. Source: Riley et al, 2003.
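A minimal sketch of how a process of this kind could be simulated with the Gillespie algorithm. The four event rates (gamma, beta, nu, mu) and their functional forms are hypothetical placeholders chosen for illustration; they are not the rates used by Riley et al.

```python
import random

def simulate_worms(j0, t_end, gamma, beta, nu, mu, rng):
    """Gillespie simulation of a toy (J, A, I) process up to time t_end.

    Hypothetical event rates (assumptions, not the published model):
      gamma * J     : a juvenile matures into an adult
      beta * I * J  : a juvenile is killed by immunity
      nu * J        : immunity increases by one unit
      mu * I        : immunity decreases by one unit
    """
    t, j, a, i = 0.0, j0, 0, 0
    while True:
        rates = [gamma * j, beta * i * j, nu * j, mu * i]
        total = sum(rates)
        if total == 0.0:
            break                      # no event can occur any more
        t += rng.expovariate(total)    # waiting time to the next event
        if t > t_end:
            break                      # next event falls after the sacrifice time
        u = rng.random() * total       # pick the event in proportion to its rate
        if u < rates[0]:
            j, a = j - 1, a + 1
        elif u < rates[0] + rates[1]:
            j -= 1
        elif u < rates[0] + rates[1] + rates[2]:
            i += 1
        elif i > 0:                    # guard against floating-point edge cases
            i -= 1
    return j, a, i

rng = random.Random(1)
# Adult counts at sacrifice for 200 simulated cats, each starting with J = 50.
adults = [simulate_worms(50, 10.0, 0.1, 0.01, 0.05, 0.1, rng)[1] for _ in range(200)]
```

Only the observable part of the output (A at time t, given J at t = 0) would be compared with data; I is simulated but never observed.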

Other stochastic process models include
• spatial stochastic expansion of species (Hamilton et al, 2005; Estoup et al, 2004)
• birth-death-mutation process for estimating transmission rate from TB genotyping (Tanaka et al, 2006)
• population genetic models, eg coalescent models (Marjoram et al, 2003)
Likelihood free Bayesian MCMC methods are often employed with quite precise priors.

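Normalizing constant/partition function problem.
The algebraic form of the distribution for y is known but it is not normalized, eg the Ising model, in its simplest form
p(y | θ) = exp(θ Σ_{i~j} y_i y_j) / z(θ), with y_i = ±1.
Here i ~ j means that sites i and j are neighbours (on a lattice, say).
The normalizing constant z(θ) involves in general a sum over 2^n terms for n sites.
Write p(y | θ) = q(y | θ) / z(θ), where q(y | θ) is the known unnormalized form.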
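To make the size of the sum concrete, the sketch below computes the normalizing constant by brute force on a very small lattice. It assumes the simplest Ising form, with unnormalized density exp(θ Σ y_i y_j) over N-S and E-W neighbour pairs, y_i = ±1 and free boundaries; the function names are illustrative.

```python
import itertools
import math

def lattice_edges(rows, cols):
    """N-S and E-W neighbour pairs on a rows x cols lattice (free boundary)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            site = r * cols + c
            if c + 1 < cols:
                edges.append((site, site + 1))       # E-W neighbour
            if r + 1 < rows:
                edges.append((site, site + cols))    # N-S neighbour
    return edges

def neighbour_sum(y, edges):
    """Sum of y_i * y_j over neighbouring pairs."""
    return sum(y[i] * y[j] for i, j in edges)

def normalizing_constant(theta, rows, cols):
    """Sum the unnormalized density over all 2**(rows*cols) configurations."""
    edges = lattice_edges(rows, cols)
    n = rows * cols
    return sum(math.exp(theta * neighbour_sum(y, edges))
               for y in itertools.product((-1, 1), repeat=n))

# With theta = 0 every term equals 1, so z is just the number of terms, 2**9.
z0 = normalizing_constant(0.0, 3, 3)
```

A 3 x 3 lattice already needs 512 terms; the number of terms doubles with every added site, which is why enumeration is not an option for realistic lattices.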

N-S and E-W neighbourhood

Monte Carlo methods and inference.
The likelihood is intractable; instead use easily simulated values of y.
Simulated method of moments (McFadden, 1989).
Method of estimation: compare theoretical moments or frequencies, estimated by simulation, with observed moments or frequencies.
Can be implemented using a chi-squared goodness-of-fit statistic, eg Riley et al, 2003.
Data: number of adult worms in cat at sacrifice.
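A rough sketch of the idea, under strong simplifying assumptions: the simulator is a toy binomial model (each juvenile independently reaches adulthood with probability p), not the worm model itself, and the "observed" data are generated inside the script. Expected bin frequencies are estimated by simulation at each value on a parameter grid and compared with the observed frequencies through a chi-squared statistic.

```python
import random

def simulate_adults(p, j0, n_cats, rng):
    """Toy simulator (assumption): each of j0 juveniles becomes an adult w.p. p."""
    return [sum(rng.random() < p for _ in range(j0)) for _ in range(n_cats)]

def bin_counts(values, edges):
    """Count values falling in each bin [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(edges) - 1):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts

def chi_squared(observed, p, j0, edges, n_sim, rng):
    """Chi-squared statistic with expected counts estimated by simulation."""
    sim = bin_counts(simulate_adults(p, j0, n_sim, rng), edges)
    obs = bin_counts(observed, edges)
    n_obs = len(observed)
    stat = 0.0
    for o, s in zip(obs, sim):
        # Add 0.5 to each simulated count so that no expected count is zero.
        expected = n_obs * (s + 0.5) / (n_sim + 0.5 * len(sim))
        stat += (o - expected) ** 2 / expected
    return stat

rng = random.Random(0)
j0 = 20
edges = [0, 4, 8, 12, 16, 21]
observed = simulate_adults(0.3, j0, 40, rng)    # stand-in data, generated with p = 0.3
grid = [k / 20 for k in range(1, 20)]
stats = [chi_squared(observed, p, j0, edges, 2000, rng) for p in grid]
p_hat = grid[stats.index(min(stats))]           # grid value with the best fit
```

Because each grid point uses a fresh simulation, the statistic is itself noisy, which is one reason the precision of the resulting estimate is hard to assess.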

Plot of goodness-of-fit statistic versus parameter. Greedy Monte Carlo. Precision of estimate? Source: Riley et al 2003.

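3. Normalizing constant/partition function and MCMC (half-way to likelihood free inference)
Here we assume p(y | θ) = q(y | θ) / z(θ), with q known and z(θ) intractable (Møller, Pettitt, Reeves and Berthelsen, 2006).
Key idea: importance sample estimate of z(θ)/z(θ') given by
z(θ)/z(θ') ≈ (1/N) Σ_{i=1..N} q(x_i | θ) / q(x_i | θ').
Sample x_1, ..., x_N from p(· | θ').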
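The importance sampling identity z(θ)/z(θ') = E[q(x | θ)/q(x | θ')], with x drawn from the model at θ', can be checked on a lattice small enough to enumerate. This is only a sketch: the 2 x 2 Ising lattice, the parameter values and the use of enumeration as the exact sampler are all assumptions made for illustration.

```python
import itertools
import math
import random

EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]               # 2 x 2 lattice, E-W and N-S pairs
STATES = list(itertools.product((-1, 1), repeat=4))    # all 2**4 configurations

def neighbour_sum(y):
    return sum(y[i] * y[j] for i, j in EDGES)

def q(y, theta):
    """Unnormalized Ising density."""
    return math.exp(theta * neighbour_sum(y))

def z(theta):
    """Exact normalizing constant by enumeration (feasible only for tiny lattices)."""
    return sum(q(y, theta) for y in STATES)

def ratio_estimate(theta, theta_ref, n, rng):
    """Importance sample estimate of z(theta) / z(theta_ref)."""
    weights = [q(y, theta_ref) for y in STATES]
    sample = rng.choices(STATES, weights=weights, k=n)  # exact draws from p(. | theta_ref)
    return sum(q(x, theta) / q(x, theta_ref) for x in sample) / n

rng = random.Random(0)
estimate = ratio_estimate(0.4, 0.3, 20000, rng)
exact = z(0.4) / z(0.3)
```

The estimate is accurate here because θ and θ' are close; the weights become highly variable as the two parameter values move apart.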

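Used off-line to estimate z(θ)/z(θ'), then carry out standard Metropolis-Hastings with interpolation over a grid of θ values (eg Green and Richardson, 2002, in a Potts model).
Standard Metropolis-Hastings: simulating from the target distribution p(θ | y) ∝ p(θ) q(y | θ) / z(θ), where q(y | θ) / z(θ) is the likelihood.
Acceptance ratio for changing θ → θ' with proposal density g(θ' | θ):
H = [p(θ') q(y | θ') g(θ | θ')] / [p(θ) q(y | θ) g(θ' | θ)] × z(θ)/z(θ'),
accepted with probability min(1, H).
Key question: can z(θ)/z(θ') be calculated on-line or avoided?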

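On-line algorithm: single auxiliary variable method.
Introduce an auxiliary variable x on the same space as y, with density f(x | θ, y), and extend the target distribution for the MCMC to
p(θ, x | y) ∝ f(x | θ, y) p(θ) q(y | θ) / z(θ),
where q(y | θ) / z(θ) is the likelihood with intractable normalizing constant z(θ).
Key question: how to choose the distribution of x so that z(θ)/z(θ') is removed from the acceptance ratio H?
Now the acceptance ratio is for a new pair (θ', x') proposed. The proposal becomes g(θ', x' | θ, x).
Assume the factorisation g(θ', x' | θ, x) = g(x' | θ') g(θ' | θ).
Choose the proposal so that g(x' | θ') = q(x' | θ') / z(θ').
Then algebra → cancellation of z(θ)/z(θ'), and
H = [f(x' | θ', y) p(θ') q(y | θ') q(x | θ) g(θ | θ')] / [f(x | θ, y) p(θ) q(y | θ) q(x' | θ') g(θ' | θ)]
does not depend on the normalizing constant.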

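Note: need perfect or exact simulation from q(x' | θ') / z(θ') for the proposal.
Key question: how to choose f(x | θ, y), the auxiliary variable distribution?
The best choice is f(x | θ, y) = q(x | θ) / z(θ), which cannot be used because z(θ) is unknown.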

Choice (i)

Choice (ii)

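Choice (i): Fix θ, say at θ̃(y), a good estimate of θ. Then f(x | θ, y) = q(x | θ̃) / z(θ̃), so f does not depend on θ, only on y, and z(θ̃) cancels in H.
Choice (ii): Use a tractable approximation to q(x | θ) / z(θ), eg a partially ordered Markov mesh model for Ising data.
Comment: Both choices can suffer from getting stuck because f(x | θ, y) can be very different from the ideal q(x | θ) / z(θ).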
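A minimal sketch of the single auxiliary variable sampler with choice (i), under several assumptions that are not from the slides: a 2 x 2 Ising lattice so that exact draws can be made by enumeration (a stand-in for perfect simulation), a uniform prior on [0, 1], a symmetric uniform random-walk proposal for θ, and a fixed value theta_tilde for the auxiliary density. Note that no normalizing constant appears in the acceptance ratio.

```python
import itertools
import math
import random

EDGES = [(0, 1), (2, 3), (0, 2), (1, 3)]               # 2 x 2 lattice neighbour pairs
STATES = list(itertools.product((-1, 1), repeat=4))

def log_q(y, theta):
    """Log of the unnormalized Ising density."""
    return theta * sum(y[i] * y[j] for i, j in EDGES)

def exact_draw(theta, rng):
    """Stand-in for perfect simulation: enumerate the 16 states and draw one."""
    weights = [math.exp(log_q(s, theta)) for s in STATES]
    return rng.choices(STATES, weights=weights, k=1)[0]

def auxiliary_mh(y, theta_tilde, n_iter, step, rng):
    theta = theta_tilde
    x = exact_draw(theta, rng)
    chain = []
    for _ in range(n_iter):
        theta_new = theta + rng.uniform(-step, step)   # symmetric proposal
        if 0.0 <= theta_new <= 1.0:                    # assumed uniform prior on [0, 1]
            x_new = exact_draw(theta_new, rng)         # auxiliary proposal
            # Log acceptance ratio: auxiliary density ratio at theta_tilde,
            # unnormalized likelihood ratio, and the auxiliary proposal ratio.
            log_h = (log_q(x_new, theta_tilde) - log_q(x, theta_tilde)
                     + log_q(y, theta_new) - log_q(y, theta)
                     + log_q(x, theta) - log_q(x_new, theta_new))
            if rng.random() < math.exp(min(0.0, log_h)):
                theta, x = theta_new, x_new
        chain.append(theta)
    return chain

rng = random.Random(0)
y = (1, 1, 1, 1)                                       # illustrative observed lattice
chain = auxiliary_mh(y, 0.5, 20000, 0.5, rng)
posterior_mean = sum(chain) / len(chain)
```

On a lattice this small the posterior can also be computed by enumeration, which gives a direct check that the chain targets the right distribution; on realistic lattices the exact draw would have to come from a perfect simulation algorithm.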

Source: Møller et al, 2006.
The single auxiliary variable method tends to get stuck. Murray et al (2006) offer suggestions involving multiple auxiliary variables.

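4. Likelihood free MCMC
Single auxiliary variable method as almost Approximate Bayesian Computation (ABC).
We wish to eliminate q(y | θ) / z(θ), or equivalently p(y | θ), the likelihood, from the M-H algorithm.
Solution: the distribution of x given y and θ puts all probability on y, the observed data, f(x | θ, y) = 1(x = y); then, with x' proposed from the likelihood p(x' | θ') and θ' from g(θ' | θ),
H = [p(θ') g(θ | θ')] / [p(θ) g(θ' | θ)] × 1(x' = y).
This might work for discrete data, sample size small, and if the proposal were a very good approximation to the posterior p(θ | y).
If sufficient statistics s(y) exist then the match x' = y can be replaced by s(x') = s(y).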

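Likelihood free methods, ABC-MCMC
Change of notation: y_obs is the observed data (fixed); y is pseudo data or auxiliary data generated from the likelihood p(y | θ).
Instead of y = y_obs, now have y close to y_obs in the sense of statistics s(·), with distance d(s(y), s(y_obs)). ABC allows d(s(y), s(y_obs)) ≤ ε rather than equal to 0.
Target distribution for variables (θ, y): p(θ, y | y_obs) ∝ p(y | θ) p(θ) 1(d(s(y), s(y_obs)) ≤ ε).
Standard M-H with proposals g(θ' | θ) for θ' and p(y' | θ') for y' (Marjoram et al, 2003; ABC MCMC), with acceptance ratio
H = [p(θ') g(θ | θ')] / [p(θ) g(θ' | θ)] × 1(d(s(y'), s(y_obs)) ≤ ε)
for acceptance of (θ', y').
Ideally ε should be small but this leads to very small acceptance probabilities.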
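A minimal sketch of ABC-MCMC in the style of Marjoram et al (2003) on a toy problem. Everything specific here is an assumption for illustration: the model is normal with unknown mean and unit variance, the summary statistic is the sample mean, the distance is the absolute difference, the prior is uniform on [-5, 5], and the proposal is a symmetric Gaussian random walk. With a symmetric proposal and a flat prior the acceptance ratio reduces to the indicator that the pseudo data are within the tolerance.

```python
import random

def summary(y):
    """Summary statistic s(y): the sample mean."""
    return sum(y) / len(y)

def simulate(theta, n, rng):
    """Pseudo data from the (toy) likelihood: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def abc_mcmc(y_obs, eps, n_iter, step, rng, theta0):
    s_obs = summary(y_obs)
    theta = theta0
    chain, accepted = [], 0
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step)          # symmetric proposal
        if -5.0 <= theta_new <= 5.0:                      # assumed uniform prior
            y_new = simulate(theta_new, len(y_obs), rng)  # the likelihood is never evaluated
            if abs(summary(y_new) - s_obs) <= eps:        # hard acceptance region
                theta = theta_new
                accepted += 1
        chain.append(theta)
    return chain, accepted / n_iter

rng = random.Random(0)
y_obs = simulate(1.0, 20, rng)                            # stand-in observed data
chain, acceptance_rate = abc_mcmc(y_obs, 0.1, 20000, 0.3, rng, summary(y_obs))
```

Rerunning with a smaller tolerance shows the trade-off directly: the approximation to the posterior improves while the acceptance rate drops.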

Issues of implementing Metropolis-Hastings ABC
(a) Tune ε to get reasonable acceptance probabilities.
(b) All pseudo data y within distance ε of the observed data (hard) are accepted with equal probability rather than smoothly weighted by that distance (soft).
(c) Choose summary statistics carefully if there are no sufficient statistics.

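Tune for ε
A solution is to allow ε to vary as a parameter (Bortot et al, 2004). The target distribution is
p(θ, ε, y | y_obs) ∝ p(y | θ) p(θ) p(ε) 1(d(s(y), s(y_obs)) ≤ ε),
where p(ε) is a prior for ε, y_obs is the observed data and y is the pseudo data.
Run the chain and post-filter the output for small values of ε.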
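A sketch of the augmented sampler on the same kind of toy problem, with the tolerance treated as a parameter. The normal model, the sample-mean summary, the uniform prior on θ, the exponential prior on the tolerance (rate 5) and the random-walk step sizes are all assumptions made for illustration, as is the post-filter threshold of 0.2.

```python
import math
import random

def summary(y):
    """Summary statistic: the sample mean."""
    return sum(y) / len(y)

def abc_mcmc_eps(y_obs, n_iter, rng, rate=5.0, step_theta=0.3, step_eps=0.2):
    s_obs = summary(y_obs)
    n = len(y_obs)
    theta, eps = s_obs, 1.0
    draws = []
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step_theta)    # symmetric proposals
        eps_new = eps + rng.gauss(0.0, step_eps)
        if -5.0 <= theta_new <= 5.0 and eps_new > 0.0:    # inside the prior support
            y_new = [rng.gauss(theta_new, 1.0) for _ in range(n)]   # pseudo data
            close = abs(summary(y_new) - s_obs) <= eps_new
            prior_ratio = math.exp(-rate * (eps_new - eps))          # Exp(rate) prior on eps
            if close and rng.random() < min(1.0, prior_ratio):
                theta, eps = theta_new, eps_new
        draws.append((theta, eps))
    return draws

rng = random.Random(0)
y_obs = [rng.gauss(1.0, 1.0) for _ in range(20)]          # stand-in observed data
draws = abc_mcmc_eps(y_obs, 20000, rng)
kept = [theta for theta, eps in draws if eps <= 0.2]      # post-filter for small eps
```

The chain can move freely while the tolerance is large and only the draws made at small tolerance are used for inference, so no single tolerance has to be tuned in advance.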

Beaumont, Zhang and Balding (2002) use kernel smoothing in ABC-MC

Approximating Hierarchical Model

Some points
• How could the approximate posterior be made more precise?
• Use more parameters in the approximating likelihood, the POMM? (Gouriéroux et al (1993) and Heggland and Frigessi (2004) discuss this in the frequentist setting.)
• More iterations for the side chain “exact” calculation of the approximate posterior?
• How to choose a good approximating likelihood?
• Relationship to the summary statistics approach?

Conclusions
• For the normalizing constant problem we presented a single on-line M-H algorithm.
• We linked these ideas to ABC-MCMC and developed a hierarchical model (HM) to approximate the true posterior, and showed variance inflation.
• We showed that the approximating HM could be tempered, with swaps between parallel chains made to improve mixing, and the variance inflation effect corrected by smoothing posterior summaries from the tempered chains.
• We extended indirect inference to an HM to find a way of implementing the Metropolis-Hastings algorithm which is likelihood free.
• We demonstrated the ideas with the Ising/autologistic model.
• Application to specific examples is on-going and requires refinement of general approaches.

Acknowledgements
Support of the Australian Research Council.
Co-authors: Rob Reeves, Jesper Møller, Kasper Berthelsen.
Discussions with Malcolm Faddy, Gareth Ridall, Chris Glasbey, Grant Hamilton …
