Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture
Paper by E. Xing, K. Sohn, M. Jordan and Y. Teh, ICML 2006
Duke University Machine Learning Group
Presented by Kai Ni, August 24, 2006
Outline • Background • Dirichlet Process mixture • Hierarchical Dirichlet Process mixture • Application to haplotype inference
Motivation • Problem – uncovering the haplotypes of single nucleotide polymorphisms (SNPs) within and between populations. • Methods – coalescence, finite and infinite mixtures, and maximal parsimony. • Applications • Biological and medical analysis; • Genetic demography studies.
Background • A SNP haplotype is a list of alleles at contiguous sites in a local region of a single chromosome. A haplotype is inherited as a unit. • For diploid organisms, two haplotypes go together to make up a genotype, which is a list of unordered pairs of alleles in a region. • Haplotype inference from genotype data can be formulated as a mixture model. An HDP mixture is used in this paper.
Dirichlet Processes • A single clustering problem can be analyzed with a Dirichlet process (DP).
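As background, the standard stick-breaking representation of a DP draw (this is the usual definition from the literature, not taken verbatim from the slide):

```latex
G \sim \mathrm{DP}(\alpha_0, G_0)
\;\Longleftrightarrow\;
G = \sum_{k=1}^{\infty} \pi_k\, \delta_{\theta_k},
\qquad
\theta_k \overset{iid}{\sim} G_0,
\qquad
\pi_k = \beta_k \prod_{l<k} (1-\beta_l),\;\;
\beta_k \overset{iid}{\sim} \mathrm{Beta}(1, \alpha_0).
```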
DP mixture model • G can be viewed as a mixture model with infinitely many components.
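As a sketch of the generative process behind this statement (standard notation, not copied from the slide), each observation draws a parameter from G and is then generated from that parameter:

```latex
G \sim \mathrm{DP}(\alpha_0, G_0), \qquad
\phi_i \mid G \sim G, \qquad
x_i \mid \phi_i \sim F(\phi_i), \qquad i = 1, \dots, N.
```

Because G is discrete with probability one, the φi take repeated values, which induces a clustering of the data into a potentially unbounded number of components.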
DP-Haplotyper • The genotype of individual i from ethnic group j is a list of T contiguous SNPs. • Each genotype resolves into a corresponding pair of paternal and maternal haplotypes. • Each haplotype H is assumed to be a random perturbation of an ancestral haplotype A, or founder. • DP-Haplotyper is a DP mixture model for a single population group.
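A notation consistent with these bullets (the superscripts and labels here are my choice, not necessarily the authors' exact symbols) would be:

```latex
G_i^{j} = [g_{i,1}^{j}, \dots, g_{i,T}^{j}],
\qquad
H_i^{j,(0)} = [h_{i,1}^{j,(0)}, \dots, h_{i,T}^{j,(0)}],
\quad
H_i^{j,(1)} = [h_{i,1}^{j,(1)}, \dots, h_{i,T}^{j,(1)}],
```

where G_i^j is the genotype of individual i in group j, and H_i^{j,(0)}, H_i^{j,(1)} are the paternal and maternal haplotypes that together determine it.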
Hierarchical Dirichlet Process • Each group is modeled as a DP Gj, and the group-specific DPs are linked via a global DP G0. • G0 defines the set of mixture components used by all the groups. Different groups share the same set of mixture components (underlying clusters), but with different mixture proportions.
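In symbols, this is the standard two-level HDP specification (from Teh et al., on which the paper builds):

```latex
G_0 \mid \gamma, H \sim \mathrm{DP}(\gamma, H),
\qquad
G_j \mid \alpha_0, G_0 \sim \mathrm{DP}(\alpha_0, G_0), \quad j = 1, \dots, J.
```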
HDP mixture model • HDP can be used as the prior distribution over the factors for nested group data. • Consider a two-level DP: G0 links the child DPs Gj and forces them to share components. The Gj are conditionally independent given G0.
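Completing the generative sketch for grouped data (standard HDP mixture notation, not taken from the slide):

```latex
\phi_{ji} \mid G_j \sim G_j,
\qquad
x_{ji} \mid \phi_{ji} \sim F(\phi_{ji}),
```

for observation i in group j. Since G0 is discrete, every Gj reuses the atoms of G0, which is exactly the sharing of mixture components described above.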
HDP – Chinese Restaurant Franchise • First level: within each group, a DP mixture • φj1, …, φj(i-1) are i.i.d. random variables distributed according to Gj; ψj1, …, ψjTj are the distinct values taken on by φj1, …, φj(i-1); njt is the number of φji′ = ψjt, 0 < i′ < i. • Second level: across groups, sharing clusters • The base measure of each group is a draw from a DP: θ1, …, θK are the distinct values taken on by the ψjt; mk is the number of ψjt = θk, over all j, t.
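The corresponding predictive (Chinese restaurant franchise) distributions, written with the counts defined above, are the standard ones:

```latex
\phi_{ji} \mid \phi_{j1}, \dots, \phi_{j,i-1},\, \alpha_0, G_0
\;\sim\;
\sum_{t=1}^{T_j} \frac{n_{jt}}{i - 1 + \alpha_0}\, \delta_{\psi_{jt}}
\;+\;
\frac{\alpha_0}{i - 1 + \alpha_0}\, G_0,
```

```latex
\psi_{jt} \mid \{\psi_{j't'}\},\, \gamma, H
\;\sim\;
\sum_{k=1}^{K} \frac{m_k}{\sum_{k'} m_{k'} + \gamma}\, \delta_{\theta_k}
\;+\;
\frac{\gamma}{\sum_{k'} m_{k'} + \gamma}\, H.
```

The first equation assigns each customer (haplotype instance) to a table within its group's restaurant; the second assigns each table a dish (founder) shared across the whole franchise.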
Parameterization of the model • Underlying mixture component Ak := [Ak,1, …, Ak,T] – the founding haplotype configuration. • Base measure over a founder and its mutation rate, where p(A) is a uniform distribution and p(θ) is a beta distribution. • Inheritance model – each haplotype is a noisy copy of its founder. • Genotyping model – the observed genotype is a noisy combination of the two inherited haplotypes.
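A sketch of an inheritance model consistent with "a random perturbation of an ancestral haplotype" (the single-locus mutation parameterization and symbols below are my reading, not necessarily the slide's exact equations):

```latex
p(h_{i,t} \mid a_{k,t}, \theta_k) =
\begin{cases}
1 - \theta_k, & h_{i,t} = a_{k,t},\\[2pt]
\theta_k / (|B| - 1), & h_{i,t} \neq a_{k,t},
\end{cases}
\qquad \theta_k \sim \mathrm{Beta}(\alpha_h, \beta_h),
```

where B is the set of possible alleles (|B| = 2 for SNPs). The genotyping model can analogously score the observed g_{i,t} against the unordered pair {h_{i,t}^{(0)}, h_{i,t}^{(1)}} with a small error probability.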
Gibbs Sampling • Several Gibbs sampling variants are used for posterior inference. • The sampling scheme is similar to a two-level urn model (see the sketch below).
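A minimal, simplified Python sketch of one step in this two-level urn spirit: resampling which founder a haplotype is assigned to by combining within-group counts, across-group counts, and the mutation likelihood. The function name, the binary-allele assumption, and the exact weighting are illustrative, not the authors' full sampler.

```python
import numpy as np

def sample_founder(h, founders, thetas, n_jk, m_k, alpha0, gamma, rng):
    """One simplified Gibbs step: resample the founder (mixture component)
    for haplotype h, weighting each founder by its within-group popularity
    n_jk, its across-group popularity m_k, and the mutation likelihood."""
    K = len(founders)
    log_w = np.empty(K + 1)
    for k in range(K):
        # mutation model: each site differs from the founder w.p. theta_k
        mismatches = np.sum(h != founders[k])
        matches = len(h) - mismatches
        log_lik = matches * np.log(1 - thetas[k]) + mismatches * np.log(thetas[k])
        # two-level urn weight: group-level count plus mass shared via the global DP
        log_w[k] = np.log(n_jk[k] + alpha0 * m_k[k] / (m_k.sum() + gamma)) + log_lik
    # weight for a brand-new founder (likelihood under a uniform prior on binary alleles)
    log_w[K] = np.log(alpha0 * gamma / (m_k.sum() + gamma)) + len(h) * np.log(0.5)
    w = np.exp(log_w - log_w.max())
    return rng.choice(K + 1, p=w / w.sum())

rng = np.random.default_rng(0)
founders = np.array([[0, 1, 0, 1, 1], [1, 1, 0, 0, 1]])
thetas = np.array([0.05, 0.05])
h = np.array([0, 1, 0, 1, 1])
k = sample_founder(h, founders, thetas,
                   n_jk=np.array([3.0, 1.0]), m_k=np.array([4.0, 2.0]),
                   alpha0=1.0, gamma=1.0, rng=rng)
print("sampled founder index:", k)
```

In the full sampler one would also resample the founder haplotypes Ak, the mutation rates θk, and the phase of each genotype into its two haplotypes.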
Simulated data • 100 individuals from 5 groups (20 each). Each group has 2 founders shared by all groups and 3 group-specific founders, for a total of 17 founders (a sketch of this setup follows).
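An illustrative data-generating sketch matching the described setup; the sequence length and mutation rate below are arbitrary choices of mine, not values from the paper.

```python
import numpy as np

# 5 groups x 20 individuals; each group draws haplotypes from 2 founders
# shared by all groups plus 3 group-specific founders (17 founders in total).
rng = np.random.default_rng(0)
T, n_groups, n_per_group, mut = 10, 5, 20, 0.02

shared = rng.integers(0, 2, size=(2, T))             # founders common to every group
private = rng.integers(0, 2, size=(n_groups, 3, T))  # 3 founders unique to each group

genotypes = []
for j in range(n_groups):
    founders = np.vstack([shared, private[j]])        # the 5 founders available to group j
    for _ in range(n_per_group):
        picks = rng.integers(0, len(founders), size=2)
        # each inherited haplotype is a noisy (mutated) copy of its founder
        flips = rng.random((2, T)) < mut
        haps = np.where(flips, 1 - founders[picks], founders[picks])
        genotypes.append(haps[0] + haps[1])           # per-site unordered allele counts
print(len(genotypes), "genotypes of length", T)       # 100 genotypes of length 10
```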
Real data • International HapMap Project, containing genotypes from four populations.
Conclusion • The authors proposed an HDP mixture model for haplotype inference over multiple populations. • The HDP prior couples multiple heterogeneous populations and facilitates sharing mixture components across multiple infinite mixture models. • In future work, longer SNP sequences will be considered, and the HDP can be generalized to problems in which the group labels are unknown and must be inferred.