
A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space

Paper by E. P. Xing and K-A. Sohn. Presented by Chunping Wang, Machine Learning Group, Duke University, February 26, 2007.


Presentation Transcript


  1. A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space. Paper by E. P. Xing and K-A. Sohn. Presented by Chunping Wang, Machine Learning Group, Duke University, February 26, 2007.

  2. Outline • Terminology and Introduction • DP Mixtures for Non-recombination Inheritance • HMDP for Recombination • Results • Conclusions

  3. Terminology and Introduction (1) • Allele: a viable DNA coding on a chromosome – observation • Locus: the location of an allele – index of an observation • Haplotype: a sequence of alleles – data sequence • Recombination: the exchange of pieces between paired chromosomes – state transition • Mutation: any change to a haplotype during inheritance – emission

  4. Terminology and Introduction (2) [Figure: ancestors and their descendants]

  5. Terminology and Introduction (3) Problems: 1. Ancestral inference: recovering ancestral haplotypes; 2. Recombination analysis: inferring the recombination hotspots; 3. Ancestral mapping: inferring the ancestral origin of each allele in each modern haplotype.

  6. DP Mixtures for Non-recombination Inheritance (1) • Non-recombination: only mutation may occur during inheritance, and each modern haplotype originates from a single ancestor. • This holds only for haplotypes spanning a short region of a chromosome.

  7. DP Mixtures for Non-recombination Inheritance (2) Each distinct value of the per-haplotype mixture parameter denotes the pair formed by the kth ancestor and the mutation parameter associated with that ancestor.

  8. DP Mixtures for Non-recombination Inheritance (3)
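A DP mixture induces a Chinese restaurant process over cluster labels, which is one way to see how the number of ancestors can stay open. The sketch below is a minimal illustration of that standard fact, not the authors' implementation; the function name `crp_assignments` and the parameter name `tau` (the DP concentration) are hypothetical.

```python
import random

def crp_assignments(n_haplotypes, tau, seed=0):
    """Draw ancestor labels for n haplotypes from a Chinese restaurant process.

    tau is the DP concentration parameter (hypothetical name).
    """
    rng = random.Random(seed)
    counts = []   # counts[k] = number of haplotypes assigned to ancestor k
    labels = []
    for _ in range(n_haplotypes):
        # An existing ancestor k has weight counts[k]; a new ancestor has weight tau.
        weights = counts + [tau]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # a previously unseen ancestor is founded
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(200, tau=1.0)
print(len(counts), "ancestors used for", len(labels), "haplotypes")
```

Larger values of `tau` tend to produce more distinct ancestors for the same number of haplotypes.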

  9. HMDP for Recombination (1) For long haplotypes that may descend from multiple ancestors, we consider recombinations (state transitions across discrete spatial intervals).

  10. HMDP for Recombination (2) • Each row of the HMM transition matrix is a DP. These DPs are linked by a top-level master DP and therefore share the same set of target states. • The mixing proportions of the jth lower-level DP form the jth row of the transition matrix.
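The way the rows share one set of target states can be sketched with a truncated approximation: draw top-level weights by stick-breaking, then draw each row from a finite Dirichlet centred on those weights. This is a common finite approximation to a hierarchical DP, shown here only as an illustration; the truncation level `K` and the names `gamma`, `alpha`, and `sample_transition_matrix` are assumptions, not notation from the slides.

```python
import random

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights for the top-level DP (length K)."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)   # the last stick takes the leftover mass
    return weights

def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalised gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_transition_matrix(gamma, alpha, K, seed=0):
    """Rows share the same K target states because each is centred on beta."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, K, rng)
    # Floor the Dirichlet parameters so gammavariate never receives zero.
    rows = [dirichlet([max(alpha * b, 1e-12) for b in beta], rng) for _ in range(K)]
    return beta, rows

beta, rows = sample_transition_matrix(gamma=2.0, alpha=5.0, K=5)
for row in rows:
    print([round(p, 3) for p in row])
```

A larger `alpha` keeps every row close to the shared weights `beta`; a smaller `alpha` lets rows differ more from one another.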

  11. HMDP for Recombination (3) [Figure: modern haplotypes and ancestor haplotypes] The indicators of the ith modern haplotype, one per locus, specify the corresponding ancestral haplotype at each locus: • when no recombination takes place during the inheritance process producing haplotype Hi, the indicator takes the same value at every locus; • when a recombination occurs between loci t and t+1, the indicators at loci t and t+1 differ.

  12. HMDP for Recombination (4) Introduce a Poisson point process to control the duration of non-recombinant inheritance (space-inhomogeneous). Let x be the number of recombinations, d the physical distance between loci t and t+1, and r the recombination rate per unit distance. Then x follows a Poisson distribution with mean dr.
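Written out under the Poisson assumption on this slide, with rate r per unit distance over a distance d, the standard Poisson formulas give the following (a worked statement of the textbook result, not a transcription of the slide's own equation):

```latex
P(x) = \frac{(dr)^{x}\, e^{-dr}}{x!}, \qquad x = 0, 1, 2, \dots
\qquad\Longrightarrow\qquad
P(x = 0) = e^{-dr}, \quad P(x \ge 1) = 1 - e^{-dr}.
```

So the probability of non-recombinant inheritance between two adjacent loci decays exponentially in the product of distance and rate.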

  13. HMDP for Recombination (5) Combining this with the standard stationary HMDP gives the non-stationary state-transition probability. As d or r goes to infinity, the probability of no recombination between adjacent loci goes to zero, and the inhomogeneous HMDP model reduces to a standard HMDP.
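One form of the non-stationary transition that is consistent with the limit described on this slide is a mixture: with probability exp(-dr) the indicator keeps its current ancestor, and otherwise the next ancestor is drawn from the stationary row. This is an assumed form for illustration, since the slide's formula is not reproduced in the transcript; `nonstationary_row` and `pi_row` are hypothetical names.

```python
import math

def nonstationary_row(pi_row, k, d, r):
    """Transition probabilities out of ancestor k across a gap of length d.

    pi_row is the stationary transition row for ancestor k (assumed given),
    r is the recombination rate per unit distance.
    """
    stay = math.exp(-d * r)                      # probability of no recombination
    row = [(1.0 - stay) * p for p in pi_row]     # recombine, then pick from pi_row
    row[k] += stay                               # no recombination: keep ancestor k
    return row

pi = [0.2, 0.5, 0.3]
print(nonstationary_row(pi, k=0, d=2.0, r=0.1))
print(nonstationary_row(pi, k=0, d=1e6, r=1.0))  # approaches the stationary row
```

With d = 0 the row puts all mass on the current ancestor, and with very large d or r it matches the stationary row, mirroring the limiting behaviour stated above.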

  14. HMDP for Recombination (6) Inference: the base prior is uniform; given the emission function, integrating out the mutation parameter gives the marginal likelihood.
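A worked example of this kind of integration, under assumptions that the transcript does not state: a single-locus mutation model over an allele set B in which the observed allele h matches the ancestral allele a with probability θ, and a Beta prior on θ with hypothetical hyperparameters α_h and β_h. With l matching and l' mismatching positions attributed to one ancestor:

```latex
p(h \mid a, \theta) = \theta^{\,l} \left( \frac{1-\theta}{|B|-1} \right)^{l'},
\qquad \theta \sim \mathrm{Beta}(\alpha_h, \beta_h),
\\[6pt]
\int_0^1 p(h \mid a, \theta)\, p(\theta)\, d\theta
= \frac{\Gamma(\alpha_h + \beta_h)}{\Gamma(\alpha_h)\,\Gamma(\beta_h)}
  \cdot
  \frac{\Gamma(\alpha_h + l)\,\Gamma(\beta_h + l')}{\Gamma(\alpha_h + \beta_h + l + l')}
  \cdot
  \left( \frac{1}{|B|-1} \right)^{l'} .
```

The result depends on the data only through the counts l and l', which is what makes collapsed sampling of the remaining variables practical.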

  15. HMDP for Recombination (7) Inference: Combining the HDP prior with the marginal likelihood, we can infer the posterior of the ancestor indicators and the ancestral haplotypes, which are the variables of interest. • Two sampling stages: • Sample the indicators given all haplotypes h and the most recently sampled ancestor pool a; • Sample every ancestor Ak given all haplotypes h and the current indicators.
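The alternation between the two stages can be sketched on a deliberately simplified model. The code below is not the HMDP sampler: it assumes a fixed number of ancestors K, a fixed transition matrix `trans`, a fixed match probability `theta`, binary alleles, and a uniform prior on the first indicator. All function and variable names are hypothetical.

```python
import random

def emission(h, a, theta):
    """P(observed allele h | ancestral allele a) for binary alleles."""
    return theta if h == a else 1.0 - theta

def sample_indicators(H, A, C, trans, theta, rng):
    """Stage 1: resample each indicator C[i][t] given haplotypes H and ancestors A."""
    K, T = len(A), len(A[0])
    for i, h in enumerate(H):
        for t in range(T):
            weights = []
            for k in range(K):
                w = emission(h[t], A[k][t], theta)
                if t > 0:
                    w *= trans[C[i][t - 1]][k]      # transition into state k
                if t < T - 1:
                    w *= trans[k][C[i][t + 1]]      # transition out of state k
                weights.append(w)
            C[i][t] = rng.choices(range(K), weights=weights)[0]

def sample_ancestors(H, A, C, theta, rng):
    """Stage 2: resample each ancestral allele A[k][t] given H and indicators C."""
    K, T = len(A), len(A[0])
    for k in range(K):
        for t in range(T):
            weights = []
            for allele in (0, 1):                   # uniform prior over both alleles
                w = 1.0
                for i, h in enumerate(H):
                    if C[i][t] == k:
                        w *= emission(h[t], allele, theta)
                weights.append(w)
            A[k][t] = rng.choices((0, 1), weights=weights)[0]

def gibbs(H, K, trans, theta=0.95, sweeps=20, seed=0):
    """Alternate the two stages, starting from a random initialisation."""
    rng = random.Random(seed)
    T = len(H[0])
    A = [[rng.randint(0, 1) for _ in range(T)] for _ in range(K)]
    C = [[rng.randrange(K) for _ in range(T)] for _ in H]
    for _ in range(sweeps):
        sample_indicators(H, A, C, trans, theta, rng)
        sample_ancestors(H, A, C, theta, rng)
    return A, C

H = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
A, C = gibbs(H, K=2, trans=[[0.9, 0.1], [0.1, 0.9]])
print(A)
print(C)
```

A position where an indicator row changes value between adjacent loci corresponds to an inferred recombination in that haplotype.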

  16. Results (1) Simulated data: 30 populations, each with 200 haplotypes derived from K=5 ancestral haplotypes; T=100. Compared: HMDP and HMMs with K=3, 5 and 10. [Figure: the average ancestor reconstruction errors for the five ancestors] Even the HMM with K=5 cannot beat the HMDP.

  17. Results (2) [Figure: box plot of the empirical recombination rates, with Threshold 1 and Threshold 2 marked; the vertical gray lines are the pre-specified recombination hotspots]

  18. Results (3) Population maps: 1. true map; 2. HMDP; 3-5. HMMs with K=3, 5, 10. Each thin vertical line is one modern haplotype; each color is one ancestral haplotype. Accuracy measure: the mean squared distance to the true map.

  19. Results (4) Real haplotype data set 1: Daly data, a single population of 512 haplotypes; T=103. [Figure] Bottom: empirical recombination rates. Upper vertical lines: recombination hotspots. Red dotted lines: HMM; blue dashed lines: MDL; black solid lines: HMDP.

  20. Results (5) Choosing the threshold: [Figure: a Gaussian mixture fit of the empirical recombination rates]
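One way a Gaussian mixture fit could yield a threshold is sketched below: fit two 1-D Gaussians by EM and take the point between the two means where the higher-rate component starts to dominate. The slide does not say how the threshold is read off the fit, so the crossing rule, the two-component choice, the variance floor, and all names here are assumptions.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_gaussians(xs, iters=100):
    """EM for a two-component 1-D Gaussian mixture: returns (weights, means, variances)."""
    mus = [min(xs), max(xs)]                       # deterministic initialisation
    spread = (max(xs) - min(xs)) ** 2 / 16.0 or 1.0
    variances = [spread, spread]
    ws = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of the high-rate component for each point.
        resp = []
        for x in xs:
            p0 = ws[0] * normal_pdf(x, mus[0], variances[0])
            p1 = ws[1] * normal_pdf(x, mus[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: update weight, mean, and variance of each component.
        for j in (0, 1):
            r = [rj if j == 1 else 1.0 - rj for rj in resp]
            n = sum(r)
            if n < 1e-12:
                continue
            ws[j] = n / len(xs)
            mus[j] = sum(ri * x for ri, x in zip(r, xs)) / n
            var = sum(ri * (x - mus[j]) ** 2 for ri, x in zip(r, xs)) / n
            variances[j] = max(var, 1e-6)          # floor avoids a collapsed component
    return ws, mus, variances

def threshold(ws, mus, variances, steps=1000):
    """Scan from the low mean to the high mean for the first point where
    the weighted high-rate density exceeds the weighted low-rate density."""
    lo, hi = mus[0], mus[1]
    for s in range(steps + 1):
        x = lo + (hi - lo) * s / steps
        if ws[1] * normal_pdf(x, mus[1], variances[1]) > ws[0] * normal_pdf(x, mus[0], variances[0]):
            return x
    return hi

rates = [0.01, 0.02, 0.015, 0.012, 0.018, 0.5, 0.55, 0.6]   # made-up illustrative values
ws, mus, variances = fit_two_gaussians(rates)
print("threshold:", threshold(ws, mus, variances))
```

Loci whose empirical recombination rate exceeds the returned value would then be flagged as candidate hotspots.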

  21. Results (6) Estimated population map. Each thin vertical line is one modern haplotype; each color is one ancestral haplotype.

  22. Conclusions • The HMDP model is an application and extension of the HDP to population genetics; • The HDP allows the state space of the HMM to be infinite, which makes it suitable for inferring an unknown number of ancestral haplotypes; • The HMDP model also allows the recombination rates to be non-stationary; • The HMDP model can jointly infer a number of important genetic variables.
