
RECOMBINOMICS : Myth or Reality?

Laxmi Parida, IBM Watson Research, New York, USA. RoadMap: Motivation; Reconstructability (Random Graphs Framework); Reconstruction Algorithm (DSR Algorithm); Conclusion. www.nationalgeographic.com/genographic. www.ibm.com/genographic.


Presentation Transcript


  1. RECOMBINOMICS: Myth or Reality? Laxmi Parida, IBM Watson Research, New York, USA

  2. RoadMap • Motivation • Reconstructability (Random Graphs Framework) • Reconstruction Algorithm (DSR Algorithm) • Conclusion

  3. www.nationalgeographic.com/genographic

  4. www.ibm.com/genographic

  5. Five-year study, launched in April 2005, to address anthropological questions on a global scale using genetics as a tool • Although fossil records fix human origins in Africa, little is known about the great journey that took Homo sapiens to the far reaches of the earth. How did we, each of us, end up where we are? • Samples from all around the world are being collected, and the mtDNA and Y-chromosome are being sequenced and analyzed • A phylogeographic question

  6. DNA material in use under unilinear transmission • 16,000 bp • 58 million bp • 0.38%

  7. Missing information in unilinear transmissions (figure; axis labels: past, present)

  8. Table Mountain, Cape Town, South Africa

  9. Paradigm Shift in Locus & Analysis: Using recombining DNA sequences • Why? • Nonrecombining DNA gives a partial story • represents only a small part of the genome • behaves as a single locus • unilinear (exclusively male or female) transmission • Recombining DNA moves towards more complete information • Challenges • Computationally very complex • How to comprehend complex reticulations?

  10. RoadMap • Motivation • Reconstructability (Random Graphs Framework) • Reconstruction Algorithm (DSR Algorithm) • Conclusion • Reference: L Parida, Pedigree History: A Reconstructability Perspective using Random-Graphs Framework, under preparation.

  11. The Random Graphs Framework • GRAPH DEF: • Infinite number of vertices arranged in finite-sized rows • Edges introduced via a random process between adjacent rows • PROPERTIES: Address some topological questions • First, identify a Probability Space • Then, pose and address specific questions (such as the expected depth of the LCA, etc.)

  12. The Random Graphs Framework • Wright-Fisher Model • Constant population • Non-overlapping generations • Panmictic • Infinite number of vertices with a specific organization • Edges introduced via a random process satisfying specific rules • Address some topological questions • Define a Probability Space • Pose and answer specific questions (such as the expected depth of the LCA, etc.)
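A minimal sketch of how a finite fragment of such a graph could be sampled, under assumptions that are not spelled out on the slide: each row holds 2N vertices split into N "red" and N "blue" vertices, and every vertex draws one red and one blue parent uniformly at random from the row above. The function name `sample_pedigree` and the index layout are hypothetical.

```python
import random


def sample_pedigree(n, depth, seed=None):
    """Sample `depth` rows of parent links for a Wright-Fisher style pedigree.

    Assumed layout (illustrative only): each row has 2*n vertices,
    indices 0..n-1 are "red" and n..2n-1 are "blue".
    pedigree[d][v] = (red_parent, blue_parent) of vertex v in row d,
    where both parents live in row d + 1 (one generation further back).
    """
    rng = random.Random(seed)
    pedigree = []
    for _ in range(depth):
        # Constant population and non-overlapping generations: every row
        # has the same size and edges only join adjacent rows.
        # Panmictic: parents are drawn uniformly at random.
        row = [(rng.randrange(n), n + rng.randrange(n)) for _ in range(2 * n)]
        pedigree.append(row)
    return pedigree


if __name__ == "__main__":
    ped = sample_pedigree(n=3, depth=4, seed=0)
    for d, row in enumerate(ped):
        print(f"row {d} -> row {d + 1}:", row)
```

In this sketch every vertex contributes exactly two edges, so a fragment with |V| child vertices has 2|V| edges.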

  13. The Random Graphs Framework

  14. Properties of this Pedigree Graph • DAG: Directed Acyclic Graph • |E| = O(|V|) for any finite fragment; a sparse graph • Vertex-centric view • Focus on the flow of genetic material: the relevant pedigree graph

  15. Pedigree Graph: GPG(K,N) • K: number of extant units • 2N: population size per generation • Can the model ignore the color of a vertex?

  16. Pedigree Graph: GPG(K,N) • K: number of extant units • 2N: population size per generation • Can the model ignore the color of a vertex? • Forbidden Structure

  17. Probability Space • Space is non-enumerable • Uniform probability measure? (WF population) • Probability of some event F(h) for a fixed depth, h, & take the limit:

  18. Topological Property of GPG(K,N) • Least Common Ancestor (LCA) of ALL (K) extant vertices (TMRCA or GMRCA) • How many LCAs? • Expected depth of the shallowest LCA
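The two questions on this slide can be explored on a finite fragment with a small search. The sketch below is illustrative only: it assumes the graph is given as `parents[d][v]`, a tuple of parent indices in row d+1 for vertex v in row d, and the name `shallowest_lca_depth` is hypothetical.

```python
def shallowest_lca_depth(parents, extant):
    """Find the shallowest row containing a common ancestor of all extant units.

    parents[d][v] is a tuple of the parents (in row d + 1) of vertex v in row d.
    extant is a list of distinct vertex indices in row 0.
    Returns (depth, sorted list of common ancestors at that depth),
    or None if the fragment is too shallow to contain one.
    """
    target = set(extant)
    # reach[v] = set of extant units that vertex v of the current row leads to
    reach = {v: {v} for v in extant}
    for depth, row in enumerate(parents, start=1):
        next_reach = {}
        for v, descendants in reach.items():
            for p in row[v]:
                next_reach.setdefault(p, set()).update(descendants)
        common = sorted(v for v, desc in next_reach.items() if desc == target)
        if common:
            return depth, common
        reach = next_reach
    return None


if __name__ == "__main__":
    # Hand-built two-parent fragment with 4 vertices per row.
    parents = [
        [(0, 2), (0, 3), (1, 2), (1, 3)],
        [(0, 2), (0, 2), (1, 3), (1, 3)],
    ]
    print(shallowest_lca_depth(parents, [0, 1, 2, 3]))
```

In the hand-built fragment, all four vertices of row 2 are common ancestors of the four extant units, which illustrates that the shallowest LCA need not be unique when vertices have two parents.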

  19. Infinite number of LCAs in a GPG(4,3) instance … In fact, there exist infinitely many such instances!

  20. Topological Property of GPG(K,N) • Least Common Ancestor (LCA) (TMRCA or GMRCA) • How many LCAs? • Expected depth of the shallowest “LCA”: MEASURE OF RECONSTRUCTABILITY

  21. (Genetic Exchange) Sexual Reproduction vs Graph Model • Ancestor without ancestry

  22. Graph Theory vis-à-vis Population Genetics • Graph Theoretic (topological): • CA: common ancestor • LCA: Least CA or Shallowest CA • MRCA: Most Recent CA • TMRCA: The MRCA • Graph Theoretic + Biology (Genetic Exchange): • CAA: common ancestor-&-ancestry • LCAA: Least CAA • GMRCA: Grand MRCA • Unilinear Transmission

  23. Different Models as Subgraphs • Pedigree Graph GPG(K,N): each vertex has 2 parents • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N): each vertex has 1 parent • Mixed Subgraph GPGE(K,N,M): number of vertices per row no more than KM; each vertex has 1 OR 2 parents; M is the number of completely linked segments in each extant unit • Labels: mtDNA Tree, NRY Tree, Genetic Exchange Model (ARG)
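The subgraph relationship can be illustrated with a short sketch. It assumes a pedigree stored as `pedigree[d][v] = (red_parent, blue_parent)`; that layout and the names `unilinear_subgraph` and `lineage_counts` are hypothetical, chosen only for illustration. Keeping one parent color per vertex yields a one-parent subgraph, and counting distinct ancestors per row shows how quickly each model narrows.

```python
def unilinear_subgraph(pedigree, color):
    """Keep only one parent per vertex: color 0 = red, color 1 = blue.

    pedigree[d][v] is assumed to be (red_parent, blue_parent).
    The result uses 1-tuples so it has the same shape as the input.
    """
    return [[(pair[color],) for pair in row] for row in pedigree]


def lineage_counts(parents, extant):
    """Number of distinct ancestors of the extant units at each depth."""
    current = set(extant)
    counts = [len(current)]
    for row in parents:
        current = {p for v in current for p in row[v]}
        counts.append(len(current))
    return counts


if __name__ == "__main__":
    # Hand-built fragment: 4 vertices per row, red = {0, 1}, blue = {2, 3}.
    pedigree = [
        [(0, 2), (0, 3), (1, 2), (1, 3)],
        [(0, 2), (0, 2), (1, 3), (1, 3)],
    ]
    extant = [0, 1, 2, 3]
    print("two parents:", lineage_counts(pedigree, extant))
    print("red only:   ", lineage_counts(unilinear_subgraph(pedigree, 0), extant))
    print("blue only:  ", lineage_counts(unilinear_subgraph(pedigree, 1), extant))
```

With one parent per vertex the ancestor count can never grow from one row to the next, whereas with two parents it can stay level or grow.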

  24. Different Models GPG(4,8) GPGE(4,8,2) GPTY(4,8)

  25. Different Models as Subgraphs • Pedigree Graph GPG(K,N) • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N) • Mixed Subgraph GPGE(K,N,M) • Figure labels: LCAg GMRCA; LCA h TMRCA; LCAg GMRCA

  26. GPGE(K,N,M) h ARG • Ancestral Recombinations Graph (Griffiths & Marjoram ’97) • Embellish GPGE(K,N,M) with Genetic Exchanges (GE) • Each extant unit has M segments • No vertex with zero ancestral segments (to extant units)

  27. Mixed Subgraph GPGE(K,N,M) • Plausible GE assignment? • Can GPGE(K,N,M) go colorless? • Yes, through algorithmic subsampling…

  28. Algorithm: Embellish GPGE(K,N,M) • Assign a sequence, s, to an instance, e.g. s = K, (2K), (2K-7), (2K-15), … • Construct M sequences si • Each si is monotonically decreasing • si[j] is no bigger than s[j] • Associate each si with a segment, and each element si[j] = k with k randomly selected vertices at depth j
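The sequence-construction step could be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: s[j] is taken to be the number of vertices at depth j, every extant unit is assumed to carry every segment (so each si starts at s[0]), "monotonically decreasing" is read as non-increasing, and the requirement that no vertex ends up with zero ancestral segments is not enforced. The name `embellish` is hypothetical.

```python
import random


def embellish(s, m, seed=None):
    """Build m per-segment sequences and vertex assignments for widths s.

    s[j] >= 1 is the number of vertices at depth j (depth 0 = extant units).
    Returns a list of (seq, chosen) pairs, one per segment, where
    seq[j] = k is the number of vertices carrying the segment at depth j
    and chosen[j] lists the k randomly selected vertex indices.
    """
    rng = random.Random(seed)
    segments = []
    for _ in range(m):
        seq, chosen = [], []
        prev = s[0]
        for j, width in enumerate(s):
            # Enforce both constraints: si[j] <= si[j-1] and si[j] <= s[j].
            upper = min(prev, width)
            k = upper if j == 0 else rng.randint(1, upper)
            seq.append(k)
            # Associate the value k with k distinct random vertices at depth j.
            chosen.append(sorted(rng.sample(range(width), k)))
            prev = k
        segments.append((seq, chosen))
    return segments


if __name__ == "__main__":
    for seq, chosen in embellish([4, 8, 6, 5, 3], m=2, seed=0):
        print(seq, chosen)
```

A fuller version would also have to check that the chosen vertices at consecutive depths are actually joined by edges of the graph; that coupling is omitted here.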

  29. Algorithm: Constructing seqs…

  30. “Topological” Definition of LCAA in GPGE(K,N,M) • Input: GPGE(K,N,M) with GE embellishment • LCAA • CA in all M subgraphs (trees) • Least such CA
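A small sketch of this definition, assuming each of the M segment trees is given as a list of dictionaries where `tree[d][v]` is the single parent (in row d+1) of vertex v in row d. The representation and the names `common_ancestor_by_depth` and `lcaa` are hypothetical.

```python
def common_ancestor_by_depth(tree, extant):
    """For one segment tree, list the common ancestor at each depth (or None).

    tree[d] maps a vertex in row d to its single parent in row d + 1.
    """
    current = list(extant)
    result = []
    for row in tree:
        current = [row[v] for v in current]
        # All lineages have merged exactly when one distinct vertex remains.
        result.append(current[0] if len(set(current)) == 1 else None)
    return result


def lcaa(trees, extant):
    """Shallowest vertex that is a common ancestor in all M trees.

    Returns (depth, vertex) or None if no such vertex exists in the fragment.
    """
    per_tree = [common_ancestor_by_depth(t, extant) for t in trees]
    for depth, candidates in enumerate(zip(*per_tree), start=1):
        first = candidates[0]
        if first is not None and all(c == first for c in candidates):
            return depth, first
    return None


if __name__ == "__main__":
    tree_a = [{0: 0, 1: 0}, {0: 0}]        # lineages merge at depth 1
    tree_b = [{0: 0, 1: 1}, {0: 0, 1: 0}]  # lineages merge at depth 2
    print(lcaa([tree_a], [0, 1]))
    print(lcaa([tree_a, tree_b], [0, 1]))
```

In the example, the first tree alone has its common ancestor at depth 1, but the vertex shared by both trees only appears at depth 2, so the LCAA is deeper than the LCA of the first tree.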

  31. Different Models as Subgraphs • Pedigree Graph GPG(K,N) • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N) • Mixed Subgraph GPGE(K,N,M) • Figure labels: LCAAh GMRCA; LCA h TMRCA; LCAAh GMRCA

  32. Probability of Instances with Unique LCA/LCAA • Pedigree Graph GPG(K,N) • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N) • Mixed Subgraph GPGE(K,N,M)

  33. “Topological” Definitions of LCAA • Pedigree Graph GPG(K,N) • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N) • Mixed Subgraph GPGE(K,N,M) • Figure labels: GMRCA h LCAA l LCA & lone pair; TMRCA h LCA; GMRCA h LCAA l LCA & lone node

  34. Expected Depth E(D) of LCA/LCAA • Pedigree Graph GPG(K,N) • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N) • Mixed Subgraph GPGE(K,N,M) • Figure labels: O(N2), O(K), O(KM)

  35. RECONSTRUCTABILITY • Pedigree Graph GPG(K,N) • Red Subgraph GPTX(K,N), Blue Subgraph GPTY(K,N) • Mixed Subgraph GPGE(K,N,M) • Figure labels: O(N2), O(K), O(KM)

  36. Summary: History Reconstruction? • Mixed Subgraph models recombinations • Only fragments of the chromosome • In reality, only a minimal structure (HUD) of the GPGE(K,N,M) or ARG can be estimated • Forbidden structures …

  37. RoadMap • Motivation • Reconstructability (Random Graphs Framework) • Reconstruction Algorithm (DSR Algorithm) • Conclusion • References: L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1–22, 2008 • L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics, 2009

  38. INPUT: Chromosomes (haplotypes) • OUTPUT: Recombinational Landscape (Recotypes)

  39. Our Approach • Flow diagram labels: Granularity g; Acceptable p-value? (statistical; NO / YES branches); IRiS (combinatorial); Analyze Results (statistical) • Reference: M Mele, A Javed, F Calafell, L Parida, J Bertranpetit and Genographic Consortium, Recombination-based genomics: a genetic variation analysis in human populations, under submission.

  40. Preprocess: Dimension reduction via Clustering (figure)

  41. Analysis Flow • Flow diagram labels: Granularity g; Acceptable p-value? (statistical; NO / YES branches); IRiS (combinatorial); Analyze Results (statistical)

  42. p-value Estimation

  43. Comparison of the Randomization Schemes

  44. SNP Blocks (granularity g=3)

  45. Analysis Flow • Flow diagram labels: Granularity g; Acceptable p-value? (statistical; NO / YES branches); IRiS (combinatorial); Analyze Results (statistical)

  46. IRiS (Identifying Recombinations in Sequences) • Stage haplotypes: use SNP block patterns (biological insights) • Segment along the length: infer trees (computational insights) • Infer network (ARG) • Reference: L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1–22, 2008

  47. Segmentation (figure: a ruler of column positions, with the columns partitioned into consecutive segments labeled 1 to 5)

  48. Segmentation

  49. Consensus of Trees
