
Unraveling Human Ancestry: Bridging Genetics with Computation

Explore the Genographic Project’s journey to map human origins through DNA sequencing and phylogeographic analysis, employing recombination algorithms for comprehensive insights. Challenges and advancements in understanding complex genetic reticulations are discussed. Geographical and computational perspectives converge in reconstructing human ancestry using innovative genetic variation analysis.


Presentation Transcript


  1. RECOMBINOMICS: Myth or Reality? Laxmi Parida, IBM Watson Research, New York, USA

  2. RoadMap • Motivation • Reconstructability (Random Graphs Framework) • Reconstruction Algorithm (DSR Algorithm) • Conclusion

  3. www.nationalgeographic.com/genographic

  4. www.ibm.com/genographic

  5. Five-year study, launched in April 2005, to address anthropological questions on a global scale using genetics as a tool • Although fossil records fix human origins in Africa, little is known about the great journey that took Homo sapiens to the far reaches of the earth. How did we, each of us, end up where we are? • Samples from all around the world are being collected, and the mtDNA and Y-chromosome are being sequenced and analyzed to address phylogeographic questions

  6. DNA material in use under unilinear transmission • mtDNA: 16,000 bp • Y-chromosome: 58 million bp • 0.38%

  7. Missing information in unilinear transmissions past present

  8. Paradigm Shift in Locus & Analysis: Using recombining DNA sequences • Why? • Nonrecombining DNA gives a partial story • represents only a small part of the genome • behaves as a single locus • unilinear (exclusively male or female) transmission • Recombining DNA moves towards more complete information • Challenges • Computationally very complex • How to comprehend complex reticulations?

  9. RoadMap • Motivation • Reconstructability (Random Graphs Framework) • Reconstruction Algorithm (DSR Algorithm) • Conclusion L Parida, Pedigree History: A Reconstructability Perspective using Random-Graphs Framework, Under preparation.

  10. RoadMap • Motivation • Reconstructability (Random Graphs Framework) • Reconstruction Algorithm (DSR Algorithm) • Conclusion L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008. L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics 2009

  11. INPUT: Chromosomes (haplotypes) OUTPUT: Recombinational Landscape (Recotypes)

  12. Our Approach: Granularity g → statistical → Acceptable p-value? (NO / YES) → IRiS (combinatorial) → Analyze Results (statistical). M Mele, A Javed, F Calafell, L Parida, J Bertranpetit and Genographic Consortium, Recombination-based genomics: a genetic variation analysis in human populations, under submission.

  13. Preprocess: Dimension reduction via Clustering [figure]

  14. Analysis Flow: Granularity g → statistical → Acceptable p-value? (NO / YES) → IRiS (combinatorial) → Analyze Results (statistical)

  15. p-value Estimation

  16. Comparison of the Randomization Schemes
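The transcript keeps only the slide titles for p-value estimation and the randomization schemes, so the actual statistic and schemes are not recoverable here. As a generic illustration only, a randomization-based p-value is usually estimated by comparing an observed statistic against the same statistic computed on randomized data. The function name `empirical_p_value` and the "larger is more extreme" convention are assumptions for this sketch, not the talk's method.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """Empirical one-sided p-value of `observed` against randomized replicates.

    Hypothetical helper: `null_values` holds the statistic recomputed on each
    randomized dataset. The +1 in numerator and denominator counts the observed
    value as one of the replicates, so the estimate is never exactly zero.
    """
    at_least_as_extreme = sum(1 for v in null_values if v >= observed)
    return (at_least_as_extreme + 1) / (len(null_values) + 1)


# Toy numbers, made up purely to exercise the function.
print(empirical_p_value(5.0, [1.0, 2.0, 3.0, 4.0]))
```

In a flow like the one on the Analysis Flow slide, a value such as this would be compared against a chosen threshold before moving on to the combinatorial step.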

  17. SNP Blocks (granularity g=3)
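To make the idea of SNP blocks at granularity g concrete, here is a minimal sketch that cuts each haplotype into consecutive blocks of g SNPs and tallies the patterns seen in each block. It assumes non-overlapping blocks and haplotypes encoded as strings of 0/1 alleles; the transcript does not say how the talk defines block boundaries, and the names `snp_blocks` and `block_patterns` are hypothetical.

```python
from collections import Counter
from typing import List


def snp_blocks(haplotype: str, g: int) -> List[str]:
    """Split one haplotype into consecutive, non-overlapping blocks of g SNPs.

    The last block is shorter when the length is not a multiple of g.
    """
    return [haplotype[i:i + g] for i in range(0, len(haplotype), g)]


def block_patterns(haplotypes: List[str], g: int) -> List[Counter]:
    """For each block position, count how often each SNP pattern occurs."""
    per_haplotype = [snp_blocks(h, g) for h in haplotypes]
    # zip(*...) groups the blocks that sit at the same position across haplotypes.
    return [Counter(column) for column in zip(*per_haplotype)]


# Toy haplotypes, made up for illustration.
toy = ["010110", "010111", "110110"]
for position, counts in enumerate(block_patterns(toy, g=3)):
    print(position, dict(counts))
```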

  18. Analysis Flow: Granularity g → statistical → Acceptable p-value? (NO / YES) → IRiS (combinatorial) → Analyze Results (statistical)

  19. IRiS (Identifying Recombinations in Sequences) • Stage Haplotypes: use SNP block patterns • biological insights • Segment along the length: infer trees • computational insights • Infer network (ARG) L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008

  20. Segmentation [figure: SNP columns numbered along the sequence, partitioned into consecutive segments labeled 1 to 5]

  21. Segmentation

  22. Consensus of Trees

  23. Algorithm Design • Ensure compatibility of component trees • Parsimony model: minimize the no. of recombinations

  24. Algorithm Design • Ensure compatibility of component trees • Parsimony model: minimize the no. of recombinations Theorem: The problem is NP-Hard. “It is impossible to design an efficient algorithm that guarantees optimality” (unless P = NP).

  25. DSR Scheme (Dominant, Subdominant, Recombinant)

  26. DSR Scheme: Level 1

  27. DSR Assignment Rules • At most one D per row and column; if no D, at most one S per row and column • At most one non-R in the row and column, but not both

  28. DSR Assignment Rules • Each row and each column has at most one D ELSE has at most one S • A non-R can have other non-Rs either in its row or its column but NOT both
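The two assignment rules can be read as constraints on a matrix whose cells are labeled D, S, or R. The sketch below checks one such reading: every row and column has at most one D, a row or column without a D has at most one S, and no non-R cell has other non-R cells in both its row and its column. The transcript does not say what the rows and columns index, so the matrix layout and the function name `satisfies_dsr_rules` are assumptions; this is not the published DSR algorithm, only a checker for the rules as worded on the slide.

```python
from typing import List


def satisfies_dsr_rules(matrix: List[List[str]]) -> bool:
    """Check a D/S/R label matrix against one reading of the slide's two rules."""
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]

    # Rule 1: at most one D per row/column; with no D, at most one S.
    for line in rows + cols:
        d_count = line.count("D")
        s_count = line.count("S")
        if d_count > 1 or (d_count == 0 and s_count > 1):
            return False

    # Rule 2: a non-R may share its row OR its column with other non-Rs,
    # but not both.
    for i, row in enumerate(rows):
        for j, label in enumerate(row):
            if label == "R":
                continue
            in_row = any(x != "R" for k, x in enumerate(row) if k != j)
            in_col = any(x != "R" for k, x in enumerate(cols[j]) if k != i)
            if in_row and in_col:
                return False
    return True


# Toy matrices, made up for illustration.
print(satisfies_dsr_rules([["D", "S", "R"], ["R", "R", "D"], ["R", "R", "R"]]))
print(satisfies_dsr_rules([["D", "S"], ["S", "R"]]))
```

In the second toy matrix the D in the corner has a non-R neighbor in both its row and its column, which is what the second rule forbids.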

  29. DSR Scheme: Level 1

  30. DSR Scheme: Level 2

  31. DSR Scheme: Level 2

  32. DSR Scheme: Level 3

  33. DSR Scheme: Level 3

  34. DSR Scheme: Level 4

  35. DSR Scheme: Level 5

  36. Mathematical Analysis: Approximation Factor • Greedy DSR Scheme • Z and Y are computable functions of the input L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics 2009

  37. Analysis Flow: Granularity g → statistical → Acceptable p-value? (NO / YES) → IRiS (combinatorial) → Analyze Results (statistical)

  38. IRiS Output: RECOTYPE Recombination vectors
         R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 ……….
      s1 1  0  0  0  1  1  1  1  0  0   0   0   1   0   ……….
      s2 0  1  0  1  1  1  0  1  0  0   1   0   0   0   ……….
      . . . .
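Since each recotype is a binary vector over recombination events, two samples can be compared by counting the events on which they differ. The sketch below does this for the first 14 entries of s1 and s2 as shown on the slide (the vectors are truncated in the transcript). Using Hamming distance here is an assumption for illustration; the transcript does not state which distance the talk uses for downstream analysis.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length recotypes differ."""
    if len(a) != len(b):
        raise ValueError("recotypes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# First 14 recombination indicators (R1..R14) of s1 and s2 from the slide.
s1 = [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
s2 = [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

print(hamming(s1, s2))  # prints 6
```

A matrix of such pairwise distances is the usual input to a clustering or tree-building step.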

  39. Quick Sanity Check: Ultrametric Network on RECOTYPES

  40. IRiS (Identifying Recombinations in Sequences) • Stage Haplotypes: use SNP block patterns • biological insights • Segment along the length: infer trees • computational insights • Infer network (ARG) • IRiS software will be released by the end of summer ’09 (Asif Javed) L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008

  41. What’s in a name? RECOMBIN-OMICS (Jaume Bertranpetit) RECOMBIN-OMETRICS (Robert Elston) • Allele-frequency variation between populations is also reflected in the purely recombination-based variation • Detects subcontinental divide from short segments • based on population-level analysis • Detects populations from short segments • based on analysis of recombination events

  42. Allele-frequency variation between populations is also reflected in the purely recombination-based variation • Detects subcontinental divide from short segments • based on population-level analysis • Detects populations from short segments • based on analysis of recombination events Are we ready for the OMICS / OMETRICS? o population-specific signals? o other critical signals? o anything we didn’t already know?

  43. Thank you!!
