RECOMBINOMICS: Myth or Reality? Laxmi Parida, IBM Watson Research, New York, USA. RoadMap: Motivation; Reconstructability (Random Graphs Framework); Reconstruction Algorithm (DSR Algorithm); Conclusion. Genographic Project: www.nationalgeographic.com/genographic, www.ibm.com/genographic.
RECOMBINOMICS: Myth or Reality?
Laxmi Parida
IBM Watson Research, New York, USA
RoadMap
• Motivation
• Reconstructability (Random Graphs Framework)
• Reconstruction Algorithm (DSR Algorithm)
• Conclusion
The Genographic Project
• Five-year study, launched in April 2005, to address anthropological questions on a global scale using genetics as a tool
• Although fossil records fix human origins in Africa, little is known about the great journey that took Homo sapiens to the far reaches of the earth. How did we, each of us, end up where we are?
• Samples from all around the world are being collected, and the mtDNA and Y-chromosome are being sequenced and analyzed
(phylogeographic question)
DNA material in use under unilinear transmission: mtDNA, about 16,000 bp; Y-chromosome, about 58 million bp (figure label: 0.38%)
Missing information in unilinear transmissions [Figure: time axis running from past to present]
Paradigm Shift in Locus & Analysis
Using recombining DNA sequences
• Why?
  • Nonrecombining gives a partial story
    • represents only a small part of the genome
    • behaves as a single locus
    • unilinear (exclusively male or female) transmission
  • Recombining: towards more complete information
• Challenges
  • Computationally very complex
  • How to comprehend complex reticulations?
RoadMap
• Motivation
• Reconstructability (Random Graphs Framework)
• Reconstruction Algorithm (DSR Algorithm)
• Conclusion
L Parida, Pedigree History: A Reconstructability Perspective using Random-Graphs Framework, under preparation.
RoadMap
• Motivation
• Reconstructability (Random Graphs Framework)
• Reconstruction Algorithm (DSR Algorithm)
• Conclusion
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics 2009
INPUT: Chromosomes (haplotypes) OUTPUT: Recombinational Landscape (Recotypes)
Our Approach
• Granularity g; acceptable p-value? (statistical) NO: revise g; YES: continue
• IRiS (combinatorial)
• Analyze Results (statistical)
M Mele, A Javed, F Calafell, L Parida, J Bertranpetit and Genographic Consortium, Recombination-based genomics: a genetic variation analysis in human populations, under submission.
Preprocess: Dimension reduction via Clustering [Figure: clustering diagram with numbered nodes; layout not recoverable from the extraction]
Analysis Flow
• Granularity g; acceptable p-value? (statistical) NO: revise g; YES: continue
• IRiS (combinatorial)
• Analyze Results (statistical)
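The control flow of this chart can be written down as a small loop. The sketch below is only a skeleton under stated assumptions: `p_value_for`, `is_acceptable`, `run_iris` and `analyze` are hypothetical callables supplied by the caller, and the slides do not say how g is revised after a NO, so the sketch simply tries a given list of candidate granularities in order.

```python
def analysis_flow(granularities, p_value_for, is_acceptable, run_iris, analyze):
    """Try each granularity in turn; run the later stages on the first acceptable one.

    All four callables are hypothetical placeholders, not part of any released tool.
    """
    for g in granularities:
        p = p_value_for(g)              # statistical stage
        if is_acceptable(p):            # the "Acceptable p-value?" decision
            network = run_iris(g)       # combinatorial stage
            return g, analyze(network)  # statistical stage on the result
    return None, None                   # no candidate granularity passed the check


# Usage with stand-in stages (purely illustrative values):
chosen_g, summary = analysis_flow(
    [1, 2, 3],
    p_value_for=lambda g: {1: 0.5, 2: 0.01, 3: 0.001}[g],
    is_acceptable=lambda p: p < 0.05,   # assumed acceptance criterion
    run_iris=lambda g: f"net{g}",
    analyze=lambda net: net.upper(),
)
```

Passing the acceptance test in as a predicate avoids committing to a particular threshold, which the slides leave unspecified.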
IRiS (Identifying Recombinations in Sequences)
• Stage Haplotypes: use SNP block patterns (biological insights)
• Segment along the length: infer trees (computational insights)
• Infer network (ARG)
L Parida, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Estimating the Ancestral Recombinations Graph (ARG) as Compatible Networks of SNP Patterns, Journal of Computational Biology, vol 15(9), pp 1-22, 2008
Segmentation [Figure: the columns along the sequence are partitioned into consecutive segments labeled 1 through 5; the final columns are left unassigned (-)]
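To give a feel for what segmenting along the length so that each segment admits a tree can mean, here is a minimal sketch using a swapped-in, textbook criterion: the four-gamete test on binary SNP columns, applied greedily from left to right. This is not claimed to be the segmentation used by IRiS; the function names and the toy haplotypes are illustrative assumptions.

```python
def compatible(col_a, col_b):
    """Four-gamete test: two binary sites fit on a single tree (without
    recombination or repeated mutation) unless all of 00, 01, 10, 11 occur."""
    return len(set(zip(col_a, col_b))) < 4


def greedy_segments(haplotypes):
    """Split site indices into consecutive blocks of pairwise-compatible sites.

    haplotypes: non-empty list of equal-length strings over {'0', '1'}.
    Greedy left-to-right scan; not guaranteed to minimise the number of blocks.
    """
    n_sites = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_sites)]
    segments, current = [], []
    for j in range(n_sites):
        if all(compatible(columns[i], columns[j]) for i in current):
            current.append(j)          # site j still fits the current block
        else:
            segments.append(current)   # close the block, start a new one at j
            current = [j]
    if current:
        segments.append(current)
    return segments


# Toy input: sites 0-1 agree with one tree, sites 2-3 with another.
toy = ["0000", "1100", "0011", "1111"]
blocks = greedy_segments(toy)
```

On the toy input, sites 0 and 2 together show all four gametes, so the scan closes the first block before site 2.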
Algorithm Design
• Ensure compatibility of component trees
• Parsimony model: minimize the number of recombinations
Algorithm Design (continued)
• Theorem: The problem is NP-Hard.
• Informally: "It is impossible to design an efficient algorithm that guarantees optimality" (unless P = NP).
DSR Assignment Rules
• Each row and each column has at most one D; a row or column with no D has at most one S
• A non-R can have other non-Rs either in its row or in its column, but NOT both
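The two rules can be checked mechanically on a candidate assignment. The sketch below is one literal reading of the rules as worded on the slide, not the DSR algorithm itself: it assumes the assignment is a rectangular matrix whose cells are each 'D', 'S' or 'R', and the function name `dsr_violations` is hypothetical.

```python
def dsr_violations(matrix):
    """Return a list of rule violations for a D/S/R matrix (empty if none).

    matrix: list of equal-length strings (or lists) over {'D', 'S', 'R'}.
    """
    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*rows)]
    problems = []

    # Rule 1: at most one D per row/column; with no D, at most one S.
    for kind, lines in (("row", rows), ("column", cols)):
        for idx, line in enumerate(lines):
            d, s = line.count("D"), line.count("S")
            if d > 1:
                problems.append(f"{kind} {idx}: more than one D")
            elif d == 0 and s > 1:
                problems.append(f"{kind} {idx}: no D but more than one S")

    # Rule 2: a non-R may share its row or its column with other non-Rs, not both.
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell == "R":
                continue
            row_others = sum(c != "R" for c in row) - 1
            col_others = sum(c != "R" for c in cols[j]) - 1
            if row_others > 0 and col_others > 0:
                problems.append(
                    f"cell ({i}, {j}): non-R neighbours in both row and column"
                )
    return problems


ok = dsr_violations(["DRR", "RSR", "RRD"])   # every non-R is alone in its row and column
bad = dsr_violations(["DS", "SR"])           # the D has a non-R in its row and in its column
```

Returning the list of violations rather than a single boolean makes it easier to see which rule a candidate assignment breaks.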
Mathematical Analysis: Approximation Factor
• Greedy DSR Scheme
• Z and Y are computable functions of the input
L Parida, A Javed, M Mele, F Calafell, J Bertranpetit and Genographic Consortium, Minimizing Recombinations in Consensus Networks for Phylogeographic Studies, BMC Bioinformatics 2009
IRiS Output: RECOTYPE
Recombination vectors
      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10 R11 R12 R13 R14 ...
s1    1   0   0   0   1   1   1   1   0   0   0   0   1   0   ...
s2    0   1   0   1   1   1   0   1   0   0   1   0   0   0   ...
...
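A recotype table of this shape is easy to handle as binary vectors. The sketch below uses only the first 14 entries shown for s1 and s2 (the rows continue beyond what the slide displays) and assumes that a 1 in column Rj means recombination Rj is associated with that sample. The helper names are hypothetical, and the Hamming distance is just one simple way to compare two samples, not a measure taken from the talk.

```python
# First 14 entries of each recombination vector, as shown on the slide.
recotypes = {
    "s1": [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    "s2": [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0],
}


def events(vector):
    """Names of the recombinations marked with 1 in a vector (R1-based)."""
    return {f"R{j + 1}" for j, bit in enumerate(vector) if bit}


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


shared = events(recotypes["s1"]) & events(recotypes["s2"])
distance = hamming(recotypes["s1"], recotypes["s2"])
print(sorted(shared), distance)
```

Over these 14 columns, the two samples share R5, R6 and R8 and differ in 6 positions.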
IRiS (Identifying Recombinations in Sequences)
• IRiS software will be released by the end of summer ’09
Asif Javed
What’s in a name?
RECOMBIN-OMICS (Jaume Bertranpetit)
RECOMBIN-OMETRICS (Robert Elston)
• Allele-frequency variation between populations is also reflected in the purely recombination-based variation
• Detects subcontinental divide from short segments
  • based on population-level analysis
• Detects populations from short segments
  • based on recombination-event analysis
Are we ready for the OMICS / OMETRICS?
o population-specific signals?
o other critical signals?
o anything we didn’t already know?