260 likes | 275 Views
Recombination and Pedigrees. Genealogies and Recombination: The ARG Recombination Parsimony The ARG and Data Pedigrees: Models and Data Pedigrees & ARGs Challenges Empirical Investigations: Open Questions. Finding Minimal Recombination Histories. 1. 2. 3. 4. 1. 2. 3. 1. 4. 2. 3.
E N D
Recombination and Pedigrees Genealogies and Recombination: The ARG Recombination Parsimony The ARG and Data Pedigrees: Models and Data Pedigrees & ARGs Challenges Empirical Investigations: Open Questions
Finding Minimal Recombination Histories 1 2 3 4 1 2 3 1 4 2 3 4 Global Pedigrees Finding Common Ancestors NOW Recombination Histories & Global Pedigrees Acknowledgements Yun Song - Rune Lyngsø - Mike Steel
Hudson & Kaplan’s RM 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 If you equate RM with expected number of recombinations, this could be used as an estimator. Unfortunately, RM is a gross underestimate of the real number of recombinations.
Local Inference of Recombinations • Recoding • At most 1 mutation per column • 0 ancestral state, 1 derived state 0 0 1 1 0 1 0 1 T . . . G T . . . C A . . . G A . . . C Four combinations Incompatibility: 0 0 0 1 1 0 1 0 0 0 0 1 1 1 00 10 01 11 Myers-Griffiths (2002): Number of Recombinations in a sample, NR, number of types, NT, number of mutations, NM obeys:
Finding Minimal Recombination Histories 64 Bodmer & Edwards: Parsimony defined as reconstruction principle 85 Hudson Kaplan uses minimal recombination histories as observed recombinations • Attempts to find minimal histories of sequences • Definition of recombination as Subtree Prune Regraft operations J.J.Hein: Reconstructing the history of sequences subject to Gene Conversion and Recombination. Mathematical Biosciences. (1990) 98.185-200. J.J.Hein: A Heuristic Method to Reconstruct the History of Sequences Subject to Recombination. J.Mol.Evol. 20.402-411. 1993 Hein,J.J., T.Jiang, L.Wang & K.Zhang (1996): "On the complexity of comparing evolutionary trees" Discrete Applied Mathematics 71.153-169 Song, Y.S. (2003) “On the combinatorics of rooted binary phylogenetic trees”. Annals of Combinatorics, 7:365–379 Song, Y.S. & Hein, J. (2005) Constructing Minimal Ancestral Recombination Graphs. J. Comp. Biol., 12:147–169 Song, Y.S. & Hein, J. (2004) On the minimum number of recombination events in the evolutionary history of DNA sequences.J. Math. Biol., 48:160–186. Song, Y.S. & Hein, J. (2003) Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events, Lecture Notes in Bioinformatics, Proceedings of WABI'03, 2812:287–302. Lyngsø, Song and Hein (2005) “Minimal Recombination Histories by Branch and Bound” WABI
Minimal Number of Recombinations Last Local Tree Algorithm: 1 2 i-1 i L Data 1 2 Trees n How many local trees? • Unrooted • Coalescent The Kreitman data (1983):11 sequences, 3200bp, 43(28) recoded, 9 different Bi-partitions How many neighbors?
Two Adjacent Columns 0 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 0 DiamRec 1 2 RecDist 1. RecDist[T1,T2] is hard for large leaf number, but can be automatically calculated by adding DiamRec trivial columns and only considering 1 recombination neighbors. 2. Infinite Site Assumption: Local Trees must contain Local Bipartition
Metrics on Trees based on subtree transfers. Trees including branch lengths Unrooted tree topologies Rooted tree topologies Tree topologies with age ordered internal nodes Pretending the easy problem (unrooted) is the real problem (age ordered), causes violation of the triangle inequality:
Tree Combinatorics and Neighborhoods Due to Yun Song Song (2003+) Allen & Steel (2001) Observe that the size of the unit-neighbourhood of a tree does not grow nearly as fast as the number of trees
1 4 2 3 5 6 7
The Griffiths-Ethier-Tavare Recursions No recombination: Infinite Site Assumption Ancestral State Known History Graph: Recursions Exists No cycles Possible Histories without Recombination for simple data example 0 1 1 1 4 2 3 5 4 5 5 5 6 3 7 2 8 1 - recombination 27 ACs + recombination 3*108 ACs
mid-point heuristic 2nd 1st Ancestral configurations to 2 sequences with 2 segregating sites
Counting Recursion k1 (k2+1)*k1 +1 possible ancestral columns. k2 Summary statistic lumping configurations k1(k2+1)+1 padded with “-” + 1 k+1 k
Counting + Branch and Bound Algorithm 0 3 1 91 2 1314 3 8618 4 30436 5 62794 6 78970 7 63049 8 32451 9 10467 10 1727 Lower bound ? Upper Bound Exact length k 289920 k-recombinatination neighborhood
i. The process is non-Markovian ii. The trees cannot be reduced to Topologies * * = Time versus Spatial: Coalescent-Recombination (Griffiths, 1981; Hudson, 1983 - Wiuf & Hein, 1999) Temporal Process Spatial Process
Time versus Spatial 2: Pedigrees Elston-Stewart (1971) -Temporal Peeling Algorithm: Father Mother Condition on parental states Recombination and mutation are Markovian Lander-Green (1987) - Genotype Scanning Algorithm: Father Mother Condition on paternal/maternal inheritance Recombination and mutation are Markovian
Time versus Spatial 3: Phylogenetic Alignment • Optimisation Algorithms • indels of length 1 (David Sankoff, 1973) Spatial • indels of length k (Bjarne Knudsen, 2003) Temporal • Statistical Alignment Temporal: Spatial:
minARGs: Recombination Events & Local Trees Song-Hein Myers-Griiths ((1,2),(1,2,3)) Hudson-Kaplan Minimal ARG n=8, Q=40 True ARG 1 2 3 4 5 n=8, Q=15 True ARG Reconstructed ARG 1 3 2 4 5 ((1,3),(1,2,3)) 0 4 Mb Mutation information on both sides • Mutation information on only one side n=7, r=10, Q=75
Likelihood Calculations on the e-ARG 010 010 101 101 110 Example:
Reconstructing global pedigrees: Superpedigrees Steel and Hein, 2005 k The gender-labeled pedigrees for all pairs, defines global pedigree Gender-unlabeled pedigrees doesn’t!!
Reconstructing global pedigrees: Links and lassos Steel and Hein, 2005 k Link Lasso Gender-labeled links and lassos determine the global pedigree.
Benevolent Mutation and Recombination Process Genomes with r and m/r --> infinity r - recombination rate, m - mutation rate Embedded phylogenies: • All embedded phylogenies are observable • Do they determine the pedigree? Counter example:
Summary and Future • Minimal Recombination Histories • Likelihood Calculations • Global Pedigrees & Inferring Pedigrees from Genomes Recombination: Remove infinite site assumption Investigate MCMC algorithms Pedigrees: Data Analysis Algorithms Presentation can be found at: http://mathgen.stats.ox.ac.uk/bioinformatics/
To Do • Hudson Slide • Neighbor trees • Literature and History