Incorporating Mutations
• Previously we allowed for gene variants (alleles), but without a model of how they came into being
• Rather than the coalescence of a single gene, we next consider successive generations of gene sets
• Two things to consider:
  • Variants of a gene (alleles)
  • Variants in allele combinations (sequences)
• We begin by treating each independently
[Figure: successive generations of gene sets, labeled Gn, Gn+1, Gn+2, Gn+3, Gn+4]
Comp 790 – Genealogies to Sequences
Infinite Alleles Model
• Assumes that all that is knowable is whether alleles are identical or different
• No spatial (i.e., sequence position) or quantitative information related to the observed differences
• Only keeps track of how many copies of each allele type exist
• The number of mutations that resulted in a variant is lost
• Two event types: splits and mutations
• Labels are arbitrary
[Figure: example genealogy with allele-class states (A), (A,A), (B)(A), (B)(A), (B)(A,A), (B)(A)(C), (B)(A)(C,C), (B,B)(A)(C,C), (B)(D)(A)(C,C), (B)(D)(A)(C,C); sampled alleles: B, D, A, C, C]
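The bookkeeping of the infinite alleles model can be illustrated with a minimal sketch. It assumes a simple event loop in which each event is either a split or a mutation; the function name `simulate_infinite_alleles` and the parameter `mutation_prob` are hypothetical and not part of the slides, which do not specify event rates.

```python
import random
from collections import Counter

def simulate_infinite_alleles(n_events, mutation_prob, seed=0):
    """Toy infinite-alleles process (illustrative assumption, not the course's code).

    Each event picks one gene at random and either
      - mutates it into a brand-new allele label (never seen before), or
      - splits it, so the copy inherits the same allele label.
    Only the count per allele label is returned; how many mutations
    separate two labels is deliberately not recorded.
    """
    rng = random.Random(seed)
    genes = [0]          # start with a single gene carrying allele label 0
    next_label = 1       # labels are arbitrary; integers stand in for A, B, C, ...
    for _ in range(n_events):
        i = rng.randrange(len(genes))
        if rng.random() < mutation_prob:
            genes[i] = next_label       # mutation event: new allele type
            next_label += 1
        else:
            genes.append(genes[i])      # split event: one more copy of this allele
    return Counter(genes)

counts = simulate_infinite_alleles(n_events=20, mutation_prob=0.3, seed=1)
```

The returned `Counter` plays the role of the allele classes such as (B)(D)(A)(C,C): it records how many copies of each label exist and nothing else.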
Infinite Sites Model
• Assumes mutations are rare events
• Assumes DNA sequences are large
• Multiple mutations at the same site are extremely rare
• The Infinite Sites Model assumes that multiple mutations never occur at the same sequence position
• Thus, all genes are “biallelic”
[Figure: genealogy of haplotypes under the infinite sites model, with one branch labeled “Lost haplotype”; haplotypes shown: -0-0-0-0-0-, -1-0-0-0-0-, -1-1-0-0-0-, -0-0-1-0-0-, -1-1-0-1-0-, -1-1-0-0-0-, -1-1-0-1-0-, -0-0-0-0-1-, -0-0-1-0-0-, -0-0-1-0-0-]
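As a rough sketch of why every site ends up biallelic, the two events can be written so that each mutation always opens a new site. The helper names `split` and `mutate` are hypothetical, and haplotypes are represented as strings of `0`/`1` for illustration only.

```python
def split(haplotypes, index):
    """Split event: the chosen haplotype is copied unchanged."""
    return haplotypes + [haplotypes[index]]

def mutate(haplotypes, index):
    """Mutation event under the infinite sites assumption.

    The mutation never reuses an existing position, so it is modeled as a
    new column: the mutated haplotype gets 1 there, all others get 0.
    """
    return [h + ("1" if k == index else "0") for k, h in enumerate(haplotypes)]

population = [""]                    # one ancestral haplotype, no variable sites yet
population = split(population, 0)    # ["", ""]
population = mutate(population, 0)   # ["1", "0"]
population = split(population, 0)    # ["1", "0", "1"]
population = mutate(population, 2)   # ["10", "00", "11"]
```

Because each call to `mutate` adds exactly one column, the haplotype length always equals the number of mutations so far, and each column contains only the two states 0 and 1.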
SNP Panels
• Observed haplotypes and SNPs from the previous example
• Under the Infinite Sites Model, the haplotype size equals the number of historical mutations
• While sequences can be lost, alleles cannot, in contrast to the Infinite Alleles Model
• SNP Diversity Patterns (SDPs) can be repeated (e.g., S1 and S2)
• Since the assignment of 1s and 0s is arbitrary, a SNP and its complement share the same SDP
• For N haplotypes, there are at most $2^{N-1} - 1$ “possible” SDPs
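The bound on the number of SDPs can be checked by brute force for small N. The sketch below is illustrative only: it assumes a SNP is a 0/1 column over N haplotypes, discards the two constant columns, and treats a column and its complement as the same SDP. The function names are hypothetical.

```python
from itertools import product

def canonical_sdp(column):
    """Map a SNP column and its complement to the same representative.

    Convention chosen here (an arbitrary assumption): flip the column if
    needed so that the first haplotype always carries allele 0.
    """
    column = tuple(column)
    if column[0] == 1:
        column = tuple(1 - allele for allele in column)
    return column

def count_possible_sdps(n):
    """Enumerate all non-constant 0/1 columns over n haplotypes, up to complement."""
    seen = set()
    for column in product((0, 1), repeat=n):
        if len(set(column)) < 2:
            continue  # all-0 or all-1: not a polymorphic site
        seen.add(canonical_sdp(column))
    return len(seen)

sdp_count_for_4 = count_possible_sdps(4)  # 7, which matches 2**(4-1) - 1
```

The count follows from $2^N$ binary columns, minus the 2 constant ones, divided by 2 for complements: $(2^N - 2)/2 = 2^{N-1} - 1$.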
A Different Kind of Tree
• Unrooted “Perfect” Phylogeny
• Nodes correspond to haplotypes (both visible and historical)
• Edges correspond to SNPs
• Removal of an edge creates a bipartition
• Tree leaves correspond to mutations (allele variants) that are unique to a sequence, i.e., an SDP with only one minority allele instance, a singleton
[Figure: unrooted tree over the haplotypes -0-0-1-0-0-, -0-0-0-0-0-, -1-0-0-0-0-, -0-0-0-0-1-, -1-1-0-0-0-, -1-1-0-1-0-]
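The link between edges and SNPs can be made concrete with a small sketch. The edge list below is an assumption: the figure itself is not preserved, so the edges are reconstructed by joining the six listed haplotypes wherever they differ at exactly one site. The function name `bipartition` is hypothetical.

```python
# Assumed tree: haplotypes from the slide, joined where they differ at one site.
EDGES = [
    ("00000", "00100"),  # differs at site 3
    ("00000", "10000"),  # differs at site 1
    ("00000", "00001"),  # differs at site 5
    ("10000", "11000"),  # differs at site 2
    ("11000", "11010"),  # differs at site 4
]

def bipartition(edges, removed):
    """Remove one edge and return the two node sets it separates."""
    adjacency = {}
    for u, v in edges:
        if (u, v) == removed:
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # Depth-first search from one endpoint of the removed edge.
    side, stack = {removed[0]}, [removed[0]]
    while stack:
        node = stack.pop()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in side:
                side.add(neighbor)
                stack.append(neighbor)
    all_nodes = {node for edge in edges for node in edge}
    return side, all_nodes - side

near, far = bipartition(EDGES, ("10000", "11000"))
```

Under this assumed tree, cutting the site-2 edge separates exactly the haplotypes carrying allele 1 at site 2 from those carrying allele 0, which is the SDP of that SNP.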
Build a Phylogenetic Tree
• Assume we only have direct access to the observed haplotypes
• Construct a pairwise distance matrix between haplotypes using Hamming distances
• Among all pairs of nodes, add the edge with the smallest distance that does not introduce a loop
• If that smallest distance d is greater than 1, add d-1 “hidden” nodes between the pair so that adjacent nodes have a Hamming distance of 1
• Augment the distance matrix with the new nodes and claim the introduced edges
• Repeat finding the smallest distance and augmenting until the graph is fully connected
[Figure: tree built over the haplotypes -0-0-1-0-0-, -1-1-0-0-0-, -1-0-0-0-0-, -0-0-0-0-0-, -1-1-0-1-0-, -0-0-0-0-1-]
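The steps above can be sketched roughly as follows. This is one possible reading of the procedure, not the course's implementation: haplotypes are `0`/`1` strings, loops are avoided with a union-find structure, and when hidden nodes are needed the differing sites are flipped in left-to-right order, which is an arbitrary choice the slides do not specify. The names `hamming` and `build_tree` are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def build_tree(haplotypes):
    nodes = list(dict.fromkeys(haplotypes))   # drop duplicates, keep order
    parent = {h: h for h in nodes}            # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    while True:
        # Candidate pairs lie in different components, so no loop can form.
        candidates = [(hamming(a, b), a, b)
                      for a, b in combinations(nodes, 2) if find(a) != find(b)]
        if not candidates:
            break                              # graph is fully connected
        _, a, b = min(candidates)
        # Walk from a to b one site at a time; intermediates are hidden nodes.
        path, current = [a], a
        for i, (x, y) in enumerate(zip(a, b)):
            if x != y:
                current = current[:i] + y + current[i + 1:]
                path.append(current)
        for u, v in zip(path, path[1:]):
            if v not in parent:                # augment with the hidden node
                parent[v] = v
                nodes.append(v)
            if find(u) != find(v):
                parent[find(u)] = find(v)
                edges.append((u, v))
    return nodes, edges

observed = ["00100", "11000", "10000", "00000", "11010", "00001"]
nodes, edges = build_tree(observed)
```

For the six observed haplotypes every required edge already has distance 1, so no hidden nodes are introduced. Dropping `10000` and `00000` from the input forces the sketch to reconstruct intermediate nodes, although the specific hidden haplotypes depend on the assumed flip order.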
Four-Gamete Test
• Under the assumption of the infinite sites model, all SNP pairs exhibit the property that no more than 3 of the 4 possible allele combinations occur
• This is a direct consequence of allowing only one mutation per site
• Showing that all SNP pair combinations satisfy the four-gamete test is a necessary and sufficient condition for a perfect phylogeny tree to exist
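A minimal sketch of the test, assuming haplotypes are given as equal-length `0`/`1` strings; the function name `passes_four_gamete_test` is hypothetical. The sample data are the six haplotypes from the perfect phylogeny example.

```python
from itertools import combinations

def passes_four_gamete_test(haplotypes):
    """Return True if no pair of sites shows all four allele combinations."""
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct (allele at site i, allele at site j) pairs.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False   # 00, 01, 10 and 11 all observed at this pair
    return True

tree_like = ["00100", "00000", "10000", "00001", "11000", "11010"]
conflicting = ["00", "01", "10", "11"]

tree_like_ok = passes_four_gamete_test(tree_like)        # True
conflicting_ok = passes_four_gamete_test(conflicting)    # False
```

The check is quadratic in the number of sites, which is usually acceptable for small SNP panels of the kind shown in these slides.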
Hard Questions
• Which SDPs are compatible with any other SNP?
  Singleton SNPs are compatible with any other SNP.
• Given N distinct haplotype sequences resulting from an infinite sites model, what is the minimum number of SDPs?
  N-1 edges are the fewest necessary to connect N haplotypes into a “linear” tree. How many singleton SNPs occur in such a tree? 2
• Given N distinct haplotype sequences resulting from an infinite sites model, what is the maximum number of SDPs?
  2N-3 edges, the number of edges in an unrooted tree with N leaves.
Exercise
• Consider the following SNP panel
• Does it satisfy the four-gamete test?
• Construct the tree
• Is the SDP 11001^T possible?
Comp 790 – Continuous-Time Coalescence