Incorporating Mutations


aldis

Presentation Transcript


  1. Incorporating Mutations
     • Previously we allowed for gene variants (alleles), but without a model of how they came into being
     • Rather than the coalescence of a single gene, next we consider successive generations of gene sets
     • Two things to consider
         • Variants of a gene (Alleles)
         • Variants in allele combinations (Sequences)
     • We begin by treating each independently
     [Figure: gene sets in successive generations Gn, Gn+1, Gn+2, Gn+3, Gn+4]
     Comp 790 – Genealogies to Sequences

  2. Infinite Alleles Model
     • Assumes all that is knowable is whether alleles are identical or different
     • No spatial (i.e. sequence position) or quantitative information related to the observed differences
     • Only keeps track of how many of each allele type
     • Number of mutations that result in a variant is lost
     • Two event types, splits and mutations
     • Labels are arbitrary
     [Figure: allele configurations over successive generations: (A); (A,A); (B)(A); (B)(A); (B)(A,A); (B)(A)(C); (B)(A)(C,C); (B,B)(A)(C,C); (B)(D)(A)(C,C); (B)(D)(A)(C,C). Final sample: B D A C C]
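The bookkeeping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the helper names `split` and `mutate` are hypothetical, and the sequence of events replayed below is one reading of the figure's allele configurations.

```python
from collections import Counter

def split(pool, allele):
    """Split event: an existing allele gains one more copy."""
    if pool[allele] == 0:
        raise KeyError(f"allele {allele!r} is not present")
    pool[allele] += 1

def mutate(pool, allele, new_label):
    """Mutation event: one copy of `allele` becomes a brand-new allele.

    Only counts per label are kept; how many mutations separate two
    alleles is not recorded, as the model prescribes.
    """
    if pool[allele] == 0:
        raise KeyError(f"allele {allele!r} is not present")
    if new_label in pool:
        raise ValueError("a mutation must create a new allele label")
    pool[allele] -= 1
    if pool[allele] == 0:
        del pool[allele]          # the old allele is lost
    pool[new_label] = 1

# Replay of the configurations in the figure (assumed event order).
pool = Counter({"A": 1})
split(pool, "A")         # (A,A)
mutate(pool, "A", "B")   # (B)(A)
split(pool, "A")         # (B)(A,A)
mutate(pool, "A", "C")   # (B)(A)(C)
split(pool, "C")         # (B)(A)(C,C)
split(pool, "B")         # (B,B)(A)(C,C)
mutate(pool, "B", "D")   # (B)(D)(A)(C,C)
print(sorted(pool.elements()))
```

Because labels are arbitrary, renaming A, B, C, D consistently would describe the same state.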

  3. Infinite Sites Model
     • Assumes mutations are rare events
     • Assumes DNA sequences are large
     • Multiple mutations at the same site are extremely rare
     • Infinite Sites Model assumes that multiple mutations never occur at the same sequence position
     • Thus, all genes are “Biallelic”
     [Figure: genealogy of five-site haplotypes descending from -0-0-0-0-0-, with one branch marked “Lost haplotype”; haplotypes shown include -1-0-0-0-0-, -1-1-0-0-0-, -1-1-0-1-0-, -0-0-1-0-0- and -0-0-0-0-1-]
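As a small illustration of the "one mutation per site" rule, the sketch below derives the haplotypes in the figure from the all-zero ancestor. The function name `mutate` and the `used_sites` bookkeeping are assumptions made for this example, and 0 is taken to be the ancestral allele.

```python
def mutate(haplotype, site, used_sites):
    """Return a copy of `haplotype` with a new mutation at `site`.

    Under the infinite sites model a site may mutate only once in the
    whole history, so reusing a site is rejected.
    """
    if site in used_sites:
        raise ValueError(f"site {site} has already mutated")
    used_sites.add(site)
    return haplotype[:site] + "1" + haplotype[site + 1:]

used = set()
root = "00000"
h1 = mutate(root, 0, used)   # 10000
h2 = mutate(h1, 1, used)     # 11000
h3 = mutate(h2, 3, used)     # 11010
h4 = mutate(root, 2, used)   # 00100
h5 = mutate(root, 4, used)   # 00001
print(h1, h2, h3, h4, h5)
```

Every site ends up with exactly two states across the sample, which is what "biallelic" means here.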

  4. SNP Panels
     • Observed Haplotypes and SNPs from previous example
     • Under the Infinite Sites Model the haplotype size equals the number of historical mutations
     • While sequences can be lost, alleles cannot, in contrast to the Infinite Alleles Model
     • SNP Diversity Patterns (SDPs) can be repeated (e.g. S1 and S2)
     • Since the assignment of 1s and 0s is arbitrary, a SNP and its complement share the same SDP
     • For N haplotypes, there are at most 2^(N-1) – 1 “possible” SDPs
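The SDP bullets can be checked with a short Python sketch. The four observed haplotypes used here (11010, 11000, 00100, 00001) are an assumption read off the earlier figure, and `sdp` and `count_possible_sdps` are hypothetical helper names. The canonical form flips a column so that the first haplotype carries 0, which makes a SNP and its complement compare equal.

```python
from itertools import product

def sdp(column):
    """Canonical SDP of a SNP column: complement it if it starts with 1."""
    col = tuple(column)
    return col if col[0] == 0 else tuple(1 - a for a in col)

def count_possible_sdps(n):
    """Count non-constant 0/1 columns over n haplotypes, up to complement."""
    return len({sdp(c) for c in product((0, 1), repeat=n) if len(set(c)) == 2})

# Assumed observed haplotypes; columns are the SNPs S1..S5.
observed = ["11010", "11000", "00100", "00001"]
columns = [[int(h[j]) for h in observed] for j in range(len(observed[0]))]
patterns = [sdp(c) for c in columns]

print(patterns[0] == patterns[1])   # S1 and S2 share an SDP
print(count_possible_sdps(4))       # compare with 2**(4 - 1) - 1
```

The count follows because each of the 2^N columns pairs up with its complement, and the single constant pair (all 0s, all 1s) is not a SNP.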

  5. A Different Kind of Tree
     • Unrooted “Perfect” Phylogeny
     • Nodes correspond to haplotypes (both visible and historical)
     • Edges correspond to SNPs
     • Removal of an edge creates a bipartition
     • Tree leaves correspond to mutations (allele variants) that are unique to a sequence, i.e. an SDP with only one minority allele instance, a singleton
     [Figure: unrooted tree over the haplotypes -0-0-1-0-0-, -0-0-0-0-0-, -1-0-0-0-0-, -0-0-0-0-1-, -1-1-0-0-0-, -1-1-0-1-0-]

  6. Build a Phylogenetic Tree
     • Assume we only have direct access to observed haplotypes
     • Construct a pair-wise distance matrix between haplotypes using Hamming distances
     • Among all pairs of nodes, add the smallest edge that does not introduce a loop
     • If the smallest distance d is greater than 1, add d-1 “hidden” nodes between the pair so that adjacent nodes have a Hamming distance of 1
     • Augment the distance matrix with the new nodes and claim the introduced edges
     • Repeat finding the smallest distance and augmenting until the graph is fully connected
     [Figure: resulting tree over -0-0-1-0-0-, -1-1-0-0-0-, -1-0-0-0-0-, -0-0-0-0-0-, -1-1-0-1-0-, -0-0-0-0-1-]
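The procedure above can be sketched in Python as follows. Several details are assumptions, not part of the slide: the function name `build_tree`, the four observed haplotypes, and in particular the rule for choosing hidden nodes. The slide does not say which intermediate haplotypes to insert, so this sketch picks, at each step, the one with the smallest total Hamming distance to the nodes already present. That heuristic works on this example but is not claimed to be correct in general.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def build_tree(observed):
    nodes = list(dict.fromkeys(observed))        # distinct haplotypes
    comp = {h: k for k, h in enumerate(nodes)}   # component label per node
    edges = []
    while True:
        # Pairs in different components cannot close a loop.
        pairs = [(hamming(a, b), a, b)
                 for a, b in combinations(nodes, 2) if comp[a] != comp[b]]
        if not pairs:
            return nodes, edges                  # fully connected
        _, cur, target = min(pairs)              # smallest remaining distance
        while cur != target:
            # All haplotypes one mutation closer to the target.
            steps = [cur[:i] + target[i] + cur[i + 1:]
                     for i in range(len(cur)) if cur[i] != target[i]]
            if len(steps) == 1:                  # last step reaches the target
                nxt = target
                old, new = comp[target], comp[cur]
                for h in nodes:                  # merge the two components
                    if comp[h] == old:
                        comp[h] = new
            else:                                # insert a hidden node (heuristic)
                nxt = min(steps, key=lambda h: sum(hamming(h, o) for o in nodes))
                nodes.append(nxt)
                comp[nxt] = comp[cur]
            edges.append((cur, nxt))
            cur = nxt

nodes, edges = build_tree(["11010", "11000", "00100", "00001"])
for a, b in edges:
    print(a, "-", b)
```

On this input the sketch recovers the hidden ancestor 00000 plus one hidden node between 11000 and 00000. Because the first two sites share an SDP, the observed data cannot distinguish 10000 from 01000 for that node, so either is a valid reconstruction.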

  7. Four-Gamete Test
     • Under the assumption of the infinite sites model, every pair of SNPs exhibits no more than 3 of the 4 possible allele combinations
     • Direct consequence of only one mutation per site
     • Showing that all SNP pairs satisfy the four-gamete test is a necessary and sufficient condition for a perfect phylogeny tree to exist
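A direct way to run the test is to enumerate every pair of sites and collect the allele combinations that occur. The sketch below assumes haplotypes are given as equal-length 0/1 strings; the function name `four_gamete_violations` and the example haplotypes are illustrative choices.

```python
from itertools import combinations

def four_gamete_violations(haplotypes):
    """Return the site pairs (i, j) that show all four allele combinations."""
    n_sites = len(haplotypes[0])
    bad = []
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:          # 00, 01, 10 and 11 all present
            bad.append((i, j))
    return bad

observed = ["11010", "11000", "00100", "00001"]
print(four_gamete_violations(observed))              # no violating pairs
print(four_gamete_violations(observed + ["10100"]))  # sites 0 and 2 conflict
```

The extra haplotype 10100 carries the derived allele at sites 0 and 2 together, which under a single mutation per site would require a recurrent mutation or recombination.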

  8. Hard Questions
     • Which SDPs are compatible with any other SNP?
         Singleton SNPs are compatible with any other SNP
     • Given N distinct haplotype sequences resulting from an infinite sites model, what is the minimum number of SDPs?
         N-1 edges are the fewest necessary to connect N haplotypes into a “linear” tree. How many singleton SNPs occur in such a tree? 2
     • Given N distinct haplotype sequences resulting from an infinite sites model, what is the maximum number of SDPs?
         2N-3 edges, the number of edges in an unrooted binary tree with N leaves

  9. Exercise
     • Consider the following SNP panel
     • Does it satisfy the four-gamete test?
     • Construct the tree
     • Is the SDP 11001T possible?
     Comp 790 – Continuous-Time Coalescence
