Comparative computational biology

Comparative computational biology • Positive selection

What is positive selection?

Positive selection is selection of a particular trait • - and the increased frequency of an allele in a population

Intraspecific level Positive selection driving continuous adaptation to changes in the environment Species B Species C Recurrent adaptation Species A

Allele frequencies in host-pathogen interactions Positive selection and recurrent selective sweeps Fixation of alleles Host Pathogen Positive selection and dynamics of polymorphisms Woolhouse et al, 2002. Nat. Genet

Interspecific level Positive selection driving divergence Diversifying positive selection Species B Species C Species A

Why is it interesting to identify traits which • have undergone or are under positive selection? Function Evolution Environment ……

How can we detect positive selection? • by comparing homologous sequences from • different individuals and identify an unexpectedly high proportion of non-neutral mutational changes

Positive selection acts on beneficial mutations

…… which give rise to changes in the amino acid • sequences of proteins

To measure positive selection: Rate of synonymous mutations dS or KS Rate of non-synonymous mutations dNor KA

1 2 3 4 5 Pro PheGlyLeuPhe Seq 1 CCC UUU GGG UUA UUU Seq 2 CCC UUC GAG CUA GUA Pro PheAlaLeu Val How many possible non-synonymous and synonymous mutations??

We need to know the degeneracy of each codons to compute the number of possible synonymous mutations in codons (S): A non-degenerate site (all possible nucleotides will result in an amino acid change): Counted as S=0 (N=1) CCU CCC CCA CCG Proline A fourfold degenerate site (all possible nucleotides can be tolerated without an an amino acid change : Counted as S=1 (N=0) Possible synonymous changes for proline: S = 0 + 0 + 1 = 1

We need to know the degeneracy of each codons to compute the number of possible synonymous mutations in codons (S): A non-degenerate site (all possible nucleotides will result in an amino acid change): Counted as S=0 (N=1) UUU UUC Phenylalanine A two-fold degenerate site (two possible nucleotides can be tolerated without an an amino acid change: Counted as S=1/3 (N=2/3) Possible synonymous changes for phenylalanine S = 0 + 0 + 1/3 = 1/3

Counts of possiblesynonymous sites for each gene (S) 1 2 3 4 5 Pro PheGlyLeuPhe Seq 1 CCC UUU GGG UUA UUU Seq 2 CCC UUC GAG CUA GUA Pro PheAlaLeu Val 1. ProlineS = 0 + 0 + 1 = 1 2. Phenylalanine S = 0 + 0 + 1/3 = 1/3 3. For Glycine S = 0 + 0 + 1 = 1, 4. Leucinefor UUA, S = 1/3 + 0 + 1/3 = 2/3 for CUA, S = 1/3 + 0 + 1 = 4/3 Take the average of these: S = 1 5. Phenylalanine for UUU, S = 1/3 Valine, S = 1 Take average: S = 2/3 For the whole sequence: S = 1 + 1/3 + 1 + 1 + 2/3 = 4 N = total number of sites: 15 - 4 = 11

Counts of synonymous and non-synonymouschangesfor each gene (Sd and Nd) 1 2 3 4 5 Pro PheGlyLeuPhe Seq 1 CCC UUU GGG UUA UUU Seq 2 CCC UUC GAG CUA GUA Pro PheAlaLeu Val Calculate Sdand Nd for each codon. 1. Sd= 0, Nd = 0 2. Sd = 1, Nd = 0 3. Sd = 0, Nd = 1 4. Sd = 1, Nd = 0 5. thiscould happen in twoways UUU --> GUU --> GUA Nd= 1 Sd = 1 Route 1: Sd= 1, Nd= 1 UUU --> UUA --> GUA Nd= 1 Nd = 1 Route 2: Sd= 0, Nd= 2 Average: Sd= 0.5, Nd = 1.5 Total Sd = 2.5, total Nd = 2.5

Finally, wecancompute the rates of syn and non-syn changes 1 2 3 4 5 Pro PheGlyLeuPhe Seq 1 CCC UUU GGG UUA UUU Seq 2 CCC UUC GAG CUA GUA Pro PheAlaLeu Val Possible synonymous sites: 4 Possible non-synonymous sites: 11 Synonymous changes: 2.5 Non-synonymous changes: 2.5 Synonymous rate: KS=Sd/ S = 2.5/4 = 0.625 Non-synonymous rate: KA= Nd/ N = 2.5/11 = 0.227

Evaluating the effect of positive selection by computing the RATIO of syn and non-syn changes Kaor dN Positive selection Neutral evolution f.ex. immune relatedgenes f.ex. pseudogenes KA>KS KA=KS KA<KS Purifying selection f.ex. housekeeping genes Ks or dS KA or dN: rate of non-synonymousdivergence KS or dS: rate of synonymousdivergence

Finally, wecancompute the rates of syn and non-syn changes 1 2 3 4 5 Pro PheGlyLeuPhe Seq 1 CCC UUU GGG UUA UUU Seq 2 CCC UUC GAG CUA GUA Pro PheAlaLeu Val Possible synonymous sites: 4 Possible non-synonymous sites: 11 Synonymous changes: 2.5 Non-synonymous changes: 2.5 Synonymous rate: KS=Sd/ S = 2.5/4 = 0.625 Non-synonymous rate: KA= Nd/ N = 2.5/11 = 0.227 Ka/Ks = 0.36

Positive selection affects gene evolution during species divergence and during evolution of a population Within species: Non-synonymousproportion of polymorphisms: PN= Nd/ N Synonymous proportion of polymorphisms: PS = Sd / S Ka/Ks PN / Ps Species A Species B Species C

Positive selection affects gene evolution during species divergence and during evolution of a population Wecanuse the two ratios Ka/Ks and Pn/Ps to inferwhenselection has acted (or is acting) Past selection (during speciation) Ka/Ks PN / Ps Present day selection (adaptation) Species A Species B Species C

McDonald Kreitman (MK) test to contrast within and between species variation

Question: Are adaptive mutations in the alcohol dehydrogenase in Drosophila species a result of species divergence or current positive selection?

Drosophila dataset alcohol dehydrogenase Repl: Nonsynonymous, Syn: Synonymous Fixed: Substitution, Poly: Polymorphisms

MK test contrasts within and between species synonymous and non-synonymous differences The proportion of non-synonymous fixed differences between species much higher than the proportion of non-synonymous polymorphisms Contingency table can be tested by a G-test

Conclusion from MK-test: Adh locus in Drosophila has accumulated adaptive mutations (been under positive selection) when the Drosophila species diverged

One problem with the “counting methods” Sometimes the signal of selection is not very strong

Positive selection on one or fewparticularcodons or in oneparticularbranch Evolutionary model to detect selection in particular codons or branches

Comparative computational biology

Comparative computational biology

Presentation Transcript

Computational Biology

Computational biology and computational biologists

Computational Structural Biology

Computational Biology

Computational Biology

Computational Molecular Biology

Computational Biology

Computational Biology

Computational Biology

Computational Molecular Biology

Computational Biology

Computational Molecular Biology

Computational Biology

Computational Molecular Biology

Computational Molecular Biology

Computational Systems Biology

Computational Biology

Computational Biology Group

Computational Molecular Biology

Comparative Biology

Computational Biology

Computational Biology