Research on Mitochondrial Genomes Lectures for 4Y03

Research on Mitochondrial Genomes Lectures for 4Y03. Paul Higgs Dept. of Physics, McMaster University, Hamilton, Ontario. Supported by Canada Research Chairs and BBSRC. Building a database for mitochondrial genomes. Large scale - gene order evolution.

Presentation Transcript


  1. Research on Mitochondrial Genomes: Lectures for 4Y03. Paul Higgs, Dept. of Physics, McMaster University, Hamilton, Ontario. Supported by Canada Research Chairs and BBSRC.

  2. Building a database for mitochondrial genomes.
     • Large scale: gene order evolution.
     • Medium scale: sequence evolution. Molecular phylogenetics.
     • Small scale: mutation and selection. Variation in base and amino acid frequencies. Codon usage.
     • Genetic code evolution.
     People:
     1. Wenli Jia, Bin Tang, Daniel Jameson
     2. Howsun Jow, Magnus Rattray, Cendrine Hudelot, Vivek Gowri-Shankar, Xiaoguang Yang
     3. Wei Xu, Daniel Jameson
     4. Daniel Urbina, Wenli Jia
     5. Supratim Sengupta

  3. Mitochondria are organelles inside eukaryotic cells. They are the site of oxidative phosphorylation and ATP synthesis. They contain their own genome, distinct from the DNA in the nucleus. Typical animal mitochondrial genomes are short and circular (~16,000 bases). They usually contain 2 rRNAs, 22 tRNAs and 13 proteins.

  4. An example of a GenBank file: the complete mitochondrial genome of the alligator.

     LOCUS       NC_001922   16646 bp   DNA   circular   VRT   20-SEP-2002
     DEFINITION  Alligator mississippiensis mitochondrion, complete genome.
     ACCESSION   NC_001922
     VERSION     NC_001922.1  GI:5835540
     KEYWORDS    .
     SOURCE      mitochondrion Alligator mississippiensis (American alligator)
       ORGANISM  Alligator mississippiensis
                 Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                 Archosauria; Crocodylidae; Alligatorinae; Alligator.
     REFERENCE   1 (bases 1 to 16646)
       AUTHORS   Janke,A. and Arnason,U.
       TITLE     The complete mitochondrial genome of Alligator mississippiensis
                 and the separation between recent archosauria (birds and crocodiles)
       JOURNAL   Mol. Biol. Evol. 14 (12), 1266-1272 (1997)
       MEDLINE   98066357
       PUBMED    9402737
     FEATURES             Location/Qualifiers
          source          1..16646
                          /organism="Alligator mississippiensis"
                          /organelle="mitochondrion"
                          /mol_type="genomic DNA"
                          /db_xref="taxon:8496"
                          /tissue_type="liver"
                          /dev_stage="adult"
          rRNA            1..976
                          /product="12S ribosomal RNA"
          tRNA            977..1044
                          /product="tRNA-Val"
                          /anticodon=(pos:1009..1011,aa:Val)
          rRNA            1046..2635
                          /product="16S ribosomal RNA"
          tRNA            2636..2710
                          /product="tRNA-Leu"
                          /note="codons recognized: UUR"
                          /anticodon=(pos:2672..2674,aa:Leu)
          gene            2711..3676
                          /gene="ND1"
          CDS             2711..3676
                          /gene="ND1"

             1 caacagactt agtcctggtc ttttcattag ctagtactca acttatacat gcaagcatcc
            61 gcgaaccagt gagaacaccc tacaagtctg acagacgaat ggagccggca tcaggcacat
           121 caaccgatag cccaaaacgc ctagcccagc cacaccccca agggtctcag cagtgattaa
           181 ccttaaacca taagcgaaag cttgatttag ttagagtaga tatagaggcg gtcaactctc
           241 gtgccagcaa ccgcggttag acgaaaacct caagttaatt gacaaacggc gtaaattgtg
           301 gctagaactc tatctccccc attagtgcag atacggtatc acagtagtga taaacttcat
           361 cacaccgcaa acatcaacac aaaactggcc ctaatctcaa agatgtactc gattccacga
           421 aagctgagaa acaaactggg attagatacc ccactatgct cagcccttaa cattggtgta
           481 gtacacaaca gactaccctc gccagagaat tacgagcccc gcttaaaact caaaggactt
           541 gacggcactt taaacccccc tagaggagcc tgtcctataa tcgacagtac acgttacacc
           601 cgaccacctt tagcctactc agtctgtata ccgccgtcgc aagcccgtcc catttgaggg
           661 aaacaaaacg cgcgcaacag ctcaaccgag ctaacacgtc aggtcaaggt gcagccaaca
           721 aggtggaaga gatgggctac attttctcaa catgtagaaa tattcaacgg agagccctat
           781 gaaatacagg actgtcaaag ccggatttag cagtaaactg ggaaagaata cctagttgaa
           841 gtcggtaacg aagtgcgtac acaccgcccg tcaccctcct cgaacccaac aaaatgccca
           901 aacaacaggc acaatgttgg gcaagatggg gaaagtcgta acaaggtaag cgtaccggaa
           961 ggtgcacttg gaacatcaaa atgtagctta aatttaaagc attcagttta cacctgaaaa
          1021 agtcccacca tcggaccatt ttgaaaccca tatctagccc tacctccttt caacatgctt
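As an illustration of how the feature table in a GenBank record like this one can be read by a program, here is a minimal sketch using only the Python standard library. The function name `parse_features` and the regular expression are assumptions made for this example; the sketch handles only simple `start..end` locations, and an established parser would be the better choice for real work.

```python
import re

# Matches a feature key followed by a simple "start..end" location.
# Complemented or joined locations are deliberately not handled in this sketch.
FEATURE_RE = re.compile(r"^\s{5}(\S+)\s+(\d+)\.\.(\d+)\s*$")

def parse_features(lines):
    """Return (key, start, end) tuples for simple features in a GenBank table."""
    features = []
    for line in lines:
        match = FEATURE_RE.match(line)
        if match:
            key, start, end = match.groups()
            features.append((key, int(start), int(end)))
    return features

# A few feature lines copied from the alligator record (5-space indent as in GenBank).
record = """\
     source          1..16646
                     /organism="Alligator mississippiensis"
     rRNA            1..976
                     /product="12S ribosomal RNA"
     tRNA            977..1044
                     /product="tRNA-Val"
     rRNA            1046..2635
                     /product="16S ribosomal RNA"
""".splitlines()

features = parse_features(record)
# Length of each feature in bases, keyed by (feature key, start position).
lengths = {(key, start): end - start + 1 for key, start, end in features}
```

Qualifier lines (those starting with `/`) are skipped because they are indented further than the five-space feature-key column.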

  5. OGRe (Organellar Genome Retrieval) is a relational database, available at http://ogre.mcmaster.ca. It holds more than 800 complete animal mitochondrial genomes and gives an efficient means of storing and retrieving information. It uses PostgreSQL. A schema defines the relationships between different types of information.
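To make the idea of a relational schema concrete, here is a small sketch. It is not the OGRe schema: the table and column names (`species`, `gene`, `start_pos` and so on) are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so that the example runs without a database server.

```python
import sqlite3

# In-memory database; sqlite3 is a stand-in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema: each gene row points at the species it belongs to.
conn.executescript("""
CREATE TABLE species (
    species_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    accession  TEXT UNIQUE
);
CREATE TABLE gene (
    gene_id    INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES species(species_id),
    product    TEXT NOT NULL,
    start_pos  INTEGER,
    end_pos    INTEGER
);
""")

conn.execute(
    "INSERT INTO species VALUES (1, 'Alligator mississippiensis', 'NC_001922')"
)
conn.executemany(
    "INSERT INTO gene (species_id, product, start_pos, end_pos) VALUES (?, ?, ?, ?)",
    [(1, "12S ribosomal RNA", 1, 976), (1, "tRNA-Val", 977, 1044)],
)

# The relationship between the tables lets one query combine both kinds of information.
rows = conn.execute(
    "SELECT s.name, g.product FROM species s JOIN gene g USING (species_id) "
    "ORDER BY g.start_pos"
).fetchall()
```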

  6. The OGRe front page: http://ogre.mcmaster.ca Sequence information for OGRe is taken from GenBank. We aim to keep up to date with publicly available animal mitochondrial genomes.

  7. Species may be selected individually from an alphabetical list, or taxa may be selected from a hierarchy. Here the Arthropods have been expanded and the Myriapods and Crustaceans have been selected.

  8. Large Scale: Evolution of Gene Order in Whole Genomes. On the OGRe web site, a visual comparison can be made of any two selected species. Colour is used to indicate conserved blocks of genes. Alligator and Bird genomes differ by interchange of two tRNA genes (red and yellow) and by translocation of the two genes in the blue block.

  9. Genome reshuffling mechanisms.
     Inversions: A B C D → A -C -B D
     Translocations: A (B C) D → A D B C
     Duplications and deletions: A B C D → A B C B C D → A C B D (slashes in the figure mark the deleted copies)
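The three reshuffling mechanisms can be written as operations on a list of signed genes. This is a minimal sketch for illustration: the function names are hypothetical, a leading `-` marks a gene on the opposite strand, and the genome is treated as a linear list for simplicity.

```python
def flip(gene):
    """Switch a gene to the other strand by toggling its leading minus sign."""
    return gene[1:] if gene.startswith("-") else "-" + gene

def invert(order, i, j):
    """Reverse the block order[i:j] and flip the strand of each gene in it."""
    block = [flip(g) for g in reversed(order[i:j])]
    return order[:i] + block + order[j:]

def translocate(order, i, j, k):
    """Cut out the block order[i:j] and reinsert it at index k of the remainder."""
    block = order[i:j]
    rest = order[:i] + order[j:]
    return rest[:k] + block + rest[k:]

def duplicate(order, i, j):
    """Tandemly duplicate the block order[i:j]."""
    return order[:j] + order[i:j] + order[j:]

def delete(order, positions):
    """Remove the genes at the given indices."""
    return [g for idx, g in enumerate(order) if idx not in positions]

genome = ["A", "B", "C", "D"]
inverted = invert(genome, 1, 3)         # A -C -B D
moved = translocate(genome, 1, 3, 2)    # A D B C
duplicated = duplicate(genome, 1, 3)    # A B C B C D
after_loss = delete(duplicated, {1, 4}) # A C B D
```

The last two lines show how a duplication followed by loss of one copy of each gene can change the relative order of B and C without any inversion or translocation.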

  10. Example of an inversion. Example of a translocation.

  11. The T and -P genes are duplicated in Cordylus warreni. If the first T and the second -P were deleted, the relative position of T and -P would change.

  12. Sometimes things go crazy …. Drosophila and Thrips are both insects yet there are 30 breakpoints for only 37 genes i.e. almost nothing in common.

  13. OGRe contains gene orders as strings. This allows searching and comparison. 231 unique gene orders have been found in 858 species. The standard vertebrate order is shared by 398 species (including humans). There are many other species with unique gene orders. Some species conserve gene order over hundreds of millions of years. Others get scrambled in a few million.
      Still to do (new project):
      - estimate relative rates of different rearrangement processes
      - predict most likely ancestral gene orders
      - use gene order evidence in phylogenetics
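Storing gene orders as strings makes comparisons such as breakpoint counting straightforward. The following is a sketch under stated assumptions, not the OGRe implementation: genes are space-separated tokens, a leading `-` marks the opposite strand, the genome is circular, each gene occurs once, and a breakpoint is counted as a pair of adjacent genes in one order that is not adjacent in the other.

```python
def flip(gene):
    """Switch a gene to the other strand by toggling its leading minus sign."""
    return gene[1:] if gene.startswith("-") else "-" + gene

def adjacencies(order):
    """Set of adjacent gene pairs in a circular signed gene order.

    The pair (a, b) read on one strand is the same as (-b, -a) read on the
    other strand, so each pair is stored in one canonical orientation.
    """
    pairs = set()
    n = len(order)
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        pairs.add(min((a, b), (flip(b), flip(a))))
    return pairs

def breakpoints(order1, order2):
    """Number of adjacent pairs in order1 that are absent from order2."""
    return len(adjacencies(order1) - adjacencies(order2))

ancestral = "A B C D E F".split()
inverted = "A -C -B D E F".split()   # block B C inverted

n_breaks = breakpoints(ancestral, inverted)
```

A single inversion breaks the two adjacencies at its ends, so `n_breaks` is 2 here, while reading the whole circle from the other strand gives no breakpoints at all.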

  14. Medium Scale: Sequence Alignments and Phylogenetics. Part of a sequence alignment of mitochondrial small sub-unit rRNA. The full gene has length ~950. 11 primate species, with mouse as outgroup.

  15. 69 mammals with complete mitochondrial genomes. Two models were used simultaneously. Total of 3571 sites = 1637 single sites + 967 pairs (each pair contributes two sites). Hudelot et al. 2003

  16. Afrotheria / Laurasiatheria: striking examples of convergent evolution.

  17. Arthropod phylogenetics. Very difficult due to strong variation in rates of evolution between species. tRNA tree: branch lengths optimized on fixed consensus topology. Long-branch species are problematic if the tree is not fixed. Images courtesy of University of Nebraska, Dept. of Entomology. http://entomology.unl.edu/images/

  18. Protein tree: branch lengths optimized on fixed consensus topology. The same species are on long branches in proteins as in RNAs. Images courtesy of University of Nebraska, Dept. of Entomology. http://entomology.unl.edu/images/

  19. Relative rate test for sequence evolution (Templeton). Three aligned sequences, with 0 known to be the outgroup. Test whether rates of evolution in branch 1 and branch 2 are equal. m1 = number of sites where 0 and 2 are the same and 1 is different. m2 = number of sites where 0 and 1 are the same and 2 is different. Calculate: \chi_m^2 = \frac{(m_1 - m_2)^2}{m_1 + m_2}. This should follow a chi-squared distribution with one degree of freedom. Many pairs of related species are found to have different rates in the mitochondrial sequences.
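The relative rate test can be sketched in a few lines of Python. This assumes the statistic is (m1 - m2)² / (m1 + m2), the usual one-degree-of-freedom form for comparing two counts; the function name and the toy sequences are made up for illustration, and gaps or ambiguous characters are not treated specially.

```python
import math

def relative_rate_test(seq0, seq1, seq2):
    """Relative rate test on three aligned sequences, seq0 being the outgroup.

    Returns (m1, m2, chi2, p), where p is the tail probability of a
    chi-squared distribution with one degree of freedom.
    """
    # m1: outgroup and sequence 2 agree, sequence 1 differs.
    m1 = sum(1 for a, b, c in zip(seq0, seq1, seq2) if a == c and b != a)
    # m2: outgroup and sequence 1 agree, sequence 2 differs.
    m2 = sum(1 for a, b, c in zip(seq0, seq1, seq2) if a == b and c != a)
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p

# Toy alignment: sequence 1 has 8 unique changes, sequence 2 has 2.
outgroup = "AAAAAAAAAAAA"
species1 = "CCCCCCCCAAAA"
species2 = "AAAAAAAAAACC"

m1, m2, chi2, p = relative_rate_test(outgroup, species1, species2)
```

With these toy sequences m1 = 8 and m2 = 2, giving a statistic of 3.6, which falls just short of significance at the 5% level.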

  20. Gene order sometimes gives evidence of phylogenetic relationships. The gene order of the ancestral arthropod is thought to be the same as that of the horseshoe crab Limulus. Limulus and the fruit fly, Drosophila, differ by a single translocation of a tRNA-Leu gene (shown in yellow and marked by an arrow). The same translocation of tRNA-Leu is found in insects and crustaceans but not in myriapods and chelicerates. This is a strong argument for the group Pancrustacea (= insects plus crustaceans). Image courtesy of Marine Biology Lab, Woods Hole. www.mbl.edu/animals/Limulus

  21. Figure panels: moderately rearranged; completely scrambled.

  22. Species ranked according to breakpoint distance from ancestor. Categories: Very High, High, Medium, Low.

  23. Correlation coefficients shown on the four panels: R = 0.99, R = 0.59, R = 0.53, R = 0.69.

  24. Highly rearranged genomes have highly divergent sequences. Rates of sequence evolution and genome rearrangement are correlated. Both are very non-clocklike. There are many species where only tRNAs have changed position. Species with highly reshuffled tRNAs have high rates of sequence evolution in both tRNAs and proteins.

  25. Relative rate of genome rearrangement (Xu et al. 2006). Three gene orders, with 0 known to be the outgroup. Test whether rates of rearrangement in branch 1 and branch 2 are equal. n1 = number of gene couples in 0 and 2 but not in 1, i.e. new breakpoints in 1. n2 = number of gene couples in 0 and 1 but not in 2, i.e. new breakpoints in 2. Calculate: \chi_n^2 = \frac{(n_1 - n_2)^2}{n_1 + n_2}. This should follow a chi-squared distribution with one degree of freedom. We took pairs where there was a significant difference in rearrangement rates (\chi_n^2 was large) and showed that there was a significant difference in substitution rates too (\chi_m^2 was large).
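The same test applied to gene orders can be sketched as follows. The assumptions are labelled here and in the comments: a gene couple is taken to be a pair of adjacent genes in a circular signed gene order, the statistic is taken to be (n1 - n2)² / (n1 + n2), and the function names and toy gene orders are hypothetical.

```python
def flip(gene):
    """Switch a gene to the other strand by toggling its leading minus sign."""
    return gene[1:] if gene.startswith("-") else "-" + gene

def couples(order):
    """Adjacent gene pairs of a circular signed order, in canonical orientation."""
    n = len(order)
    return {
        min((order[i], order[(i + 1) % n]),
            (flip(order[(i + 1) % n]), flip(order[i])))
        for i in range(n)
    }

def rearrangement_rate_test(order0, order1, order2):
    """Return (n1, n2, chi2) for three gene orders, order0 being the outgroup."""
    c0, c1, c2 = couples(order0), couples(order1), couples(order2)
    n1 = len((c0 & c2) - c1)   # couples kept in 0 and 2 but broken in 1
    n2 = len((c0 & c1) - c2)   # couples kept in 0 and 1 but broken in 2
    chi2 = (n1 - n2) ** 2 / (n1 + n2) if n1 + n2 else 0.0
    return n1, n2, chi2

# Toy example: species 1 has two inversions, species 2 has one.
outgroup = "A B C D E F G H".split()
species1 = "A -C -B D E -G -F H".split()
species2 = "A B C D E F G -H".split()

n1, n2, chi2 = rearrangement_rate_test(outgroup, species1, species2)
```

In this toy case n1 = 3 and n2 = 1. Species 1 actually has four new breakpoints, but one of them involves a couple that is also broken in species 2, so it is not counted.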

  26. Good Guys and Bad Guys. Gene order is sometimes a strong phylogenetic marker, but the Bad Guys are problematic in gene order analysis as well as in phylogenetics. Why does the evolutionary rate speed up in these isolated groups of species? Why do tRNA genes move more frequently? What are the relative rates of inversion and translocation?
      Credits:
      Daniel Jameson / Bin Tang: database design and management
      Daniel Urbina: base and amino acid frequencies
      Wei Xu: gene order analysis and arthropod phylogenies

  27. Small Scale Evolution: Variation in Frequencies of Bases and Amino Acids. Example strands: G G C A A A A T paired with C C G T T T T A. The two strands of DNA are complementary. Freq of A on one strand = freq of T on the other. Freq of C on one strand = freq of G on the other. If the two strands are subject to the same mutational processes then the freq of any base should be equal (statistically) on both strands. This means that A = T and C = G on any one strand. In this case base frequencies can be described by a single variable: G+C content. BUT mitochondrial genomes have an asymmetrical replication process. The two strands are not equivalent. The frequencies of bases on the two strands are not equal. On any one strand the frequencies of the four bases may vary independently.
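The strand relationships above are easy to check numerically. The sketch below uses the first 60 bases of the alligator record as test data; the function names are made up for this example, and the AT and GC skew measures are a common convention that is not taken from these slides.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def base_frequencies(strand):
    """Relative frequencies of A, C, G and T on one strand."""
    counts = Counter(strand.upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}

def reverse_complement(strand):
    """Sequence of the opposite strand, read 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]

def skews(strand):
    """AT skew and GC skew; both are zero when A = T and C = G on the strand."""
    f = base_frequencies(strand)
    at_skew = (f["A"] - f["T"]) / (f["A"] + f["T"])
    gc_skew = (f["G"] - f["C"]) / (f["G"] + f["C"])
    return at_skew, gc_skew

fragment = (
    "caacagactt agtcctggtc ttttcattag ctagtactca acttatacat gcaagcatcc"
).replace(" ", "")

this_strand = base_frequencies(fragment)
other_strand = base_frequencies(reverse_complement(fragment))
at_skew, gc_skew = skews(fragment)
```

By construction the frequency of A on one strand equals the frequency of T on the other. Whether A equals T on a single strand is an empirical question, which is what the skews measure.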

  28. Mitochondrial genome replication. Figure from Faith & Pollock (2003) Genetics. Rank genes in order of increasing time spent single-stranded: COI < COII < ATP8 < ATP6 < COIII < ND3 < ND4L < ND4 < ND1 < ND5 < ND2 < Cytb. ND6 is on the other strand.

  29. The Genetic Code maps the 64 DNA codons to the 20 amino acids. (This version applies to Vertebrate Mitochondria) 4-codon families where the third position is synonymous

  30. Base frequencies at FFD (four-fold degenerate) sites in each gene, averaged over mammals. Deamination: C to U and A to G on the heavy strand.
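A sketch of how base frequencies at four-fold degenerate sites might be counted from a coding sequence. It assumes the vertebrate mitochondrial code, in which eight codon families (Leu CUN, Val GUN, Ser UCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN) are four-fold degenerate; the function name and the toy sequence are made up.

```python
from collections import Counter

# First two bases (DNA alphabet) of the four-fold degenerate codon families
# in the vertebrate mitochondrial code.
FFD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}

def ffd_base_frequencies(cds):
    """Frequencies of A, C, G, T at third positions of four-fold degenerate codons."""
    cds = cds.upper()
    counts = Counter()
    usable = len(cds) - len(cds) % 3   # ignore any trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FFD_PREFIXES and codon[2] in "ACGT":
            counts[codon[2]] += 1
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"} if total else {}

# Toy reading frame: CTA GTC ATG CCA GGA ACT (ATG is not four-fold degenerate).
freqs = ffd_base_frequencies("CTAGTCATGCCAGGAACT")
```

Because any base at these sites encodes the same amino acid, their frequencies are expected to reflect mutation rather than selection on the protein.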

  31. Base frequencies at FFD sites are controlled by mutation. Base frequencies at 1st and 2nd positions are influenced by mutation and selection. Model fitting (data from fish): assume a fraction of fixed sites and a fraction of neutral sites. Selection at 1st position is weaker than at 2nd.

  32. Mutation pressure is sufficient to cause change in amino acid frequencies.

  33. Slopes of amino acid frequency versus base frequency show the response of the amino acid to mutational pressure. Black = fish, white = mammals. Amino acids in the first two columns of the code have larger slopes.

  34. Physical Properties of Amino Acids. Each amino acid is a point in 8-d space (figure axes: y1, y2, y3). dij = Euclidean distance between amino acids i and j in 8-d space.
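Written out, the Euclidean distance between two amino acids takes the standard form below. The symbol y_{ik} for the value of property k of amino acid i is notation assumed here; any scaling or normalisation of the eight properties before taking distances is not stated in the slide.

```latex
d_{ij} = \sqrt{\sum_{k=1}^{8} \left( y_{ik} - y_{jk} \right)^{2}}
```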

  35. Principal Component Analysis. Projects the 8-d space onto the two 'most important' dimensions. Axis labels: Big to Small; Hydrophobic to Hydrophilic.

  36. The genetic code (first position in rows, second position in columns, third position on the right):

      First   T      C      A      G      Third
      T       F      S      Y      C      T
      T       F      S      Y      C      C
      T       L      S      Stop   W      A
      T       L      S      Stop   W      G
      C       L      P      H      R      T
      C       L      P      H      R      C
      C       L      P      Q      R      A
      C       L      P      Q      R      G
      A       I      T      N      S      T
      A       I      T      N      S      C
      A       M      T      K      Stop   A
      A       M      T      K      Stop   G
      G       V      A      D      G      T
      G       V      A      D      G      C
      G       V      A      E      G      A
      G       V      A      E      G      G

      Responsiveness measures how much an amino acid frequency varies in response to mutational pressure = root mean square of 8 slopes for each amino acid (i.e. 4 bases x 2 data sets). Proximity measures how similar the neighbouring amino acids are in the genetic code = mean of 1/d for accessible amino acids, e.g. Prox (T) =
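The two summary measures can be sketched directly from their definitions. The function names and the input numbers in the tests are invented for illustration, and the sketch assumes a plain unweighted mean over the accessible amino acids, since the slide does not say whether neighbours are weighted.

```python
import math

def responsiveness(slopes):
    """Root mean square of the slopes for one amino acid.

    The slide uses 8 slopes per amino acid (4 bases x 2 data sets).
    """
    return math.sqrt(sum(s * s for s in slopes) / len(slopes))

def proximity(distances):
    """Mean of 1/d over the amino acids accessible by a single base change.

    `distances` holds the distance d in property space to each accessible
    amino acid. Unweighted mean: an assumption of this sketch.
    """
    return sum(1.0 / d for d in distances) / len(distances)
```

For example, `proximity([2.0, 4.0])` is (1/2 + 1/4) / 2 = 0.375: near neighbours (small d) raise the score, distant ones contribute little.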

  37. Responsiveness and Proximity are highly correlated: R = 0.87 (p < 10^-6). An amino acid frequency responds to mutational pressure more easily if there are neighbouring amino acids with similar physical properties. Urbina et al. (2006) J. Mol. Evol.

  38. Homo sapiens, strand = +, 3624 codons. Codon usage (amino acid, codon, count; * = stop):

      F UUU  69    S UCU  29    Y UAU  35    C UGU   5
      F UUC 139    S UCC  99    Y UAC  89    C UGC  17
      L UUA  65    S UCA  81    * UAA   4    W UGA  90
      L UUG  11    S UCG   7    * UAG   3    W UGG   9
      L CUU  65    P CCU  37    H CAU  18    R CGU   6
      L CUC 167    P CCC 119    H CAC  79    R CGC  26
      L CUA 276    P CCA  52    Q CAA  82    R CGA  28
      L CUG  42    P CCG   7    Q CAG   8    R CGG   0
      I AUU 112    T ACU  50    N AAU  29    S AGU  11
      I AUC 196    T ACC 155    N AAC 131    S AGC  37
      M AUA 165    T ACA 132    K AAA  84    * AGA   1
      M AUG  32    T ACG  10    K AAG   9    * AGG   0
      V GUU  22    A GCU  39    D GAU  12    G GGU  16
      V GUC  45    A GCC 123    D GAC  51    G GGC  87
      V GUA  61    A GCA  79    E GAA  63    G GGA  61
      V GUG   8    A GCG   5    E GAG  15    G GGG  19

  39. Frequency ratios.

      Fish - 23
      UU 1.250    CU 0.939    GU 0.605
      UC 0.756    CC 1.205    GC 0.878
      UA 1.030    CA 0.938    GA 1.145
      UG 1.274    CG 0.554    GG 1.891

      Mammals - 23
      UU 0.939    CU 1.101    GU 0.763
      UC 0.743    CC 1.163    GC 1.005
      UA 1.136    CA 0.906    GA 1.027
      UG 1.433    CG 0.552    GG 1.654

      Fish - 31
      UU 0.933    CU 1.162    AU 0.907    GU 0.911
      UC 0.918    CC 1.371    AC 0.739    GC 0.839
      UA 1.096    CA 0.849    AA 1.135    GA 0.758
      UG 1.049    CG 0.609    AG 1.228    GG 1.499

      Mammals - 31
      UU 0.855    CU 1.082    AU 0.996    GU 1.115
      UC 0.994    CC 1.363    AC 0.797    GC 0.873
      UA 1.206    CA 0.945    AA 0.974    GA 0.776
      UG 0.856    CG 0.546    AG 1.293    GG 1.369

      Codon bias seems to be a dinucleotide mutational effect in mitochondria, rather than an effect of translational selection. Possible causes: the CpG effect (an increased rate of C to U mutations in CG dinucleotides; expect high UG and CA); DNA binding proteins.
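One common way to define a dinucleotide frequency ratio is observed frequency divided by the product of the single-base frequencies at the two sites. The sketch below uses that definition as an assumption; the slide does not give its formula. It also assumes that "23" refers to codon positions 2 and 3, and "31" to position 3 followed by position 1 of the next codon. Function names are hypothetical.

```python
from collections import Counter

def dinucleotide_ratios(pairs):
    """Observed / expected frequency for each two-base string in `pairs`.

    Expected frequency is the product of the frequency of the first base at
    the first site and of the second base at the second site.
    """
    pairs = list(pairs)
    n = len(pairs)
    pair_counts = Counter(pairs)
    first = Counter(p[0] for p in pairs)
    second = Counter(p[1] for p in pairs)
    return {
        p: (c / n) / ((first[p[0]] / n) * (second[p[1]] / n))
        for p, c in pair_counts.items()
    }

def pairs_23(cds):
    """Dinucleotides formed by codon positions 2 and 3."""
    usable = len(cds) - len(cds) % 3
    return [cds[i + 1:i + 3] for i in range(0, usable, 3)]

def pairs_31(cds):
    """Dinucleotides formed by position 3 of a codon and position 1 of the next."""
    usable = len(cds) - len(cds) % 3
    return [cds[i + 2] + cds[i + 3] for i in range(0, usable - 3, 3)]

toy = "ACGACGTTA"                      # codons ACG ACG TTA
ratios_23 = dinucleotide_ratios(pairs_23(toy))
```

A ratio near 1 means the two bases occur together about as often as expected from their individual frequencies; a low CG ratio, as in the tables above, indicates depletion of that dinucleotide.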

  40. Changes in tRNA content of genomes from bacteria to mitochondria Only one type of tRNA remains for each codon family in human mitochondria. Still need 2 tRNAs for Leu and Ser. Therefore 22 in total. # denotes intracellular parasite or endosymbiont. Small size genomes in bacteria also have reduced numbers of tRNAs.

  41. Evolution of the Genetic Code: Before and After the LUCA. The genetic code evolved to its canonical form before the Last Universal Common Ancestor of Archaea, Bacteria and Eukaryotes, more than 3 billion years ago. It appears to be highly optimized. How did it get to be this way? Numerous small changes have occurred to the canonical code since then. What is the mechanism of codon reassignment?

  42. Codon Reassignment. The genetic code is variable in mitochondria (and also in some other types of genomes):
      UGA: Stop to Trp
      AUA: Ile to Met
      CUN: Leu to Thr
      CGN: Arg to unassigned
      AGR: Arg to Ser to Stop/Gly
      etc.
      But how can this happen? It should be disadvantageous.
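The effect of a reassignment is easy to see by translating the same sequence under two codes. This sketch builds the standard code and derives the vertebrate mitochondrial code from it by applying four reassignments (UGA to Trp, AUA to Met, AGA and AGG to Stop); the variable and function names are made up, and codons are written in the DNA alphabet.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(c) for c in product(BASES, repeat=3)]
STANDARD = dict(zip(CODONS, STANDARD_AA))

# Vertebrate mitochondrial code expressed as reassignments of the standard code.
VERT_MITO = {**STANDARD, "TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}

def translate(dna, code):
    """Translate a reading frame codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))

frame = "ATGATATGAAGA"                     # codons ATG ATA TGA AGA
as_standard = translate(frame, STANDARD)   # read with the standard code
as_mito = translate(frame, VERT_MITO)      # read with the mitochondrial code
```

Three of the four codons are read differently under the two codes, which is why a reassignment is expected to be disadvantageous wherever the codon is still in use.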

  43. Example 1: AUA was reassigned from Ile to Met during the early evolution of the mitochondrial genome.

  44. Example 2: UGA was reassigned from Stop to Trp many times (12 times in mitochondria).

  45. The GAIN-LOSS framework (Sengupta & Higgs, Genetics 2005). LOSS = deletion or loss of function of a tRNA or RF. GAIN = gain of a new tRNA or a gain of function of an existing one.
      Initial code: no problem.
      Initial code → (GAIN) → ambiguous codon: selective disadvantage.
      Initial code → (LOSS) → unassigned codon: selective disadvantage.
      Ambiguous codon → (LOSS) → new code, or unassigned codon → (GAIN) → new code: selective disadvantage because codons are used in the wrong places.
      New code → (mutations in coding sequences) → new code with codons now used in the right places: no problem.
      Note: the strength of the selective disadvantage depends on the number of times the codon is used. There is no disadvantage if the codon disappears.

  46. Four possible mechanisms of codon reassignment.
      1. Codon Disappearance: the codon disappears. The order of the gain and loss is irrelevant. For the other three mechanisms the codon does not disappear.
      2. Ambiguous Intermediate: the gain happens before the loss. There is a period when the gain is fixed in the population and translation is ambiguous.
      3. Unassigned Codon: the loss happens before the gain. There is a period when the loss is fixed in the population and the codon is unassigned.
      4. Compensatory Change: the gain and loss are fixed in the population simultaneously (although they do not arise at the same time). There is no intermediate period between the old and the new codes. Compare the theory of compensatory substitutions in RNA helices.
      Sengupta & Higgs (2005) showed that all four mechanisms work in a population genetics simulation.

  47. Summary of Codon Reassignments in Mitochondria. The CD (Codon Disappearance) mechanism explains reassignment of stop codons because they are rare initially. There are only a few examples of CD for sense codons. UC (Unassigned Codon) and AI (Ambiguous Intermediate) are important for sense codons.

  48. Three examples in yeasts (mutation pressure GC to AU).
      CUN is rare (replaced by UUR): CUN reassigned from Leu to Thr.
      CGN is rare (replaced by AGR): CGN Arg codons become unassigned.
      AUA and AUU are common and AUC is rare: nevertheless AUA is reassigned to Met. The codon does not disappear.

  49. Leu and Arg codons in yeasts. Codon Disappearance causes reassignments.
      * CUN = Thr. Unusual tRNA-Thr present instead of tRNA-Leu.
      ** CGN = unassigned. tRNA-Arg is deleted.

  50. AUA Ile to Met in Yeasts.

      codon   amino acid   anticodon
      AUU     Ile          GAU
      AUC     Ile          GAU (same tRNA)
      AUA     Ile          k2CAU
      AUG     Met          CAU
