Plastid genomes • Plastids are small organelles in the cytoplasm of plant cells. The most important are the chloroplasts. Other plastids contain red, orange, and yellow pigments, giving color to petals and fruits, and some contain starch, oil, etc., acting as storage organelles. • 30 genomes finished for 29 organisms • http://megasun.bch.umontreal.ca/ogmp/projects/other/cp_list.html
Chloroplast DNA (cpDNA) • circular double helix; 20-80 copies per chloroplast • sequences for • gene expression (tRNA, rRNA, etc.) • photosynthesis (proteins) • no recombination • uniparental inheritance • conservative evolution • nuclear genetic code
Genomes • The whole genomes of over 800 organisms can be found in Entrez Genomes. The genomes represent both completely sequenced organisms and those for which sequencing is in progress. All three main domains of life - bacteria, archaea, and eukaryota - are represented, as well as many viruses.
Genome miniaturization • Use and disuse philosophy • Mt genome size following endosymbiosis • Reclinomonas (62 protein-encoding genes) • Plastid genome size in parasites • Epifagus (beechdrops)
Phylogenetic distribution of gene loss from chloroplast genomes. Colour keys designating frequency of parallel gene losses are given at top right. Numbers below species names indicate the number of protein-coding genes and ycfs in the corresponding chloroplast genome. Numbers above gene columns represent the number of genes lost which are accounted for in the figure for the given genome. The symbols for primary and secondary symbiosis are indicated. Five genes were excluded from gene-loss analysis for reasons indicated at the lower left. Some highly divergent proteins may have escaped detection with BLAST searches. Functional, transferred nuclear homologues of chloroplast origin are indicated in white rectangles. In Pinus, four ndh genes are completely missing (ndhA, ndhF, ndhG, ndhJ); the other seven are pseudogenes (ref. 23) and are scored as losses here.
Genes are just one of many types of DNA sequences • single copy genes • multiple copy genes • noncoding repetitive sequences (often, most of genome!)
Increase in genome size • Regional (particular sequence is multiplied) • Gene duplication, unequal crossing over • Global (entire genome or chromosome is duplicated) • Polyploidization • Transposons
Polyploidy • Allopolyploidy: the combination of genetically distinct chromosome sets • Autopolyploidy: multiplication of one basic set of chromosomes
Tetraploidy • Genome doubling • Most common • Is found in most organisms
Survive only rarely • Prolongation of cell division time • Increase in the volume of the nucleus • Increase in the number of chromosome disjunctions • Genetic imbalance • Interference with sexual differentiation
Arabidopsis • 115.4 megabases sequenced out of 125 Mb • Whole-genome duplication, gene loss, and lateral transfer from the plastid
Appearance of genomes: what does 50 kb of sequence look like? • One to many chromosomes • Repeat sequences are common in some genomes, e.g. 35% of the human genome is transposable elements • Gene structure varies: number and length of introns • [Figure: 50 kb of sequence from three genomes; legend: repeat, pseudogene, intron-exon components of a gene] • Human: very few genes, repeats • Yeast: many genes (~25), few repeats • Maize: mostly repeats
Gene Duplication • Partial or internal gene duplication • Complete gene duplication • Partial chromosomal duplication • Polyploidy or genome duplication
Gene Duplication • Duplicative transposition • Unequal crossing-over • Replication slippage • Gene amplification (rolling circle replication)
Antifreeze glycoprotein gene • Fish living in the Antarctic Ocean have body temperatures of -1.0 to -0.7 °C • Freezing resistance is due to a protein in the blood that adsorbs to small ice crystals and inhibits their growth
Internal Gene Duplication • [Figure: derivation of the antifreeze glycoprotein gene from an ancestral trypsinogen gene, each stage drawn 5’ to 3’] • Ancestral trypsinogen gene (exons 1 2 3 4 5 6) • Deletion, leaving 1 and 6’ • 4-fold duplication of the Thr Ala Ala Gly element + addition of spacer sequence • Internal duplications + addition of intron sequence (Spacer: Gly …; units numbered 1 to 41 between 1 and 6’) • Antifreeze glycoprotein gene
Gene trees vs species trees • [Figure: species tree for Species A, B, and C with gene lineages 1, 2, and 3 drawn inside it; labels: duplication event, gene loss (lineage goes extinct); the gene tree and the species tree are then shown side by side]
Purines: A, G • Pyrimidines: C, T
Rates of Nucleotide Substitution • Basic quantity in studying molecular evolution • Among genes • Within genes • Among organisms • Among codon positions or secondary structure
Different Gene Regions • Coding regions • Nondegenerate sites • Twofold degenerate sites • Fourfold degenerate sites • Noncoding regions • 5’ & 3’ untranslated regions • Introns • Pseudogenes
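The site classes listed above can be made concrete with a minimal Python sketch. It assumes the standard genetic code and calls a codon position n-fold degenerate when n of the four possible bases at that position encode the same amino acid. The function name `degeneracy` is illustrative, not taken from the slides. Note that the third position of the isoleucine codons comes out threefold under this counting; textbook treatments often fold that case into the twofold class.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(codon, pos):
    """Return 1, 2, 3, or 4: how many bases at `pos` (0-based) give the same amino acid."""
    codon = codon.upper()
    aa = CODON_TABLE[codon]
    synonymous_alternatives = sum(
        1
        for base in BASES
        if base != codon[pos]
        and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == aa
    )
    return synonymous_alternatives + 1


if __name__ == "__main__":
    for codon in ("GGT", "TTT", "ATT", "ATG"):
        print(codon, [degeneracy(codon, i) for i in range(3)])
```

A value of 1 corresponds to a nondegenerate site, 2 to a twofold degenerate site, and 4 to a fourfold degenerate site.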
Table 4.1 Rates of synonymous and nonsynonymous nucleotide substitutions (± standard errors) in various mammalian protein-coding genes
Table 4.2 Rates of transitional and transversional substitutions (per site per 10^9 years) at nondegenerate, twofold degenerate, and fourfold degenerate codon sites. The rates are averages over the genes in Table 4.1.
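A transition exchanges a purine for a purine (A and G) or a pyrimidine for a pyrimidine (C and T); a transversion exchanges a purine for a pyrimidine or the reverse. As a rough sketch of how the two kinds of difference are counted between aligned sequences (the function names are illustrative, and raw counts like these are not yet rates per site per year):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Classify a pair of aligned bases as identical, transition, or transversion."""
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences.

    Columns containing anything other than A, C, G, T (gaps, N) are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        kind = classify_substitution(a, b)
        if kind == "transition":
            transitions += 1
        elif kind == "transversion":
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(count_ts_tv("ACGT", "GCTT"))
```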
Causes of Rate Variation • Functional constraints
Causes of Rate Variation • Synonymous vs. nonsynonymous rates • Without selection the two rates should be similar (Ka/Ks = 1) • Why not? • Selection • Advantageous (Ka/Ks > 1) • Purifying (Ka/Ks < 1)
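The Ka/Ks comparison can be sketched in a few lines of Python. The `tolerance` band around 1 is a hypothetical convenience for this illustration only; a real analysis would use a statistical test rather than a fixed cutoff, and the input values below are made up.

```python
def selection_regime(ka, ks, tolerance=0.1):
    """Return (Ka/Ks, label) for nonsynonymous rate `ka` and synonymous rate `ks`.

    `tolerance` is an arbitrary illustrative band around 1, not a statistical test.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if ratio < 1 - tolerance:
        label = "purifying selection"
    elif ratio > 1 + tolerance:
        label = "advantageous (positive) selection"
    else:
        label = "no clear departure from neutrality"
    return ratio, label


if __name__ == "__main__":
    # Hypothetical rates, in substitutions per site per 10^9 years.
    print(selection_regime(0.2, 1.5))
    print(selection_regime(3.0, 1.5))
```

Most protein-coding genes fall in the first category, which is why the ratio is usually well below 1.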
Causes of Rate Variation Variation within a gene
Causes of Rate Variation • Variation among genes • Rate of mutation • The intensity of selection (1000 fold in Ks) • Intensity of purifying selection (functional constraint) • Partial loss of function • Relaxation of selection
Nucleotide Substitution Rates in Eukaryotic Genomes
Genome | Ks rate | Relative Ks rate | Ka rate
Angiosperm mt | 0.5 | 1 | 0.1
Angiosperm cp, single copy | 1.5 | 3 | 0.2
Angiosperm cp, inverted repeat | 0.3 | 0.6 | 0.1
Angiosperm nuc. | 5.4 | 12 | 0.4
Mammalian nuc. | 2-8 | 4-16 | 0.5-1.3
Mammalian mt | 20-50 | 40-100 | 2-3
Estimated rates in substitutions/site/10^9 years. From Palmer, 1991.
Phylogenetic trees are about visualizing evolutionary relationships • "Nothing in Biology Makes Sense Except in the Light of Evolution" (Theodosius Dobzhansky, 1900-1975)
Trees • Diagram consisting of branches and nodes • [Figure: rooted tree with leaves A B C D E] • terminal node (leaf) • interior node (vertex) • branch (edge) • root of tree • split (bipartition), also written AB|CDE or portrayed **---
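The split notation above can be reproduced with a short sketch. It assumes a tree stored as nested Python tuples with leaf names as strings, which is an illustrative representation, not a standard file format. Every interior branch divides the leaves into two sets; the code collects those bipartitions and prints each one in the `**---` style.

```python
def leaves(tree):
    """Return the set of leaf names below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def nontrivial_splits(tree):
    """Return the splits induced by interior branches.

    Each split is a frozenset holding its two sides; splits that separate
    a single leaf from the rest are left out.
    """
    all_leaves = leaves(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaves(node)
        other = all_leaves - side
        if len(side) > 1 and len(other) > 1:
            splits.add(frozenset([side, other]))
        for child in node:
            visit(child)

    visit(tree)
    return splits


def split_pattern(side, taxa):
    """Portray one side of a split as a string of '*' (in) and '-' (out)."""
    return "".join("*" if taxon in side else "-" for taxon in taxa)


if __name__ == "__main__":
    tree = (("A", "B"), ("C", ("D", "E")))
    for split in nontrivial_splits(tree):
        side = min(split, key=sorted)  # print the side that sorts first
        print(split_pattern(side, "ABCDE"))
```

The two branches that meet at the root describe the same split (AB|CDE), which is why storing each split as an unordered pair of sides removes the duplicate.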
Trees • Species tree (how are my species related?) • contains only one representative from each species • when did speciation take place? • all nodes indicate speciation events • Gene tree (how are my genes related?) • normally contains a number of genes from a single species • nodes relate either to speciation or gene duplication events
Terms • Clade: A set of species which includes all of the species derived from a single common ancestor • Monophyly: • Polyphyly • Paraphyly
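The clade definition suggests a simple test for monophyly: a group is monophyletic on a given rooted tree if some node has exactly that group as its set of descendant leaves. The sketch below assumes a nested-tuple tree and a small illustrative four-taxon topology; it only answers the yes/no question and does not try to separate paraphyly from polyphyly.

```python
def collect_clades(tree, found):
    """Add the leaf set of every node in a nested-tuple tree to `found`."""
    if isinstance(tree, str):
        leaf_set = frozenset([tree])
    else:
        leaf_set = frozenset().union(
            *(collect_clades(child, found) for child in tree)
        )
    found.add(leaf_set)
    return leaf_set


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of leaves descended from one node."""
    found = set()
    collect_clades(tree, found)
    return frozenset(group) in found


if __name__ == "__main__":
    # Illustrative rooted topology; the taxon sampling is deliberately tiny.
    tree = ("mammal", (("lizard", "snake"), ("crocodile", "bird")))
    print(is_monophyletic(tree, {"lizard", "snake"}))
    print(is_monophyletic(tree, {"crocodile", "bird"}))
    print(is_monophyletic(tree, {"lizard", "snake", "crocodile"}))
```

On this topology the last group fails the test because it leaves out the bird, which shares an ancestor with the crocodile.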
Monophyletic • Paraphyletic • [Figure: example trees with tips labeled A, B, and C; BRANCH and NODE marked]
Polyphyletic (Reptiles) • [Figure: example tree with tips labeled A, B, and C; BRANCH and NODE marked]
Phylogeny Estimation • Camin-Sokal Parsimony • Wagner Parsimony • Fitch Parsimony • Transversion Parsimony • Generalized Parsimony • Transition/transversion bias • Nucleotide composition • Among-site rate variation • Synonymous/nonsynonymous • Relaxed clock models
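Of the parsimony variants listed, Fitch parsimony is the simplest to illustrate: character states are unordered and every change costs one step. The sketch below assumes a rooted binary tree stored as nested tuples and an invented four-taxon alignment. It scores a fixed tree; it does not search for the best one.

```python
def fitch_site(tree, states):
    """Return (candidate state set, minimum number of changes) for one site."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree  # this sketch handles binary trees only
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all columns of an alignment {taxon: sequence}."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    # Invented data for illustration.
    alignment = {"A": "AAG", "B": "AAG", "C": "ACG", "D": "GCT"}
    print(parsimony_score((("A", "B"), ("C", "D")), alignment))
    print(parsimony_score((("A", "C"), ("B", "D")), alignment))
```

With these data the first topology needs fewer changes than the second, so parsimony would prefer it.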
Distance methods • Calculate the distance, CORRECTING FOR MULTIPLE HITS • The Distance Matrix
7
Rat 0.0000 0.0646 0.1434 0.1456 0.3213 0.3213 0.7018
Mouse 0.0646 0.0000 0.1716 0.1743 0.3253 0.3743 0.7673
Rabbit 0.1434 0.1716 0.0000 0.0649 0.3582 0.3385 0.7522
Human 0.1456 0.1743 0.0649 0.0000 0.3299 0.2915 0.7116
Opossum 0.3213 0.3253 0.3582 0.3299 0.0000 0.3279 0.6653
Chicken 0.3213 0.3743 0.3385 0.2915 0.3279 0.0000 0.5721
Frog 0.7018 0.7673 0.7522 0.7116 0.6653 0.5721 0.0000
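One common way to correct for multiple hits is the Jukes-Cantor (JC69) model, which converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The slides do not say which correction produced the matrix above, so the sketch below is only an example of the idea, and the sequences in it are invented.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Columns containing anything other than A, C, G, T are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGTACGTAC", "ACGTACGTAA")
    print(p, jc69_distance(p))
```

The corrected distance is always at least as large as p, and the gap widens as sequences diverge, because more of the substitutions have been hidden by later changes at the same site.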