Genome structure and evolution Jan Pačes Institute of Molecular Genetics AS CR
genome sizes • Arabidopsis thaliana: genome size ~100 Mbp • Psilotum nudum: genome size ~250 Gbp
irregular genome sizes? • Schizosaccharomyces pombe • fission yeast, genome smaller than that of many bacteria • genome 12 462 637 bp, 4 929 genes • Mimivirus • virus of an amoeba • genome 1 181 404 bp, 1 262 genes • Tetraodon nigroviridis (pufferfish) • same number of genes as human, genome only 1/10th the size • 300 Mbp, 27 918 genes
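The contrast is easier to see as gene density. A minimal Python sketch using only the figures quoted on this slide; the dictionary and function names are illustrative, not part of the original deck:

```python
# Genome size (bp) and gene count exactly as quoted on the slide.
genomes = {
    "Schizosaccharomyces pombe": (12_462_637, 4_929),
    "Mimivirus": (1_181_404, 1_262),
    "Tetraodon nigroviridis": (300_000_000, 27_918),
}


def genes_per_mbp(size_bp, n_genes):
    """Return gene density in genes per million base pairs."""
    return n_genes / (size_bp / 1e6)


densities = {name: genes_per_mbp(*values) for name, values in genomes.items()}

# Print from densest to sparsest genome.
for name, density in sorted(densities.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {density:.0f} genes/Mbp")
```

Genome size and gene count clearly do not scale together: the virus packs genes roughly ten times more densely than the pufferfish.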
C-value • C-value refers to the amount of DNA contained within a haploid nucleus • measured in picograms (pg) • among diploid organisms the terms C-value and genome size are used interchangeably • in polyploids the C-value may represent two or more genomes contained within the same nucleus • in animals, C-values range more than 3,300-fold • genome size (bp) = (0.978 × 10^9) × DNA content (pg) • DNA content (pg) = genome size (bp) / (0.978 × 10^9) • 1 pg = 978 Mbp
genome sizes • 0.0023 pg in the parasitic microsporidium Encephalitozoon intestinalis • 1 400 pg in a protist, the free-living amoeba Chaos chaos • Gregory T, http://www.genomesize.com
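The conversion from the C-value slide can be applied directly to the two extremes above. A small sketch; the constant and function names are illustrative, and the factor 0.978 × 10^9 bp per pg is the one given in the deck:

```python
BP_PER_PG = 0.978e9  # base pairs per picogram of DNA (factor from the C-value slide)


def pg_to_bp(picograms):
    """Convert DNA content in picograms to genome size in base pairs."""
    return picograms * BP_PER_PG


def bp_to_pg(base_pairs):
    """Convert genome size in base pairs to DNA content in picograms."""
    return base_pairs / BP_PER_PG


# C-values quoted on the slide (pg).
c_values = {
    "Encephalitozoon intestinalis": 0.0023,
    "Chaos chaos": 1400,
}

for species, pg in c_values.items():
    print(f"{species}: {pg} pg = {pg_to_bp(pg) / 1e6:,.1f} Mbp")
```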
C-value enigma • What types of non-coding DNA are found in different eukaryotic genomes, and in what proportions? • From where does this non-coding DNA come, and how is it spread and/or lost from genomes over time? • What effects, or perhaps even functions, does this non-coding DNA have for chromosomes, nuclei, cells, and organisms? • Why do some species exhibit remarkably streamlined chromosomes, while others possess massive amounts of non-coding DNA? • What is the minimal genome?
e-cell • model and reconstruct biological phenomena in silico http://www.e-cell.org
Synthetic genomes • Mycoplasma laboratorium • Gibson D, et al. (2008): Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome. Science. DOI: 10.1126/science.1151721 • Synthia • synthetic bacterium whose genome, derived from that of Mycoplasma mycoides, was chemically synthesized from scratch and transplanted into a Mycoplasma capricolum cell • Gibson D, et al. (2010): Creation of a bacterial cell controlled by a chemically synthesized genome. Science. DOI: 10.1126/science.1190719
just for fun – watermarks • VENTERINSTITVTE • CRAIGVENTER • HAMSMITH • CINDIANDCLYDE • GLASSANDCLYDE • "TO LIVE, TO ERR, TO FALL, TO TRIUMPH, TO RECREATE LIFE OUT OF LIFE." • "SEE THINGS NOT AS THEY ARE, BUT AS THEY MIGHT BE." • "WHAT I CANNOT BUILD, I CANNOT UNDERSTAND."
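The all-capitals names are, as far as I know, written in the single-letter amino acid code, which would explain the V standing in for the missing letter U in VENTERINSTITVTE. The sketch below shows the idea with the standard genetic code. The codon chosen for each letter is an arbitrary assumption, so the DNA it produces is not the sequence actually used in the synthetic genome, and the longer quotations were encoded with a different scheme that is not reproduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical choice: use the first codon in table order for each amino acid.
AA_TO_CODON = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODON.setdefault(aa, codon)


def encode(text):
    """Write text as DNA, one codon per letter (letters must be amino acid codes)."""
    return "".join(AA_TO_CODON[letter] for letter in text)


def decode(dna):
    """Translate DNA back to single-letter amino acid codes, codon by codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


watermark = "CRAIGVENTER"
dna = encode(watermark)
print(dna, "->", decode(dna))
```

Letters with no amino acid assigned (B, J, O, U, X, Z) cannot be written this way, which constrains what such a watermark can spell.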
homo sapiens, gene distribution Saccone S, et al. (2001) Chromosome Res.
structure of human genome • To date, 3,164.7 million nucleotides have been read. • The average gene is 3 thousand nucleotides long; the longest gene (dystrophin) is 2.4 million nucleotides long. • The number of genes is between 20k and 30k (23k). • Less than 2% of the genome codes for proteins. • The function of more than 50% of the genes is unknown. • DNA is more than 99.9% identical between all humans. • Repetitive elements, which do not code for proteins ("junk DNA"), make up more than 50% of the human genome. • Entropy rate is around 1.7 (0.9 for the Y chromosome). • Around 20% of our genome is transcribed.
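The slide does not say how the entropy rate was estimated or in which units; bits per base (maximum 2 for a four-letter alphabet) seems the likely reading. As a rough illustration only, the sketch below estimates entropy per base from overlapping k-mer frequencies. The function name and the toy sequences are assumptions, and this block estimate is not necessarily the method behind the quoted values.

```python
import math
from collections import Counter


def kmer_entropy_per_base(seq, k=1):
    """Shannon entropy of overlapping k-mers, divided by k (bits per base)."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    # H = sum p * log2(1/p), written with total/count to avoid a negative zero.
    entropy = sum(c / total * math.log2(total / c) for c in counts.values())
    return entropy / k


uniform = "ACGT" * 25   # all four bases equally frequent
homopolymer = "A" * 100  # a pure repeat carries no information

print(kmer_entropy_per_base(uniform))        # single-base composition
print(kmer_entropy_per_base(homopolymer))
print(kmer_entropy_per_base(uniform, k=2))   # longer context exposes the periodicity
```

Repeats lower the estimate once the context length k is large enough to see them, which is the intuition behind a repeat-rich chromosome having a lower entropy rate.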
importance of “junk” DNA • syncytin (adapted ancestral env polyprotein) • Blond JL (1999): Molecular characterization and placental expression of HERV-W, a new human endogenous retrovirus family. J Virol • social behavior in rodents (and possibly humans) • Hammock EA, Young LJ (2005): Microsatellite instability generates diversity in brain and sociobehavioral traits. Science • regulation of gene expression and promotion of genetic diversity • Peaston A, et al (2004): Retrotransposons Regulate Host Genes in Mouse Oocytes and Preimplantation Embryos. Developmental Cell • evolution of sequences, for example, an antifreeze-protein gene in a species of fish • DeVries AL and Cheng C-HC (2005): Antifreeze proteins in polar fishes. Fish Physiology • source of microRNAs • Woolfe A, et al (2005): Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol • LINE-1 capable of repairing broken strands of DNA • Morrish TA, et al (2002): DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nature Genetics
synthesizing non-natural parts from natural genomic template • Journal of Biological Engineering 2009, 3:2 • doi:10.1186/1754-1611-3-2 • Pawan K Dhar, Chaw Su Thwin, Kyaw Tun, Yuko Tsumoto, Sebastian Maurer-Stroh, Frank Eisenhaber and Uttam Surana • The current knowledge of genes and proteins comes from 'naturally designed' coding and non-coding regions. It would be interesting to move beyond natural boundaries and make user-defined parts. To explore this possibility we made six non-natural proteins in E. coli. We also studied their potential tertiary structure and phenotypic outcomes. • The chosen intergenic sequences were amplified and expressed using pBAD 202/D-TOPO vector. All six proteins showed significantly low similarity to the known proteins in the NCBI protein database. The protein expression was confirmed through Western blot. The endogenous expression of one of the proteins resulted in the cell growth inhibition. The growth inhibition was completely rescued by culturing cells in the inducer-free medium. Computational structure prediction suggests globular tertiary structure for two of the six non-natural proteins synthesized.
main events in genome evolution • mutations (SNP) • duplications • rearrangements • horizontal transfer • parasitic DNA
how and where to find transposons • Repbase • database of repetitive elements • http://www.girinst.org/repbase • RepeatMasker • searches for repeats in a genome sequence • http://www.repeatmasker.org
repetitive elements in human genome • Transposons: transposon-derived repeats, interspersed repeats • 45% of the genome • Micro- and minisatellites: simple sequence repeats, repetition of short, simple direct repeats • 3% of the genome • Duplications: duplications of genome segments of different length (10 - 300 kb); inter- and intrachromosomal • 3.3% of the genome • Other types of repetitions: centromeric and telomeric repeats • IHGSC, Nature 2001
transposons in human (vertebrate) genome • DNA transposons • retrotransposons: RNA as intermediate, reverse transcription • LTR retrotransposons (similar to retroviruses) • polyA retrotransposons (colinear with mRNA, polyA) • example shown: human chromosome 21
DNA transposons • 2-3 kb • terminal inverted repeats (50 - 100 bp) • cut-and-paste mechanism • 3% of the genome • at least 7 classes, some of them unrelated
LTR retrotransposons • LTR – long terminal repeat • Human Endogenous Retroviruses (HERVs) • RNA intermediate (RNA pol. II) • short insertion duplications (4-6 bp) • 8% of the genome • 100 000 elements, tens of families
LINE1 (L1) elements • LINE – long interspersed elements • poly A (non-LTR) retrotransposons • RNA intermediate (internal promoter for RNA pol. II) • insertion duplications of different length (5-15 bp) • insertion preference (TT | AAAA) • 17% of the genome • 500 000 elements, often truncated at the 5' end • 30-60 active LINE1 elements in the genome
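The insertion preference can be turned into a simple scan. The sketch below lists the positions where the consensus TT | AAAA occurs on either strand of a sequence. Exact matching is a simplifying assumption, since real insertion sites tolerate variants of the motif. The function names and the toy sequence are made up for illustration.

```python
import re

MOTIF = "TTAAAA"   # consensus target site, cut between TT and AAAA
CUT_OFFSET = 2     # number of motif bases 5' of the cut

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_l1_target_sites(seq):
    """Return sorted (cut_position, strand) tuples.

    cut_position is a 0-based boundary on the forward strand: the cut lies
    immediately before the base with that index.
    """
    seq = seq.upper()
    sites = []
    # A lookahead pattern reports overlapping occurrences as well.
    for match in re.finditer(f"(?={MOTIF})", seq):
        sites.append((match.start() + CUT_OFFSET, "+"))
    # A motif on the reverse strand appears as its reverse complement here.
    for match in re.finditer(f"(?={revcomp(MOTIF)})", seq):
        sites.append((match.start() + len(MOTIF) - CUT_OFFSET, "-"))
    return sorted(sites)


toy = "GGTTAAAACCTTTTAAGG"
for position, strand in find_l1_target_sites(toy):
    print(position, strand)
```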
nonautonomous elements • They do not code for the enzymes needed for their own transposition. • For each class of autonomous elements there exist nonautonomous elements. Such elements use the replication mechanism specific to their class of autonomous elements.
SINE (Alu) elements • SINE – short interspersed elements • poly A (non-LTR) retrotransposons • RNA intermediate (internal promoter for RNA pol. III) • insertion duplications (5-15 bp) • insertion preference (TT | AAAA) • 10% of the genome • 1 000 000 elements, often truncated at the 5' end
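The genome fractions and copy numbers on the last few slides imply an average length per copy. A back-of-the-envelope sketch, assuming a haploid genome of about 3.16 Gbp (an assumed round figure, not a number taken from these slides); the names are illustrative:

```python
GENOME_BP = 3.1647e9  # assumed haploid human genome size in bp

# class: (fraction of genome, approximate copy number), both from the slides
classes = {
    "LTR retrotransposons": (0.08, 100_000),
    "LINE1": (0.17, 500_000),
    "SINE (Alu)": (0.10, 1_000_000),
}


def mean_copy_length(fraction, copies, genome_bp=GENOME_BP):
    """Average number of base pairs occupied by one copy of an element class."""
    return fraction * genome_bp / copies


mean_lengths = {name: mean_copy_length(f, c) for name, (f, c) in classes.items()}

for name, length in mean_lengths.items():
    print(f"{name}: ~{length:,.0f} bp per copy")
```

Under these assumptions the mean LINE1 copy comes out at roughly 1 kb, far shorter than a full-length element of about 6 kb, which fits the remark that most copies are incomplete at the 5' end. The Alu estimate of about 300 bp is close to the length of an intact element.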
processed pseudogenes • colinear with mRNA • missing introns and promoters; poly A • often truncated at the 5' end • bordered by direct repeats of different length (4-15 bp) • insertion sites resemble those of LINE1 transposition • generated by L1
coevolution of “DNA parasites” • DNA transposons • LTR retrotransposons • polyA retrotransposons
HERV16 - example http://hervd.img.cas.cz
1000 Genomes Project: current status • Trio project: two families with ~42x coverage • Yoruba and Caucasian • Low-coverage project: ~5x coverage of unrelated individuals • 60 Yoruba, 60 Caucasians, 30 Han, 30 Japanese • Exon project: 8000 exons (900 genes) by capture array, >50x coverage, 700 unrelated individuals • + 2 individual sequences (Watson and Venter) • 1000GPC, Nature 2010
stability / fluidity of the genome • each person carries ~200 to 300 loss-of-function variants in annotated genes and 50 - 100 variants previously implicated in inherited disorders • germline substitution rate of ~10^-8 per base per generation • 1000GPC, Nature 2010
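The substitution rate translates into an expected number of new substitutions per generation. A small sketch of the arithmetic, assuming a haploid genome of about 3.16 Gbp (an assumed round figure) and treating the rate as uniform across the genome, which is a simplification; the names are illustrative:

```python
SUBSTITUTION_RATE = 1e-8      # per base per generation (value from the slide)
HAPLOID_GENOME_BP = 3.1647e9  # assumed haploid human genome size in bp


def expected_new_substitutions(rate, genome_bp, ploidy=2):
    """Expected de novo substitutions per generation for the given ploidy."""
    return rate * genome_bp * ploidy


per_haploid = expected_new_substitutions(SUBSTITUTION_RATE, HAPLOID_GENOME_BP, ploidy=1)
per_diploid = expected_new_substitutions(SUBSTITUTION_RATE, HAPLOID_GENOME_BP)

print(f"per haploid genome: ~{per_haploid:.0f}")
print(f"per diploid genome: ~{per_diploid:.0f}")
```

So a rate that looks negligible per base still amounts to a few dozen new substitutions in every newborn.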
ENCODE • Encyclopedia Of DNA Elements • Raney, NAR 2010
genome browsers • Golden Path • http://genome.ucsc.edu • ENSEMBL • http://www.ensembl.org
that’s it, thank you Institute of Molecular Genetics AS CR Free and Open Bioinformatics Association