What to do with so many "omes"?
Proteome · Interactome · Genome · Kinome · ORFome · Metabolome · Transcriptome
Alejandro O. Mujica, Mainz, Nov 2004, mujica@uni-mainz.de
Genomics timeline
[Figure: log-scale plot (10^1 to 10^7) against publication year, 1995 to mid-2004. Series: number of genomes, total genome size (kb), number of ORFs. Labelled values include 1.34 x 10^6 (?), 6.69 x 10^5, 2.27 x 10^5 (?) and 1.13 x 10^5.]
225 published eukaryotic genomes (Oct. 2004); ongoing: 452 eukaryotic, 520 prokaryotic. http://www.genomesonline.org/
DNA sequencing: a fast-paced development!
[Figure: timeline 1960 to 2000. Cost labels: $10 per base and $0.01 per base. GenBank: 28 G bases, 22 M sequences.]
Hierarchical sequencing or "chromosome walking"
Improving gene prediction by comparative analysis
Sequences: human AC016718, murine BAC221D7.
[Figure: predicted exons m1.1 to m1.5, m2.1 to m2.5 and m3.1 to m3.5, grouped as mRNA 1, mRNA 2 and mRNA 3]
• RUMMAGE prediction
• DotPlot: predicted mouse mRNA x human genomic (m3.1, m3.2, m3.4, m3.5; m3.3: uncertain!), giving the human orthologous exons h3.1, h3.2, h3.4, h3.5
• Human dbEST BLAST searches with h3.1, h3.2, h3.4, h3.5
• EST clustering: h3.X, h3.Y, h3.Z
• Swap: extended human mRNA x mouse genomic (h3.1, h3.2, h3.4, h3.5, h3.X, h3.Y, h3.Z), giving the extended murine set m2.4, m3.1, m3.2, m3.4, m3.Y, m3.5, m3.Z
• Iterate
m2.5 (with stop): false positive! Predicted genes 2 and 3 merged. m3.3: false negative.
...after 3 iterations and subsequent EST clustering, 17 exons were found for each of STK33/Stk33.
Genomic landscape (PipMaker & VISTA)
[Figure: human (Hsa, starting at position 49553) vs. mouse (Mmu, starting at position 52386) nucleotide alignment of a highly conserved stretch of about 550 bp; ~70% (A+T) content]
Human-mouse dotplot
BAC221D7, AJ307671 (158 kb)
Types of exons: non-coding; coding; coding for the kinase domain
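As a rough illustration of what a dotplot computes, here is a minimal sketch in Python. It is not the tool used for the slide: the function name `dotplot` and the exact-word matching with a fixed `word` length are assumptions made for this example. Each hit (i, j) means the word starting at position i of the first sequence also occurs at position j of the second, so conserved exons show up as diagonal runs of hits.

```python
def dotplot(seq_a, seq_b, word=3):
    """Return sorted-by-i (i, j) pairs where seq_a[i:i+word] == seq_b[j:j+word].

    Hypothetical helper for illustration: exact word matches only,
    no scoring matrix and no reverse-complement search.
    """
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)

    # Look up every word of seq_a in that index.
    hits = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences, not taken from AC016718 or BAC221D7.
    for i, j in dotplot("ACGTAC", "TACGT", word=3):
        print(i, j)
```

Longer words reduce random background hits at the cost of missing diverged regions, which is why real tools usually allow mismatches within a window.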
(G+C) content and "CpG islands"
[Figure, human panel: %GC (axis 40 to 80) along 0 to 700 kb; genes KIAA0298, C11orf14, C11orf18, C11orf15, C11orf16, C11orf17, CEPG1, ASCL3, STK33, L27a, ST5; sequence sources Amid, Mujica, Bahr; labels 42.37% (G+C), 44.94%, 39.05%]
[Figure, mouse panel: %GC (axis 40 to 80) along 0 to 600 kb; genes D7H11orf14, D7H11orf15, D7H11orf16, D7H11orf17, Kiaa0298, Cepg1, Ascl3, Stk33, L27a, St5; sequence sources Bahr, Amid, Mujica; BAC221D7 marked; labels 43.86% (G+C), 45.13%, 40.05%]
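A %GC profile of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the window and step sizes, and the toy sequences are invented here and are not the parameters behind the slide. The last function computes the observed/expected CpG ratio, (number of CpG x length) / (number of C x number of G), a quantity commonly used when screening for CpG islands.

```python
def gc_percent(seq):
    """Percentage of G and C in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window, step):
    """(start, %GC) for each full window of the given size along seq."""
    return [
        (start, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio; 0.0 when seq has no C or no G."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    toy = "GGGGAAAA"  # toy sequence, not real genomic data
    print(gc_windows(toy, window=4, step=4))
    print(cpg_obs_exp("CGCG"))
```

On genomic sequence, windows of several hundred bases or more would be typical; the tiny window here only keeps the example readable.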
Collins, Morgan & Patrinos. The Human Genome Project: Lessons from large-scale biology. Science 2003, 300: 286-290
The N50 situation: did they, did they not?
N50 length: a measure of assembly contiguity, defined as the largest length L such that 50% of all nucleotides are contained in contigs of size at least L.

                Human               Mouse
                IHGSC     Celera    Celera    MGSC
Coverage        7.5       5.1       5.3       7.7
N50 length      82 kb     86 kb     14 kb     29 kb
Date            02.2001   02.2001   04.2002   12.2002

• Celera relied on public data for their assembly. No true WGS assembly. (Waterston, Lander & Sulston, PNAS 2002)
• Cozzarelli's PNAS invitation: Celera's assembly is a refinement built on the HGP assemblies. (Waterston, Lander & Sulston, PNAS 2003)
• Celera's assembly had ~35,000 fewer ordering errors. (Adams, Sutton, Smith, Myers & Venter, PNAS 2003)
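The N50 definition can be made concrete with a short Python sketch. The function name `n50` and the toy contig lengths are assumptions for illustration, not data from any of the assemblies in the table. Contigs are sorted from longest to shortest and their lengths accumulated; the N50 is the length of the contig at which the running sum first reaches half of the total.

```python
def n50(contig_lengths):
    """Largest L such that contigs of length >= L hold at least 50% of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    # Toy contig lengths in kb: total 290, half is 145.
    # 80 + 70 = 150 >= 145, so the N50 is 70.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Note that N50 says nothing about correctness: an assembly can reach a high N50 by joining contigs wrongly, which is one reason ordering errors are reported separately.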
But predictions are still predictions: they must be tested in the wet lab, and surprises are guaranteed!
"The problem with being on the cutting edge is that you occasionally get sliced from time to time..." Duncan Clark
The horizontal transfer situation
Gradual rejection of the HT hypothesis -110: not characteristic bacterial
"Perilous BLASTs against a sparse dataset". Genereux and John M. Logsdon, Trends Genet. 2003
"Hundreds [223] of human genes appear likely to have resulted from horizontal transfer from bacteria at some point in the vertebrate lineage. Dozens of genes appear to have been derived from transposable elements." Lander et al., Nature 2001
A draft version is error-prone
[Figure panels: Human, Mouse, Mainz]
"Growing" version
Current response from Celera's human genome assembly (publication.celera.com): "Could not connect to JRun Server"
• Intuitive navigation
• Ongoing human genome map
• Links to VERY useful databases: OMIM, UniGene, PubMed, Nucleotide, Protein, Homology...
New human genes each time a new genome is sequenced?
Fugu rubripes genome: ~1000 new human genes. Aparicio, Chapman, Stupka et al., Science 2002
[Figure labels: Hsa, Mmu]
Whole-genome comparison: synteny
http://www.genboree.org/
What's next?
Vertical genomics · Systems biology
The first genomic characterization of a microbial community
Tyson et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature, 04 March 2004, 428: 37-49