
Sonia Abdelhak Institut Pasteur Tunis Ahmed Rebaï Centre of Biotechnology Sfax Fredj Tekaia



Presentation Transcript


  1. Genomes Databases and Open Access Bibliographic Resources Sonia Abdelhak Institut Pasteur Tunis Ahmed Rebaï Centre of Biotechnology Sfax Fredj Tekaia Institut Pasteur Paris

  2. Outline • General introduction and overview of complete genome sequences • Genomes databases and where to find them • Comparative Genomics Databases • Other Omics resources • Bibliographic/Open access resources

  3. Why databases? • In the genomic era we have billions of data points that need to be stored, curated and made accessible for analysis and knowledge discovery • Databases are essential resources for both experimental and computational biologists • Genomic data have crossed the terabyte threshold (huge, massive, explosive growth)

  4. Chronology of completely sequenced genomes • 1977: first viral genome, bacteriophage φX174, sequenced by Sanger et al. (5,386 base pairs; encoding 11 genes) • 1981: human mitochondrial genome (16,500 base pairs; encodes 13 proteins, 2 rRNAs, 22 tRNAs) • 1986: chloroplast genome (156,000 base pairs; most are 120 kb to 200 kb)

  5. • 1995: first genome of a free-living organism, the bacterium Haemophilus influenzae, by TIGR (1,830 kb; 1,713 genes) • 1996: first archaeal genome, Methanococcus jannaschii DSM 2661, by TIGR (1,664 kb; 1,773 genes) • 1997: first eukaryotic genome, Saccharomyces cerevisiae S288C, by an international collaboration (16 chromosomes; 12,057 kb; ~6,000 genes) • 1998: first multicellular organism, the nematode Caenorhabditis elegans (97 Mb; ~19,000 genes)

  6. • 1999: first human chromosome, chromosome 22 (49 Mb; 673 genes)

  7. • 2000: fruit fly Drosophila melanogaster (137 Mb; ~13,000 genes) • 2000: first plant genome, Arabidopsis thaliana (115.428 Mb; 22,670 genes) • 2001: draft sequence of the human genome (3,300 Mb; ~28,000 genes) • 2002: Plasmodium falciparum (22.9 Mb; 5,334 genes) • 2002: mouse genome (2,700 Mb; ~28,000 genes) • 2004: draft genome of the fish Tetraodon nigroviridis (x Mb; ~28,000 genes) • 2005: dog (41 Mb; 33,651 genes) and chicken (18,031 genes) genomes
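The figures on the chronology slides can be turned into a rough gene density (genes per Mb), which makes the contrast between compact prokaryotic genomes and large eukaryotic ones easier to see. The sketch below is only an illustration: the sizes and gene counts are copied from the slides (several gene counts are approximate there), and the function name `genes_per_mb` is an assumption introduced for this example.

```python
# Sizes (in Mb) and gene counts as quoted on the chronology slides.
# Gene counts marked "~" on the slides are approximate.
GENOMES = [
    ("Haemophilus influenzae", 1.830, 1713),
    ("Methanococcus jannaschii", 1.664, 1773),
    ("Saccharomyces cerevisiae", 12.057, 6000),
    ("Caenorhabditis elegans", 97.0, 19000),
    ("Drosophila melanogaster", 137.0, 13000),
    ("Homo sapiens (draft)", 3300.0, 28000),
]


def genes_per_mb(size_mb, gene_count):
    """Return the average number of genes per megabase."""
    return gene_count / size_mb


for name, size_mb, gene_count in GENOMES:
    density = genes_per_mb(size_mb, gene_count)
    print(f"{name}: {density:.1f} genes per Mb")
```

With these numbers the density falls from roughly 900 to 1,100 genes per Mb in the two prokaryotes to under 10 genes per Mb in the human draft.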

  8. Complete genomes • 2,467 projects • 524 published (03-17-07) • 1,091 Bacteria • 59 Archaea • 720 Eukaryotes. Tree of life • 3 phylogenetic domains • Lifestyles: mesophiles; (hyper)thermophiles; psychrophiles; extreme conditions, ... http://www.genomesonline.org/

  9. Genome sequencing projects There are several web-based resources that document the progress of genome sequencing projects and their reference publications, including: GOLD (Genomes OnLine Database) http://www.genomesonline.org/gold.cgi

  10. How big are genomes? • Viral genomes: 1 kb to 360 kb (Canarypox virus). Note: Mimivirus: 1.2 Mb; http://www.giantvirus.org/top.html (top 100 largest viral genome sequences) • Bacterial genomes: 0.5 Mb to 13 Mb • Eukaryotic genomes: 8 Mb to 670 Gb • Database of Genome Sizes: http://www.cbs.dtu.dk/databases/DOGS/index.php

  11. Genome Sizes (MegaBases)

  12. BIOLOGICAL DATABASE CATEGORIES • Databases of nucleic acid sequences (RNA, DNA) • Databases of protein sequences • Databases of protein motifs and protein domains • Databases of structures • Databases of genomes • Databases of genes • Databases of expression profiles • Databases of SNPs and mutations • Databases of metabolic pathways and protein associations • Databases of taxonomy • …

  13. Can we find a list of ‘clean’ databases?

  14. The NAR Database issue • The 2007 update includes 968 databases, 110 more than the previous one. • 68 new databases • updates of 106 existing databases • The complete database list and summaries are available online on the Nucleic Acids Research web site http://nar.oxfordjournals.org/

  15. NAR Database Category List • Nucleotide Sequence Databases • RNA sequence databases • Protein sequence databases • Structure Databases • Genomics Databases (non-vertebrate) • Metabolic and Signaling Pathways • Human and other Vertebrate Genomes • Human Genes and Diseases • Microarray Data and other Gene Expression Databases • Proteomics Resources • Other Molecular Biology Databases • Organelle databases • Plant databases • Immunological databases

  16. Genomics Databases (non-vertebrate) • MGD - Mouse Genome Database ????? • TIGR Gene Indices ????? • Genome annotation terms, ontologies and nomenclature • Taxonomy and identification • General genomics databases • Viral genome databases • Prokaryotic genome databases • Unicellular eukaryotes genome databases • Fungal genome databases • Invertebrate genome databases

  17. Three types of genome databases • Databases that collect data on all sequenced genomes (Entrez_Genomes; EBI_genomes) • Databases that collect data on one category of organisms with sequenced genomes (Microbial Genomes at TIGR) • Databases specific to one organism with a sequenced genome (Flybase, MGD, Ensembl)

  18. What kind of information do you find there? • Genome databases contain genomic information collected from many sources. – Genome assembly – Gene predictions – Known genes, mRNA, ESTs, proteins – Genetic maps, markers and polymorphisms – Gene expression and phenotypes – Annotations – Interspecies homologues

  19. Resources for genomes There are two main resources for genomes: • EBI (European Bioinformatics Institute) http://www.ebi.ac.uk/genomes/ • NCBI (National Center for Biotechnology Information) http://www.ncbi.nlm.nih.gov/Genomes/ There are also many other resources from sequencing institutions: • Sanger (The Wellcome Trust Sanger Institute) http://www.sanger.ac.uk/ • TIGR (The Institute for Genomic Research) http://cmr.tigr.org/tigr-scripts/CMR/shared/Genomes.cgi • Genolevures http://cbi.labri.fr/Genolevures/index.php

  20. Databases by phylogenetic groups • Eukaryotic genomes: http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi • Bacteria, fungi genomes: http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi?p3=11:Fungi&taxgroup=11:Fungi|12: • Insects: http://www.ncbi.nlm.nih.gov/genomes/leuks.cgi?p3=12:Insects&taxgroup=11:|12:Insects • Plant genomes: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html ...

  21. The (ever expanding) Entrez System: OMIM, PubMed, PubMed Central, 3D Domains, Journals, Structure, Books, CDD/CDART, Taxonomy, Protein, Genome, GEO/GDS, UniSTS, UniGene, Nucleotide, SNP, PopSet

  22. Other GenBank WGS UniGene Transcript RefSeq Contig BAC RefSeq Transcript Mouse Assembly

  23. Maps and Options

  24. Common features of genomic databases • It is possible to download all the sequences of the genome or part of them (chromosomes, clones, genes, CDS, ...) • Most of them have a corresponding protein resource (the set of proteins obtained by translating all CDS) • Example: Entrez Genome at the NCBI and GenPept
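The slide above describes protein resources as the set of proteins obtained by translating all CDS. A minimal sketch of that translation step is given below, assuming the standard genetic code only (organelles and some organisms use other codes) and a CDS that is already in frame. The names `CODON_TABLE` and `translate_cds` are hypothetical, introduced for illustration; they are not how any of the databases named here implement it.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order
# for the first, second and third codon positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Codons containing characters outside T, C, A, G (for example N)
    are translated as "X". A trailing partial codon is ignored.
    """
    cds = cds.upper()
    protein = []
    for start in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCAAGTAA"))  # prints MAK
```

Applying such a function to every annotated CDS of a genome would give a protein set of the kind the slide refers to; real pipelines also have to handle alternative genetic codes and partial or ambiguous sequences.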

  25. Comparative Genomics databases

  26. Comparative genomics Analyses of the genetic material of different species help us understand the similarities and differences between genomes, their evolution and the evolution of their genes. • Intra-genomic comparisons help us understand the degree of duplication (genome regions; genes) and gene organization, ... • Inter-genomic comparisons help us understand the degree of similarity between genomes and the degree of conservation between genes • Understanding gene and genome evolution
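One simple way to put a number on the "degree of conservation between genes" mentioned above is the percent identity of two sequences that have already been aligned. The sketch below is a toy illustration under that assumption: it takes two equal-length aligned strings with "-" for gaps and ignores gapped columns. The function name `percent_identity` is hypothetical, and whole-genome comparison tools rely on far more elaborate alignment methods.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both inputs must have the same length; "-" denotes a gap.
    Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Five ungapped columns, four of them identical.
print(percent_identity("ATGC-A", "ATGAGA"))  # prints 80.0
```

The same measure can be applied within one genome (to compare duplicated genes) or between genomes (to compare homologous genes), which mirrors the intra-genomic and inter-genomic comparisons described on the slide.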

  27. Internet resources for whole-genome comparative analysis and associated tools (resource: URL) • UCSC Genome Bioinformatics: http://genome.ucsc.edu/ • Ensembl: http://www.ensembl.org/ • MapViewer: http://www.ncbi.nlm.nih.gov/mapview/ • VISTA Genome Browser: http://pipeline.lbl.gov/ • K-BROWSER: http://hanuman.math.berkeley.edu/cgi-bin/kbrowser2 • Comparative Regulatory Genomics: http://corg.molgen.mpg.de/ • GALA: http://www.bx.psu.edu/ • EnsMart: http://www.ensembl.org/EnsMart/ • ETOPE: http://www.bx.psu.edu/ • PipMaker and MultiPipMaker: http://www.bx.psu.edu/ • VISTA server: http://www-gsd.lbl.gov/vista/ • MAVID server: http://baboon.math.berkeley.edu/mavid/ • zPicture server: http://zpicture.dcode.org/ • rVISTA server: http://rvista.dcode.org/ • COGs (Clusters of Orthologous Groups): http://www.ncbi.nlm.nih.gov/COG/
