Bioinformatics Information Resources And Networks

Bioinformatics Information Resources And Networks. ulf.schmitz@informatik.uni-rostock.de Bioinformatics and Systems Biology Group www.sbi.informatik.uni-rostock.de. Outline. Bioinformatics Information Resources And Networks EMBnet – European Molecular Biology Network DBs and Tools

Presentation Transcript


  1. Bioinformatics Information Resources And Networks ulf.schmitz@informatik.uni-rostock.de Bioinformatics and Systems Biology Group www.sbi.informatik.uni-rostock.de Ulf Schmitz, Bioinformatics Information Resources and Networks

  2. Outline • Bioinformatics Information Resources And Networks • EMBnet – European Molecular Biology Network • DBs and Tools • NCBI – National Center For Biotechnology Information • DBs and Tools • Nucleic Acid Sequence Databases • Protein Information Resources • Metabolic Databases • Mapping Databases • Databases concerning Mutations • Literature Databases

  3. EMBnet – European Molecular Biology Network • Founded in 1988 • A network that links European laboratories using biocomputing and bioinformatics in molecular biology research • A science-based group of collaborating nodes throughout Europe, plus nodes outside Europe • Provides information, services and training to its users • Works to increase the availability and accessibility of data resources and computing tools • Aims to increase knowledge and proficiency in bioinformatics through education and training

  4. EMBnet - Nodes • Governmental • Academic, industrial and research centers • Biocomputing centers from non-European countries

  5. EMBnet - Nodes • Appointed by the governments • Provide on-line services, user support and training

  6. EMBnet - Nodes • Academic, industrial or research centers in specific areas of bioinformatics • Largely responsible for the maintenance of biological databases and software • Examples: Munich Information Center for Protein Sequences; Hinxton Hall (Cambridge, UK), a key specialist node and home of the EMBL, SWISS-PROT and TrEMBL databases

  7. EMBnet - Nodes • Centers from non-European countries

  8. EMBnet’s Mission • Assist in biotechnology- and bioinformatics-related research • Provide training and education • Exploit network infrastructures • Investigate and develop new technologies • Act as a bridge between the commercial and academic sectors

  9. Who are EMBnet’s Users? • More than 40,000 registered users from all over the world, plus a larger number of Internet users • All scientists working in the life sciences, from undergraduate students to top-level scientists, in academia as well as industry, can get support from EMBnet

  10. EMBnet’s SRS – Sequence Retrieval System • The result of an EMBnet research project aimed at interrogating all gathered resources together • SRS is a network browser for molecular biology DBs • SRS allows any flat-file DB to be indexed and linked to any other • Queries run across a range of different DB types via a single interface • Independent of underlying data structures or query languages
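
To illustrate the idea behind SRS-style indexing, here is a minimal Python sketch. It is not how SRS itself is implemented. Flat-file entries are parsed once and indexed by their identifier. The cross-reference (DR) lines of one databank are then followed into another. The sketch assumes EMBL/Swiss-Prot-like two-letter line codes (ID, DE, DR) and the `//` entry terminator. The databank names `PROT` and `NUC` and both records are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample databanks in an EMBL/Swiss-Prot-like flat-file layout.
PROT = """ID   PROT1_TEST  Reviewed; 100 AA.
DE   hypothetical protein
DR   NUC; N0001; -.
//
"""
NUC = """ID   N0001 standard; DNA; 300 BP.
DE   hypothetical gene
//
"""

def parse_flat_file(text):
    """Split a flat file into entries; each entry maps a two-letter line code to its values.
    An entry is only kept once its '//' terminator has been seen."""
    entries, current = [], defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
        elif len(line) >= 2:
            # Columns 1-2 hold the line code; the value starts at column 6.
            current[line[:2]].append(line[5:].strip())
    return entries

def build_index(entries):
    """Index entries by their primary identifier (first token of the ID line)."""
    return {e["ID"][0].split()[0]: e for e in entries if "ID" in e}

def cross_references(entry, target_db):
    """Return the identifiers an entry links to in target_db, read from DR lines
    of the form 'DB; identifier; ...'."""
    refs = []
    for dr in entry.get("DR", []):
        fields = [f.strip() for f in dr.rstrip(".").split(";")]
        if fields[0] == target_db and len(fields) > 1:
            refs.append(fields[1])
    return refs

if __name__ == "__main__":
    prot_index = build_index(parse_flat_file(PROT))
    nuc_index = build_index(parse_flat_file(NUC))
    # Follow the protein's cross-reference into the nucleotide databank.
    for accession in cross_references(prot_index["PROT1_TEST"], "NUC"):
        print(accession, nuc_index[accession]["DE"][0])  # N0001 hypothetical gene
```

Indexing every databank once is what makes linked queries cheap. A query against one databank can jump to related entries in another by a dictionary lookup instead of a rescan.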

  12. EMBnet – EMBOSS • The European Molecular Biology Open Software Suite • EMBOSS is a free, open source software analysis package developed specifically for the needs of the molecular biology user community (e.g. EMBnet). • The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. • Because extensive libraries are provided with the package, it is also a platform that allows other scientists to develop and release software in a true open source spirit. • EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole.

  13. What can EMBOSS do for you? • EMBOSS contains hundreds of programs (applications) covering areas such as: • Sequence alignment • Rapid database searching with sequence patterns • Protein motif identification, including domain analysis • Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats • Codon usage analysis for small genomes • Rapid identification of sequence patterns in large-scale sequence sets • Presentation tools for publication • and much more. See: http://emboss.sourceforge.net/
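
As an example of the nucleotide pattern analysis listed above, the following Python sketch flags CpG islands with a sliding window. It is not the EMBOSS code. It assumes commonly used thresholds:

- GC content above 50%
- an observed/expected CpG ratio above 0.6
- a region length of at least 200 bp

Here observed/expected = (number of CG dinucleotides × window length) / (number of C × number of G). The function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def passing_windows(seq, window=100, min_gc=0.5, min_obs_exp=0.6):
    """Yield the start of every window whose GC fraction and obs/exp ratio both exceed the thresholds."""
    for start in range(len(seq) - window + 1):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            yield start

def cpg_islands(seq, window=100, min_len=200, **thresholds):
    """Merge overlapping passing windows into (start, end) regions and keep those of at least min_len bp."""
    islands, start, end = [], None, None
    for pos in passing_windows(seq, window, **thresholds):
        if start is not None and pos <= end:
            end = pos + window  # window overlaps the current region: extend it
        else:
            if start is not None and end - start >= min_len:
                islands.append((start, end))
            start, end = pos, pos + window
    if start is not None and end - start >= min_len:
        islands.append((start, end))
    return islands

if __name__ == "__main__":
    # Synthetic test sequence: a 300 bp CG repeat between two 200 bp AT repeats.
    print(cpg_islands("AT" * 100 + "CG" * 150 + "AT" * 100))  # [(151, 549)]
```

The reported island extends slightly beyond the CG block (positions 200 to 500). Windows that are only just over half CG still pass both thresholds.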

  14. Jemboss

  15. NCBI – National Center For Biotechnology Information • Leading American information provider • Established in 1988 as a division of the National Library of Medicine (NLM) • Located on the campus of the National Institutes of Health (NIH) in Bethesda, Maryland • Mission: • Development of new information technologies to aid our understanding of the molecular and genetic processes that underlie health and disease • Creation of systems for storing and analysing biological information • Development of advanced methods of computer-based information processing • Facilitation of user access to DBs and software • Co-ordination of efforts to gather biotechnology information worldwide

  16. NCBI • Since 1992: maintenance of GenBank and collaboration with the international nucleotide DBs EMBL and DDBJ (Japan) • Provides Entrez, a system that facilitates access to biological DBs (similar to SRS, which is provided by EMBnet)
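
Entrez can also be queried programmatically through NCBI's E-utilities web service. The minimal Python sketch below assumes the ESearch endpoint and its JSON output (`retmode=json`). It builds a query URL and extracts the matching record IDs. The helper names are hypothetical, and NCBI asks clients to limit their request rate.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term, retmax=20):
    """Build an ESearch URL asking Entrez for up to `retmax` matching IDs, returned as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

def parse_esearch(json_text):
    """Return (total number of hits, list of record IDs) from an ESearch JSON response."""
    result = json.loads(json_text)["esearchresult"]
    return int(result["count"]), result["idlist"]

if __name__ == "__main__":
    # Live request: needs network access.
    with urlopen(esearch_url("pubmed", "EMBOSS[Title]")) as response:
        count, ids = parse_esearch(response.read().decode("utf-8"))
    print(count, ids[:5])
```

Keeping URL construction and response parsing separate lets the parsing be tested offline against a saved response.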

  18. NCBI - Responsibilities • conducts research on biomedical problems at the molecular level using mathematical and computational methods • maintains collaborations with several NIH (National Institutes of Health) institutes, academia, industry, and other governmental agencies • promotes scientific communication by sponsoring meetings, workshops, and lecture series • supports training on basic and applied research in computational biology for postdoctoral fellows through the NIH Intramural Research Program • engages members of the international scientific community in informatics research and training through the Scientific Visitors Program • develops, distributes, supports, and coordinates access to a variety of databases and software for the scientific and medical communities • develops and promotes standards for databases, data deposition and exchange, and biological nomenclature

  19. Nucleic Acid Sequence Databases • The principal nucleic acid sequence databases are GenBank, EMBL and DDBJ, which each collect a portion of the total sequence data reported worldwide and exchange new and updated entries on a daily basis

  20. EMBL

  21. Nucleic Acid Sequence Databases - EMBL • At the time of the lecture, the EMBL Database contained 115,478,836,243 nucleotides in 63,713,453 entries (source: http://www3.ebi.ac.uk/Services/DBStats/)

  22. Nucleic Acid Sequence Databases - EMBL • Charts: total nucleotides (current: 115,478,836,243) and number of entries (current: 63,713,453)

  23. Nucleic Acid Sequence Databases - EMBL • Chart: EMBL content by nucleotide count for Zea mays (corn), Gallus gallus (chicken), Homo sapiens (human), environmental sequences, Danio rerio (zebrafish), Bos taurus (cattle), Canis familiaris (dog), Rattus norvegicus (rat), Pan troglodytes (chimpanzee), Mus musculus (mouse) and other species

  24. Nucleic Acid Sequence Databases – GenBank • GenBank, which is produced at NCBI, is split into smaller, discrete divisions. • This facilitates fast, specific searches by restricting queries to particular database subsets. • During 1992-1997, the volume of EST (expressed sequence tag) and STS (sequence-tagged site) data within GenBank grew 10-fold. • However, the overall sequence information contributed by such partial data was still less than that of the higher-quality sequences in the other major divisions.
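
Division-restricted searching can be sketched by reading each record's division code from its LOCUS line. The code is the three-letter token (e.g. PRI, ROD, EST, STS) just before the date. This minimal Python sketch assumes records arrive as GenBank flat-file text. Both records below are hypothetical.

```python
# Hypothetical records; only the LOCUS line matters here.
RECORDS = [
    "LOCUS       HYP00001    5028 bp    DNA     linear   PLN 21-JUN-1999\n//",
    "LOCUS       HYP00002     420 bp    mRNA    linear   EST 01-JAN-2000\n//",
]

def locus_division(locus_line):
    """Return the division code from a LOCUS line: the token just before the trailing date."""
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError(f"not a LOCUS line: {locus_line!r}")
    return tokens[-2]

def filter_by_division(records, divisions):
    """Keep only records whose first (LOCUS) line falls in one of the requested divisions."""
    wanted = set(divisions)
    return [r for r in records if locus_division(r.splitlines()[0]) in wanted]

if __name__ == "__main__":
    for record in filter_by_division(RECORDS, ["EST"]):
        print(record.splitlines()[0])
```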

  27. Specialised Genomic Resources • In addition to the comprehensive DNA sequence DBs, there is a variety of more specialised genomic resources. • These so-called boutique DBs focus on species-specific genomics and on particular sequencing techniques.

  28. Specialised Genomic Databases • SGD http://genome-www.stanford.edu/Saccharomyces (baker's yeast) • AceDB http://www.acedb.org (C. elegans) • FlyBase http://flybase.bio.indiana.edu (fruit fly) • MGD http://www.informatics.jax.org (mouse)

  29. Protein Information Resources Levels of protein sequence and structural organisation: • primary: the primary structure of a protein is its amino acid sequence • secondary: the secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands) • tertiary: the tertiary structure of a protein arises from the packing of its secondary structure elements, which may form discrete domains within a fold

  30. Protein Information Resources Levels of protein sequence and structural organisation: • primary level: primary database, storing sequences (e.g. AVILDRYFH) • secondary level: secondary database, storing motifs (e.g. [AS]-[IL]2-X[DE]-R-[FYW]2-H) • tertiary level: structure database, storing domains and modules

  31. Primary Protein Databases • The primary structure of a protein is its amino acid sequence • Primary databases store these sequences as linear strings of letters that denote the constituent residues
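
Because a primary database stores each sequence as a plain string of one-letter codes, simple analyses reduce to string operations. The minimal Python sketch below computes residue composition. It assumes only the 20 standard amino-acid codes are allowed; real databases also use codes such as X for unknown residues. The function name is hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (an assumption: no B, Z, X or U allowed here).
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")

def composition(seq):
    """Return each residue's relative frequency in a one-letter-code protein sequence."""
    seq = seq.upper()
    unknown = set(seq) - STANDARD_AA
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in sorted(counts.items())}

if __name__ == "__main__":
    print(composition("AVILDRYFH"))  # nine distinct residues, each with frequency 1/9
```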

  32. Protein Sequence Databases • Table of the most represented species • Swiss-Prot contains 197,228 sequence entries, comprising 71,501,181 amino acids abstracted from 135,257 references • Total number of species represented in Swiss-Prot: 9,520 • The average sequence length in Swiss-Prot is 362 amino acids • Swiss-Prot is the most highly annotated protein sequence DB
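
As a quick consistency check, the stated average length follows from the two totals on this slide:

$$\frac{71{,}501{,}181\ \text{amino acids}}{197{,}228\ \text{entries}} \approx 362.5\ \text{amino acids per entry}$$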

  33. Composite Protein Sequence Databases • Composite databases amalgamate a variety of different primary databases • They render sequence searching much more efficient, because they obviate the need to interrogate multiple resources • Different composite databases use different primary sources and different redundancy criteria in their amalgamation procedures
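
A composite database can be sketched as a merge of several primary sources followed by removal of redundant entries. This minimal Python sketch assumes the simplest redundancy criterion: sequences that are 100% identical are collapsed, and every source accession is kept. The source names and entries are hypothetical. Real composite databases apply their own, often looser, criteria.

```python
def merge_nonredundant(sources):
    """Merge primary databases into one composite set.

    sources maps a database name to {accession: sequence}. Sequences that are
    100% identical (case-insensitive) are collapsed into one entry that keeps
    every 'database:accession' label it came from.
    """
    merged = {}
    for db_name, entries in sources.items():
        for accession, seq in entries.items():
            merged.setdefault(seq.upper(), []).append(f"{db_name}:{accession}")
    return merged

if __name__ == "__main__":
    # Hypothetical sources and entries.
    sources = {"DB_A": {"A1": "MKVL", "A2": "MSTQ"}, "DB_B": {"B7": "mkvl"}}
    for seq, labels in merge_nonredundant(sources).items():
        print(seq, labels)
```

Searching the merged set touches each distinct sequence once, which is the efficiency gain the slide describes.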

  34. Composite Protein Sequence Databases

  35. Secondary databases • Secondary databases contain pattern data, i.e., diagnostic signatures for protein families. These signatures encode the most highly conserved features of multiply aligned sequences, which are often crucial to the structure or function of the protein • The secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands), which in sequence alignments are often apparent as well-conserved motifs • Patterns include regular expressions, fingerprints, blocks, profiles, etc.
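
Regular-expression patterns of the PROSITE kind translate directly into ordinary regular expressions. The Python sketch below handles the common PROSITE elements:

- `x` for any residue
- `[..]` for allowed residues
- `{..}` for forbidden residues
- `(n)` or `(n,m)` for repetition
- `<` and `>` for the N- and C-terminus

It assumes that the slide's shorthand `[IL]2` corresponds to PROSITE's `[IL](2)`. It ignores rarer syntax such as `>` inside brackets. `prosite_to_regex` is a hypothetical name.

```python
import re

# One pattern element: optional '<', a residue spec, optional (n) or (n,m), optional '>'.
ELEMENT = re.compile(r"(<?)(x|X|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as '[AS]-x(2)-{P}' into a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue in ("x", "X"):
            token = "."                           # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"    # any residue except those listed
        else:
            token = residue                       # single residue or [..] set: same syntax in regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + token + ("$" if c_term else ""))
    return "".join(parts)

if __name__ == "__main__":
    regex = prosite_to_regex("[AS]-[IL](2)-x-[DE]-R-[FYW](2)-H")
    print(regex)                                    # [AS][IL]{2}.[DE]R[FYW]{2}H
    print(bool(re.search(regex, "GGAILKDRWYHGG")))  # True
```

Fingerprints, blocks and profiles score residues position by position, so they do not reduce to a single regular expression like this.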

  36. Secondary databases

  37. Secondary databases • TRANSFAC http://transfac.gbf.de • EPD http://www.epd.isb-sib.ch • InterPro http://www.ebi.ac.uk/interpro/ • PROSITE http://www.expasy.ch/prosite • BLOCKS http://blocks.fhcrc.org • PRINTS ftp://ftp.seqnet.dl.ac.uk/pub/database/prints • PFAM http://www.sanger.ac.uk/Software/Pfam/index.shtml • ProDom http://www.toulouse.inra.fr/prodom.html • GeneCards http://bioinformatics.weizmann.ac.il/cards • ENSEMBL http://www.ensembl.org • EcoCyc http://ecocyc.panbio.com/ecocyc/ecocyc.html

  38. Secondary databases • There is some overlap in content between the secondary databases • PDBsum alone has 35,291 entries • Pattern DB growth is slow because adding detailed family annotation is very time-consuming • PROSITE and PRINTS are the only comprehensively, manually annotated secondary DBs • To address the annotation bottleneck, the secondary database curators have jointly created a unified database of protein families known as InterPro

  39. Structure Classification DBs • Contain 3D structures determined by crystallographic and spectroscopic studies

  40. Structure Classification DBs • PDB http://www.rcsb.org • SCOP http://scop.mrc-lmb.cam.ac.uk/scop • CATH http://www.biochem.ucl.ac.uk/bsm/cath • DSSP http://www.sander.ebi.ac.uk/dssp • FSSP http://www.ebi.ac.uk/dali/fssp • HSSP http://www.sander.ebi.ac.uk/hssp
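
Entries in structure databases such as PDB record atomic coordinates. The minimal Python sketch below extracts C-alpha coordinates per chain. It assumes the classic fixed-column PDB text format; the archive's primary format today is PDBx/mmCIF, which this sketch does not read. The function names are hypothetical.

```python
def parse_atom_line(line):
    """Parse an ATOM/HETATM record using the fixed columns of the PDB text format."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),      # atom name, e.g. 'CA' for the alpha carbon
        "res_name": line[17:20].strip(),  # residue name, e.g. 'MET'
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

def ca_coordinates(pdb_text):
    """Collect C-alpha coordinates per chain from the ATOM records of a PDB file."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atom = parse_atom_line(line)
            if atom["name"] == "CA":
                coords.setdefault(atom["chain"], []).append(atom["xyz"])
    return coords

if __name__ == "__main__":
    # A single illustrative ATOM record (values are made up).
    sample = "ATOM      2  CA  MET A   1      38.198  19.582  28.513  1.00 18.34           C"
    print(ca_coordinates(sample))  # {'A': [(38.198, 19.582, 28.513)]}
```

C-alpha traces like these are a common starting point for the structure comparison and classification done by resources such as SCOP and CATH.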

  42. Metabolic Databases • A number of metabolic databases are available electronically, some with features for querying and visualizing metabolic pathways and regulatory networks. • KEGG (Kyoto Encyclopedia of Genes and Genomes) http://www.genome.ad.jp/kegg • ENZYME (Enzyme nomenclature database) http://www.expasy.ch/enzyme • BRENDA (Enzyme Information System) http://www.brenda.uni-koeln.de • EMP (Enzymes and Metabolic Pathways database) http://www.empproject.com
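
Pathway querying in metabolic databases can be pictured as search over a graph. The nodes are metabolites and the edges are enzyme-catalysed reactions. The minimal Python sketch below uses a hand-written fragment of glycolysis with EC numbers and finds the fewest-step route by breadth-first search. It is not the KEGG data model or API. It ignores reaction reversibility and cofactors such as ATP, and `shortest_pathway` is a hypothetical name.

```python
from collections import deque

# A tiny, hand-written fragment of glycolysis: substrate -> [(product, EC number)].
REACTIONS = {
    "glucose": [("glucose-6-phosphate", "2.7.1.1")],
    "glucose-6-phosphate": [("fructose-6-phosphate", "5.3.1.9")],
    "fructose-6-phosphate": [("fructose-1,6-bisphosphate", "2.7.1.11")],
    "fructose-1,6-bisphosphate": [("glyceraldehyde-3-phosphate", "4.1.2.13"),
                                  ("dihydroxyacetone phosphate", "4.1.2.13")],
}

def shortest_pathway(start, goal, reactions=REACTIONS):
    """Breadth-first search for the fewest-step route from start to goal.
    Returns a list of (substrate, EC number, product) steps, or None if unreachable."""
    queue = deque([start])
    came_from = {start: None}  # metabolite -> (previous metabolite, EC number)
    while queue:
        metabolite = queue.popleft()
        if metabolite == goal:
            steps = []
            while came_from[metabolite] is not None:
                prev, ec = came_from[metabolite]
                steps.append((prev, ec, metabolite))
                metabolite = prev
            return steps[::-1]
        for product, ec in reactions.get(metabolite, []):
            if product not in came_from:
                came_from[product] = (metabolite, ec)
                queue.append(product)
    return None

if __name__ == "__main__":
    for step in shortest_pathway("glucose", "glyceraldehyde-3-phosphate"):
        print(step)
```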

  44. Mapping Databases • OMIM http://www3.ncbi.nlm.nih.gov/omim • GDB http://www.gdb.org • RHDB http://corba.ebi.ac.uk/RHdb

  47. Databases concerning Mutations • dbSNP http://www.ncbi.nlm.nih.gov/SNP • HGBASE http://hgbase.cgr.ki.se • The SNP Consortium (TSC) http://snp.cshl.org • HAEMA http://europium.csc.mrc.ac.uk/usr/WWW/WebPages/database.dir/quiz.dir/intrquiz.htm

  48. Literature Databases • PubMed http://www.ncbi.nlm.nih.gov/entrez/query • The Lancet http://www.thelancet.com • Bioinformatics Online http://www.bioinformatics.oupjournals.org • Nature http://www.nature.com • Science http://www.sciencemag.org

  49. Outlook – coming lecture • Introduction to sequence alignment • Pairwise sequence alignment • The Dot Matrix • Dynamic Programming • Scoring Matrices • Local alignment • Alignment tools • BLAST • FASTA • ALIGN

  50. The End Thanks for your attention!
