Biological databases
International genome sequencing and protein structure determination
• Protein Data Bank (PDB)
Sequence data = strings of letters
• Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
• Triplet codons: the genetic code
• 20 amino acids (A, L, V, S, etc.)
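To make the link between bases, triplet codons, and amino acids concrete, here is a minimal sketch that translates a DNA string using the standard genetic code. The function name `translate` and the input sequence are illustrative assumptions, not part of the original slides.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ("*" = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
# Map each of the 64 codons (TTT, TTC, TTA, ...) to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence codon by codon, stopping at a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented input sequence, chosen so the result uses the amino acids named above.
print(translate("ATGGCCCTGGTGTCATAA"))  # MALVS
```

The sketch assumes the reading frame starts at the first base and ignores ambiguity codes such as N.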
Three-dimensional protein structure = atomic coordinates in 3D space
• Conversion into metric
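As an illustration of "atomic coordinates in 3D space", the sketch below reads x, y, z values from two ATOM records in the PDB fixed-column layout and computes the distance between the atoms. The two records are made up for the example (they are not copied from a real PDB entry), and `parse_atom` is a hypothetical helper name.

```python
import math

# Two made-up ATOM records in PDB fixed-column layout (not taken from a real entry).
ATOM_LINES = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C",
]


def parse_atom(line):
    """Return (atom_name, (x, y, z)); x, y, z occupy columns 31-38, 39-46, 47-54."""
    name = line[12:16].strip()
    xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return name, xyz


atoms = dict(parse_atom(line) for line in ATOM_LINES)
# Coordinates are in angstroms, so the Euclidean distance is too.
distance = math.dist(atoms["N"], atoms["CA"])
print(f"N-CA distance: {distance:.3f}")
```

`math.dist` requires Python 3.8 or later. Real files also contain HETATM, alternate-location, and multi-model records that this sketch does not handle.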
Data types
• Primary data: sequence (DNA, amino acid), e.g., DMPVERILEALAVE…; held in primary databases
• Secondary data: secondary protein structure, e.g., alpha-helices, beta-strands; held in secondary databases as "motifs": regular expressions, blocks, profiles, fingerprints
• Tertiary data: tertiary protein structure (domains, folding units); held in tertiary databases as atomic co-ordinates
• Interaction data: pathways and functional networks; held in interaction databases as binary protein-protein interactions/networks
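The "motifs as regular expressions" idea can be sketched as follows: a PROSITE-style pattern is rewritten as a Python regular expression and scanned against a sequence. The converter covers only a small subset of the pattern syntax, the function names are hypothetical, and the test sequence is invented; the pattern itself is the N-glycosylation site pattern N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regular expression.

    Handles only a subset of the syntax: '-' separators, x (any residue),
    [..] allowed residues, {..} forbidden residues, and (n) repeat counts.
    """
    regex = pattern.replace("-", "")
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")    # repeat counts
    return regex.replace("x", ".")                       # any residue


def find_motif(pattern: str, sequence: str):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    # The lookahead lets overlapping occurrences be reported separately.
    regex = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# The sequence is invented for the example.
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))  # [(2, 'NGSA')]
```

The second asparagine in the example is followed by a proline, which the pattern forbids, so only one site is reported.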
Primary biological databases
• Nucleic acid: EMBL, GenBank, DDBJ (DNA Data Bank of Japan)
• Protein: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D
International nucleotide data banks
• EMBL (Europe): EMBL, EBI; TrEMBL
• GenBank (USA): NLM, NCBI; NRDB
• DDBJ (Japan): NIG, CIB
• Coordinated through the International Advisory Meeting and the Collaborative Meeting
Other primary protein databases
• TrEMBL (translated EMBL), in SWISS-PROT format
• rapid access to sequence data from genome projects
• computer-annotated supplement to SWISS-PROT
• translations of all coding sequences (CDS) in EMBL
• SP-TrEMBL
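Because TrEMBL entries follow the SWISS-PROT flat-file layout, a small parser gives a feel for the format: each line starts with a two-letter code (ID, AC, DE, SQ, ...) and an entry ends with "//". The entry below is invented for illustration (its name and accession number are placeholders, and its sequence reuses the visible part of the fragment shown on the data types slide); `parse_entry` is a hypothetical helper, and real entries carry many more line types.

```python
# Invented entry in SWISS-PROT-style flat-file layout; not a real database record.
EXAMPLE_ENTRY = "\n".join([
    "ID   EXMP_YEAST     STANDARD;      PRT;    14 AA.",
    "AC   X00000;",
    "DE   Hypothetical example protein.",
    "SQ   SEQUENCE   14 AA;",
    "     DMPVERILEA LAVE",
    "//",
])


def parse_entry(text):
    """Collect the ID, accession numbers, description, and sequence of one entry."""
    entry = {"ID": "", "AC": [], "DE": "", "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines hold residues in space-separated groups.
            entry["sequence"] += line.replace(" ", "")
            continue
        code, content = line[:2], line[2:].strip()
        if code == "ID":
            entry["ID"] = content.split()[0]
        elif code == "AC":
            entry["AC"] += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "DE":
            entry["DE"] = (entry["DE"] + " " + content).strip()
        elif code == "SQ":
            in_sequence = True  # residues start on the next line
    return entry


entry = parse_entry(EXAMPLE_ENTRY)
print(entry["ID"], entry["AC"], len(entry["sequence"]))
```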
Other primary protein databases
The Protein Information Resource (PIR)
• integrated system of protein sequence databases and derived related databases, e.g., alignment databases
• rapid searching, comparison, and pattern matching of protein sequences
• retrieval of descriptive, bibliographic, feature, and concurrent cross-reference information
• aims to be comprehensive and consistently annotated
PIR: related databases
NRL-3D Sequence-Structure Database
• produced by PIR from sequence and annotation information extracted from three-dimensional structures in the Protein Data Bank (PDB)
• allows keyword and similarity searches
Two other useful sites
• INFOBIOGEN: The Public Catalog of Databases
  http://www.infobiogen.fr/services/dbcat/
• KEGG: Kyoto Encyclopedia of Genes and Genomes
  http://www.genome.ad.jp/kegg/
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes, and to provide links from the gene catalogs produced by genome sequencing projects.
Sequence Retrieval System (SRS)
• Database browser that allows users to retrieve, link, and access entries from all interconnected resources.
• Users can formulate queries across a range of different database types.
Guide to Protein Databases:
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html
With thanks to Dr Roman Laskowski.
Biomolecule-ligand interactions
• SRS: enzymes, reactions and metabolic pathway databases
• Receptor-ligand database searches: relibase.ebi.ac.uk/
Interaction databases: yeast model
• YPD: http://www.incyte.com/sequence/proteome
  • proteome database of a model organism
  • 6142 proteins: 3430 known, 804 similarity, 1908 unknown
  • data on protein interaction maps
  • derived from literature and experiment
• Curagen: http://curatools.curagen.com
  • yeast two-hybrid screen data
  • 957 putative interactions of 1004 yeast proteins
  • Uetz et al., 2000, Nature 403, pp. 623-630
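Binary interaction data of the kind produced by a two-hybrid screen can be held as a simple undirected network. The sketch below builds one from a list of pairs and finds the most connected protein. The protein names P1 to P6 and their pairings are placeholders invented for the example, not data from YPD or Curagen.

```python
from collections import defaultdict

# Hypothetical binary interactions (bait, prey); names are placeholders, not real data.
PAIRS = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1"), ("P5", "P6")]


def build_network(pairs):
    """Store each binary interaction in both directions (undirected network)."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


network = build_network(PAIRS)
# Degree = number of distinct interaction partners per protein.
degree = {protein: len(partners) for protein, partners in network.items()}
hub = max(degree, key=degree.get)
print(f"{len(PAIRS)} interactions among {len(network)} proteins; hub: {hub}")

# Sanity check on the YPD counts quoted above: the three classes sum to the total.
assert 3430 + 804 + 1908 == 6142
```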
Protein-Protein Interaction Databases http://www.hgmp.mrc.ac.uk/GenomeWeb/prot-interaction.html
Protein-Protein Interactions
• DIP
• Biocarta
• KEGG
KEGG http://www.genome.ad.jp/kegg/
• Search database for metabolic and regulatory pathways
• Compute KEGG: generate possible reaction pathways between two compounds
http://www.genome.ad.jp/
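The idea of generating possible reaction pathways between two compounds can be sketched as path enumeration in a directed graph. This is only an illustration of the concept: the compounds A to E and their reactions are invented, `find_pathways` is a hypothetical helper, and nothing here uses KEGG data or reflects how the KEGG service computes its results.

```python
# Toy reaction graph: each compound maps to the compounds it can be converted into.
# Compounds A to E are invented; this does not use KEGG data or the KEGG service.
REACTIONS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def find_pathways(graph, start, goal, path=None):
    """Depth-first enumeration of all pathways that visit no compound twice."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    pathways = []
    for product in graph.get(start, []):
        if product not in path:  # avoid cycling through the same compound
            pathways.extend(find_pathways(graph, product, goal, path))
    return pathways


for pathway in find_pathways(REACTIONS, "A", "E"):
    print(" -> ".join(pathway))
```

Exhaustive enumeration is fine for a toy graph, but the number of pathways grows quickly in a real metabolic network, where a length limit or shortest-path search would be the more practical choice.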
Metabolic pathways
Signal transduction pathways (species-specific, Homo sapiens shown)
Biocarta pathway database http://www.biocarta.com