Molecular Biology Database 2
Databases at ExPASy
Prominent databases:
• SWISS-PROT
• PROSITE
• ENZYME
• SWISS-2DPAGE
• SWISS-3DIMAGE, SWISS-MODEL Repository
SWISS-PROT - Protein knowledgebase
• An annotated protein sequence database established in 1986 at the Department of Medical Biochemistry of the University of Geneva.
• Now maintained at the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
• Minimal level of redundancy.
• High level of integration with other databases (currently cross-referenced with about 45 different databases).
The database contains two classes of entries:
• SWISS-PROT STANDARD (complete entries annotated to SWISS-PROT standards), holding:
  • Core data: sequence data, bibliographical references, and taxonomic data (the biological source of the protein).
  • Annotated data:
    • Function(s) of the protein.
    • Post-translational modification(s) (e.g., carbohydrates, phosphorylation).
    • Domains and sites (e.g., calcium-binding regions, ATP-binding sites, zinc fingers, homeoboxes).
    • Secondary structure (e.g., alpha helix, beta sheet).
    • Quaternary structure (e.g., homodimer, heterotrimer).
    • Similarities to other proteins.
    • Disease(s) associated with deficiencies in the protein.
    • Sequence conflicts, variants, etc.
• PRELIMINARY: sequence entries distributed as a supplement called TrEMBL (Translations of EMBL):
  • A computer-annotated supplement to SWISS-PROT.
  • Contains sequences that have not yet been annotated by the SWISS-PROT staff to SWISS-PROT standards.
Searching SWISS-PROT
Access to SWISS-PROT, TrEMBL and other databases using:
• Quick search (by AC, ID, description, gene name, or organism; NO Boolean operators).
• SRS - Sequence Retrieval System.
Other search options for SWISS-PROT:
• Full-text search in SWISS-PROT and TrEMBL.
• Advanced search in SWISS-PROT and TrEMBL by description, gene name and organism (can be used to create HTML links to SWISS-PROT/TrEMBL queries).
• Taxonomy browser.
• Search by accession number or ID (AC or ID line; SWISS-PROT and TrEMBL).
• Search by description or identification (any word in the DE, OS, OG, GN and ID lines; SWISS-PROT and TrEMBL).
• Search by author (RA line; SWISS-PROT and TrEMBL).
• Search by citation (RL line; SWISS-PROT only).
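The DE, OS, OG, GN, ID and RA names above are two-letter line codes from the SWISS-PROT flat-file format, and the field-restricted searches work on those lines. As a rough, hypothetical sketch (not how the ExPASy search tools are implemented), a search limited to chosen line codes could look like this. The two entries are invented and simplified, and `parse_entries` and `search` are illustrative names.

```python
from collections import defaultdict

# Invented, simplified entries in SWISS-PROT flat-file layout (hypothetical data):
# a two-letter line code, three spaces, then the data; "//" ends an entry.
TOY_FILE = """\
ID   KINA_TOY       STANDARD;      PRT;   300 AA.
AC   X00001;
DE   Hypothetical protein kinase A.
GN   KINA.
OS   Toy organism.
RA   Smith J.;
//
ID   LECT_TOY       STANDARD;      PRT;   120 AA.
AC   X00002;
DE   Hypothetical lectin.
OS   Toy organism.
RA   Doe J.;
//
"""


def parse_entries(text):
    """Split a flat file into entries, each a dict: line code -> list of data strings."""
    entries, current = [], defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):
            entries.append(dict(current))
            current = defaultdict(list)
        elif line.strip():
            current[line[:2]].append(line[5:].strip())
    return entries


def search(entries, term, codes=("DE", "OS", "OG", "GN", "ID")):
    """Case-insensitive substring search restricted to the given line codes.

    Returns the entry names (first word of each matching entry's ID line).
    """
    term = term.lower()
    return [
        entry["ID"][0].split()[0]
        for entry in entries
        if any(term in data.lower() for code in codes for data in entry.get(code, []))
    ]
```

With these toy entries, `search(parse_entries(TOY_FILE), "kinase")` returns `["KINA_TOY"]`, and passing `codes=("RA",)` turns the same function into a search by author.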
PROSITE - Protein families and domains
• A database of biologically significant sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.
• Based on the observation that proteins can be grouped into families by sequence similarity; the conserved regions shared by a family or domain form its signature.
• The protein signatures are provided in PROSITE format. This format can also be used for similarity searching with PHI-BLAST at NCBI (see module "Similarity Searching").
• Currently contains patterns and profiles specific to more than 1,000 protein families or domains.
• List of PROSITE entries.
• Background information is provided for each of these protein signatures.
Patterns - Short regions with high sequence similarity
• Derived from an alignment table of a group or family of proteins.
• Used to find biologically significant regions or residues such as:
  • Enzyme catalytic sites.
  • Prosthetic group attachment sites (heme, pyridoxal phosphate, biotin, etc.).
  • Amino acids involved in binding a metal ion.
  • Cysteines involved in disulfide bonds.
  • Regions involved in binding a molecule (ADP/ATP, GDP/GTP, calcium, DNA, etc.) or another protein.
• The pattern(s) created at this stage are the 'core' pattern(s).
• The SWISS-PROT knowledgebase is then scanned with these core pattern(s).
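The step of turning an alignment table into a core pattern can be illustrated with a naive, hypothetical helper. Real PROSITE patterns are curated and refined after scanning SWISS-PROT, so this only shows the idea. Here fully conserved columns become a single residue, columns with at most `max_class_size` distinct residues become a `[..]` class, and other columns become `x`. The alignment is invented; it is constructed so that the result is the ATP/GTP-binding P-loop pattern.

```python
from itertools import groupby


def derive_core_pattern(alignment, max_class_size=2):
    """Naively derive a PROSITE-style pattern from an ungapped alignment (hypothetical helper)."""
    elements = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])  # fully conserved position
        elif len(residues) <= max_class_size:
            elements.append("[" + "".join(residues) + "]")  # small set of allowed residues
        else:
            elements.append("x")  # variable position: any residue

    # Collapse consecutive 'x' positions into x(n), as PROSITE syntax does.
    parts = []
    for element, run in groupby(elements):
        n = len(list(run))
        if element == "x":
            parts.append("x" if n == 1 else f"x({n})")
        else:
            parts.extend([element] * n)
    return "-".join(parts) + "."


# Invented toy alignment of three ungapped segments.
print(derive_core_pattern(["AKLVWGKS", "GQRSTGKS", "AMNEDGKT"]))
```

For this alignment, the printed pattern is `[AG]-x(4)-G-K-[ST].`.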
Profiles (or weight matrices) - Protein family or domain over its entire length
• A table of position-specific amino acid weights and gap costs (scores), derived from a multiple sequence alignment; a symbol comparison table is used to convert residue frequency distributions into weights.
• These weights are used to calculate a similarity score for any alignment between a profile and a sequence.
• Example of a profile (matrix) entry.
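To make position-specific weights concrete, here is a minimal sketch. It builds a log-odds weight table from a toy ungapped alignment and slides that table along a sequence. It is deliberately simpler than a PROSITE profile, and it makes three assumptions: a uniform background of 1/20, pseudocounts in place of a symbol comparison table, and ungapped windows only, with no gap costs. The function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Position-specific log-odds weights from an ungapped alignment.

    Each column's counts get a pseudocount so unseen residues are penalized
    rather than forbidden; weight = log2(observed frequency / background),
    with an assumed uniform background of 1/20.
    """
    background = 1 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the weight table along the sequence (no gaps); return (best score, 1-based start)."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[aa] for column, aa in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start + 1)
    return best
```

For example, `best_window(build_pssm(["GKST", "GKSS", "GKTT"]), "AAGKSTAA")` places the best-scoring window at position 3, where every residue matches a well-represented column.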
Searching PROSITE
• Access to PROSITE using:
  • Quick search by AC, ID, or description (NO Boolean operators).
  • Browsing PROSITE documentation entries for post-translational modifications, domains, DNA- or RNA-associated proteins, enzymes, electron transport proteins, other transport proteins, structural proteins, receptors, cytokines and growth factors, hormones and active peptides, toxins, inhibitors, protein secretion and chaperones, and others.
  • SRS - Sequence Retrieval System.
  • Search by author.
  • Search by citation.
  • Full-text search.
Similarity searching in PROSITE
• ScanProsite - Scan a sequence against PROSITE, or a pattern against SWISS-PROT or PDB, and visualize matches on structures.
  • Scan a protein for PROSITE matches (enter a SWISS-PROT/TrEMBL accession number (AC), a sequence identifier (ID), or a PDB identifier, or paste your own protein sequence into the box).
  • Search SWISS-PROT with a PROSITE entry (enter a PROSITE accession number, or type your pattern in PROSITE format; you can also scan a sequence against the entire PROSITE database).
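The pattern side of ScanProsite can be approximated locally by translating PROSITE pattern syntax into a regular expression. The sketch below is a hypothetical stand-in, not ScanProsite's implementation. It handles only the common syntax elements: `x`, `[..]`, `{..}`, `(n)` or `(n,m)` repeats, and `<`/`>` anchors outside brackets. It reports non-overlapping matches only and ignores profiles. The example pattern `[AG]-x(4)-G-K-[ST]` is the ATP/GTP-binding P-loop motif.

```python
import re


def prosite_to_regex(pattern):
    """Translate common PROSITE pattern syntax into a Python regular expression.

    Supports x, [..], {..}, (n)/(n,m) repeats and '<'/'>' anchors written
    outside brackets; forms such as [G>] are not handled.
    """
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        start = end = ""
        if element.startswith("<"):
            start, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            end, element = "$", element[:-1]  # anchored at the C-terminus
        repeat = ""
        if element.endswith(")"):
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"  # x(2,4) -> .{2,4}
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # {P}: any residue except P
        else:
            core = element  # single residue or [..] class: same in regex
        regex.append(start + core + repeat + end)
    return "".join(regex)


def scan_sequence(sequence, patterns):
    """Scan one sequence against named patterns: (name, 1-based start, matched text)."""
    hits = []
    for name, pattern in patterns.items():
        for m in re.finditer(prosite_to_regex(pattern), sequence):
            hits.append((name, m.start() + 1, m.group()))
    return hits


def search_database(pattern, database):
    """Return the IDs of sequences in a {id: sequence} dict that contain the pattern."""
    regex = re.compile(prosite_to_regex(pattern))
    return [seq_id for seq_id, seq in database.items() if regex.search(seq)]
```

For example, `scan_sequence("MGPPGSGKSTLL", {"P-loop": "[AG]-x(4)-G-K-[ST]."})` finds one match, `GPPGSGKS`, starting at residue 2.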
PDB - The Protein Data Bank
• The Protein Data Bank (PDB) is an international database of 3-D structures of biological macromolecules. It is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB), a non-profit consortium of Rutgers University, the San Diego Supercomputer Center, and the Biotechnology Division of the National Institute of Standards and Technology. There are multiple mirror sites worldwide. It is a free, publicly accessible database of protein and nucleic acid structures, primarily determined experimentally by X-ray crystallography and NMR spectroscopy. Submitted data are validated before an entry is completed.
PDB Search
• The PDB database offers a Java-based advanced search facility. The browser must be Java-enabled for it to function.
Differences Between MMDB & PDB
• MMDB is interlinked with Entrez; PDB is not.
• MMDB uses the ASN.1 format, which contains value-added information such as explicit chemical bond information, secondary structure information, and links to associated records in PubMed, Taxonomy, Protein, Nucleotide, CDD, 3D Domains, and PubChem.
• PDB supports several data formats. Because it handles many different formats, PDB suffers from a 'history' problem: the way its data formats were constructed has changed over time, leaving inconsistencies. Work is being done to make the data more uniform.
• MMDB does not accept direct submissions. All structures are deposited in PDB first and are then automatically imported into MMDB on a monthly basis, whereas PDB updates weekly. Therefore, PDB may contain newer structures that have not yet been entered into MMDB.
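One of the formats PDB supports is the classic fixed-width PDB format, which stores atomic coordinates in ATOM and HETATM records. Below is a minimal reader. It assumes the standard column layout: atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z in 31-54. The two records use invented toy coordinates, not data from a real entry. For real work, a maintained parser such as Biopython's `Bio.PDB` is the safer choice.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Read ATOM/HETATM records using fixed PDB column positions (0-based slices)."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def distance(a, b):
    """Euclidean distance between two atoms, in the file's units (angstroms)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Toy records (invented coordinates) laid out in the standard columns.
TOY_RECORDS = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504",
    "ATOM      2  CA  MET A   1      12.560   6.150  -6.200",
]

atoms = parse_atoms(TOY_RECORDS)
print(atoms[1].name, round(distance(atoms[0], atoms[1]), 3))
```

Because the format is column-based rather than whitespace-delimited, slicing by position is the reliable way to read it; splitting on spaces breaks when adjacent fields run together.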