Protein Database in Europe: Deposition, Validation, Search and Analysis Services. Gaurav Sahni, Ph.D.
Worldwide Protein Data Bank (wwPDB) • Consists of four sites: RCSB (USA), PDBj (Japan), BMRB (USA) and PDBe. • Single repository of macromolecular structures. • Started in 1971; now holds ~61,000 entries, with ~200 new entries added per week. • Entries are deposited by experimentalists, and the contents are freely available. • The archive format is flat files with a fixed line format, although an improved flat-file format (mmCIF) and XML are also available.
Protein Data Bank in Europe (PDBe) group • Is one of the four sites around the world where 3D structures may be deposited. • Provides a stable and clean repository of macromolecular structure data. • Has services that allow users to access, search and retrieve structural data from a single web access point.
PDBe Tasks • Deposition and Validation • Database design and implementation • Retrieve data • Analysis tools & Services
Depositions and Curation • Deposition via AutoDep4 (http://www.ebi.ac.uk/pdbe-xdep/autodep) • Close collaboration with the other wwPDB members to maintain a single unified archive. • Depositions via EMDEP (http://www.ebi.ac.uk/pdbe-emdep/emdep) • Depositions started June 2002
Validation of Structures • Authentication of source: that the protein is from human and not rabbit, for example. • Authentication of structure: comparison of the structure against the raw data; geometry and stereochemistry; results are provided back to the depositor. • Validation of the methodology used: whether X-ray, NMR or EM. • Conformity to standards: follows the PDB format specifications. • Error checks • Consistency checks, to identify simple typos: Homo sapiens and not Homo sapien (single human?). • Outlier detection, to identify suspect records.
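The consistency and outlier checks listed above can be sketched in a few lines. This is only an illustration, not PDBe's actual validation pipeline: the organism list, the similarity cutoff, the z-score threshold and the function names are all assumptions made for the example.

```python
import difflib
import statistics

# Hypothetical controlled vocabulary; a real check would use a full taxonomy.
KNOWN_ORGANISMS = ["Homo sapiens", "Oryctolagus cuniculus", "Mus musculus"]


def check_organism(name, known=KNOWN_ORGANISMS, cutoff=0.8):
    """Consistency check: return (is_valid, suggestion) for an organism name."""
    if name in known:
        return True, None
    # Suggest the closest known name so that simple typos can be corrected.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return False, (matches[0] if matches else None)


def find_outliers(values, threshold=2.5):
    """Outlier detection: indices of values whose z-score exceeds threshold.

    The threshold is an assumed default; small samples need a low value
    because a single outlier also inflates the standard deviation.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Example data (invented): nine typical C-C bond lengths and one suspect value.
bond_lengths = [1.52, 1.53, 1.51, 1.52, 1.53, 1.52, 1.51, 1.53, 1.52, 2.10]
suspect = find_outliers(bond_lengths)
```

Here `check_organism("Homo sapien")` rejects the name and proposes "Homo sapiens", and `suspect` holds the index of the 2.10 value. A production check would more likely use robust statistics (for example the median absolute deviation) and reference geometry libraries.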
Disadvantages of Flat Files • Macromolecular structures are very complex. • The existing PDB format cannot fully describe even some existing structures. • The format is not readily extensible to cope, for example, with structural genomics data. • The historical archive is non-uniform and poorly populated. • Search and retrieval from flat files is difficult and/or inaccurate.
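To make the extensibility problem concrete, here is a minimal sketch of reading a fixed-column record. The column positions follow the ATOM record layout of the PDB format as commonly documented, but the record is simplified (atom names are left-justified, and occupancy, B-factor and element are omitted), and the helper names are assumptions for this example.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Write a simplified ATOM record using fixed column widths."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom(line):
    """Read fields back by column position, not by delimiter."""
    return {
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "res_name": line[17:20].strip(), # columns 18-20
        "chain": line[21],               # column 22
        "res_seq": int(line[22:26]),     # columns 23-26
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
    }


record = format_atom(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom(record)

# A serial number above 99999 no longer fits its five-column field,
# so every later field is shifted one column to the right.
overflow = format_atom(100000, "N", "MET", "A", 1, 27.340, 24.430, 2.614)
misread_serial = int(overflow[6:11])  # reads only the first five digits
```

Because meaning is tied to column position, a value that outgrows its field either has to be truncated or silently corrupts the fields after it. Tagged formats such as mmCIF and XML avoid this by naming each item.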
[Figure: the PDBe Relational Database provides uniform data and improved query functionality to crystallographers, biologists, programmers and bioinformaticians. Diagram labels: Time, Effort, Usefulness, Usage.]
Some Implementation Issues • The PDBe database is large and complex: • ~61,000 PDB entries • Cross-referenced against SwissProt, PubMed etc. • Making data accessible without adding complexity. • Tools for different categories of end user: • Simple: biobar • Intermediate: PDBelite • Advanced: PDBepro • New: PDBeView
biobar • A toolbar search application for Mozilla/Netscape or Firefox browsers (http://biobar.mozdev.org/) • Simple and quick retrieval of data from PDBe and 45 other databases
PDBelite A simple form-based query system to search the PDBe Databases
Features of Search Interface • Strengths: • simple, easy-to-use form • allows multiple search fields to be combined • relatively fast, despite performing quite complex SQL queries • Weaknesses: • does not expose the full power of a relational database • limited logical operators between search fields: • "name" AND "title" AND "keyword" • "name" OR "title" OR "keyword" • ( "name" OR "title" ) AND NOT "keyword" • the search form is defined by the authors of the search system, not the author of a query
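The three operator patterns above map naturally onto a fixed SQL template. The sketch below shows one way a form-based search might build such a query; the `entry` table, its columns, the sample rows and the `search` function are all hypothetical and do not describe the real PDBe schema or the PDBelite implementation.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (pdb_id TEXT, name TEXT, title TEXT, keyword TEXT)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        ("1abc", "lysozyme", "Structure of lysozyme", "hydrolase"),
        ("2def", "kinase", "Kinase with bound ATP", "transferase"),
        ("3ghi", "lysozyme", "Mutant lysozyme", "mutant"),
    ],
)

ALLOWED_FIELDS = {"name", "title", "keyword"}


def search(terms, mode="AND", exclude=None):
    """Combine form fields with a single operator, plus optional AND NOT terms.

    terms and exclude map a field name to a substring to match.
    """
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be AND or OR")
    if not terms:
        raise ValueError("at least one search term is required")
    clauses, params = [], []
    for field, value in terms.items():
        if field not in ALLOWED_FIELDS:  # never interpolate unchecked names
            raise ValueError(f"unknown field: {field}")
        clauses.append(f"{field} LIKE ?")
        params.append(f"%{value}%")
    where = "(" + f" {mode} ".join(clauses) + ")"
    for field, value in (exclude or {}).items():
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field}")
        where += f" AND NOT ({field} LIKE ?)"
        params.append(f"%{value}%")
    sql = f"SELECT pdb_id FROM entry WHERE {where} ORDER BY pdb_id"
    return [row[0] for row in conn.execute(sql, params)]
```

For example, `search({"name": "lysozyme", "title": "ATP"}, mode="OR", exclude={"keyword": "mutant"})` corresponds to the third pattern. The template is easy to use but cannot express, say, mixed AND/OR nesting, which is the weakness the slide describes.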
PDBepro A Java-based, flexible graphical search interface for advanced searching
Complex searches • Users have comprehensive control of their query • The applet provides a dynamic form, as compared to a static HTML form: • choose the fields to be searched • specify the relationships between search fields • choose the result fields and how results are presented • perform "complex" sub-queries, e.g. SSM, FASTA • PDBepro uses an applet for constructing queries and a server to execute them • The user describes their query entirely graphically, including logical operations such as AND, OR and NOT
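A query that the user assembles freely from AND, OR and NOT can be modelled as an expression tree. The sketch below is a minimal illustration of that idea and not the PDBepro implementation: the node classes, the `run_query` helper and the sample records are all assumptions, and the tree is evaluated in memory rather than sent to a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Leaf node: the named field must contain the text (case-insensitive)."""
    name: str
    text: str

    def matches(self, record):
        return self.text.lower() in record.get(self.name, "").lower()


@dataclass(frozen=True)
class And:
    left: object
    right: object

    def matches(self, record):
        return self.left.matches(record) and self.right.matches(record)


@dataclass(frozen=True)
class Or:
    left: object
    right: object

    def matches(self, record):
        return self.left.matches(record) or self.right.matches(record)


@dataclass(frozen=True)
class Not:
    operand: object

    def matches(self, record):
        return not self.operand.matches(record)


def run_query(query, records, result_fields):
    """Return only the chosen result fields of every matching record."""
    return [{f: r[f] for f in result_fields} for r in records if query.matches(r)]


# Invented sample records.
RECORDS = [
    {"pdb_id": "1abc", "name": "lysozyme", "title": "Structure of lysozyme", "keyword": "hydrolase"},
    {"pdb_id": "2def", "name": "kinase", "title": "Kinase with bound ATP", "keyword": "transferase"},
    {"pdb_id": "3ghi", "name": "lysozyme", "title": "Mutant lysozyme", "keyword": "mutant"},
]

query = And(Or(Field("name", "lysozyme"), Field("title", "ATP")), Not(Field("keyword", "mutant")))
hits = run_query(query, RECORDS, ["pdb_id"])
```

Because nodes nest arbitrarily, the author of the query, not the author of the form, decides which fields are searched, how they are related and which fields come back.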
AstexViewer™: Visualization@PDBe • View structures as wireframe, backbone or ribbons • Built-in sequence viewer • Calculate and display surfaces • Various display options: • Ramachandran plots • Distance matrix • B-factors • Based on the AstexViewer™ from Astex Technology Limited and modified under licence by the PDBe group
PDBeChem Ligand Database
PDBeSite What is the environment around alpha-D-mannose and beta-D-mannose?
PDBeSite What binds ASP ASP HIS LYS?
PDBeSite How does ATP generally interact with LYS in all structures?
PDBeAnalysis Assess Quality of a Structure • Bond distances • Bond angles • Ramachandran plot
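All three quality measures reduce to simple geometry on atomic coordinates. The sketch below uses the standard vector formulas for a distance, an angle and a torsion (dihedral) angle; a Ramachandran plot is a scatter of two backbone torsions, phi and psi, per residue. The function names are assumptions for this example and are not part of any PDBe tool.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_distance(a, b):
    """Distance between two atoms, in the units of the coordinates."""
    return _norm(_sub(a, b))


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, measured at the central atom b."""
    u, v = _sub(a, b), _sub(c, b)
    cos_theta = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def dihedral(a, b, c, d):
    """Signed torsion angle a-b-c-d in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For a residue i, phi would be `dihedral(C_prev, N, CA, C)` and psi `dihedral(N, CA, C, N_next)`, using the coordinates of those backbone atoms. Validation then compares the computed values against expected ranges.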
PDBePisa What assembly can my structure have?
PDBeFold Discover unknown relationships… • Are there any structures in the PDB that are similar to mine? • What SCOP and/or CATH family could my structure belong to? • Can I get some idea of the possible function of my protein from its structural similarity to others? • Can I get a multiple alignment of many of my structures?
ChemSearch Sub-structure based search of a million chemicals
PDBeAnalysis/PDBeValidate Online PDB validation
PDBeStatus PDB Deposition status search
PDBe provides… • Clean biological data • Integrated data • A single web access point • Query interfaces for different users (beginner, occasional or expert). • Interconnected views of the data relating structure, sequence, text & experimental details.
• Linking to domain data, eFamily • Sequence mapping, SIFTS • PDBeChem ligand data • Electron density visualisation • AstexViewer • PDBepro, PDBelite • PISA biological assemblies • Active sites • Fold matching • Surface matching