Dictionaries and Ontologies in Structural Biology
Scope of Ontology: PDB Exchange Dictionary • Meta Data: experimental information, molecular description, structural description • Coordinates: macromolecule, ligands, solvent
History of Project • 1990: mmCIF project begins • 1992: NDB serves as testbed • 1998: PDB adopts mmCIF as core data representation • 2001: PDB Exchange Dictionary incorporates X-ray, NMR and cryoEM • 2003: direct translation of mmCIF data and dictionaries into XML (PDBML)
Challenges in Creating an Ontology • Appropriate coverage and level of detail • Acquiring and organizing expert input • Getting consensus • Evolving with the science • Creating a rigorous syntax that can be translated (e.g., mmCIF to XML)
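The last challenge, a syntax rigorous enough to be translated mechanically, can be illustrated with a small sketch. It assumes a category is held as a list of rows (dictionaries) and maps it to XML with the key item as an attribute and the remaining items as child elements. The function name `category_to_xml`, the `details` item value, and the element layout are illustrative assumptions; this is a simplified picture, not the actual PDBML schema or translation tool.

```python
import xml.etree.ElementTree as ET


def category_to_xml(category, rows, key_item):
    """Translate one mmCIF-style category (a table) into an XML element.

    Simplified illustration only: no namespaces, no schema validation.
    """
    container = ET.Element(f"{category}Category")
    for row in rows:
        # The key item becomes an attribute of the record element.
        record = ET.SubElement(container, category, {key_item: row[key_item]})
        # Every other data item becomes a child element holding its value.
        for item, value in row.items():
            if item != key_item:
                ET.SubElement(record, item).text = value
    return container


# Hypothetical data: one row of the struct_biol category.
rows = [{"id": "1", "details": "example assembly"}]
xml_text = ET.tostring(category_to_xml("struct_biol", rows, "id"), encoding="unicode")
print(xml_text)
```

Because categories and data items are named explicitly in the dictionary, a translation of this kind needs no per-category special cases, which is the property the slide is pointing at.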
mmCIF (PDB Exchange) is an Ontology • Relationships among data items are explicit
Features of Dictionary • Data items: definitions, examples, data types, ranges or enumerations • Simple organization: tables and columns (categories), related data item sets (subcategories), chapters (category groups) • Associations: parent-child relationships, interdependencies/exclusivity • Methods
Dictionary Definition Example
save__em_detector.type
    _item_description.description
;   The detector type used for recording images.
    Usually film or CCD camera.
;
    _item.name                  '_em_detector.type'
    _item.category_id           em_detector
    _item.mandatory_code        no
    _item_type.code             line
    loop_
    _item_enumeration.value
        'KODAK SO163 FILM'
        'GATAN 673'
        'GATAN 676'
        'TVIPS TEMCAM F224'
        'TVIPS FASTSCAN F114'
        PROSCAN
        AMT
save_
Semantics • Schema • Data type • Controlled vocabulary
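A controlled vocabulary like the one in this definition lends itself to a mechanical check. The sketch below holds the definition in a plain Python dictionary and tests a value against the mandatory code and the enumeration. The structure `EM_DETECTOR_TYPE` and the function `check_item` are hypothetical names chosen for illustration; they are not part of CIFPARSE or any other PDB tool.

```python
# Hypothetical in-memory form of the _em_detector.type definition.
EM_DETECTOR_TYPE = {
    "name": "_em_detector.type",
    "mandatory": False,  # _item.mandatory_code no
    "enumeration": {
        "KODAK SO163 FILM",
        "GATAN 673",
        "GATAN 676",
        "TVIPS TEMCAM F224",
        "TVIPS FASTSCAN F114",
        "PROSCAN",
        "AMT",
    },
}


def check_item(definition, value):
    """Return a list of problems; an empty list means the value is acceptable."""
    problems = []
    if value is None:
        # A missing value is only an error for mandatory items.
        if definition["mandatory"]:
            problems.append(f"{definition['name']} is mandatory but missing")
        return problems
    if value not in definition["enumeration"]:
        problems.append(f"{definition['name']}: '{value}' is not an allowed value")
    return problems


print(check_item(EM_DETECTOR_TYPE, "GATAN 673"))
print(check_item(EM_DETECTOR_TYPE, "UNKNOWN CAMERA"))
```

The first call reports no problems; the second reports that the value is outside the enumeration.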
Dictionary Definition Example
save__struct_biol.id
    _item_description.description
;   The value of _struct_biol.id must uniquely identify a record in
    the STRUCT_BIOL list. Note that this item need not be a number;
    it can be any unique identifier.
;
    _item.name                  '_struct_biol.id'
    _item.category_id           struct_biol
    _item.mandatory_code        yes
    _item_type.code             line
    loop_
    _item_linked.child_name
    _item_linked.parent_name
        '_struct_biol_gen.biol_id'        '_struct_biol.id'
        '_struct_biol_keywords.biol_id'   '_struct_biol.id'
        '_struct_biol_view.biol_id'       '_struct_biol.id'
        '_struct_ref.biol_id'             '_struct_biol.id'
save_
Semantics • Schema • Data type • Parent-child (foreign key) relationships
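The parent-child links in this definition behave like foreign keys, so a data file can be checked for child values that have no matching parent. The following sketch assumes the data file has already been parsed into a mapping from category name to a list of rows. The helper names `split_item` and `find_orphans` and the sample data are hypothetical.

```python
# Child/parent pairs copied from the _item_linked loop in the definition.
LINKS = [
    ("_struct_biol_gen.biol_id", "_struct_biol.id"),
    ("_struct_biol_keywords.biol_id", "_struct_biol.id"),
    ("_struct_biol_view.biol_id", "_struct_biol.id"),
    ("_struct_ref.biol_id", "_struct_biol.id"),
]


def split_item(name):
    """Split '_category.attribute' into ('category', 'attribute')."""
    category, _, attribute = name.lstrip("_").partition(".")
    return category, attribute


def find_orphans(data, links):
    """Return (child_item, value) pairs whose value has no parent record."""
    orphans = []
    for child, parent in links:
        child_cat, child_attr = split_item(child)
        parent_cat, parent_attr = split_item(parent)
        parent_values = {row[parent_attr] for row in data.get(parent_cat, [])}
        for row in data.get(child_cat, []):
            if row[child_attr] not in parent_values:
                orphans.append((child, row[child_attr]))
    return orphans


# Hypothetical parsed data: the second struct_biol_gen row has no parent.
data = {
    "struct_biol": [{"id": "1"}],
    "struct_biol_gen": [{"biol_id": "1"}, {"biol_id": "2"}],
}
print(find_orphans(data, LINKS))
```

Categories absent from the data are simply skipped, so the same link table can be applied to files that populate only some of the child categories.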
Molecular Description • Macromolecular sequence • Macromolecular source • Detailed chemical descriptions of monomers • Detailed chemical descriptions of ligands and solvent
Molecular Hierarchy (diagram labels) • Molecular Description • Macromolecular Polymer • Biological Source • Sequence • Non-polymer • Chemical Details • Molecular Component Dictionary
Structural Description • Coordinates of the experimental subunit • Symmetry operations required to build functional assemblies • Structural annotation: secondary structure, hydrogen bonding classification, base pairs and base pair steps, backbone torsions and base morphology
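Building a functional assembly from the experimental subunit amounts to applying each symmetry operation, a rotation matrix R and a translation vector t, to every coordinate: x' = R x + t. The sketch below shows this with plain Python lists. The function name `apply_operation`, the two-fold operation, and the coordinates are made-up values for illustration and are not taken from any PDB entry.

```python
def apply_operation(rotation, translation, coords):
    """Apply x' = R x + t to each (x, y, z) coordinate in coords."""
    return [
        tuple(
            sum(rotation[i][j] * xyz[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for xyz in coords
    ]


# Hypothetical two-fold rotation about the z axis plus a shift along x.
rotation = [
    [-1, 0, 0],
    [0, -1, 0],
    [0, 0, 1],
]
translation = (10.0, 0.0, 0.0)

# Hypothetical experimental subunit with a single atom.
subunit = [(1.0, 2.0, 3.0)]

# The assembly is the original subunit plus its symmetry-generated copy.
assembly = subunit + apply_operation(rotation, translation, subunit)
print(assembly)
```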
Structural Hierarchy (diagram labels) • Molecular Description • Functional Units • Experimental Subunits • Secondary Structure • Hydrogen Bonding • Atomic Coordinates • Base Pairs • Base Pair Steps • Backbone Torsions • Base Morphology
Connection between Molecular and Structure Descriptions • Macromolecular sequences are explicitly aligned to experimentally determined chemical sequences • Monomers, ligands and solvent are matched with chemical descriptions in the PDB molecular components dictionary (diagram labels: Molecular Description, Structural Description)
Relationships with other Resources • Sequence database correspondences • Domain/family annotation • Functional annotation (GO/EC/OMIM) • Structural database correspondences • SCOP/CATH/RNAML structural classifications • Functional annotation • Citation and related literature
Supporting Software Tools: Dictionaries, Data Files and Databases • Validating Parsers for Files and Dictionaries (CIFPARSE) • Dictionary access and presentation tools (CIFOBJ) • File format translation tools (MAXIT, CIFTr) • PDB Validation Suite • Data acquisition and editor tool (ADIT) • Database Builder, Loader (mmCIFLOADER) • XML translation tool • Data extraction and merging tools (PDB_EXTRACT)
Availability: http://sw-tools.pdb.org/ • WWW and CDROM Distribution • Source and Binary Distributions • Open Source License • Supported on Linux, IRIX, ALPHA, SUNOS, and Mac OSX
Structure Related Data Dictionaries • DDL2 • mmCIF • RNAML • Ligand data • NMR • Cryo-EM • Modeling • Crystallization • Symmetry • Image data • BioSync • Protein Production
Access • RCSB Protein Data Bank Site http://www.pdb.org/ • RCSB/PDB Beta Data Site http://pdbbeta.rcsb.org/ • RCSB/PDB Dictionary Resource Site http://mmcif.pdb.org/ • RCSB/PDB Deposition Site http://deposit.pdb.org/ • PDBML site http://pdbml.pdb.org/ • RCSB/PDB Software Download Site http://sw-tools.pdb.org/