Chemical Components Dictionary Marina Zhuravleva, RCSB PDB June 16, 2011
Crystallographic Databases
• PDB (http://www.wwpdb.org/): macromolecular structural data (biopolymers)
• CCDC (http://www.ccdc.cam.ac.uk): small-molecule database (organic, metal-organic)
• ICSD (http://icsd.ill.eu/icsd/): database of inorganic crystal structure data (intermetallics, metal oxides, etc.)
Chemical Components Dictionary Content
• In addition to biopolymers (proteins and nucleic acids), the PDB archive contains more than 10,000 unique non-biopolymer entities, collectively called Chemical Components.
• They are cataloged in the Chemical Components Dictionary (CCD).
• These components are very diverse in nature and include ions, solvents, natural and modified amino acids and nucleic acids, and ligands such as drugs, cofactors, metal clusters, surfactants, and others.
• The goal is to provide a systematic, standard, and common point of reference for the Components.
Some Examples
• PDB entry 3H34: PpcE, a cytochrome c7
• Chemical Component HEM: protoporphyrin IX containing Fe (heme)
Some Examples
• PDB entries: 3A39, 3LO2
• Proteins: neutrophil defensin 1, high-potential iron-sulfur protein
• Chemical Components: MPD, GOL
Properties of Chemical Components
• Chemical Components are created as physically and chemically reasonable, neutral, stable entities in which atom types and bond orders are properly defined and all valences are satisfied.
• A Chemical Component may occur in many PDB entries. For example, GOL appears in 6477 entries.
• New Components are added to the CCD as new unique Chemical Components are found in deposited PDB entries.
• Chemical Components are unique, and no Component contains another.
File Content
• Component identifier: a unique alphanumeric 3-character code (legacy Components may have a 2- or 1-character code).
• Chemical and administrative data: molecular weight, empirical formula, atom and bond counts, formal charge, dates of creation and modification, release status, processing site.
• Atom names, bond connectivity, and bond order: the atom pairs forming each bond and the bond type (single, double, triple, etc.).
• Internal geometry: coordinates are provided both for a representative experimental instance and for an idealized, computed model. This includes atomic Cartesian coordinates and internal geometry such as bond lengths, bond angles, and torsion angles.
• Nomenclature and chemical identifiers: SMILES strings, InChI descriptors, IUPAC and common names, synonyms.
• Other information: flags, etc.
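As a rough illustration of how the fields listed above fit together, the sketch below models one dictionary entry as a small Python record. The class names, field names, and the `dangling_bonds` helper are hypothetical and are not the CCD file format itself. The example fills in glycerol (GOL) with heavy atoms only, and the atom names are chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bond:
    """One bond record: the two atom names and the bond order."""
    atom_1: str
    atom_2: str
    order: str = "single"


@dataclass
class ChemicalComponent:
    """Hypothetical in-memory view of one dictionary entry."""
    comp_id: str                 # component identifier, e.g. "GOL"
    name: str
    formula: str                 # empirical formula, space-separated
    formula_weight: float
    formal_charge: int = 0
    atom_names: list = field(default_factory=list)
    bonds: list = field(default_factory=list)
    descriptors: dict = field(default_factory=dict)  # SMILES, InChI, ...

    def dangling_bonds(self):
        """Return bonds that reference an atom name not listed in atom_names."""
        known = set(self.atom_names)
        return [b for b in self.bonds
                if b.atom_1 not in known or b.atom_2 not in known]


# Glycerol, heavy atoms only (hydrogens omitted to keep the sketch short).
gol = ChemicalComponent(
    comp_id="GOL",
    name="glycerol",
    formula="C3 H8 O3",
    formula_weight=92.09,
    atom_names=["C1", "O1", "C2", "O2", "C3", "O3"],
    bonds=[
        Bond("C1", "O1"), Bond("C1", "C2"), Bond("C2", "O2"),
        Bond("C2", "C3"), Bond("C3", "O3"),
    ],
    descriptors={"SMILES": "OCC(O)CO"},
)
```

A simple consistency check such as `gol.dangling_bonds()` returns an empty list when every bond refers to a listed atom.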
Molecular formula search options
• Exact all-atom formula matches.
• Exact heavy-atom formula matches.
• An exact partial-formula query will match any dictionary component whose formula contains the partial query formula. For instance, a query for C6 N2 will match any formula containing exactly six carbons and two nitrogens, whatever other elements are present.
• A formula subset query will find molecules whose formula contains at least the query formula composition. For instance, C6 H7 Hg N2 O2 S will match C8 H10 Hg N2 O4 S.
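The four formula search modes can be sketched as comparisons of element counts. This is a minimal sketch, not the search service's actual implementation: the function names are hypothetical, formulas are assumed to be space-separated tokens such as `C6 H7 Hg N2 O2 S`, and "heavy atom" is taken to mean every element except hydrogen.

```python
import re
from collections import Counter

# One token is an element symbol followed by an optional count, e.g. "C6" or "S".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Turn 'C6 H7 Hg N2 O2 S' into element counts."""
    counts = Counter()
    for token in formula.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized formula token: {token!r}")
        symbol, number = match.groups()
        counts[symbol] += int(number) if number else 1
    return counts


def _heavy(counts):
    """Drop hydrogen, keeping only heavy atoms."""
    return {el: n for el, n in counts.items() if el != "H"}


def exact_all_atom(query, target):
    """Every element count, hydrogens included, must be identical."""
    return dict(parse_formula(query)) == dict(parse_formula(target))


def exact_heavy_atom(query, target):
    """Element counts must be identical once hydrogens are ignored."""
    return _heavy(parse_formula(query)) == _heavy(parse_formula(target))


def exact_partial(query, target):
    """Each query element must appear in the target with exactly that count."""
    t = parse_formula(target)
    return all(t[el] == n for el, n in parse_formula(query).items())


def subset(query, target):
    """Each query element must appear in the target with at least that count."""
    t = parse_formula(target)
    return all(t[el] >= n for el, n in parse_formula(query).items())
```

With these definitions, `subset("C6 H7 Hg N2 O2 S", "C8 H10 Hg N2 O4 S")` holds, while `exact_partial("C6 N2", "C8 H10 Hg N2 O4 S")` does not, because the target has eight carbons rather than six.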
Molecular name search options
• Exact name matches.
• Exact substring name matches. This search will find cases where the query name is contained within the dictionary name. For instance, pyridine will match 2-aminopyridine.
• The similar name option will match names that are lexicographically similar but not exact matches. In other words, the names may differ in a small number of characters. For instance, a search for pyridine will match uridine and pyrimidine.
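The three name search modes can be sketched in a few lines. The slide does not say how "similar" is measured, so this sketch assumes Levenshtein edit distance with a hypothetical cutoff of two edits. The function names and the case-insensitive comparison are also assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def exact_name(query, name):
    """The whole name must equal the query (case-insensitive)."""
    return query.lower() == name.lower()


def substring_name(query, name):
    """The query must occur somewhere inside the name."""
    return query.lower() in name.lower()


def similar_name(query, name, max_edits=2):
    """Names that differ by a few characters but are not exact matches.
    max_edits=2 is an assumed cutoff, not a documented value."""
    distance = edit_distance(query.lower(), name.lower())
    return 0 < distance <= max_edits
```

Under this assumption, pyridine is two edits away from both uridine and pyrimidine, so `similar_name("pyridine", "uridine")` and `similar_name("pyridine", "pyrimidine")` both hold, while `substring_name("pyridine", "2-aminopyridine")` covers the substring case.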