Validation and Standardization of Molecular Structures in General and Sugars in Particular: a Case Study

Colin Batchelor, Ken Karapetyan, Valery Tkachenko, Antony Williams. Validation and Standardization of Molecular Structures in General and Sugars in Particular: a Case Study. 6th Joint Sheffield Conference on Chemoinformatics, 2013-07-24.


Presentation Transcript


  1. Colin Batchelor, Ken Karapetyan, Valery Tkachenko, Antony Williams Validation and Standardization of Molecular Structures in General and Sugars in Particular: a Case Study 6th Joint Sheffield Conference on Chemoinformatics 2013-07-24

  2. Overview • Open PHACTS and chemical validation and standardization • RDF for chemoinformatics calculations • General case study: ChEMBL and DrugBank • Sugar case study: Perspective perception

  3. Overview • Open PHACTS and chemical validation and standardization • RDF for chemoinformatics calculations • General case study: ChEMBL and DrugBank • Sugar case study: Perspective perception

  4. Who is involved? • 28 Consortium Members • >45 Associated Partners • 3-year European project funded by: • European Pharmaceutical Industry • Innovative Medicines Initiative • Applications using the Open PHACTS API Explorer • Open PHACTS API dev.openphacts.org • www.openphacts.org • Twitter: @open_phacts

  5. How do we fit in? We integrate and standardize the chemical compound collection underpinning Open PHACTS and provide regular updates and ongoing data curation. The validation and standardization rules have been derived from the FDA structure guidelines and modified for consistency and in response to input from members of EFPIA.

  6. “Open PHACTS provides an integrated platform of publicly available pharmacological and physicochemical data” • Data accessible via: • Free application programming interface (API) • dev.openphacts.org • Third-party applications built to use the API • Open PHACTS app ecosystem

  7. How does Open PHACTS work?

  8. Currently integrated databases

  9. CVSP and the OPS CRS Standardization workflows (CVSP, FDA, OPS, custom) using modules such as: • SMIRKS transformations • layout (GGA) • canonical tautomers (ChemAxon) • sugar interpretation (RSC)

  10. Overview • Open PHACTS and chemical validation and standardization • RDF for chemoinformatics calculations • General case study: ChEMBL and DrugBank • Sugar case study: Perspective perception

  11. RDF and Open PHACTS The underlying language of Open PHACTS is RDF. There are few constraints as such, only guidelines for which classes of identifier to use and accounts of best practice. This RDF goes into the data cache and we access the results through user interfaces built on RESTful JSON web services.

  12. What does RDF look like? In the Turtle format below, each line is a triple, in which a binary predicate links a subject and an object.
      :CSID1execution obo:OBO_0000299 :CSID1prop11 .
      :CSID1prop11 obo:IAO_0000136 ops:OPS1 .
      :CSID1prop11 rdf:type cheminf:CHEMINF_000349 .
      :CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double .
      :CSID1prop11 qudt:unit obo:UO_0000324 .
      There is also RDF/XML, which is less human-readable.
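To make the subject, predicate and object structure of slide 12 concrete, here is a minimal Python sketch that splits one such line into its three parts. The function name `parse_simple_triple` is hypothetical, and this is not a real Turtle parser: it assumes one complete triple per line and ignores prefix declarations, `;` and `,` shorthand, and literals containing whitespace.

```python
def parse_simple_triple(line):
    """Split a one-line 'subject predicate object .' statement into a 3-tuple.

    Assumption: the line holds exactly one triple and the object contains
    no internal whitespace. Real Turtle allows much more than this.
    """
    text = line.strip()
    if not text.endswith("."):
        raise ValueError("expected a terminating '.'")
    # Drop the final '.', then split on whitespace into at most three fields.
    subject, predicate, obj = text[:-1].strip().split(None, 2)
    return subject, predicate, obj


# The five triples shown on the slide.
slide_lines = [
    ":CSID1execution obo:OBO_0000299 :CSID1prop11 .",
    ":CSID1prop11 obo:IAO_0000136 ops:OPS1 .",
    ":CSID1prop11 rdf:type cheminf:CHEMINF_000349 .",
    ':CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double .',
    ":CSID1prop11 qudt:unit obo:UO_0000324 .",
]

triples = [parse_simple_triple(line) for line in slide_lines]
for s, p, o in triples:
    print(f"subject={s}  predicate={p}  object={o}")
```

Four of the five triples share the subject `:CSID1prop11`, so the property node carries its own type, value, unit and the molecule it is about.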

  13. Royal Society of Chemistry data in Open PHACTS • Molecule synonyms and identifiers • Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers • Molecule–molecule relations (“parent–child”) of interest for drug discovery • Calculated physicochemical properties for compounds (both molecular and macroscopic)

  14. Royal Society of Chemistry data in Open PHACTS • Molecule synonyms and identifiers • Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers • Molecule–molecule relations (“parent–child”) of interest for drug discovery • Calculated physicochemical properties for compounds (both molecular and macroscopic)

  15. Calculated physicochemical properties (ACD 12.0) • log P • log D (at pH 5.5, at pH 7.4) • bioconcentration factor • KOC (at pH 5.5, at pH 7.4) • index of refraction • polar surface area • molar refractivity • molar volume • polarizability • surface tension • density at STP • boiling point at 1 atm • flash point at 1 atm • enthalpy of vaporization at STP • vapour pressure at STP

  16. RDF for calculated properties: vocabularies Two dozen calculated properties for each of >10^6 molecules. • CHEMINF ontology for kinds of calculation and chemical data • QUDT for results • OPS IDs for molecules • OBI and IAO to connect calculations to results

  17. RDF for calculated properties: schema [Diagram: RDF graph for a calculated log P of benzene. Node labels as extracted: CHEMINF calculated log P calculation process; CHEMINF execution of ACD/Labs PhysChem software library version 12.01; calculation result; OPS benzene; benzene’s connection table; CHEMINF connection table; QUDT dimensionless quantity; “2.17”^^xsd:float; “0.234”^^xsd:float. Edge labels as extracted: rdf:type; OBI has specified input; OBI has specified output; IAO is about; QUDT has value; QUDT has unit; QUDT has standard uncertainty.]

  18. Overview • Open PHACTS and chemical validation and standardization • RDF for chemoinformatics calculations • General case study: ChEMBL and DrugBank • Sugar case study: Perspective perception

  19. ChEMBL and DrugBank analysed ChEMBL 16 (http://www.ebi.ac.uk/chembl/) contains 1 295 510 distinct molecules, and CVSP found something to say about 456 250 of them (35%). DrugBank 3.0 (http://www.drugbank.ca/) contains 6510 distinct molecules, and CVSP found something to say about 662 of them (10%). (We haven’t done all of ChemSpider yet; we will.)
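The percentages quoted on slide 19 follow directly from the counts given there. A quick check in Python (the helper name `flagged_percent` is only illustrative):

```python
def flagged_percent(flagged, total):
    """Percentage of molecules for which CVSP reported something."""
    return 100.0 * flagged / total


# Counts taken from the slide.
chembl = flagged_percent(456_250, 1_295_510)
drugbank = flagged_percent(662, 6_510)

print(f"ChEMBL 16:    {chembl:.1f}%")
print(f"DrugBank 3.0: {drugbank:.1f}%")
```

Rounded to whole numbers, these give the 35% and 10% stated on the slide.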

  20. Overview • Open PHACTS and chemical validation and standardization • RDF for chemoinformatics calculations • General case study: ChEMBL • Sugar case study: Perspective perception

  21. Sugar depiction challenges Stereochemistry not stored in V2000 format (though present in .cdx).

  22. Consequences

  23. Sugar ring redepiction algorithm • Identify perspective conformation (boat, chair, Haworth) • Determine perspective stereo • Assign wedge or hash to bonds accordingly • Reconstruct sugar ring so as to minimize disruption to the rest of molecule • Tidy

  24. Take the x-axis as parallel to the line through the top two chair atoms or through the bottom two chair atoms. • Δy positive: wedge • Δy negative: hash • Then remap chair to homotropous hexagon.
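The Δy rule on slide 24 can be sketched as follows. This is an illustration under stated assumptions, not the CVSP implementation: the function name `chair_bond_style` is hypothetical, coordinates are 2D with y increasing up the page, and Δy is taken to be the substituent's offset from its ring atom measured perpendicular to the reference line through the two top (or two bottom) chair atoms.

```python
import math


def chair_bond_style(ref_a, ref_b, ring_atom, substituent, eps=1e-9):
    """Return 'wedge', 'hash' or 'undetermined' for a chair substituent bond.

    ref_a, ref_b: (x, y) of the two top or two bottom chair atoms; the line
    through them defines the local x-axis.
    Assumption: y increases up the page.
    """
    ax, ay = ref_a
    bx, by = ref_b
    # Orient the reference direction left to right so that "up" stays up.
    if bx < ax:
        ax, ay, bx, by = bx, by, ax, ay
    length = math.hypot(bx - ax, by - ay)
    if length < eps:
        raise ValueError("reference atoms coincide")
    ux, uy = (bx - ax) / length, (by - ay) / length
    # The local y-axis is the local x-axis rotated by +90 degrees: (-uy, ux).
    dx = substituent[0] - ring_atom[0]
    dy = substituent[1] - ring_atom[1]
    delta_y = dx * (-uy) + dy * ux
    if delta_y > eps:
        return "wedge"
    if delta_y < -eps:
        return "hash"
    return "undetermined"


# Reference line is horizontal; one substituent points up, one down.
up = chair_bond_style((0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
down = chair_bond_style((0.0, 0.0), (1.0, 0.0), (0.0, 0.0), (0.5, -0.8))
print(up, down)
```

Projecting onto the rotated axis, rather than comparing raw y coordinates, keeps the result stable when the chair is drawn tilted on the page.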

  25. In the boat case, the substituent further up the page is the wedge, while the one further down the page is the hash, regardless of whether bridgehead or not.
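The boat rule on slide 25 reduces to a comparison of page heights. A minimal sketch, assuming two substituents on the same ring atom and y increasing up the page (the name `boat_bond_styles` is hypothetical):

```python
def boat_bond_styles(sub_a, sub_b):
    """Return the bond styles for two paired substituents in a boat drawing.

    sub_a, sub_b: (x, y) coordinates. The one further up the page is the
    wedge and the other is the hash. Assumption: y increases up the page.
    """
    if sub_a[1] == sub_b[1]:
        raise ValueError("substituents are at the same height")
    if sub_a[1] > sub_b[1]:
        return "wedge", "hash"
    return "hash", "wedge"


styles = boat_bond_styles((0.2, 1.3), (0.4, -0.9))
print(styles)
```

Unlike the chair case, no reference axis is needed here, which matches the slide's remark that the rule applies regardless of whether the atom is a bridgehead.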

  26. Depiction • Identify mean bond length and chair centroid. • Snap ring atoms to a regular-hexagonal grid. • Remove superfluous hydrogen atoms. • Only mark stereo on a single substituent if they are paired (cf. Grice).
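The first two depiction steps on slide 26 can be sketched in Python. This is a simplification, not the method used in CVSP: instead of a full hexagonal grid it places the six ring atoms on a single regular hexagon centred on the ring centroid, with side length equal to the mean ring bond length. The function name `snap_ring_to_hexagon` and the choice to keep the first atom's direction from the centroid are assumptions.

```python
import math


def snap_ring_to_hexagon(ring):
    """Map six ring atoms, given in ring order, onto a regular hexagon.

    ring: list of six (x, y) tuples. Returns six new (x, y) tuples.
    """
    n = len(ring)
    if n != 6:
        raise ValueError("expected a six-membered ring")
    # Centroid of the ring atoms.
    cx = sum(x for x, _ in ring) / n
    cy = sum(y for _, y in ring) / n
    # Mean length of the six ring bonds.
    bond = sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n)) / n
    # Keep the first atom's direction from the centroid to limit rotation.
    start = math.atan2(ring[0][1] - cy, ring[0][0] - cx)
    # The sign of the shoelace sum gives the winding direction of the ring.
    area2 = sum(
        ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
        for i in range(n)
    )
    step = math.copysign(math.pi / 3, area2)
    # In a regular hexagon the circumradius equals the side length.
    return [
        (cx + bond * math.cos(start + i * step),
         cy + bond * math.sin(start + i * step))
        for i in range(n)
    ]


distorted = [(1.0, 0.0), (0.5, 0.9), (-0.5, 0.8),
             (-1.0, 0.0), (-0.5, -0.9), (0.5, -0.8)]
snapped = snap_ring_to_hexagon(distorted)
for x, y in snapped:
    print(f"{x:.3f} {y:.3f}")
```

Preserving the centroid and mean bond length is one way to meet the stated goal of minimizing disruption to the rest of the molecule. Substituent atoms would still need to be moved along with their ring atoms.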

  27. Tidying: desiderata • Different problem from structure layout in general. • The structure we end up with is, in many important respects, fine. • Preserve drawing conventions, for example aglycones being on the top right-hand side.

  28. Next steps • Stable user-facing URI for CVSP (currently http://cvsp.beta.rsc-us.org/, but subject to change). • Apply CVSP to all of ChemSpider. • Investigate fused rings.

  29. Acknowledgements In particular: Jon Steele (RSC), David Sharpe (RSC), John Blunt (Canterbury, NZ)

  30. batchelorc@rsc.org @documentvector Any questions?
