
DNA

The Use of Graph Matching Algorithms to Identify Biochemical Substructures in Synthetic Chemical Compounds: Application to Metabolomics. Mai Hamdalla, David Grant, Ion Mandoiu, Dennis Hill, Sanguthevar Rajasekaran, and Reda Ammar, University of Connecticut.


Presentation Transcript


  1. The Use of Graph Matching Algorithms to Identify Biochemical Substructures in Synthetic Chemical Compounds: Application to Metabolomics. Mai Hamdalla, David Grant, Ion Mandoiu, Dennis Hill, Sanguthevar Rajasekaran, and Reda Ammar, University of Connecticut

  2. Genome (DNA) → Transcriptome (RNA) → Proteome (Proteins) → Metabolome (Metabolites: Lipids, Sugars, Amino Acids, Nucleotides) → Phenotype/Function

  3. Identification Process
     • SMILES (simplified molecular-input line-entry system)
     • C8H7N: C1=CC=C2C(=C1)C=CN2
     • C9H18O8: C(C1C(C(C(C(O1)OCC(CO)O)O)O)O)O
     • C6H12O6: C(C1C(C(C(O1)(CO)O)O)O)O
     • Flow: List of Candidate Chemical Structures → Mammalian Metabolite Identifier → Ranked list of Candidate Structures with mammalian substructures
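
The formula and SMILES pairs on this slide can be cross-checked by counting heavy (non-hydrogen) atoms in each string. The sketch below is a minimal illustration, not a full SMILES parser: the function name `heavy_atom_counts` is hypothetical, and the regular expression assumes the common organic-subset atoms plus simple bracket atoms.

```python
import re
from collections import Counter

# Bracket atoms, two-letter halogens, one-letter organic atoms, aromatic atoms.
# Assumption: this covers only simple SMILES such as the ones on the slide.
_ATOM = re.compile(r"\[([^\]]+)\]|(Br|Cl|[BCNOPSFI])|([bcnops])")

def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element in a simple SMILES string."""
    counts = Counter()
    for bracket, aliphatic, aromatic in _ATOM.findall(smiles):
        if bracket:
            # Skip an optional isotope number, then read the element symbol.
            m = re.match(r"\d*([A-Z][a-z]?|[a-z]{1,2})", bracket)
            symbol = m.group(1).capitalize() if m else ""
        else:
            symbol = aliphatic or aromatic.upper()
        if symbol and symbol != "H":
            counts[symbol] += 1
    return counts

# The three SMILES strings listed on the slide.
for formula, smiles in [
    ("C8H7N", "C1=CC=C2C(=C1)C=CN2"),
    ("C9H18O8", "C(C1C(C(C(C(O1)OCC(CO)O)O)O)O)O"),
    ("C6H12O6", "C(C1C(C(C(O1)(CO)O)O)O)O"),
]:
    print(formula, dict(heavy_atom_counts(smiles)))
```

Hydrogens are implicit in these strings, so only the C, N, and O counts can be compared against the formulas.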

  4. Identification Process (flow diagram)
     • Input: List of Candidate Compound Structures
     • Reference lists: Mammalian Scaffolds List, non-Biological Scaffolds
     • Classes shown: Sugars, Lipids, Amino Acids, Nucleotides
     • Steps: Filtration, Structure Matching
     • Outputs: List of Filtered Candidate Compounds, Ranked list of identified Compounds

  5. Collection and Curation of Scaffolds
     • Retrieve all compounds in a metabolic pathway in the KEGG database.
     • Keep participants of mammalian metabolic pathway groups (91 KEGG pathways): Carbohydrate, Energy, Lipid, Nucleotide, Amino Acid, Glycan, and Cofactors and Vitamins metabolism.
     • Remove entries that were single elements, metals, or inorganic.
     • Remove compounds that did not have an entry in the PubChem database.
     • Result: 1,987 compounds, 30 – 1,000 Da

  6. Identification Process (flow diagram)
     • Input: List of Candidate Compound Structures
     • Reference lists: Mammalian Scaffolds List, non-Biological Scaffolds
     • Classes shown: Sugars, Lipids, Amino Acids, Nucleotides
     • Steps: Filtration, Structure Matching
     • Outputs: List of Filtered Candidate Compounds, List of Identified Compounds

  7. Structure Matching
     • The SMSD (Small Molecule Subgraph Detector) toolkit is used for molecule similarity searches.
     • Similarity Score = NSBS / NSPR, where NSBS is the number of atoms in the substructure and NSPR is the number of atoms in the superstructure (ratio as implied by the worked scores on the following slides, e.g. 6/14 = 0.43).
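
To make the matching step concrete, here is a minimal sketch of substructure detection by backtracking over atom-labelled graphs. It is a simplified stand-in, not the SMSD algorithm: bond orders, aromaticity, and hydrogens are ignored, and the names `find_substructure` and `similarity_score` are hypothetical. The score is assumed to be the ratio of substructure atoms to superstructure atoms, which is how the worked values on the later slides (such as 6/14 = 0.43) read.

```python
def build_adjacency(n_atoms, bonds):
    """Return a list of neighbour sets for an undirected molecular graph."""
    adjacency = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def find_substructure(sub_atoms, sub_bonds, sup_atoms, sup_bonds):
    """Map every atom of `sub` onto a distinct, same-element atom of `sup`
    so that every bond of `sub` is also a bond of `sup`.
    Returns the mapping as a dict, or None if no such mapping exists."""
    sub_adj = build_adjacency(len(sub_atoms), sub_bonds)
    sup_adj = build_adjacency(len(sup_atoms), sup_bonds)
    mapping, used = {}, set()

    def extend(u):
        if u == len(sub_atoms):
            return True  # every substructure atom has been placed
        for v in range(len(sup_atoms)):
            if v in used or sup_atoms[v] != sub_atoms[u]:
                continue
            # Already-placed neighbours of u must land on neighbours of v.
            if all(mapping[w] in sup_adj[v] for w in sub_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(u + 1):
                    return True
                del mapping[u]
                used.remove(v)
        return False

    return dict(mapping) if extend(0) else None

def similarity_score(n_substructure_atoms, n_superstructure_atoms):
    """Assumed score: atoms in the substructure / atoms in the superstructure."""
    return n_substructure_atoms / n_superstructure_atoms

# Toy example: a C-C-O fragment inside a C-C-C-O chain.
fragment = (["C", "C", "O"], [(0, 1), (1, 2)])
chain = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
match = find_substructure(*fragment, *chain)
if match is not None:
    print(match, similarity_score(len(fragment[0]), len(chain[0])))
```

Backtracking of this kind is exponential in the worst case, which is acceptable for small molecules but is one reason dedicated toolkits exist.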

  8. Scaffolds-Structure Matching
     • Candidate Structure: C10H7NO3, C1=CC=C2C(=C1)C(=O)C=C(N2)C(=O)O
     • Mammalian Scaffolds, per-scaffold score labels: 0.29, 0.43, 0.29, 0.29, 0.29, 0.43, 0.36
     • Callouts: Similarity Score = 0.43 (6/14), shown three times; Similarity Score = 0.29 (4/14), shown three times

  9. Union Scaffold Structure
     • Panels: Candidate Structure, Mammalian Scaffolds, Union Scaffold
     • Per-scaffold score labels: 0.29, 0.43, 0.29, 0.29, 0.29, 0.43, 0.36
     • Union Scaffold Similarity Score = 0.71 (10/14)
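
One way to read the union scaffold score is as the fraction of candidate atoms covered by at least one matched scaffold. The sketch below assumes that reading; the function name and the atom indices are invented for illustration and are not taken from the slide, only the sizes (6, 4, and 4 matched atoms, 14 candidate atoms, 10 covered) mirror its numbers.

```python
def union_scaffold_score(matched_atom_sets, n_candidate_atoms):
    """Fraction of candidate atoms covered by the union of all scaffold matches."""
    covered = set().union(*matched_atom_sets)
    return len(covered) / n_candidate_atoms

# Hypothetical matches: each set holds the candidate atom indices one scaffold covers.
matches = [
    {0, 1, 2, 3, 4, 5},  # a 6-atom scaffold, 6/14 on its own
    {4, 5, 6, 7},        # a 4-atom scaffold, 4/14 on its own
    {6, 7, 8, 9},        # another 4-atom scaffold
]
print(round(union_scaffold_score(matches, 14), 2))  # 10 covered atoms out of 14
```

Overlapping matches are counted once, so the union score can be well below the sum of the individual scores.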

  10. Superstructure Scaffolds Matching
     • Union Scaffold Score = 0 (figure label: 0.45)
     • The candidate was found to be a substructure of 38 scaffolds.
     • About 30% of the mammalian structures were missed (false negatives, FN).
     • Superstructure scores: 0.9 (9/10), 0.6 (9/15), 0.75 (9/12)
     • Similarity Score = 0.9
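
When the candidate is itself contained in larger scaffolds, the roles flip: the candidate is the substructure and each scaffold is the superstructure. The sketch below assumes the reported score is the best ratio over all containing scaffolds, which matches the 0.9 shown on the slide; the function name is hypothetical.

```python
def superstructure_score(n_candidate_atoms, scaffold_atom_counts):
    """Best candidate/scaffold atom ratio over scaffolds that contain the candidate.
    Returns 0.0 when no scaffold contains the candidate."""
    return max((n_candidate_atoms / n for n in scaffold_atom_counts), default=0.0)

# Atom counts from the slide: a 9-atom candidate inside scaffolds of 10, 15, and 12 atoms.
print(superstructure_score(9, [10, 15, 12]))
```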

  11. Scoring Methods
     • Panels: Candidate Structure, Union Scaffold Structure (0.71), Superstructure Scaffold Structure (0.93)
     • US: Union Scaffold Score = 0.71
     • MS: Maximum Score, max(Union Scaffold Score, Superstructure Score) = 0.93
     • SS: Sum of Scores, Union Scaffold Score + Superstructure Score = 1.64
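
The three scoring methods reduce to simple arithmetic on the two component scores. A minimal sketch, with a hypothetical function name, using the values from the slide:

```python
def combine_scores(union_score, superstructure_score):
    """Return the three candidate rankings scores: US, MS (maximum), SS (sum)."""
    return {
        "US": union_score,
        "MS": max(union_score, superstructure_score),
        "SS": union_score + superstructure_score,
    }

scores = combine_scores(0.71, 0.93)
print({name: round(value, 2) for name, value in scores.items()})
```

Note that SS can exceed 1, so it is only meaningful for ranking candidates against each other, not as a fraction.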

  12. Collection and Curation of Synthetic Compounds
     • Retrieve synthetic compounds from the ChemBridge and ChemSynthesis databases.
     • Restricted to the 6 biological elements C, H, N, O, P, and S.
     • Mass distribution: ChemBridge (150 – 700 Da), ChemSynthesis (50 – 300 Da)
     • 1,400 compounds were randomly selected for training and 5,320 compounds were randomly chosen for testing.
     • Mammalian scaffold list reduced to 1,400 compounds (50 – 700 Da)
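
The element restriction and the random selection can be sketched as follows. This is an illustration under assumptions: compounds are represented by molecular formula strings, the function names are hypothetical, and the fixed seed is only there to make the split repeatable.

```python
import random
import re

BIOLOGICAL_ELEMENTS = {"C", "H", "N", "O", "P", "S"}

def is_biological(formula):
    """True if the formula contains only C, H, N, O, P, and S."""
    return set(re.findall(r"[A-Z][a-z]?", formula)) <= BIOLOGICAL_ELEMENTS

def split_train_test(items, n_train, n_test, seed=0):
    """Draw disjoint random training and testing subsets without replacement."""
    picked = random.Random(seed).sample(list(items), n_train + n_test)
    return picked[:n_train], picked[n_train:]

formulas = ["C10H7NO3", "C6H5Cl", "C6H12O6", "C2H5NaO", "C3H7NO2S", "C8H7N"]
kept = [f for f in formulas if is_biological(f)]
train, test = split_train_test(kept, n_train=2, n_test=2)
print(kept, train, test)
```

With the numbers from the slide, the call would use `n_train=1400` and `n_test=5320` on the filtered compound list.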

  13. Cross Validation Average Accuracy Results

  14. Leave-One-Out Accuracy: Sensitivity = 96%
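
Sensitivity here is the fraction of known mammalian scaffolds that are still recognised as mammalian when each one is held out of the scaffold list in turn. The sketch below shows only the shape of that loop. The scoring function and the decision threshold are assumptions, since the slides do not state how a score becomes a mammalian or non-mammalian call, and the toy scorer is a placeholder with no chemical meaning.

```python
def leave_one_out_sensitivity(scaffolds, score, threshold):
    """Hold out each scaffold, score it against the rest, and return the
    fraction whose score reaches the threshold (true positives / all positives)."""
    hits = 0
    for i, held_out in enumerate(scaffolds):
        rest = [s for j, s in enumerate(scaffolds) if j != i]
        if score(held_out, rest) >= threshold:
            hits += 1
    return hits / len(scaffolds)

def toy_score(candidate, scaffold_list):
    """Placeholder scorer: 1.0 if any scaffold string contains the candidate string."""
    return 1.0 if any(candidate in s for s in scaffold_list) else 0.0

toy_scaffolds = ["CCO", "CCOC", "CCN", "NCCN"]
print(leave_one_out_sensitivity(toy_scaffolds, toy_score, threshold=0.5))
```

In a real run, `score` would be one of the US, MS, or SS scores computed by structure matching against the remaining scaffolds.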

  15. Prospective Results of Synthetic Compounds: 54% eliminated as non-mammalian

  16. Conclusions
     • A novel way of using known mammalian metabolites (the scaffolds database) to identify synthetic chemical compounds with mammalian substructures.
     • The results show a sensitivity of 96% in the mammalian scaffolds leave-one-out experiments.
     • The system eliminated 54% of a random set of synthetic compounds.

  17. Ongoing Work
     • Exploring further improvements in accuracy by using known biological pathway information.
     • Annotating PubChem
     • Annotating existing and potential drugs
     • Database-independent compound search
     • Generating all possible structures of a given formula and ranking them

  18. Thank you! (The identification process diagram is repeated: Candidate Structures, Mammalian Scaffolds List, non-Biological Scaffolds, Filtration, Structure Matching, List of Filtered Candidate Compounds, Ranked Compounds.)
