
Applying the SUBDUE Substructure Discovery System to the Chemical Toxicity Domain



Presentation Transcript


  1. Applying the SUBDUE Substructure Discovery System to the Chemical Toxicity Domain
     Ravindra N. Chittimoori, Diane J. Cook, Lawrence B. Holder
     Department of Computer Science and Engineering
     University of Texas at Arlington
     http://cygnus.uta.edu/subdue/
     FLAIRS '99

  2. Motivation and Goal
     • Ever-increasing number of chemical compounds in use today (~100,000).
     • Need to identify relationships between the molecular structure and the toxicity of a chemical compound.
     • Apply knowledge discovery to U.S. National Toxicology Program (NTP) data to identify such relationships.

  3. Knowledge Discovery in SUBDUE
     • Structural discovery system
     • Graph-based input representation
     • Beam search through substructure (subgraph) space
     • Graph compression heuristic based on minimum description length
     • Inexact, polynomial graph match
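The compression heuristic on this slide can be illustrated with a minimal sketch. It assumes a simplified size measure (vertices plus edges) in place of a true description length in bits, and it assumes non-overlapping instances; the function names and the example numbers are hypothetical and do not come from the slides.

```python
# Simplified, size-based stand-in for an MDL-style compression heuristic.
# Assumption: graph "size" is vertices + edges, not a bit-level description length.

def graph_size(num_vertices: int, num_edges: int) -> int:
    """Size of a graph under the simplified measure."""
    return num_vertices + num_edges


def compression_value(g_vertices: int, g_edges: int,
                      s_vertices: int, s_edges: int,
                      instances: int) -> float:
    """Return size(G) / (size(S) + size(G|S)) for non-overlapping instances of S."""
    # Each instance of S collapses into a single placeholder vertex.
    compressed_vertices = g_vertices - instances * s_vertices + instances
    # Edges internal to an instance disappear; all other edges are kept.
    compressed_edges = g_edges - instances * s_edges
    size_g = graph_size(g_vertices, g_edges)
    size_s = graph_size(s_vertices, s_edges)
    size_g_given_s = graph_size(compressed_vertices, compressed_edges)
    return size_g / (size_s + size_g_given_s)


# Hypothetical numbers, not taken from the slides.
value = compression_value(g_vertices=20, g_edges=25,
                          s_vertices=3, s_edges=2, instances=4)
print(f"compression value: {value:.3f}")
```

A value above 1 means that describing the substructure once and then the compressed graph is cheaper than describing the original graph, so a search would prefer substructures with higher values.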

  4. SUBDUE Example
     [Figure: three panels titled Input Database, Substructure S1 (graph form), and Compressed Database. Labels in the figure include object, shape, triangle, square, and on; instances of S1 appear in the compressed database.]

  5. Chemical Toxicity Domain
     • Database of 367 chemicals
     • Levels of evidence assigned by NTP
        • CE: clear evidence of cancerous activity
        • SE: some evidence
        • E: equivocal evidence
        • NE: no evidence

  6. Predictive Toxicology Evaluation
     • Predictive Toxicology Evaluation (PTE) challenge
     • PTE-2 ended November 1998
     • http://dir.niehs.nih.gov/dirlecm/pte2.htm
     • PTE-3 scheduled for July 1999 - July 2000

  7. Chemical Toxicity Data
     • Atoms (name, type, partial charge)
     • Bonds (type)
     • Chemical groups
        • Alcohol, amine, amino, benzene, ester, ether, ketone, methanol, methyl, nitro, phenol and sulfide

  8. Chemical Toxicity Data
     • Carcinogenicity-related tests
        • Ames
        • Chromex
        • Chromaberr
        • Drosophila
        • Mouse-Lymph
        • Salmonella Assay

  9. Chemical Compound Representation
     [Figure: graph representation of a chemical compound. Labels in the figure include atom, c, h, AMES, numeric values (e.g., 32, 10, 7, 1, 0.064, 0.062, 0.063, 0.032), and edge labels n, t, and p.]

  10. Input Representation
     • Sample Atomic Structure: C and H joined by a bond labeled 1
     • SUBDUE graph input
          v 1 atom
          v 2 C
          v 3 atom
          v 4 H
          d 1 2 name
          d 3 4 name
          u 1 3 1
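The graph input on this slide can be read with a few lines of code. The sketch below assumes only the three line types visible in the sample (`v id label` for a vertex, `d src dst label` for a directed edge, `u src dst label` for an undirected edge); the function name and the tuple layout are hypothetical, and other line types a real SUBDUE input file may contain are not handled.

```python
from typing import Dict, List, Tuple

# (source vertex, target vertex, label, is_directed)
Edge = Tuple[int, int, str, bool]


def parse_subdue_graph(text: str) -> Tuple[Dict[int, str], List[Edge]]:
    """Parse 'v', 'd', and 'u' lines into a vertex map and an edge list."""
    vertices: Dict[int, str] = {}
    edges: List[Edge] = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        kind = parts[0]
        if kind == "v":
            vertices[int(parts[1])] = parts[2]
        elif kind in ("d", "u"):
            edges.append((int(parts[1]), int(parts[2]), parts[3], kind == "d"))
        else:
            raise ValueError(f"unrecognised line: {raw!r}")
    return vertices, edges


# The sample from the slide: a C atom and an H atom joined by a bond labeled 1.
SAMPLE = """\
v 1 atom
v 2 C
v 3 atom
v 4 H
d 1 2 name
d 3 4 name
u 1 3 1
"""

vertices, edges = parse_subdue_graph(SAMPLE)
print(vertices)
print(edges)
```

Each atom becomes a generic `atom` vertex with a directed `name` edge to its element label, and the bond is an undirected edge between the two `atom` vertices.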

  11. Methodology
     • Training set further divided into learning and testing sets
     • Find best substructures in learning-set positives not prevalent in negatives
     • Find occurrences of substructure in testing
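One way to read "best in positives, not prevalent in negatives" is as a ranking by coverage. The sketch below is an assumption, not the authors' scoring rule: it takes the graph matching as already done (each candidate comes with the set of compounds it occurs in) and ranks candidates by positive coverage minus negative coverage. All names and compound identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def coverage(matched: Set[str], compounds: Set[str]) -> float:
    """Fraction of the given compounds that contain the substructure."""
    return len(matched & compounds) / len(compounds)


def rank_substructures(matches: Dict[str, Set[str]],
                       positives: Set[str],
                       negatives: Set[str]) -> List[Tuple[float, str, float, float]]:
    """Rank candidates by (positive coverage - negative coverage), best first."""
    scored = []
    for name, matched in matches.items():
        pos = coverage(matched, positives)
        neg = coverage(matched, negatives)
        scored.append((pos - neg, name, pos, neg))
    return sorted(scored, reverse=True)


# Hypothetical learning set: four positive and four negative compounds.
positives = {"p1", "p2", "p3", "p4"}
negatives = {"n1", "n2", "n3", "n4"}

# Hypothetical match results: substructure name -> compounds containing it.
matches = {
    "S_a": {"p1", "p2", "p3", "p4", "n1"},
    "S_b": {"p1", "p2"},
    "S_c": {"p1", "n1", "n2", "n3"},
}

ranked = rank_substructures(matches, positives, negatives)
for score, name, pos, neg in ranked:
    print(f"{name}: score={score:+.2f} pos={pos:.2f} neg={neg:.2f}")
```

The top-ranked candidates from the learning set would then be checked against the testing set with the same `coverage` function.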

  12. Results
     • Learning set: 268
        • Positive compounds: 134/143
        • Negative compounds: 24/125
     • Testing set: 30
        • Positive compounds: 15/19
        • Negative compounds: 4/11
     [Figure: discovered substructure. Labels in the figure include atom, c, br, 10, 3, 0.062, 0.057, and 1, with edge labels n, t, and p.]

  13. Results
     • Learning set: 268
        • Positive compounds: 60/143
        • Negative compounds: 0/125
     • Testing set: 30
        • Positive compounds: 8/19
        • Negative compounds: 0/11
     [Figure: discovered substructure. Labels in the figure include atom, h, c, 1, 10, 32, 0.34, 0.211, 0.778, and 0.36, with edge labels n, t, and p.]
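The counts on the two Results slides are easier to compare as rates. The sketch below uses only the reported counts and assumes each fraction is the number of compounds containing the substructure over the number of compounds in that class. The precision figure is a derived quantity that is not reported on the slides, and the function and dictionary names are hypothetical.

```python
from typing import Dict, Tuple

Count = Tuple[int, int]  # (compounds containing the substructure, compounds in class)


def rates(pos: Count, neg: Count) -> Tuple[float, float, float]:
    """Return (positive coverage, negative coverage, precision of a match)."""
    pos_hit, pos_total = pos
    neg_hit, neg_total = neg
    matched = pos_hit + neg_hit
    # Precision: of the compounds containing the substructure, how many are positive.
    precision = pos_hit / matched if matched else float("nan")
    return pos_hit / pos_total, neg_hit / neg_total, precision


# Counts as reported on slides 12 and 13.
REPORTED: Dict[str, Dict[str, Tuple[Count, Count]]] = {
    "slide 12": {"learning": ((134, 143), (24, 125)),
                 "testing": ((15, 19), (4, 11))},
    "slide 13": {"learning": ((60, 143), (0, 125)),
                 "testing": ((8, 19), (0, 11))},
}

for slide, splits in REPORTED.items():
    for split, (pos, neg) in splits.items():
        cov_pos, cov_neg, precision = rates(pos, neg)
        print(f"{slide} {split}: pos={cov_pos:.1%} neg={cov_neg:.1%} "
              f"precision={precision:.1%}")
```

Read this way, the slide 12 substructure covers far more positives but also occurs in some negatives, while the slide 13 substructure covers fewer positives and occurs in no negatives.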

  14. Discussion
     • Consistent with results obtained by ILP system PROGOL (Srinivasan et al., ILP-97).
     • Groups discovered by SUBDUE (e.g., Amino) are unique substructures found only in compounds which test positive on carcinogenicity.

  15. Conclusion
     • SUBDUE has the ability to discover interesting patterns (substructures) that might be helpful in predicting carcinogenicity.
     • SUBDUE is suitable for knowledge discovery in the chemical toxicity domain.

  16. Future Research
     • Applying concept-learning SUBDUE to the chemical toxicity database
     • Find substructures compressing positive graph, but not negative graph
     • Incorporate more domain knowledge
     • PTE-3 challenge (July 1999)
