1 / 17

Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds

Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds. Mukund Deshpande, Michihiro Kuramochi , George Karypis University of Minnesota, Department of Computer Science/Army HPC Research Center Teacher : Dr.Ynag Student : Gun-Ren Wang Minneapolis, MN 55455

mira-sparks
Download Presentation

Frequent Sub-Structure-Based Approaches for Classifying Chemical Compounds

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Frequent Sub-Structure-Based Approaches for ClassifyingChemical Compounds Mukund Deshpande, Michihiro Kuramochi , George Karypis University of Minnesota, Department of Computer Science/Army HPC Research Center Teacher : Dr.Ynag Student : Gun-Ren Wang Minneapolis, MN 55455 Technical Report #03-016

  2. Outline • Introduction • Frequent Subgraph Based Classification Framework • Feature Generation Feature Generation • Feature Selection • Conclusion

  3. Introduction Any new drug should not only produce the desired response to the disease, but should do so with minimal side effects. • Evaluating this large set of compounds using HTS can be prohibitively expensive. • Not all biological assays can be converted to high throughput format. • Studying what part of the chemical compound leads to desirable behavior.

  4. Frequent Subgraph Based Classification Framework

  5. Feature Generation • In our classification algorithm we find the frequently occurring subgraphs using the FSG algorithm. • Topological sub-structures capture the connectivity of atoms in the chemical compound but they ignore the 3D shape of the sub-structures.

  6. Adjacency-list representation

  7. Canonical Labeling

  8. Candidate Joining

  9. Candidate Generation(1)

  10. Candidate Generation(2)

  11. Feature Selection For example,we have two ruleitems that have the same condset: <{(A, 1), (B, 1)}, (class: 1)>. <{(A, 1), (B, 1)}, (class: 2)>. Assume the support count of the condset is 3. (assume |D| = 10): (A, 1), (B, 1)(class, 1) [supt = 20%, confd= 66.7%] we only produce one PR(possible rule)

  12. The CBA-RG algorithm

  13. Building a Classifier • Definition: Given two rules, r and r < r (also called r precedes rj or ri has a higher precedence than rj) if • 1. the confidence of ri is greater than that of rj, or • 2. their confidences are the same, but the support of ri is greater than that of rj, or • 3.both the confidences and supports of ri and rj are the same, but ri is generated earlier than rj;

  14. A naïve algorithm for CBA-CB: M1

  15. Experimental Methodology & Metrics Table 1: The characteristics of the various datasets. N is the number of compounds in the database. ¯ NA and ¯ NB are the averagenumber of atoms and bonds in each compound. ¯ L A and ¯ L B are the average number of atom- and bond-types in each dataset.max NA/min NA and max NB/min NB are the maximum/minimum number of atoms and bonds over all the compounds in eachdataset.

  16. Varying Minimum Support

  17. Conclusion In this paper we presented a highly-effective algorithm for classifying chemical compounds based on frequent substructure discovery that can scale to large datasets.

More Related