Predicting patterns of biological performance using chemical substructure features

Presentation Transcript


  1. Predicting patterns of biological performance using chemical substructure features Diego Borges-Rivera 08/04/08

  2. Introduction • cheminformatics allows us to describe compound similarity computationally • synthetic chemists describe similarity through visual inspection • we will describe compounds by the presence of chemical substructures • we will attempt to identify sets of substructures that predict biological performance

  3. Previous work • Clemons/Kahne/Wagner et al.: disaccharide profiling in multiple cell states • found sets of substructures relevant to biological activity patterns • these substructures were highly specific to disaccharides [Figure: plot with an axis labeled "substructures"]

  4. Biological performance profile • 400 compounds, 8 assays in duplicate • compounds were tested for cell proliferation in 8 different cell lines • class labels are active (A) or inactive (I) [Figure: example of an active compound]

  5. What are fingerprints? • the compound collection is fed into commercial software • each substructure corresponds to 1 bit • a compound's fingerprint shows which substructures are present [Figure: example substructures #7017, #886, #1725]
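A minimal Python sketch of the idea, assuming a compound's substructures are already known as integer IDs and that those IDs can be used directly as 0-based bit positions in a 7700-bit vector (the pre-filtering size given on slide 6). `make_fingerprint` and `NUM_SUBSTRUCTURES` are hypothetical names, not part of the commercial software mentioned above.

```python
# Sketch: a fingerprint as a fixed-length 0/1 vector, one bit per substructure.
# Assumption: substructure IDs map directly to 0-based bit positions.

NUM_SUBSTRUCTURES = 7700  # size of the substructure dictionary before filtering


def make_fingerprint(present_ids, n_bits=NUM_SUBSTRUCTURES):
    """Return a list of 0/1 bits; bit i is 1 if substructure i occurs in the compound."""
    bits = [0] * n_bits
    for sid in present_ids:
        if not 0 <= sid < n_bits:
            raise ValueError(f"substructure id {sid} out of range")
        bits[sid] = 1
    return bits


# A compound containing the three substructures pictured on the slide.
fp = make_fingerprint([7017, 886, 1725])
print(sum(fp), fp[886], fp[7017], fp[0])  # 3 1 1 0
```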

  6. Overview of cheminformatic methods • generated fingerprints covering 7700 total substructures • filtered the set • 2166 substructures remained
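The slides do not say which criterion reduced 7700 substructures to 2166. As one hypothetical example, the sketch below drops substructures that are present in every compound or in none, since such bits cannot separate active from inactive compounds. Treat the filter rule and the name `filter_constant_bits` as assumptions, not the filter actually used in this work.

```python
# Hypothetical filter (assumption): keep only substructures whose bit varies
# across the compound collection; constant columns carry no class information.


def filter_constant_bits(fingerprints):
    """Return (kept_indices, filtered_fingerprints) with constant columns removed."""
    n_compounds = len(fingerprints)
    n_bits = len(fingerprints[0])
    kept = []
    for j in range(n_bits):
        count = sum(fp[j] for fp in fingerprints)
        if 0 < count < n_compounds:  # present in some compounds but not all
            kept.append(j)
    filtered = [[fp[j] for j in kept] for fp in fingerprints]
    return kept, filtered


fps = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
kept, reduced = filter_constant_bits(fps)
print(kept)     # [1, 2]
print(reduced)  # [[0, 1], [1, 0], [0, 0]]
```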

  7. Overview of computational methods • two steps, independent of each other: feature (substructure) selection to find predictive subsets, and evaluation of the methods for predictive value

  8. ReliefF: substructure selection [Figure: the 2166 ReliefF weights on a scale from -1 to +1, with the top 5 and bottom 5 substructures shown]
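To make the weighting concrete, here is a minimal ReliefF sketch for 0/1 fingerprints and the two classes A and I. It assumes every compound is used once as the sampled instance, that distance is 1 minus the Tanimoto coefficient (slide 10 says Tanimoto similarity is used inside ReliefF), and that with only two classes the usual prior weighting of near misses reduces to 1. The names `relieff` and `tanimoto` are illustrative; this is not the exact implementation used in the work.

```python
def tanimoto(a, b):
    """Bits set in both / bits set in either (two empty fingerprints -> 1.0, an assumed convention)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def relieff(fps, labels, k=1):
    """Return one weight in [-1, 1] per substructure (two-class ReliefF sketch)."""
    n, n_feat = len(fps), len(fps[0])
    weights = [0.0] * n_feat
    for i in range(n):
        # Rank every other compound by distance (1 - Tanimoto) to compound i;
        # sorted() is stable, so equal distances keep their original order.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: 1.0 - tanimoto(fps[i], fps[j]),
        )
        hits = [j for j in others if labels[j] == labels[i]][:k]    # nearest same-class
        misses = [j for j in others if labels[j] != labels[i]][:k]  # nearest other-class
        for f in range(n_feat):
            # Differing from a near hit lowers the weight;
            # differing from a near miss raises it.
            for j in hits:
                weights[f] -= abs(fps[i][f] - fps[j][f]) / (n * k)
            for j in misses:
                weights[f] += abs(fps[i][f] - fps[j][f]) / (n * k)
    return weights


fps = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
labels = ["A", "A", "I", "I"]
print(relieff(fps, labels, k=1))  # [1.0, -1.0, -1.0]
```

In this toy data, substructure 0 separates the classes perfectly and receives weight +1.0, while substructures 1 and 2 vary only within each class and receive -1.0, the two ends of the scale in the figure.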

  9. K nearest neighbors (knn): predictive accuracy • Examples: k = 2, 5 [Figure: the compound being classified, marked "?", among its k nearest neighbors]
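A minimal knn classifier along these lines, assuming Tanimoto similarity on 0/1 fingerprints and a majority vote among the k most similar training compounds. The tie-breaking rule (summed similarity, relevant for even k such as k = 2) is an assumption, and `knn_predict` is an illustrative name.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Bits set in both / bits set in either (two empty fingerprints -> 1.0, an assumed convention)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def knn_predict(query, train_fps, train_labels, k=5):
    """Predict a label for `query` from its k most Tanimoto-similar training compounds."""
    sims = [tanimoto(query, fp) for fp in train_fps]
    # Most similar first; sorted() stays stable with reverse=True.
    nearest = sorted(range(len(train_fps)), key=lambda j: sims[j], reverse=True)[:k]
    votes = defaultdict(int)
    sim_sum = defaultdict(float)
    for j in nearest:
        votes[train_labels[j]] += 1
        sim_sum[train_labels[j]] += sims[j]
    # Majority vote; a tie goes to the class whose neighbors are more similar in total.
    return max(votes, key=lambda c: (votes[c], sim_sum[c]))


train_fps = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
train_labels = ["A", "A", "I", "I"]
print(knn_predict([1, 1, 0, 1], train_fps, train_labels, k=3))  # A
print(knn_predict([0, 0, 1, 1], train_fps, train_labels, k=3))  # I
```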

  10. Similarity between compounds • similarity is computed between two fingerprints • Tanimoto coefficient • this measure is used twice: • in ReliefF • in knn • Example: Compound a: 0 0 1; Compound b: 1 0 1; Tanimoto coefficient = (bits set in both) / (bits set in either) = 1 / 2 = 0.5
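Written as a formula, the Tanimoto coefficient of fingerprints a and b is T(a, b) = c / (n_a + n_b - c), where n_a and n_b are the numbers of bits set in each fingerprint and c is the number set in both. For the slide's example, c = 1, n_a = 1, n_b = 2, so T = 1 / (1 + 2 - 1) = 0.5. A direct Python transcription follows; the handling of two all-zero fingerprints is an assumed convention.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length 0/1 fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)   # substructures in a AND b
    either = sum(1 for x, y in zip(a, b) if x or y)  # substructures in a OR b
    # Assumed convention: two fingerprints with no bits set count as identical.
    return both / either if either else 1.0


a = [0, 0, 1]
b = [1, 0, 1]
print(tanimoto(a, b))  # 0.5
```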

  11. Cross-validation: predictive accuracy • the compounds are split into 10 subsets • test set: one of the subsets • training set: the remaining 9 subsets [Figure: test set and training set]
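A sketch of the 10-subset scheme, assuming the standard setup in which each subset serves once as the test set and the 10 accuracies are averaged. The random shuffle, the fixed seed, and the simple knn classifier inside (majority vote over Tanimoto similarity) are assumptions made to keep the example self-contained; `cross_validate` is an illustrative name.

```python
import random
from collections import Counter


def tanimoto(a, b):
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def knn_predict(query, train_fps, train_labels, k):
    """Majority label among the k most Tanimoto-similar training compounds."""
    nearest = sorted(
        range(len(train_fps)),
        key=lambda j: tanimoto(query, train_fps[j]),
        reverse=True,
    )[:k]
    # most_common() lists equal counts in first-seen order, so a tie goes to
    # whichever tied label appears first in similarity order.
    return Counter(train_labels[j] for j in nearest).most_common(1)[0][0]


def cross_validate(fps, labels, k=5, n_folds=10, seed=0):
    """Mean knn accuracy over n_folds subsets; each subset is the test set exactly once."""
    order = list(range(len(fps)))
    random.Random(seed).shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]  # roughly equal subsets
    accuracies = []
    for test in folds:
        if not test:
            continue  # more folds than compounds
        test_set = set(test)
        train = [i for i in order if i not in test_set]  # the remaining subsets
        train_fps = [fps[i] for i in train]
        train_labels = [labels[i] for i in train]
        correct = sum(
            knn_predict(fps[i], train_fps, train_labels, k) == labels[i] for i in test
        )
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Toy data: 10 "active" and 10 "inactive" compounds with clearly different fingerprints.
fps = [[1, 1, 0, 0]] * 10 + [[0, 0, 1, 1]] * 10
labels = ["A"] * 10 + ["I"] * 10
print(cross_validate(fps, labels, k=3))  # 1.0
```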

  12. Picking parameters for methods • which parameter values produce the best predictive accuracy? • number of neighbors used in ReliefF {1, 2, 4, etc.} • number of neighbors used in knn {1, 2, 4, etc.} • number of ReliefF substructures used to predict classes in knn {1, 20, 100, etc.}
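The parameter search can be organized as an exhaustive grid over the three settings listed above. In this sketch, `evaluate` is a hypothetical caller-supplied function that would run the ReliefF and knn pipeline under cross-validation and return its predictive accuracy; the toy scorer below only stands in for it, and the grids use just the values named on the slide.

```python
from itertools import product


def grid_search(evaluate, relieff_ks, knn_ks, n_substructures):
    """Try every parameter combination; return (best_score, best_params).

    `evaluate` (hypothetical) returns the predictive accuracy for one combination.
    The first combination reaching the best score is kept.
    """
    best = None
    for rk, kk, ns in product(relieff_ks, knn_ks, n_substructures):
        score = evaluate(relieff_k=rk, knn_k=kk, n_sub=ns)
        if best is None or score > best[0]:
            best = (score, {"relieff_k": rk, "knn_k": kk, "n_sub": ns})
    return best


# Toy stand-in for the real pipeline: it ignores relieff_k and simply prefers
# knn_k = 2 and 20 substructures, so the search should recover those values.
def toy_evaluate(relieff_k, knn_k, n_sub):
    return 1.0 - 0.1 * abs(knn_k - 2) - 0.001 * abs(n_sub - 20)


score, params = grid_search(toy_evaluate, [1, 2, 4], [1, 2, 4], [1, 20, 100])
print(params)  # {'relieff_k': 1, 'knn_k': 2, 'n_sub': 20}
```

An exhaustive grid is the simplest choice when, as here, each setting has only a handful of candidate values.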

  13. Picking number of substructures [Figure: predictive accuracy (0.0 to 1.0) versus number of substructures used to predict (1, 20, all)]
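Varying the number of substructures amounts to keeping only the n highest-weighted ReliefF substructures before running knn. A small sketch, assuming the weights come as a plain list indexed by substructure and that `n=None` stands for "all"; `top_n_substructures` and `project` are hypothetical helper names.

```python
def top_n_substructures(weights, n):
    """Indices of the n substructures with the highest ReliefF weights (n=None keeps all)."""
    order = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return order if n is None else order[:n]


def project(fps, keep):
    """Restrict each fingerprint to the selected substructure columns."""
    return [[fp[f] for f in keep] for fp in fps]


weights = [0.05, -0.2, 0.4, 0.1]
keep = top_n_substructures(weights, 2)
print(keep)                                         # [2, 3]
print(project([[1, 0, 1, 0], [0, 1, 1, 1]], keep))  # [[1, 0], [1, 1]]
```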

  14. Group of substructures best able to predict

  15. Future work • multi-class prediction • different feature selection methods

  16. Acknowledgements • Computational Chemical Biology: Joshua Gilbert, Paul Clemons, Hyman Carrinski • Summer Research Program in Genomics: Shawna Young, Lucia Vielma, Maura Silverstein
