Interactive Datamining of Large-Scale Screening Datasets

This overview discusses the challenges and solutions for interactive data mining and visualization of large-scale screening datasets. It covers information visualization techniques, examples from ChemCodes Inc. and NCI, and includes a demo.

Presentation Transcript


  1. Interactive Datamining of Large-Scale Screening Datasets • Frank Oellien, Wolf D. Ihlenfeldt (Computer-Chemie-Centrum, University Erlangen-Nuremberg) • Klaus Engel, Thomas Ertl (Visualization and Interactive Systems Group, University Stuttgart)

  2. Overview • Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo

  3. Overview • Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo

  4. Chemical data • [Figure: bar chart of current datasets, vertical axis from 0 to 18,000,000; databases shown: Merck Katalog, Synopsys PG, ACX, NCI DTP, ChemInform, Spresi, Beilstein, CAS]

  5. Multi-Variate and Multi-Dimensional Numeric Datasets Today • Change in chemical synthesis technology • new technologies (HTS, combinatorial synthesis) • experiments generate terabytes of data per year • development of data mining and visualization tools could not keep pace • most critical bottleneck in R&D today! • tools for interactive mining and information visualization are needed

  6. Tools for Interactive Visualization of Multi-Variate and Multi-Dimensional Data • Standard applications • barchart, 2D and pseudo 3D scatter plots, molecular spreadsheets • limited to small subsets • platform-dependent • Our goal: applications that are • simple to use • allow straightforward interpretation of results • generalized access to tabular numeric data • platform-independent

  7. Overview • Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo

  8. 3D Tools for Interactive Information Visualization • Information visualization applications that use the 3D capabilities of modern clients • Glyph-based InfVis approaches • Volume-based InfVis approaches

  9. Glyph-based InfVis Tools • 3 orthogonal axes • color • shape • size • transparency • surface effects • animation • up to ~100 Glyphs
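
The glyph attributes listed on this slide can be read as a mapping from data columns to visual channels. The following is a minimal sketch of such a mapping, not the applet's actual code: the `Glyph` class, the `to_glyphs` function, the column names, the shape list, and the sample values are all hypothetical, and only position, color, shape, and size are covered.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    x: float
    y: float
    z: float
    color: tuple  # (r, g, b), each component in [0, 1]
    shape: str
    size: float


def normalize(value, lo, hi):
    """Scale value linearly into [0, 1]; a constant column maps to 0.5."""
    if hi == lo:
        return 0.5
    return (value - lo) / (hi - lo)


def to_glyphs(records, x, y, z, color, size, shape):
    """Map columns of each record onto glyph position, color, size and shape."""
    numeric = (x, y, z, color, size)
    bounds = {
        c: (min(r[c] for r in records), max(r[c] for r in records))
        for c in numeric
    }
    shapes = ["sphere", "cube", "cone", "cylinder"]  # assumed shape palette
    categories = sorted({r[shape] for r in records})
    glyphs = []
    for r in records:
        n = {c: normalize(r[c], *bounds[c]) for c in numeric}
        glyphs.append(Glyph(
            x=n[x], y=n[y], z=n[z],
            color=(n[color], 0.0, 1.0 - n[color]),  # blue (low) to red (high)
            shape=shapes[categories.index(r[shape]) % len(shapes)],
            size=0.02 + 0.08 * n[size],
        ))
    return glyphs


# Invented sample records, for illustration only.
sample = [
    {"temperature": 20.0, "time": 1.0, "stoichiometry": 1.0,
     "yield_percent": 40.0, "solvent": "THF"},
    {"temperature": 80.0, "time": 5.0, "stoichiometry": 2.0,
     "yield_percent": 90.0, "solvent": "DMF"},
]
glyphs = to_glyphs(sample, "temperature", "time", "stoichiometry",
                   "yield_percent", "yield_percent", "solvent")
```

Each record becomes one geometric object, which is why this style of display suits small selections rather than whole databases.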

  10. Java/Java3D InfVis Applet • Java3DCanvas • Tool Panel • (filters, selection tools, details) • ControlPanel

  11. Java/Java3D InfVis Applet: 3D Render Panel • 3D Glyphs • 3D Barchart

  12. Java/Java3D InfVis Applet: 3D Tool Panel • Dynamic Filter Tools • Selection Tools • Detail Tools

  13. Java/Java3D InfVis Applet: 3D Control Panel

  14. Advantages of Volume-based InfVis Tools • Databases with millions of data points • Glyph-based InfVis approaches • produce millions of geometric primitives • interactive visualization not possible • Volume-based InfVis approaches • can handle large number of data points • interactive visualization using low-cost graphics hardware is possible
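
One way to see why a volume scales better than glyphs is to bin the points into a fixed voxel grid, so that the amount of data to render depends on the grid resolution rather than on the number of records. The slides do not say how the volume is built; the sketch below assumes simple per-voxel counting over coordinates already normalized to [0, 1], and `bin_to_volume` is a hypothetical name.

```python
import random


def bin_to_volume(points, resolution):
    """Count points per voxel in a resolution^3 grid over the unit cube."""
    volume = [[[0] * resolution for _ in range(resolution)]
              for _ in range(resolution)]
    for x, y, z in points:
        # Clamp so that a coordinate of exactly 1.0 lands in the last voxel.
        i = min(int(x * resolution), resolution - 1)
        j = min(int(y * resolution), resolution - 1)
        k = min(int(z * resolution), resolution - 1)
        volume[i][j][k] += 1
    return volume


# 100,000 random points (invented data) collapse into 32^3 = 32,768 voxels.
rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(100_000)]
volume = bin_to_volume(points, 32)
total = sum(count for plane in volume for row in plane for count in row)
```

Adding more points changes the voxel counts but not the size of the grid, whereas a glyph display would need one more geometric object per point.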

  15. Overview • Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo

  16. ChemCodes Reaction Database • 100 most important FGs ~75% chemistry • 100 standard reactions • Limits of standard reactions • Functional Group Compatibility • Generating Rules • Goal: Analysis of the reaction space

  17. ChemCodes - Reaction Optimization I • Goal: Reaction Optimization: > 95% Yield • 7 Dimensions: reagent, solvent, time, temperature, stoichiometry, reagent order, FG-compatibility
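
The optimization goal can be phrased as a filter over records that carry the seven dimensions plus a measured yield. The sketch below is illustrative only: the field names, the `filter_reactions` function, and all sample values are invented and do not reflect the ChemCodes database schema.

```python
def filter_reactions(records, min_yield=95.0, **fixed):
    """Keep records above min_yield that match every fixed dimension."""
    return [
        r for r in records
        if r["yield_percent"] > min_yield
        and all(r[name] == value for name, value in fixed.items())
    ]


# Invented sample records with the seven dimensions named on the slide.
reactions = [
    {"reagent": "A", "solvent": "THF", "time_h": 2.0, "temperature_c": 25.0,
     "stoichiometry": 1.0, "reagent_order": "A-first", "fg_compatible": True,
     "yield_percent": 97.0},
    {"reagent": "A", "solvent": "DMF", "time_h": 2.0, "temperature_c": 25.0,
     "stoichiometry": 1.0, "reagent_order": "A-first", "fg_compatible": True,
     "yield_percent": 96.0},
    {"reagent": "B", "solvent": "THF", "time_h": 4.0, "temperature_c": 60.0,
     "stoichiometry": 2.0, "reagent_order": "B-first", "fg_compatible": False,
     "yield_percent": 80.0},
]

high_yield = filter_reactions(reactions)
high_yield_in_thf = filter_reactions(reactions, solvent="THF")
```

Fixing one or more dimensions as keyword arguments narrows the result in the same way an interactive filter would narrow the displayed subset.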

  18. ChemCodes - Reaction Optimization II

  19. ChemCodes - Reaction Planning • Functional Group Compatibility Check

  20. Example 2: NCI Anti-tumor / Anti-viral Database • Initiated in April 1990 (modified 1994) • ~250,000 compounds • ~30,000 with anti-tumor screening data • Enhanced NCI Database Browser • > 30 different molecular properties • up to 23 3D conformers per compound

  21. Lead Compound Discovery II

  22. Lead Compound Discovery II

  23. Overview • Multi-variate and multi-dimensional datasets • Motivation • Information Visualization Techniques • Examples (ChemCodes Inc., NCI) • Demo

  24. Acknowledgment • Prof. Johann Gasteiger, Computer-Chemie-Centrum, University of Erlangen-Nuremberg • Prof. Thomas Ertl, Dipl. Inf. Klaus Engel, Visualization and Interactive Systems, University of Stuttgart • Dr. Patrick Kiser, Dr. Gary Eichenbaum, ChemCodes Inc. • Marc Nicklaus, Laboratory of Medicinal Chemistry, NCI, NIH • Deutsche Forschungsgemeinschaft