Overview of Chemical Informatics and Cyberinfrastructure Collaboratory
Aug 16 2006
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University, Bloomington IN 47401
gcf@indiana.edu
http://www.infomall.org
http://www.chembiogrid.org
Capabilities
• Local teams, successful prototypes and international collaboration set up in 3 initial major focus areas
  • Chemical Informatics Cyberinfrastructure/Grids with services, workflows and demonstration uses, building on success in other applications (LEAD) and showing distributed integration of academic and commercial tools
  • Computational Chemistry Cyberinfrastructure/Grids with simulation, databases and TeraGrid use
  • Education with courses and degrees
• Review of activities suggests we also formalize work in two further areas
  • Chemical Informatics Research: model applicability
  • Interfacing with the User: bench chemist-friendly portal
Current Status
• Web site http://www.chembiogrid.org
• Wiki chosen to support the project as a shared editable web space
• Building a Collaboratory involving PubChem: a Global Information System accessible anywhere and at any time; enhance PubChem with distributed tools (clustering, simulation, annotation etc.) and data
• Adopted Taverna as the workflow system because it is popular in bioinformatics, but we will evaluate other systems such as GPEL from LEAD
• Preparing a large set of runs on the local Big Red 23 Teraflop supercomputer (OSCAR3, CDK, Mopac)
• Initial results discussed at conferences/workshops/papers
  • Gordon Conferences, ACS, SDSC tutorial
• First new Cheminformatics courses offered
• Advisory board set up and met
• Videoconferencing-based meetings with Peter Murray-Rust and group at Cambridge roughly every 2-3 weeks
• Good or potentially good interactions with NIH DTP, Scripps, Lilly and Michigan ECCR
CICC Senior Personnel
• Peter T. Cherbas
• Mehmet M. Dalkilic
• Charles H. Davis
• A. Keith Dunker
• Kelsey M. Forsythe
• Kevin E. Gilbert
• John C. Huffman
• Malika Mahoui
• Daniel J. Mindiola
• Santiago D. Schnell
• William Scott
• Craig A. Stewart
• David R. Williams
• Geoffrey C. Fox
• Mu-Hyun (Mookie) Baik
• Dennis B. Gannon
• Marlon Pierce
• Beth A. Plale
• Gary D. Wiggins
• David J. Wild
• Yuqing (Melanie) Wu
From Biology, Chemistry, Computer Science, Informatics at IU Bloomington and IUPUI (Indianapolis)
CICC Advisory Board
• Alan D. Palkowitz (Eli Lilly)
• Chris Peterson (Kalypsys)
• David Spellmeyer (IBM)
• Dimitris K. Agrafiotis (Johnson & Johnson)
• Horst Hemmerle (Eli Lilly)
• James M. Caruthers (Purdue University)
• Jeremy G. Frey (University of Southampton)
• Joel Saltz (Ohio State University/University of Maryland/Johns Hopkins University)
• John M. Barnard (Digital Chemistry)
• John Reynders (Eli Lilly)
• Peter Murray-Rust (University of Cambridge)
• Peter Willett (University of Sheffield)
• Thompson Doman (Eli Lilly)
• Val Gillet (University of Sheffield)
Members come from industry and academia. The board met in October 2005 and will meet again this fall.
Chemical Informatics and Cyberinfrastructure Collaboratory (CICC)
Funded by the National Institutes of Health
www.chembiogrid.org
CICC Combines Grid Computing with Chemical Informatics
Poster panels: Large Scale Computing Challenges; Science and Cyberinfrastructure
CICC is an NIH funded project to support the chemical informatics needs of High Throughput Cancer Screening Centers. The NIH is creating a data deluge of publicly available data on potential new drugs. Chemical informatics is a non-traditional area of high performance computing, but many new, challenging problems may be investigated.
[Workflow diagram labels: NIH PubMed DataBase; OSCAR Text Analysis; Initial 3D Structure Calculation; Cluster Grouping; Toxicity Filtering; Docking; Molecular Mechanics Calculations; Quantum Mechanics Calculations; NIH PubChem DataBase; IU’s Varuna DataBase; POVRay Parallel Rendering]
Chemical informatics text analysis programs can process hundreds of thousands of abstracts of online journal articles to extract chemical signatures of potential drugs.
OSCAR-mined molecular signatures can be clustered, filtered for toxicity, and docked onto larger proteins. These are classic “pleasingly parallel” tasks. Top-ranking docked molecules can be further examined for drug potential.
Big Red (and the TeraGrid) will also enable us to perform time-consuming, multi-step quantum chemistry calculations on all of PubMed. Results go back to public databases that are freely accessible by the scientific community.
• CICC supports the NIH mission by combining state of the art chemical informatics techniques with
  • World class high performance computing
  • National-scale computing resources (TeraGrid)
  • Internet-standard web services
  • International activities for service orchestration
  • Open distributed computing infrastructure for scientists worldwide
Indiana University Department of Chemistry, School of Informatics, and Pervasive Technology Laboratories
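The “pleasingly parallel” pattern mentioned on the poster can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `toy_score` is a hypothetical placeholder for an expensive per-molecule step such as docking, and the thread pool stands in for whatever scheduler a real deployment would use (a process pool, Condor, or TeraGrid batch jobs). It is not the CICC pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_score(smiles: str) -> int:
    """Hypothetical stand-in for an expensive per-molecule calculation.

    It simply counts the letters other than H in a SMILES string.
    This is NOT a real docking or toxicity score.
    """
    return sum(1 for ch in smiles if ch.isalpha() and ch.upper() != "H")


def score_all(molecules, workers=4):
    """Score every molecule independently; no task depends on another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(toy_score, molecules))


def top_ranked(molecules, n=2, workers=4):
    """Return the n highest-scoring (molecule, score) pairs."""
    scores = score_all(molecules, workers)
    ranked = sorted(zip(molecules, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    print(top_ranked(["CCO", "c1ccccc1", "O"]))
```

Because each molecule is scored without reference to any other, the work can be split across as many workers as are available, which is what makes this class of task a natural fit for large machines.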
Next steps?
• Define WSDL interfaces to enable global production of compatible Web services; refine CML
• Look at Pipeline Pilot
• Extend Computational Chemistry (Varuna) Services
• Routine TeraGrid Big Red use
• Ready to try “Prototype Production” on OSCAR3 CDK Mopac
• Develop more training material
• Link to screening center via Scripps

CICC Prototype Web Services
Key Ideas
• Add value to PubChem with additional distributed services and databases
• Wrapping existing code in web services is not difficult
• Provide “core” (CDK) services and exemplars of typical tools
• Provide access to key databases via a web service interface
• Provide access to major Compute Grids
Basic cheminformatics
• Molecular weights
• Molecular formulae
• Tanimoto similarity
• 2D Structure diagrams
• Molecular descriptors
• 3D structures
• InChI generation/search
• CMLRSS
Application based services
• Compare (NIH)
• Toxicity predictions (ToxTree)
• Literature extraction (OSCAR3)
• Clustering (BCI Toolkit)
• Docking, filtering, ... (OpenEye)
• Varuna simulation
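As an illustration of one of the basic cheminformatics services listed above, here is a minimal Python sketch of Tanimoto similarity. It assumes each molecule's fingerprint is represented as a set of "on" bit positions, and the example fingerprints are invented. The CICC services build on CDK; this sketch is not taken from CDK or from the project's code.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is a set of the bit positions that are switched on
    (an assumed representation). The coefficient is the number of shared
    bits divided by the number of bits set in either fingerprint.
    """
    if not a and not b:
        # Convention assumed here: two empty fingerprints score 0.0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


if __name__ == "__main__":
    # Invented fingerprints, for illustration only.
    fp1 = {1, 2, 3}
    fp2 = {2, 3, 4}
    print(tanimoto(fp1, fp2))
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no bits. A similarity search ranks database molecules by this value against a query.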
Varuna environment for molecular modeling (Baik, IU)
[Architecture diagram labels: Researcher; Chemical Concepts; Papers etc.; Experiments; ChemBioGrid; Simulation Service (FORTRAN code, scripts); DB Service (queries, clustering, curation, etc.); ReactionDB; QM Database; QM/MM Database; PubChem, PDB, NCI, etc.; Condor “Flocks”; TeraGrid Supercomputers]