Indiana University ECCR Summary
Infrastructure: Cheminformatics web service infrastructure made available as a community resource including 2D and 3D databases, predictive models, statistics, docking, name to structure conversion, 2D to 3D conversion, similarity and clustering.
Research: Major research areas include data mining of PubChem bioassays including network models, predictive models for cytotoxicity, QSAR domain applicability, clustering huge datasets, data mining chemistry literature and documents, and Semantic Web applications.
Education: A leading center in cheminformatics education offering Ph.D., M.S. and innovative Distance Education program aimed at government, academia and industry. Collaboration with University of Michigan.
[Figure: Delivering a CIC distance class at Michigan]
[Figure: 2D Map of PubChem from GTM]
http://www.chembiogrid.org
Cheminformatics Web Service Infrastructure
• Extensive set of (Web) services made available as a community resource including 2D and 3D databases, predictive models, statistics, docking, name to structure conversion, 2D to 3D conversion, similarity and clustering.
• Teragrid and local supercomputing resources provide scalability to millions of compounds.
• Prototypes a comprehensive open community infrastructure framework for algorithm & application deployment.
[Figure: framework diagram headed "Infrastructure" and "Algorithms & Applications". Labels: PubChem and Other Public Chemical & Biological Information Sources; Aggregation and Data Mining Algorithms; Cheminformatics, Comp. Chemistry, and Statistical (R) Web Services; Knowledge Discovery Tools; Teragrid, IU, and other Grid/Cloud Cyberinfrastructure and Supercomputing Resources; Educational applications]
Exemplar Current Project: Pub3D Services
• Provides 3D structures for 17M PubChem compounds.
• Scalable to hundreds of millions of structures.
• Accessible via SQL, Web page and Web service interfaces.
• Can be included in workflows.
• Will include multiple conformers.
• Backed by novel algorithms and distributed DB architecture to enable fast shape queries.
• Also enables density of chemical space analysis.
Current Major Research Areas
• Data mining of PubChem bioassays using Bayesian models.
• Using cytotoxicity models to predict acute toxicity
    • Collaboration with Stephan Schurer, Scripps (Florida)
• Algorithms for domain applicability of QSAR models.
• Exploration of chemical spaces using density of space approach.
• Virtual screening for anti-malarials
    • Collaboration with Jean-Claude Bradley, Drexel University
    • Two micromolar inhibitors of falcipain-2 identified
• Supporting predictive model deployment and exchange (PMML)
• Cheminformatics cyberinfrastructure
Fast comparison of toxicity data sets using binary fingerprints.
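The slide does not say how the fingerprint comparison is done. As an illustration only, here is a minimal sketch of one common approach: binary fingerprints stored as Python integers and compared with the Tanimoto coefficient, with two data sets summarized by mean nearest-neighbor similarity. The function names and the summary statistic are assumptions made for this example, not the ECCR method.

```python
def fingerprint(on_bits):
    """Build a binary fingerprint (as an int) from the indices of its set bits."""
    fp = 0
    for bit in on_bits:
        fp |= 1 << bit
    return fp


def tanimoto(a, b):
    """Tanimoto coefficient: shared set bits divided by bits set in either."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: define similarity as 0
    return bin(a & b).count("1") / union


def mean_nearest_neighbor_similarity(set_a, set_b):
    """Hypothetical summary statistic: for each fingerprint in set_a, take its
    best Tanimoto match in set_b, then average those best matches."""
    if not set_a or not set_b:
        return 0.0
    best = [max(tanimoto(a, b) for b in set_b) for a in set_a]
    return sum(best) / len(best)


# Toy data sets (bit positions are arbitrary, for illustration only).
tox_set_1 = [fingerprint([0, 1, 2]), fingerprint([5])]
tox_set_2 = [fingerprint([1, 2, 3]), fingerprint([5])]
score = mean_nearest_neighbor_similarity(tox_set_1, tox_set_2)
```

Bitwise AND/OR on integers keeps each pairwise comparison cheap, which is one reason binary fingerprints suit fast set-to-set comparisons. Note that the nearest-neighbor statistic is asymmetric: swapping the two sets can change the score.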
Graduate Program in Cheminformatics
• Unique program in cheminformatics: to our knowledge, we are the only center in the U.S. offering a range of formal qualifications in cheminformatics.
• As of fall 2008, we will have 6 Ph.D. students, 8 M.S. students, and 4 graduate certificate students, who come from government, industry & academia.
• All courses are available by Distance Education, including the CIC courseshare with Michigan.
• We have received Industry Fellowships from Lilly and Symyx.
• General review of cheminformatics education: Drug Discovery Today 11, 9&10 (May 2006), pp. 436-439
• Distance Education: J. Chem. Inf. Model. 2006, 46, 495-502
[Figure: Delivering a CIC distance class at Michigan]
http://cheminfo.informatics.indiana.edu
Future Directions
New initiative: linking bioinformatics and chemical informatics.
Parallel deterministic annealing/GTM clustering and multidimensional scaling (MDS) of huge datasets
Exploiting multicore and other advanced chip architectures
• Extracting and mining chemical information in journal articles (SMILES index, NLP, and structure/ontology searching)
• Use of Semantic Web for automatic workflow composition.
• Compound & Bioassay Network Models allow investigation of cross-assay relationships and the use of PubChem as a source of polypharmacology data.
• Network view of SARs, providing a framework to analyze structure-activity landscapes.
[Figure: 2D Map of PubChem from GTM]
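To make the compound and bioassay network idea concrete, the sketch below builds a small assay-assay network from (compound, assay) activity pairs: two assays are linked when they share active compounds, and compounds active in several assays are flagged as polypharmacology candidates. The data layout, identifiers and function names are hypothetical and chosen for illustration; the slides do not describe the actual model.

```python
from collections import defaultdict
from itertools import combinations


def group_assays_by_compound(actives):
    """Map each compound ID to the set of assay IDs in which it is active."""
    assays_by_compound = defaultdict(set)
    for compound_id, assay_id in actives:
        assays_by_compound[compound_id].add(assay_id)
    return assays_by_compound


def assay_network(actives):
    """Return {(assay_a, assay_b): n_shared_actives} with assay_a < assay_b."""
    edges = defaultdict(int)
    for assays in group_assays_by_compound(actives).values():
        for pair in combinations(sorted(assays), 2):
            edges[pair] += 1
    return dict(edges)


def promiscuous_compounds(actives, min_assays=2):
    """Compounds active in at least min_assays distinct assays."""
    grouped = group_assays_by_compound(actives)
    return sorted(c for c, assays in grouped.items() if len(assays) >= min_assays)


# Made-up (compound ID, assay ID) activity pairs, not real PubChem records.
actives = [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (2, "C"), (3, "C")]
edges = assay_network(actives)
multi_target = promiscuous_compounds(actives)
```

Edge weights give a rough measure of how strongly two assays are related through shared actives. A real analysis would also need to account for how many compounds were tested in both assays, which this sketch ignores.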