ChemModLab offers the capability to build and evaluate QSAR models using various statistical methods and visualization tools. ChemSpider complements this by providing a web-based chemical informatics resource for chemists. Together, they support virtual screening for analyzing chemical compounds.
ChemModLab: A Web-based Cheminformatics Modeling Laboratory S. Stanley Young + ECCR and ChemSpider Teams
ChemSpider: A Web-based Chemical Informatics Resource S. Stanley Young + ECCR and ChemSpider Teams
What is ChemSpider?
• ChemSpider is a molecular structure-centric web service for chemists:
  • Chemical structure drawing, manipulation, visualization, modeling & databasing
  • Web location to deposit, curate and enhance data associated with chemical structures
  • Web structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data
How do people generally use ChemSpider?
• Searching for chemical structures, in rank order, via:
  • Registry numbers, trade names and synonyms
  • Structure identifiers such as SMILES or InChI
  • Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
  • Systematic names: IUPAC or CAS Index name
• Generation of physicochemical properties
• Text-based searching of Open Access articles
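A mass-based search of the kind listed above can be pictured as a tolerance match on monoisotopic mass. The following is a minimal sketch under stated assumptions, not ChemSpider's implementation: the record list, the function names `monoisotopic_mass` and `search_by_mass`, and the ppm tolerance are all hypothetical, and the formula parser handles only simple formulas built from C, H, N and O.

```python
import re

# Monoisotopic masses (u) of the most abundant isotope of a few common elements.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def monoisotopic_mass(formula):
    """Sum isotope masses for a simple formula such as 'C8H10N4O2' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

def search_by_mass(records, target_mass, tol_ppm=5.0):
    """Return names whose mass lies within tol_ppm of target_mass, best match first."""
    hits = []
    for name, formula in records:
        mass = monoisotopic_mass(formula)
        ppm_error = abs(mass - target_mass) / target_mass * 1e6
        if ppm_error <= tol_ppm:
            hits.append((ppm_error, name))
    return [name for _, name in sorted(hits)]

# Hypothetical mini-database of (name, molecular formula) records.
records = [
    ("caffeine", "C8H10N4O2"),
    ("theobromine", "C7H8N4O2"),
    ("theophylline", "C7H8N4O2"),
    ("glucose", "C6H12O6"),
]

print(search_by_mass(records, 194.0804))
print(search_by_mass(records, 180.0647))
```

The second query returns both theobromine and theophylline: isomers share a mass, which is why a mass hit usually needs a structure identifier or a name to be resolved to one compound.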
ChemSpider Status, August 2007
• Online database of over 16.5 million structures
• Systems in place for:
  • Single structure and data collection depositions
  • Association of analytical data with structures
  • Curation of data for each individual record
• Indexing of and integration with:
  • Over 70 individual databases
  • Patents from the US, European and Asian Patent offices
• Text-based searching of over 50,000 Open Access articles
• Over a thousand unique users access ChemSpider per day
External Integrations - Wikipedia The links between Wikipedia and ChemSpider are formed automatically
What is ChemModLab?
• ChemModLab is a web service for building and evaluating QSAR models.
• Send your data: assay results and an SD file.
• Use any or all of five descriptor types (2D), or supply your own descriptors.
• Use any or all of 16 statistical modeling methods.
• Predict the potency of untested compounds.
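The workflow above (descriptors plus assay results in, potency predictions out) can be illustrated with k-nearest neighbors, one of the listed method families. This is a minimal sketch, not ChemModLab's code: the function name `knn_predict`, the descriptor values and the potencies are hypothetical, and real descriptor sets have far more than two values per compound.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict potency as the mean response of the k nearest training compounds."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical descriptor vectors (two values per compound) and assay potencies.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
train_y = [4.0, 4.2, 4.1, 7.9, 8.1]

# Predict an untested compound that sits near the two potent training compounds.
prediction = knn_predict(train_X, train_y, query=(4.9, 5.0), k=2)
print(round(prediction, 2))
```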
Virtual Screening ChemModLab ChemSpider
ChemModLab Dialog (1) Data Input
ChemModLab Dialog (2) Five 2D Descriptor Sets
ChemModLab Dialog (3) 16 Modeling Methods
ChemModLab Modeling Methods
• 16 Statistical Modeling Methods
  • Trees: randomForest, rpart, tree
  • Neural networks
  • k-nearest neighbors
  • Support vector machines
  • Partial least squares
  • Partial least squares with linear discriminant analysis
  • Least angle regression
  • Ridge regression
  • Elastic net
  • Principal components regression
  • Family ensemble of k-nearest neighbors, using 70% selection
  • Family ensemble of tree, using 70% selection
  • Family ensemble of rpart, using 70% selection
  • randomForest using 70% selection
ECCR@NCSU + ChemSpider Plan
• User submits data to ChemModLab to get QSAR model(s).
• The model is sent to ChemSpider.
• ChemSpider computes a “virtual screen”.
• The hit-list is clustered and sent to the user.
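The plan above (score a library with a model, keep the best-scoring hits, cluster them) can be sketched as follows. Everything here is an assumption made for illustration: the slides do not say how ChemSpider scores or clusters compounds, so the feature-set representation, the toy `model`, Tanimoto similarity and greedy leader clustering are stand-ins, and all names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def virtual_screen(library, model, n_hits=3):
    """Score every library compound with the model and keep the top n_hits."""
    ranked = sorted(library, key=lambda name: model(library[name]), reverse=True)
    return ranked[:n_hits]

def leader_cluster(names, library, threshold=0.5):
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if tanimoto(library[name], library[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # no similar leader found: start a new cluster
    return clusters

# Hypothetical library: compound name -> set of structural feature IDs.
library = {
    "cpd1": {1, 2, 3, 4},
    "cpd2": {1, 2, 3, 5},
    "cpd3": {7, 8, 9},
    "cpd4": {10, 11},
}

# Hypothetical stand-in for a QSAR model: counts features assumed to favour activity.
FAVOURABLE = {1, 2, 3, 7, 8}

def model(features):
    return len(features & FAVOURABLE)

hits = virtual_screen(library, model, n_hits=3)
clusters = leader_cluster(hits, library)
print(hits)
print(clusters)
```

Clustering the hit-list groups near-duplicate structures, so the user can review one representative per cluster rather than every hit.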
Accumulation Curves Compare descriptor sets, given a method
Accumulation Curves Compare modeling methods, given a descriptor set
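An accumulation curve plots how many true actives have been found as compounds are tested in order of decreasing predicted activity; a curve that rises faster indicates a better ranking. The sketch below computes the curve for two rankings of the same compounds. The function name `accumulation_curve` and the data are hypothetical and stand in for any two models being compared.

```python
from itertools import accumulate

def accumulation_curve(predicted, is_active):
    """Cumulative count of true actives when compounds are tested in order of
    decreasing predicted activity."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    return list(accumulate(int(is_active[i]) for i in order))

# Hypothetical assay outcomes and predicted activities from two models.
is_active = [True, False, True, False, False, True]
model_a = [0.9, 0.2, 0.8, 0.1, 0.3, 0.7]  # ranks every active ahead of the inactives
model_b = [0.1, 0.9, 0.2, 0.8, 0.7, 0.3]  # ranks the inactives first

curve_a = accumulation_curve(model_a, is_active)
curve_b = accumulation_curve(model_b, is_active)
print(curve_a)
print(curve_b)
```

Both curves end at the total number of actives; they differ only in how quickly they get there, which is what the comparison across methods or descriptor sets reads off.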
Diversity Map Cluster Active Compounds Modeling Methods
Model Evaluation
Take detailed looks at which models?
AID348 (NCGC):
• KNN – Ph
• ENet – CAP
• RF – B#
• RF – CAP
• RF – FF
• Tree – CAP
• Tree – Ph
• Tree – FF
• PLS – CAP
Summary
• ChemSpider is a web chemical informatics center.
• ChemModLab is a free web service for QSAR.
• Together they support sophisticated virtual screening.
* ChemModLab is supported by the NCI RoadMap project.
ECCR@NCSU Group (eccr.stat.ncsu.edu)
ChemModLab Team: Jacqueline M. Hughes-Oliver, Atina D. Brooks, Gary W. Howell, Kirtesh Patil, Stan Young, Qianyi Zhang
ChemSpider Group (www.chemspider.com)
ChemSpider Team: Antony Williams (project lead); a rotating team of advisors and developers, including many contributions from the Open Source community