AMBIT: Chemoinformatics Software for Data Management • Joanna Jaworska (P&G Brussels, Belgium) • Nina Jeliazkova (Ideaconsult Ltd., Bulgaria)
Introduction – why Ambit? • The limited availability of free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD) • Efficient use of existing information on chemicals requires better ways of • Storage: standardized formats, computer-automated verification of structures, capability to store large amounts of data • Taking advantage of the rapidly evolving field of data mining and extraction of relevant information
IT strategy • Ambit: building blocks for a Decision Support System • High emphasis on • Interoperability for “plug and play” • Chemical Markup Language (CML) • an acknowledged method of encoding chemical data in XML • is being adopted by a large number of chemical organisations, from government through commercial to academia • The choice of CML as the internal format makes the database independent of the software that accesses it, in contrast to some proprietary solutions • Flexibility: modular design • Transparency • Open source, relying on open standards. Open source software lowers the barrier for users, facilitates dissemination, and enables the reproducibility of models and results • The cheminformatics functionality relies on the open source Java library, the Chemistry Development Kit (http://cdk.sourceforge.net/) • The software is based on the MySQL database (www.mysql.com), which is the most popular open source relational database.
Ambit - Overview • AMBIT software is a set of libraries and tools providing various cheminformatics functionalities for data management. • The AMBIT system consists of a database and functional modules that allow flexible searching and mining of the data stored in the database. The unique feature of AMBIT is the ability to store multifaceted information about chemical structures and provide a searchable interface linking these diverse components. • The AMBIT database: • contains over 450 000 chemical compounds with data imported from over a dozen databases [http://ambit.acad.bg/ambit/stats/]. The number of compounds is growing all the time, and one of the system’s great strengths is that any dataset can be imported for comparison and analysis. • stores chemical structures; their identifiers, such as CAS numbers and InChI; attributes such as molecular descriptors; and experimental data together with test descriptions and literature references. The database can also store QSAR models. In addition, the software can generate a suite of 2D and 3D molecular descriptors. • can be searched by identifier, attribute value or range, experimental data value or range, user-defined structure and substructure, and structural similarity • AMBIT Discovery performs chemical grouping and assesses the applicability domain of a QSAR. It offers several approaches to similarity assessment: statistical approaches that rely on ‘descriptor space’; approaches based on mechanistic understanding; and approaches based on structural similarity.
Software built using Ambit blocks • ToxTree: a flexible, user-friendly application that integrates structure-based (classification) schemes. Currently 3 schemes are available: Verhaar for fish toxicity, Cramer for human acute toxicity, BfR rules for skin irritation. ToxTree implements a plug-in mechanism, allowing it to be extended with modules developed later, without recompiling the application. ToxTree and AMBIT modules can be integrated with one another. • Toxmatch: a stand-alone application for pairwise similarity assessments, intended for read-across. • QSAR database: under development. Will store information in the QMRF format. Large effort on standardization.
Ambit database - Two user interfaces • Two user interfaces to the database: online and standalone • Online • a more restricted interface • Standalone • full interface • can be used for storing and managing confidential data • Common to both • can link with other databases and pull information via web services
AMBIT Database Today • Not restricted to these datasets! Any dataset can be imported (e.g. DSSTox, AQUIRE, LLNA …)
AMBIT database functionalities • Storage: information about chemical names and structures, descriptors, experimental data and QSAR models • Example with a tailored template: BCF golden database, LRI project (EURAS), Q2 2007 • QSAR database with QMRF (ECB funded) • Conversion • Different computer formats of structure, CAS-structure • Calculation • Variety of descriptors. The available list is growing thanks to contributions to CDK • Search • Identification search (CAS, SMILES, chemical name) • Descriptor search • Experimental data search • Substructure and similarity search • Complex searches with multiple criteria (standalone)
Similarity searching • Rationale based on the Similar Property Principle: structurally similar compounds tend to exhibit similar properties • Calculate the pairwise similarity between the known active and each compound in the database • Rank the database compounds based on the similarity measure • Select the top n% for biological testing
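The ranking workflow on this slide can be sketched in a few lines. This is a minimal illustration, not Ambit's implementation: it assumes each compound is represented by the set of "on" bit positions of a binary fingerprint and uses the Tanimoto coefficient as the similarity measure. The slides do not say which measure Ambit uses, and the function names and toy fingerprints below are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def rank_by_similarity(
    query: Set[int],
    database: Dict[str, Set[int]],
    top_fraction: float,
) -> List[Tuple[str, float]]:
    """Score every database compound against the known active, rank, keep the top n%."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # Highest similarity first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    keep = max(1, math.ceil(len(scored) * top_fraction))
    return scored[:keep]


# Toy fingerprints (illustrative only, not real compounds).
known_active = {1, 2, 3, 4}
toy_database = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3},
    "C": {1, 2, 5, 6},
    "D": {7, 8},
}

top_half = rank_by_similarity(known_active, toy_database, top_fraction=0.5)
print(top_half)
```

With `top_fraction=0.5` the four toy compounds are cut down to the two most similar ones, which would be the candidates forwarded for testing.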
What kinds of searches are desired? • Detailed analyses for pairwise similarity • Similarity of a compound to compounds in the database • Similarity of a compound to a reference set • Similarity of a set of compounds to compounds in the database • Grouping based on chemical class
Ambit online • Searching for basic information
Ambit Database Tools 1.20 • Standalone application • available at http://ambit.acad.bg/downloads
Ambit converter (Batch search) • Ambit converter can open: CML, CSV, HIN, ICHI, INCHI, MDL MOL, MDL SDF, MOL2, PDB, SMI, TXT and XYZ file types • Ambit converter can save: SDF, MOL, CSV, TXT, SMI file types • CAS-SMILES conversion based on a database lookup • Descriptor calculation • Cramer rules • Verhaar scheme
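A lookup-based CAS-to-SMILES conversion can be illustrated as follows. This is a sketch under stated assumptions: the in-memory dictionary stands in for the database table, the function names are hypothetical, and the check-digit validation is an added safeguard based on the standard CAS Registry Number format, not something the slides say the converter performs.

```python
import re
from typing import Dict, Optional

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Check the format and the check digit of a CAS Registry Number."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost digit before the check digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


# Stand-in for the database table (hypothetical, three well-known compounds).
CAS_TO_SMILES: Dict[str, str] = {
    "71-43-2": "c1ccccc1",  # benzene
    "64-17-5": "CCO",       # ethanol
    "50-00-0": "C=O",       # formaldehyde
}


def cas_to_smiles(cas: str, table: Dict[str, str] = CAS_TO_SMILES) -> Optional[str]:
    """Return the SMILES for a CAS number, or None if the number is not in the table."""
    cas = cas.strip()
    if not is_valid_cas(cas):
        raise ValueError(f"Not a valid CAS Registry Number: {cas!r}")
    return table.get(cas)


print(cas_to_smiles("71-43-2"))
```

Rejecting malformed numbers before the lookup separates "typo in the input file" from "compound not in the database", which matters in batch conversion.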
Ambit Database Tools 1.20 • Import to Database • Compounds – several file formats • Descriptors – SDF, CSV, TXT • Experimental data – SDF, CSV, TXT • QSAR models – SDF, CSV, TXT • Database processing • Calculate SMILES/Fingerprints/Atom environments – necessary in order to perform substructure and similarity searches. Should be invoked after importing compounds into the database • several file formats • Descriptor calculation • Distance calculation – used to speed up queries on the distance between heavy atoms
Ambit Database Tools 1.20 • perform a CAS RN search in the database (submenu "Search -> CAS RN search"); • perform a SMILES search in the database (submenu "Search -> SMILES"); • perform a molecular formula search in the database (submenu "Search -> Molecular formula"); • define structure, descriptor, distance-based and experimental data criteria and perform searches in the database • Output: • On screen • To file • The user can select between the different datasets existing in the AMBIT database. Subsequent searches will be performed only within the selected dataset
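A search that combines several criteria and is restricted to one selected dataset maps naturally onto a relational query. The sketch below is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table layout, column names, compound labels and numeric values are all invented for the example. They are not the AMBIT schema or real measurements.

```python
import sqlite3
from typing import List

# In-memory database with a hypothetical, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE compound   (id INTEGER PRIMARY KEY, label TEXT, dataset TEXT);
    CREATE TABLE descriptor (compound_id INTEGER, name TEXT, value REAL);
    CREATE TABLE experiment (compound_id INTEGER, endpoint TEXT, value REAL);
    """
)
# Made-up records, for illustration only.
conn.executemany(
    "INSERT INTO compound VALUES (?, ?, ?)",
    [(1, "CMPD-1", "demo_set"), (2, "CMPD-2", "demo_set"), (3, "CMPD-3", "other_set")],
)
conn.executemany(
    "INSERT INTO descriptor VALUES (?, ?, ?)",
    [(1, "logP", 2.0), (2, "logP", 5.5), (3, "logP", 2.1)],
)
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?, ?)",
    [(1, "LC50", 10.0), (2, "LC50", 12.0), (3, "LC50", 11.0)],
)


def search(
    db: sqlite3.Connection,
    dataset: str,
    descriptor: str, d_min: float, d_max: float,
    endpoint: str, e_min: float, e_max: float,
) -> List[str]:
    """Return compounds in one dataset whose descriptor AND experimental value fall in range."""
    sql = """
        SELECT c.label
        FROM compound c
        JOIN descriptor d ON d.compound_id = c.id
        JOIN experiment e ON e.compound_id = c.id
        WHERE c.dataset = ?
          AND d.name = ? AND d.value BETWEEN ? AND ?
          AND e.endpoint = ? AND e.value BETWEEN ? AND ?
        ORDER BY c.label
    """
    params = (dataset, descriptor, d_min, d_max, endpoint, e_min, e_max)
    return [row[0] for row in db.execute(sql, params)]


hits = search(conn, "demo_set", "logP", 1.0, 3.0, "LC50", 5.0, 15.0)
print(hits)
```

Here CMPD-3 satisfies both ranges but belongs to a different dataset, so it is excluded, mirroring the rule that searches run only within the selected dataset.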
AMBIT User Interface • Example: Search by structure • Exact search • Substructure search • Similarity search • Fingerprints • Atom environments
Similarity based on toxicity mechanism: Verhaar scheme • Verhaar H.J.M., Van Leeuwen C.J., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, pp. 471-491, 1992 • 34 rules • 5 classes • Class 1. Narcosis or baseline toxicity • Class 2. Less inert compounds • Class 3. Unspecific reactivity • Class 4. Compounds and groups of compounds acting by a specific mechanism • Class 5. Not possible to classify according to these rules
Chemical similarity assessment using the database • Exact substructure search based on 2D structure • Structural similarity search (various methods) • Criteria on descriptors • Based on mechanistic understanding (Verhaar scheme)
Another view on Similarity assessments with Toxmatch and Discovery • Discovery • similarity to a set (summary representation) • Toxmatch • pairwise similarities • Similarity to a set (nearest neighbours)
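The difference between the two "similarity to a set" views can be made concrete with a small sketch. It rests on assumptions that the slides do not confirm: compounds are sets of "on" fingerprint bits, similarity is the Tanimoto coefficient, and the "summary representation" is modelled as a consensus fingerprint holding the bits present in at least half of the set. The actual methods in Discovery and Toxmatch may differ, and all names below are hypothetical.

```python
from collections import Counter
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def nearest_neighbour_similarity(query: Set[int], reference: List[Set[int]]) -> float:
    """Similarity to a set as the best pairwise similarity to any member."""
    return max(tanimoto(query, fp) for fp in reference)


def consensus_fingerprint(reference: List[Set[int]], min_fraction: float = 0.5) -> Set[int]:
    """Assumed summary representation: bits set in at least min_fraction of the members."""
    counts = Counter(bit for fp in reference for bit in fp)
    return {bit for bit, n in counts.items() if n / len(reference) >= min_fraction}


def summary_similarity(query: Set[int], reference: List[Set[int]]) -> float:
    """Similarity to a set as similarity to its single summary fingerprint."""
    return tanimoto(query, consensus_fingerprint(reference))


# Toy reference set and query (illustrative only).
reference_set = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}]
query = {1, 2, 5}

print(nearest_neighbour_similarity(query, reference_set))
print(summary_similarity(query, reference_set))
```

The query is identical to one member, so the nearest-neighbour view scores it higher than the summary view, which only sees the features the whole set shares. The two views answer different questions: "is there a close analogue?" versus "does this compound look like the group as a whole?".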