JChem Base chemical database
Szilárd Dóránt
May, 2005

Contents
• Introduction
• Structural overview
• Compatibility
• Administration
• JChem tables
• Fingerprints
• Structural search
• Structure cache
• Standardization
• Search options
• JSP example
• API examples
• Performance
• Future plans
Introduction
JChem Base provides high-performance, Java-based tools for storing, searching, and retrieving chemical structures and associated data. These components can be integrated into web-based or standalone applications together with other ChemAxon tools.
Structural overview (layers, top to bottom)
• Application, or web application (JSP) accessed from a web browser
• JChem Base API: chemical logic, structure cache
• JDBC driver: standard interface to the RDBMS
• RDBMS (e.g. Oracle, MySQL, etc.): storage and security
Compatibility and integration
• Database engines: Oracle, MySQL, MS SQL Server, PostgreSQL, MS Access, DB2, etc.
• Operating systems: Windows, Linux, Mac OS X, Solaris, etc.
• File formats: SMILES, MDL molfile (v2000 and v3000), MDL SDF, RXN, RDF, MRV
• Integration: 100% Java, extensive API, JChem Cartridge for Oracle
Administration with JChemManager
• User interface for creating tables, import, export, deleting rows, and dropping tables
• Most functions are also available from the command line.
The property table
• The property table stores information about JChem structure tables, including:
    • Fingerprint parameters
    • Custom standardization rules
    • Recent changes (to optimize cache updates)
    • Other table options and information
    • Database-related licence keys
• More than one property table can be used; each property table represents a particular JChem environment.
Chemical Hashed Fingerprints
• Chemical Hashed Fingerprints encode structural patterns in bit strings.
• If structure A is a substructure of structure B, every bit that is set in A's fingerprint is also set in B's fingerprint.
• Tanimoto similarity of hashed fingerprints can be used for diversity analysis and similarity search.
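The two fingerprint properties above can be illustrated with plain bit sets. The following is a minimal sketch using java.util.BitSet, not the JChem implementation: the class name, method names, and bit positions are hypothetical, and the Tanimoto coefficient is taken in its usual bit-string form, the number of bits set in both fingerprints divided by the number of bits set in either.

```java
import java.util.BitSet;

/** Illustrative fingerprint operations on java.util.BitSet (not JChem code). */
class FingerprintSketch {

    /** Screening test: true if every bit set in the query is also set in the target. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);              // bits the query has but the target lacks
        return missing.isEmpty();
    }

    /** Tanimoto coefficient |A AND B| / |A OR B|; returns 1.0 for two empty fingerprints. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        return union == 0 ? 1.0 : (double) both.cardinality() / union;
    }

    /** Helper: builds a 512-bit fingerprint with the given (hypothetical) bits set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet(512);
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet a = bits(3, 17, 200);          // hypothetical fingerprint of structure A
        BitSet b = bits(3, 17, 200, 411);     // hypothetical fingerprint of structure B
        System.out.println(passesScreen(a, b)); // true: A may be a substructure of B
        System.out.println(passesScreen(b, a)); // false: B cannot be a substructure of A
        System.out.println(tanimoto(a, b));     // 0.75 (3 common bits / 4 bits in total)
    }
}
```

Passing the screen is only a necessary condition: different patterns can hash to the same bits, so a target that passes still has to be confirmed by an exact structural comparison.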
Structural search in database
• A two-stage method provides optimal performance:
    • Stage 1: rapid pre-screening reduces the number of possible hit candidates
        • Chemical Hashed Fingerprints are used for substructure and superstructure searches
        • Hash code is used for duplicate filtering (usually during compound registration)
    • Stage 2: a graph search algorithm is used to determine the final hit list
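The two-stage idea can be sketched as a cheap bit test followed by an expensive exact check that runs only on the survivors. This is an illustrative sketch under stated assumptions, not JChem code: the class and method names are hypothetical, the fingerprint bits are made up, and plain text containment stands in for the graph search purely so the example runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.BiPredicate;

/** Illustrative two-stage search: fingerprint pre-screen, then an exact check. */
class TwoStageSearchSketch {

    /** A stored structure with its id, structure string, and fingerprint. */
    static final class Entry {
        final int id;
        final String structure;
        final BitSet fingerprint = new BitSet();

        Entry(int id, String structure, int... bits) {
            this.id = id;
            this.structure = structure;
            for (int b : bits) {
                fingerprint.set(b);
            }
        }
    }

    /** Returns the ids of entries that pass the screen and the exact matcher. */
    static List<Integer> search(Entry query, List<Entry> table,
                                BiPredicate<String, String> exactMatcher) {
        List<Integer> hits = new ArrayList<>();
        for (Entry e : table) {
            BitSet missing = (BitSet) query.fingerprint.clone();
            missing.andNot(e.fingerprint);
            if (!missing.isEmpty()) {
                continue;                     // stage 1: cannot be a hit, skip cheaply
            }
            if (exactMatcher.test(query.structure, e.structure)) {
                hits.add(e.id);               // stage 2: confirmed by the exact check
            }
        }
        return hits;
    }

    /** Runs the search on three made-up entries. */
    static List<Integer> demo() {
        List<Entry> table = Arrays.asList(
                new Entry(1, "CCO", 1, 2, 5), // passes the screen and the exact check
                new Entry(2, "CNC", 1, 2),    // passes the screen only (false positive)
                new Entry(3, "CCN", 1, 7));   // rejected by the screen
        Entry query = new Entry(0, "CO", 1, 2);
        // Stand-in for the graph search: text containment, NOT real chemistry.
        return search(query, table, (q, s) -> s.contains(q));
    }

    public static void main(String[] args) {
        System.out.println(demo());           // [1]
    }
}
```

In this toy run the exact matcher is called for two of the three entries; the benefit of pre-screening grows with the share of the table that the bit test can reject.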
Structure Cache
• Contains fingerprints for screening and ChemAxon Extended SMILES for atom-by-atom search (ABAS)
• Instant access to the structures for the search process
• Reduced load on the database server
• Incremental update ensures minimum overhead after changes in the table
• Small memory footprint due to SMILES compression
• Optimized storage technique
• Approximately 100 MB of memory is needed for 1 million typical drug-like structures (using 512-bit fingerprints)
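The memory figure can be partly checked by arithmetic. The sketch below is a rough estimate under two assumptions that are not stated in the slides: 1 MB is taken as 10^6 bytes, and whatever the fingerprints do not use is attributed to the compressed SMILES and bookkeeping. The class and method names are hypothetical.

```java
/** Back-of-the-envelope estimate of structure cache memory (illustrative only). */
class CacheMemoryEstimate {

    /** Bytes needed to hold one fingerprint per structure. */
    static long fingerprintBytes(long structures, int fingerprintBits) {
        return structures * (fingerprintBits / 8);
    }

    /** Bytes per structure left over after the fingerprints are accounted for. */
    static long remainingBytesPerStructure(long totalBytes, long structures, int fingerprintBits) {
        return (totalBytes - fingerprintBytes(structures, fingerprintBits)) / structures;
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long totalBytes = 100_000_000L;       // "approximately 100 MB", assuming 1 MB = 10^6 bytes
        System.out.println(fingerprintBytes(structures, 512));                       // 64000000
        System.out.println(remainingBytesPerStructure(totalBytes, structures, 512)); // 36
    }
}
```

Under these assumptions the 512-bit fingerprints account for about 64 MB, leaving roughly 36 bytes per structure for everything else.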
Standardization
• Default standardization includes:
    • Hydrogen removal
    • Aromatization
• Custom standardization can be specified for each table by supplying an XML configuration file at table creation or in the “Regenerate” dialog of JChem Manager (jcman)
Custom Standardization Example
[Figure: a structure before and after custom standardization]
Database search options
• Maximum search time / number of hits
• SQL SELECT statement for pre-filtering
• Ordering of results
• Result table
• Inverse hit list
• Chemical Terms filter constraint
JSP example application
• Open source, customizable
• Features:
    • Substructure, Superstructure, Exact and Similarity search
    • Molecular Descriptor similarity search with descriptor coloring
    • Substructure hit alignment and coloring, inverse hit list
    • Chemical Terms filter
    • Import / Export
    • Export of hits
    • Insert / Modify / Delete structures
API example: connecting to a database

    ConnectionHandler ch = new chemaxon.jchem.db.ConnectionHandler();
    ch.setDriver("oracle.jdbc.driver.OracleDriver");
    ch.setUrl("jdbc:oracle:thin:@localhost:1521:mydb");
    ch.setPropertyTable("JChemProperties");
    ch.setLoginName("scott");
    ch.setPassword("tiger");
    ch.connect();
    // the java.sql.Connection object is available if needed:
    Connection con = ch.getConnection();
    …
    // closing the connection:
    ch.close();
API example: database import

    // ch is a connected ConnectionHandler
    Importer importer = new chemaxon.jchem.db.Importer();
    importer.setConnectionHandler(ch);
    importer.setInput("sample.sdf");
    // importer.setInput(is); // alternatively a stream can also be specified
    importer.setTableName("SCOTT.STRUCTURES");
    importer.setHaltOnError(false);
    importer.setDuplicateImportAllowed(false); // can filter duplicates
    // specifying SDFile field - table field pairs:
    String fieldPairs = "DB_Field1=SDF_Field1; DB_Field2=SDF_Field2";
    importer.setFieldConnections(fieldPairs);
    int importedCount = importer.importMols();
    System.out.println("Imported " + importedCount + " structures");
API example: database export

    // ch is a connected ConnectionHandler
    Exporter exporter = new chemaxon.jchem.db.Exporter();
    exporter.setConnectionHandler(ch);
    exporter.setTableName("structures");
    // data fields to be exported with the structure:
    exporter.setFieldList("cd_id cd_formula name comments");
    String fileName = "output.sdf";
    OutputStream os = new FileOutputStream(fileName);
    exporter.setOutputStream(os);
    exporter.setFormat("sdf");
    int exportedCount = exporter.writeAll();
    os.close(); // release the file once the export has finished
    System.out.println("Exported " + exportedCount + " structures");
API example: database search

    JChemSearch searcher = new chemaxon.jchem.db.JChemSearch();
    searcher.setConnectionHandler(ch);
    searcher.setSearchType(JChemSearch.SUBSTRUCTURE);
    searcher.setQueryStructure("c1ccccc1");
    searcher.setStructureTable("SCOTT.STRUCTURES");
    // a query that returns cd_id values can be used for prefiltering
    // (cd_id is qualified because both tables have a cd_id column):
    searcher.setFilterQuery(
        "SELECT structures.cd_id FROM structures, biodata WHERE "
        + "structures.cd_id = biodata.cd_id AND biodata.toxicity < 0.3");
    searcher.setWaitingForResult(true); // otherwise runs in a separate thread
    searcher.setStructureCaching(true); // caching speeds up the search
    searcher.run();
    // getting the results as cd_id values:
    int[] results = searcher.getResults();
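Since the search returns plain cd_id values, the associated data can then be read through the java.sql.Connection with ordinary JDBC. The following sketch only builds the parameterized SELECT text for a given number of hits; it is an assumption about how one might do this, not part of the JChem API. The class and method names are hypothetical, and the table and column names are taken from the export example.

```java
import java.util.Collections;

/** Builds a parameterized query for fetching rows by cd_id (illustrative only). */
class HitQuerySketch {

    /** Returns SELECT text with one "?" placeholder per hit. */
    static String selectByIds(String table, String columns, int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("at least one id is required");
        }
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "SELECT " + columns + " FROM " + table
                + " WHERE cd_id IN (" + placeholders + ")";
    }

    public static void main(String[] args) {
        int[] results = {12, 57, 903};        // made-up cd_id values standing in for search hits
        System.out.println(selectByIds("structures", "cd_id, cd_formula", results.length));
        // SELECT cd_id, cd_formula FROM structures WHERE cd_id IN (?, ?, ?)
    }
}
```

The text would be passed to Connection.prepareStatement, with each cd_id bound by PreparedStatement.setInt. Large hit lists need to be fetched in batches, since databases cap the size of an IN list (Oracle allows at most 1000 expressions).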
API example: inserting a structure

    // ConnectionHandler, mode, table name and data field names:
    UpdateHandler uh = new chemaxon.jchem.db.UpdateHandler(
        ch, UpdateHandler.INSERT, "structures", "comment, stock");
    uh.setValueForFixColumns("c1ccccc1"); // the structure
    // specifying data field values:
    uh.setStructureValueForAdditionalColumn(1, "some text");
    uh.setStructureValueForAdditionalColumn(2, Double.valueOf(8.5));
    uh.setDuplicateFiltering(true); // filtering duplicate structures
    int id = uh.execute(true); // getting back the cd_id of the inserted structure
    if (id > 0) {
        System.out.println("Inserted, cd_id value : " + id);
    } else {
        System.out.println("Already exists with cd_id value : " + (-id));
    }
    // storing update information, the database connection remains open:
    uh.close();
Performance (1)

Compound registration (elapsed time):

    Number of compounds   Duplicates not checked   Duplicates checked
    10,000                32 s                     45 s
    100,000               4 min 11 s               6 min 20 s
    200,000               8 min 17 s               12 min 26 s

Substructure search in a table of 3 million compounds (query structures were shown as images):

    Query   Number of hits   Search time (s)
    1       12               0.1
    2       936              0.9
    3       0                1.2
    4       49740            10.7

Server parameters: Windows XP; 1 CPU: Intel P4 3.0 GHz; 2 GB RAM; Oracle 9i
Performance (2)

Similarity search, Tanimoto > 0.8 (query structures were shown as images):

    Query   Number of hits   Search time (s)
    1       24               1.5
    2       156              1.3
    3       336              1.3

Server parameters: Windows XP; 1 CPU: Intel P4 3.0 GHz; 2 GB RAM; Oracle 9i
Future plans
• Additional layer: JChem Server (later also as grid)
• Structural keys as optional extension to current fingerprints
• Tables for storing query structures
• Tables for storing general (Markush) structures
• Partial clean option for hit alignment
• Installer
• etc.
Summary
ChemAxon’s JChem Base toolkit provides sophisticated methods for handling chemical structures and associated data. The use of fingerprints and the structure cache provides high search performance.
Links
• JChem home page: www.jchem.com
• Live demos: www.jchem.com/examples
• API documentation: www.jchem.com/doc/api
• Brochure: www.chemaxon.com/brochures/JChemBase.pdf
Thank you for your attention

Máramaros köz 3/a
Budapest, 1037
Hungary
info@chemaxon.com
www.chemaxon.com