Structural Search Using ChemAxon Tools
Szabolcs Csepregi*, Szilárd Dóránt, Nóra Máté, Miklós Vargyas, Péter Kovács, György Pirok, Ferenc Csizmadia
January 2007
Structural Search Using ChemAxon Tools
• Introduction
• Search types in JChem
• Interfaces
• Search options and features
• The Chemical Terms language (search result filtering)
• Performance
• Applications
• Future plans
All examples were generated by Marvin.
ChemAxon introduction
Company
• Founded in 1998; based in Budapest, Hungary, with representation in the US, UK and Japan
• Wide cheminformatics expertise: more than 30 staff, including 9 PhD and 11 MSc
• Wide industry expertise: more than 180 corporate clients worldwide and more than 1000 academic users
Products
• Cheminformatics tools: structure drawing, visualization, search, transformation, library profiling and property prediction
• Enterprise chemistry database and cartridge technology
Technology
• Powerful and flexible: enterprise API toolkits
• Solutions: desktop applications
• Java based, with .NET support: platform independent and web ready
Mantra
• Do what they want, respond quickly
Selected Application Areas
• Global licenses
• Custom development projects
• Value added constructions
• Websites/portal front and back end
• Content/Educational
For academic teaching and research
• Unlimited* personal license for all products, support and upgrades
• Open term license for teaching
• Repeating 2-year license term for research, provided ChemAxon is cited in publications
• License covers students of the department
• Unlimited number of applications per institution
* JChem Base: 3 searches/min
More information at: http://www.chemaxon.hu/forum/ftopic193.html
Where is chemical searching useful?
A diversity of applications:
• Compound registration systems
• Electronic Laboratory Notebook (ELN) systems
• Pharmacophoric group (functional group) identification (JChem Screen, JKlustor)
• Rule-based fragmentation of libraries (JChem Fragmenter, RECAP)
• Virtual reaction processing (JChem Reactor)
• Standardization (canonicalization of structures, JChem Standardizer)
• Toxic fragment identification (superstructure search)
Search types in JChem
• Atom-by-atom search (structural search), by structural search type:
  • Substructure
  • Superstructure
  • Exact
  • Perfect
• Similarity search:
  • Different descriptors
  • Different metrics
• MC(E)S: maximum common (edge) substructure
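To make "different descriptors, different metrics" concrete, here is a minimal sketch of a similarity search in plain Python. It assumes the descriptor is a binary fingerprint stored as a set of "on" bit positions and that the metric is the Tanimoto coefficient; the slide does not say which descriptors or metrics JChem ships, and the names `tanimoto` and `similarity_search` and the bit sets are invented for illustration. This is not the JChem API.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, threshold):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = ((name, tanimoto(query_fp, fp)) for name, fp in library.items())
    return sorted((hit for hit in scored if hit[1] >= threshold),
                  key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints, invented purely for illustration.
library = {"mol_a": {1, 4, 7, 9}, "mol_b": {1, 4, 8}, "mol_c": {2, 3}}
hits = similarity_search({1, 4, 7}, library, threshold=0.5)
print(hits)
```

Swapping the metric only means replacing `tanimoto`; swapping the descriptor only changes how the bit sets are produced.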
Structural search interfaces
• JSP (Java Server Pages): web GUI for the database
• Command line utility jcsearch: for files and databases
• Java and .NET API:
  • MolSearch class
    • isMatching(): only checks whether there is a match
    • findFirst(), findNext(), findAll(): enumerate all possible matchings
  • JChemSearch class: JChem Base
• Cartridge: access all functionality from Oracle SQL
• Chemical Terms
• Instant JChem
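The difference between "only check matching" and "enumerate all possible matchings" can be shown with a toy atom-by-atom matcher. The sketch below is plain Python, not the MolSearch API: the molecule representation (a list of element symbols plus a dict of bond orders), the function names `find_all` and `is_matching`, and the simple backtracking strategy are all assumptions made for illustration. Real search engines also handle query features, aromaticity and stereo, which are ignored here.

```python
def find_all(query, target):
    """Yield each substructure hit as a dict {query atom index: target atom index}.

    A molecule is (atoms, bonds): a list of element symbols plus a dict that maps
    frozenset({i, j}) to the bond order between atoms i and j.
    """
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target

    def bonds_agree(q, t, mapping):
        # Each query bond from q to an already placed atom must be present,
        # with the same order, between the corresponding target atoms.
        for placed_q, placed_t in mapping.items():
            order = q_bonds.get(frozenset((placed_q, q)))
            if order is not None and t_bonds.get(frozenset((placed_t, t))) != order:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            yield dict(mapping)  # copy, because mapping is mutated while backtracking
            return
        q = len(mapping)  # query atoms are placed in index order
        for t, symbol in enumerate(t_atoms):
            if symbol == q_atoms[q] and t not in mapping.values() and bonds_agree(q, t, mapping):
                mapping[q] = t
                yield from extend(mapping)
                del mapping[q]

    yield from extend({})


def is_matching(query, target):
    """True if at least one hit exists; the search stops at the first one."""
    return next(find_all(query, target), None) is not None


# Hydrogens are left implicit. Target: ethanol, C-C-O.
ethanol = (["C", "C", "O"], {frozenset((0, 1)): 1, frozenset((1, 2)): 1})
c_single_o = (["C", "O"], {frozenset((0, 1)): 1})
c_double_o = (["C", "O"], {frozenset((0, 1)): 2})

hits = list(find_all(c_single_o, ethanol))
print(hits, is_matching(c_double_o, ethanol))
```

Because `find_all` is a generator, a caller that only needs a yes/no answer pays for the first hit only, while a caller that needs every mapping simply exhausts it.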
Compatibility and integration
• Database engines: Oracle, MySQL, MS SQL Server, PostgreSQL, MS Access, IBM DB2, etc.
• Operating systems: Windows, Linux, Mac OS X, Solaris, etc.
• File formats: SMILES, MDL molfile (v2000 and v3000), MDL SDF, RXN, RDF, MRV
• Integration: 100% Java, extensive API, JChem Cartridge for Oracle, .NET support via JNBridge
Instant JChem
http://www.chemaxon.com/conf/Instant_JChem.ppt
Desktop application for local and remote chemical database management, search and structure-based prediction
• Connect simply to external databases and share your native database simultaneously
• Powerful search functionality
• Scalable: explore hundreds of thousands of live structures and more
• Dynamically predict properties using Calculator Plugins
• Apply canonicalization rules for import and viewing
• Wide import/export options
• Merge data sets into a single set
• Very active development: what do you want to do?
JChem Cartridge for Oracle
http://www.chemaxon.com/JChem_Cartridge.ppt
Access JChem functionality from non-Java environments via Oracle SQL functions:
• All search features of JChem Base
• Complex chemical filters and property predictors using Chemical Terms expressions
• Standardization (structure canonicalization) during registration
• Structure format conversions
• 2D and 3D image generation
• Library enumeration using virtual reactions (Reactor)
Canonicalization with Standardizer
http://www.chemaxon.com/Standardizer.ppt
• Aromatize/dearomatize
• Add/remove explicit hydrogens
• Convert mesomers / tautomers / functional groups
• Remove:
  • solvents
  • counterions by list
  • smallest fragment
  • retain largest fragment
• Set/remove chiral flag, remove stereo features
• Ungroup S-groups
• Enumerate by stoichiometry values
• 2D, 3D coordinate generation (cleaning)
• Template-based cleaning
Search options
Structural search options:
• Stereo on/off, absolute stereo (ignore chiral flag)
• Double bond stereo: no check / marked / all double bonds
• Chemical Terms filter expression
• Tautomer search
• Ignore charge/isotope/radical/valence/mixture brackets
• Exact charge/radical/isotope/query features/bond/stereo matching
• Vague bond matching modes: "or aromatic"; ignore bond types
• Timeout limit
• Order sensitive hits
• Pre-assignment of query and target atoms
• etc.
Query features 1. Atomic features
• Query atom types: any, hetero, list, not list
• Pseudo atoms, e.g. “Resin”
• Explicit lone pairs (these also match implied lone pairs)
• Charge, isotope, radical
• Link nodes (repeatable)
Query features 2. Query properties • Query properties:
Query features 3. Atomic SMARTS features • SMARTS atoms: • Additional query properties: • Example: • Carbonyl C, but not amide
Query features 4. Bond features & components • Query bond types: any, single or double, single or aromatic, double or aromatic • Bond topology: chain/ring • SMARTS bonds • Component level grouping
Stereo searching 1. Double bonds
Levels of check:
• All
• Only marked double bonds (MDL: stereo care flag)
• None
Meanings of the depicted query bonds (depictions not reproduced):
• Cis
• Trans
• Cis or trans (unknown)
• Not trans
• Not cis
Stereo searching 2. Tetrahedral chirality
• Stereo bond types: up, down, up or down
• Relative stereo configuration
• Chiral flag model
• Enhanced stereo representation: AND<n>, OR<n>, ABS groups
S-group integration (query & target)
Both sides are treated similarly by the search:
• Abbreviations (super-atom S-groups)
• Multiple groups
Other S-groups supported: component, mixture and formulation brackets
Reaction search
• Reactants, agents, products
• Transformation recognition (mapping)
• Stereospecific reactions (inversion, retention)
• Reactant grouping
• Reacting center
R-group search • Scaffold, R-group definitions • Monovalent, divalent R-groups • R-logic • Occurrence • If-then • Rest H
Hydrogens
• H representations:
  • Explicit
  • Implicit
• Query H count:
  • total (H<n>)
  • implicit (h<n>)
• Example: query and target depictions (not reproduced)
Applications of Chemical Terms
• Virtual synthesis: reaction and synthesis rules
• Pharmacophore analysis: pharmacophore definitions
• Drug design: goal functions
• Structural search: advanced query expressions, e.g. in Instant JChem & Cartridge
Chemical Terms
Elements of the language:
• Property calculations (partial charge distribution, pKa, logP, HB donors, acceptors, etc.)
• Structure matching functions (describing functional groups, reaction sites, similarity, etc.)
• Arithmetic and logic operators
Chemical Terms • Some available functions • Structural search (match, matchcount) • Partial charge distribution • pKa, Log P, Log D, major microspecies • Polarizability • Topological Polar Surface Area • Number of rotatable bonds, rings, aromatic rings, etc. • Number of HB donors/acceptors • Exact mass • Arithmetic and logic operators • Extensible: your own Java plugins can be easily added. • Etc.
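The idea of filtering search results by combining calculated properties with arithmetic and logic operators can be sketched as follows. This is plain Python rather than Chemical Terms syntax, and it assumes the properties have already been calculated; the property values, the hit identifiers and the function name `rule_of_five` are invented for illustration. The thresholds are the well-known Lipinski rule-of-five limits.

```python
# Hypothetical precomputed properties for three search hits; the values are invented.
hits = [
    {"id": "hit_1", "mass": 310.4, "logP": 2.1, "donors": 2, "acceptors": 5},
    {"id": "hit_2", "mass": 612.7, "logP": 4.0, "donors": 1, "acceptors": 8},
    {"id": "hit_3", "mass": 450.0, "logP": 5.6, "donors": 3, "acceptors": 6},
]


def rule_of_five(p):
    """Lipinski-style filter: every condition must hold for the hit to be kept."""
    return (p["mass"] <= 500 and p["logP"] <= 5
            and p["donors"] <= 5 and p["acceptors"] <= 10)


kept = [p["id"] for p in hits if rule_of_five(p)]
print(kept)
```

In a real workflow the filter expression would be evaluated by the search engine itself, so that rejected structures never reach the hit list.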
Fingerprint screening in the database
Search flow: query → fingerprint screening of the JChem table → screened-out structures are rejected, the rest need to be searched → atom-by-atom search → results (hits for the query)
• JChem database searches use fingerprint technology for fast search results.
• Screening rapidly* filters out most non-hits: usually more than 99% of them are rejected.
• Supported fingerprint types:
  • Chemical hashed fingerprints
  • User-defined additional structural keys
* Average screening time in a 3-million cached table: ~0.5 s
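The two-stage flow of screening followed by atom-by-atom search can be sketched in a few lines. The sketch assumes fingerprints are stored as integer bit masks and uses the usual substructure screening rule: a target can only contain the query if every bit set in the query fingerprint is also set in the target fingerprint. The function names, the table contents and the stand-in `atom_by_atom` callback are hypothetical; how JChem computes and stores its fingerprints is not shown on the slide.

```python
def passes_screen(query_fp, target_fp):
    """A target survives screening only if it has every bit the query has."""
    return (query_fp & target_fp) == query_fp


def search(query_fp, table, atom_by_atom):
    """Screen all rows cheaply, then run the expensive check on the survivors only.

    Returns the confirmed hits and the number of candidates that needed the
    atom-by-atom check.
    """
    candidates = [row_id for row_id, fp in table.items() if passes_screen(query_fp, fp)]
    confirmed = [row_id for row_id in candidates if atom_by_atom(row_id)]
    return confirmed, len(candidates)


# Hypothetical 4-bit fingerprints keyed by row id.
table = {1: 0b1011, 2: 0b0110, 3: 0b1111}

# Stand-in for the real atom-by-atom search: pretend only row 3 truly matches.
# Hashed fingerprints can collide, so a screen may let false positives through
# (row 1 here), but it never rejects a true hit.
hits, searched = search(0b0011, table, atom_by_atom=lambda row_id: row_id == 3)
print(hits, searched)
```

The screen costs one bitwise operation per row, which is why rejecting most non-hits at this stage dominates overall search time.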
Performance
Searching 3 million SMILES structures (multiplied NCI 2000) in a database: JChem Base 3.2, Dual Xeon 3 GHz, 2 GB RAM; Oracle 9.2.0.7.0
Application: R-group decomposition
JChem is able to identify the ligands of a given scaffold at specified substitution positions.
Figure: library, query (scaffold) and the resulting R-group decomposition.
Further applications of structural search in JChem
• Transformations: Standardizer & Reactor (examples: enamine-amine tautomerism; converting the covalent form of alcoholates to the ionic form)
• Identification of pharmacophoric groups: Pmapper (examples: nitro, amidine)
• Identification of bond cleavage: Fragmenter (example: ether cut)
Future plans • Searching in Markush targets (R-groups, atom lists, link nodes, bond lists) • New bracket types • Further generic atom types (AH, QH, X, M, etc.)
Future plans – Combinatorial Markush database prototype
Stage I. “Combinatorial libraries”
Markush features:
• R-groups
  • Any nesting
  • Up to 2 connections
  • In ring or chain
• Atom lists
• Bond lists
• Link nodes
Functionality:
• Registration into database
• Search in Markush DB (without enumeration)
• Enumeration (full, selective or hit enumeration)
• Enhanced Markush sketching (MarvinSketch)
Very complex Markush libraries can be handled, even ones with more than 2^63 members.
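Why a Markush library should be handled without full enumeration becomes clear once its size is computed: the member count is the product of the number of definitions at each variable position. The sketch below illustrates this, together with lazy (selective) enumeration. It uses naive text substitution into a scaffold template; the scaffold string, the R-group definitions and the function names are hypothetical, and no chemical validity checking is attempted.

```python
from itertools import islice, product
from math import prod

# Hypothetical scaffold template and R-group definitions, invented for illustration.
scaffold = "{R1}c1ccc({R2})cc1{R3}"
r_groups = {"R1": ["C", "CC", "O"], "R2": ["F", "Cl"], "R3": ["N", "S"]}


def library_size(r_groups):
    """Number of members, computed without generating any of them."""
    return prod(len(definitions) for definitions in r_groups.values())


def enumerate_members(scaffold, r_groups):
    """Lazily yield members by substituting one definition per R-group."""
    names = list(r_groups)
    for choice in product(*(r_groups[name] for name in names)):
        member = scaffold
        for name, fragment in zip(names, choice):
            member = member.replace("{" + name + "}", fragment)
        yield member


size = library_size(r_groups)
first_two = list(islice(enumerate_members(scaffold, r_groups), 2))
print(size, first_two)
```

Because Python integers are unbounded, the same `library_size` call also works for libraries far too large to enumerate, for example 40 positions with 3 definitions each.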
Summary • JChem suite: contains a broad range of chemical search facilities. • Chemical Terms: allows easy and flexible data mining. • Structural search is a useful tool for many applications.
References
• JChem Query Guide: http://www.jchem.com/doc/user/Query.html
• Chemical Terms reference: http://www.jchem.com/doc/user/EvaluatorLanguage.html
• JChem Base JSP demo page: http://www.jchem.com/examples/jsp1_x/index.jsp
• Instant JChem: http://www.chemaxon.com/product/ijc.html
• jcsearch command line tool: http://www.jchem.com/doc/user/Jcsearch.html
• API documentation: http://www.jchem.com/doc/api/index.html (chemaxon.sss.search.MolSearch, chemaxon.jchem.db.JChemSearch)
Thank you for your attention
Máramaros köz 3/a, Budapest, 1037 Hungary
info@chemaxon.com
www.chemaxon.com