Reactions: Databases and Data Management

Reactions: Databases and Data Management. First Day: Available Reaction Databases; Required Information; Concept of Reaction Information Retrieval; Problems with Reaction Retrieval; Solving a Synthetic Problem. Second Day: Reaction Classification (Overview, Applications)

Audrey

Presentation Transcript


  1. Reactions: Databases and Data Management. First Day: Available Reaction Databases; Required Information; Concept of Reaction Information Retrieval; Problems with Reaction Retrieval; Solving a Synthetic Problem. Second Day: Reaction Classification (Overview, Applications); Synthesis Planning

  2. Available Reaction Databases
     ISI CCR (1987); CAS CASREACT (1988 / 1991); ChemReact (1991); ChemInform RX (1991); Beilstein: STN (1988), CrossFireplusReactions (1996)
     • online: CASREACT (CAS) (ca. 4 million); ChemReact (InfoChem) (ca. 2.5 million); CrossFireplusReactions (Beilstein) (ca. 10 million); ChemInform RX on STN (FIZ Chemie) (ca. 0.7 million); CCR (ISI) (ca. 0.5 million)
     • inhouse: ChemInform Reaction Library (MDL); ChemReact (InfoChem); CrossFireplusReactions (Beilstein); Specialty Databases (several vendors); Proprietary Databases
     For a good review see: Zass, E. "Reaction Databases". In: Encyclopedia of Computational Chemistry; Schleyer, P. von R.; Allinger, N.L.; Clark, T.; Gasteiger, J.; Kollman, P.A.; Schaefer, H.F.; Schreiner, P.R. (Eds.). Wiley, Chichester, 4, 2402-2420. QD39.3.E46 E53 1998

  3. Some Overlap Studies in Reaction Databases
     • Borkent, J.H.; Oukes, F.; Noordik, J.H. J. Chem. Inf. Comput. Sci. 1988, 28, 148–150. The authors ran three specific queries against small databases of three systems (REACCS, ORAC, SYNLIB): cyclopropanation (10%), enolate alkylation (10%), and ketone reduction (<15%).
     • Faiz, A; Parkin, D. J. Chem. Inf. Comput. Sci. 1999, 39, 281–288. The authors use the same queries as above but compare the overlap of a large database (Beilstein) with smaller ones available from MDL, using references and reactions. Results are roughly the same (see later discussion).
     • Hendrickson, J.B.; Zhang, L. J. Chem. Inf. Comput. Sci. 2000, 40, 380–383. The authors measured the extent of duplication in 16 reaction databases from four commercial sources (ca. 1 million reactions) using the COGNOS system (discussed later under “Classification of Reactions”) and found less than 3% overlap.
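
As a rough illustration of how an overlap figure of this kind can be computed, the sketch below treats each database as a set of reaction identifiers and reports the shared fraction. The identifiers and the choice of denominator (the smaller database) are assumptions made for illustration; the cited studies define and measure overlap in their own ways.

```python
def overlap_fraction(db_a: set[str], db_b: set[str]) -> float:
    """Share of the smaller database that also occurs in the other one."""
    smaller = min(len(db_a), len(db_b))
    if smaller == 0:
        return 0.0
    return len(db_a & db_b) / smaller


# Hypothetical reaction identifiers, e.g. canonical reaction codes.
db_a = {"rxn-001", "rxn-002", "rxn-003", "rxn-004"}
db_b = {"rxn-004", "rxn-005", "rxn-006", "rxn-007", "rxn-008"}

print(f"{overlap_fraction(db_a, db_b):.0%}")  # one shared entry out of four
```

Comparing by literature reference instead of by reaction only changes what goes into the sets.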

  4. Molecules vs. Reactions – The Difference
     • Difference between searching for molecule data vs. searching for reaction data
     • Molecules: Query: Is this particular molecule or a similar one known? Specific data? Answer: Yes or No
     • Reactions: Query: How to selectively reduce the nitrile group (transformation)? Answer: Pointers to relevant examples in the literature
     • Criteria: efficient transformation; functional group compatibility; reaction conditions
     • Solving these types of problems requires, in most cases, the involvement of the synthetic chemist

  5. Required Information: Information Needs of Synthetic Organic Chemists in Basic Research and Development
     • new preparation of intermediates and starting materials
     • well established, high yield preparations (experimental procedures)
     • new synthetic methodologies (new reagents, catalysts etc.)
     • information on starting materials (availability, price, physical data etc.)
     • physical properties of reagents, solvents and catalysts
     • access to the primary and secondary literature
     • spectral information of related compounds
     General: searching for information on molecules precedes retrieval of synthetic methodology data

  6. Required Information: Major Types of Reaction Information
     A. Preparative Information (molecule queries): Find information for the preparation of the desired or a very similar molecule (product, intermediate, reagent)
     B. Methodology Information (transformation queries): Find relevant literature examples to solve the synthetic problem (selectivity)
     C. Reaction Condition Information (data queries): Find literature about the stability of non-involved functionalities under given reaction conditions (slide example: H2, Lindlar cat., MeOH)
     D. Topical Information (keyword queries): What is known about the hetero Diels-Alder reaction? Browse the literature for ideas and potential applications

  7. Preparation of a Known Compound. Reaction Information from the Beilstein Database: Preparation of a known compound using CrossFire Web

  8. Property of a Known Compound Information from the Beilstein Database: Compound Properties using CrossFire Web

  9. Information for Synthetic Methodologies: Substructure-based Reaction Searches. Synthetic Problem: • Full Structure Search: No hits • Reaction Substructure Search (colored fragment): 188 hits • Keyword Search “Michael Addition”: 3338 hits. Data Source: MDL’s combined reaction databases (ca. 1 million reactions)

  10. How to Involve the Synthetic Chemist. Involvement of the chemist requires: • Information specialists have to change roles: from searcher to teacher/instructor • The right tools have to be available to the chemist to simulate his/her problem solving process. Requirements: • User interfaces based on users’ tasks and capabilities (e.g. SciFinder, CrossFire Web, Reaction Browser) • Simplification of the querying process (natural, not rule-dependent) • Effective indexing of databases (e.g. classification) • Hierarchical thesauri for keywords • Efficient post-search management tools • Seamless integration of various information sources (web environment, point-and-click) • Most importantly: recognize the vast knowledge of synthetic chemists

  11. Complementary Nature of Databases. Example: Beilstein vs. MDL reaction databases. Characteristic Features (Strengths) of Databases
     CrossFireplusReactions:
     • large number of molecules and associated data
     • information on preparations for most compounds
     • links between compounds, reactions, and citations
     • synthetic pathways for known compounds through hyperlinks
     • efficient use of “generic” groups in searching
     • high-speed searches
     MDL Reaction Databases:
     • selected synthetic methodologies
     • structural information for reagents, catalysts, and solvents
     • reaction classification
     • post-search data management (clustering)
     • enduser oriented query formulation
     • specialty databases

  12. Coverage and Scope of Abstracting. Example: J. Org. Chem. 1995, 60(29), 68 publications
     • CrossFireplusReactions (B): 64 papers with 516 reactions; not abstracted: papers with focus on theoretical aspects and large biomolecules
     • MDL Reaction Databases (M): 47 papers with 332 reactions, emphasis on synthetic methodologies
     • papers with coverage in B only (17 = 26%): papers where synthetic methodology is NOT the main focus
     • papers abstracted equally by B and M (6 = 10%): papers with strong synthetic focus; in general, M has slightly fewer examples, sufficient to indicate scope and limitations
     • papers with large B:M ratio of abstracted reactions (41 = 64%): B abstracting is comprehensive (all preparations described in Exp. Part); reactions in M highlight critical or new methodologies with detailed information on stereochemistry, reaction conditions dependency etc.
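
The counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that every paper abstracted by M is also abstracted by B, which is what the three categories (17 + 6 + 41 = 64) suggest; the variable names are illustrative only.

```python
# Counts as given on the slide for J. Org. Chem. 1995, 60(29).
papers_b, reactions_b = 64, 516   # CrossFireplusReactions (B)
papers_m, reactions_m = 47, 332   # MDL reaction databases (M)

# Assumption: all M papers are also covered by B.
b_only = papers_b - papers_m
covered_by_both = {"abstracted equally": 6, "large B:M ratio": 41}

per_paper_b = reactions_b / papers_b
per_paper_m = reactions_m / papers_m

print(b_only, sum(covered_by_both.values()))
print(round(per_paper_b, 1), round(per_paper_m, 1))
```

Under that assumption the B-only count matches the 17 papers stated on the slide, and both sources average roughly seven to eight abstracted reactions per covered paper.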

  13. Molecule Data in the Beilstein Database (partial listing)

  14. Reaction Information and Hyperlinking in CrossFireplusReactions

  15. Information on Synthetic Methodology in MDL Databases

  16. Use of Complementary Information in the Total Synthesis of a Complex Molecule. Synthetic Target 1: Hypothetical Drug Candidate suggested by Dr. P. W. Erhardt, University of Toledo

  17. Substructure Queries and Search Results (initial results)

  18. Searches for Dimethylamination • Reaction Substructure Search: No relevant hits! • Reaction Browser Transformation Search: Result: 3 hits

  19. Dimethylamination (CrossFireplusReactions). Query: via the 4-nitroso derivative. Result:

  20. Hyperlinking in CrossFireplusReactions Schematic Display of Hyperlinking

  21. Search for Nitration/Nitrosation with Reaction Browser (MDL) Result: 26 hits Clustering (reagent): 20 nitrosations, 6 nitrations

  22. Stability of Protecting Group (MDL Reaction Browser)

  23. Plan for Preparation of a Key Intermediate Suggested Preparation of Benzoquinolone Compound from Naphthyl Derivatives Search for Starting Material

  24. Key Intermediates in Potential Total Synthesis

  25. The Essence of Solving Synthetic Problems. “The design of organic syntheses by chemists without the help of computers proceeds in anything but a systematic stepwise manner from the target molecule to available starting materials. A systematic stepwise approach is more the exception than the rule.” “The human mind solves problems by lateral thinking, jumping from one idea to the next, from one question to a different one, from retrosynthetic thinking to considering the course and outcome of a reaction, etc.” Gasteiger, J.; Ihlenfeldt, W.D.; Roese, P. Recl. Trav. Chim. Pays-Bas 1992, 111, 270.

  26. Substructure Searches – Not for the Enduser

  27. Requirements for Effective Reaction Retrieval by Enduser Chemist • User interfaces have to be intuitive and adjustable to users’ tasks and capabilities (e.g. SciFinder, CrossFire Web, Reaction Browser) (see A Framework for the Evaluation of Chemical Structure Databases, Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140) • Simplification of the querying process (natural, not rule-dependent) • Effective indexing of databases (e.g. reaction classification) • Hierarchical thesauri for keywords • Efficient post-search management tools • Seamless integration of various information sources (web environment, point-and-click)

  28. Reaction Classification Potential uses: • alternate method for indexing databases - complementary to structure keys used in retrieval systems • access to “generic” types of information in retrieval systems • post-search management of large hitlists • simplification of query generation • linking of reaction information from different sources • source for deriving knowledge bases for reaction prediction and synthesis design • automatic procedures for analyses and correlations, e.g. quality control and overlap studies

  29. References for Papers on Reaction Classification • Horace: An Automatic System for the Hierarchical Classification of Chemical Reactions. Rose, J.R., Gasteiger, J. J. Chem. Inf. Comput. Sci. 1994, 34, 74 • COGNOS: A Beilstein-Type System for Organizing Organic Reactions. Hendrickson, J.B., Sander, T. J. Chem. Inf. Comput. Sci. 1995, 35, 251 • Knowledge Discovery in Reaction Databases: Landscaping Organic Reactions by a Self-Organizing Neural Network. Chen, L., Gasteiger, J. J. Am. Chem. Soc. 1997, 119, 4033 • Classification of Organic Reactions: Similarity of Reactions Based on Changes in the Electronic Features of Oxygen Atoms at the Reaction Sites. Satoh, H., Sacher, O., Nakata, T., Chen, L., Gasteiger, J., Funatsu, K. J. Chem. Inf. Comput. Sci. 1998, 38, 210

  30. References for Papers on Reaction Classification (continued) • A Novel Method for Characterization of Three-Dimensional Reaction Fields Based on Electrostatic and Steric Interactions toward the Goal of Quantitative Analysis and Understanding of Organic Reactions. Satoh, H., Itono, S., Funatsu, K., Takano, K., Nakata, T. J. Chem. Inf. Comput. Sci. 1999, 39, 671 • Topology-Based Reaction Classification: An Important Tool for the Efficient Management of Reaction Information. Kraut, H., Loew, P., Matuszczyk, H., Saller, H., Grethe, G. Proceed. 5th Internat. Conf. Chem. Struct. Noordwijkerhout, The Netherlands 1999 • Reaction Classification. Hendrickson, J.B., Chen, L. In Encyclopedia of Computational Chemistry: von Rague Schleyer, P., Allinger, N.L., Clark, T., Gasteiger, J., Kollman, P.A., Schaefer III, H.F., Schreiner, P.R., Eds.; John Wiley & Sons: Chichester, 1998; Vol. 4, pp. 2381–2402 • A Framework for the Evaluation of Chemical Structure Databases. Cooke, F.; Schofield, H. J. Chem. Inf. Comput. Sci. 2001, 41, 1131-1140

  31. Major Study Areas of Reaction Classification

  32. HORACE. Gasteiger and Rose, J. Chem. Inf. Comput. Sci. 1994, 34, 74
     • data-driven machine learning technique for finding an inherent hierarchy in a dataset
     • automatic hierarchical classification in an iterative process considering reaction centers and reaction context (reaction requirements and tolerance)
     • based on physicochemical and topological features in a synergistic manner
     • empirical methods for calculation of atomic charges, parameters for inductive, resonance and polarizability effects, and local dissociation energies
     • reaction center characterization restricted to features within one bond of center
     • three levels of abstraction for supporting classification and generalization of reactions:
       - partial order (hierarchy) of atoms for generalization (ten atomic equivalency classes)
       - set of 114 structural features (functional groups) for characterization of reactions with same reaction center (derived from BRANGÄNE set)
       - dynamic classification of functional groups (nonorthogonality) as determined by data
       - physicochemical classification features (selection dependent on generalized reaction)
     Figure: Dynamic Classification of Functional Groups (labels: D, S, W; Wittig-Horner; -OR, -CHO, -COOR, -CN, -SO2R, -PO(OR)2)

  33. HORACE reaction scheme. Figure labels: Physicochemical generalization; Topological hierarchy; Physicochemical classification; Individual reactions. Caption: Propagation of physicochemical constraints into the topological hierarchy

  34. COGNOS. Hendrickson and Sander, J. Chem. Inf. Comput. Sci. 1995, 35, 251
     • SYNGEN classification based on net changes in bonding at the reaction center (skeletal carbon), invoking the dichotomy of skeleton and functionality as evidenced in construction/fragmentation and refunctionalization reactions
     • hierarchical classification with few main categories and a taxonomic nesting of sub-families
     • large indexed database(s) using a taxonomy of reaction types
     • use of delimiters
     • implemented in COGNOS for indexing of large databases and fast retrieval
     Bond types at a carbon atom:
     • R: σ-bond to another carbon (skeletal bond)
     • Π: π-bond to another carbon (functional bond)
     • Z: bond (σ or π) to electronegative heteroatom (N, O, S, halogen, etc.)
     • H: bond to hydrogen or electropositive atom (B, Al, Si, Sn, metal, etc.)
     SYNGEN descriptors: the numbers of bonds are defined as σ, π, z, h with a sum of 4; oxidation state x = z − h (|x| ≤ 4)
     Figure example (atoms 1, 2, 3): 0100.0001.0001 (EDU) - 0001.0001.0000 (PRO); 0001.0001.0100 (EDU) - 0000.0001.0001 (PRO); 1: z=1 =0, z-value = 0100; 2,3: = =1, z-value = 0001; 0011.0000.0001, z-list (301); 0001.0000.0011, z-list (103)
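
The descriptor rule on this slide (four bond counts per carbon that sum to 4, with the oxidation state given by z minus h) can be written down directly. The following is a minimal sketch, not the SYNGEN or COGNOS implementation; the class name `CarbonDescriptor` and the example atoms are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarbonDescriptor:
    sigma: int  # sigma bonds to other carbons (skeletal bonds)
    pi: int     # pi bonds to other carbons (functional bonds)
    z: int      # bonds (sigma or pi) to electronegative heteroatoms
    h: int      # bonds to hydrogen or electropositive atoms

    def __post_init__(self):
        counts = (self.sigma, self.pi, self.z, self.h)
        if min(counts) < 0 or sum(counts) != 4:
            raise ValueError("sigma + pi + z + h must equal 4")

    @property
    def oxidation_state(self) -> int:
        # x = z - h, which by construction lies between -4 and +4
        return self.z - self.h


# Hypothetical examples: carbonyl carbon of a dialkyl ketone, and the
# corresponding carbinol carbon after reduction.
ketone_carbon = CarbonDescriptor(sigma=2, pi=0, z=2, h=0)
alcohol_carbon = CarbonDescriptor(sigma=2, pi=0, z=1, h=1)

print(ketone_carbon.oxidation_state, alcohol_carbon.oxidation_state)
```

In this picture a ketone reduction shows up as a net change of z from 2 to 1 and of h from 0 to 1 at a single carbon, with the skeletal counts unchanged.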

  35. Landscaping Organic Reactions. Gasteiger and Chen, J. Am. Chem. Soc. 1997, 119, 4033
     • use of self-organizing neural network (Kohonen)
     • two-dimensional refinement of reaction types from multi-dimensional data
     • use of physicochemical variables for description of the reaction center:
       - values calculated using PETRA and stored with reactions
       - charge distribution, inductive and resonance effects
       - infer reaction mechanism from data
       - account for long-range effects (e.g. through conjugation)
     • program allows the use of other variables, such as reaction conditions, reagents, catalysts, solvents etc.
     Example: (generating a knowledge base for reaction prediction)
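
To make the Kohonen idea concrete, the sketch below performs a single training step of a small self-organizing map in plain Python: it finds the best matching unit for one feature vector and pulls that unit and its grid neighbours toward the sample. The grid size, learning rate, neighbourhood rule, and the three-component feature vector are all assumptions for illustration; this is not the network or the descriptor set used in the cited paper.

```python
import random


def sq_dist(weights, sample):
    """Squared Euclidean distance between a weight vector and a sample."""
    return sum((w - s) ** 2 for w, s in zip(weights, sample))


def best_matching_unit(grid, sample):
    """Grid position whose weight vector is closest to the sample."""
    return min(grid, key=lambda pos: sq_dist(grid[pos], sample))


def train_step(grid, sample, rate=0.5, radius=1):
    """Move the winning unit and its grid neighbours toward the sample."""
    bmu = best_matching_unit(grid, sample)
    for (row, col), weights in list(grid.items()):
        if max(abs(row - bmu[0]), abs(col - bmu[1])) <= radius:
            grid[(row, col)] = [w + rate * (s - w)
                                for w, s in zip(weights, sample)]
    return bmu


rng = random.Random(0)
grid = {(r, c): [rng.random() for _ in range(3)]
        for r in range(4) for c in range(4)}

# Hypothetical, pre-scaled physicochemical descriptors of one reaction center.
sample = [0.9, 0.1, 0.4]
winner = train_step(grid, sample)
print(winner)
```

Repeating this step over many reactions, with a shrinking rate and radius, is what lets similar reaction centers settle into neighbouring cells of the two-dimensional map.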

  36. Landscaping Organic Reactions Note: HORACE found 30 clusters due to influence of topology of functional groups

  37. Reaction Classification - InfoChem: Rules and Definitions. RCP v.2.5 developed by InfoChem, Munich. Based on InfoChem’s reaction center perception algorithm • A bond is defined as a reaction center if it is made or broken • An atom is defined as a reaction center if it changes • number of valencies • number of implicit and explicit hydrogens • number of π-electrons • atomic charge • or if the connecting bond is a reaction center
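
The rules on this slide can be sketched for an atom-mapped reaction as follows. This is an illustrative toy, not InfoChem's algorithm: the `AtomState` record, the dictionary-based molecule representation, and the decision to treat any change in bond order as "made or broken" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomState:
    valence: int
    hydrogens: int      # implicit plus explicit
    pi_electrons: int
    charge: int


def center_bonds(educt_bonds, product_bonds):
    """Bonds whose order differs between educt and product (0 = absent)."""
    all_bonds = set(educt_bonds) | set(product_bonds)
    return {b for b in all_bonds
            if educt_bonds.get(b, 0) != product_bonds.get(b, 0)}


def center_atoms(educt_atoms, product_atoms, bonds):
    """Atoms whose state changes, plus atoms attached to a center bond."""
    changed = {i for i, state in educt_atoms.items()
               if product_atoms.get(i) != state}
    attached = {i for bond in bonds for i in bond}
    return changed | attached


# Hypothetical mapped example: acetone -> 2-propanol (atoms 1, 4 = CH3).
educt_atoms = {1: AtomState(4, 3, 0, 0), 2: AtomState(4, 0, 1, 0),
               3: AtomState(2, 0, 1, 0), 4: AtomState(4, 3, 0, 0)}
product_atoms = {1: AtomState(4, 3, 0, 0), 2: AtomState(4, 1, 0, 0),
                 3: AtomState(2, 1, 0, 0), 4: AtomState(4, 3, 0, 0)}
educt_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({2, 4}): 1}
product_bonds = {frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({2, 4}): 1}

bonds = center_bonds(educt_bonds, product_bonds)
print(sorted(center_atoms(educt_atoms, product_atoms, bonds)))
```

Only the carbonyl carbon and oxygen are flagged here; the two methyl carbons keep their state and are not attached to a changed bond.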

  38. Reaction Classification - InfoChem: Generation of Classification Data • Hashcodes are calculated for all reaction centers taking into account atom properties • atom type • valence state • total number of bonded hydrogens (implicit plus explicitly drawn) • number of π-electrons • aromaticity • formal charge • reaction center information • The sum of all reaction center hashcodes of all reactants and one product of a reaction provides the unique reaction hashcode: ‘Reaction Classification Code’
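
A minimal sketch of the "hash each reaction center, then sum" idea is shown below. The hash function (truncated SHA-256), the layout of the property tuple, and the 64-bit wrap-around are assumptions chosen for illustration; InfoChem's actual hashcode scheme is not described in this document.

```python
import hashlib


def center_hash(props: tuple) -> int:
    """Stable 64-bit hash of one reaction center's property tuple."""
    digest = hashlib.sha256(repr(props).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def reaction_code(centers) -> int:
    """Sum of the per-center hashes, kept within 64 bits."""
    return sum(center_hash(c) for c in centers) % 2**64


# Hypothetical property tuples for a ketone reduction:
# (atom type, valence, hydrogens, pi electrons, aromatic, charge, side)
ketone_reduction = [
    ("C", 4, 0, 1, False, 0, "educt"),
    ("O", 2, 0, 1, False, 0, "educt"),
    ("C", 4, 1, 0, False, 0, "product"),
    ("O", 2, 1, 0, False, 0, "product"),
]

code = reaction_code(ketone_reduction)
print(f"{code:016x}")
```

Because the code is a sum, it does not depend on the order in which the reaction centers are listed, which is what makes it usable as a database key.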

  39. Reaction Classification - InfoChem: General Rules • Parameters used for the generation of classification codes • inclusion of atoms in the immediate environment (spheres) • reaction centers only (0-sphere = BROAD) • reaction centers + α-atoms (1-sphere = MEDIUM) • reaction centers + α- and β-atoms (2-sphere = NARROW) • inclusion of one sp3-atoms during sphere expansion • atom equivalency • atoms in the same group of the periodic table, with the exception of row-2 elements, are considered equivalent • multiple occurrences of identical transformations are handled as one
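
The three sphere levels amount to a breadth-first expansion from the reaction centers over the molecular graph. The sketch below shows only that expansion, on a hypothetical five-atom chain; the adjacency-list representation and the function name `expand` are assumptions, and the sp3 rule and atom equivalency are not modelled.

```python
LEVELS = {"BROAD": 0, "MEDIUM": 1, "NARROW": 2}


def expand(adjacency, centers, level):
    """Atoms within LEVELS[level] bonds of any reaction center."""
    selected = set(centers)
    frontier = set(centers)
    for _ in range(LEVELS[level]):
        # Neighbours of the current frontier that are not yet included.
        frontier = {n for atom in frontier for n in adjacency[atom]} - selected
        selected |= frontier
    return selected


# Hypothetical linear chain 1-2-3-4-5 with atom 3 as the reaction center.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

for level in LEVELS:
    print(level, sorted(expand(chain, {3}, level)))
```

A narrower level includes more of the environment, so it separates reactions that a broader level would place in the same class.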

  40. Reaction Classification - InfoChem Definition of Classification Levels

  41. Reaction Classification - InfoChem Atom Equivalency Atoms in the same group of the periodic table, with the exception of row-2 elements, are treated as one entity; for example S, Se, Te are considered identical and as a group are different from O.
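
The equivalency rule can be expressed as a lookup that maps each element to a class label before hashing. The table below is a deliberately small sketch covering groups 14 to 17 only, and the label format is an assumption.

```python
# Row-2 element first, then the heavier members of the same group.
_GROUPS = {
    14: ("C", ("Si", "Ge", "Sn", "Pb")),
    15: ("N", ("P", "As", "Sb", "Bi")),
    16: ("O", ("S", "Se", "Te")),
    17: ("F", ("Cl", "Br", "I")),
}


def equivalence_class(element: str) -> str:
    """Class label: heavier group members share one label, others keep their own."""
    for group, (_row2, heavier) in _GROUPS.items():
        if element in heavier:
            return f"group-{group}-heavier"
    return element


print([equivalence_class(e) for e in ("O", "S", "Se", "Te")])
```

With this mapping a thioether oxidation and the analogous selenoether oxidation would receive the same class label at the heteroatom, while the oxygen analogue would not.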

  42. Reaction Classification - InfoChem: Multiple Occurrences of Identical Transformations (ketone to alcohol)

  43. Reaction Classification - InfoChem Reaction Description • Hashcodes for the same reaction type are different for partially described reactions and complete ones. For multi-product reactions the number of hashcodes equals the number of products, each hashcode representing a single product reaction. Stereochemistry is not considered.

  44. Reaction Classification - InfoChem Isotope and Valence Changes • Change from one isotopic form into another involves breaking a bond therefore affecting the hashcode • Valence and charge changes also affect the hashcodes

  45. Reaction Classification as Post-Search Management Tool • Classification codes are data • stored in the database • usable for sorting (clustering). RSS-Search Query: (in red). Result: 156 hits. Clustered by Classification Code “MEDIUM”: 72 clusters. 1. Cluster (20 rxns); 2. Cluster (15 rxns); 3. Cluster (13 rxns); 4. Cluster (8 rxns)
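
Since the classification code is stored with each reaction, clustering a hit list reduces to grouping by that code and ordering the groups by size. A minimal sketch follows; the hit identifiers and codes are hypothetical.

```python
from collections import defaultdict


def cluster_hits(hits):
    """Group (reaction_id, classification_code) pairs; largest cluster first."""
    clusters = defaultdict(list)
    for reaction_id, code in hits:
        clusters[code].append(reaction_id)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical hit list from a reaction substructure search.
hits = [("r1", "A"), ("r2", "B"), ("r3", "A"),
        ("r4", "C"), ("r5", "A"), ("r6", "B")]

for number, members in enumerate(cluster_hits(hits), start=1):
    print(f"{number}. Cluster ({len(members)} rxns): {members}")
```

The chemist can then inspect one representative per cluster instead of paging through every hit.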

  46. Preparation of a Combinatorial Library

  47. Reaction Classification as Querying Tool • Eliminates the problem of drawing effective RSS-queries • Calculates the classification code of drawn reactions on-the-fly • Retrieves all reactions of the same reaction type without the noise of normal RSS searches. Query Form of MDL’s Reaction Browser: Result of ‘Same Transformation’ Search: 23 hits. Examples:

  48. Linking / Integration of Reaction Information: Integrated Major Reference Works (present status). Diagram elements: Primary Journals; Reaction Databases (ISIS/Host) (MDL, Third Party, Proprietary etc.); Tertiary Sources: Major Reference Works (MRWs) (CAC, COFGT, EROS). Links: LitLink (citations); Reaction Classification Codes; iMRW links; Future links

  49. Linking / Integration of Reaction Information: Architecture and Design
     • SERVER: ISIS/Host 3.2 or higher; Web Server iPlanet 4.1 SP5; Sun Solaris 7.0 & 8.0; iMRW InfoChem Web Search System; iMRW RXN DBs, each consisting of searchable electronic text (HTML) + reaction database
     • CLIENT: ISIS/Base 2.3 or higher; Internet Browser (Netscape 4.75, Internet Explorer 5.0 or higher); NT 4.0, Windows 98, Windows 2000; Rxn Browser Add-in; Plug-in

  50. Linking / Integration of Reaction Information • Simulating chemists’ approach of gathering information from various sources (lateral approach) for solving synthetic problems through a simple point-and-click mechanism • Assisting chemists with the synthesis of new compounds by providing complementary information • With examples for synthetic methodologies from reaction databases • From summaries, critically evaluated by experts, describing: reaction mechanisms; principles of stereo-controlled reactions; applications, preparations, and properties of reagents; and other information generally not found in reaction databases • Through one-click linking to the primary literature when combined with LitLink
