Explore the complexities of chemistry mechanisms and basic problems in the Mechem system, understanding the rules and patterns in molecules. Discover insights into chemical reactions and the challenges of automating the process.
Lecture 6: The Mechem System
CSC 599: Computational Scientific Discovery
Overview
• Domain: Chemistry Mechanisms
• Basic Mechem
• Problems with Basic Mechem
• Mechem with TEST-STEP and INFER-STRUCTURES
The Domain
Chemistry is a unique science:
• The “rules” of the science are worked out (we think!)
  • Stoichiometry: conservation of mass/energy/charge
  • Valences of atoms
  • Patterns in molecules
    • Benzene rings
• Complexity due to combinatorics
  • Infinite number of possible organic compounds
  • Many possible chemical mechanisms
Example Mechanism
6CH4 + 3O2 -> 5H2 + CH3CH3 + H2O + CH3OH + CH2O + CO + CO2
• CH4 + M -> CH3M + H
• CH3M -> M + CH3
• O2 + CH3M -> CH3O2M
• 2H -> H2
• CH3 + CH3M -> M + CH3CH3
• H + CH3O2M -> H2O + CH2OM
(Not finished yet!)
Example Mechanism (2)
6CH4 + 3O2 -> 5H2 + CH3CH3 + H2O + CH3OH + CH2O + CO + CO2
• CH3 + CH3O2M -> CH3OH + CH2OM
• CH2OM -> M + CH2O
• CH3O2M -> H + CH2O2M
• CH2O + CH2O2M -> CH3O2M + CHO
• CH2O2M + CHO -> CH3O2M + CO
• CO + CH2O2M -> CH2OM + CO2
(Whew!)
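The stoichiometry rule (atoms are conserved) is easy to check mechanically. Below is a minimal Python sketch, not part of MECHEM: the helper names `parse_formula`, `parse_side` and `is_balanced` are made up for illustration, and the catalyst M is simply treated as one more "element".

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a condensed formula such as 'CH3O2M' (M = catalyst site)."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(digits) if digits else 1
    return counts

def parse_side(side):
    """Sum atom counts over one side of a reaction, e.g. '6CH4 + 3O2'."""
    total = Counter()
    for term in side.split("+"):
        coef, formula = re.fullmatch(r"(\d*)(.+)", term.strip()).groups()
        multiplier = int(coef) if coef else 1
        for element, n in parse_formula(formula).items():
            total[element] += multiplier * n
    return total

def is_balanced(reaction):
    """True when both sides of 'lhs -> rhs' contain the same atoms."""
    left, right = reaction.split("->")
    return parse_side(left) == parse_side(right)

overall = "6CH4 + 3O2 -> 5H2+CH3CH3+H2O+CH3OH+CH2O+CO+CO2"
steps = [
    "CH4 + M -> CH3M + H",
    "H + CH3O2M -> H2O + CH2OM",
    "CO + CH2O2M -> CH2OM + CO2",
]
print(is_balanced(overall), all(is_balanced(s) for s in steps))
```

Balance is only a necessary condition: it says a step is arithmetically legal, not that a chemist would find it plausible.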
Example Mechanism (3)
Let's step back and look at what we did:
• It's “easy” because:
  • Apply rules of chemistry to posit legal steps
  • Assemble legal steps to accomplish overall reaction
• It's “hard” because:
  • Combinatorics of all the reactions that COULD have been done
  • Why not, for example: 2CH3O2M -> CH3O4CH3O (chemists might think it unlikely)
• How would you search the space of mechanisms?
• Is it worth automating?
Basic Mechem: the Big Picture
MECHEM searches the space of reactions
• Exhaustive search
• From simplest to increasingly more complex
Inputs: all reagents, at least one product
Outputs: first (i.e., simplest) mechanism from reagents to product(s)
MECHEM will:
• Search the space of reaction pathways
• Recall: DENDRAL searched the space of chemical structures
Basic MECHEM, Issues
• What is “simplicity” for mechanisms?
  • Number of “species” (atoms, molecules, or radicals) allowed
  • Number of reaction steps
• Space is combinatorially huge!
  • Use rules of chemistry to limit search
  • Each reaction may have at most 2 reactants and 2 products
  • Disallow violations like C2H6O-1
  • Search formulas, not structures
    C2H6O can be either
    Ethyl alcohol: CH3CH2OH
    Dimethyl ether: CH3OCH3
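To see what "search formulas, not structures" buys, here is a small Python sketch (an illustration only; `formula_key` is a hypothetical helper, not a MECHEM routine). It collapses a condensed structure to its molecular formula, so both C2H6O isomers become one search state.

```python
import re
from collections import Counter

def formula_key(structure):
    """Collapse a condensed structure such as 'CH3OCH3' to sorted atom counts."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", structure):
        counts[element] += int(digits) if digits else 1
    return tuple(sorted(counts.items()))

ethyl_alcohol = formula_key("CH3CH2OH")
dimethyl_ether = formula_key("CH3OCH3")

# Both isomers map to the same key, so a formula-level search sees one species.
print(ethyl_alcohol == dimethyl_ether, ethyl_alcohol)
```

The price is that the search can no longer tell the isomers apart.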
MECHEM Algorithm (1)
findPathways(reagents, prods)
{
  // Outer loop: allow one more species (a new unknown product) each pass
  for ( maxSpeciesCount = reagents.count() + prods.count(); ;
        maxSpeciesCount++, prods.addNewVariable() )
  {
    sequenceList.setToEmpty();
    extendNum = ceiling((maxSpeciesCount - reagents.count()) / 2);
    // Inner loop: lengthen the candidate step sequences until one solves the problem
    do
    {
      sequenceList = sequenceList.extendBy(extendNum, reagents, prods);
      if ( sequenceList.hasSolution() )
        return( sequenceList.getSolution() );
      extendNum = 1;
    }
    while ( !sequenceList.isEmpty() );
  }
}
MECHEM Algorithm (2)
sequenceList::extendBy(downCount, reactants, prods)
{
  if (downCount == 0)
    return(this.sequenceList);
  newSequenceList.setToEmpty();
  // Append every candidate step to every sequence; keep only the legal results
  for ( seq in this.sequenceList )
    for ( react in reactants )
      for ( prod in prods )
      {
        newSequence = seq.append(makeStep(react, prod));
        newSequence.inferAndInstantiateVarValues();
        if ( newSequence.getIsLegal() )
          newSequenceList.add(newSequence);
      }
  return(newSequenceList.extendBy(downCount - 1, reactants, prods));
}
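The "simplest first" idea can be tried out in a few lines of Python. This is a deliberately simplified sketch, not MECHEM itself. It assumes a fixed, hand-picked pool of species instead of unknown variables, it only tracks which species are reachable (not how much of each), and all function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations_with_replacement

def atoms(side):
    """Total atom counts for a tuple of formulas such as ("CH4", "M")."""
    total = Counter()
    for formula in side:
        for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
            total[element] += int(digits) if digits else 1
    return total

def balanced_steps(species):
    """Every atom-conserving step with 1-2 reactants and 1-2 products."""
    sides = [s for size in (1, 2)
             for s in combinations_with_replacement(sorted(species), size)]
    return [(lhs, rhs) for lhs in sides for rhs in sides
            if lhs != rhs and atoms(lhs) == atoms(rhs)]

def extend(path, available, remaining, steps, targets):
    """Depth-limited extension of a partial step sequence."""
    if targets <= available:
        return path
    if remaining == 0:
        return None
    for lhs, rhs in steps:
        # A step can fire only if its reactants exist, and it must add something new.
        if set(lhs) <= available and not set(rhs) <= available:
            found = extend(path + [(lhs, rhs)], available | set(rhs),
                           remaining - 1, steps, targets)
            if found is not None:
                return found
    return None

def find_pathway(reagents, targets, species, max_steps):
    """Try 1-step sequences, then 2-step, and so on; return the first hit."""
    steps = balanced_steps(species)
    for depth in range(1, max_steps + 1):
        found = extend([], frozenset(reagents), depth, steps, frozenset(targets))
        if found is not None:
            return found
    return None

# Assumed species pool, taken from the first few steps of the example mechanism.
pool = ["CH4", "M", "CH3M", "H", "CH3", "H2", "C2H6"]
pathway = find_pathway(["CH4", "M"], ["H2", "C2H6"], pool, max_steps=4)
for lhs, rhs in pathway:
    print(" + ".join(lhs), "->", " + ".join(rhs))
```

With this pool the search stops at a single step, 2CH4 -> C2H6 + H2, because that step balances and nothing in the sketch knows it is chemically implausible. A purely arithmetic search needs chemical knowledge layered on top.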
Example: n1(C7H9N) + n2(CH2O) --> n3(C17H18N2) + n4(H2O)
maxSpeciesCount = 6; initMaxReactionSteps = 3
• Initialize reactions as half steps:
  CH2O -> . . .
  C7H9N -> . . .
  2CH2O -> . . .
  2C7H9N -> . . .
  CH2O + C7H9N -> . . .
  (RECALL: 1 and 2 reagent reactions only)
• Reject full step reactions that are illegal
  • Example: Reject C7H9N -> H2O
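The five half steps above are just the ways of picking one or two reagents, with repetition allowed. A minimal Python sketch (the function name `half_steps` is an assumption for illustration):

```python
from itertools import combinations_with_replacement

def half_steps(reagents):
    """All left-hand sides built from one or two reagents (repeats allowed)."""
    return [combo
            for size in (1, 2)
            for combo in combinations_with_replacement(reagents, size)]

for lhs in half_steps(["CH2O", "C7H9N"]):
    print(" + ".join(lhs), "-> . . .")
```

Here `2CH2O` shows up as the pair `CH2O + CH2O`. With k reagents this gives k + k(k+1)/2 left-hand sides, so the count grows quadratically as new species are introduced.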
Example (2): Legal Whole Steps
• CH2O -> H2O + X
• CH2O -> X + Y
• 2(CH2O) -> X
• 2(CH2O) -> H2O + X
• 2(CH2O) -> X + Y
• 2(C7H9N) -> X
• 2(C7H9N) -> X + Y
• CH2O + C7H9N -> X
• CH2O + C7H9N -> H2O + X
• CH2O + C7H9N -> X + Y
• C7H9N -> X + Y
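Why is `CH2O + C7H9N -> H2O + X` legal while `C7H9N -> H2O + X` is missing from the list? One way to see it: work out X by subtracting the known products from the reactants, and reject the step if any atom count goes negative. The Python sketch below does that. `infer_unknown` is a hypothetical helper in the spirit of `inferAndInstantiateVarValues`, not the actual MECHEM routine.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a formula such as 'C7H9N'."""
    counts = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(digits) if digits else 1
    return counts

def infer_unknown(reactants, known_products):
    """Atoms left over for the unknown product X, or None if the step is illegal."""
    remainder = Counter()
    for formula in reactants:
        remainder.update(parse_formula(formula))      # add reactant atoms
    for formula in known_products:
        remainder.subtract(parse_formula(formula))    # remove known product atoms
    if any(n < 0 for n in remainder.values()):
        return None                                   # would need a negative atom count
    return {element: n for element, n in remainder.items() if n > 0}

print(infer_unknown(["CH2O", "C7H9N"], ["H2O"]))   # composition of X
print(infer_unknown(["C7H9N"], ["H2O"]))           # no oxygen available
```

An empty dictionary would mean the known products already use up every atom, so there is nothing left for X.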
Legal 2 Step Reactions
Only 2 reagents, only 2 reactions
• 91 legal two-step reactions (Yikes!)
Here are the 4 that can be extended:
1. CH2O + C7H9N -> H2O + X
   CH2O + X -> H2O + Y
2. CH2O + C7H9N -> H2O + X
   CH2O + X -> Y
3. CH2O + C7H9N -> H2O + X
   2X -> C7H9N + X
4. CH2O + C7H9N -> H2O + X
   2X -> Y
Basic MECHEM analysis
Found 5 solutions, 6 steps each
Another example:
C3H8 (propane) + O2 (oxygen) ->
  C2H4O2 (acetic acid) + C3H6O (acetone) + C3H8O (isopropanol) + C2H4O (acetaldehyde) + C4H8O2 (ethyl acetate) + CH4O (methanol) + CO2 (carbon dioxide)
• 16 pathways, 10 species, 6 steps
• Several hours on a DECstation 3100 (14 MIPS)
Yikes!
What would you do?
10 species, 6 steps --> several hours
That reaction ain't all that big
How would you speed it up?
Other forms of knowledge to reduce search?
(What information did we ignore in Basic Mechem?)
• Search formulas, not structures
BINGO!
Mechem with Structural Help
Idea: Keep track of the structure of molecules and radicals
Heuristics:
• Let N = max. number of (topological) bonds created/destroyed
• By default, for all steps N <= 3
Mechem with Structural Help (2)
Example
• Notation: [(#Cs) (#Hs) (#Os) (#Ms)], M = “metal” (catalyst)
• Reaction has:
  . . . .
  W[1 3 2 1] -> V[0 1 0 0] + Y[1 2 2 1]
  . . . .
  With N = 2: no possible Y
  With N = 3: Y could be 2 things: MCH2OO or MCHOOH
  With N = 4: Y could be 18 things!
• Handles 12 step reactions (hours on a Silicon Graphics Indigo)
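Both pieces of bookkeeping on this slide can be sketched in Python. The composition arithmetic comes straight from the slide's notation. The bond count does not: the structures used below (W as CH3-O-O-M, Y as the MCH2OO candidate with carbon bound to the metal) are assumptions made up for illustration, as are all the function names.

```python
def remaining_composition(whole, fragment):
    """Subtract one [C, H, O, M] composition vector from another."""
    return [w - f for w, f in zip(whole, fragment)]

def bond(a, b):
    """An undirected bond between two labelled atoms."""
    return frozenset((a, b))

def bond_changes(before, after):
    """Number of bonds created plus bonds destroyed (set symmetric difference)."""
    return len(before ^ after)

# Composition check from the slide: W[1 3 2 1] -> V[0 1 0 0] + Y
W = [1, 3, 2, 1]
V = [0, 1, 0, 0]
Y = remaining_composition(W, V)

# ASSUMED structure for W: CH3-O-O-M (atoms labelled so they can be tracked).
w_bonds = {bond("C", "Ha"), bond("C", "Hb"), bond("C", "Hc"),
           bond("C", "Oa"), bond("Oa", "Ob"), bond("Ob", "M")}
# ASSUMED structure for Y: M-CH2-O-O, with Hc released as the free atom V.
y_bonds = {bond("C", "Ha"), bond("C", "Hb"),
           bond("C", "Oa"), bond("Oa", "Ob"), bond("C", "M")}

n = bond_changes(w_bonds, y_bonds)
print(Y, n)
```

Under these assumed structures the step breaks C-Hc and Ob-M and forms C-M, so it would pass an N <= 3 limit but fail N <= 2.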
Other Ideas for Speeding Mechem
• Max # of atoms/element, oxidation state, etc.
• Bi-directional search
• List of “compounds” that should not appear
  • Can do spectroscopy to see if they really do occur, even if transient
• A.I. notion of “easy” ≠ chemist notion of “easy”
  • A.I. researcher: “Minimize number of steps!”
  • Chemists: “Minimize energy of rate determining step!”
  • Compare “cost” of bond-breakings
• Mechem (rxn step proposer) + ChemNet (reaction network)
Take Home Message: About Combinatorial Search
• Finds “all” solutions? Finds “the best” solution?
  Yes, when scientists' notion of “simplest” == CSD researcher's notion of “simplest”
• Takes forever! / only for toy problems
  Use domain knowledge
Domain knowledge usage
• Started simply (initially no structures)
• Used more when needed to solve harder problems (structure knowledge -> solve bigger problems)
• Eventually added so much knowledge that architecture changed (time to redesign algorithm?)
• HOW WOULD YOU DO BETTER THAN MECHEM?