
Knowledge and Data in Computational Biological Discovery
Pat Langley
Institute for the Study of Learning and Expertise, Palo Alto, California
and Center for the Study of Language and Information, Stanford University, Stanford, California
http://www.isle.org/~langley
langley@csli.stanford.edu
Thanks to V. Brooks, S. Klooster, A. Pohorille, C. Potter, K. Saito, M. Schwabacher, J. Shrager, and A. Torregrosa.

Motivations for Computational Discovery
Humans strive to discover new knowledge from experience so that they can:
• better predict and control future events
• understand both previous and future events
• communicate that understanding to others
Computational techniques should let us automate and/or assist this discovery process.
Recent research on computational discovery has made progress on some of these issues but downplayed others.

Three Revolutions
The discovery process has been aided by three major advances:
• The scientific revolution (~1700) introduced formalisms to describe and explain natural phenomena.
• The heuristic search revolution (~1957) introduced computer algorithms to automate problem solving.
• The data revolution (~1995) introduced the collection of large data repositories for many domains.
Different paradigms for computer-aided discovery focus on some developments more than others.

The Data Mining Paradigm
One paradigm, often known as data mining or KDD, can best be characterized as:
• emphasizing the availability of vast amounts of data;
• drawing on computational heuristic search to find regularities in these data;
• using formalisms like decision trees, association rules, and Bayes nets to describe those regularities.
Thus, most KDD researchers favor their own formalisms over those used by scientists and engineers.
As a result, their discoveries are seldom very communicable to members of those communities.

The Scientific Discovery Paradigm
A second paradigm, computational scientific discovery, can be characterized as:
• drawing on computational heuristic search to find regularities in scientific data, either historical or novel;
• using formalisms like numeric laws, structural models, and reaction pathways to describe regularities.
Thus, researchers in this framework favor representations used by scientists and engineers.
As a result, their systems’ discoveries are more communicable to members of those communities.

Time Line for Research on Computational Scientific Discovery
[Figure: time line from 1979 to 2000 of discovery systems, with a legend that groups them by the kind of knowledge discovered: numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDSN, Hume, ARC, DST, GPN, LaGrange, SDS, SSF, RF5, LaGramge, AM, Glauber, NGlauber, IDSQ, Live, RL, Progol, HR, Dendral, Dalton, Stahl, Stahlp, Revolver, Gell-Mann, BR-3, Mendel, Pauli, BR-4, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GPM.]

Successes of Computational Scientific Discovery
Over the past decade, systems of this type have helped discover new knowledge in many scientific fields:
• stellar taxonomies from infrared spectra (Cheeseman et al., 1989)
• qualitative chemical factors in mutagenesis (King et al., 1996)
• quantitative laws of metallic behavior (Sleeman et al., 1997)
• quantitative conjectures in graph theory (Fajtlowicz et al., 1988)
• temporal laws of ecological behavior (Todorovski et al., 2000)
• reaction pathways in catalytic chemistry (Valdes-Perez, 1994, 1997)
Each of these has led to publications in the refereed literature of the relevant scientific field.

Research Themes
We aim to extend previous approaches to computational discovery of communicable knowledge by:
• focusing on domains that involve temporal and spatial data
• generating explanations that involve hidden objects/variables
• drawing on domain knowledge to constrain the search process
• developing interactive discovery tools for use by scientists
As in earlier work, our notation for discovered knowledge will be the same as that used by experts in the domain.
Within these guidelines, we are open to any search algorithm that can produce such communicable knowledge.

Some Interesting Ecological Questions
• What environmental variables determine the production of carbon and the generation of various gases?
• What functional forms relate these predictive variables to the ones they influence?
• How do extreme values of these variables affect the behavior of the ecosystem?
• Are the Earth’s ecosystem parameters constant, or have their values changed in recent years?

The Task of Ecological Modeling
Given: A model of Earth’s ecosystem (CASA) stated as difference equations that involve observable and hidden variables.
Given: Values of observable variables (rainfall, sunlight, NPP) as they change over both time and space.
Given: Inferred values for global parameters and intrinsic properties associated with discrete variables (e.g., ground cover).
Find: A revised ecosystem model with altered equations and/or parametric values that better fits the data.
[Figure: CASA model diagram with nodes NPP, S_LEAF, M_LEAF, M_ROOT, S_ROOT, LEAF_MIC, MIN_N, and SOIL_MIC.]

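The NPPc Portion of CASA
$\mathrm{NPPc} = \sum_{\mathrm{month}} \max(E \cdot \mathrm{IPAR},\ 0)$
$E = 0.56 \cdot T1 \cdot T2 \cdot W$
$T1 = 0.8 + 0.02 \cdot \mathrm{Topt} - 0.0005 \cdot \mathrm{Topt}^{2}$
$T2 = 1.18 / [(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}) \cdot (1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)})]$
$W = 0.5 + 0.5 \cdot \mathrm{EET} / \mathrm{PET}$
$\mathrm{PET} = 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M}$ if $\mathrm{Tempc} > 0$
$\mathrm{PET} = 0$ if $\mathrm{Tempc} < 0$
$A = 0.00000068 \cdot \mathrm{AHI}^{3} - 0.000077 \cdot \mathrm{AHI}^{2} + 0.018 \cdot \mathrm{AHI} + 0.49$
$\mathrm{IPAR} = 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver}$
$\text{FPAR-FAS} = \min[(\text{SR-FAS} - 1.08) / \mathrm{SR}(\text{UMD-VEG}),\ 0.95]$
$\text{SR-FAS} = (\text{Mon-FAS-NDVI} + 1000) / (\text{Mon-FAS-NDVI} - 1000)$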

The NPPc Portion of CASA
[Figure: dependency graph of the NPPc submodel, with nodes NPPc, E, IPAR, e_max, W, T2, T1, SOLAR, FPAR, A, PET, EET, Topt, SR, AHI, PETTWM, Tempc, NDVI, and VEG.]

The RF6 Discovery Algorithm
Saito and Nakano (2000) describe RF6, a discovery system that:
1. Creates a multilayer neural network that links predictive with predicted variables using additive and product units.
2. Invokes the BPQ algorithm to search through the weight space defined by this network.
3. Transforms the resulting network into a polynomial equation of the form $y = \sum_i c_i \prod_j x_j^{d_{ij}}$.
They have shown that this approach can discover an impressive class of numeric equations from noisy data.
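To make the target equation form concrete, here is a minimal sketch of how a sum of product units can be evaluated. It is not RF6 itself: the function names and the example coefficients are hypothetical, and writing a product unit as the exponential of a weighted sum of logarithms is an assumption that requires strictly positive inputs.

```python
import math

def product_unit(x, exponents):
    """One product unit: prod_j x_j ** d_j, written as exp(sum_j d_j * ln x_j).

    Assumption: every x_j is strictly positive, so the logarithm is defined.
    """
    return math.exp(sum(d * math.log(xj) for xj, d in zip(x, exponents)))

def polynomial(x, coeffs, exponent_rows):
    """Additive unit over product units: y = sum_i c_i * prod_j x_j ** d_ij."""
    return sum(c * product_unit(x, row) for c, row in zip(coeffs, exponent_rows))

# Hypothetical example (not from the talk): y = 2 * x1^2 * x2^-1 + 3 * x1^0.5
y = polynomial([4.0, 2.0], coeffs=[2.0, 3.0], exponent_rows=[[2.0, -1.0], [0.5, 0.0]])
print(y)
```

In this form the exponents are ordinary real-valued weights, which is what lets a weight-space search adjust them continuously.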

Improving the NPPc Portion of CASA
This suggests an approach to revising the NPPc model to better fit the observed data:
1. Transform the NPPc model into a multilayer neural network that predicts the same behavior.
2. Identify portions of the NPPc model that are likely candidates for improvement.
3. Run the RF6 algorithm to revise those portions of the model (e.g., specified parameters or equations).
4. Transform the revised multilayer network back into numeric equations using the improved components.

Three Facets of Model Revision
We have adapted RF6 to revise an existing quantitative model in three distinct ways:
• Altering the value of parameters in a specified equation;
• Changing the associated values for an intrinsic property; and
• Replacing the equation for a term with another expression.
Rather than initializing weights randomly, the system starts with weights based on parameters in the original model.
We have applied this strategy to improve three different portions of the NPPc submodel.

Altering Parameters in the NPPc Model
Initial model: $T2 = 1.18 / [(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}) \cdot (1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)})]$
Cross-validated RMSE = 467.910
Behavior: Gaussian-like function of temperature difference.
Revised model: $T2 = 1.80 / [(1 + e^{0.05 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10.8)}) \cdot (1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 90.33)})]$
Cross-validated RMSE = 461.466 [one percent reduction]
Behavior: nearly flat function in the actual range of temperature difference.
Conclusion: The T2 temperature stress term contributes little to the overall predictive ability of the NPPc submodel.
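The two versions of T2 can be compared numerically. The sketch below evaluates both parameter settings from the slide as a function of the temperature difference Topt – Tempc. The parameter names and the range of differences (-20 to 20 degrees) are assumptions chosen for illustration; the actual range in the data is not given here.

```python
import math

def t2(diff, scale, k1, shift1, k2, shift2):
    """T2 stress term as a function of diff = Topt - Tempc."""
    return scale / ((1 + math.exp(k1 * (diff - shift1))) *
                    (1 + math.exp(k2 * (-diff - shift2))))

# Parameter values copied from the initial and revised equations.
INITIAL = dict(scale=1.18, k1=0.2, shift1=10.0, k2=0.3, shift2=10.0)
REVISED = dict(scale=1.80, k1=0.05, shift1=10.8, k2=0.3, shift2=90.33)

def spread(params, diffs):
    """Ratio of largest to smallest T2 value over the given differences."""
    values = [t2(d, **params) for d in diffs]
    return max(values) / min(values)

DIFFS = range(-20, 21, 5)  # assumed range of Topt - Tempc, for illustration only
for d in DIFFS:
    print(f"{d:4d}  initial={t2(d, **INITIAL):.3f}  revised={t2(d, **REVISED):.3f}")
print("spread initial:", round(spread(INITIAL, DIFFS), 2))
print("spread revised:", round(spread(REVISED, DIFFS), 2))
```

Over this assumed range the revised term varies by a much smaller factor than the initial one, which is one way to read the "nearly flat" description.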

Revising Intrinsic Values in the Model
The NPPc submodel includes one intrinsic property, SR, associated with the variable for vegetation type, UMD-VEG.
The corresponding RF6 network includes one hidden node for SR and one dummy input variable for each vegetation type.

Veg type   A     B     C     D     E     F     G     H     I     J     K
Initial    3.06  4.35  4.35  4.05  5.09  3.06  4.05  4.05  4.05  5.09  4.05
Revised    2.57  4.77  2.20  3.99  3.70  3.46  2.34  0.34  2.72  3.46  1.60

RMSE = 467.910 for the original model; RMSE = 448.376 for the revised model, an improvement of four percent.
Observation: Nearly all intrinsic values are lower in the revised model.
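One way to picture the dummy-variable encoding: if each vegetation type is a one-hot input, the weights on the SR node are exactly the intrinsic values, so revising the weights revises the table. The sketch below assumes this linear one-hot reading; the function names are hypothetical, and the numbers are the eleven initial and revised values from the slide.

```python
VEG_TYPES = "ABCDEFGHIJK"
INITIAL_SR = [3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05]
REVISED_SR = [2.57, 4.77, 2.20, 3.99, 3.70, 3.46, 2.34, 0.34, 2.72, 3.46, 1.60]

def one_hot(veg_type):
    """Dummy input variables: 1.0 for the matching vegetation type, else 0.0."""
    return [1.0 if v == veg_type else 0.0 for v in VEG_TYPES]

def sr_node(veg_type, weights):
    """SR as a weighted sum of dummy inputs; it selects the weight for veg_type."""
    return sum(w * x for w, x in zip(weights, one_hot(veg_type)))

# Which vegetation types received a lower SR value after revision?
lowered = [v for v, old, new in zip(VEG_TYPES, INITIAL_SR, REVISED_SR) if new < old]
print(sr_node("H", INITIAL_SR), sr_node("H", REVISED_SR))
print(len(lowered), "of", len(VEG_TYPES), "values are lower:", "".join(lowered))
```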

Revising Equations in the NPPc Model
Initial model: $E = 0.56 \cdot T1 \cdot T2 \cdot W$
Cross-validated RMSE = 467.910
Behavior: Each stress term decreases the photosynthetic efficiency E.
Revised model: $E = 0.521 \cdot T1^{0.00} \cdot T2^{0.03} \cdot W^{0.00}$
Cross-validated RMSE = 446.270 [five percent reduction]
Behavior: T1 and W have no effect on E, and T2 has only a minor effect.
Conclusion: The stress terms are not useful to the NPPc model, most likely because of recent improvements in NDVI measures.
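The rounded reductions quoted for the three revisions can be recomputed from the reported RMSE values. This is only arithmetic on the numbers in the slides; the dictionary keys are informal labels, not names used by the system.

```python
BASELINE_RMSE = 467.910  # cross-validated RMSE of the original NPPc model

REVISED_RMSE = {
    "T2 parameters": 461.466,
    "SR intrinsic values": 448.376,
    "E equation": 446.270,
}

def percent_reduction(before, after):
    """Relative RMSE reduction, in percent of the baseline error."""
    return 100.0 * (before - after) / before

for label, rmse in REVISED_RMSE.items():
    print(f"{label}: {percent_reduction(BASELINE_RMSE, rmse):.1f}% reduction")
```

The unrounded figures come out at roughly 1.4, 4.2, and 4.6 percent, consistent with the one, four, and five percent quoted.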

Future Work on Ecological Modeling
• Apply the revision method to other parts of the NPPc submodel and other static parts of the CASA model.
• Extend the revision method to improve parts of CASA that involve difference equations.
• Develop software for visualizing both spatial and temporal anomalies, as well as relating them to the model.
• Implement an interactive system that lets scientists direct high-level search for improved ecosystem models.

Visualizing Errors in the Model
We can easily plot an improved model’s errors in spatial terms.
Such displays can help suggest causes for prediction errors and thus ways to further improve the model.

Some Interesting Biological Questions
• How do organisms acclimate to increased temperature or ultraviolet radiation?
• Why do we observe bleaching of plant cells under high light conditions?
• What differences in biological processes exist between a mutant organism and the original?
• What are the effects on an organism’s biological processes when one of its important genes is removed?

Modeling Results in Microarray Experiments
Given: Qualitative knowledge about an organism’s reactions and regulations for some environmental setting.
Given: A mutated organism with different macroscopic behavior in that environmental setting.
Given: Observed expression levels, over time, of the mutant’s enzymes in the setting.
Find: A revised model with altered reactions and regulations that explains the expression levels.

Modeling Microarray Results on Photosynthesis
Given: Qualitative knowledge about reactions and regulations for Cyanobacteria in a high ultraviolet situation.
Given: A mutated strain of Cyanobacteria that does not bleach when exposed to high ultraviolet light.
Given: Observed expression levels, over time, of the mutant’s enzymes in the presence of high ultraviolet light.
Find: A revised model with altered reactions and regulations that explains the expression levels and the failure to bleach.

A Model of Photosynthesis Regulation
Why do plants modify their photosynthetic apparatus in high light?
[Figure: regulatory network. Labels: HL, -N, -S, -P, -Cl, Blue/UV-A Photoreceptor, RR, nblS, nblR, nblA, nblB, cpcX, hliA, psbx ..., Degradation of psaF, psaA, psaB, Modification of Photosynthesis, Survival in High Light.]

Collecting Data on Photosynthetic Processes
[Figure: experimental protocol along a time axis. Labels: Continuous Culture (Chemostat), Health of Culture, Equilibrium Period, Stress (e.g., High Light), Adaptation Period, Sampling mRNA/cDNA, MicroArray Trace. Image sources: www.affymetrix.com/, /wwwscience.murdoch.edu.au/teach]

Microarray Data on Photosynthetic Regulation

Six Steps in Revising Regulation Models
Our approach to revising an existing model involves six steps:
1. Generate candidate models with a single process removed.
2. Predict qualitative correlations between enzymes for each model.
3. Calculate the observed correlations between enzymes over time.
4. Measure the percentage of correct predictions for each model.
5. Select the revised model with the highest predictive accuracy.
6. Repeat this strategy until no revision leads to improvement.
Thus, our system carries out heuristic search through the space of models, guided by candidates’ abilities to explain the data.
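The six steps amount to greedy hill climbing over models. The sketch below is one possible reading under stated assumptions, not the system described in the talk: a model is a set of regulatory links (pairs of gene names), two genes are predicted to be correlated ("+") when some chain of links connects them and uncorrelated ("x") otherwise, and the small link set is a simplified stand-in for the full regulation model. The observed signs follow the pattern reported for these gene pairs.

```python
def make_find(links):
    """Union-find over gene names; returns a function mapping a gene to its group."""
    parent = {}

    def find(g):
        parent.setdefault(g, g)
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in links:
        parent[find(a)] = find(b)
    return find

def accuracy(links, observed):
    """Steps 2-4: fraction of gene pairs whose predicted sign matches the data."""
    find = make_find(links)
    hits = sum(("+" if find(a) == find(b) else "x") == sign
               for (a, b), sign in observed.items())
    return hits / len(observed)

def revise(links, observed):
    """Steps 1, 5, 6: repeatedly drop the single link that most improves accuracy."""
    current, best = frozenset(links), accuracy(links, observed)
    while current:
        candidates = [current - {link} for link in current]
        top = max(candidates, key=lambda m: accuracy(m, observed))
        top_score = accuracy(top, observed)
        if top_score <= best:  # no single removal improves the fit
            break
        current, best = top, top_score
    return current, best

# Hypothetical, simplified link set and observed correlation signs.
MODEL = {("nblS", "nblR"), ("nblR", "nblA"), ("nblA", "psaF")}
OBSERVED = {("nblS", "nblR"): "+", ("nblS", "nblA"): "x", ("nblS", "psaF"): "x",
            ("nblR", "nblA"): "x", ("nblR", "psaF"): "x", ("nblA", "psaF"): "+"}

revised, score = revise(MODEL, OBSERVED)
print(sorted(revised), score)
```

Here the search removes the link from nblR to nblA, after which every listed pair is predicted correctly and no further removal helps.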

Heuristic Search Through a Space of Models
[Figure: search tree rooted at the initial model, with candidate Revisions 1.1–1.4 at the first level, Revisions 2.1–2.4 at the second, and Revisions 3.1–3.4 at the third.]

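A Revised Model of Photosynthesis Regulation
The mutant is NblR deficient, so it does not down-regulate NblA/B.
[Figure: the regulatory network with the same labels as the original model (HL, -N, -S, -P, -Cl, Blue/UV-A Photoreceptor, RR, nblS, nblR, nblA, nblB, cpcX, hliA, psbx ..., Degradation of psaF, psaA, psaB, Modification of Photosynthesis, Survival in High Light), plus an X marked among nblB, nblA, and nblR.]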

Observed and Predicted Correlations

Pair         Observed   Original   Revised
nblS,nblR    +          +          +
nblS,nblA    ×          +          ×
nblS,nblB    ×          +          ×
nblS,psaF    ×          +          ×
nblS,psaA    ×          +          ×
nblS,psaB    ×          +          ×
nblR,nblA    ×          +          ×
nblR,nblB    ×          +          ×
nblR,psaF    ×          +          ×
nblR,psaA    ×          +          ×
nblR,psaB    ×          +          ×
nblA,psaF    +          +          +
nblA,psaA    +          +          +
nblA,psaB    +          +          +
nblA,psaF    +          +          +
nblA,psaA    +          +          +
• • •

Future Work on Biological Modeling
• Add more knowledge about photosynthetic pathways and use it to interpret additional microarray data.
• Incorporate the ability to introduce new regulation influences in addition to removing existing ones.
• Expand the modeling formalism to include abstract processes like signal transduction and allosteric modulation.
• Implement an interactive system that lets scientists direct high-level search for improved biological process models.

Concluding Remarks
In summary, unlike work in the data mining paradigm, our research on computational discovery:
• attempts to move beyond description and prediction to both explanation and understanding;
• uses domain knowledge to initialize search and to characterize differences from the revised model;
• presents the new knowledge in some communicable notation that is familiar to domain experts.
Such techniques will improve the way we manipulate, utilize, and understand complex scientific and engineering data.

Improving the Prediction of NDVI
The Normalized Difference Vegetation Index (NDVI) is a central part of CASA that is measured by satellite sensors.
Unfortunately, NDVI is only available for the years since 1983, when satellites with these sensors were launched.
Potter and Brooks (1998) report a predictive model of NDVI that is a piecewise linear function of temperature, rainfall, and moisture.
We hoped to improve this model using Cubist, which induces a set of regression rules from continuous data.

Form of the CASA NPPc Data
[Figure: data layout. For each grid cell, from Grid 1,1 to Grid 360,360, and each month from January to December, the data record NPPc, Temp, Topt, EET, PET, NDVI, AHI, and Veg.]

An Improved Piecewise Linear Model
Cubist produced a revised NDVI model with five piecewise linear components rather than two, all based on rainfall.
This model explains 88% of the variance, compared with 74% of the variance for the Potter and Brooks model.

Visualizing the Improved Model
One way to visualize the model involves plotting rules spatially.
Our Earth science collaborators found this useful, as regions often correspond to recognizable ecological zones.

The Task of Metabolic Modeling
Given: Knowledge about the metabolism of an organism stated as biochemical reactions.
Given: Observed environmental situations and expression levels of enzymes from microarrays.
Find: A complete metabolic model that explains the observed expression levels.
[Figure: example pathway fragment with labels Acetoacetyl-CoA, Acetyl-CoA, EC4.1.3.5, EC4.1.3.4, Intermediate, EC2.8.3.5, and Acetoacetate.]

Six Steps in Metabolic Model Revision
Our general approach to metabolic modeling involves six steps:
1. Represent biochemical reactions known for the organism.
2. Find complete metabolic pathways through heuristic search.
3. Order metabolic pathways using matches to microarray data.
4. Simulate natural or experimental knockouts of genes/enzymes.
5. Propose bridging reactions that explain the observed behavior.
6. Order reactions using reaction analogy and DNA sequences.
We will illustrate these steps with an example from glycolysis and the TCA cycle.

Step 1. Represent Biochemical Reactions

CYTOSOLIC:glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP
CYTOSOLIC:1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
MITOCHONDRIAL:isocitrate + NAD+ ---[Isocitrate dehydrogenase]--> a-ketoglutarate + NADH + H+ + CO2
MITOCHONDRIAL:succinyl CoA + GDP + phosphate ---[Succinyl CoA synthase]--> succinate + GTP + CoA
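The reaction notation above is regular enough to parse into a small data structure. The following sketch is an assumption about how one might do that in Python; the `Reaction` type and `parse` function are hypothetical names and are not part of the system described in the talk.

```python
import re
from typing import NamedTuple, Tuple

class Reaction(NamedTuple):
    compartment: str            # e.g. "CYTOSOLIC"; empty when not given
    substrates: Tuple[str, ...]
    enzyme: str                 # text between the square brackets
    products: Tuple[str, ...]

# Optional "COMPARTMENT:" prefix, substrates, "---[enzyme]-->", products.
PATTERN = re.compile(r"^(?:(\w+):)?(.+?)\s*---\[(.*?)\]-->\s*(.+)$")

def split_side(side):
    """Split on ' + ' (with spaces) so that names like NAD+ and H+ stay intact."""
    return tuple(name.strip() for name in side.split(" + "))

def parse(line):
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a reaction: {line!r}")
    compartment, left, enzyme, right = match.groups()
    return Reaction(compartment or "", split_side(left), enzyme, split_side(right))

r = parse("CYTOSOLIC:glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP")
print(r.compartment, r.substrates, r.enzyme, r.products)
```

Keeping substrates and products as tuples of names makes later steps, such as checking which reactions can fire, simple set operations.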

Step 2. Find Pathways by Heuristic Search
Target = Malate
Solution for Fructose environment:
fructose ---[Fructokinase]--> fructose 1-phosphate
fructose 1-phosphate ---[Fructose 1-phosphate aldolase]--> glyceraldehyde + dihydroxyacetone phosphate
dihydroxyacetone phosphate ---[Isomerase]--> glyceraldehyde 3-phosphate
phosphate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> 1,3-bisphosphoglycerate
1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
3-phosphoglycerate ---[Phosphoglyceromutase]--> 2-phosphoglycerate
2-phosphoglycerate ---[Enolase]--> phosphoenolpyruvate + H2O
phosphoenolpyruvate + ATP ---[Pyruvate kinase]--> pyruvate + ADP
malate + NAD+ ---[Malate dehydrogenase]--> oxaloacetate + NADH + H+
pyruvate + NAD+ + CoA ---[NIL]--> NADH + H+ + CO2 + acetyl CoA
acetyl CoA + oxaloacetate ---[Citrate synthase]--> citrate + CoA
citrate ---[Aconitase]--> isocitrate
isocitrate + NAD+ ---[Isocitrate dehydrogenase]--> a-ketoglutarate + NADH + H+ + CO2
a-ketoglutarate + NAD+ + CoA ---[a-ketoglutarate dehydrogenase complex]--> succinyl CoA + NADH + H+ + CO2
succinyl CoA + GDP + phosphate ---[Succinyl CoA synthase]--> succinate + GTP + CoA
succinate + FAD ---[Succinate dehydrogenase]--> fumarate + FADH2
fumarate + H2O ---[Fumarase]--> malate
Solution for Glucose environment:
glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP
glucose 6-phosphate ---[Phosphoglucomutase]--> fructose 6-phosphate
fructose 6-phosphate + ATP ---[Phosphofructokinase]--> fructose 1,6 bisphosphate + ADP
fructose 1,6 bisphosphate ---[Aldolase]--> dihydroxyacetone phosphate + glyceraldehyde 3-phosphate
phosphate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> 1,3-bisphosphoglycerate
1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
[...same as above from this point onward...]

Step 3. Order Pathways by Likelihood Given Data
[Figure: microarray image. Source: www.affymetrix.com/]
fructose ---[Fructokinase]--> fructose 1-phosphate
fructose 1-phosphate ---[Fructose 1-phosphate aldolase]--> glyceraldehyde + dihydroxyacetone phosphate
dihydroxyacetone phosphate ---[Isomerase]--> glyceraldehyde 3-phosphate
phosphate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> NADH + H+ + 1,3-bisphosphoglycerate
1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
3-phosphoglycerate ---[Phosphoglyceromutase]--> 2-phosphoglycerate
2-phosphoglycerate ---[Enolase]--> phosphoenolpyruvate + H2O
phosphoenolpyruvate + ATP ---[Pyruvate kinase]--> pyruvate + ADP
malate + NAD+ ---[Malate dehydrogenase]--> oxaloacetate + NADH + H+
pyruvate + NAD+ + CoA ---[NIL]--> NADH + H+ + CO2 + acetyl CoA
acetyl CoA + oxaloacetate ---[Citrate synthase]--> citrate + CoA
citrate ---[Aconitase]--> isocitrate
isocitrate + NAD+ ---[Isocitrate dehydrogenase]--> a-ketoglutarate + NADH + H+ + CO2
a-ketoglutarate + NAD+ + CoA ---[a-ketoglutarate dehydrogenase complex]--> succinyl CoA + NADH + H+ + CO2
succinyl CoA + GDP + phosphate ---[Succinyl CoA synthase]--> succinate + GTP + CoA
succinate + FAD ---[Succinate dehydrogenase]--> fumarate + FADH2
fumarate + H2O ---[Fumarase]--> malate

Step 4. Simulate Natural or Experimental Knockouts
glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP
glucose 6-phosphate ---[Phosphoglucomutase]--> fructose 6-phosphate
fructose 6-phosphate + ATP ---[Phosphofructokinase]--> fructose 1,6 bisphosphate + ADP
fructose 1,6 bisphosphate ---[Aldolase]--> dihydroxyacetone phosphate + glyceraldehyde 3-phosphate
phosphate + NAD+ + glyceraldehyde 3-phosphate ---[Triose phosphate dehydrogenase]--> 1,3-bisphosphoglycerate
1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
3-phosphoglycerate ---[Phosphoglyceromutase]--> 2-phosphoglycerate
2-phosphoglycerate ---[Enolase]--> phosphoenolpyruvate + H2O
phosphoenolpyruvate + ATP ---[Pyruvate kinase]--> pyruvate + ADP
malate + NAD+ ---[Malate dehydrogenase]--> oxaloacetate + NADH + H+
pyruvate + NAD+ + CoA ---[NIL]--> NADH + H+ + CO2 + acetyl CoA
acetyl CoA + oxaloacetate ---[Citrate synthase]--> citrate + CoA
citrate ---[Aconitase]--> isocitrate
isocitrate + NAD+ ---[Isocitrate dehydrogenase]--> a-ketoglutarate + NADH + H+ + CO2
a-ketoglutarate + NAD+ + CoA ---[a-ketoglutarate dehydrogenase complex]--> succinyl CoA + NADH + H+ + CO2
succinyl CoA + GDP + phosphate ---[Succinyl CoA synthase]--> succinate + GTP + CoA
succinate + FAD ---[Succinate dehydrogenase]--> fumarate + FADH2
fumarate + H2O ---[Fumarase]--> malate
Knockout: 1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
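Pathway finding and knockout simulation can be illustrated together on the first steps of the glucose pathway. The sketch below uses plain forward chaining rather than the heuristic search of the talk, which is a deliberate simplification. The choice of starting metabolites (glucose, ATP, phosphate, NAD+) is an assumption, and `producible` is a hypothetical helper name.

```python
# (enzyme, substrates, products) for the glucose pathway up to phosphoenolpyruvate.
REACTIONS = [
    ("Hexokinase", ["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]),
    ("Phosphoglucomutase", ["glucose 6-phosphate"], ["fructose 6-phosphate"]),
    ("Phosphofructokinase", ["fructose 6-phosphate", "ATP"],
     ["fructose 1,6 bisphosphate", "ADP"]),
    ("Aldolase", ["fructose 1,6 bisphosphate"],
     ["dihydroxyacetone phosphate", "glyceraldehyde 3-phosphate"]),
    ("Triose phosphate dehydrogenase",
     ["phosphate", "NAD+", "glyceraldehyde 3-phosphate"], ["1,3-bisphosphoglycerate"]),
    ("Phosphoglycerate kinase", ["1,3-bisphosphoglycerate", "ADP"],
     ["3-phosphoglycerate", "ATP"]),
    ("Phosphoglyceromutase", ["3-phosphoglycerate"], ["2-phosphoglycerate"]),
    ("Enolase", ["2-phosphoglycerate"], ["phosphoenolpyruvate", "H2O"]),
]

def producible(reactions, available, knocked_out=frozenset()):
    """Forward-chain: fire every enabled reaction until nothing new is produced."""
    known = set(available)
    changed = True
    while changed:
        changed = False
        for enzyme, substrates, products in reactions:
            if enzyme in knocked_out:
                continue  # a knocked-out enzyme cannot catalyze its reaction
            if set(substrates) <= known and not set(products) <= known:
                known |= set(products)
                changed = True
    return known

SEED = {"glucose", "ATP", "phosphate", "NAD+"}  # assumed available in the environment
TARGET = "phosphoenolpyruvate"

wild_type = producible(REACTIONS, SEED)
mutant = producible(REACTIONS, SEED, knocked_out={"Phosphoglycerate kinase"})
print("wild type reaches target:", TARGET in wild_type)
print("knockout reaches target:", TARGET in mutant)
```

With phosphoglycerate kinase removed, the chain stalls at 1,3-bisphosphoglycerate, which is the gap a bridging reaction would have to close.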

Step 5. Propose Bridging Reactions
Abstract Chemical Knowledge + Constrained Search
glucose + ATP ---[Hexokinase]--> glucose 6-phosphate + ADP
Abstract balance:
• glucose: 6 carbons, 0 phosphates
• ATP: 3 phosphates
• glucose 6-phosphate: 6 carbons, 1 phosphate
• ADP: 2 phosphates
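The abstract balance for the hexokinase reaction can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: only the carbon and phosphate counts shown on the slide are tracked, the carbon content of ATP and ADP is left out on the assumption that it cancels, and the table and function names are hypothetical.

```python
# Abstract composition of each species, as listed on the slide.
# Assumption: the carbons of ATP and ADP are omitted because they cancel.
ABSTRACT = {
    "glucose": {"carbons": 6, "phosphates": 0},
    "glucose 6-phosphate": {"carbons": 6, "phosphates": 1},
    "ATP": {"phosphates": 3},
    "ADP": {"phosphates": 2},
}
FEATURES = ("carbons", "phosphates")

def totals(species):
    """Sum each abstract feature over one side of a reaction."""
    return tuple(sum(ABSTRACT[s].get(f, 0) for s in species) for f in FEATURES)

def is_balanced(substrates, products):
    """A candidate reaction is admissible only if both sides have equal totals."""
    return totals(substrates) == totals(products)

print(is_balanced(["glucose", "ATP"], ["glucose 6-phosphate", "ADP"]))
print(is_balanced(["glucose"], ["glucose 6-phosphate"]))
```

A check of this kind prunes the search: any candidate bridging reaction that gains or loses carbons or phosphates is rejected before it is ranked.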

Step 5. Propose Bridging Reactions
Knockout: 1,3-bisphosphoglycerate + ADP ---[Phosphoglycerate kinase]--> 3-phosphoglycerate + ATP
25 plausible (single) “bridging” reactions are proposed:
<CYTOSOLIC:glyceraldehyde 3-phosphate ---[]--> 3-phosphoglycerate>
<CYTOSOLIC:dihydroxyacetone phosphate ---[]--> 3-phosphoglycerate>
<CYTOSOLIC:fructose 1,6 bisphosphate ---[]--> phosphoenolpyruvate + 3-phosphoglycerate>
<CYTOSOLIC:fructose 1,6 bisphosphate ---[]--> 2-phosphoglycerate + 3-phosphoglycerate>
<CYTOSOLIC:fructose 1,6 bisphosphate ---[]--> 3-phosphoglycerate + 3-phosphoglycerate>
<CYTOSOLIC:ATP + fructose 1,6 bisphosphate ---[]--> ADP + 1,3-bisphosphoglycerate + 3-phosphoglycerate>
<CYTOSOLIC:fructose 1,6 bisphosphate ---[]--> glyceraldehyde 3-phosphate + 3-phosphoglycerate>
<CYTOSOLIC:fructose 1,6 bisphosphate ---[]--> dihydroxyacetone phosphate + 3-phosphoglycerate>
<CYTOSOLIC:ADP + fructose 1,6 bisphosphate ---[]--> ATP + CO2 + acetyl + 3-phosphoglycerate>
<CYTOSOLIC:ADP + 1,3-bisphosphoglycerate ---[]--> ATP + 3-phosphoglycerate>
<CYTOSOLIC:ADP + fructose 1,6 bisphosphate ---[]--> ATP + pyruvate + 3-phosphoglycerate>
<CYTOSOLIC:ADP + fructose 1,6 bisphosphate ---[]--> ATP + glycerate + 3-phosphoglycerate>
<CYTOSOLIC:ADP + fructose 1,6 bisphosphate ---[]--> ATP + glyceraldehyde + 3-phosphoglycerate>
<CYTOSOLIC:ADP + fructose 1,6 bisphosphate ---[]--> ATP + dihydroxyacetone + 3-phosphoglycerate>
<CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + phosphoenolpyruvate + 3-phosphoglycerate>
<CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + 2-phosphoglycerate + 3-phosphoglycerate>
<CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + 3-phosphoglycerate + 3-phosphoglycerate>
<CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + glyceraldehyde 3-phosphate + 3-phosphoglycerate>
<CYTOSOLIC:ATP + glucose 6-phosphate ---[]--> ADP + dihydroxyacetone phosphate + 3-phosphoglycerate>
<CYTOSOLIC:glucose 6-phosphate ---[]--> CO2 + acetyl + 3-phosphoglycerate>
<CYTOSOLIC:glucose 6-phosphate ---[]--> pyruvate + 3-phosphoglycerate>
<CYTOSOLIC:glucose 6-phosphate ---[]--> glycerate + 3-phosphoglycerate>
<CYTOSOLIC:glucose 6-phosphate ---[]--> glyceraldehyde + 3-phosphoglycerate>
<CYTOSOLIC:glucose 6-phosphate ---[]--> dihydroxyacetone + 3-phosphoglycerate>
<CYTOSOLIC:glucose + ATP ---[]--> 1,3-bisphosphoglycerate + 3-phosphoglycerate>

Step 6. Order Bridging Reactions by Likelihood
[Figure: homology of hexokinase across species. Source: www.bio.davidson.edu/Biology]
We also measure similarity in structure between each bridging reaction and the knocked-out reaction.
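The talk does not say how structural similarity between reactions is measured. As a stand-in, the sketch below ranks a few of the proposed bridging reactions by the average Jaccard overlap of their substrate and product sets with those of the knocked-out reaction. The measure and the function names are assumptions made for illustration only.

```python
def jaccard(a, b):
    """Overlap of two collections of species names: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def reaction_similarity(r1, r2):
    """Average of substrate-side and product-side overlap (an assumed measure)."""
    return 0.5 * (jaccard(r1[0], r2[0]) + jaccard(r1[1], r2[1]))

# Knocked-out reaction: (substrates, products).
KNOCKOUT = (["1,3-bisphosphoglycerate", "ADP"], ["3-phosphoglycerate", "ATP"])

# Three of the proposed bridging reactions.
CANDIDATES = [
    (["glyceraldehyde 3-phosphate"], ["3-phosphoglycerate"]),
    (["ADP", "1,3-bisphosphoglycerate"], ["ATP", "3-phosphoglycerate"]),
    (["glucose", "ATP"], ["1,3-bisphosphoglycerate", "3-phosphoglycerate"]),
]

ranked = sorted(CANDIDATES, key=lambda c: reaction_similarity(c, KNOCKOUT), reverse=True)
for substrates, products in ranked:
    score = reaction_similarity((substrates, products), KNOCKOUT)
    print(f"{score:.2f}  {' + '.join(substrates)} --> {' + '.join(products)}")
```

Under this measure the candidate that restates the knocked-out reaction ranks first, and candidates sharing fewer species fall lower in the ordering.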

