Activist Data Mining (as Applied to Carbon:Nitrogen Sensing in Plants) Dennis Shasha. New York University. Courant Institute of Math & Computer Sciences. Department of Biology. Gloria Coruzzi Mike Chou Andrew Kouranov Laurence Lejay. Dennis Shasha. Bud Mishra
Activist Data Mining (as Applied to Carbon:Nitrogen Sensing in Plants) Dennis Shasha
New York University, Courant Institute of Math & Computer Sciences and Department of Biology: Gloria Coruzzi, Mike Chou, Andrew Kouranov, Laurence Lejay, Dennis Shasha, Bud Mishra, Marco Antoinotti, Marc Rejali
[Figure: nitrogen assimilation pathway diagram. Labels: Light, Photosynthesis, Sugar, NH4+, Glu, Gln, Asp, Asn, Amino Acids]
Light, Carbon and Amino acids differentially regulate N-assimilation genes
[Figure: Carbon, Light, and Amino acids as inputs to the genes GS2 and AS1. Labels: Asn (C4:N2), Gln (C5:N2), C:N]
Goal: Figure out the Circuit for many genes
• A multi-factor approach to C:N sensing in plants.
• Identify how combinations of interacting “inputs” (Light, Carbon, & Nitrogen) affect gene regulation, using Combinatorial Design and Genome Chip analysis.
• Identify Arabidopsis mutants defective in C:N sensing:
  Forward genetics: selections for C:N sensing mutants
  Reverse genetics: mutants in candidate C:N signaling genes
• Ultimate Goal: Virtual plant… (frankenfoods)
A Combinatorial Approach to discovering interactions
Inputs:
• Light
• Starvation to Various Nutrients
• Carbon
• Inorganic N (NO3/NH4)
• Organic N (Glu)
• Organic N (Gln)
If the inputs take binary values (a first approximation), 6 binary (+/-) inputs give 2^6 = 64 input combinations (or treatments). Use combinatorial design to reduce the number of treatment combinations required to cover the experimental space effectively.
ACTIVIST DATA MINING
Don’t study the experiments (only). Change them.
Combinatorial design generates a subset of the 64 treatments that gives a “good” approximation of the entire experimental space. For every pair of “inputs”, all four combinations of their binary values are tested. Example: NO3 and Carbon have four possible combinations: +NO3 +Carbon; +NO3 -Carbon; -NO3 +Carbon; -NO3 -Carbon. Each of these pairwise combinations is present in at least one treatment of the experiments chosen by combinatorial design.
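The pairwise-coverage idea on this slide can be sketched in a few lines of Python. This is a minimal illustration that uses a simple greedy heuristic, not the algorithm the authors used; the function names are hypothetical, and the six inputs are represented only by their positions (0 to 5) with values 0 (-) and 1 (+).

```python
from itertools import combinations, product

def pairs_of(treatment):
    """All (input i, input j, value i, value j) settings one treatment exercises."""
    return {(i, j, treatment[i], treatment[j])
            for i, j in combinations(range(len(treatment)), 2)}

def required_pairs(n_inputs, levels=(0, 1)):
    """Every pairwise setting that the design must cover at least once."""
    return {(i, j, vi, vj)
            for i, j in combinations(range(n_inputs), 2)
            for vi, vj in product(levels, repeat=2)}

def greedy_pairwise_design(n_inputs, levels=(0, 1)):
    """Greedy heuristic: repeatedly add the treatment covering the most uncovered pairs."""
    remaining = required_pairs(n_inputs, levels)
    candidates = list(product(levels, repeat=n_inputs))  # the full factorial
    design = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_of(t) & remaining))
        design.append(best)
        remaining -= pairs_of(best)
    return design

def covers_all_pairs(design, n_inputs, levels=(0, 1)):
    """Check that every pair of inputs sees every combination of values."""
    covered = set()
    for treatment in design:
        covered |= pairs_of(treatment)
    return required_pairs(n_inputs, levels) <= covered

design = greedy_pairwise_design(6)
print(len(design), "treatments instead of", 2 ** 6)
print(covers_all_pairs(design, 6))
```

The greedy choice does not guarantee the smallest possible design, but the coverage check confirms that whatever subset it returns satisfies the pairwise property described above.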
“Combinatorial design” predicts 12 conditions to test the effect of Light in all combinations of Starvation, Carbon, and Nitrogen
“Pivot” analysis of gene expression data from C:N treatments
Find “minimal pairs” of treatments that are identical except in one input (e.g. Light) to measure that input’s effect on a dependent variable (a gene, e.g. AS1). Analyze a series of minimal pairs using one input (e.g. Light) as a “pivot” to determine its effect on the dependent variable under a variety of carbon and nitrogen combinations. If the effect is consistent across the pairs, it is likely to hold in general.
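A minimal sketch of the pivot analysis, assuming each treatment is recorded as a dictionary of binary input levels together with one expression reading. The helper names and all expression values below are made up for illustration; they are not measurements from the study.

```python
def minimal_pairs(treatments, pivot):
    """Group treatments by context (all inputs except the pivot) and keep the
    contexts measured at both pivot levels. Returns {context: (low, high)}."""
    by_context = {}
    for inputs, expression in treatments:
        context = tuple(sorted((name, level) for name, level in inputs.items()
                               if name != pivot))
        by_context.setdefault(context, {})[inputs[pivot]] = expression
    return {context: (levels[0], levels[1])
            for context, levels in by_context.items()
            if 0 in levels and 1 in levels}

def pivot_effect(treatments, pivot):
    """Summarize the direction of the pivot's effect across all minimal pairs."""
    pairs = minimal_pairs(treatments, pivot)
    if not pairs:
        return "no minimal pairs"
    directions = set()
    for low, high in pairs.values():
        if high > low:
            directions.add("induces")
        elif high < low:
            directions.add("represses")
        else:
            directions.add("no change")
    return directions.pop() if len(directions) == 1 else "inconsistent"

# Hypothetical AS1 readings (arbitrary units), invented for this example.
as1 = [
    ({"light": 0, "carbon": 0, "nitrogen": 0}, 8.0),
    ({"light": 1, "carbon": 0, "nitrogen": 0}, 3.0),
    ({"light": 0, "carbon": 1, "nitrogen": 0}, 9.5),
    ({"light": 1, "carbon": 1, "nitrogen": 0}, 4.0),
    ({"light": 0, "carbon": 1, "nitrogen": 1}, 7.0),  # no partner with light = 1
    ({"light": 1, "carbon": 0, "nitrogen": 1}, 2.5),  # no partner with light = 0
]

print(len(minimal_pairs(as1, "light")))
print(pivot_effect(as1, "light"))
```

Treatments without a partner are simply ignored, which is why a design that guarantees partners for the pivot input is useful.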
LITE represses AS1 & induces GS2 under a variety of C:N conditions
GLU induces AS1 & represses GS2 under a variety of conditions
Underlying Method: combinatorial design
Combinatorial design: inspired by work in software testing by David Cohen, Siddhartha Dalal, Michael Fredman and Gardner Patton at Bellcore/Telcordia. Their problem: how to choose a good set of inputs to a program to discover whether there are any bugs. Not program coverage, but input coverage. Not all input combinations, but all combinations of every pair of input variables. Hypothesis: every input combination should give the same output: no error. If true for the designed subset, then the program is ok.
Underlying Method: combinatorial design 2
Scientific question: does input X induce (resp. repress) the output? If so, then X should induce regardless of the other inputs. So, choose X = low with a combinatorial design of the other inputs, then choose X = high with the same combinatorial design of the other inputs. If, for each context c in the design, (high, c) has more output than (low, c) -- a minimal pair -- then X is inductive.
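The construction above can be sketched as follows. The context design here is a hypothetical four-row pairwise design over three other binary inputs, and the two response functions are toy stand-ins for a measured gene output; none of these names or numbers come from the slides.

```python
def paired_design(x_name, contexts):
    """Run every context twice: once with X low (0) and once with X high (1)."""
    return [{x_name: level, **context} for context in contexts for level in (0, 1)]

def is_inductive(x_name, contexts, measure):
    """X is inductive if (high, c) gives more output than (low, c) for every context c."""
    return all(measure({x_name: 1, **c}) > measure({x_name: 0, **c}) for c in contexts)

# Hypothetical context design: every pair of the three inputs takes all four value combinations.
contexts = [
    {"starvation": 0, "carbon": 0, "nitrogen": 0},
    {"starvation": 0, "carbon": 1, "nitrogen": 1},
    {"starvation": 1, "carbon": 0, "nitrogen": 1},
    {"starvation": 1, "carbon": 1, "nitrogen": 0},
]

def uniform_response(t):
    # Toy response in which light always raises the output (not measured data).
    return 2.0 + 3.0 * t["light"] + 1.0 * t["carbon"] - 0.5 * t["nitrogen"]

def mixed_response(t):
    # Toy response in which light lowers the output when starvation and carbon are both high.
    return 5.0 + 3.0 * t["light"] * (1 - 2 * t["starvation"] * t["carbon"])

treatments = paired_design("light", contexts)
print(len(treatments))
print(is_inductive("light", contexts, uniform_response))
print(is_inductive("light", contexts, mixed_response))
```

Because both levels of X share the same contexts, every treatment has its minimal-pair partner by construction.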
Underlying Methods: adaptive design
What happens when X isn’t uniformly inductive or repressive? Suppose X normally shows induction, but occasionally repression. That is, for most c values (low, c) vs. (high, c) shows induction, but for one c’, (low, c’) vs. (high, c’) shows repression. Then study the differences between c’ and the inducing c values closest to it, and design experiments to reduce those differences.
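One way to read the adaptive step is sketched below, under assumptions that are not in the slides: contexts are tuples of binary levels, "closest" means smallest Hamming distance, and "reducing the differences" means testing contexts one input change away from c’ in the direction of an inducing neighbor. The input names and the observed effects are hypothetical.

```python
INPUTS = ("carbon", "inorganic_n", "glu", "gln")  # hypothetical context inputs

def hamming(a, b):
    """Number of inputs on which two contexts differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_normal(exception, normal_contexts):
    """Inducing contexts at the smallest Hamming distance from the exception."""
    best = min(hamming(exception, c) for c in normal_contexts)
    return [c for c in normal_contexts if hamming(exception, c) == best]

def follow_up_contexts(exception, neighbor, tested):
    """Untested contexts one step from the exception toward the neighbor."""
    steps = []
    for i, (x, y) in enumerate(zip(exception, neighbor)):
        if x != y:
            step = exception[:i] + (y,) + exception[i + 1:]
            if step not in tested:
                steps.append(step)
    return steps

# Hypothetical results: +1 means X induced in that context, -1 means X repressed.
effects = {
    (0, 0, 0, 0): +1,
    (0, 1, 1, 0): +1,
    (1, 0, 1, 1): +1,
    (0, 0, 1, 1): +1,
    (1, 1, 0, 1): -1,  # the odd context c'
}

normal = [c for c, sign in effects.items() if sign > 0]
exception = next(c for c, sign in effects.items() if sign < 0)
neighbors = nearest_normal(exception, normal)
proposals = follow_up_contexts(exception, neighbors[0], tested=set(effects))

print(neighbors)
print(proposals)
```

Each proposed context isolates one of the inputs on which c’ and its nearest inducing neighbor differ, so the follow-up experiments indicate which difference accounts for the switch from induction to repression.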
Conclusions About Methodology
Design/don’t wait: Use the data you are given, sure, but don’t be shy about asking for more.
Combinatorial Design can help test a hypothesis: e.g. 10 three-valued variables require 3^10 = 59,049 experiments to cover the whole space. Combinatorial design can reduce this to 27.
Adaptation is easy: Study differences between normal cases and abnormal ones to discover fine structure.
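The arithmetic behind the 10-variable example can be checked directly. The sketch below also counts how many pairwise settings a design must cover and gives a simple lower bound on design size; the figure of 27 experiments is quoted from the slide and is not derived here.

```python
from math import comb

variables, levels = 10, 3

full_factorial = levels ** variables          # every combination of all 10 variables
variable_pairs = comb(variables, 2)           # pairs of variables to cover
pair_settings = variable_pairs * levels ** 2  # value combinations across all pairs
lower_bound = levels ** 2                     # one pair alone needs 9 distinct experiments

print(full_factorial, variable_pairs, pair_settings, lower_bound)
# prints: 59049 45 405 9
```

Each experiment covers one setting for each of the 45 variable pairs, so a pairwise design needs at least 9 experiments and far fewer than the full 59,049.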