CSC 599: Computational Scientific Discovery

Lecture 8: Lagramge (sp) and Inductive Process Modeling

Outline
Review
  • Processes
  • BACON
  • Other equation finders
  • The act of equation finding
Lagramge (sp)
Inductive Process Modeling

Review: Processes
Deal with changes over time
  • Range: (start, finish) or (start, duration)
  • Rates of change: dx/dt
  • Previous history: attribute[event[t]] = f(event[1], event[2], ..., event[t-1])
Attributes of processes (when time is included)
  • Quantity of “always on” forces as a function of time
      • Gravity, electromagnetism
  • Maximum limit of “homeostatic” forces
      • Friction, normal force
  • Miscellaneous changes during the process

Review: BACON
BACON was driven by the data and by domain knowledge:
  • Distinguishing between independent and dependent attributes
      • BACON 3
  • Discovery of intrinsic properties of conceptual values
      • BACON 4
  • Preference for symmetric equations, given knowledge that suggests one could exist
      • BACON 5

Equation Finders, 1990s
[From Ljupco Todorovski's homepage: http://www-ai.ijs.si/~ljupco/]
  • COPER [Kokar 1986]
      • Uses information about the dimension units of the system variables to restrict the space of possible equation structures.
  • BACON [Langley 1987]
      • Pioneer among equation discovery systems. It uses a set of data-driven heuristics for finding regularities (constancies and trends) in data and for formulating hypotheses based on them.

Equation Finders, 1990s (2)
  • Fahrenheit/EF [Langley and Zytkow 1989], [Zembowitz and Zytkow 1992]
      • EF is the equation discovery subsystem of the Fahrenheit scientific discovery system. It discovers bivariate equations only; the user can specify the set of operators and functions to be used within equations.
  • ABACUS [Falkenhainer and Michalski 1990]
      • Experiments with different search strategies through the space of equation structures. Also allows discovery of piecewise equations, using clustering to identify the limits between pieces.
  • IDS [Nordhausen and Langley 1990]
  • ARC [Moulet 1992]

Equation Finders, 1990s (3)
  • E* [Schaffer 1993]
      • Discovers bivariate equations using a small set of predefined equation structures.
  • LAGRANGE [Dzeroski and Todorovski 1995]
      • Handled differential equations.
  • GOLDHORN [Krizman et al. 1995]
      • Extended LAGRANGE towards discovery from noisy data.
  • SDS [Washio and Motoda 1997]
      • Used information about scale types of the dimension units of the system variables to restrict the space of possible equations.
  • LAGRAMGE [Todorovski and Dzeroski 1997]
      • Allowed the user to specify the space of possible equations with a context-free grammar.

Issues with equation finders
  • How to incorporate time
      • Especially derivatives
  • How to separate:
      • Heuristics used to search equation space
      • Grading function used to determine how well an equation fits
      • Compare:
          • True function = x^2
          • Current function = -x^2
          • Large “error” in numeric space
          • Small “error” in symbolic (equation) space
  • How best to incorporate domain knowledge?
      • BACON programs did so, but in an ad hoc way
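The x^2 versus -x^2 comparison can be made concrete with a small sketch. It assumes two illustrative measures that are not from the slides: sum of squared differences on a few sample points for the numeric space, and token-level edit distance for the symbolic space. The function names and sample points are hypothetical.

```python
def numeric_error(f, g, xs):
    """Sum of squared differences between two functions on sample points."""
    return sum((f(x) - g(x)) ** 2 for x in xs)


def symbolic_error(tokens_a, tokens_b):
    """Levenshtein edit distance between two token sequences (assumed measure)."""
    prev = list(range(len(tokens_b) + 1))
    for i, a in enumerate(tokens_a, start=1):
        curr = [i]
        for j, b in enumerate(tokens_b, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


xs = [1.0, 2.0, 3.0]
num_err = numeric_error(lambda x: x ** 2, lambda x: -(x ** 2), xs)
sym_err = symbolic_error(["x", "^", "2"], ["-", "x", "^", "2"])
print(num_err, sym_err)
```

On these three points the numeric error is in the hundreds while the two expressions differ by a single token, which is the gap the slide points at.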

Lagrange:
The Person
  • Italian/French
  • Mathematician and physicist
  • 1736-1813
The System
  • Mid-1990s equation finder
  • Found differential equations
  • Dzeroski and Todorovski 1995

Lagramge (sp)
Lagramge = Lagrange (system) + grammar
Handling the issues:
  • How to account for time?
      • Each vector of variable values can represent the state of the system at a particular time
      • Throw both attr and d[attr]/dt into the variable set
  • Separating search heuristics from equation fit
      • Heuristics for searching equation space: explicit user-defined function for judging function similarity
      • Gauging equation fit: sum of squared errors
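Putting d[attr]/dt into the variable set requires derivative values at each time point. A minimal sketch, assuming derivatives are estimated by finite differences (the slides do not say how they are obtained); the function name and the sample data are hypothetical.

```python
def derivative_column(times, values):
    """Central-difference estimate of d(values)/dt, one-sided at both ends."""
    n = len(values)
    derivs = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        derivs.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return derivs


times = [0.0, 1.0, 2.0, 3.0]
x = [0.0, 1.0, 4.0, 9.0]  # samples of x = t^2
dx_dt = derivative_column(times, x)

# Each row is one state vector; the derivative is just another variable.
rows = [{"t": t, "x": xv, "dx/dt": d} for t, xv, d in zip(times, x, dx_dt)]
```

Once the extra column exists, an equation finder can treat dx/dt like any other attribute when it searches for an equation.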

Lagramge, top algorithm

void lagramge (Grammar g, int maxComplexity, int maxBeamWidth,
               fnc fittingFnc(), fnc eqnPreferFnc(), fnc stoppingCriteriaMet())
{
  T0 = init symbol
  Q1 = {T0}
  do {
    Q = Q1
    R = Q.generateRefinements();
    foreach r in R {
      r.calcBestFitConstants(fittingFnc);
      r.symbolicCloseness = eqnPreferFnc.calc(r);
    }
    Q1 = union(Q,R);
    Q1.keepBest(maxBeamWidth, symbolicCloseness);
  } while ( (Q != Q1) && !stoppingCriteriaMet() );
}
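The loop above is a beam search: refine every structure in the beam, pool old and new structures, keep the best few, and stop when the beam no longer changes. The sketch below shows only that control flow on a toy problem. It is not Lagramge itself: candidates are sets of terms with all coefficients fixed at 1 (no constant fitting), refinement adds one term, and the names `beam_search`, `refine`, `score`, `TERMS` and `DATA` are hypothetical.

```python
def beam_search(start, refine, score, beam_width, max_rounds=100):
    """Keep the beam_width best candidates until the beam stops changing."""
    beam = [start]
    for _ in range(max_rounds):
        refinements = [r for cand in beam for r in refine(cand)]
        pool = list(dict.fromkeys(beam + refinements))  # union, order kept
        new_beam = sorted(pool, key=score)[:beam_width]
        if new_beam == beam:  # corresponds to Q == Q1 in the pseudocode
            break
        beam = new_beam
    return beam


# Toy search space: sums of terms, every coefficient assumed to be 1.
TERMS = {"1": lambda x: 1.0, "x": lambda x: x, "x*x": lambda x: x * x}
DATA = [(x, x * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]


def refine(cand):
    """Refinements of a candidate: add one term it does not already have."""
    return [tuple(sorted(cand + (t,))) for t in TERMS if t not in cand]


def score(cand):
    """Sum of squared errors of the candidate on DATA (lower is better)."""
    return sum((sum(TERMS[t](x) for t in cand) - y) ** 2 for x, y in DATA)


best = beam_search((), refine, score, beam_width=2)[0]
print(best, score(best))
```

In the real system the scoring step would first fit the constants of each refinement; here that step is left out to keep the loop readable.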

Heuristic functions:
fnc fittingFnc()
  Job = given this equation parse tree, find the best-fitting parameters
    attr_learn = c1*attr1 + c2*attr2 + c3*attr1*attr2 + c4
  Distinguish between:
  • Space of equation parse trees (countable)
  • Space of all the floating-point parameters (in principle, uncountable)
fnc eqnPreferFnc()
  Job = given this instantiated equation, how well does it fit the data?
  Traditional least squares, or some variant
  • Noise tolerant
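For a structure that is linear in its constants, such as c1*attr1 + c2*attr2 + c3*attr1*attr2 + c4, the best-fitting parameters have a closed form. The sketch below solves the least-squares normal equations with plain Gaussian elimination; this is one standard way to do it and is an assumption, not a description of the fitting routine Lagramge actually uses. The helper names and the synthetic data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_constants(rows, targets):
    """Least-squares c1..c4 for c1*a1 + c2*a2 + c3*a1*a2 + c4."""
    features = [[a1, a2, a1 * a2, 1.0] for a1, a2 in rows]
    k = 4
    ata = [[sum(f[i] * f[j] for f in features) for j in range(k)]
           for i in range(k)]
    aty = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(k)]
    return solve_linear(ata, aty)


# Synthetic, noise-free data generated from known constants.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
targets = [2.0 * a1 - 1.0 * a2 + 0.5 * a1 * a2 + 3.0 for a1, a2 in rows]
c1, c2, c3, c4 = fit_constants(rows, targets)
```

The parse tree fixes which feature columns exist; only the floating-point constants are searched here, which is the distinction the slide draws between the two spaces.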

Lagramge Handling Issues (2)
3. Incorporation of domain knowledge
   Use a grammar!
Inputs:
  • Context-free grammar (more on next slide)
  • Input data D = (V, vd, M)
      V  = set of variables
      vd = variable to predict, in V
      M  = measurements of all variables:

      time   v1     v2     ...   vn
      t0     v1,0   v2,0   ...   vn,0
      t1     v1,1   v2,1   ...   vn,1
      ...
Outputs:
  • Ordinary or differential equation

Grammar
Context-free grammar = <nonTerminalSet, terminalSet, productionSet, startSym>
Ex:
  double monod(double c, double v)
  {
    return (v/(v+c));
  }

  N = {E, F, M, v}
  T = {+, const, *, monod, "(", ",", ")", N, P, Z}
  P = { E -> const | const*F | E + const*F
        F -> v | M | v*M
        M -> monod(const,v)
        v -> N | P | Z }
  S = E
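To see what space of equations this grammar defines, it can be written as a dictionary and expanded to a bounded parse-tree height. This is a minimal sketch: the dictionary encoding and the function `expand` are hypothetical, and the productions v -> N | P | Z are taken from the height table on the following slide.

```python
from itertools import product

NONTERMINALS = {"E", "F", "M", "v"}
GRAMMAR = {
    "E": [["const"], ["const", "*", "F"], ["E", "+", "const", "*", "F"]],
    "F": [["v"], ["M"], ["v", "*", "M"]],
    "M": [["monod", "(", "const", ",", "v", ")"]],
    "v": [["N"], ["P"], ["Z"]],
}


def monod(c, v):
    """Monod term from the slide: v / (v + c)."""
    return v / (v + c)


def expand(symbol, depth):
    """All terminal strings derivable from symbol with tree height <= depth."""
    if symbol not in NONTERMINALS:
        return {symbol}      # terminals have height 0
    if depth == 0:
        return set()         # a nonterminal needs at least one more level
    results = set()
    for rhs in GRAMMAR[symbol]:
        parts = [expand(s, depth - 1) for s in rhs]
        for combo in product(*parts):
            results.add("".join(combo))
    return results


expressions = expand("E", 3)
print(sorted(expressions))
```

Raising the depth bound admits the monod terms and longer sums, so the bound acts as the complexity limit on the search space.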

Tree's “Height” as its complexity
The deeper the tree, the more complex the expression
  y = f1(x), y = f1(f2(x)), y = f1(f2(f3(x)))
Symbol height = shallowest height of deepest production
  p = A -> A1 A2 ... Al
  h(p) = 1 + max_i {h(Ai)}
  h(A) = if (isNonTerminal(A))
           min over q in productions of A {h(q)}
         else /* isTerminal(A) */
           0

  height   productions
  1        E->const; v->N; v->P; v->Z
  2        M->monod(const,v); F->v
  3        F->M; F->v*M; E->const*F; E->E+const*F
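The two height definitions are mutually recursive (E appears in its own production), so a direct recursive evaluation would not terminate. One way to compute them, sketched here as an assumption rather than as the system's actual method, is to start every nonterminal at infinity and iterate until nothing changes. The dictionary encoding and function names are hypothetical.

```python
import math

GRAMMAR = {
    "E": [["const"], ["const", "*", "F"], ["E", "+", "const", "*", "F"]],
    "F": [["v"], ["M"], ["v", "*", "M"]],
    "M": [["monod", "(", "const", ",", "v", ")"]],
    "v": [["N"], ["P"], ["Z"]],
}


def production_height(rhs, heights):
    """h(p) = 1 + max over right-hand-side symbols; terminals count as 0."""
    return 1 + max(heights.get(symbol, 0) for symbol in rhs)


def symbol_heights(grammar):
    """h(A) = min over productions of A, found by fixed-point iteration."""
    heights = {a: math.inf for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            best = min(production_height(rhs, heights) for rhs in productions)
            if best < heights[a]:
                heights[a] = best
                changed = True
    return heights


heights = symbol_heights(GRAMMAR)
table = {
    "->".join([a, "".join(rhs)]): production_height(rhs, heights)
    for a, productions in GRAMMAR.items()
    for rhs in productions
}
print(heights)
print(table)
```

The resulting production heights agree with the table on the slide: 1 for E->const and the v productions, 2 for M->monod(const,v) and F->v, and 3 for the rest.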

Lagramge Refinement

void refinement (Tree T, int maxHeight)
{
  choose A, nonterminal in T
  pA,i = production applied at that node
  l = pathLength from root of T to A
  delete subtrees of A
  if ( ((i+1) <= A.numProductionsFor()) &&
       (l + height(pA,i+1) <= maxHeight) )
  {
    pA,i+1 = A -> A1,A2, ... Al
    replace pA,i with pA,i+1
    expand A with successors A1,A2, ... Al
    while ( T.hasNonTerminals() ) {
      choose nonTerm leaf B in T
      pB,1 = B -> B1,B2, ... Bm
      expand B with successors B1,B2, ... Bm
    }
  }
}

Refinement Example

Lagramge Findings (1):
Aquatic Ecosystems:
  • N = concentration of nutrient
  • P = concentration of phytoplankton
  • Z = concentration of zooplankton

  dN/dt = -N*P/(k_N + N)
  dP/dt = N*P/(k_N + N) - r_P*P - P*Z/(k_P + P)
  dZ/dt = P*Z/(k_P + P) - r_Z*Z
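A quick way to get a feel for these equations is to integrate them numerically. The sketch below uses forward Euler and reads the grazing term as P*Z/(k_P + P), that is, Z times the monod term from the grammar slide. All parameter values, the initial state and the step size are made up for illustration; none of them come from the lecture.

```python
def derivatives(n, p, z, k_n, k_p, r_p, r_z):
    """Right-hand sides of the nutrient/phytoplankton/zooplankton model."""
    uptake = n * p / (k_n + n)    # nutrient taken up by phytoplankton
    grazing = p * z / (k_p + p)   # phytoplankton eaten by zooplankton
    return -uptake, uptake - r_p * p - grazing, grazing - r_z * z


def simulate(state, params, dt, steps):
    """Forward-Euler integration; returns the list of (N, P, Z) states."""
    trajectory = [state]
    for _ in range(steps):
        n, p, z = trajectory[-1]
        dn, dp, dz = derivatives(n, p, z, **params)
        trajectory.append((n + dt * dn, p + dt * dp, z + dt * dz))
    return trajectory


# Hypothetical constants and initial concentrations.
params = {"k_n": 1.0, "k_p": 1.0, "r_p": 0.1, "r_z": 0.1}
trajectory = simulate((2.0, 0.5, 0.2), params, dt=0.01, steps=500)
totals = [n + p + z for n, p, z in trajectory]
```

Summing the three equations cancels the uptake and grazing terms, so the total N + P + Z can only shrink through the loss terms r_P*P and r_Z*Z. The list `totals` makes that easy to check.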

Lagramge Findings (2): Two poles on a cart (inverted pendulum):

Lagramge Discussion
Problems it solves
  • Handling noisy data
      • Sum of squared error
  • Handling of time and derivatives
      • Derivatives are just another variable
      • Nothing special to do on the user's part
  • Distinguishes between error in symbol space and error in numeric value space
      • Heuristic function for the first, grading function for the second
  • Incorporation of domain knowledge
      • Symbol space heuristic function
      • Grammar

Lagramge Discussion (2)
Hold up one second! Do you believe this?
  Incorporation of domain knowledge
  • Grammar
Grammar ?!?
  • Computer scientists think in terms of grammars
      • FSM/regular expression
      • PDA/context free
      • Linear bounded automata/context sensitive
      • Turing machines/unrestricted
  • Do scientists think in terms of grammars? (Besides linguists, of course)

Lagramge Discussion (3)
Remember the BACON 5 heuristic:
  • If you have structural knowledge that suggests a symmetric equation, then look for one first
  • Can we do that with Lagramge?
Grammar for symmetry(?)
  A -> BCB          (not quite, both B's are not distinct)
  A -> c*B + c*D    (symmetric but not that powerful)
  A -> (B)          (ditto)
Preference for symmetry
  • Jury-rig eqnPreferFnc() as needed
  • Are you happy with that?

Inductive Process Modeling

IPM
We want it all!
  • Processes
      • Explicitly recognized objects (in the software engineering sense)
      • Distinguish between instances and classes
      • Higher-level constructs with uninstantiated equations
      • Simulation equations constructed from process ones
  • Automatically deals with time
  • Exhaustive search
      • To some maximal complexity
  • Better fitting function for time series data

IPM Generic Processes

library pred_prey;

generic process logistic_growth;
  variables S{species};
  parameters gr[0,3], ic[0,0.1];
  equations d[S,t,1] = gr * S * (1 - ic*S);

generic process exponential_growth;
  variables S{species};
  parameters gr[0,3];
  equations d[S,t,1] = gr * S;

generic process exponential_decay;
  variables S{species};
  parameters dr[0,2];
  equations d[S,t,1] = -1 * dr * S;

generic process holling_1;
  variables S1{prey}, S2{predator};
  parameters ar[0.01,10], ef[0.001,0.8];
  equations d[S1,t,1] = -1 * ar * S1 * S2;
            d[S2,t,1] = ef * ar * S1 * S2;

IPM Quantitative Processes:
Fills in constants from equation templates:

Model PredatorPrey:
  vars: aurelia{prey}, nasutum{predator};
  observable: aurelia, nasutum;

  process aurelia_growth;
    equations d[aurelia,t,1] = 1.81 * aurelia * (1 - 0.0003*aurelia);
  process nasutum_decay;
    equations d[nasutum,t,1] = -1 * 1.04 * nasutum;
  process predation_holling_t;
    equations d[aurelia,t,1] = -1 * 0.03 * aurelia * nasutum;
              d[nasutum,t,1] = 0.30 * 0.03 * aurelia * nasutum;

Resulting simulation equations:
  d[aurelia]/dt = 1.81 * aurelia * (1 - 0.0003*aurelia) - 0.03 * aurelia * nasutum
  d[nasutum]/dt = -1.04 * nasutum + 0.30 * 0.03 * aurelia * nasutum
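The simulation equations come from adding up, for each variable, the contributions of every process that mentions it. A minimal sketch of that bookkeeping, using the constants from the model above; representing processes as Python functions that return per-variable rates is an assumption made for illustration, and the state values are arbitrary.

```python
def aurelia_growth(state):
    """Logistic growth of the prey."""
    a = state["aurelia"]
    return {"aurelia": 1.81 * a * (1 - 0.0003 * a)}


def nasutum_decay(state):
    """Exponential decay of the predator."""
    return {"nasutum": -1 * 1.04 * state["nasutum"]}


def predation(state):
    """Holling type 1 predation: affects both variables."""
    a, n = state["aurelia"], state["nasutum"]
    return {"aurelia": -1 * 0.03 * a * n, "nasutum": 0.30 * 0.03 * a * n}


PROCESSES = [aurelia_growth, nasutum_decay, predation]


def model_derivatives(state, processes=PROCESSES):
    """Sum each process's contribution to every variable's derivative."""
    total = {name: 0.0 for name in state}
    for process in processes:
        for name, rate in process(state).items():
            total[name] += rate
    return total


rates = model_derivatives({"aurelia": 100.0, "nasutum": 10.0})
print(rates)
```

Dropping or adding a process changes the model structure without touching the other processes, which is what makes exhaustive search over process combinations practical.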

IPM Algorithm
1. Find all permissible instantiations of generic processes with specified variables:
     logistic_growth:   S -> aurelia
     logistic_growth:   S -> nasutum
     exponential_decay: S -> aurelia
     exponential_decay: S -> nasutum
     holling_1:         S1 -> aurelia, S2 -> nasutum
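Step 1 is a typed matching problem: each generic process declares typed roles, and a variable may fill a role only if its type fits. The sketch below reproduces the five instantiations listed above. It assumes that prey and predator are subtypes of species, and it includes only the three generic processes that appear in this list; the data structures and the function name are hypothetical.

```python
from itertools import permutations

# Assumption: prey and predator both count as species.
SUBTYPES = {
    "species": {"species", "prey", "predator"},
    "prey": {"prey"},
    "predator": {"predator"},
}

# Generic processes as lists of (role, required type).
GENERIC = {
    "logistic_growth": [("S", "species")],
    "exponential_decay": [("S", "species")],
    "holling_1": [("S1", "prey"), ("S2", "predator")],
}

VARIABLES = {"aurelia": "prey", "nasutum": "predator"}


def instantiations(generic, variables):
    """Every type-correct binding of distinct variables to process roles."""
    found = []
    for name, roles in generic.items():
        for chosen in permutations(variables, len(roles)):
            fits = all(variables[v] in SUBTYPES[required]
                       for v, (_, required) in zip(chosen, roles))
            if fits:
                binding = {role: v for v, (role, _) in zip(chosen, roles)}
                found.append((name, binding))
    return found


bindings = instantiations(GENERIC, VARIABLES)
for name, binding in bindings:
    print(name, binding)
```

The type check is what rules out the reversed predation binding, with nasutum as prey and aurelia as predator.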

IPM Algorithm (2)
2. Go from partially instantiated processes to generic models by carrying out an exhaustive search of model structures, up to some complexity limit

process logistic_growth;
  parameters gr[0,3], ic[0,0.1];
  equations d[aurelia,t,1] = gr * aurelia * (1 - ic*aurelia);

process exponential_decay;
  parameters dr[0,2];
  equations d[nasutum,t,1] = -1 * dr * nasutum;

process holling_1;
  parameters ar[0.01,10], ef[0.001,0.8];
  equations d[aurelia,t,1] = -1 * ar * aurelia * nasutum;
            d[nasutum,t,1] = ef * ar * aurelia * nasutum;
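Exhaustive search over model structures can be sketched as enumerating every combination of instantiated processes up to a size limit. Measuring complexity as the number of processes in a model is an assumption made here for illustration; the slides only say "some complexity limit". The instance labels and the function name are hypothetical.

```python
from itertools import combinations

# The five instantiated processes from step 1, as plain labels.
INSTANCES = [
    "logistic_growth(aurelia)",
    "logistic_growth(nasutum)",
    "exponential_decay(aurelia)",
    "exponential_decay(nasutum)",
    "holling_1(aurelia, nasutum)",
]


def model_structures(instances, max_processes):
    """All non-empty sets of processes with at most max_processes members."""
    structures = []
    for size in range(1, max_processes + 1):
        structures.extend(combinations(instances, size))
    return structures


structures = model_structures(INSTANCES, max_processes=3)
print(len(structures))
```

The structure shown on the slide (logistic growth of aurelia, exponential decay of nasutum, and Holling predation) is one of the candidates this enumeration produces; each candidate then goes on to parameter fitting.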

IPM Algorithm (3)
3. Find best-fitting parameters with least-squares fitting that does second-order gradient descent through parameter space
  • Full simulation
      • Applies standard tricks for avoiding local minima
  /or/ when all variables are observable
  • Does teacher forcing
      • No full simulation
      • Given attributes at time t, minimize error at t+1
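The difference between the two fitting modes shows up clearly on a one-parameter model. The sketch below uses exponential decay, d[x]/dt = -dr * x, with a forward-Euler step. Full simulation compares a trajectory run from the first observation alone; teacher forcing predicts each point from the observed previous point. The closed-form fit used for teacher forcing is a simplification chosen here, not the second-order gradient descent named on the slide, and all names and values are hypothetical.

```python
def simulate(x0, dr, dt, steps):
    """Forward-Euler trajectory of d[x]/dt = -dr * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (-dr * xs[-1]))
    return xs


def full_simulation_error(obs, dr, dt):
    """Squared error of a trajectory simulated from obs[0] only."""
    sim = simulate(obs[0], dr, dt, len(obs) - 1)
    return sum((o - s) ** 2 for o, s in zip(obs, sim))


def teacher_forcing_error(obs, dr, dt):
    """Squared one-step error, predicting t+1 from the observed value at t."""
    return sum((nxt - (cur + dt * (-dr * cur))) ** 2
               for cur, nxt in zip(obs, obs[1:]))


def fit_teacher_forcing(obs, dt):
    """Closed-form least-squares dr for the one-step (teacher-forced) error."""
    num = sum((nxt - cur) * (-dt * cur) for cur, nxt in zip(obs, obs[1:]))
    den = sum((dt * cur) ** 2 for cur in obs[:-1])
    return num / den


DT = 0.1
observed = simulate(10.0, 0.5, DT, 20)  # synthetic, noise-free data
dr_hat = fit_teacher_forcing(observed, DT)
print(dr_hat)
print(full_simulation_error(observed, 0.4, DT),
      teacher_forcing_error(observed, 0.4, DT))
```

With a wrong parameter, the full-simulation error is larger because mistakes accumulate along the trajectory, while teacher forcing resets to the observed value at every step. That reset is only possible when all variables are observable.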

IPM Discussion
Finds sophisticated predator/prey relationships in synthetic and actual data
  • Multiple predators and prey species
Successes
  • Applies domain knowledge in a scientist-friendly fashion
  • Groups equations and equation fragments together in clumps that a scientist would find intuitive
Improvement?
  • It's an algorithm, not an architecture
  • The algorithm could be incorporated into a larger architecture
