300 likes | 390 Views
Lecture 8: Lagramge (sp) and Inductive Process Modeling. CSC 599: Computational Scientific Discovery. Outline. Review Processes BACON Other equation finders The act of equation finding Lagramge (sp) Inductive Process Modeling. Review: Processes. Deal with changes over time Range
E N D
Lecture 8: Lagramge (sp) and Inductive Process Modeling CSC 599: Computational Scientific Discovery
Outline Review • Processes • BACON • Other equation finders • The act of equation finding Lagramge (sp) Inductive Process Modeling
Review: Processes Deal with changes over time • Range (start, finish), (start, duration) • Rates of change dx/dt • Previous history attribute[event[t]] = f(event[1], event[2], . . . event[t-1]) Attributes of processes (when time is included) • Quantity of “always on” forces as function of time • Gravity, electro-magnetism • Maximum limit of “homeostatic” forces • Friction, Normal force • Misc. changes during process
Review: BACON BACON was driven by the data and domain knowledge: • Distinguishing between independent and dependent attribute • BACON 3 • Discovery of intrinsic properties of conceptual values • BACON 4 • Preference for symmetric equations given knowledge that suggests one could exist • BACON 5
Equation Finders, 1990s [From Ljupco Todorovski's Homepage:] • http://www-ai.ijs.si/~ljupco/ • COPER [Kokar 1986] • Uses information about the dimension units of the system variables to restrict the space of possible equation structures. • BACON [Langley 1987] • Pioneer among equation discovery systems. It uses a set of data-driven heuristics for finding regularities (constancies and trends) in data and for formulating hypotheses based on them.
Equation Finders, 1990s (2) • Fahrenheit/EF[Langley and Zytkow 1989], [Zembowitz and Zytkow 1992] • Used as a equation discovery subsystem of the scientific discovery system. For discovering bivariate equations only, user being able to specify the set of operators and functions to be used within equations. • ABACUS [Falkenhainer and Michalski 1990] • Experiments with different search strategies through the space of equation structures. Also allows discovery of piecewise equations using clustering for identifying the limits between pieces. • IDS [Nordhausen and Langley 1990] • ARC [Moulet 1992]
Equation Finders, 1990s (3) • E* [Schaffer 1993] • Discovers of bivariate equations using a small set of predefined equation structures. • LAGRANGE [Dzeroski and Todorovski 1995] • Handled differential equations. • GOLDHORN [Krizman et al. 1995] • Extended LAGRANGE towards discovery from noisy data. • SDS [Washio and Motoda 1997] • Used information about scale types of the dimension units of the system variables to restrict the space of possible equations. • LAGRAMGE [Todorovski and Dzeroski 1997] • Allowed the user to specify the space of possible equations with context free grammar.
Issues with equation finders • How to incorporate time Esp. derivatives • How separate • Heuristics used to search eqn space • Grading fnc used to determine how well eqn fits: • Compare: • True function = x2 • Current function = -x2 • Large “error” in numeric space • Small “error” in symbolic (eqn space) • How best to incorporate domain knowledge? • BACON programs did so, but was ad hoc
Lagrange: The Person • Italian/French • Mathematician and physicist • 1736-1813 The System • Mid-1990s equation finder • Found differential equations • Dzeroski and Todorovski 1995
Lagramge (sp) Lagramge = Lagrange (system) + grammar Handling the issues: • How to account for time? • Each vector of variable values can represent the state of the system at a particular time • Throw both attr and d[attr]/dt in variable set • Separating • Heuristics for searching equation space Explicit user-defined function for judging fnc similiarity • Gauging equation fit • Sum of squared errors
Lagramge, top algorithm void lagramge (Grammar g, int maxComplexity, int maxBeamWidth, fnc fittingFnc(), fnc eqnPreferFnc(), fnc stoppingCriteriaMet() ) { T0 = init symbol Q1 = {T0} do { Q = Q1 R = Q.generateRefinements(); foreach r in R { r.calcBestFitConstants(fittingFnc); r.symbolicCloseness = eqnPreferFnc.calc(r); } Q1 = union(Q,R); Q1.keepBest(maxBeamWidth,symbolicCloseness); } while ( (Q != Q1) && !stoppingCriteriaMet() ); }
Heuristic functions: fnc fittingFnc() Job = given this equation parse tree, find best fitting parameters attrlearn = c1*attr1 + c2*attr2 + c3* attr1*attr2 + c4 Distinguish between: • space of equation parse trees (countable) • space of all the floating point parameters (in principle, uncountable) fnc eqnPreferFnc() Job = given this instantiated equation, how well does it fit the data? Traditional least squares, or some variant • Noise tolerant
Lagramge Handling Issues (2) 3. Incorporation of domain knowledge Use a grammar! Inputs: • Context free grammar (More on next slide) • Input data D = (V,vd,M) V = set of variables vd = variable to predict in V M = measurements of all vars: time v1 v2 . . . vn t0 v1,0 v2,0 vn,0 t1 v1,1 v2,1 vn,1 . . . Outputs: Ordinary or differential equation
Grammar Context Free Grammar = <nonTerminalSet, terminalSet, productionSet, startSym> Ex: double monod(double c, double v) { return(v/(v+c)); } N = {E, F, M, v} T = {+, const, *, monod, (“,”), N, P, Z} P = { E -> const | const*F | E + const*F F -> v | M | v*M M -> monod(const,v) } S = E
Tree's “Height” as its complexity The deeper the tree, the more complex the expression y = f1(x), y = f1(f2(x)), y = f1(f2(f3(x))) Sym Height = shallowest height of deepest prod. p = A -> A1,A2, . . .Al h(p) = 1 + maxi {h(Ai)} h(A) = if (isNonTerminal(A)) minq in prods of A{h(q)} else /* isTerminal(A) */ 0 height, production 1 E->const; v->N; v->P; V->Z 2 M->monod(const,v); F->v 3 F->M; F->v*M; E->const*F; E->E+const*F
Lagramge Refinement void refinement (Tree t, int maxHeight) { choose A, nonterminal in T pA,i = production applied at that node l = pathLength from root T to A delete subtrees of A if ( (i+1) <= A.numProductionsFor()) && (l + height(pA,i+1) <= maxHeight) ) { pA,i+1 = A->A1,A2, . . . Al replace pA,i with pA,i+1 expand A with successors A1,A2, . . . Al do { choose nonTerm leaf B if T pB,1 = B->B1,B2, . . . Bm expand B with successors B1,B2, . . . Bm } while ( T.hasNonTerminals() ); }
Lagramge Findings (1): Aquatic Ecosystems: • N = concentration of nutrient • P = concentration of phytoplankton • Z = concentration of zooplankton dN/dt = -NP/(kN + N) dP/dt = NP/(kN + N) – rPP – PZ(kP+P) dZ/dt = PZ(kP+P) - rZZ
Lagramge Findings (2): Two poles on a cart (inverted pendulum):
Lagramge Discussion Problems it solves • Handling noisy data • sum of squared error • Handling of time and derivatives • Derivatives are just another variable • Nothing special to do on user's part • Distinguishes between error in symbol space and error in numeric value space • Heuristic fnc for 1st, grading fnc for 2nd • Incorporation of domain knowledge • Symbol space heuristic fnc • Grammar
Lagramge Discussion (2) Hold up one second! Do you believe this? Incorporation of domain knowledge • Grammar Grammer ?!? • Computer scientists think in terms of grammars • FSM/regular expression • PDA/context free • Turing Machines/context sensitive • Do scientists think in terms of grammars? (Besides linguists, of course)
Lagramge Discussion (3) Remember BACON 5 heuristic: • If have structural knowledge that suggests a symmetric equation than look for one first • Can we do that with Lagramge? Grammar for symmetry(?) A -> BCB (not quite, both B's are no distinct) A -> c*B + c*D (symmetric but not that powerful) A -> (B) (ditto) Preference for symmetry • Jury-rig eqnPreferFnc() as needed • Are you happy with that?
IPM We want it all! • Processes • Explicitly recognized objects (in software engineering sense) • Distinguish between instances and classes • higher level constructs with uninstantiated equations • Simulation equations constructed from process ones • Automatically deals with time • Exhaustive search • To some maximal complexity • Better fitting function for time series data
IPM Generic Processes Library pred_prey generic process logistic_growth; • variables S[species]; • parameters gr[0,3], ic[0,0,1]; • equations d[S,t,1] = gr * S * (1-ic*S) generic process exponential_growth; variables S{species}; parameters gr[0,3]; equations d[S,t,1] = gr * S; generic process exponential_decay; variables S{species}; parameters dr[0,2]; equations d[S,t,1] = -1*dr * S; generic process holling_1; variables S1{prey}, S2{predator}; parameters ar[0.01,10], ef[0.001,0.8]; equations d[S1,t,1] = -1 * ar * S1 * S2; d[S2,t,1] = ef * ar * S1 * S2
IPM Quantitative Processes: Fills in constants from equation templates: Model PredatorPrey: vars: aurelia{prey}, nasutum{predator}; observable: aurelia, nasutum; process aurelia_growth; equations d[aurelia,t,1] = 1.81 * aurelia * (1-0.0003*aurelia); process nasutum_decay equations d[nasutum,t,1] = -1 * 1.04 * nasutum; process predation_holling_t: equations d[aurelia,t,1] = -1 * 0.03 * aurelia * nasutum; d[nasutum,t,1] = 0.30 * 0.03 * aurelia * nasutum; d[aurelia]/dt = 1.81 * aurelia * (1-0.0003*aurelia) – 0.03 * aurelia * nasutum d[nasutum]/dt = -1.04 * nasutum + 0.03 * aurelia * nasutum
IPM Algorithm 1. Find all permissible instantiations of generic processes with specified variables: logistic_growth: S -> aurelia logistic_growth: S -> nasutum exponential_decay: S -> aurelia exponential_decay: S -> nasutum holling_1: S1 -> aurelia, S2 -> nasutum
IPM Algorithm (2) 2. Go from partially instantiated processes to generic models by carrying out exhaustive search of model structures, up to some complexity limit process logistic_growth; parameters gr[0,3], ic[0,0.1]; equations d[aurelia,t,1] = gr * aurelia * (1-ic*aurelia) process exponential_decay; parameters dr[0,2]; equations d[nasutum,t,1] = -1 * dr * nasutum process holling_1: parameters ar[0.01, 10], ef[0.001, 0.8] equations d[aurelia,t,1] = -1 * ar * aurelia * nasutum d[nasutum,t,1] = ef * ar * aurelia * nasutum
IPM Algorithm (3) 3. Find best fitting parameters w/Least squares fitting that does 2nd order gradient descent thru parameter space • Full simulation • Applies standard tricks for avoiding local mins /or/ when all variables observable • Does teacher forcing • No full simulation • Given attributes at time t, minimize error at t+1
IPM Discussion Finds sophisticated predator/prey relationships in synthetic and actual data • Multiple predators and prey species Successes • Applies domain knowledge in scientist-friendly fashion • Groups equations and equation fragments together in clumps that a scientist would find intuitive Improvement? • It's an algorithm, not an architecture • Algorithm could be incorporated in a larger arch.