Domain-specific Languages for Cellular Interactions. Bill Harrison, Department of Computer Science, University of Missouri at Columbia. This work was partially supported by NIH1 R01 GM62920-04A1, NIH1 P20 GM065762-01A1, the Georgia Research Alliance, and the Georgia Cancer Coalition.
Ph.D., 2001, UIUC • Thesis: Modular Compilers and Their Correctness Proofs • Thesis Advisor: Sam Kamin • Post-doc, Oregon Graduate Inst. (OGI) • Three years on Programatica Project • using the Haskell programming language as a basis for formal methods • Assistant Professor, University of Missouri-Columbia since Fall 2003
Systems Biology asks… • Can static biological structure be related to dynamic biological behavior with mathematical clarity, precision, & rigor? • Can biological systems be viewed as the “sum of their parts”? • Can component-level models be integrated into precise system-level models of biological behavior? • What techniques from Mathematics and Computer Science apply to this composition problem?
Rhodobacter Sphaeroides • Photosynthetic bacterium • seeks out regions of greater light • Roughly the size of wavelength of light • cannot sense local light differences directly • applies random walk
Simulations of Biological Systems • Simulations provide qualitative feedback, but are not models per se • how accurate/faithful is a simulation? • what does the feedback mean? • can one reason about the biological phenomenon based on the simulation? • can you identify the biology by inspecting the text of the simulation program?
R. Sphaeroides in C++ • contains 1000 LOC • to understand requires expertise in C++ • …and biological model • …and critical system details • e.g., how is concurrency implemented?
bool global_state::register_state(void *apointer) {
  if (number_of_states == mother_of_all_states.size())
    mother_of_all_states.resize(number_of_states + 1000);
  mother_of_all_states[number_of_states++] = apointer;
  return true;
}
R. Sphaeroides in C++ • Program structure does not reflect biological model • can you look at the source code and recognize the underlying biology? • difficult to comprehend • …and write correctly • …and modify • …and maintain • …and re-use
Systems Biology as Programming Language Design • The Problem: • General-purpose programming languages do not have the “right vocabulary” • Biological model: Concurrent Markov chains • C++: classes, pointers, etc. • …nor are they mathematics • Our Solution: Design small, special-purpose languages with exactly the right vocabulary • each is called a Domain-specific Language (DSL) [Sheard99,Thiemann01,Leijen01] • Mathematical semantics of DSLs gives a formal model of the biology
Language Model of R. Sphaeroides • Executing: cell1 || … || celln • Produces animation (animation not reproduced)
Outline • Language Design and Domain-specific Languages • design, definition, and implementation • Systems Biology as Language Design • Case Study for Rhodobacter Sphaeroides • Design: what are the appropriate abstractions for R. Sphaeroides? • Definition: how do we specify exactly what R. Sphaeroides programs mean? • Implementation: how do we run R. Sphaeroides programs? • Conclusions
Cardinal Rule of Language Design • Application programmers should choose languages with abstractions most suited to their task • Language designers must provide languages with those abstractions
Domain / Central Activities / Reasonable Language
• System Programming / “bit-fiddling” / C
• Artificial Intelligence / List processing / LISP
• System Admin. / Text processing, etc. / PERL
DSLs are small languages w/ “domain abstractions” • Ex: “Parsec” Parser DSL • BNF for language translates directly into Parsec code
BNF: <Stmt> → <ident> := <Expr>
Parsec code:
assignStmt :: Parser Stmt
assignStmt = do { id <- ident
                ; symbol ":="
                ; s <- expr
                ; return (Assign id s) }
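To make the "BNF translates directly" point concrete, here is a minimal self-contained sketch. It uses `Text.ParserCombinators.ReadP` from the Haskell base library as a stand-in for Parsec, so it runs without extra packages. The `Stmt` and `Expr` types and the helpers `lexeme`, `ident`, `symbol`, `expr`, and `parseStmt` are assumptions made for illustration; they are not taken from the slides.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical syntax tree for the grammar <Stmt> -> <ident> := <Expr>.
data Expr = Lit Int | Var String deriving (Show, Eq)
data Stmt = Assign String Expr deriving (Show, Eq)

-- Run a parser, then skip any whitespace that follows the token.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

ident :: ReadP String
ident = lexeme (munch1 isAlpha)

symbol :: String -> ReadP String
symbol = lexeme . string

-- An expression is an integer literal or, failing that, a variable.
expr :: ReadP Expr
expr = (Lit . read <$> lexeme (munch1 isDigit)) <++ (Var <$> ident)

-- Mirrors the BNF rule one symbol at a time.
assignStmt :: ReadP Stmt
assignStmt = do
  name <- ident
  _ <- symbol ":="
  e <- expr
  return (Assign name e)

-- Accept only parses that consume the whole input.
parseStmt :: String -> Maybe Stmt
parseStmt s =
  case [r | (r, "") <- readP_to_S (skipSpaces *> assignStmt <* eof) s] of
    (r : _) -> Just r
    []      -> Nothing
```

Each line of `assignStmt` corresponds to one symbol on the right-hand side of the BNF rule, which is the property the slide highlights.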
“Why a language and not a library?” • The Slogan: “What is excluded from a DSL is as important as what is included in it” • libraries in a general-purpose language still require • considerable expertise & self-discipline on the part of the programmer • Lack of generality in a DSL means fewer things to “go wrong” • A DSL may have desirable properties that a general-purpose language will not • e.g., implementation techniques specialized to the DSL that do not apply to general-purpose languages • small size makes rigorous specification tractable
DSL Design DSL design for R. Sphaeroides • what are our domain abstractions? • How does this organism behave? • What modeling techniques are used by biologists to describe this behavior?
Bacterial Commands • laze • die • adjust speed • grow* • divide • tumble • *Probability of growth varies with light concentration
Chapman-Kolmogorov Equation* (equation figure not reproduced) • Figure labels: probability of being in state m • Pi,j: probability of transition from i to j • *Commonly used framework for modeling biological systems [Bremaud99, Dailey02, Mao02, Shah00]
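The equation on this slide was a figure and did not survive extraction. For orientation, the standard discrete-time forms are given below for a chain with states 0..m. The symbol \pi for the state distribution is an assumption; the slide's own notation may differ.

```latex
% \pi_j(t): probability of being in state j at step t
% P_{i,j}:  probability of a transition from state i to state j
\[
  \pi_j(t+1) = \sum_{i=0}^{m} \pi_i(t)\, P_{i,j},
  \qquad
  \sum_{j=0}^{m} P_{i,j} = 1 \quad \text{for every } i
\]
% Chapman-Kolmogorov: multi-step transition probabilities compose.
\[
  P^{(n+k)}_{i,j} = \sum_{l=0}^{m} P^{(n)}_{i,l}\, P^{(k)}_{l,j}
\]
```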
Chapman-Kolmogorov Equation A row in the above matrix encodes the transition function from state i of a Markov chain
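As a small illustration of reading a transition matrix row by row, the sketch below advances a state distribution by one step. The names `Dist`, `Matrix`, `stepDist`, `rowsStochastic`, and the two-state `example` are hypothetical and chosen only for this example.

```haskell
import Data.List (transpose)

-- | Probability of being in each state 0..m (entries should sum to 1).
type Dist = [Double]

-- | Row i holds the transition probabilities out of state i.
type Matrix = [[Double]]

-- | One step of the chain: new_j = sum over i of old_i * P(i,j).
stepDist :: Matrix -> Dist -> Dist
stepDist p d = [sum (zipWith (*) d col) | col <- transpose p]

-- | Every row must be a probability distribution over successor states.
rowsStochastic :: Matrix -> Bool
rowsStochastic = all (\row -> all (>= 0) row && abs (sum row - 1) < 1e-9)

-- | Hypothetical two-state chain: state 0 stays or moves with equal
-- probability, state 1 is absorbing.
example :: Matrix
example = [[0.5, 0.5], [0.0, 1.0]]
```

Applying `stepDist example` repeatedly to a distribution that starts entirely in state 0 shifts probability mass toward the absorbing state.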
Bacteria as Markov Chains State 0 State i … State m • non-deter. state machines with probabilistic transitions • induced by the Chapman-Kolmogorov equation • Pi,j in terms of environmental factors, organism state, etc. • executing concurrently
Domain Abstractions for R. Sphaeroides • Individual cells: Markov-chain abstraction choose P1 Action1 … Pn Actionn • Actions: Tumble, Divide, AdjSpeed, Laze, Grow, etc. • Concurrency: cell1 || cell2 • Environmental Factors: light,size
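One way to read `choose` is as a weighted selection among actions. The sketch below is a guess at that behavior, not the CellSys definition: the `Action` constructors follow the slide's list, while the signature of `choose` and the convention that the caller supplies a uniform random draw `u` in [0,1) are assumptions.

```haskell
-- Action names follow the slide's list of bacterial commands.
data Action = Tumble | Divide | AdjSpeed | Laze | Grow | Die
  deriving (Show, Eq)

-- | Pick one alternative with probability proportional to its weight.
-- 'u' is assumed to be a uniform random draw in [0,1) from the caller,
-- which keeps this function pure and easy to test.
choose :: Double -> [(Double, a)] -> Maybe a
choose _ [] = Nothing
choose u alts = go (u * total) alts
  where
    total = sum (map fst alts)
    go _ [] = Nothing
    go _ [(_, a)] = Just a          -- last alternative absorbs rounding error
    go r ((p, a) : rest)
      | r < p     = Just a
      | otherwise = go (r - p) rest
```

Scaling `u` by the total weight means the weights need not sum exactly to 1, which is convenient when they are computed from environmental factors such as light.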
Abstract syntax for CellSys • choose is our principal domain abstraction • behaves like the Markov chain transition function • Cell-level environment variables: light,size
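The grammar on this slide was a figure and is not reproduced here. As a rough idea of what such an abstract syntax could look like in Haskell, the sketch below uses constructor names that are entirely hypothetical; only `choose`, `||`, the action names, and the variables `light` and `size` come from the slides.

```haskell
-- Cell-level environment variables named on the slide.
data EnvVar = Light | Size deriving (Show, Eq)

-- Hypothetical probability expressions over those variables.
data Expr
  = Const Double
  | Var EnvVar
  | Mul Expr Expr
  deriving (Show, Eq)

data Action = Tumble | Divide | AdjSpeed | Laze | Grow | Die
  deriving (Show, Eq)

-- A cell is a 'choose' over probability/action pairs.
newtype Cell = Choose [(Expr, Action)] deriving (Show, Eq)

-- A system is one cell or the parallel composition (||) of two systems.
data Sys = Single Cell | Par Sys Sys deriving (Show, Eq)

-- | Evaluate a probability expression against current readings.
eval :: (EnvVar -> Double) -> Expr -> Double
eval _   (Const c) = c
eval env (Var v)   = env v
eval env (Mul a b) = eval env a * eval env b

-- | Flatten a system into its cells, left to right.
cellsOf :: Sys -> [Cell]
cellsOf (Single c) = [c]
cellsOf (Par a b)  = cellsOf a ++ cellsOf b
```

Making probabilities expressions rather than constants is one way to capture the earlier remark that the probability of growth varies with light concentration.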
DSL Definition • Background: Programming languages are “collections of effects” • Java = OO + Threads + State +… • LISP = Higher-order Functions + … • Prolog = Backtracking + … • Corresponding to each such effect is an algebraic construction called a monad • used for the development of modular semantic theories of programming languages [Moggi89] • monads may be constructed using “monad transformers”
Periodic Table of Effects
Transformer / Effect / Operations
• StateT / imperative / :=
• BackT / backtracking / cut
• ResT / threads / step pause
• EnvT / binding / @ v
• ErrorT / exceptions / raise/catch
• ContT / continuations / callcc
• NondetT / non-determ. / choose
• DebugT / debugging / rollback
• ReactT / reactivity / send,recv,…
• ProbT / probability / random
• Prog. languages are collections of effects captured as monads [Moggi] • Monads assembled from constructors (monad transformers) • Our view: Systems are collections of effects captured as monads • “Systems” broadly construed: • Compilers [Harrison00,98,01,02], • Secure system software [Harrison05,03], and • Biology [Harrison04]
Periodic Table of Effects • Mathematical definitions for any language created by combining monad transformers (MTs) • CellSys = StateT + ResT + ProbT + ReactT • Such definitions are flexible • modular, extensible, and easily refactored
DSL definition similar to traditional RTS • In a traditional run-time system (RTS), threads request services like • “send a message” • “output on device” • “consume resource” • RTS mediates, ensuring that • the threads do not interfere • global system state remains consistent • RTS schedules threads
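The scheduling role of a run-time system can be sketched with a very small resumption-style thread type. This is a simplified stand-in, not the ResT monad transformer itself: `Thread`, `roundRobin`, and `logger` are hypothetical names, and the shared state is passed explicitly rather than through StateT.

```haskell
-- | A thread over shared state 's': either finished, or one atomic step
-- that updates the state and returns the rest of the thread.
data Thread s = Done | Step (s -> (s, Thread s))

-- | Round-robin scheduler: run one step of the first thread, then put
-- its continuation at the back of the queue.
roundRobin :: [Thread s] -> s -> s
roundRobin [] s = s
roundRobin (Done : ts) s = roundRobin ts s
roundRobin (Step f : ts) s =
  let (s', t') = f s
  in roundRobin (ts ++ [t']) s'

-- | Example thread: append a tag to a shared log n times.
logger :: String -> Int -> Thread [String]
logger tag n
  | n <= 0    = Done
  | otherwise = Step (\l -> (l ++ [tag], logger tag (n - 1)))
```

Because each `Step` is atomic, the scheduler alone decides how threads interleave, which is what keeps the shared state consistent in this toy setting.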
High-level view of definition • In CellSys, cells are threads with physical components as well • size, velocity, … • cells request services like • “consume nutrients” • “move me here” • “want to divide” • The Global Environment (GE) mediates like an RTS, and also: • preserves physical integrity • updates global world view • performs scheduling
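A rough sketch of how a global environment might mediate cell requests follows. Everything here is hypothetical: the `Request`, `Reply`, and `World` types, the nutrient pool, and the population cap are invented to illustrate the request/response pattern and the idea of preserving physical integrity; they are not the CellSys definitions.

```haskell
-- Hypothetical services a cell may ask for.
data Request = Consume Double | WantDivide deriving (Show, Eq)

-- Hypothetical answers from the global environment.
data Reply = Nutrients Double | Ok | Denied deriving (Show, Eq)

-- Hypothetical global world view.
data World = World
  { nutrients  :: Double  -- nutrient pool remaining
  , population :: Int     -- number of live cells
  , capacity   :: Int     -- maximum number of cells the world can hold
  } deriving (Show, Eq)

-- | Answer one request and return the updated world. A cell never
-- receives more nutrients than remain, and division is refused once
-- the world is full.
mediate :: Request -> World -> (Reply, World)
mediate (Consume amt) w =
  let given = max 0 (min amt (nutrients w))
  in (Nutrients given, w { nutrients = nutrients w - given })
mediate WantDivide w
  | population w < capacity w = (Ok, w { population = population w + 1 })
  | otherwise                 = (Denied, w)
```

Routing every request through one function gives a single place to enforce invariants, which is the role the slide assigns to the GE.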
DSL Implementation • Because CellSys is defined in terms of monad transformers, it may be implemented directly as a Haskell program • I.e., the monadic language definition may be transcribed “symbol for symbol” into Haskell • The Haskell implementation is easily instrumented to output system “snapshots”: • snapshots are printed in POV (Persistence of Vision) format and converted into MPEG
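To give a feel for the snapshot step, the sketch below renders each cell as a POV-Ray `sphere` primitive. The `CellSnap` record, the fixed green pigment, and the function names are assumptions; the slides do not show the actual output, and a complete scene file would also need a camera and a light source.

```haskell
-- Hypothetical per-cell snapshot: position and radius.
data CellSnap = CellSnap { cx, cy, cz, radius :: Double }

-- | Render one cell as a POV-Ray sphere with a fixed green pigment.
povSphere :: CellSnap -> String
povSphere c = concat
  [ "sphere { <", show (cx c), ", ", show (cy c), ", ", show (cz c)
  , ">, ", show (radius c)
  , " pigment { color rgb <0, 1, 0> } }"
  ]

-- | One line of scene text per cell; camera and lights are omitted.
snapshot :: [CellSnap] -> String
snapshot = unlines . map povSphere
```

Writing one such scene per simulation step and rendering the frames in order would yield the image sequence that is then encoded as video.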
Q: What are appropriate languages for modeling? • Integrate techniques from programming languages • models of concurrency • language semantics • i.e., precise, mathematical language definitions • efficient language implementation • …into special purpose language called a “Domain-Specific Language” • abstractions taken directly from biology • comprehensible by biologists • DSLs and DSL programs • hide technical details irrelevant/uninteresting to biologists • are “tunable” by computer scientist to reflect discovery/refinement • execute to provide “reality check” by biologists
Bioinformatics = Computer Science + Biology • Computer Science: • models of concurrency • efficient implementation • mathematical models of programs • reasoning about programs • Biology: • organism structure & behavior • modeling techniques • cellular automata • systems of PDEs • numerical techniques • Hard Problem: How do you effect a technology transfer from CS to Biology?
Interdisciplinary Process • CellSys (version 1.0) • Biologist evaluates DSL model for accuracy, expressiveness, etc. • feedback/discussion • Language expert refactors language as needed • CellSys (version 2.0)
Summary • Combines: systems biology, domain specific languages, modular monadic semantics • Large body of work providing domain abstractions & models • Comprehensibility, Reusability, & Ease of Use • Precise description of biological phenomena through DSL semantics • * Harrison & Harrison, “Domain Specific Languages for Cellular Interactions” in Proceedings of the International Conference IEEE Engineering in Medicine and Biology, 2004.