Programming Languages for Biology Bor-Yuh Evan Chang November 25, 2003 OSQ Group Meeting
Biological Perspective [figures; image credits: http://www.nocturnalvisions.freeservers.com/page6.html; Matsudaira et al. Molecular Cell Biology 4.0. Freeman, 2000]
Traditional Biological Research [figure labels: Virus Expert, Cell Receptor Expert] • Experiments must focus on a small, specific piece of a system • isolate the variable • feasibility • This has led to an enormous wealth of (detailed) knowledge, but in a fragmented form
Systems Biology • Emerging area of biology • study of the relationships and interactions between biological components • many thousands of molecules interact in a complex series of reactions to perform some function (called a pathway) • e.g., lactose interacting with a receptor triggers a series of actions to create the enzyme capable of breaking it down into a usable form • “pathways” may overlap
Approaching Systems Biology • Need a common language for describing/modeling all components of a system • must be modular, compositional, and provide varying levels of abstraction • Abstraction is an absolute necessity • 1 ribosome (eukaryotic) ≈ 82 proteins + rRNA • 1 protein ≈ hundreds/thousands of amino acids • 1 membrane ≈ thousands of molecules (lipids, proteins, carbohydrates)
The Biologist’s View • How do biologists think about or view biological entities (e.g., proteins)? • an entity can interact with certain other types of entities • an entity can be in a certain “state” • interaction causes some action or state change • Analogous to a system of thousands of concurrent computational processes • Walter Fontana, a theoretical biologist, examined λ-calculus and linear logic for describing biological systems (≈1995).
Example “Textbook” Description http://vcell.ndsu.nodak.edu/~christjo/vcell/animationSite/lacOperon/
Our Role • Finding suitable abstractions for describing computation is our specialty! • Discovering/proving/checking properties of such descriptions (i.e., programs) is also our specialty! • Goal: • Find a mathematical abstraction convenient for describing, reasoning about, and simulating biological systems • DNA → string over the alphabet {A,C,G,T} • enables the use of string comparison algorithms • Cellular Pathways → ?
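To make the DNA-as-string point concrete, here is a minimal sketch of one string comparison, the Hamming distance between two equal-length sequences. The function name `hamming_distance` and the sample sequences are illustrative assumptions, not taken from the talk.

```python
DNA_ALPHABET = set("ACGT")

def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length DNA strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not (set(a) <= DNA_ALPHABET and set(b) <= DNA_ALPHABET):
        raise ValueError("sequences must be over the alphabet {A, C, G, T}")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical sample sequences; they differ at positions 3 and 6.
print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

Once the abstraction is fixed, the biology drops out of the algorithm entirely; the open question on the slide is what plays the same role for pathways.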
Outline • Why is PL at all related to biology? • Previous Abstractions in Biology • Possible Directions of Work • PML • Conclusion
Previous Abstractions • Chemical kinetic models • can derive differential equations • well-studied, with considerable theoretical basis • variables do not directly correspond with biological entities • may become difficult to see how multiple equations relate to each other
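As a rough illustration of the kinetic-model approach, the sketch below integrates mass-action differential equations for an assumed enzyme-substrate scheme (E + S ⇌ ES → E + P) with forward Euler. The scheme, the rate constants, the initial concentrations, and the function name are all assumptions chosen for the example, not values from the talk.

```python
def simulate_enzyme_kinetics(k_on, k_off, k_cat, e0, s0, dt=0.001, steps=10000):
    """Integrate mass-action ODEs for E + S <-> ES -> E + P (forward Euler)."""
    e, s, es, p = e0, s0, 0.0, 0.0
    for _ in range(steps):
        bind = k_on * e * s      # rate of E + S -> ES
        unbind = k_off * es      # rate of ES -> E + S
        cat = k_cat * es         # rate of ES -> E + P
        e += dt * (-bind + unbind + cat)
        s += dt * (-bind + unbind)
        es += dt * (bind - unbind - cat)
        p += dt * cat
    return e, s, es, p

# Hypothetical parameters, in arbitrary units.
e, s, es, p = simulate_enzyme_kinetics(k_on=1.0, k_off=0.5, k_cat=0.3, e0=1.0, s0=10.0)
print(f"E={e:.3f} S={s:.3f} ES={es:.3f} P={p:.3f}")
```

The state here is four concentrations rather than individual molecules with behavior, which is one way to read the slide's remark that the variables do not correspond directly to biological entities.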
Previous Abstractions • Pathway Databases (e.g., EcoCyc, KEGG) • store information in a symbolic form and provide ways to query the database • behavior of biological entities not directly described • Petri nets • directed bipartite multigraph (P,T,E) of places, transitions, and edges; places contain tokens • place = molecular species, token = molecule, transition = reaction
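The Petri net reading (place = molecular species, token = molecule, transition = reaction) can be sketched in a few lines. The class below is a minimal illustration; the class name, method names, and the two example reactions are assumptions made for this sketch.

```python
from collections import Counter

class PetriNet:
    def __init__(self, marking):
        self.marking = Counter(marking)  # place (species) -> token (molecule) count
        self.transitions = {}            # reaction name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self, name):
        """A transition is enabled when every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking[place] >= n for place, n in inputs.items())

    def fire(self, name):
        """Consume the input tokens and produce the output tokens."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)
        self.marking.update(outputs)

# Hypothetical example: one enzyme molecule, two substrate molecules.
net = PetriNet({"E": 1, "S": 2})
net.add_transition("bind", {"E": 1, "S": 1}, {"ES": 1})
net.add_transition("catalyze", {"ES": 1}, {"E": 1, "P": 1})
net.fire("bind")
net.fire("catalyze")
print(dict(net.marking))
```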
Previous Abstractions • Concurrent computational processes • each biological entity is a process that may carry some state and interacts with other processes • each process described by a “program” • prior proposals based on process algebras, such as the π-calculus [Regev et al. ’01]
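The process view can be hinted at with a toy scheduler in which two processes react when one offers a send and the other a receive on the same channel. This is a deliberately reduced sketch: it omits name passing and is closer to CCS-style synchronization than to the π-calculus, and it is not the encoding used by Regev et al. The process names, channel names, and the `step` function are assumptions.

```python
def step(processes):
    """Perform one synchronization; return the channel used, or None if stuck.

    `processes` maps a process name to its remaining list of actions,
    where each action is ("send", channel) or ("recv", channel).
    """
    names = sorted(processes)
    for sender in names:
        for receiver in names:
            if sender == receiver:
                continue
            s_actions, r_actions = processes[sender], processes[receiver]
            if not s_actions or not r_actions:
                continue
            kind, channel = s_actions[0]
            if kind == "send" and r_actions[0] == ("recv", channel):
                # Both processes advance past the matched actions.
                processes[sender] = s_actions[1:]
                processes[receiver] = r_actions[1:]
                return channel
    return None

# Hypothetical two-process system: a substrate binds an enzyme, then is released.
processes = {
    "Enzyme": [("recv", "bind"), ("send", "release")],
    "Substrate": [("send", "bind"), ("recv", "release")],
}
trace = []
while True:
    channel = step(processes)
    if channel is None:
        break
    trace.append(channel)
print(trace)
```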
Possible Directions of Work • Biologically-motivated “process calculi” • finding a suitable machine model to serve as a common basis for describing biological systems • Cardelli, Danos, Laneve, … • High-level languages • find suitable high-level languages to make descriptions closer to informal ones • [Chang and Sridharan ’03] • Program analyses, simulation, and other tools • simulation will likely be insufficient • Creating models for obtaining results in biology
Outline • Why is PL at all related to biology? • Previous Abstractions in Biology • Possible Directions of Work • PML • Conclusion
Modeling in the π-calculus • The π-calculus is concise and compact, yet powerful [Milner ’90] • take this as the underlying machine model • not looking for another machine model • However, it is far too low-level for direct modeling (ad-hoc structuring)
Informal Graphical Diagrams [figure: Protein and Enzyme domains with sites; rules labeled k, k-1, kcat]
PML: Enzyme [code figure: domain Enzyme with site bind_substrate; annotations: parameterized, declared in outer scope, interactions within the complex]
PML: Protein [code figure: domain Protein with sites bind_substrate, bind_product]
Larger Models • Modeled a general description of ER cotranslational translocation • unclearly or incompletely specified aspects became apparent • e.g., can the signal sequence and translocon bind without SRP? Yes [Herskovits and Bibi ’00] • Extended to model targeting to the ER membrane with minor modifications
PML: Summary • Domains • set of mutually dependent binding sites • defines at the lowest level the reactions a biological entity can undergo • Groups • static structure for controlling namespace • may represent a large biological entity • large complex, a system, etc. • [Compartments] • special groups that define boundaries • Semantics defined via a translation to the π-calculus
PML: Summary • Benefits • easier to write and understand because of a more direct biological metaphor • block structure for controlling namespace and modularity • Future Work • naming? • proximity of molecules • integrating quantitative information (reaction rates, etc.) • type-checking PML specifications • exceptional / higher-level specifications • graphical and simulation tools
Conclusion • Systems biology needs a mathematical foundation • languages for describing concurrent computation seem like a step in the right direction • Status: all very preliminary • biologically-motivated process calculi • BioSPI, BioAmbients, Brane Calculus, … • high-level languages • PML • analyses and tools (emerging) • creating models for results in biology (emerging)
Conclusion • Abundance of new challenges for PL • language design: biologically-motivated operators • analysis and simulation: dealing with the scale • … • How much biology does one need to learn to begin?
Compartments • Critical part of biological pathways • prevent interactions that would otherwise occur • Description of the behavior of a molecule should not depend on the compartment • Regev et al. use “private” channels in the π-calculus for both complexing and compartmentalization
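The separation the slide asks for, behavior declared once and placement declared elsewhere, can be sketched as follows. This is not PML and not the π-calculus encoding; the data layout, the assumption that MolA and MolB can bind, and the function name `can_interact` are all hypothetical.

```python
# Behavior is declared once, independent of any compartment.
# Assumption for this sketch: MolA and MolB have complementary binding sites.
CAN_BIND = {frozenset({"MolA", "MolB"})}

def can_interact(a, b, location):
    """Two molecules may react only if they can bind and share a compartment."""
    return frozenset({a, b}) in CAN_BIND and location[a] == location[b]

# Placement is a separate map from molecule to compartment.
same = {"MolA": "Cytosol", "MolB": "Cytosol"}
split = {"MolA": "Cytosol", "MolB": "ER"}
print(can_interact("MolA", "MolB", same))   # prints True
print(can_interact("MolA", "MolB", split))  # prints False
```

Moving a molecule to another compartment changes only the placement map, never the binding rules, which is the property the slide is after.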
PML: Simple Compartments Example [code figure labels: MolA, MolB, bind_a]
PML: Simple Compartments Example [code figure labels: CytERBridge, ER, Cytosol, MolB, MolA]
PML: Simple Compartments Example [diagram labels: MolA, ER, Cytosol, CytERBridge, MolB]
Semantics of PML • Defined in terms of the π-calculus via two translations • from PML to CorePML • “flattens” compartments, removes bridges
Semantics of PML • from CorePML to the π-calculus
Example: Cotranslational Translocation • Ribosome translates mRNA, exposing a signal sequence • Signal sequence attracts SRP, stopping translation • SRP receptor (on ER membrane) attracts SRP • Signal sequence interacts with translocon; SRP dissociates, resuming translation • Signal peptidase cleaves the signal sequence in the ER lumen; Hsc70 chaperones aid in protein folding
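The five steps above can be written down as a small state machine, which makes the implied ordering explicit and checkable. This is only a sketch of the linear textbook description: the state and event names are invented for the example, and it does not capture alternatives such as the signal sequence binding the translocon without SRP.

```python
# (current state, event) -> next state; all names are hypothetical labels.
TRANSITIONS = {
    ("translating", "signal_sequence_exposed"): "awaiting_srp",
    ("awaiting_srp", "srp_binds"): "paused",
    ("paused", "srp_receptor_binds"): "docked_at_er",
    ("docked_at_er", "translocon_binds_srp_released"): "translocating",
    ("translocating", "signal_peptidase_cleaves"): "folding_in_lumen",
}

def run(events, state="translating"):
    """Replay a sequence of events, rejecting any that is out of order."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state

pathway = [
    "signal_sequence_exposed",
    "srp_binds",
    "srp_receptor_binds",
    "translocon_binds_srp_released",
    "signal_peptidase_cleaves",
]
print(run(pathway))  # prints folding_in_lumen
```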