Molecule as Computation • Ehud Shapiro, Weizmann Institute of Science • Joint work with Aviv Regev and Bill Silverman • In collaboration with Corrado Priami, Naama Barkai and Luca Cardelli • www.weizmann.ac.il/udi/ftp/POPL2003.ppt
The Story: Aviv Regev’s thesis • Briefly introduce molecular biology • Computer-based consolidation of molecular biology • Using, implementing and extending the π-calculus to describe biomolecular processes and systems
The Story Behind the Story: The puzzle of concurrent logic programming • 1983: Began design and development of Concurrent Prolog/Logix • 1993: Stopped working on concurrent logic programming • 2003: Could not find a better alternative. Is there?
Pentium II vs. E. coli (comparison courtesy of Eric Winfree) • E. coli: 1 million macromolecules; 1 million bytes of static genetic memory; 1 million amino acids per second • Pentium II: 3 million transistors; 1/4 million bytes of memory; 80 million operations per second
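The "1 million bytes of static genetic memory" figure can be sanity-checked with back-of-the-envelope arithmetic. This assumes a genome of roughly 4.6 million base pairs (the commonly cited size of the E. coli K-12 reference genome, not a number given in the talk) and 2 bits per base, the minimum needed to encode a four-letter alphabet:

```latex
\underbrace{4.6\times 10^{6}}_{\text{base pairs}} \times \underbrace{2}_{\text{bits/base}} \div \underbrace{8}_{\text{bits/byte}} \approx 1.15\times 10^{6}\ \text{bytes}
```

That is on the order of 1 Mbyte, consistent with the comparison.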
[Figure: Pentium II and E. coli shown side by side, each at a 1 micron scale]
Inside E. coli (1 Mbyte)
Ribosomes in operation • Ribosomes translate RNA to proteins • RNA polymerase transcribes DNA to RNA
Sequences and String Transducers • Ribosomes translate RNA to proteins • RNA polymerase transcribes DNA to RNA
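Viewing transcription and translation as string transducers can be sketched in a few lines of Python. This is a simplified illustration, not the talk's model: it assumes the input is the coding (sense) strand read 5'→3', so transcription reduces to replacing T with U, and translation uses the standard genetic code (NCBI translation table 1), starting at the first AUG and stopping at the first stop codon. The names `transcribe`, `translate` and `CODON_TABLE` are our own.

```python
# Standard genetic code (NCBI table 1) as one string: codons are enumerated with
# the base order U, C, A, G at each position (UUU, UUC, UUA, UUG, UCU, ...).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA -> mRNA transducer: the transcript equals the coding strand with T -> U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein transducer: read codons from the first AUG until a stop ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("GCATGTTTGGCTAAGG")))  # MFG
```

Composing the two functions mirrors the biological pipeline: DNA is transduced to RNA, and RNA to a protein string over the amino-acid alphabet.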
Molecular Biology in One Slide • Sequence: Sequence of DNA and Proteins • Structure: 3D Structure of Proteins and other biomolecules and biomolecular complexes • The Rest: Function, activity and interaction of molecular systems in cells
Computer-based consolidation of molecular biology will allow: • Handling the huge amount of accumulated knowledge • An objective knowledge repository • Sharing, comparing, criticizing and correcting accumulated knowledge • Converging to a consensus quickly and effectively
Computers are the means for consolidating sequence biology • Computers are key to sequence identification • Computer databases store accumulated sequence information • Computer algorithms are used for sequence analysis • Computers are used to share, compare, criticize and correct sequence information • The result: scientists converge to a consensus quickly and effectively
Computers are the means for consolidating structural biology • Computers are key to structure identification • Computer databases store accumulated structure information • Computer algorithms are used for structure analysis • Computers are used to share, compare, criticize and correct structure information • The result: scientists converge to a consensus quickly and effectively
Computer-based consolidation of “The Rest” of molecular biology? • Tens of thousands of articles a year about the function, activity and interaction of molecular systems in cells • Knowledge is encapsulated in prose, pictures and diagrams • Where are the computers?
Computer-based consolidation of molecular biology • The deep reason for the difference: the use of good abstractions for sequence and structure knowledge
What is an abstraction? • A mapping from a real-world domain to a mathematical domain (a homomorphism) • Highlights some essential properties while ignoring other, complicating ones
[Figure: Sequence as string abstraction. A DNA strand in the real-world domain (bases A, T, C, G on a 5'→3' phosphate backbone) is mapped to a string in the mathematical domain.]
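To make the "homomorphism" in the definition of abstraction concrete for the sequence case, here is a minimal Python sketch. The `Nucleotide` record, its `methylated` field and the `ligate` and `abstract` functions are hypothetical stand-ins for the real-world domain, not part of the talk; the point is only that the map to strings preserves the joining operation (ligation becomes string concatenation) while discarding other properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleotide:
    base: str                 # "A", "C", "G" or "T": the property the abstraction keeps
    methylated: bool = False  # a biochemical detail the abstraction deliberately ignores


def ligate(x: tuple, y: tuple) -> tuple:
    """Idealized real-world operation: join strand y onto the 3' end of strand x."""
    return x + y


def abstract(strand: tuple) -> str:
    """The abstraction map: a 5'->3' strand of nucleotides becomes a string."""
    return "".join(n.base for n in strand)


x = (Nucleotide("T"), Nucleotide("C", methylated=True), Nucleotide("A"))
y = (Nucleotide("G"), Nucleotide("G"))

# Homomorphism: ligating then abstracting equals abstracting then concatenating.
assert abstract(ligate(x, y)) == abstract(x) + abstract(y) == "TCAGG"

# Ignored detail: strands differing only in methylation map to the same string.
x_plain = (Nucleotide("T"), Nucleotide("C"), Nucleotide("A"))
assert x != x_plain and abstract(x) == abstract(x_plain)
```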
Sequence biology uses the “DNA-as-string” abstraction • Relevant: captures sequence information, ignoring many biochemical properties • Computable: enables string algorithms and efficient databases • Understandable: a string over {A, T, C, G} is the universal format for genetic information • Extensible: e.g., adding a fifth symbol to denote methylated cytosine
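The "computable" and "extensible" points can be illustrated with two ordinary string operations: computing the reverse complement (the opposite strand) and finding motif occurrences. The symbol "M" for methylated cytosine is a hypothetical choice for this sketch (the slide names no symbol), and we assume it pairs with G just like an unmodified C.

```python
# Watson-Crick complements for the four-letter DNA alphabet.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical extension: "M" denotes methylated cytosine, which still pairs with G.
EXTENDED_COMPLEMENT = {**COMPLEMENT, "M": "G"}


def reverse_complement(seq: str, complement: dict = COMPLEMENT) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return "".join(complement[base] for base in reversed(seq))


def find_all(seq: str, pattern: str) -> list:
    """Start positions of every (possibly overlapping) occurrence of pattern in seq."""
    return [i for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i)]


print(reverse_complement("ATCAGG"))                    # CCTGAT
print(reverse_complement("AMGT", EXTENDED_COMPLEMENT))  # ACGT
print(find_all("ACGTACGT", "CG"))                       # [1, 5]
```

Adding the fifth symbol only changes the complement table; `find_all` works on the extended alphabet unchanged, which is the sense in which the string abstraction is extensible.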
Structural biology uses the “Protein-as-3D-labeled graph” abstraction
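A "3D labeled graph" can be read in several ways; one minimal sketch, under our own assumptions, treats atoms as nodes labeled with atom and residue names plus 3D coordinates (in angstroms), and covalent bonds as edges. The class and method names are hypothetical, and the coordinates in the usage lines are toy values, not a real structure. A typical query on such a graph is finding non-bonded atoms that lie close together in space.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str                         # node label, e.g. "CA" for an alpha carbon
    residue: str                      # residue label, e.g. "GLY"
    xyz: tuple[float, float, float]   # 3D position in angstroms


@dataclass
class StructureGraph:
    atoms: list[Atom] = field(default_factory=list)
    bonds: set[tuple[int, int]] = field(default_factory=set)  # undirected edges (i < j)

    def add_atom(self, atom: Atom) -> int:
        """Add a node and return its index."""
        self.atoms.append(atom)
        return len(self.atoms) - 1

    def add_bond(self, i: int, j: int) -> None:
        """Add an undirected covalent-bond edge between atoms i and j."""
        self.bonds.add((min(i, j), max(i, j)))

    def contacts(self, cutoff: float) -> list[tuple[int, int]]:
        """Non-bonded atom pairs whose distance is below `cutoff`."""
        pairs = []
        for i in range(len(self.atoms)):
            for j in range(i + 1, len(self.atoms)):
                if (i, j) in self.bonds:
                    continue
                if math.dist(self.atoms[i].xyz, self.atoms[j].xyz) < cutoff:
                    pairs.append((i, j))
        return pairs


g = StructureGraph()
n = g.add_atom(Atom("N", "GLY", (0.0, 0.0, 0.0)))
ca = g.add_atom(Atom("CA", "GLY", (1.5, 0.0, 0.0)))
o = g.add_atom(Atom("O", "ASP", (0.0, 3.0, 0.0)))
g.add_bond(n, ca)
print(g.contacts(cutoff=3.5))  # [(0, 2), (1, 2)]
```

The graph part (labels and bonds) captures chemistry, while the coordinates capture geometry; queries like `contacts` need both.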
What about “The Rest” of biology: the function, activity and interaction of molecular systems in cells?
Essential properties of biomolecular systems [Figure: proteins A, B and C, each composed of domains (A1, A2, A3; B1, B2; C1, C2) that carry motifs]
Essential properties of biomolecular systems (continued) [Figure: a sequence of events among the domains of A, B and C: binding, modification, dissociation, then binding again]
The “New Biology” • The cell as an information processing device • Cellular information processing and passing are carried out by networks of interacting molecules • Ultimate understanding of the cell requires an information processing model • Which?
“We have no real ‘algebra’ for describing regulatory circuits across different systems...” - T. F. Smith (TIG 14:291-293, 1998) “The data are accumulating and the computers are humming, what we are lacking are the words, the grammar and the syntax of a new language…” - D. Bray (TIBS 22:325-326, 1997)
Our Proposal: Molecule as Computational Process • A system of interacting molecular entities is described and modelled by a system of interacting computational entities. • “Cellular Abstractions: Cells as Computation”, Nature, September 26th, 2002, p.343
Composition of two processes is a process, therefore: • Molecular ensembles as processes • Molecular networks as processes • Cells as processes (virtual cell) • Multi-cellular organisms as processes • Collections of organisms as processes
Towards “Molecule as Process” • Use the π-calculus process algebra as a molecule description language
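The core mechanism of a π-calculus description can be sketched with a toy interpreter for a tiny fragment of the calculus, written here in an ASCII notation: output `x!<y>.P`, input `x?(z).Q`, the inert process 0, and top-level parallel composition, with the single reduction rule `x!<y>.P | x?(z).Q -> P | Q{y/z}`. This is our simplification, not the talk's implementation: there is no restriction (ν), choice or replication, the first available redex is taken instead of a nondeterministic one, and bound names are assumed distinct from free names so substitution needs no renaming. The example is one possible encoding of binding as name passing: molecule A offers a fresh "bond" name on a shared "bind" channel, and molecule B, once bound, uses that private name to pass a "phospho" signal. All process and channel names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Nil:
    """The inert process 0."""


@dataclass(frozen=True)
class Send:
    """x!<y>.P: send name `msg` on channel `chan`, then continue as `cont`."""
    chan: str
    msg: str
    cont: Proc


@dataclass(frozen=True)
class Recv:
    """x?(z).P: receive a name on `chan`, bind it to `var` in `cont`."""
    chan: str
    var: str
    cont: Proc


Proc = Union[Nil, Send, Recv]


def subst(p: Proc, old: str, new: str) -> Proc:
    """P{new/old}: replace free occurrences of `old` (assumes no name capture)."""
    def rename(name: str) -> str:
        return new if name == old else name

    if isinstance(p, Nil):
        return p
    if isinstance(p, Send):
        return Send(rename(p.chan), rename(p.msg), subst(p.cont, old, new))
    if p.var == old:  # `old` is rebound by this input, so it is not free in cont
        return Recv(rename(p.chan), p.var, p.cont)
    return Recv(rename(p.chan), p.var, subst(p.cont, old, new))


def step(soup: list[Proc]) -> Optional[list[Proc]]:
    """One communication on a shared channel; None if no sender/receiver pair matches."""
    for i, s in enumerate(soup):
        if not isinstance(s, Send):
            continue
        for j, r in enumerate(soup):
            if isinstance(r, Recv) and r.chan == s.chan:
                rest = [p for k, p in enumerate(soup) if k not in (i, j)]
                after = rest + [s.cont, subst(r.cont, r.var, s.msg)]
                return [p for p in after if not isinstance(p, Nil)]  # drop inert 0s
    return None


def run(soup: list[Proc], max_steps: int = 100) -> list[Proc]:
    """Reduce until no communication is possible (or max_steps is reached)."""
    for _ in range(max_steps):
        nxt = step(soup)
        if nxt is None:
            break
        soup = nxt
    return soup


# Molecule A: offer bond name "bond1" on "bind", await a signal over it, report it on "out".
A = Send("bind", "bond1", Recv("bond1", "sig", Send("out", "sig", Nil())))
# Molecule B: bind (receive the bond name as b), then send "phospho" over the bond.
B = Recv("bind", "b", Send("b", "phospho", Nil()))

print(run([A, B]))  # [Send(chan='out', msg='phospho', cont=Nil())]
```

Because a system here is just a list of processes running in parallel, composing two systems is list concatenation, and the result is again something `run` accepts, echoing the point that the composition of two processes is a process.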