1 / 42

Biosimplicity: Engineering Simple Life

Biosimplicity: Engineering Simple Life. Tom Knight MIT Computer Science and Artificial Intelligence Laboratory. Simplicity is the ultimate sophistication -- Leonardo da Vinci. Measuring Complexity. Number of components Size of description Size of functional model.

fola
Download Presentation

Biosimplicity: Engineering Simple Life

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Biosimplicity: Engineering Simple Life Tom Knight MIT Computer Science and Artificial Intelligence Laboratory Simplicity is the ultimate sophistication -- Leonardo da Vinci

  2. Measuring Complexity • Number of components • Size of description • Size of functional model Have you ever thought... about whatever man builds, that all of man's industrial efforts, all his calculations and computations, all the nights spent over working draughts and blueprints, invariably culminate in the production of a thing whose sole and guiding principle is the ultimate principle of simplicity? In any thing at all, perfection is finally attained, not when there is no longer anything to add, but when there is no longer anything to take away. -- Antoine de Sainte Exupery, in "Wind, Sand, and Stars"

  3. Engineered Simple Organisms • modular • understood • malleable • low complexity • Start with a simple existing organism • Remove structure until failure • Rationalize the infrastructure • Learn new biology along the way The chassis and power supply for our computing

  4. Some history… • Confusion over what PPLO/Mycoplasma were • “The Microbe of pleuorpneumonia” Nocard 1896 • 1932 isolation of “PPLO.” Koch postulates. • 1958 Klieneberger-Nobel identifies them as free living bacterial species • Morowitz 1962 SciAm: “the smallest living cell” • 1980 Gilbert effort to sequence M. capricolum • 1982 Morowitz “complete understanding of life” • 1996 Fraser et al. M. genitalium sequence • 1999 Hutchison et al. Minimal genome set for M. genitalium

  5. Relative Complexity Mycoplasma genitalium (580 kB) Mesoplasma florum (793 kB) S. Cerevisiae (12 MB) T7 Phage (36 kB) Human (3.3 GB) E. Coli (4.6 MB) Lilly (12 GB) Plasmid Gene 100 1K 10K 100K 10M 100M 1G 10G 100G 1M Alive Autotroph Log Genome Size, base pairs

  6. From Giovannoni et al. 2005

  7. Choosing an organism • Safe • BSL-1 organism -- insect commensal • Un-regulated • Not a crop plant or domesticated animal pathogen • Fast growing • 40 minute doubling time • vs. six hours for M. genitalium • Convenient to work with • Facultative anaerobe • Small genome • Known sequence • Complete annotation

  8. The Mollicute Bibliome • Complete collection of mycoplasma related papers: • 6,418 and counting • All books and book chapters also • Endnote & Refworks • Downloaded .pdfs for articles > 1995 • Scanned articles and books, OCR with Abbyy Finereader • Plans for a Google appliance search engine • Collaboration for “shallow semantic” understanding • people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso

  9. Mesoplasma florum 1 u

  10. Tomographic TEM Courtesy Jensen Lab, Caltech

  11. Culture Medium 1161 • Beef Heart infusion • 4% Sucrose • Fresh yeast culture broth • 20% horse serum • Penicillin • Phenol red

  12. Defined medium Amino acids (minus asp, glu) Guanine Uracil Thymine Adenine Glucose BSA Palmitic acid Oleic acid Sodium phosphate KCl Magnesium sulfate Glycerol Spermine Nicotinic acid Thiamine Riboflavin Pyridoxamine Thioctic acid Coenzyme A

  13. Synthesis vs. Import • Mycoplasma import virtually all small biochemical molecules • Each import is done with a specific membrane protein – some are capable of importing a class • Complexity is reduced if the import is simpler than the synthesis • Example of the opposite: • Glutamine  Glutamic acid • Asparagine  Aspartic acid

  14. Sequencing the genome • First sequenced small portions of the genome to test that we had the correct species • Compared the results to Genbank entries • Sequenced PTS system gene, identical to reported sequence • Sequenced 16S rRNA (unreported) • Discovered identical to Mesoplasma entomophilum 16S rRNA sequence – probably the same species • Genbank entry • Measured genome size with pulse field gels • Sequenced 12% of genome to see what we were up against

  15. PFGE of Mesoplasmagenomic DNA y yeast marker l lambda marker me Mesoplasma entomophilum mf Mesoplasma florum ml Mesoplasma lactucae 1.2% agarose 9C 6V/cm Ramped 90 - 120 sec 48 hours

  16. I-CeuI digestion • Special restriction enzymes cut only at 23S rRNA sites 5´ ..TAACTATAACGGTC CTAA^GGTAGCGA..3´ 3´ ..ATTGATATTGCCAG^GATT CCATCGCT..5´ Calculate the number of rRNA sites in the genome from the number of cut fragments

  17. I-CeuI digests

  18. rRNA sequences • Degenerate primers • The two rRNA sequences are different:

  19. Library creation • Randomly cut genomic DNA with EcoRI • Shotgun cloned into pUC18 vector • Sequenced the inserts (0.1 – 8 Kb) • Sheared the genomic DNA with a needle • End repaired • Cloned into defective lambda phage vector • Packaged vector into phage heads • Infected E. coli cells with phage • ~ 40 Kb inserts

  20. What we learned from partial sequence • Almost all “old friends” • Little or no extra junk • Inter-gene sequences small (-4 to 30 bp) • Little transcriptional control • High AT vs. GC content – 27% average GC • 20% in typical genes, even less in control regions • 40% in rRNA regions

  21. Origin of Replication

  22. Whitehead Agreement • Whitehead agreed in January 2002 to sequence the organism • Estimated to take about two hours of time on their sequencers • “Sure, we can do it Tom, but what do we do with the rest of the day after the coffee break?” • “How many other organisms like this are there?” • 300 • “Why don’t we sequence them all?” • Good draft available November 2002 • Gaps closed July 2003 -- final November 2003

  23. Gap closure • 9 gaps remained • Long range PCR • Primer walking • One difficult sequence • Poly A region 16-17 bp long • Sequencing stuttered • Reprime with aaaaaaaaaaaaaaaag • Repeat region 186 bp in surface lipoprotein • Give up on accurate sequence, PCR for length • Final assembly verification

  24. Genome characteristics • 793281 base pairs • 26.52% G + C • 682 protein coding regions • UGA for tryptophan • No CGG codon or corresponding tRNA • Classic circular genome • oriC, terminator region, gene orientation • 39 stable RNAs • 29 tRNAs • 2x 16S, 23S, 5S • RNAse-P, tmRNA, SRP

  25. Standard Motifs • -10 present usually very highly conserved • Often preceded by a “TG” 1-2 bp upstream • Seldom a conserved -35 region • RBS is standard Shine-Dalgarno • Alternate RBS matches complementary region of 16S rRNA UAACAACAU (Loechel 91) • Standard stem-loop terminators with loop TTAA • 6-8 poly T tail in forward direction • dnaA box TTATCCACA • Four ribo-box sequences (Thi, Ile, Val, Guanine)

  26. Understand the metabolism • Identify major metabolic pathways by finding critical genes coding for known enzymes • Predict necessary enzymes which may not have been found • Evaluate the list of unknown function genes for candidates • Build the major metabolic pathway map of the organism • Consider elimination of entire pathways

  27. G. Fournier 02/23/04 PTS II System Mfl519, Mfl565 sucrose trehalose xylose beta-glucoside glucose unknown ribose ABC transporter Mfl516, Mfl527, Mfl187 Mfl500 Mfl669 Mfl009, Mfl033, Mfl318, Mfl312 fructose Mfl214, Mfl187 Mfl619, Mfl431, Mfl426 ATP Synthase Complex Mfl181 Mfl497 Mfl515, Mfl526 Mfl499 Mfl317?, Mfl313? Mfl009, Mfl011, Mfl012, Mfl425, Mfl615, Mfl034, Mfl617, Mfl430, Mfl313? ? Mfl109, Mfl110, Mfl111, Mfl112, Mfl113, Mfl114, Mfl115, Mfl116 Mfl666, Mfl667, Mfl668 glucose-6-phosphate chitin degradation Mfl347, Mfl558 ATP ADP sn-glycerol-3-phosphate ABC transporter Pentose-Phosphate Pathway Glycolysis Mfl023, Mfl024, Mfl025, Mfl026 L-lactate, acetate Mfl223, Mfl640, Mfl642, Mfl105, Mfl349 glyceraldehyde-3-phosphate Mfl254, Mfl180, Mfl514, Mfl174, Mfl644, Mfl200, Mfl504, Mfl578, Mfl577, Mfl502, Mfl120, Mfl468, Mfl175, Mfl259 Mfl039, Mfl040, Mfl041, Mfl042, Mfl043, Mfl044, Mfl596, Mfl281 Lipid Synthesis unknown substrate transporters Mfl384, Mfl593, Mfl046, Mfl052 fatty acid/lipid transporter ribose-5-phosphate acetyl-CoA Mfl230, Mfl382, Mfl286, Mfl663, Mfl465, Mfl626 Mfl590, Mfl591 Mfl099, Mfl474,Mfl315, Mfl325,Mfl482 x13+ PRPP cardiolipin/ phospholipids membrane synthesis Purine/Pyrimidine Salvage phospholipid membrane Identified Metabolic Pathways in Mesoplasma florum Mfl074, Mfl075, Mfl276, Mfl665, Mfl463, Mfl144, Mfl342, Mfl343, Mfl170, Mfl195, Mfl372 Mfl419, Mfl676, Mfl635, Mfl119, Mfl107, Mfl679, Mfl306, Mfl648, Mfl143, Mfl466, Mfl198, Mfl556, Mfl385 Mfl076, Mfl121, Mfl639, Mfl528, Mfl530, Mfl529, Mfl547, Mfl375 niacin? Mfl063, Mfl065, Mfl038, Mfl388 xanthine/uracil permease Pyridine Nucleotide Cycling variable surface lipoproteins Mfl413, Mfl658 Mfl444, Mfl446, Mfl451 Mfl340, Mfl373, Mfl521, Mfl588 Mfl583, Mfl288, Mfl002, Mfl678, Mfl675, Mfl582, Mfl055, Mfl328 Mfl150, Mfl598, Mfl597, Mfl270, Mfl649 hypothetical lipoproteins DNA Polymerase RNA Polymerase x22 competence/ DNA transport Mfl047, Mfl048, Mfl475 Electron Carrier Pathways DNA RNA K+, Na+ transporter Mfl027, Mfl369 Flavin Synthesis Mfl064, Mfl178 Nfl289, Mfl037, Mfl653, Mfl193 NAD+ Mfl165, Mfl166 ribosomal RNA transfer RNA degradation FMN, FAD Mfl193 Mfl563, Mfl548, Mfl088, Mfl258, Mfl329, Mfl374, Mfl541, Mfl005, Mfl647, Mfl231, Mfl209 Mfl029, Mfl412, Mfl540, Mfl014, Mfl196,Mfl156, Mfl282, Mfl387, Mfl682, Mfl673, Mfl077, rnpRNA Mfl283, Mfl334 malate transporter? hypothetical transmembrane proteins NADP Mfl378 x57 Ribosome metal ion transporter Signal Recognition Particle (SRP) riboflavin? Mfl356, Mfl496, Mfl217 messenger RNA tRNA aminoacylation NADPH NADH protein secretion (ftsY) srpRNA, Mfl479 23sRNA, 16sRNA, 5sRNA, Mfl122, Mfl149, Mfl624, Mfl148, Mfl136, Mfl284, Mfl542, Mfl132, Mfl082, Mfl127, Mfl561, Mfl368.1, Mfl362.1, Mfl129, Mfl586, Mfl140, Mfl080, Mfl623, Mfl137, Mfl492, Mfl406 Mfl608, Mfl602, Mfl609, Mfl493, Mfl133, Mfl141, Mfl130, Mfl151, Mfl139, Mfl539, Mfl126, Mfl190, Mfl441, Mfl128, Mfl125, Mfl134, Mfl439, Mfl227, Mfl131, Mfl123, Mfl638, Mfl396, Mfl089, Mfl380, Mfl682.1, Mfl189, Mfl147, Mfl124, Mfl135, Mfl138, Mfl601, Mfl083, Mfl294, Mfl440? cobalt ABC transporter Mfl237 Mfl152, Mfl153, Mfl154 proteins Formyl-THF Synthesis Export Mfl613, Mfl554, Mfl480, Mfl087, Mfl651, Mfl268, Mfl366, Mfl389, Mfl490, Mfl030, Mfl036, Mfl399, Mfl398, Mfl589, Mfl017, Mfl476, Mfl177, Mfl192, Mfl587, Mfl355 Mfl086, Mfl162, Mfl163, Mfl161 phosphonate ABC transporter met-tRNA formylation Mfl571, Mfl572 Mfl060, Mfl167, Mfl383, Mfl250 protein translocation complex (Sec) Mfl057, Mfl068, Mfl142,Mfl090, Mfl275 Mfl409, Mfl569 phosphate ABC transporter Mfl233, Mfl234, Mfl235 degradation THF? Mfl186 formate/nitrate transporter amino acids intraconversion? Mfl094, Mfl095, Mfl096, Mfl097, Mfl098 Mfl418, Mfl404, Mfl241, Mfl287, Mfl659, Mfl263, Mfl402, Mfl484, Mfl494, Mfl210, tmRNA Mfl509, Mfl510, Mfl511 spermidine/putrescine ABC transporter oligopeptide ABC transporter Mfl016, Mfl664 putrescine/ornithine APC transporter Mfl015 Mfl182, Mfl183, Mfl184 Mfl019 Mfl605 arginine/ornithine antiporter Mfl557 Mfl652 unknown amino acid ABC transporter glutamine ABC transporter lysine APC transporter alanine/Na+ symporter glutamate/Na+ symporter Amino Acid Transport

  28. How Simple is this? • Missing cell wall, outer membrane • Missing TCA cycle • Missing amino acid synthesis • Missing fatty acid synthesis • One sigma factor • Small number of dna binding proteins • One insertion sequence, probably not active • One restriction system (Sau3AI-like) • CTG/CAG methylation (function?) • Evidence for shared protein function • MDH/LDH (Pollack 97 Crit rev microbiol 23:269)

  29. Minimal is not always simple • Shared function of parts • Overlapped genes • Tradeoff of import vs. synthesis • Example: • Television set design • Shared deflection coil, high voltage power supply, isolated filament supply • Three functions, a single circuit, a difficult engineering, modeling, debugging, and repair task • how many genes have multiple functions

  30. DNA Methylation • Bisulfite conversion of genomic DNA • Sequencing of converted DNA to identify methylated C positions • Results: GATC sequences, as expected • Unexpected: CAG and CTG sequences

  31. Current work • Array experiments • Close species • Transcriptional units, pseudo-genes • TRASH • Protein species by LC/LC/MS/MS • Elimination of the restriction system • Plasmid system • pBG7AU based • Recombination system • Positive/negative selection • Yeast chromosome transfer • Genome edits to reduce size • Genome edits to modularize • Genome edits to eliminate complexity • Use as a construction chassis

  32. Reconstruct the Genome • Use recombination techniques to edit the genome • Eliminate unnecessary genes • Remove overlaps • Standardize promoters, ribosomal binding sites • Identify transcriptional and translational regulators • Recode proteins to use a reduced portion of the coding space The code is 4 billion years old, it’s time for a rewrite

  33. Bring up YAC technology Spheroplasts, old YAC plasmid sequencing Triple transform with MF chromosome and pRML1 (Spencer 92) pRML2 Genome inactive except for ARS, telomeres, selection markers Use yeast recombination systems for genome editing Isolate and recircularize YAC to form a new genome Lipid encapsulate genome into vesicles Fuse vesicles with genome-killed wild type cells YAC mutagenesis

  34. Collaboration with Steve Tannenbaum / Yingwu Wang 2-D gels + MS spot ID LC/LC/MS/MS ID of trypsin digests Proteome

  35. Riboswitch Analysis • Collaboration with Ron Breaker / Adam Roth • Discovery of unique riboswitches specific for GTP rather than dGTP • Found in no other sequenced genomes • Analysis of close relatives under way

  36. Engineer plasmids • No known plasmids for this class of organism • Renaudin has made plasmids using the chromosomal OriC as the replicative element • Lartigue 03 • M. mycoides has pADB201 and pBG7AU rolling circle plasmids similar to pE194 (1082 bp) • We know the antibiotic sensitivities and have working resistance genes

  37. Kit Part the genome • Make Biobrick parts from each gene, tRNA, promoter, other part-like genome element • Attempt to develop techniques for recombining parts into coherent modules • Develop techniques for assembling and modeling the resulting structures

  38. Harold Morowitz Greg Fournier Gail Gasparich Bob Whitcomb Eric Lander Bruce Birren Nicole Stange-Thomann George Church Roger Brent Grant Jensen Yingwu Wang Nick Papadakis Ron Weiss Drew Endy Randy Rettberg Austin Che Reshma Shetty MIT Synthetic biology working group DARPA, NTT, NSF, Microsoft Thanks to…

  39. Synthetic Biology • An alternative to understanding complexity is to remove it • This complements rather than replaces standard approaches • Engineering synthetic constructs will be easier • Enabling quicker more facile experiments • Enabling deeper understanding of the basic mechanisms • Enabling applications in nanotechnology, medicine and agriculture Simplicity is the ultimate sophistication -- Leonardo da Vinci

More Related