Life with four billion atoms Tom Knight Ginkgo Bioworks
Energy in the 1800s The steam engine has given more to science than science has given to the steam engine. --- Lord Kelvin
Information in the 1900s
Catabolism Anabolism Natural Complexity (Food) Specified Complexity (Organisms) Design Information (Genome) Core of simple universal parts (central metabolites) (Energy carriers)
Some history… • Confusion over what PPLO/Mycoplasma were • “The Microbe of pleuropneumonia” Nocard 1896 • 1932 isolation of “PPLO” Koch postulates. • 1958 Klieneberger-Nobel: free living bacterial species • Morowitz 1962 SciAm: “the smallest living cell” • 1980 Gilbert effort to sequence M. capricolum • 1982 Morowitz “complete understanding of life” • 1996 Fraser et al. M. genitalium sequence • 1999 Hutchison et al. Minimal genome set for M. genitalium • 2009 Gibson & Lartigue: Genome transplantation • 2012 Serrano et al. Comprehensive model
Complexity, minimality & simplicity • Complex systems have many parts • Reducing the part count generally leads to a simpler system (minimal part count) • Extreme part count reduction leads to shared parts • These systems are ironically less simple • There is an optimal part count for modular design • Conservation of complexity • Stratified design • Structure • function
Complexity Reduction • Abstraction layers, top to bottom: user; application software; operating system, user interface; programming language; instruction set architecture; virtual machine; computer hardware design; functional computing units; logic synthesis; logic gates; circuit design; transistors; mask geometry; fabrication technologies; semiconductor physics; quantum physics • Parts available at successive layers: 100’s of OS calls; 100 statements; 100’s of instructions; 10’s of units; 10’s of gate types; 4 types of transistors; 15 mask layers; 6 materials
Complexity Reduction • Good News: Biology is modular and abstract • Evolution needs modular design as much as we do • We can discover the modular designs, modify them, and use them
Engineered Simple Organisms • modular • understood • malleable • low complexity • Start with a simple existing organism • Remove structure until failure • Rationalize the infrastructure • Learn new biology along the way The chassis and power supply for our computing
Relative Complexity [Figure: organisms placed on a log genome size axis (base pairs), ticks from 100 to 100G. Points: Gene; Plasmid; T7 Phage (36 kB); Mycoplasma genitalium (580 kB); Mesoplasma florum (793 kB); E. coli (4.6 MB); S. cerevisiae (12 MB); Human (3.3 GB); Lily (12 GB). Regions marked: Alive, Autotroph.]
Choosing an organism: Mesoplasma florum • Isolated from the flower of a lemon tree, Florida (McCoy84) • Safe BSL-1 organism, an insect commensal • Not a human, plant, or animal pathogen • No growth at 37 °C • Fast growing • 40-minute doubling time vs. six hours for M. genitalium • Convenient to work with • Facultative anaerobe • Small genome: • 793,244 bp • 682 coding regions
Tomographic EM of Me. florum Grant Jensen – Caltech 3-D TEM image of Mesoplasma florum Reconstructed from angled TEM images 300 x 400 nm 6 nm membranes 5 nm ribosomes False colored DNA
How many atoms? • Cell diameter is about 400 nm • Approximately 2000 atoms in diameter • About four billion atoms • 70% of these are in water molecules • 1.2 billion atoms in biomolecules • DNA is about 40 million atoms, roughly 3% of the biomolecule atoms
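The arithmetic behind these figures can be checked with a short sketch. It assumes a spherical cell and an atom spacing of 0.2 nm (an assumption chosen so that 400 nm corresponds to about 2000 atoms across); the 70% water fraction and the 40 million DNA atoms are taken from the slide.

```python
import math

# Figures from the slide
cell_diameter_nm = 400
water_fraction = 0.70
dna_atoms = 40e6

# Assumption: about 0.2 nm per atom, which gives ~2000 atoms across the cell
atom_spacing_nm = 0.2
atoms_across = cell_diameter_nm / atom_spacing_nm

# Volume of a sphere, measured in "atom-sized" units
total_atoms = (4 / 3) * math.pi * (atoms_across / 2) ** 3

# Atoms that are not in water molecules
biomolecule_atoms = total_atoms * (1 - water_fraction)

# DNA as a share of the biomolecule atoms
dna_share = dna_atoms / biomolecule_atoms

print(f"total atoms:       {total_atoms:.2e}")
print(f"biomolecule atoms: {biomolecule_atoms:.2e}")
print(f"DNA share:         {dna_share:.1%}")
```

Under these assumptions the total comes out near four billion, with a little over a billion atoms in biomolecules, consistent with the rounded numbers above.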
Genome characteristics • 793,281 base pairs • 26.52% G + C • 682 protein coding regions • UGA for tryptophan • No CGG codon or corresponding tRNA • Classic circular genome • oriC, terminator region, gene orientation • 39 stable RNAs • 29 tRNAs • 2x 16S, 23S, 5S • RNase P, tmRNA, SRP • One inactive insertion sequence • Gene direction largely oriented with replication fork
Understand the metabolism • Identify major metabolic pathways by finding critical genes coding for known enzymes • Predict necessary enzymes which may not have been found • Evaluate the list of unknown function genes for candidates • Build the major metabolic pathway map of the organism • Consider elimination of entire pathways
[Figure: Identified Metabolic Pathways in Mesoplasma florum (G. Fournier, 02/23/04). Map of annotated Mfl genes grouped by function. Sugar import: PTS II system and ABC transporters (sucrose, trehalose, xylose, beta-glucoside, glucose, fructose, ribose), chitin degradation. Central metabolism: glycolysis, pentose-phosphate pathway, ATP synthase complex, L-lactate, acetate. Synthesis and salvage: lipid synthesis, cardiolipin/phospholipid membrane synthesis, purine/pyrimidine salvage, pyridine nucleotide cycling, flavin synthesis, formyl-THF synthesis, electron carrier pathways, amino acid interconversion (?). Information processing: DNA polymerase, RNA polymerase, ribosome, tRNA aminoacylation, met-tRNA formylation, RNA degradation, competence/DNA transport. Export: signal recognition particle (SRP), protein secretion (ftsY), protein translocation complex (Sec). Transport: K+/Na+, metal ion, cobalt, phosphate, phosphonate, sn-glycerol-3-phosphate, fatty acid/lipid, xanthine/uracil, formate/nitrate, malate (?), spermidine/putrescine, oligopeptide, and amino acid transporters (arginine/ornithine, putrescine/ornithine, glutamine, lysine, alanine/Na+, glutamate/Na+, unknown). Also shown: variable surface lipoproteins, hypothetical lipoproteins, hypothetical transmembrane proteins, unknown substrate transporters.]
How Simple is this? • Missing cell wall, outer membrane • Missing TCA cycle • Missing amino acid synthesis • Missing fatty acid synthesis • One sigma factor • Small number of DNA-binding proteins • One insertion sequence, probably not active • One restriction system (Sau3AI-like) • CTG/CAG methylation (function?) • Evidence for shared protein function • MDH/LDH (Pollack 97 Crit Rev Microbiol 23:269)
Collaboration with Steve Tannenbaum / Yingwu Wang 2-D gels + MS spot ID LC/LC/MS/MS ID of trypsin digests Proteome
Proteome Results • 180 spots picked and analyzed • MudPIT LC/LC/MS/MS also carried out • 369 proteins identified by trypsin digestion and mass spec out of 682 annotated coding regions • Transcription of 16S ribosomal RNA • Stops don’t always stop • Instead they cause frame shifts into other frames
Transposome insertions • Engineered tetM tetracycline resistance gene • Promoter from Tn4001 tetM gene • Outward directed primers for insertion site verification • Unique I-SceI cut site • In vitro binding of Tn5 transposase • Electroporation of Tn5 transposome • Selection with tetracycline • Genomic DNA prep • Cut with MboI frequent cutter & religate • PCR with outward directed primers • Sequence to identify insertion site • Locate disrupted genes • Alternatively: directly sequence from genomic DNA
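The "locate disrupted genes" step can be sketched as an interval lookup once the insertion coordinate is known from sequencing. This is a minimal sketch, not the pipeline used in the talk: the gene names and coordinates below are hypothetical placeholders, and the annotation is assumed to be sorted and non-overlapping.

```python
from bisect import bisect_right

# Hypothetical annotation: (start, end, name), 1-based inclusive coordinates,
# sorted by start and assumed not to overlap.
GENES = [
    (100, 400, "geneA"),
    (450, 900, "geneB"),
    (1200, 1500, "geneC"),
]
STARTS = [start for start, _, _ in GENES]


def disrupted_gene(position):
    """Return the name of the gene containing `position`, or None if intergenic."""
    # Index of the last gene whose start is <= position
    i = bisect_right(STARTS, position) - 1
    if i >= 0:
        start, end, name = GENES[i]
        if start <= position <= end:
            return name
    return None


for pos in (50, 450, 1000, 1500):
    print(pos, disrupted_gene(pos))
```

A binary search keeps each lookup cheap, which matters little for a few thousand insertions but keeps the sketch usable on larger libraries.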
Tn5 Transposomes • Transposon design issues • Codon usage • Promoter design • Restriction site avoidance • Cell transformation • Electroporation voltage • Selection medium
Transposome insertion events • 2700 currently picked, saved, and sequenced • 337 Essential Genes + 29 tRNA + 7 essential RNA genes • Most are unsurprising surface lipoproteins and “unknown function” • Some surprises: inessential ftsZ, mreB, many ribosomal & tRNA modification proteins • But the Sau3AI-homologous restriction system appears essential • About 80 unknown function genes (many GTPases) are essential • Compare with • Dybvig08 results on M. arthritidis • French08 results on M. pulmonis • Glass06 results on M. genitalium • Ordered library of cells with 330 inactivated genes
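The inference behind these counts can be sketched as follows: a gene that is disrupted in at least one viable clone is dispensable under the culture conditions, and a gene never hit is only a candidate essential gene (it may simply not have been sampled). The gene names here are hypothetical; the final tally uses the numbers from the slide.

```python
def classify(genes, hits):
    """Split genes into those hit in viable clones and those never hit.

    genes: iterable of gene names in the annotation
    hits:  iterable of gene names disrupted in surviving clones
    """
    hit_set = set(hits)
    dispensable = sorted(g for g in genes if g in hit_set)
    candidate_essential = sorted(g for g in genes if g not in hit_set)
    return dispensable, candidate_essential


# Hypothetical toy data
all_genes = ["geneA", "geneB", "geneC", "geneD"]
observed_hits = ["geneB", "geneD", "geneB"]  # repeated hits are fine

dispensable, candidate_essential = classify(all_genes, observed_hits)
print(dispensable)
print(candidate_essential)

# Tally of essential elements reported on the slide
essential_total = 337 + 29 + 7
print(essential_total)
```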
Essential is not absolute • Multi-copy genes are not identified as essential • NADH oxidase • Acyl carrier protein • Essentiality is defined by the culture conditions • Genes with stability and reliability function are marked as dispensable • DNA repair • Chaperones • Some RNA modification enzymes • This is a much more important effect in larger genomes
Next in Analysis and Tools • Genome re-engineering with knock-in/knock-out • Resequencing • Whole cell metabolic models • Plug and play modules for additional function • Biosafety issues
Genome re-engineering tools • Plasmid: S. citri pSci2 PE protein based (Breton08) • J70302 registry part, under test now • recET recombination system (S. citri recT gene) • J70007 recT part, DNA available, being mutated • Chloramphenicol resistance gene cassette • PheS mutant gene cassette • Phase 1: • Turn on recombination • Insert PheS/cat cassette in the target location • Select with Chloramphenicol • Phase 2: • Turn on recombination • Insert final modification • Select with p-chlorophenylalanine • Result: seamless editing of the chromosome
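The two-phase logic can be illustrated with a toy model of selection and counter-selection. It is a deliberate simplification and rests on two assumptions: the cat cassette confers chloramphenicol resistance, and the mutant PheS makes cells sensitive to p-chlorophenylalanine, as in common counter-selection schemes. The marker names are labels invented for this sketch.

```python
def survives(genotype, drug):
    """Return True if a cell carrying this set of markers grows on the drug."""
    if drug == "chloramphenicol":
        # Positive selection: only cells carrying the cassette grow
        return "cat" in genotype
    if drug == "p-chlorophenylalanine":
        # Counter-selection: cells still carrying mutant PheS do not grow
        return "pheS_mutant" not in genotype
    raise ValueError(f"unknown drug: {drug}")


CASSETTE = {"cat", "pheS_mutant"}

wild_type = frozenset()
phase1 = wild_type | CASSETTE   # cassette inserted at the target location
phase2 = phase1 - CASSETTE      # cassette replaced by the final modification

print(survives(wild_type, "chloramphenicol"))     # unmodified cells are removed
print(survives(phase1, "chloramphenicol"))        # phase 1 recombinants are kept
print(survives(phase1, "p-chlorophenylalanine"))  # cassette carriers are removed
print(survives(phase2, "p-chlorophenylalanine"))  # edited cells are kept
```

Because the second selection only passes cells that have lost the cassette, no marker remains at the edited site, which is what makes the edit seamless.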
Resequencing • Illumina sequencing is cheap and very high throughput • Relatively straightforward with a pre-existing scaffold sequence • We get millions of reads of limited length (35-70 bp) • Paired ends, 250-500 bp fragments • Bar coded samples can multiplex the sequencing effort • Allows many samples to be sequenced in a single run • Resequence the Mesoplasma florum genome • De novo sequence for sixteen additional strains • Collection of Robert Whitcomb • De novo sequencing for several closely related species • Mesoplasma entomophilum • Mesoplasma lactucae
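Why short reads suffice for a genome this small can be seen from a mean-coverage estimate, coverage = reads × read length / genome size. The sketch below uses the 793,281 bp genome size from the genome characteristics slide; the read counts and the 100x target are illustrative assumptions, not figures from the talk.

```python
import math

GENOME_BP = 793_281  # genome size from the genome characteristics slide


def mean_coverage(n_reads, read_length_bp, genome_size_bp=GENOME_BP):
    """Average number of reads covering each base."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp=GENOME_BP):
    """Smallest whole number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


# Assumption for illustration: one million 35 bp reads
cov = mean_coverage(1_000_000, 35)
print(f"coverage from 1M x 35 bp reads: {cov:.1f}x")

# Assumption for illustration: 100x target with 70 bp reads
n = reads_needed(100, 70)
print(f"reads needed for 100x at 70 bp: {n}")
```

Since a single run yields millions of reads, one genome uses only a fraction of a run, which is why bar-coded multiplexing of many strains is attractive.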
Whole cell modeling • Approximately 2000 chemical reactions • About 300 small molecule species • Faster implementations of stochastic models • Faster computers • Comparison against reality • Mass spec quantitation of metabolites
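As an illustration of what a stochastic model of this kind computes, here is a minimal sketch of the Gillespie stochastic simulation algorithm, one standard method for such models (the talk does not say which method was used). The two-reaction system A ⇌ B, the rate constants, and the molecule counts are toy assumptions; the propensity function only handles reactions whose reactants are distinct species.

```python
import random


def propensity(k, reactants, state):
    """Mass-action propensity for a reaction with distinct reactant species."""
    a = k
    for species in reactants:
        a *= state[species]
    return a


def gillespie(state, reactions, t_end, rng):
    """Simulate until t_end; reactions are (rate, reactants, products) tuples."""
    state = dict(state)
    t = 0.0
    while True:
        props = [propensity(k, reactants, state) for k, reactants, _ in reactions]
        total = sum(props)
        if total == 0:
            break  # nothing can fire any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity
        threshold = rng.random() * total
        cumulative = 0.0
        for a, (_, reactants, products) in zip(props, reactions):
            cumulative += a
            if threshold < cumulative:
                for s in reactants:
                    state[s] -= 1
                for s in products:
                    state[s] += 1
                break
    return state


# Toy system (assumed for illustration): A -> B at rate 1.0, B -> A at rate 0.5
reactions = [
    (1.0, ("A",), ("B",)),
    (0.5, ("B",), ("A",)),
]
final = gillespie({"A": 100, "B": 0}, reactions, t_end=10.0, rng=random.Random(0))
print(final)
```

A whole-cell model with roughly 2000 reactions and 300 species has the same structure with a much longer reaction list, which is where faster implementations and faster computers come in.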
Open Cell Modules • Energy sources • Arginine vs. glucose • Photosynthesis pathway • Citric acid cycle (reverse?) • Amino acid synthesis • Add unnatural AAs • Nucleotide synthesis • Lipid synthesis • Cofactor synthesis • Measurement structures • Environmental niche • Halobacterium • Sulfur reducer • Temperature optimum • Membrane export / import • Membrane structure • Sensing of chemical environment • Flagellar motion • Light sensing • Light production • Cell cycle control (sporulation) • Biosafety modules
Biosafety Barriers • Codon isolation • CGG containing genes are unusable inside • TGA containing genes are unusable outside • Extend this idea with more codons • Pairs of required essential nutrients • Reduces likelihood of gradual evolution of workarounds • Explicit “kill” switches • Otherwise benign chemicals lethal to this organism • Shared function with critical metabolism reduces drift
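The codon isolation idea can be sketched as two checks on a coding sequence. This models only the two rules stated on the slide: a gene containing CGG cannot be translated inside the engineered cell (no matching tRNA), and a gene containing an internal TGA is cut short outside it, where TGA is a stop rather than tryptophan. The function names and the example sequences are hypothetical.

```python
def codons(cds):
    """Split a coding sequence into in-frame codons, dropping any partial tail."""
    cds = cds.upper()
    usable_length = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable_length, 3)]


def usable_inside(cds):
    """Inside the engineered cell: any CGG codon blocks translation."""
    return "CGG" not in codons(cds)


def usable_outside(cds):
    """Outside, under the standard code: an internal TGA acts as a stop.

    The final codon is excluded because a terminal TGA is a normal stop there.
    """
    return "TGA" not in codons(cds)[:-1]


# Hypothetical examples
native_like = "ATGTGGTGAAAATAA"   # ATG TGG TGA AAA TAA: internal TGA (Trp inside)
foreign_like = "ATGCGGAAATGA"     # ATG CGG AAA TGA: contains CGG, terminal TGA

print(usable_inside(native_like), usable_outside(native_like))
print(usable_inside(foreign_like), usable_outside(foreign_like))
```

Extending the barrier with more codons, as the slide suggests, would amount to adding further entries to each check.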
Recoding the genome of entire organisms [Figure: engineered phage, natural phage, engineered organism, and natural organism, with X marks on the transfer paths between them. Labels: “No transfer: UGA not translated”; “No transfer: CGG not translated”.]
Kit Part the genome • Make Biobrick parts from each gene, tRNA, promoter, other part-like genome element • Develop techniques for recombining parts into coherent modules • YAC editing and assembly, e.g. • Lambda RED or RecET recombination • Enable the bootstrapping of cells based on the redesigned genome • Liposome fusion, e.g. • Learn the design rules for chromosomes
Thanks to… Harold Morowitz, Greg Fournier, Gail Gasparich, Bob Whitcomb, Eric Lander, Bruce Birren, Nicole Stange-Thomann, George Church, Roger Brent, Grant Jensen, Yingwu Wang, Samantha Burke, PJ Steiner, Nick Papadakis, Ron Weiss, Drew Endy, Randy Rettberg, Austin Che, Reshma Shetty • MIT Synthetic biology working group • DARPA, NTT, NSF, Microsoft • Colleagues at Ginkgo Bioworks
Thank you for your attention
Our Plan • Completely understand a simple organism • Build excellent models and predictive tools • Simplify the organism further • Remove inessential genes • Replace dual function genes with single function equivalents • Abstract useful modules from other living systems • Understand and create good models for these modules • Selectively add these modules to the existing simple cell The code’s 4 billion years old; it’s time for a rewrite
The Mollicute Bibliome • Complete collection of mycoplasma related papers: • 6,411 and counting • Books and book chapters also • Endnote file: mycoplasmas.enl • Downloaded .pdfs for articles > 1995 • Scanned articles and books, OCR • Collaboration for “shallow semantic” understanding • people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso