Optimal Combinatorial Biology & Genome Engineering. BU BME retreat, 23-Jun-2004, 9:45-10:30, Seacrest, N. Falmouth, MA. Thanks to: Broad Inst., DARPA-BioComp, DOE-GTL, EU-MolTools, NHGRI-CEGS, NHLBI-PGA, NIGMS-CECBSR, PhRMA, Lipper Foundation
Optimal Combinatorial Biology & Genome Engineering
BU BME retreat, 23-Jun-2004, 9:45-10:30, Seacrest, N. Falmouth, MA
Thanks to: Broad Inst., DARPA-BioComp, DOE-GTL, EU-MolTools, NHGRI-CEGS, NHLBI-PGA, NIGMS-CECBSR, PhRMA, Lipper Foundation
Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen
For more info see: arep.med.harvard.edu
Exponential technologies [figure; label: ABI]
Shendure J, Mitra R, Varma C, Church GM (May 2004) Advanced Sequencing Technologies: Methods & Goals. Nature Reviews Genetics 5, 335-344.
Programming cells with DNA
vs.
Digital computers simulating cells
Cells simulating digital computers
Drugs & devices simulating human systems
Engineering complex systems (comparative genomics)
Stedman et al. (2004) [Masticatory] Myosin gene mutation correlates with anatomical changes in the human lineage. Nature 428, 415-418
Biosystems Engineering: Integrating Measures & Models
[Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; DNA; Proteins; RNA; Replication rate; interactions; Microbes; Cancer & stem cells; Darwinian optima; In vitro replication; Small multicellular organisms]
Now that we have 200 genomes, why sequence?
• Once per organism
  • Phylogenetic footprinting, biodiversity
  • RNA splicing & chromatin modification patterns
  • Cell-lineage during development
  • NA "aptamers" & Ab for any protein
• Once per person
  • Preventative medicine & genotype–phenotype associations
• Frequently
  • Cancer: mutation sets for individual clones, loss-of-heterozygosity
  • B & T-cell receptor diversity: temporal profiling, clinical
  • New & old pathogen "weather map", biowarfare sensors
  • DNA computing & lab selections
Shendure et al. 2004 Nature Rev Gen 5, 335.
Why 'single molecule' sequencing?
(1) Single-cell analyses, e.g. preimplantation (PGD)
(2) Co-occurrence on a molecule, complex, or cell, e.g. RNA splice-forms
(3) Cost: $1K-100K "personal genomes" http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html
(4) Precision: counting 10^9 RNA tags (to reduce variance) (~5e5 RNAs per human cell)
Tags counted at fixed costs: EST 5e3; SAGE 5e4; MPSS 5e6; Polony-FISSeq (polymerase colony) 5e9 (goal)
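Point (4) can be made concrete with a small sketch. It assumes tag counting behaves like Poisson sampling, so the coefficient of variation of a transcript's count is 1/sqrt(expected count). The function name `counting_cv` and the choice of a one-copy-per-cell transcript are illustrative assumptions; the tag totals are the per-method figures read from the slide.

```python
import math

def counting_cv(total_tags: float, copies_per_cell: float,
                rnas_per_cell: float = 5e5) -> float:
    """Poisson coefficient of variation for one transcript's tag count.

    Assumes tags are sampled at random from the RNA pool, so the expected
    count is total_tags * (copies_per_cell / rnas_per_cell).
    """
    expected = total_tags * copies_per_cell / rnas_per_cell
    return 1.0 / math.sqrt(expected)

# Tag totals per method as listed on the slide (5e9 is the stated goal).
for method, tags in [("EST", 5e3), ("SAGE", 5e4), ("MPSS", 5e6),
                     ("Polony-FISSeq (goal)", 5e9)]:
    cv = counting_cv(tags, copies_per_cell=1)
    print(f"{method:>22}: expected count {tags / 5e5:g}, CV ~ {cv:.2%}")
```

Under these assumptions a single-copy transcript is expected to be seen less than once at EST or SAGE depth, while at 5e9 tags the expected count is 1e4 and the sampling noise drops to about 1%.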
Polony Fluorescent In Situ Sequencing Libraries
[Diagram labels: Selector bead; 1 to 100 kb genomic; 2x20 bp after MmeI; M LR LR; Sequencing primers; PCR bead; emulsion]
Greg Porreca, Abraham Rosenbaum
Dressman et al. PNAS 2003
Cleavable dNTP-Fluorophore (& terminators)
Reduce or photo-cleave
Mitra, RD, Shendure, J, Olejnik, J, Olejnik, EK, and Church, GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Polony-FISSeq: up to 2 billion beads/slide White= Fe-core pixels, Cy5 primer (570nm) ; Cy3 dNTP (666nm) Jay Shendure
Polony FISSeq Stats
• # of bases sequenced (total): 23,703,953
• # bases sequenced (unique): 73
• Avg fold coverage: 324,711 X
• Pixels used per bead (analysis): ~3.6
• Read length per primer: 14-15 bp
• Insertions: 0.5%
• Deletions: 0.7%
• Substitutions (raw): 4e-5
• Throughput: 360,000 bp/min
• Current capillary sequencing: 1400 bp/min
• (600X speed/cost ratio, ~$5K/1X)
• (This may omit: PCR, homopolymer, context errors)
Shendure
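The derived figures in these stats can be rechecked with a few lines of arithmetic. This is only a sanity check on the numbers quoted on the slide; the variable names are illustrative.

```python
# Figures quoted on the slide.
total_bases = 23_703_953        # bases sequenced (total)
unique_bases = 73               # bases sequenced (unique)
polony_bp_per_min = 360_000     # Polony-FISSeq throughput
capillary_bp_per_min = 1_400    # capillary sequencing throughput

# Average fold coverage = total bases / unique bases.
fold_coverage = total_bases / unique_bases

# Raw throughput ratio (speed only, no cost term).
speed_ratio = polony_bp_per_min / capillary_bp_per_min

print(f"fold coverage ~ {fold_coverage:,.0f} X")
print(f"raw speed ratio ~ {speed_ratio:.0f} X")
```

The coverage reproduces the quoted 324,711 X after truncation. The raw speed ratio comes out near 257X, so the quoted 600X presumably also folds in a cost advantage, as its "speed/cost" label suggests.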
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively spliced cell adhesion molecule
• Specific variable exons are up- or down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu, J, et al. Science. 301:836-8.
RNA exon examples, auto-regridded & quantitated
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
Zhu, J, Shendure, J, Mitra, RD, Church, GM (2003) Science. 301:836-8. Single Molecule Profiling of Alternative Pre-mRNA Splicing.
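The size of the combinatorial space behind "exon combinatorics" can be sketched as follows. The sketch assumes each of the ten variable exons V1-V10 is independently either included or skipped, with exon order fixed; the function name `enumerate_isoforms` is illustrative, and real splicing need not populate every combination.

```python
from itertools import product

VARIABLE_EXONS = [f"V{i}" for i in range(1, 11)]  # V1 .. V10

def enumerate_isoforms(exons):
    """Yield every include/skip combination of the given exons, order kept."""
    for mask in product((0, 1), repeat=len(exons)):
        yield tuple(exon for exon, keep in zip(exons, mask) if keep)

isoforms = list(enumerate_isoforms(VARIABLE_EXONS))
print(len(isoforms))  # 2**10 = 1024 possible variable-exon combinations
```

Telling these combinations apart requires knowing which exons co-occur on the same molecule, which is the co-occurrence argument for single-molecule profiling.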
CD44 RNA isoforms Eph4 = murine mammary epithelial cell line Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic) Zhu J, Shendure J, Mitra RD, Church GM. Science 301:836-8. Single molecule profiling of alternative pre-mRNA splicing.
Biosystems Engineering: Integrating Measures & Models
[Diagram labels: Environment; Metabolites; RNAi; Insertions; SNPs; DNA; Proteins; RNA; Replication rate; interactions; Escherichia; Darwinian optima; Prochlorococcus; mutant suboptimality; Homo]
Integer stoichiometric matrix (Roche/ExPASy): Metabolic Pathways, Cellular Processes
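Flux ratios at each branch point yield optimal polymer composition for replication
Fluxes on metabolite $X_i$: $V_{transport}$ (across the membrane), $V_{syn}$, $V_{deg}$, $V_{growth}$
Steady state: $X_i = \text{const.}$, $\sum_j S_{ij} v_j = 0$
Growth: $c_1 X_1 + c_2 X_2 + \dots + c_m X_m \rightarrow$ Biomass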
Biomass composition
ATP, Glu, Gln, Ala, Gly, Leu, Thr, UTP, Met, Arg, CTP, Cys, Val, Asn, Asp, GTP, Ile, Tyr, His, dACGT, Phe, Lys, Pro, Trp, Ser, AcCoA, NADH, FAD, SucCoA, CoA
c_i = coeff. in growth reaction; X_i = metabolites
Optimize flow from input C, N, P to Biomass
Edwards & Palsson, PNAS 2000, BMC Bioinf. 2000
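A deliberately tiny sketch of the branch-point idea: one input flux is split between two precursors, and growth is limited by whichever precursor runs short relative to its biomass coefficient. This is a two-metabolite toy, not the genome-scale LP of Edwards & Palsson; `growth_rate`, `best_split`, and the coefficient values are illustrative assumptions.

```python
def growth_rate(split: float, uptake: float, c1: float, c2: float) -> float:
    """Growth supported when `split` of the uptake flux goes to precursor X1.

    Biomass needs c1 units of X1 and c2 units of X2 per unit of growth,
    so the scarcer precursor (relative to its coefficient) sets the rate.
    """
    x1 = split * uptake
    x2 = (1.0 - split) * uptake
    return min(x1 / c1, x2 / c2)

def best_split(uptake: float, c1: float, c2: float, steps: int = 1000) -> float:
    """Scan branch-point flux ratios and return the one maximizing growth."""
    candidates = [k / steps for k in range(steps + 1)]
    return max(candidates, key=lambda f: growth_rate(f, uptake, c1, c2))

c1, c2 = 3.0, 1.0  # hypothetical biomass coefficients
f_opt = best_split(uptake=10.0, c1=c1, c2=c2)
print(f_opt, c1 / (c1 + c2))  # optimal split matches the composition ratio
```

In this toy the optimal flux ratio at the branch point equals c1/(c1+c2), which is the sense in which flux ratios mirror the required polymer composition.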
Minimization of Metabolic Adjustment (MoMA)
Linear Programming (LP) to find optima, Quadratic Programming (QP) to find closest points
Objective function = growth flux hyperplanes
[Figure labels: wild-type optimum; mutant optimum; mutant initially (closest point); wild-type and mutant feasible flux polyhedra]
x, y are two of the 100s of flux dimensions
Segre, Vitkup, & Church PNAS 99: 15112-7
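The LP-versus-QP distinction can be illustrated on a two-flux toy. To stay dependency-free, a brute-force search over an integer grid stands in for real LP and QP solvers; the constraints, the growth function `2x + y`, and all names here are hypothetical and are not taken from Segre, Vitkup & Church.

```python
from itertools import product

def feasible_points(x_max: int, total: int = 10):
    """Integer grid points with 0 <= x <= x_max, y >= 0, x + y <= total."""
    return [(x, y) for x, y in product(range(total + 1), repeat=2)
            if x <= x_max and x + y <= total]

def growth(point):
    """Toy objective: growth flux as a linear function of the two fluxes."""
    x, y = point
    return 2 * x + y

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

wild_type = feasible_points(x_max=8)   # wild-type feasible flux polyhedron
mutant = feasible_points(x_max=2)      # mutation caps flux x at 2

wt_opt = max(wild_type, key=growth)    # "LP": wild-type optimum
mutant_lp = max(mutant, key=growth)    # "LP": mutant re-optimized for growth
# "QP" (MoMA): mutant flux state closest to the wild-type optimum.
mutant_moma = min(mutant, key=lambda p: squared_distance(p, wt_opt))

print(wt_opt, mutant_lp, mutant_moma)
print(growth(wt_opt), growth(mutant_lp), growth(mutant_moma))
```

Here the wild-type optimum is (8, 2), the re-optimized mutant is (2, 8), and the closest mutant point is (2, 2), with growth 18, 12, and 6 respectively: the closest-point state is feasible but suboptimal, which is the "mutant initially" point on the slide.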
Flux Data, C009-limited
[Figure: three scatter plots of Predicted Fluxes vs. Experimental Fluxes, with numbered flux points 1-18]
• WT (LP): r=0.91, p=8e-8
• Δpyk (LP): r=-0.06, p=6e-1
• Δpyk (QP): r=0.56, p=7e-3
Reproducibility of mass competition
Correlation between two selection experiments
Badarinarayana, et al. Nature Biotech. 19: 1060
Competitive growth data
On minimal media [figure labels: negative; small selection effect]
χ² p-values: LP 4x10^-3; QP 1x10^-5
Novel redundancies
Position effects
Hypothesis: next optima are achieved by regulation of activities.
Motif co-occurrence, comparative genomics, RNA clusters, and/or ChIP2-location data
P = 10^-6 to 10^-11
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208
Synthetic testing of DNA motif combinations
RNA ratio (motif- to wild type) for each flanking gene: 1.3; 2.4 (1.3 in ΔargR); 1.1; 1.3; 0.7; 2.5; 0.2; 1.4; 1.4; 3.5
Bulyk, McGuire, Masuda, Church. Genome Res. 14:201–208
Systems Biology Loop Model Experimental design (Systematic) Data Synthesis / Perturbation Proteasome targeting Genome Engineering
Engineering BioSystems
Perturbations              Action  Specificity  %KO       "Design"
Small molecules (drugs)    Fast    Varies       Varies    Hard
Antibodies                 Fast    Varies       Varies    Hard
RNAi                       Slow    Varies       Medium    OK
Insertion "traps"          Slow    Yes          Varies    Random
Proteasome targeting       Fast    Excellent    Medium    Easy
Homologous recombination   Slow    Perfect      Complete  Easy
Programming proteasome targeting
Janse, DM, Crosas, B, Finley, D & Church, GM (2004) Localization to the Proteasome is Sufficient for Degradation.
Synthetic Genomes & Proteomes. Why?
• Test or engineer cis-DNA/RNA-elements
• Access to any protein (complex) including post-transcriptional modifications
• Affinity agents for the above
• Mass spectrometry standards, protein design
• Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, SP6, Roche)
• Toward these goals design a chassis:
  • 115 kbp genome. 150 genes.
  • Nearly all 3D structures known.
  • Comprehensive functional data.
PURE translation utility (yet room for improvement)
Removing tRNA-synthetases, RNases & proteases makes feasible: optimal mRNA structure & codon usage
• Lee et al. 2004 J Immunol Methods. 284:147-57. Selection of scFvs specific for HBV DNA polymerase using ribosome display.
• Forster et al. 2003 Programming peptidomimetic syntheses by translating genetic codes designed de novo. PNAS 100:6353-7.
• Klammt et al. 2004 Eur J Biochem. 271:568-80. High level cell-free expression & specific labeling of integral membrane proteins.
• Shimizu et al. 2001 Nat Biotechnol. 19:751-5. Cell-free translation reconstituted with purified components.
in vitro genetic codes
[Figure: aminoacyl-tRNAs yU, mS, eU with anticodons UUG, UGG, CAG paired to mRNA]
mRNA (5' to 3'): ... AUG AAC ACC GUU GAA
Peptide: fM N T V E
[Codon table residue: Second base A U A C U C A C yU mS U G eU]
80% average yield per unnatural coupling.
bK = biotinyllysine, mS = O-methylserine, eU = 2-amino-4-pentenoic acid, yU = 2-amino-4-pentynoic acid
Forster, et al. (2003) PNAS 100:6353-7
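Two things on this slide lend themselves to a short sketch: reading the example mRNA codon by codon, and compounding the 80% per-coupling yield. The codon assignments below are the standard ones for the five codons shown (with the initiator written fM, as on the slide); the de novo reassignments to unnatural amino acids are not reproduced here because the codon table did not survive extraction. The function names and the independence assumption behind the yield calculation are illustrative.

```python
# Standard assignments for the five codons in the example mRNA.
CODON_TABLE = {"AUG": "fM", "AAC": "N", "ACC": "T", "GUU": "V", "GAA": "E"}

def translate(mrna: str, table=CODON_TABLE):
    """Translate an mRNA string three bases at a time using `table`."""
    codons = [mrna[i:i + 3] for i in range(0, len(mrna) - len(mrna) % 3, 3)]
    return [table[codon] for codon in codons]

def full_length_yield(per_coupling_yield: float, n_couplings: int) -> float:
    """Overall yield if each coupling succeeds independently (assumption)."""
    return per_coupling_yield ** n_couplings

print(translate("AUGAACACCGUUGAA"))
for n in (1, 3, 5, 10):
    print(n, f"{full_length_yield(0.8, n):.3f}")
```

Swapping entries in `CODON_TABLE` is the software analogue of reassigning a codon, and the yield loop shows how quickly an 80% step yield erodes full-length product as more unnatural couplings are chained.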
Mirror world: resistant to enzymes, parasites, predators
L-amino acids & D-ribose (rNTPs, dNTPs)
Transition: EF-Tu, peptidyl transferase, DNA-ligase
D-amino acids & L-ribose (rNTPs, dNTPs)
Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125, 6616-7
Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome respectively)
Forster & Church
Up to 760K Oligos/Chip
18 Mbp for $700 raw (6-18K genes)
• <1K: Oxamer, electrolytic acid/base
• 8K: Atactic/Xeotron/Invitrogen, photo-generated acid (Sheng, Zhou, Gulari, Gao; U. Houston)
• 24K: Agilent, ink-jet, standard reagents
• 48K: Febit
• 100K: Metrigen
• 380K: Nimblegen, photolabile 5' protection (Nuwaysir, Smith, Albert)
Tian, Gong, Church
Improve DNA Synthesis Cost
Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos, then release them.
10 50 10 => ss-70-mer (chip) => ds-90-mer => ds-50-mer
20-mer PCR primers with restriction sites at the 50-mer junctions
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
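The two quantitative points on this slide can be sketched briefly. The kinetics part assumes a simple second-order rate law (rate proportional to the product of two concentrations) at fixed volume; the length bookkeeping assumes each 20-mer primer anneals to a 10-nt flank and overhangs by the other 10 nt, which is one reading consistent with the 70-mer becoming a 90-mer. Function names are illustrative.

```python
def relative_bimolecular_rate(fold_dilution: float) -> float:
    """Second-order rate relative to undiluted when both partners are
    diluted by `fold_dilution` (rate ~ [A][B])."""
    return 1.0 / fold_dilution ** 2

def amplified_length(chip_oligo_len: int, primer_len: int = 20,
                     primer_overlap: int = 10) -> int:
    """Length after PCR if each primer extends past the oligo end by
    (primer_len - primer_overlap) bases. Overlap value is an assumption."""
    return chip_oligo_len + 2 * (primer_len - primer_overlap)

fold = 1e12 / 1e6                        # usual vs. chip-scale molecule counts
print(relative_bimolecular_rate(fold))   # hybridization slows ~1e12-fold
print(10 + 50 + 10, amplified_length(70))  # 70-mer on chip, 90-mer after PCR
```

A million-fold drop in amount thus costs a trillion-fold in annealing rate, which is the motivation for amplifying the pool before assembly.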
Improve DNA Synthesis Accuracy via mismatch selection
Tian & Church
Genome assembly
50 75 125 225 425 825 … 100*2^(n-1)
Challenges:
1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding)
2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates
3. >30 kbp homologous recombination (Nick Reppas)
Stemmer et al. 1995. Gene 164:49-53. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides.
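Challenge 2 can be made concrete with a short calculation of how often a construct is entirely error-free. The sketch assumes the error goal is per base and that errors occur independently; the 115 kbp length is the chassis genome size quoted earlier in the talk, and the 1e-3 rate is a hypothetical comparison value, not a figure from the slides.

```python
def fraction_error_free(length_bp: int, error_rate: float) -> float:
    """Probability that a construct of `length_bp` has no errors, assuming
    independent per-base errors at `error_rate`."""
    return (1.0 - error_rate) ** length_bp

# 1e-6 is the stated goal; 1e-3 is a hypothetical rate for comparison.
for rate in (1e-3, 1e-4, 1e-5, 1e-6):
    print(f"error rate {rate:g}: "
          f"1 kbp {fraction_error_free(1_000, rate):.3f}, "
          f"115 kbp {fraction_error_free(115_000, rate):.3g}")
```

Under these assumptions, roughly 89% of 115 kbp assemblies would be error-free at 1e-6 errors per base, whereas at much higher error rates almost none are, forcing screening at many intermediate sizes.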
All 30S-Ribosomal-protein DNAs & mRNAs synthesized in vitro
[Gel figure: marker (M) and lanes 1-21; panels: wild-type DNA templates, RNA transcripts, DNA templates; size markers 0.5 kb and 0.3 kb; s19 labeled]
Nimblegen, Xeotron/Atactic
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Improving synthesis accuracy 9-fold Tian & Church
Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure.
Western blot based on His-tags
Tian & Church
Enabling technologies
• Multi-gene assembly
• Protein, peptidomimetic synthesis
• CAD/CAM & design for manufacturing
• Automated homologous recombination for E. coli & embryonic stem cells
• Fidelity enhancements
• Sequencing 10^7 bp/$ ($1K/human)
Improve DNA Synthesis Accuracy
Synthesize on a chip pools of "construction" ~50-mers and two complementary "selection" ~26-mers (Left & Right):
10 50 10 => ss-70-mer (chip) => ds/ss-50-mer (amplif/restrict)
10 26 10 => ss-56-mer (chip) => ss-76-mer (amplif/avidin), biotin
20-mer PCR primers (one biotinylated)
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Improve DNA Synthesis Accuracy via D-HPLC or MutS
• Smith & Modrich (1997) PNAS 94: 6847–50. Removal of polymerase-produced mutant sequences from PCR products. MutHLS cleaves at GATC near mismatches. Lowers error rate from 6e-6 to 6e-7.
• Bellanne-Chantelot et al. (1997) Mutat Res. 382:35-43. Search for DNA sequence variations using a MutS-based technology.
• Mulligan & Tabone (2002) US Patent 6,664,112. Methods for improving the sequence fidelity of synthetic double-stranded oligonucleotides.