Single Cell, RNA, & Chromosome Sequencing Technologies. George Church 2:30- 3:00 PM Tue 3-Oct-2006 Cancer Genomics & Emerging Technologies. Thanks to: NCI/NIH HMS-CGCC. AppliedBiosystems-Agencourt , Affymetrix, Helicos, 454, Solexa, DNAdirect, CompleteGenomics, Codon Devices.
Multiplex Polony Summary
• Technologies for selecting genomic regions
• Mbp scale for rearrangements
• RNA tags & spliceforms
• 1 to 200 bp scale for SNPs & exons (1%)
• Low cost & high accuracy: $0.07/kbp at 3E-7 errors
• Paired-end-tags (PET) for rearrangements
• Detection of rare mutations (e.g. drug resistance alleles)
• 60 million reads per run
Selective genome sequencing
• Numerous (100K) small regions (exons & point mutations)
  • PCR: 21 Mbp, >$250K. Sjoblom et al. (2006) Science
  • Highly multiplexed molecular inversion probe genotyping: over 10,000 targeted SNPs genotyped in a single tube assay. Hardenbol et al. Genome Res. 2005 Feb;15(2):269-75.
  • Analyzing genes using closing and replicating circles. Nilsson et al. (2006) Trends Biotechnol 24:83.
• One large region
  • Single molecule amplification, 1 to 4 Mbp. Zhang et al. 2006 Nature Biotech. 24:680
  • Direct genomic [BAC hybridization] selection [50% pure]. Bashiardes et al. (2005) Nat Methods 2:63.
Selective genome sequencing
Two ways to capture alleles from genomic ss-DNA
[Figure labels: In vitro paired-tag library; Gap fill; Cleave & ligate. Red = synthetic; Yellow = genomic]
Shendure, et al. Science 309(5741):1728-32.
Nilsson et al. (2006) Trends Biotechnol 24:83.
How do we optimize >100K 100mers?
Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson
How? 10 Mbp of oligos / $1000 chip (~1000X lower oligo costs)
Digital Micromirror Array
Oligos per chip   Vendor                       Synthesis chemistry
8K                Atactic/Xeotron/Invitrogen   Photo-generated acid
12K               Combimatrix/Codon            Electrolytic
44K               Agilent                      Ink-jet, standard reagents
380K              Nimblegen/GA                 Photolabile 5' protection
Amplify pools of 50mers using flanking universal PCR primers & 3 paths to 10X error correction
Tian et al. Nature 432:1050; Carr & Jacobson 2004 NAR; Smith & Modrich 1997 PNAS
Padlock, Molecular Inversion Probes (MIPs)
CG to CA, TG: 35% of germline, 44% of colorectal cancer mutations (not restricted to single nucleotides nor common polymorphisms)
[Figure labels: probe arms L and R; optional multiplex tag; universal primers; genomic DNA; alternative alleles CG, CA, TG]
10K to 1M 100-mer probes per pool (see Kun Zhang’s poster)
Zhang, Chou, Shendure, Li, Leproust, Church, Dahl, Davis, Nilsson
Vitkup, Sander, Church. The Amino-acid Mutational Spectrum of Human Genetic Disease. Genome Biol. 4: R72. (CG to CA, TG)
Sequencing genomes from single cells via polymerase clones -- Plones (single chromosome, cell, RNA or particle)
Zhang, et al. (2006) Nature Biotech. June ’06
1) When we only have one cell, as in Preimplantation Genetic Diagnosis/Haplotyping (PGD/PGH) or environmental samples (poor lab growth)
2) Candidate chromosome region sequencing
3) Prioritizing or pooling (rare) species based on an initial DNA screen (metagenomics)
4) Multiple chromosomes in a cell or virus
5) RNA splicing
6) Cell-cell interactions (predator-prey, symbionts, commensals, parasites)
Phi-29 polymerase strand-displacement amplification
Multiple Displacement Amplification (MDA)
Single molecule amplification sequencing. NBT (2006) 24: 657-8.
Note!: Single human cell 1000X easier than 5 Mbp
Zhang et al., Nature Biotechnology (2006) 24:680
Single-cell sequencing: 4.7 Mbp (plones)
• Ultra-clean conditions for reduction of background amplification + real-time monitoring
• Post-amplification chip hybridization distinguishes alleles
• Amplification variation random & easily filled by PCR
CD44 Counts (RNA splicing forms) Eph4 = mammary epithelial cell line Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic) Zhu, Shendure, Mitra, Church, Single Molecule Profiling of Alternative Pre-mRNA Splicing. Science 301:836-8.
Reading Polonies Beads or not, Ligase or Polymerase A G C T
‘Next Generation’ Sequencing Status
Reaction volume per platform (fL = 1E-15 liters, femto)

Multi-molecule
Platform          Method       Reaction volume
AB/APG            Ligase beads 1 fL
454/Roche         Pol beads    100,000 fL
Solexa            Pol term     1 fL
CGI               Ligase       1 fL
Affymetrix        Hybr array   100 fL

Single molecules
Platform          Method       Reaction volume
Helicos Biosci    Pol          <1 fL
Visigen Biotech   Pol FRET     <1 fL
Pacific Biosci    Pol          <1 fL
Agilent           Nanopores    <1 fL

(7/9 involve our lab)
Length & run-time vs. Accuracy & Cost
"Future improvements in the read lengths, demonstrated at 7 consecutive bases per tag (Shendure et al., 2005) and reductions in the run time, currently 60 hours, will make this a useful platform for resequencing." --Leamon, et al. (454) Gene Therapy and Regulation 3: 15-31
Note that without ‘future improvements’:
• Affymetrix/Illumina read-lengths of 1 base per tag are useful.
• 60 million reads/run is 10X faster per read than 500K reads/run,
• & 50X lower cost per bp due to lower reagent & instrument costs ($500/run; $140K).
Polony Sequencing Equipment CCD camera microscope with xyz controls Autosampler (96 wells) (HPLC-like) flow-cell syringe pump temperature control
Integrated Polony Sequencing Pipeline (open source hardware, software, wetware)
• In vitro paired tag libraries
• Bead polonies via emulsion PCR (Dressman et al PNAS 2003)
• Enrich amplified beads
• Monolayer immobilization
• SBE or SBL sequencing (Epifluorescence & Flow Cell, $140K)
• SOFTWARE: Images → Tag Sequences; Tag Sequences → Genome
Shendure, Porreca, Reppas, Lin, McCutcheon, Rosenbaum, Wang, Zhang, Mitra, Church (2005) Science 309:1728.
ePCR bead: 4 positions for paired-end anchor 'primers'
[Figure: amplicon drawn 5’ to 3’ with regions L, M, R around Tag 1 and Tag 2; read lengths labeled 7 bp, 7 bp, 6 bp, 6 bp]
Each position yields 6 to 7 bp of contiguous sequence
26 bp new sequence per 135 bp amplicon
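The 26 bp figure is the sum of the four reads on this slide (7 + 7 + 6 + 6). A minimal sketch of that arithmetic follows; the function names are illustrative and not part of the polony software.

```python
# Read lengths (bp) at the four anchor positions, as labeled on the slide.
READ_LENGTHS_BP = [7, 7, 6, 6]
# Amplicon length (bp), as stated on the slide.
AMPLICON_BP = 135


def new_sequence_bp(read_lengths):
    """Total bases of new sequence obtained from one amplicon."""
    return sum(read_lengths)


def fraction_read(read_lengths, amplicon_bp):
    """Fraction of the amplicon that is actually sequenced."""
    return sum(read_lengths) / amplicon_bp


if __name__ == "__main__":
    total = new_sequence_bp(READ_LENGTHS_BP)
    frac = fraction_read(READ_LENGTHS_BP, AMPLICON_BP)
    print(f"{total} bp read per {AMPLICON_BP} bp amplicon ({frac:.1%})")
```

Roughly a fifth of each amplicon is read; the remainder is taken up by the synthetic flanking and anchor sequence.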
Sequencing by Ligation (SBL) with fluorescent combinatorial 9-mers
Probe                     Excitation (nm)   Emission (nm)
5’-Cy5-nnnnAnnnn-3’       647               700
5’-Cy3-nnnnGnnnn-3’       555               605
5’-TR-nnnnCnnnn-3’        572               630
5’-Cy3+Cy5-nnnnTnnnn-3’   555               700
5'PO4 ACUCAUC…
(3’)…TAGAGT????????????????TGAGTAG…(5’)
Shendure, Porreca, et al. (2005) Science 309:1728
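Each dye label on this slide maps to one base at the central (query) position of the 9-mer, so a base call reduces to a lookup on which dyes were detected. A minimal sketch, assuming dye detection has already been thresholded into a set of dye names; the function names and the "N" no-call convention are illustrative and are not taken from the published pipeline.

```python
# Dye label -> base at the central position of the ligated 9-mer (from the slide).
DYE_TO_PROBE_BASE = {
    frozenset({"Cy5"}): "A",
    frozenset({"Cy3"}): "G",
    frozenset({"TR"}): "C",          # TR = Texas Red
    frozenset({"Cy3", "Cy5"}): "T",  # dual-labeled probe
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_probe_base(dyes):
    """Return the probe's central base for a set of detected dyes, or 'N'."""
    return DYE_TO_PROBE_BASE.get(frozenset(dyes), "N")


def call_template_base(dyes):
    """Return the complement of the probe base (the base on the template strand).

    Assumption: the ligated probe is base-paired to the template at the query
    position, so the template base is the complement of the probe base.
    """
    return COMPLEMENT.get(call_probe_base(dyes), "N")


if __name__ == "__main__":
    for detected in (["Cy5"], ["Cy3"], ["TR"], ["Cy3", "Cy5"], []):
        print(detected, call_probe_base(detected), call_template_base(detected))
```

Using a set of dyes rather than a single channel keeps the dual-labeled T probe distinct from the Cy3-only and Cy5-only probes.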
Why low error rates?
Goal of genotyping & resequencing: discovery of variants, e.g. cancer somatic mutations 4E-6 (& lab-evolved cells)

Consensus error rate   Source             Total errors (E. coli)   Total errors (Human)
1E-4                   Bermuda/Hapmap     500                      600,000
4E-5                   454                200                      240,000
3E-7                   Polony-SbL @6X     0                        1800
1E-8                   Goal for 2006      0                        60

Also, effectively reduce (sub)genome target size by enrichment for exons or common SNPs to reduce cost & # false positives.
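The total-error columns appear to be the consensus error rate multiplied by genome size. A minimal sketch of that calculation, assuming 5 Mbp for E. coli and 6 Gbp for a diploid human genome; both sizes are inferred from the slide's numbers and are not stated on it.

```python
# Assumed genome sizes in bp, inferred from the slide (500 / 1E-4 and 600,000 / 1E-4).
GENOME_BP = {"E. coli": 5e6, "Human": 6e9}

# Consensus error rates per base, from the slide.
ERROR_RATES = {
    "Bermuda/Hapmap": 1e-4,
    "454": 4e-5,
    "Polony-SbL @6X": 3e-7,
    "Goal for 2006": 1e-8,
}


def expected_errors(rate_per_bp, genome_bp):
    """Expected number of consensus errors across a genome of the given size."""
    return rate_per_bp * genome_bp


if __name__ == "__main__":
    for source, rate in ERROR_RATES.items():
        row = ", ".join(
            f"{name}: {expected_errors(rate, size):,.2f}"
            for name, size in GENOME_BP.items()
        )
        print(f"{source} ({rate:g}): {row}")
```

Under these assumptions the product reproduces every human entry and the first two E. coli entries. For Polony-SbL on E. coli the product is 1.5 while the slide lists 0, so that entry may be an observed count rather than this estimate.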
Microbial lab evolution
• Lenski: Citrate utilization
• Church: Trp/Tyr exchange
• Palsson: Glycerol utilization
• Edwards: Radiation resistance
• Ingram: Lactate production
• Stephanopoulos: Ethanol resistance
• Marliere: Thermotolerance
• J&J: Diarylquinoline resistance (TB)
• DuPont: 1,3-propanediol production
Polony-based Whole-Genome Mutation Discovery of ΔTrp clone
ompF: non-specific transport channel
• Glu-117 → Ala (in the pore)
• Charged residue known to affect pore size and selectivity
• Can increase import & export capability simultaneously
Shendure, et al. (2005) Science 309:1728
Evolving Population: Multiple Genotypes, Similar Themes
PCR amplification and sequencing of OmpF and Lrp from multiple clones from 3 independent lines of Trp/Tyr co-cultures:
• OmpF: 42 R→G, L, C; 113 D→V; 117 E→A (Arg→Gly, Leu, Cys; Asp→Val; Glu→Ala). Hydrophilic and bulky → hydrophobic and smaller
• Promoter: -12 A→C, -35 C→A. More consensus-like
• Lrp: 1 bp deletion, 9 bp deletion, 8 bp deletion, IS2 insertion, R→L in DBD. Change in global gene regulation?
Heterogeneity within each time-point reflects colony heterogeneity.
Reppas, Lin, et al (unpublished)
Mixture of wild & 2kb Inversion (pin)
[Figure: proximal vs. distal tag placement over 1,206k to 1,210k; Red = same strand, Black = opposite strand; pairs at incorrect distance marked]
Using paired ends, rearrangement & copy-number detection is >1000X easier than point mutation detection (6X vs 6000X)
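The figure flags tag pairs by strand and by distance between the proximal and distal tags. A minimal sketch of that kind of check, assuming each pair has already been mapped to coordinates and a same/opposite strand flag. The `TagPair` type, the labels, the distance window, and the default expected orientation are all hypothetical illustration choices, not values from the slide or the published pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPair:
    """One mapped paired-end tag: two genome coordinates and their relative strand."""
    proximal_pos: int
    distal_pos: int
    same_strand: bool


def classify_pair(pair, min_dist, max_dist, expect_same_strand=True):
    """Label a mapped pair as consistent with the reference or anomalous.

    expect_same_strand is an assumption about the library design; set it to
    match how the paired tags are expected to map on an unrearranged genome.
    """
    if pair.same_strand != expect_same_strand:
        return "orientation_anomaly"  # candidate inversion breakpoint
    distance = abs(pair.distal_pos - pair.proximal_pos)
    if not (min_dist <= distance <= max_dist):
        return "distance_anomaly"     # candidate insertion, deletion or other rearrangement
    return "consistent"


if __name__ == "__main__":
    # Hypothetical pairs near the 1,206k to 1,210k window shown on the slide.
    pairs = [
        TagPair(1_206_100, 1_207_050, True),
        TagPair(1_206_900, 1_207_800, False),
        TagPair(1_206_100, 1_209_900, True),
    ]
    for p in pairs:
        print(p, classify_pair(p, min_dist=700, max_dist=1200))
```

A single anomalous pair is weak evidence; in practice several independent pairs supporting the same breakpoint would be required, which is why modest coverage can suffice for rearrangements.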
Polonies for human inversions >300 kbp long inverted repeats Turner, Hurles, et al. 2006 Nat Methods 3:439-45. Sanger Inst. & HMS
Sequencing/genotyping on single human chromosomes Polonies for haplotyping, recombination, LOH 153Mbp Zhang et al. Nature Genet. Mar 2006
Monitoring resistance to BCR-ABL-kinase inhibitors with polonies during CML patient therapy Nardi, Raz, Chao, Wu, Stone, Cortes, Deininger, Church, Zhu, Daley (submitted) M244V T315I E255K
Multiplex Polony Summary
• Technologies for selecting genomic regions
• Mbp scale for rearrangements
• RNA tags & spliceforms
• 1 to 200 bp scale for SNPs & exons (1%)
• Low cost & high accuracy: $0.07/kbp at 3E-7 errors
• Paired-end-tags (PET) for rearrangements
• Detection of rare mutations (e.g. drug resistance alleles)
• 60 million reads per run
Polonies with & without beads or gels Increases from 14 to 57 million polony beads per run & improves data quality. Kim, Porreca, Seidman, Church unpublished
Why low error rates?
Goal of genotyping & resequencing: discovery of variants, e.g. cancer somatic mutations 4E-6 (& lab-evolved cells)

Consensus error rate   Source             Total errors (E. coli)   Total errors (Human)
1E-4                   Bermuda/Hapmap     500                      600,000
4E-5                   454 @40X           200                      240,000
3E-7                   Polony-SbL @6X     0                        1800
1E-8                   Goal for 2006      0                        60

Also, effectively reduce (sub)genome target size by enrichment for exons or common SNPs to reduce cost & # false positives.
Cost vs consensus error rate

               AB3730   454 (Sep05)   Polony (Sep05)   Polony (Sep 06)
$/kb @4E-5     $7       $9            $0.8             $0.07
$/3e9 @1X      3M       300K          $30K
Paired ends    yes      no            yes
Device $       365K     400K          140K
Cancer exon sequencing
• $250K per sample (13,023 genes, 21 Mbp, 135,483 primer pairs) using PCR & capillary sequencing.
• $3K per sample (estimate) using single tube capture & polonies.
Sjoblom et al. The Consensus Coding Sequences of Human Breast and Colorectal Cancers. Science. 2006 Sep; Davies et al. Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 2005 65:7591-5.