Pacific Biosciences. Use a circular template to get redundant reads and so more accuracy. DNA methylation detection by bisulfite conversion. Detection of methylated adenine in Pacific Biosciences (SMRT) sequencing. IPD = average interpulse duration ratio (meth/non-meth). Template position.
Pacific Biosciences: use a circular template to get redundant reads and thus more accuracy.
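The accuracy gain from redundant reads can be illustrated with a per-position majority vote. This is only a minimal sketch: it assumes the passes around the circular template have already been aligned to equal length (real consensus calling must also handle insertions and deletions), the sequences are made up, and `consensus` is a hypothetical helper, not part of any Pacific Biosciences software.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote consensus over pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")
    # For each column (template position), keep the most frequent base.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))

# Three made-up passes over the same template; two of them carry one error each.
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGCC"]
print(consensus(passes))
```

Each individual error is outvoted by the other two passes, which is the intuition behind reading the same molecule several times.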
Detection of methylated adenine in Pacific Biosciences (SMRT) sequencing
[Plot: IPD ratio vs. template position. IPD ratio = ratio of average interpulse durations (methylated/non-methylated).]
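The ratio on the plot axis can be computed per template position as sketched here. This assumes interpulse durations have already been collected per position for a methylated and a non-methylated template; the numbers and the `ipd_ratio` helper are made up for illustration.

```python
from statistics import mean

def ipd_ratio(meth_ipds, unmeth_ipds):
    """Per-position ratio of mean IPD (methylated / non-methylated).

    Each argument maps template position -> list of interpulse durations.
    Only positions present in both inputs are reported.
    """
    shared = sorted(set(meth_ipds) & set(unmeth_ipds))
    return {pos: mean(meth_ipds[pos]) / mean(unmeth_ipds[pos]) for pos in shared}

# Made-up durations: the polymerase pauses longer at position 2 when methylated.
meth = {1: [0.2, 0.2], 2: [1.0, 1.4], 3: [0.3, 0.3]}
unmeth = {1: [0.2, 0.2], 2: [0.3, 0.3], 3: [0.3, 0.3]}

ratios = ipd_ratio(meth, unmeth)
peak = max(ratios, key=ratios.get)
print(peak, round(ratios[peak], 2))
```

A position whose ratio stands well above 1 is a candidate modified base; where to set that threshold is not specified in the slides.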
Pacific Biosciences: 50,000 ZMWs (zero-mode waveguides; Aug. 2011), and density may climb. Long reads (e.g., full molecules, to determine full-length splicing isoforms). Direct RNA sequencing possible. DNA methylation detectable.
Agilent SureSelect RNA Target Enrichment. Capture a subgenomic region of interest for economy and speed of sequencing, e.g.: the entire exome (all exons, without introns or intergenic regions); hundreds of cancer genes; a particular genomic locus. Alternative: hybridize to a custom microarray. (Agilent)
Nimblegen (Roche) subgenomic DNA capture options: beads or microarrays.
Some results using DNA capture for subgenomic sequencing: Targeted Capture and Next-Generation Sequencing Identifies C9orf75, Encoding Taperin, as the Mutated Gene in Nonsyndromic Deafness DFNB79. Rehman et al., American Journal of Human Genetics 86, 378–388, 2010.
Detection of methylated C (~all in CpG dinucleotides). Treat double-stranded DNA with Na bisulfite and heat: non-methylated cytosine is deaminated to uracil; methylated cytosine (Cm) is not. Methylated: ----CmpG--- → ----CmpG--- (stays C after PCR). Non-methylated: ----CpG--- → ----UpG--- → (PCR) → ----TpG--- (complementary strand <--ApC---). All NON-methylated Cs are changed to T. Sequence and compare to deduce the methylated C's.
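The convert-then-compare logic can be sketched in a few lines. This is a simplified illustration on one strand only, assuming complete conversion and error-free sequencing; the sequence and the function names are made up.

```python
def bisulfite_convert(seq, methylated):
    """Sequence read after bisulfite treatment and PCR (one strand).

    Non-methylated C is deaminated to U and read as T; methylated C stays C.
    `methylated` is a set of 0-based positions of methylated cytosines.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def call_methylated(reference, converted):
    """Positions where the reference has C and the converted read still has C."""
    return {
        i
        for i, (ref, obs) in enumerate(zip(reference, converted))
        if ref == "C" and obs == "C"
    }

reference = "ACGTCCGA"                      # made-up sequence, C at 1, 4, 5
read = bisulfite_convert(reference, methylated={1})
print(read)
print(sorted(call_methylated(reference, read)))
```

Comparing the treated read against the untreated reference recovers exactly the positions that were protected by methylation.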
DEEP SEQUENCING (next generation sequencing, high throughput sequencing, massively parallel sequencing) applications:
• Human genome re-sequencing (mutations, SNPs, haplotypes, disease associations, personalized medicine)
• Tumor genome sequencing
• Microbial flora sequencing (microbiome, viruses)
• Metagenomic sequencing (without cell culturing)
• RNA sequencing (RNAseq; gene expression levels, miRNAs, lncRNAs, splicing isoforms)
• Chromatin structure (ChIP-seq; histone modifications, nucleosome positioning)
• Epigenetic modifications (DNA CpG methylation and hydroxymethylation)
• Transcription kinetics (GROseq; nascent RNA, BrdU pulse labeled RNA)
• High throughput genetics (QUEPASA; cis-acting regulatory motif discovery)
• Drug discovery (bar-coded organic molecule libraries) [Manocci PNAS paper]
Ke et al., and Chasin, Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 2011, 21: 1360-1374. Order an equal mixture of all 4 bases at these 6 positions (4^6 = 4096 possible hexamers).
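An equal mixture of 4 bases at 6 positions gives 4^6 = 4096 distinct hexamers. A short sketch that enumerates them (the helper name `all_kmers` is just for illustration):

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Every sequence of length k over the alphabet, in lexicographic order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

hexamers = all_kmers(6)
print(len(hexamers), hexamers[0], hexamers[-1])
```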
Quantifying extensive phenotypic arrays from sequence arrays (= QUEPASA)
Rank   6-mer    ESRseq score (~ -1 to +1)
Best exonic splicing enhancers:
1      AGAAGA   1.0339
2      GAAGAT   0.9918
3      GACGTC   0.9836
4      GAAGAC   0.9642
5      TCGTCG   0.9517
6      TGAAGA   0.9434
7      CAAGAA   0.9219
8      CGTCGA   0.8853
::
4086   TAGATA   -0.8609
4087   AGGTAG   -0.8713
4088   CGTCGC   0.8850
4089   CTTAAA   -0.8786
4090   CCTTTA   -0.8812
4091   GCAAGA   0.8911
4092   TAGTTA   -0.8933
4093   TCGCCG   0.9113
4094   CCAGCA   -0.8942
4093   CTAGTA   -0.9251
4094   TAGTAG   -0.9383
4095   TAGGTA   -0.9965
4096   CTTTTA   -1.0610
Worst exonic splicing enhancers = best exonic splicing silencers.
Constitutive exons. Alternative exons. Pseudo exons. Composite exon (from ~100,000).
What the data looks like: sequence of 36 nt, followed by its quality code.
CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA
abU^Vaa`a\aaa]aWaTNZ`aa`Q][TE[UaP_U]
TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA
a`P^Wa`[`Wa^`X_X_XWVa^NSP]_]S^X_T\X^
CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA
aTa`^b``baaaa^aab^YaTQLOHIa`^a``TX]]
TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA
I_`aaaa`aaaaaaa_a_^[KZIGIGZ`U`\^P^^`
CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA
aY_\abb[T\abaaa`a`bZ[HXXIZa_`_LGMS[`
TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA
aba]^aa_a]`aa]_]`XWSMFGGIPX[P]X`V_Y^
TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA
a_^a^aa`aYaaa_aY`Y_^[I]VY\`]V]R\W]VV
TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA
XZababa`aZaaaaaYaYXX`baa``\\TaUa\aW`
Read layout: 2 nt barcode (TA or CG), constant region, variable region, constant region (constant regions peculiar to our expt.). Some reads contain an error.
Barcoding allows multiplexing of several or many experiments at once (in one channel of a sequencer), for economy. Here, two biological replicates.
Experiment: 1 1 1 2 2 1+2 2 2 1 2
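Splitting such reads by barcode and pulling out the variable hexamer might look like the sketch below. The layout (2 nt barcode, 18 nt constant, 6 nt variable, 10 nt constant) and the two constant sequences are assumptions inferred from the example reads, not stated in the slides; dropping reads with any constant-region mismatch is a hypothetical filter, and which barcode belongs to which replicate is not assumed.

```python
from collections import Counter

BARCODE_LEN = 2
CONST5 = "CACTGTGCTGGAGCTCCC"   # 18 nt, inferred from the example reads
CONST3 = "AACTCTAGAA"           # 10 nt, inferred from the example reads
VAR_LEN = 6

def parse_read(read):
    """Return (barcode, variable 6-mer, number of constant-region mismatches)."""
    start = BARCODE_LEN + len(CONST5)
    barcode = read[:BARCODE_LEN]
    const5 = read[BARCODE_LEN:start]
    hexamer = read[start:start + VAR_LEN]
    const3 = read[start + VAR_LEN:]
    mismatches = sum(a != b for a, b in zip(const5 + const3, CONST5 + CONST3))
    return barcode, hexamer, mismatches

reads = [
    "CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA",
    "TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA",
]

# Count hexamers per barcode, keeping only reads with intact constant regions.
counts = Counter()
for read in reads:
    barcode, hexamer, mismatches = parse_read(read)
    if mismatches == 0:
        counts[(barcode, hexamer)] += 1

for (barcode, hexamer), n in sorted(counts.items()):
    print(barcode, hexamer, n)
```

In a real experiment the per-barcode hexamer counts, before and after selection, would be the raw material for a score; the scoring itself is not shown here.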
Next generation methods for high throughput genetic analysis: Use custom oligo libraries to construct minigene libraries (40,000, up to 60 nt long), e.g., for saturation mutagenesis to identify all exonic bases contributing to splicing (or transcription or polyadenylation, ...). Use bar codes to detect sequences missing from the selected molecules. E.g., Patwardhan RP, Lee C, Litvin O, Young DL, Pe'er D, Shendure J. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat Biotechnol. 2009 27:1173-5. Long (200-mer) synthetic oligo library.
OUTLINE OF LECTURE TOPICS COMING UP: Expression and manipulation of transgenes in the laboratory
• In vitro mutagenesis to isolate variants of your protein/gene with desirable properties
  • Single base mutations
  • Deletions
  • Overlap extension PCR
  • Cassette mutagenesis
• To study the protein: Express your transgene
  • Usually in E. coli, for speed, economy
  • Expression in eukaryotic hosts
  • Drive it with a promoter/enhancer
  • Purify it via a protein tag
  • Cleave it to get the pure protein
• Explore protein-protein interaction
  • Co-immunoprecipitation (co-IP) from extracts
  • 2-hybrid formation
  • Surface plasmon resonance
  • FRET (fluorescence resonance energy transfer)
  • Complementation readout
Site-directed mutagenesis by overlap extension PCR. The PCR fragment, flanked by restriction sites RS1 and RS2, can subsequently be cloned in a plasmid (or not; the PCR product itself can be used in many ways, e.g., transfection): cut with RE 1 and 2, then ligate into a similarly cut vector. Strachan and Read, Human Mol. Genet. 3, p. 148.
Cassette mutagenesis = random mutagenesis but in a limited region: 1) by error-prone PCR.
Original sequence coding for, e.g., a transcription enhancer region:
----------------------------------------------------------------------
PCR fragment made with high Taq polymerase and Mn2+ instead of Mg2+ → errors (*):
------*--------*--*-**---------------*-----------*--*-------*---------
Cut in primer sites and clone upstream of a reporter protein sequence. Pick colonies. Analyze phenotypes. Sequence.
Cassette mutagenesis = random mutagenesis but in a limited region: 2) by "doped" synthesis.
Target = e.g., an enhancer element. Original enhancer sequence:
-----------------------------------------------------------
Buy 2 doped oligos and anneal; OK for up to ~80 nt (* = mutation):
-*------------------------*-*-*------------*------------*--
------*--------*--*-**---------------*-----------*--*------
Doping = e.g., 90% G, 3.3% A, 3.3% C, 3.3% T at each position (for a position where the original base is G).
Clone upstream of a reporter. Pick colonies. Analyze phenotypes. Sequence.
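The effect of the doping level can be explored with a small simulation. This is a sketch under simple assumptions: each position independently keeps its original base with probability 0.9 and otherwise receives one of the other three bases with equal chance. The placeholder sequence and the function names are made up.

```python
import random

def dope(seq, wt_fraction=0.9, rng=None):
    """Simulate one doped oligo.

    Each position keeps the original base with probability wt_fraction;
    otherwise it gets one of the other three bases, chosen uniformly.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        if rng.random() < wt_fraction:
            out.append(base)
        else:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
    return "".join(out)

def expected_mutations(length, wt_fraction=0.9):
    """Mean number of substitutions per oligo."""
    return length * (1 - wt_fraction)

enhancer = "GATTACAGATTACAGATTACAGATTACAGATTACAGATTA"   # made-up 40-nt target
rng = random.Random(1)                                  # seeded for repeatability
library = [dope(enhancer, rng=rng) for _ in range(5)]
for oligo in library:
    n_mut = sum(a != b for a, b in zip(enhancer, oligo))
    print(oligo, n_mut)
print("expected per 80-nt oligo:", round(expected_mutations(80), 1))
```

At 90% wild-type per position, an 80-nt oligo carries about 8 substitutions on average, so the doping level would be raised toward 100% if mostly single mutants are wanted.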
E. coli as a host
• PROs: Easy, flexible, high tech, fast, cheap; but problems
• CONs:
  • Folding (can misfold)
  • Sorting within the cell -> can form inclusion bodies
  • Purification -- endotoxins
  • Modifications -- not done (glycosylation, phosphorylation, etc.)
• Modifications:
  • Glycoproteins
  • Acylation: acetylation, myristoylation
  • Methylation (arg, lys)
  • Phosphorylation (ser, thr, tyr)
  • Sulfation (tyr)
  • Prenylation (farnesyl, geranylgeranyl on cys)
  • Vitamin C-dependent modifications (hydroxylation of proline and lysine)
  • Vitamin K-dependent modifications (gamma carboxylation of glu)
  • Selenoproteins (seleno-cys tRNA at UGA stop)
E. coli expression vectors • Promoter examples: • 1) Lac promoter (with operator)-YFG, + lac repressor (I gene): • Induce expression by inactivation of the lac repressor with IPTG or lactose • 2) As above but with a hybrid Tac promoter (tryptophan operon + lac operon): • Stronger. Use the iq mutant of the lac I gene, which produces high levels of the lac repressor. • Expression regulatable over several orders of magnitude. • 3) BAD promoter-YFG. Arabinose utilization operon. Inducible by arabinose via the endogenous araC gene for a transcriptional activator. Background levels driven down by including glucose. • 4) Phage T7 promoter-YFG. Vector carries gene for T7 polymerase, under control of the lac promoter. Add IPTG or lactose to induce T7 polymerase and thence YFG. • IPTG = isopropylthiogalactoside (non-metabolizable inducer) • YFG = your favorite gene
Myristoylation: myristic acid attached to the N-terminal glycine alpha amino group. Anchors the protein to the membrane.
Lysine epsilon amino group modifications (monomethyl, dimethyl also). Well studied in histones, microtubules.
Selenoproteins: via seleno-cys tRNA at a UGA nonsense codon. Sequence context dictates efficiency.
Gamma carboxylation of glutamic acid. Binds calcium; used in coagulation proteins.
Some alternative hosts • Yeasts (Saccharomyces, Pichia) • Insect cells with baculovirus vectors • Mammalian cells in culture (later) • Whole organisms (mice, goats, corn) (not discussed) • In vitro (cell-free), for analysis only, not preparatively (good for radiolabeled proteins, discussed later)
Some popular yeast promoters. [Figure labels: selectable marker, ori.] ARS = autonomously replicating sequence element. Source: http://biochemie.web.med.uni-muenchen.de/Yeast_Biol/04 Yeast Molecular Techniques.pdf
Yeast Expression Vector (example), Saccharomyces cerevisiae (baker's yeast). Features:
• GAPD prom, your favorite gene (Yfg), GAPD term'n (GAPD = the enzyme glyceraldehyde-3 phosphate dehydrogenase)
• LEU2, e.g. = Leu biosynthesis, for yeast selection
• 2 mu seq = yeast ori (2μ = 2 micron plasmid)
• oriE = bacterial ori and Ampr = bacterial selection, for growth in E. coli
Complementation of an auxotrophy can be used instead of drug resistance. Auxotrophy = state of a mutant in a biosynthetic pathway resulting in a requirement for a nutrient.
Yeast: genomic integration via homologous recombination. [Figure: vector DNA carrying p-Yfg-t and HIS4 recombines with genomic DNA carrying a HIS4 mutation; the integrated locus contains a functional HIS4 gene, p-Yfg-t, and a defective HIS4 gene.]
Yeast (integration in Pichia pastoris). [Figure: vector DNA carrying AOX1p-Yfg-AOX1t, HIS4 and 3'AOX1 replaces the genomic AOX1 gene by double recombination.] AOX1 = alcohol oxidase gene (~30% of total protein). P. pastoris: • tight control • methanol induced (AOX1) • large scale production (gram quantities)
Expression in mammalian cells
Lab examples of immortal cell lines:
• HEK293: Human embryonic kidney (high transfection efficiency)
• HeLa: Human cervical carcinoma (historical, low RNase)
• CHO: Chinese hamster ovary (hardy, diploid DNA content, mutants)
• Cos: Monkey cells with SV40 replication proteins (-> high transgene copies)
• 3T3: Mouse or human, exhibiting ~regulated (normal-like) growth
+ various others, many differentiated to different degrees, e.g.:
• BHK: Baby hamster kidney
• HepG2: Human hepatoma
• GH3: Rat pituitary cells
• PC12: Rat neuronal-like tumor cells
• MCF7: Human breast cancer
• HT1080: Human fibroblastic cells with near diploid karyotype
• IPS: induced pluripotent stem cells
and: Primary cells, cultured with a limited lifetime. E.g., MEF = mouse embryonic fibroblasts, HDF = human diploid fibroblasts
Common in industry:
• NS1: Mouse plasma cell tumor cells (mAbs)
• Vero: African green monkey cells (vaccines)
• CHO: Chinese hamster ovary cells (mAbs, other therapeutic proteins)
• PER.C6: Human retinal cells (mAbs, other therapeutic proteins)
Mammalian cell expression. Generalized gene structure for mammalian expression: Mam. prom., 5'UTR, cDNA gene, 3'UTR, polyA site, plus an intron. The intron is optional but a good idea.
Popular mammalian cell promoters: • SV40 Large T Ag (Simian Virus 40) • RSV LTR (Rous sarcoma virus) • MMTV (Mouse mammary tumor virus; steroid inducible) • HSV TK (Herpes simplex virus; low expression) • Metallothionein (metal inducible, Cd++) • CMV early (Cytomegalovirus) • Actin • EIF2alpha • Engineered inducible / repressible: tet, ecdysone, glucocorticoid (tet = tetracycline)
Engineered regulated expression: tetracycline-responsive promoters. Tet-OFF (add tet → shut off). tTA = tet activator fusion protein: tetR domain (tetR = tet repressor, its original role) fused to the VP16 transcription activation domain. No tet: active; binds the tet operator (multiple copies) if tet is not also bound. Tetracycline (tet) or, better, doxycycline (dox): allosteric change in conformation → not active. The tTA gene must be in the cell (permanent transfection, integrated): CMV prom., tTA cDNA, polyA site. (Bujold et al.)
Tet-OFF, cont. Target construct: multiple tet operator elements, MIN. CMV prom., your favorite gene, polyA site. No doxycycline: tTA (tetR domain + VP16 tc'n act'n domain) active, bound to the operators → plenty of transcription by RNA pol. Doxycycline present: tTA not active → little transcription (2%?, bkgd).
Tetracycline-responsive promoters: Tet-ON (add tet → turn on gene). A different fusion protein (reverse tTA, rtTA; tetR domain + VP16 tc'n act'n domain): does NOT bind the tet operator if tet is not bound (not active); becomes active when tetracycline (tet) or, better, doxycycline (dox) is bound. Its gene must be in the cell (permanent transfection, integrated): full CMV prom., cDNA for the fusion protein, polyA site. Cell lines are commercially available (293, CHO) or do-it-yourself.
Tet-ON, cont. Target construct: multiple tet operator elements, MIN. CMV prom., your favorite gene, polyA site. Doxycycline absent: fusion protein not active → little transcription (bkgd.). Add dox: fusion protein with doxycycline bound is active → plenty of transcription (> 50X) by RNA pol II.
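The two systems differ only in how doxycycline affects operator binding, which can be summarized as a small truth table. This is a purely qualitative sketch (on vs. background only); the function name is made up, and background and induction levels are not modeled.

```python
def yfg_transcribed(system, dox_present):
    """Qualitative state of the gene behind the tet operators + minimal CMV promoter.

    Tet-OFF: the activator fusion binds the operators only WITHOUT dox.
    Tet-ON:  the (different) fusion protein binds the operators only WITH dox.
    """
    if system == "Tet-OFF":
        return not dox_present
    if system == "Tet-ON":
        return dox_present
    raise ValueError(f"unknown system: {system!r}")

for system in ("Tet-OFF", "Tet-ON"):
    for dox in (False, True):
        state = "active" if yfg_transcribed(system, dox) else "background only"
        print(system, "| dox present:", dox, "->", state)
```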