ChIP-seq • Robert J. Trumbly • Department of Biochemistry and Cancer Biology • Block Health Science 448, UTHSC • 419-383-4347 • robert.trumbly@utoledo.edu
ChIP-seq • ChIP-seq (chromatin immunoprecipitation followed by DNA sequencing) has become the preferred method for analyzing protein-DNA interactions and chromatin structure on a genomic scale • ChIP-seq has become practical because of rapid developments in next-generation sequencing (NGS)
NGS • The transition from microarrays to NGS creates not just more data but a different type of data • Microarray data are analog: how much expression (signal) for a gene? • NGS data are digital: e.g., which splicing variant is expressed?
NGS • RNA-seq: can detect splicing variants, allelic expression, novel mRNAs • ChIP-seq: can detect differential binding to allelic variants, leading to information about binding specificity
TFs: sharp binding sites • RNA Pol II: sharp and extended • Histone modifications: extended domains • Park, Oct 2009
ChIP-seq and RNA-seq analysis • Pepke et al., Nature Methods 6:S22-S32, 2009
FASTQ files • @SEQ_ID • GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT • + • !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65 • NGS output is usually stored in FASTQ files, with four lines per read • Line 1: @ followed by the sequence ID • Line 2: the sequence • Line 3: +, sometimes followed by text • Line 4: the quality score for each base, encoded as one ASCII symbol per base
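The four-line layout described above can be read with a few lines of Python. This is a minimal sketch, not a full FASTQ parser: it assumes each record occupies exactly four lines (no wrapped sequences), and the names `FastqRecord` and `parse_fastq` are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    seq_id: str    # line 1, without the leading '@'
    sequence: str  # line 2
    comment: str   # line 3, any text after the '+'
    quality: str   # line 4, one ASCII symbol per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records, assuming exactly four lines per record (an assumption)."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(rows), 4):
        header, sequence, plus, quality = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], sequence, plus[1:], quality)


# The example record from the slide.
example = """@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
"""

records = list(parse_fastq(example.splitlines()))
print(records[0].seq_id, len(records[0].sequence))
```

Reading by position rather than by leading character matters because the quality line may itself begin with @ or +.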
Quality scores • Phred quality score: Q = -10 log10(p), where p is the probability that the corresponding base call is incorrect • Example: p = 0.001, log10(0.001) = -3, so Q = -10 × (-3) = 30 • For the FASTQ file, an offset of 33 (for the most common encoding) is added to the raw quality score, and the ASCII symbol corresponding to that number is stored and displayed • There are several variations on the quality score encoding, so programs that interpret the scores must know which version is in use
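The arithmetic above can be checked with a short Python sketch. It assumes the offset-33 encoding mentioned on the slide; the function names are illustrative, and other FASTQ variants would need a different `offset`.

```python
import math
from typing import List

SANGER_OFFSET = 33  # assumption: the "most common encoding" from the slide


def error_prob_to_phred(p: float) -> float:
    """Q = -10 * log10(p), where p is the probability of a wrong base call."""
    return -10 * math.log10(p)


def phred_to_error_prob(q: float) -> float:
    """Inverse of the formula above: p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_quality(quality: str, offset: int = SANGER_OFFSET) -> List[int]:
    """Turn a FASTQ quality string into a list of integer Phred scores."""
    return [ord(symbol) - offset for symbol in quality]


def encode_quality(scores: List[int], offset: int = SANGER_OFFSET) -> str:
    """Turn integer Phred scores back into a FASTQ quality string."""
    return "".join(chr(q + offset) for q in scores)


print(round(error_prob_to_phred(0.001), 6))  # the worked example: p = 0.001
print(decode_quality("!5?I"))                # ASCII 33, 53, 63, 73
```

With offset 33, the symbol `!` (ASCII 33) is the lowest possible score, Q = 0, and `?` (ASCII 63) corresponds to Q = 30, the p = 0.001 case worked out above.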
Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells • Chen et al., Cell 133, 13 June 2008, pages 1106–1117 • Chromatin immunoprecipitation coupled with ultra-high-throughput DNA sequencing (ChIP-seq) was used to map the locations of 13 sequence-specific TFs (Nanog, Oct4, STAT3, Smad1, Sox2, Zfx, c-Myc, n-Myc, Klf4, Esrrb, Tcfcp2l1, E2f1, and CTCF) and 2 transcription regulators (p300 and Suz12).
Figure 1: Genome-Wide Mapping of 13 Factors in ES Cells by Using ChIP-seq Technology • TFBS profiles for the sequence-specific transcription factors and mock ChIP control at the Pou5f1 and Nanog gene loci are shown.
Figure 2: Identification of Enriched Motifs by Using a De Novo Approach • Matrices predicted by the de novo motif-discovery algorithm Weeder.
ChIP-seq tutorial • ChIP-seq Analysis with Galaxy: from reads to peaks (and motifs) • 2 - Obtaining the raw data: accessing ChIP-seq reads from the ArrayExpress database • 3 - Upload the reads to the Galaxy server • 4 - Some statistics on the raw data • 5 - Mapping the reads with Bowtie • 6 - Peak calling with MACS • 7 - Retrieving the peak sequences • 8 - Visualize the peak regions in the UCSC genome browser • 9 - Try to identify overrepresented motifs • http://ngs.molgen.mpg.de/ngsuploads/Cornelius/ESGI/Chip.htm
ChIP-seq tutorial • Revisions to the tutorial: • Part 2, step 4: click on the name of the entry • Part 2, step 5: click on the ENA link at the bottom of the page • Part 4, step 2: there is no "FASTX-Toolkit for FASTQ data" section; the tools are now under the general heading "NGS: QC and manipulation". There is also a new "FastQC: Read QC" tool there that is useful.
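For a quick look at raw reads outside Galaxy, the kind of summary produced in part 4 can be roughly approximated in Python. This is only a sketch under stated assumptions: it is not what FastQC or the Galaxy tools compute internally, it assumes four lines per record and the offset-33 quality encoding, and `fastq_summary` and the two tiny reads are made up for illustration.

```python
def fastq_summary(lines, offset=33):
    """Return read count, mean read length, GC fraction and mean Phred score.

    Assumes four lines per record and offset-33 qualities (both assumptions).
    """
    rows = [line.strip() for line in lines if line.strip()]
    sequences = rows[1::4]  # line 2 of every record
    qualities = rows[3::4]  # line 4 of every record
    total_bases = sum(len(seq) for seq in sequences)
    if total_bases == 0:
        raise ValueError("no reads found")
    gc_bases = sum(seq.upper().count("G") + seq.upper().count("C")
                   for seq in sequences)
    quality_sum = sum(ord(symbol) - offset
                      for qual in qualities for symbol in qual)
    return {
        "reads": len(sequences),
        "mean_length": total_bases / len(sequences),
        "gc_fraction": gc_bases / total_bases,
        "mean_quality": quality_sum / total_bases,
    }


# Two made-up reads, for illustration only.
toy_fastq = """@read1
GCGC
+
IIII
@read2
ATATAT
+
555555
"""

summary = fastq_summary(toy_fastq.splitlines())
print(summary)
```

For real data sets, the FastQC tool mentioned above reports these statistics and many more, including per-position quality.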
References • For tutorial: Chen et al. Integration of External Signaling Pathways with the Core Transcriptional Network in Embryonic Stem Cells. Cell 133, 13 June 2008, 1106–1117. • Cock et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 38(6), 2010, 1767–1771. • Pepke et al. Computation for ChIP-seq and RNA-seq studies. Nature Methods 6(11s), November 2009, S22–S32. • Park. ChIP–seq: advantages and challenges of a maturing technology. Nature Reviews Genetics 10, October 2009, 669–680. • Hawkins et al. Next-generation genomics: an integrative approach. Nature Reviews Genetics 11, July 2010, 477–486.