Genome-Wide Mapping of in Vivo Protein-DNA Interactions Johnson et al (Science 2007) Presented by Leo J. Lee CSC 2417
Outline • Background on ChIP-based methods to study protein-DNA interactions • Salient features of ChIPSeq • Overview of the experimental protocol • Data analysis pipeline used in the paper • Important biological findings/contributions • General discussion
Protein-DNA interaction • DNA is the information carrier of almost all living organisms. • Proteins are the major building blocks of life. • Interactions between DNA and proteins play vital roles in the development and normal function of living organisms, and in disease when something goes wrong. • An important mechanism of protein-DNA interaction is direct binding, i.e., a protein binds to a particular stretch of DNA.
Chromatin Immunoprecipitation (ChIP) • ChIP is a method to investigate protein-DNA interactions in vivo. • The output of ChIP is an enriched set of DNA fragments that were bound by a particular protein. • The identity of these DNA fragments needs to be determined by a second method.
ChIP-chip (or ChIP-on-chip) • ChIP-chip uses microarray technology to determine the identity of the DNA fragments produced by ChIP. • Typically a control sample (genomic DNA that has not gone through ChIP) is used to define the relative enrichment of specific sequences in the ChIP DNA. • It was the dominant high-throughput technique before the arrival of ChIPSeq.
ChIPSeq Workflow • ChIP • Size selection (200-700 bp for Exp 1; 150-300 bp for Exp 2) • Solexa sequencing • Mapping onto the genome
ChIPSeq vs. ChIP-chip • The experimental design of ChIPSeq is considerably simpler. • ChIPSeq can typically achieve higher genomic coverage than ChIP-chip (this also depends on read length vs. probe length). • The data from ChIPSeq are arguably cleaner and easier to process. • Costs are comparable (?).
Nice things about NRSF (REST) • Considerable knowledge about NRSF has accumulated from previous studies, providing a set of known true positive and true negative binding sites. • Yet there is still room for new discoveries, as the paper illustrates. • The DNA motif bound by NRSF (called NRSE) is long and well specified. • There is a high-quality antibody that recognizes NRSF efficiently.
Sequence Mapping & Filtering • Only sequence reads that map to a unique position in the human genome are kept (about 50%). • Up to two mismatches are allowed to accommodate polymorphisms (and sequencing errors). • The resulting read distributions are processed by a peak locator algorithm that finds local concentrations of sequence hits and their peaks. • A minimum fivefold enrichment over the control sample is required.
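The filtering rules on this slide can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline: the `Alignment` record and its fields are hypothetical, and both the library-size normalization and the pseudocount (which avoids dividing by a zero control count) are assumptions the slide does not specify.

```python
from dataclasses import dataclass

MAX_MISMATCHES = 2          # from the slide
MIN_FOLD_ENRICHMENT = 5.0   # from the slide


@dataclass
class Alignment:
    """Hypothetical alignment record; field names are illustrative only."""
    read_id: str
    position: int
    num_hits: int    # number of genomic positions the read maps to
    mismatches: int  # mismatches in the reported alignment


def keep_read(aln: Alignment) -> bool:
    """Keep only uniquely mapped reads with at most two mismatches."""
    return aln.num_hits == 1 and aln.mismatches <= MAX_MISMATCHES


def passes_enrichment(chip_count: int, control_count: int,
                      chip_total: int, control_total: int,
                      pseudocount: float = 1.0) -> bool:
    """Fold enrichment of ChIP over control, each scaled by its library size.

    The scaling and the pseudocount are assumptions, not from the paper.
    """
    chip_rate = chip_count / chip_total
    control_rate = (control_count + pseudocount) / control_total
    return chip_rate / control_rate >= MIN_FOLD_ENRICHMENT
```

For example, 50 ChIP reads against 4 control reads in two libraries of one million reads each gives a scaled ratio of about 10 and passes, while 20 against 4 gives about 4 and fails.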
ChIPSeq Peak Locator Algorithm • Merge enriched regions within 500 bp of one another. • Apply triangular 5-point smoothing and identify the peak as the coordinate with the greatest number of overlapping reads.
Selecting a read count threshold • A ROC curve was obtained by analyzing known true positive and true negative sites. • A threshold of 13 sequence reads was selected, giving 98% specificity and 87% sensitivity.
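Threshold selection can be illustrated by computing sensitivity and specificity at each candidate read-count threshold over known positive and negative sites. The sketch below assumes a site is called bound when its count is at least the threshold, and picks the smallest threshold that meets a specificity target. Since specificity can only rise as the threshold rises, this keeps sensitivity as high as possible. The selection rule and function names are illustrative, not taken from the paper.

```python
def sens_spec(threshold, pos_counts, neg_counts):
    """Sensitivity and specificity when sites with count >= threshold are called bound."""
    tp = sum(c >= threshold for c in pos_counts)
    tn = sum(c < threshold for c in neg_counts)
    return tp / len(pos_counts), tn / len(neg_counts)


def roc_points(pos_counts, neg_counts, thresholds):
    """(threshold, sensitivity, specificity) for each threshold: the ROC curve data."""
    return [(t, *sens_spec(t, pos_counts, neg_counts)) for t in thresholds]


def pick_threshold(pos_counts, neg_counts, thresholds, min_specificity):
    """Smallest threshold meeting the specificity target, or None if none does."""
    for t in sorted(thresholds):
        sens, spec = sens_spec(t, pos_counts, neg_counts)
        if spec >= min_specificity:
            return t, sens, spec
    return None
```

With made-up counts such as positives [5, 15, 20, 30] and negatives [0, 1, 2, 14], requiring 100% specificity selects a threshold of 15 at 75% sensitivity.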
Precision of ChIPSeq • Precision was evaluated against the centers of high-scoring canonical NRSE motifs. • 94% of these strong motifs fall within 50 bp of the called experimental peak.
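The precision figure is the fraction of motif centers that lie within 50 bp of the nearest called peak. A minimal sketch, assuming motif centers and peak coordinates are integer positions on the same chromosome (a genome-wide analysis would group them by chromosome first); the function name is hypothetical.

```python
from bisect import bisect_left


def fraction_within(motif_centers, peak_positions, max_dist=50):
    """Fraction of motif centers within max_dist bp of the nearest peak."""
    if not motif_centers:
        raise ValueError("need at least one motif center")
    peaks = sorted(peak_positions)
    hits = 0
    for m in motif_centers:
        # The nearest peak is either just before or just after m in sorted order.
        i = bisect_left(peaks, m)
        nearest = min((abs(peaks[j] - m) for j in (i - 1, i) if 0 <= j < len(peaks)),
                      default=None)
        if nearest is not None and nearest <= max_dist:
            hits += 1
    return hits / len(motif_centers)
```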
Comprehensiveness of ChIPSeq • Virtually all strong canonical NRSE motif instances are detectably occupied. • Most of the sites previously studied by transfection analysis are also detected.
Motif Visualization
Motif Discovery • Two new kinds of motifs were discovered: • A noncanonical motif with variable spacing between the left and right half sites of the canonical motif • Half-site motifs • The enrichment of both kinds of motifs is highly statistically significant. • The authors also offer an appealing evolutionary explanation for them.
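The idea of a split motif with variable spacing can be illustrated with a simple exact-match scanner: find a left half-site, then look for a right half-site after a variable number of intervening bases. The half-site strings in the usage below are hypothetical placeholders, not the real NRSE half sites, and real motif analyses score matches with position weight matrices rather than exact string matching.

```python
def find_split_motifs(seq, left, right, min_gap, max_gap):
    """Return (start, gap) for each left half-site followed by a right half-site
    after min_gap..max_gap intervening bases (exact matches only)."""
    hits = []
    pos = seq.find(left)
    while pos != -1:
        after_left = pos + len(left)
        # Try every allowed spacing between the two half sites.
        for gap in range(min_gap, max_gap + 1):
            if seq.startswith(right, after_left + gap):
                hits.append((pos, gap))
        pos = seq.find(left, pos + 1)
    return hits


# Hypothetical half-site placeholders, for illustration only.
LEFT_HALF = "ACGTAC"
RIGHT_HALF = "GGTTCA"
```

Restricting `min_gap` and `max_gap` to a single value recovers a fixed-spacing (canonical-style) search; widening the range finds the variable-spacing instances.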
GO enrichment analysis • As expected, NRSF-bound loci are highly enriched for gene ontology (GO) terms related to neurons and their development. • Newly discovered targets include a group of genes encoding transcription factors that are critical for driving islet cell development in the pancreas. • Sequence counts for this group are modest but comfortably above the threshold of 13. • The authors provide strong arguments for the significance of this discovery.
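The slide does not say which statistical test was used for GO enrichment. A common choice is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched here as an assumption using only the standard library. Given `n_total` annotated genes, `n_term` of which carry a GO term, and `n_selected` NRSF-bound genes, it returns the probability of seeing at least `k` term-annotated genes among the bound ones by chance. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n_term, n_selected, n_total):
    """P(X >= k) for X ~ Hypergeometric(n_total, n_term, n_selected)."""
    upper = min(n_term, n_selected)
    # Count the ways to draw i term genes and n_selected - i other genes, for i >= k.
    favourable = sum(comb(n_term, i) * comb(n_total - n_term, n_selected - i)
                     for i in range(k, upper + 1))
    return favourable / comb(n_total, n_selected)
```

Summing exact integer counts before the single division avoids accumulating rounding error across terms.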
Discussion • What makes this a Science paper?