Genome-Wide Mapping of in Vivo Protein-DNA Interactions

This presentation outlines the background and methodology of ChIP (Chromatin Immunoprecipitation) based methods to study protein-DNA interactions. It specifically compares ChIPSeq, a high-throughput technique using next-generation sequencing, to ChIP-chip, a microarray-based technique. The paper discusses the important biological findings and contributions of ChIPSeq and highlights the advantages of this method over ChIP-chip. Additionally, it showcases the discovery of new DNA motifs and gene ontology enrichment analysis using ChIPSeq.

Presentation Transcript


  1. Genome-Wide Mapping of in Vivo Protein-DNA Interactions. Johnson et al. (Science 2007). Presented by Leo J. Lee. CSC 2417

  2. Outline • Background on ChIP-based methods to study protein-DNA interactions • Salient features of ChIPSeq • Overview of the experimental protocol • Data analysis pipeline used in the paper • Important biological findings/contributions • General discussion

  3. Protein-DNA interaction • DNA is the information carrier of almost all living organisms. • Proteins are the major building blocks of life. • Interactions between DNA and proteins play vital roles in the development and normal function of living organisms, and contribute to disease when something goes wrong. • An important mechanism of protein-DNA interaction is direct binding, i.e., a protein binds to a particular fragment of the DNA.

  4. Chromatin Immunoprecipitation (ChIP) • ChIP is a method to investigate protein-DNA interactions in vivo. • The output of ChIP is a pool of DNA fragments enriched for those that were bound by a particular protein. • The identities of these DNA fragments need to be determined by a second method.

  5. ChIP-chip (or ChIP-on-chip) • ChIP-chip uses microarray technology to determine the identity of DNA fragments produced by ChIP. • Typically a control sample (genomic DNA that has not gone through ChIP) is used to properly define the relative enrichment of specific sequences in the ChIP DNA. • It was the dominant high-throughput technique before the arrival of ChIPSeq.

  6. ChIPSeq Workflow • ChIP → Size Selection (200-700 bp for Exp 1; 150-300 bp for Exp 2) → Solexa Sequencing → Mapping onto Genome

  7. ChIPSeq vs. ChIP-chip • The experimental design of ChIPSeq is considerably simpler. • ChIPSeq can typically achieve higher genomic coverage than ChIP-chip (this also depends on read length vs. probe length). • The data from ChIPSeq is arguably cleaner and easier to process. • Costs are comparable (?).

  8. Nice things about NRSF (REST) • Considerable knowledge on NRSF has been accumulated from previous studies, which provides a set of true positives and negatives. • Yet there is still room to make new discoveries, as illustrated in the paper. • The DNA motif bound by NRSF (called NRSE) is long and well-specified. • There is a high-quality antibody that recognizes NRSF efficiently.

  9. ChIPSeq Workflow • ChIP → Size Selection (200-700 bp for Exp 1; 150-300 bp for Exp 2) → Solexa Sequencing → Mapping onto Genome

  10. Sequence Mapping & Filtering • Only sequence reads mapped to a unique position on the human genome are kept (about 50%). • Two mismatches were allowed to accommodate polymorphisms (and sequencing errors). • The resulting sequence read distributions are processed by a peak locator algorithm to find local concentrations of sequence hits and their peaks. • A minimum five-fold enrichment over the control sample is required.
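
The filtering rules on this slide could be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: the `Alignment` record and its fields, the pseudocount, and the assumption that ChIP and control samples were sequenced to comparable depth are all hypothetical. Only the two-mismatch limit and the five-fold cutoff come from the slide.

```python
from dataclasses import dataclass

MAX_MISMATCHES = 2   # from the slide: two mismatches allowed
MIN_FOLD = 5.0       # from the slide: minimum five-fold enrichment


@dataclass(frozen=True)
class Alignment:
    """Hypothetical per-read alignment summary (not the paper's format)."""
    read_id: str
    chrom: str
    pos: int
    n_hits: int       # number of genomic positions the read aligns to
    mismatches: int   # mismatches at the reported position


def keep_read(aln: Alignment) -> bool:
    """Keep only uniquely mapped reads with at most MAX_MISMATCHES mismatches."""
    return aln.n_hits == 1 and aln.mismatches <= MAX_MISMATCHES


def fold_enrichment(chip_count: int, control_count: int,
                    pseudocount: float = 1.0) -> float:
    """ChIP/control read-count ratio for one region.

    The pseudocount is an assumption added here to avoid division by zero;
    equal sequencing depth in both samples is also assumed.
    """
    return (chip_count + pseudocount) / (control_count + pseudocount)


def passes_enrichment(chip_count: int, control_count: int) -> bool:
    """True if the region reaches the minimum fold enrichment over control."""
    return fold_enrichment(chip_count, control_count) >= MIN_FOLD
```

In practice the counts would need to be normalized for library size before taking the ratio; that step is omitted here.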

  11. ChIPSeq Peak Locator Algorithm • Merge enriched regions within 500 bp of one another. • Apply triangular 5-point smoothing and identify the peak as the coordinate with the greatest number of overlapping reads.
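
The two steps on this slide might look something like the sketch below. Several details are assumptions, since the slide does not specify them: regions are `(start, end)` tuples on a single chromosome, "within 500 bp" is measured from the end of one region to the start of the next, the triangular weights are 1-2-3-2-1, and the window is truncated and renormalized at the edges. The function names are hypothetical.

```python
TRIANGLE = (1, 2, 3, 2, 1)  # assumed weights for a triangular 5-point window


def merge_regions(regions, max_gap=500):
    """Merge (start, end) regions whose gap is at most max_gap bp."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def triangular_smooth(values):
    """Weighted moving average with a 5-point triangular window.

    Near the edges the window is truncated and the weights renormalized.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        total = 0.0
        weight_sum = 0
        for offset, weight in zip(range(-2, 3), TRIANGLE):
            j = i + offset
            if 0 <= j < n:
                total += weight * values[j]
                weight_sum += weight
        smoothed.append(total / weight_sum)
    return smoothed


def call_peak(coverage, region_start=0):
    """Return the coordinate with the highest smoothed read coverage.

    `coverage` is a non-empty list of per-base overlapping-read counts
    for one merged region that begins at `region_start`.
    """
    smoothed = triangular_smooth(coverage)
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return region_start + best
```

For example, `merge_regions([(100, 200), (600, 700), (1300, 1400)])` joins the first two regions (gap of 400 bp) and leaves the third separate (gap of 600 bp).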

  12. Selecting a read count threshold • A ROC curve was obtained by analyzing true positives and negatives. • A sequence read threshold of 13 was selected to reach 98% specificity and 87% sensitivity.
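
To make the threshold selection concrete, here is a minimal sketch of how sensitivity and specificity can be computed across candidate read-count thresholds. It assumes a site is called bound when its read count is greater than or equal to the threshold; the slide does not say whether the comparison is inclusive. The function names are hypothetical, and any input counts are the caller's own, not data from the paper.

```python
def sensitivity_specificity(pos_counts, neg_counts, threshold):
    """Sensitivity and specificity when sites with count >= threshold are called bound.

    pos_counts: read counts at known true-positive sites
    neg_counts: read counts at known true-negative sites
    """
    true_pos = sum(c >= threshold for c in pos_counts)
    true_neg = sum(c < threshold for c in neg_counts)
    return true_pos / len(pos_counts), true_neg / len(neg_counts)


def roc_points(pos_counts, neg_counts):
    """(threshold, sensitivity, specificity) for every observed count value."""
    thresholds = sorted(set(pos_counts) | set(neg_counts))
    return [(t, *sensitivity_specificity(pos_counts, neg_counts, t))
            for t in thresholds]


def lowest_threshold_meeting(pos_counts, neg_counts, min_specificity):
    """Lowest threshold whose specificity reaches min_specificity, or None.

    Specificity never decreases as the threshold rises, so the first hit
    is also the one that keeps sensitivity as high as possible.
    """
    for threshold, sens, spec in roc_points(pos_counts, neg_counts):
        if spec >= min_specificity:
            return threshold, sens, spec
    return None
```

Plotting sensitivity against (1 - specificity) over the output of `roc_points` gives the ROC curve; picking a point on it is the trade-off the slide describes.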

  13. Precision of ChIPSeq • Precision was evaluated against the centers of high-scoring canonical NRSE motifs. • 94% of these strong motifs fall within 50 bp of the called experimental peak.
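
A precision figure of this kind can be computed along the lines below. This is a sketch under stated assumptions: all coordinates lie on one chromosome, each motif is compared with its nearest called peak, and the function names are hypothetical. Only the 50 bp window comes from the slide.

```python
import bisect


def nearest_distance(sorted_peaks, position):
    """Distance from `position` to the closest entry of a non-empty sorted list."""
    i = bisect.bisect_left(sorted_peaks, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_peaks[max(i - 1, 0):i + 1]
    return min(abs(position - p) for p in candidates)


def fraction_within(motif_centers, peak_positions, max_dist=50):
    """Fraction of motif centers lying within max_dist bp of a called peak."""
    if not motif_centers or not peak_positions:
        return 0.0
    peaks = sorted(peak_positions)
    hits = sum(nearest_distance(peaks, m) <= max_dist for m in motif_centers)
    return hits / len(motif_centers)
```

For genome-wide data the same calculation would be run per chromosome and the hit counts pooled.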

  14. Comprehensiveness of ChIPSeq • Virtually all strong canonical NRSE motif instances are detectably occupied. • Most of the sites previously studied by transfection analysis are also detected.

  15. Motif Visualization

  16. Motif Discovery • Two new kinds of motifs are discovered: • A noncanonical motif with variable spacing between the left and right half sites of the canonical motif • Half-site motifs • The enrichment of both kinds of motifs is highly statistically significant. • The authors are able to tell a nice evolutionary story about them.

  17. GO enrichment analysis • As expected, NRSF-bound loci are highly enriched in gene ontology (GO) terms related to neurons and their development. • A group of genes encoding transcription factors that are critical in driving islet cell development in the pancreas is newly identified among the bound loci. • Sequence counts for this group are modest but comfortably above the threshold of 13. • The authors are able to provide strong arguments on the significance of this discovery.
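
GO term enrichment is commonly assessed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes among the bound genes by chance. The slide does not say which test the authors used, so the sketch below is an assumption for illustration only, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) for a hypergeometric draw without replacement.

    population: total number of genes considered
    annotated:  genes in the population carrying the GO term
    sample:     number of bound genes
    overlap:    bound genes that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible terms drop out naturally.
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

A small p-value for a term suggests the bound genes carry it more often than expected by chance; with many GO terms tested, a multiple-testing correction would also be needed.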

  18. Discussion • What makes this a Science paper?
