
Differential Principal Component Analysis (dPCA) for ChIP-seq

Differential Principal Component Analysis (dPCA) for ChIP-seq. Hongkai Ji (hji@jhsph.edu), Department of Biostatistics, The Bloomberg School of Public Health, Johns Hopkins University. Functional Genomics: Locations and Functions.


Presentation Transcript


  1. Differential Principal Component Analysis (dPCA) for ChIP-seq. Hongkai Ji (hji@jhsph.edu), Department of Biostatistics, The Bloomberg School of Public Health, Johns Hopkins University

  2. Functional Genomics: Locations and Functions (Maston, Evans & Green, Annu Rev Genomics Hum Genet, 2006, 7: 29-59)

  3. ChIP-seq [figure labels: Transcription Factor (TF), Gene, motif]

  4. Motivation: how to compare multiple ChIP profiles between two biological conditions? [figure panels: Cell Type 1, Cell Type 2]

  5. Data Structure. Rows are loci (Locus 1, Locus 2, …, Locus G). Columns are grouped by cell type, then by marker: Marker 1 (H3K4me3), Marker 2 (H3K27me3), …, Marker M (Myc), each with replicates Rep 1, …, Rep K1 in Cell Type 1 and Rep 1, …, Rep K2 in Cell Type 2. Cell Type 1 intensities for locus g, marker m, replicate k: $x_{gmk} \sim G(x; \mu_{1gm}, \sigma^2)$. Cell Type 2 intensities for locus g, marker m, replicate k: $y_{gmk} \sim G(x; \mu_{2gm}, \sigma^2)$.
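The data layout on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: the slide leaves the distribution G unspecified, so a normal distribution is assumed here, and the function names, the default sizes, the number of truly different loci, and the sign convention (Cell Type 2 minus Cell Type 1) are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_profiles(G=200, M=4, K1=2, K2=3, sigma=0.5, n_diff=20, seed=0):
    """Simulate intensities x[g, m, k] and y[g, m, k] for two cell types.

    Assumption: G(x; mu, sigma^2) is taken to be normal; the slides do not say.
    """
    rng = np.random.default_rng(seed)
    n_diff = min(n_diff, G)
    mu1 = rng.normal(0.0, 1.0, size=(G, M))          # true means, Cell Type 1
    mu2 = mu1.copy()                                  # true means, Cell Type 2
    # Hypothetical choice: only the first n_diff loci truly differ.
    mu2[:n_diff] += rng.normal(0.0, 2.0, size=(n_diff, M))
    x = mu1[:, :, None] + sigma * rng.normal(size=(G, M, K1))
    y = mu2[:, :, None] + sigma * rng.normal(size=(G, M, K2))
    return x, y, mu1, mu2

def observed_difference(x, y):
    """Average over replicates and return the G x M matrix of differences.

    Assumption: difference is Cell Type 2 minus Cell Type 1.
    """
    return y.mean(axis=2) - x.mean(axis=2)

x, y, mu1, mu2 = simulate_profiles()
D = observed_difference(x, y)
print(D.shape)  # one row per locus, one column per marker
```

Averaging over replicates first reduces each cell type to a loci-by-markers matrix, so the comparison between conditions becomes a single G x M difference matrix.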

  6. Modeling True Difference [figure: matrix diagrams with mostly zero entries and a few nonzero entries marked *]

  7. Bayesian Perspective

  8. Goals of Analysis: 1. Estimate … 2. Infer [targets shown in the slide as matrix diagrams with mostly zero entries]: (2.a) Rank loci according to each component (based on $u_{gi}$); (2.b) Test $u_{gi} = 0$?
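A rough sketch of goal (2.a), ranking loci by their score on each component, is given below. It is not the author's estimator: it simply applies an uncentered singular value decomposition to an observed G x M difference matrix, and the names `dpca_sketch` and `rank_loci` are hypothetical. The per-locus scores play the role of the u_gi on the slide, under that assumption.

```python
import numpy as np

def dpca_sketch(D, n_components=2):
    """Decompose a G x M difference matrix D with a plain (uncentered) SVD.

    Returns per-locus scores (G x n_components), marker loadings
    (M x n_components), and the fraction of total squared signal
    carried by each retained component.
    """
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

def rank_loci(scores, component=0):
    """Return locus indices sorted by decreasing absolute score."""
    return np.argsort(-np.abs(scores[:, component]))

# Tiny illustration: a rank-one difference pattern across two markers.
a = np.array([0.0, 3.0, -5.0, 1.0])      # per-locus effect sizes
v = np.array([0.6, 0.8])                 # unit-length marker pattern
scores, loadings, explained = dpca_sketch(np.outer(a, v), n_components=1)
print(rank_loci(scores))                 # locus 2 has the largest |score|
```

Goal (2.b), testing whether a score is zero, additionally needs a noise estimate from the replicates, which this sketch leaves out.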

  9. Example: K562 vs. Huvec ENCODE Data G = 138,328 MYC motif sites in human genome; M = 18 data sets.

  10. Biological meaning of PCs

  11. PC1 predicts MYC differential binding better than using each marker individually

  12. Example: K562 vs. Huvec ENCODE Data. G = 138,328 MYC motif sites in human genome; M = 25 data sets. PC1: 50%, FDR<5%: 65,252. PC2: 14%, FDR<5%: 47,960. [figure: data set labels, in extracted order: H3K27me3, H3K36me3, H4K20me1, H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K4me3, H3K27ac, H3K9ac, DNase, FAIRE, CTCF, CTCF, Jun, Max, Pol2, Input, Input, CTCF, Input, Input, Pol2]

  13. Other Examples

  14. Implications [figure panels: TF in Cell type 1, TF in Cell type 2]

  15. Example: K562 vs. Huvec ENCODE Data. G = 24,376 human promoters; M = 16 markers. [figure: marker labels, in extracted order: H3K27me3, H3K36me3, H4K20me1, H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K9me1, H3K27ac, H3K4me3, H3K9ac, CTCF, Input, CTCF, Input]

  16. PC1 predicts RNA-seq differential expression (Cor = 0.6615)

  17. False Discovery Rate (FDR) [figure: matrix diagrams with mostly zero entries and a few nonzero entries marked *]
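The transcript does not say how the FDR on this slide is computed. As a generic illustration only, the sketch below applies the widely used Benjamini-Hochberg adjustment to a vector of per-locus p-values; the choice of this procedure and the function name `benjamini_hochberg` are assumptions, not something taken from the slides.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                             # ascending p-values
    scaled = p[order] * n / np.arange(1, n + 1)       # p_(i) * n / i
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted

# Hypothetical p-values for four loci.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
n_called = int(np.sum(adj < 0.05))   # loci called at FDR < 5%
```

Counting the loci whose adjusted value falls below 0.05 gives a number comparable in kind to the "FDR<5%" counts reported for each PC in the examples.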

  18. Simulation

  19. Simulation

  20. Summary
      • dPCA provides a way to concisely summarize differences between two cell types.
      • Differential genes along the major PC have biological meaning.
      • Future directions include modeling the signal shapes, multiple conditions, non-linearity, and establishing convergence rate.
