Epigenetic Analysis BIOS 691-803 Statistics for Systems Biology Spring 2008
Kinds of Questions • Where are the epigenetic modifications? • How do they co-vary? • How do epigenetic changes affect expression of genes?
Covariation of Epigenetic Measures • Motivating questions • How are epigenetic modifications related? • What are the major determinants of epigenetic state? • Statistical techniques • Covariance calculation • Principal component analysis • Linear models
Location and Covariance • Question: do epigenetic modifiers act on specific targets or do they act on whole regions of DNA? • Direct experimental evidence contradictory • Statistics may help: • Covariation patterns may be evidence
Calcitonin A gene • Two CpG clusters plus 3 odd CpGs • High correlation within clusters • Figure: CalcA in NCI60
Covariation in Methylation of 7 Genes • Individual genes have multiple CpG sites • Most variation: overall methylation • Figure: correlation map of 108 CpG sites in 6 genes across 5 ECOG pilot samples (red = 1, white = 0, blue < 0) Epigenomic Analysis
Methylation and Expression • Single gene (E-cadherin) results suggest overall methylation correlated with expression
Methylation and Expression • HELP assay gives genome-wide sampling of methylation sites at 15K genes • If we select genes with S/N > 2 in both measures, the correlations between methylation and expression of the associated genes are bi-modal
What Causes Methylation? • NCI-60 derived from various tissues • Tissue characteristic profile + specific history of cells • Fit linear model to each methylation site • 9 tissues for 60 observations • 51 error df • Overall 41% of variance attributable to tissue • What causes the remainder of methylation differences?
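The per-site fit described above can be sketched as a one-way linear model with tissue as the only factor. This is a minimal illustration under stated assumptions, not the analysis code behind the slide: the function name `tissue_variance_fraction` and the toy data are hypothetical, and the fraction of variance is taken to be the ordinary R² of the fit.

```python
from collections import defaultdict


def tissue_variance_fraction(values, tissues):
    """Fit value ~ tissue for one methylation site.

    Returns (r_squared, error_df), where r_squared is the share of the
    total sum of squares explained by the tissue means.
    """
    if len(values) != len(tissues):
        raise ValueError("values and tissues must have the same length")

    groups = defaultdict(list)
    for v, t in zip(values, tissues):
        groups[t].append(v)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # The fitted value of each observation is its tissue mean, so the
    # residual sum of squares is the within-tissue scatter.
    ss_resid = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)
        ss_resid += sum((v - m) ** 2 for v in vals)

    error_df = len(values) - len(groups)  # observations minus tissue means
    r_squared = 1.0 - ss_resid / ss_total if ss_total > 0 else 0.0
    return r_squared, error_df


# Hypothetical toy data: 60 cell lines spread over 9 tissue labels.
tissues = [f"tissue{i % 9}" for i in range(60)]
values = [(i % 9) * 0.1 + ((i * 7) % 5) * 0.05 for i in range(60)]
r2, df = tissue_variance_fraction(values, tissues)
print(f"R^2 = {r2:.2f}, error df = {df}")
```

With 9 tissue means estimated from 60 observations, the error degrees of freedom come out to 60 - 9 = 51, matching the slide. Averaging `r_squared` over all sites would give an overall figure comparable to the 41% quoted above.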
PCA for Cell-specific Factors • Residual variance has one strong PC • Remainder are ‘noise’ • 1st PC is almost constant • Reflects overall level of methylation • Is this an artifact or is it real? • Significantly correlated with expression of DNMT1 & DNMT3A
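One way to check for a single strong, nearly constant first PC in the residuals is sketched below. It is an illustration only: the function name `first_principal_component`, the power-iteration approach, and the toy residual matrix are assumptions, and a real analysis would more likely use a linear algebra library.

```python
import math


def first_principal_component(rows, n_iter=200):
    """rows: one list of per-site residuals for each sample.

    Returns (loadings, fraction_of_variance) for the leading PC of the
    site-by-site covariance matrix, found by power iteration.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Start from a constant vector; this would fail only if the leading
    # PC were exactly orthogonal to it.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total = sum(cov[i][i] for i in range(p))
    return v, (eigenvalue / total if total > 0 else 0.0)


# Hypothetical residuals: a shared per-sample shift plus small site noise.
shifts = [-1.0, -0.5, 0.0, 0.5, 1.0]
rows = [
    [s + 0.01 * ((i + j) % 3 - 1) for j in range(4)]
    for i, s in enumerate(shifts)
]
loadings, frac = first_principal_component(rows)
print([round(x, 3) for x in loadings], round(frac, 3))
```

Loadings that are roughly equal across sites, as in this toy case, are what "almost constant" means here: the component moves every site up or down together, which is why it reads as an overall methylation level.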
Relations Between Epigenetic Measures - III Stem Cells & Cancer
Issue: Cancer Stem Cells? • Hypothesis: cancers arise from stem cells rather than differentiated epithelial cells • How would you tell the difference between partially differentiated stem cells and de-differentiated epithelial cells? • Proposal: compare characteristic epigenetic modifications of stem cells with cancers • Epigenetic modifications are distinct • PRC2 (stem cells) vs methylation (cancer)
Statistical Methodology • Test of association 2 x 2 table • Fisher Exact p ~ 10^-5 • Alternatives • T-test (predictor: PRC2) • Linear model (predictor: methylation, T - N)
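The 2 x 2 test of association can be sketched with the hypergeometric distribution directly. This is a minimal standard-library version, assuming a table whose rows are "PRC2 target: yes/no" and whose columns are "methylated in cancer: yes/no"; the function name and the example counts are hypothetical and are not the counts behind the p-value on the slide.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are kept.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts, chosen only to show the call.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 34/70, about 0.4857
```

The exact test is a natural choice when some cells are small, since it does not rely on a large-sample approximation.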
Are CIMP’s Stem Cell Clones? • Distinctive PRC2 sites appear preferentially methylated in CIMP tumors
Correlations between epigenetic and expression measures – I Copy Number and Expression
Copy Number and Expression • Large sections of DNA containing many genes are often copied or deleted • We think most control elements are copied or deleted also • If more (or fewer) copies of a gene then ceteris paribus there should be more (fewer) copies of RNA
Integrative Studies of CGH & Gene Expression • Expect to see strong correlation between copy number and expression in data • Previous studies report weak effects • Average correlations range from 0.04 to 0.27 • NCI 60 study average correlation 0.16
Why Not? • H1: there really isn’t much effect – biology • Somehow the cells are compensating • In any case there shouldn’t be any effect on non-expressed genes • H2: we may not be able to measure the effect that is there – technical error • Probes may be insensitive/cross-hybridizing • Signal/noise too low even when probes are sensitive
Eliminating Uninformative Genes • Genes which are silenced will not show effect of copy number variation • Mean signal is a rough proxy for whether a gene is expressed • Keep only genes with mean signal above 6.3 • Only genes with significant copy number variation (above measurement noise) will show effect • Select genes with SD of copy number > 0.5
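The two filters can be sketched as follows. This assumes the mean-signal cutoff is meant to drop low-signal (silenced) genes and keep those above 6.3, and that copy-number SD is the sample standard deviation across cell lines; the function name, argument names, and toy data are hypothetical.

```python
from statistics import mean, stdev


def informative_genes(expression, copy_number,
                      min_mean_signal=6.3, min_cn_sd=0.5):
    """Return genes that pass both filters.

    expression, copy_number: dicts mapping gene -> list of values,
    one value per cell line.
    """
    keep = []
    for gene, expr in expression.items():
        cn = copy_number.get(gene)
        if cn is None or len(cn) < 2:
            continue  # no usable copy-number data for this gene
        expressed = mean(expr) > min_mean_signal  # proxy for "not silenced"
        variable = stdev(cn) > min_cn_sd          # variation above noise
        if expressed and variable:
            keep.append(gene)
    return keep


# Hypothetical toy data for three genes across three cell lines.
expression = {"A": [8.0, 9.0, 7.0], "B": [3.0, 4.0, 3.0], "C": [8.0, 8.0, 9.0]}
copy_number = {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 3.0], "C": [2.0, 2.1, 2.0]}
print(informative_genes(expression, copy_number))  # ['A']
```

Gene B fails the signal filter and gene C fails the copy-number variation filter, so only gene A would enter the correlation analysis.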
Correlations of Selected Measures • Figure legend: black = all correlations; red = reliably measured correlations
Estimating True Correlations • If measurement noise of SD ~ 0.3 degrades expression measures, then correlations of the measures will mostly be closer to 0 than the true correlations of the underlying variables • Given a correlation and measured standard deviations, what are most likely true standard deviations and true correlation?
MLE of Noisy Correlations • Noise can be estimated from replicates • If N large can estimate • SD of originals can be estimated by ML • Given s and e, the MLE of correlation can be inferred • For NCI 60 median MLE correlation ~ 0.65
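The direction and size of the noise effect can be illustrated with the classical moment-based correction for attenuation. This is a swapped-in simplification, not the maximum-likelihood procedure the slide refers to: it assumes independent additive noise, reads s as the measured SD and e as the noise SD of each variable, and uses a hypothetical function name and example values.

```python
import math


def disattenuated_correlation(r_obs, s_x, e_x, s_y, e_y):
    """Correct an observed correlation for independent measurement noise.

    s_x, s_y: SDs of the measured variables.
    e_x, e_y: SDs of the measurement noise (e.g. from replicates).
    Under additive noise, true variance = s**2 - e**2 and the covariance
    is unchanged, so the observed correlation is scaled up accordingly.
    """
    true_var_x = s_x ** 2 - e_x ** 2
    true_var_y = s_y ** 2 - e_y ** 2
    if true_var_x <= 0 or true_var_y <= 0:
        raise ValueError("noise SD must be smaller than measured SD")
    r_true = r_obs * s_x * s_y / math.sqrt(true_var_x * true_var_y)
    # Sampling error can push the estimate past the valid range.
    return max(-1.0, min(1.0, r_true))


# Hypothetical values: measured SD 0.5 and noise SD 0.3 in both variables.
print(round(disattenuated_correlation(0.3, 0.5, 0.3, 0.5, 0.3), 3))  # 0.469
```

In this toy case each variable keeps 0.16 of its 0.25 measured variance, so the corrected correlation is 0.3 x 0.25 / 0.16, noticeably larger than the observed value. A likelihood-based estimate would additionally account for sampling error in s, e, and the observed correlation.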
Correlations between epigenetic and expression measures – II Chromatin and Expression
Do Epigenetic Marks Regulate Transcription? • Several studies finding only weak evidence by correlation analysis • Same technical issue: S/N ratio • Questions • Does methylation shut down most genes? • Which histone marks indicate active transcription?
Methylation and Expression • HELP assay gives genome-wide sampling of methylation sites at 15K genes • Select genes with S/N > 2 in both measures • Correlations with gene expression values are bi-modal
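The select-then-correlate step can be sketched as below. The slides do not define S/N, so this assumes it is the SD of a gene's values across samples divided by a known noise SD for that assay; the function names, noise values, and toy data are all hypothetical.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filtered_correlations(meth, expr, meth_noise_sd, expr_noise_sd,
                          min_sn=2.0):
    """Correlate methylation with expression for reliably measured genes.

    meth, expr: dicts mapping gene -> list of values across samples.
    A gene is kept only if its S/N exceeds min_sn in BOTH measures.
    """
    result = {}
    for gene in meth.keys() & expr.keys():
        m, e = meth[gene], expr[gene]
        sn_meth = stdev(m) / meth_noise_sd
        sn_expr = stdev(e) / expr_noise_sd
        if sn_meth > min_sn and sn_expr > min_sn:
            result[gene] = pearson(m, e)
    return result


# Hypothetical toy data: g1 varies well above noise, g2 barely varies.
meth = {"g1": [0.1, 0.5, 0.9, 0.3], "g2": [0.5, 0.5, 0.51, 0.5]}
expr = {"g1": [9.0, 5.0, 1.0, 7.0], "g2": [5.0, 6.0, 7.0, 8.0]}
corrs = filtered_correlations(meth, expr, meth_noise_sd=0.1, expr_noise_sd=1.0)
print(corrs)
```

A histogram of the resulting per-gene correlations is where a bi-modal shape, if present, would show up.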
Interpretation of Meth-Expr Corrs • MLE of negative mode ~ -0.8 • ~ 2/3 of genes under that hump • Unclear whether positive hump is real or an artifact of small sample size • Possible explanations: • True induction by methylation • Methylation of insulator • Irrelevant CpG site
Acetylation and Expression • Histones often acetylated during expression • Histone 3 lysine 9 (H3K9) acetylation measured • Measures corrupted by noise • Blue: S/N > 2.5 • Red: S/N > 2 • Black: S/N > 1.5
Biological Prediction • H3K9 acetylation → gene expression • Is this real? • Experimental test: find genes with high acetylation variance and little expression variance by microarray • Results (7 genes) • Confirm hypothesis • Implies: • Expression arrays are not sensitive