Challenges in Modeling and Analyzing DNA Methylation Data Shili Lin Department of Statistics The Ohio State University
Challenges
• How to assign/distribute a read to CpG sites to obtain inferred nucleotide resolution methylation (NTRM) data?
• How to combine correlated signals (modeling needed) from NTRM within a well-defined region to detect differential methylation?
How to Detect Differentially Methylated Regions?
• Averaging over the region will likely wash out the signals.
• Point-by-point analysis is not powerful and will likely lead to inconsistent signals throughout the region.
• Regional joint analysis needs to take correlation into account. How to model such correlation?
• The two-step idea (derive NTRM data, then perform methylation analysis) does not take into account the uncertainty in the first step. One-step analysis strategies?
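The first bullet above can be illustrated with a toy calculation. This is only a sketch with made-up methylation levels (the six-site region, the two groups, and all numbers are hypothetical, not data from the talk): when differences point in opposite directions within a region, the regional average can show no difference even though every site differs.

```python
# Hypothetical fraction of methylated reads at six CpG sites in one region.
# These values are invented for illustration only.
group_a = [0.8, 0.8, 0.8, 0.2, 0.2, 0.2]
group_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Site-by-site differences: +0.3 at the first three sites, -0.3 at the rest.
site_diffs = [a - b for a, b in zip(group_a, group_b)]

# Difference of the regional averages: the opposite signs cancel.
region_diff = mean(group_a) - mean(group_b)

print("per-site differences:", [round(d, 2) for d in site_diffs])
print("regional average difference:", round(region_diff, 2))
```

The per-site view still has its own cost, as the second bullet notes: each site is tested with less information, and neighboring sites are correlated, which this toy example does not model.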
Challenges
• With Methyl-CpG (Serre 2010) or MethylCap-seq (Yan 2012), NTRM are not observable but need to be inferred.
• Can no longer perform Fisher's exact or chi-square test on a 2 x 2 contingency table (columns: two alleles of a SNP; rows: counts of methylated and unmethylated cytosines at a CpG site located on a SNP-containing read).
• If data from multiple cell lines are available, can the data be pooled (modeling needed) to increase statistical power? Can cell lines with homozygous genotypes be included in the analysis? (Poisson-binomial model? Seen in a talk.)
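For reference, the 2 x 2 test that the second bullet says is no longer available looks like the sketch below when the counts are directly observed. It is a plain-Python two-sided Fisher's exact test built from the hypergeometric distribution; the function name and the example counts are hypothetical and are not taken from the talk.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical example): rows are methylated and
    unmethylated cytosine counts, columns are the two alleles of a SNP.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one; the small factor guards against rounding ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Made-up counts: 8 methylated / 1 unmethylated reads on allele 1,
# 2 methylated / 5 unmethylated reads on allele 2.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(round(p_value, 4))  # about 0.035
```

With enrichment-based data such as MethylCap-seq, the four cell counts are themselves inferred rather than observed, which is why the slide asks for a model-based alternative.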