Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar gangfang@cs.umn.edu http://www-users.cs.umn.edu/~kumar/dmbio/. Department of Computer Science and Engineering.
Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations
Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar
gangfang@cs.umn.edu
http://www-users.cs.umn.edu/~kumar/dmbio/
Department of Computer Science and Engineering
15th PSB, 01/08/2010
Differential Expression (DE)
• Traditional analysis targets changes in expression level between controls and cases
• [Figure: expression level of a gene over samples in controls and cases]
• [Golub et al., 1999], [Pan, 2002], [Cui and Churchill, 2003], etc.
Differential Coexpression (DC)
• Targets changes in the coherence of expression between controls and cases
• Question: Is this gene interesting, i.e. associated with the phenotype?
• Answer: No, in terms of differential expression (DE). However, what if there are another two genes ...? Then yes!
• [Figure: matrix of expression values (genes by samples); expression over samples in controls and cases] [Kostka & Spang, 2005]
• Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.
• [Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al., 2009], etc.
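The contrast between DE and DC can be sketched with a toy example. The expression values below are invented for illustration, and Pearson correlation is used here as one common way to quantify coexpression of a gene pair; the slides do not state which coherence measure the figure used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def mean(x):
    return sum(x) / len(x)

# Toy expression values (hypothetical, not from the lung cancer data).
g1_controls = [1.0, 2.0, 3.0, 4.0, 5.0]
g2_controls = [1.1, 2.1, 2.9, 4.2, 4.9]  # tracks g1 closely
g1_cases = [1.0, 2.0, 3.0, 4.0, 5.0]     # same values as controls: no DE
g2_cases = [4.2, 1.1, 2.9, 4.9, 2.1]     # same values as controls, reordered

# Differential expression: shift of the mean level between classes.
de_g1 = abs(mean(g1_cases) - mean(g1_controls))
de_g2 = abs(mean(g2_cases) - mean(g2_controls))

# Differential coexpression: change in correlation between classes.
corr_controls = pearson(g1_controls, g2_controls)
corr_cases = pearson(g1_cases, g2_cases)
dc = abs(corr_controls - corr_cases)

print(f"DE g1={de_g1:.3f}, DE g2={de_g2:.3f}")
print(f"corr controls={corr_controls:.3f}, corr cases={corr_cases:.3f}, DC={dc:.3f}")
```

Neither gene changes its mean level, so a DE analysis would pass over both, while the pair loses its correlation in the cases.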
Existing work on differential coexpression (DC)
• Pairs of genes with differential coexpression: [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al., 2004]
• Clustering based differential coexpression analysis: [Ihmels et al., 2005], [Watson, 2006]
• Network based analysis of differential coexpression: [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al., 2006], [Oldham et al., 2006], [Fuller et al., 2007], [Xu et al., 2008]
• Beyond pair-wise (size-k) differential coexpression: [Kostka and Spang, 2004], [Prieto et al., 2006]
• Gene-pathway differential coexpression: [Rosemary et al., 2008]
• Pathway-pathway differential coexpression: [Cho et al., 2009]
Existing DC work is “full-space”
• Full-space differential coexpression: measured over all samples of each class, e.g. correlation difference
• May have limitations due to the heterogeneity of
  • Causes of a disease (e.g. genetic difference)
  • Populations affected (e.g. demographic difference)
• Motivation: subspace patterns, i.e. coexpression on only a subset of the samples in a class, may be missed by full-space models
Definition of Subspace Differential Coexpression Pattern
• A set of k genes: {g1, g2, …, gk}
• Two quantities are defined for the set:
  • Fraction of samples in class A on which the k genes are coexpressed
  • Fraction of samples in class B on which the k genes are coexpressed
• SDC, defined on these two fractions, serves as a measure of subspace differential coexpression
• Problem: given n genes, find all the subsets of genes such that SDC ≥ d
• Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
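A minimal sketch of the two fractions and a difference-based SDC score follows. It rests on two assumptions that the slide does not spell out: each sample is reduced to a 0/1 flag per gene, with a gene set counted as coexpressed on a sample when every gene in the set is flagged, and SDC is taken as the absolute difference of the two fractions. The function names and the toy data are hypothetical; the exact definition is in the PSB 2010 paper.

```python
def coexpressed_fraction(samples, genes):
    """Fraction of samples on which all genes in `genes` are flagged 1.

    Assumption: a sample is a dict mapping gene name -> 0/1 flag.
    """
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if all(s[g] for g in genes))
    return hits / len(samples)

def sdc(class_a, class_b, genes):
    """Assumed SDC: absolute difference of the two coexpressed fractions."""
    return abs(coexpressed_fraction(class_a, genes)
               - coexpressed_fraction(class_b, genes))

# Toy binary data (hypothetical).
class_a = [
    {"g1": 1, "g2": 1, "g3": 1},
    {"g1": 1, "g2": 1, "g3": 1},
    {"g1": 1, "g2": 1, "g3": 1},
    {"g1": 0, "g2": 1, "g3": 0},
    {"g1": 1, "g2": 0, "g3": 0},
]
class_b = [
    {"g1": 1, "g2": 0, "g3": 1},
    {"g1": 0, "g2": 1, "g3": 1},
    {"g1": 1, "g2": 1, "g3": 0},
    {"g1": 0, "g2": 0, "g3": 1},
    {"g1": 1, "g2": 1, "g3": 1},
]

pattern = ["g1", "g2", "g3"]
frac_a = coexpressed_fraction(class_a, pattern)  # 3 of 5 samples
frac_b = coexpressed_fraction(class_b, pattern)  # 1 of 5 samples
score = sdc(class_a, class_b, pattern)
print(frac_a, frac_b, score)
```

Under these assumptions the three genes are coexpressed on a subset of class A only, which is the kind of pattern a threshold d on the score is meant to pick out.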
Computational Challenge
• Problem: given n genes, find all the subsets of genes such that SDC ≥ d
• Given n genes, there are 2^n candidate SDC patterns!
• How to effectively handle the combinatorial search space?
• Similar motivation and challenge as biclustering, but here differential biclustering!
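The size of the search space can be checked with a few lines of arithmetic. The helper names are hypothetical, and the gene counts are illustrative rather than taken from the datasets.

```python
from math import comb

def num_candidates(n):
    """Number of subsets of n genes (including the empty set)."""
    return 2 ** n

def num_size_k(n, k):
    """Number of gene sets of exactly size k."""
    return comb(n, k)

# The subset counts over all sizes add up to 2^n.
total_10 = sum(num_size_k(10, k) for k in range(11))

print(num_candidates(10))          # all subsets of 10 genes
print(num_size_k(1000, 3))         # size-3 sets among 1000 genes
print(len(str(num_candidates(1000))), "digits in 2^1000")
```

Even restricting to small sizes leaves a large number of candidates, which is why exhaustive enumeration is not an option.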
Direct Mining of Differential Patterns
• Refined SDC measure: “direct”
• A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
• Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010], [Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]
An Association-analysis Approach: systematic and efficient combinatorial search
• Refined SDC measure
• A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
• Once a gene set is disqualified, prune all of its supersets [Agrawal et al., 1994]
• Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency
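The pruning idea can be sketched as a level-wise (Apriori-style) search. The refined SDC measure is not given in the slides, so this sketch substitutes plain support (the fraction of samples containing every gene of the set), which is antimonotonic; the function names and the toy samples are hypothetical.

```python
from itertools import combinations

def support(samples, genes):
    """Fraction of samples (sets of flagged genes) that contain `genes`."""
    return sum(1 for s in samples if genes <= s) / len(samples)

def levelwise_search(items, measure, threshold):
    """Return every non-empty itemset with measure >= threshold.

    Assumes `measure` is antimonotonic, so a set is evaluated only if
    all of its subsets one size smaller already passed the threshold.
    """
    results = []
    level = [frozenset([i]) for i in items]
    level = [c for c in level if measure(c) >= threshold]
    while level:
        results.extend(level)
        survivors = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != len(a) + 1:
                continue
            # Prune: skip supersets of any disqualified set.
            if all(union - {x} in survivors for x in union):
                candidates.add(union)
        level = [c for c in candidates if measure(c) >= threshold]
    return results

# Toy samples: each is the set of genes flagged in that sample.
samples = [
    {"g1", "g2", "g3"},
    {"g1", "g2", "g3"},
    {"g1", "g2"},
    {"g2", "g4"},
    {"g1", "g3"},
]

evaluated = []  # records which sets the measure was actually computed on

def counted_support(genes):
    evaluated.append(genes)
    return support(samples, genes)

found = levelwise_search(["g1", "g2", "g3", "g4"], counted_support, 0.4)
print(len(found), "sets found;", len(evaluated), "of 15 non-empty sets evaluated")
```

Because g4 fails at size 1, none of its supersets is ever scored, yet no qualifying set is lost; that is the sense in which the search is both complete and efficient.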
Validation
• Three lung cancer datasets
  • [Bhattacharjee et al., 2001], [Stearman et al., 2005], [Su et al., 2007]
  • All are from Affymetrix microarrays (first two: HG-U95A; third: HG-U133A)
  • Lung cancer samples & normal samples
• Combined dataset
  • More samples
  • Proper normalizations before combining (RMA, DWD, XPN)
  • Lung cancer samples (102)
  • Normal samples (67)
• RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]
Statistical Significance
• Phenotype permutation test (n = 1000)
• [Figure: panels A, B, C]
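A phenotype permutation test can be sketched as follows: the class labels are shuffled, the statistic is recomputed, and the p-value is the share of shuffles that score at least as high as the observed labels. The statistic used here (difference in the fraction of flagged samples between classes) and the toy flags are assumptions for illustration, and the add-one p-value estimate is one common convention, not necessarily the one used in the paper.

```python
import random

def fraction_difference(flags, labels):
    """|fraction flagged in class A - fraction flagged in class B|."""
    a = [f for f, lab in zip(flags, labels) if lab == "A"]
    b = [f for f, lab in zip(flags, labels) if lab == "B"]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def permutation_p_value(flags, labels, statistic, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling the phenotype labels n_perm times."""
    rng = random.Random(seed)
    observed = statistic(flags, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved
        if statistic(flags, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Toy data: 1 means the pattern is coexpressed on that sample.
flags = [1] * 14 + [0] * 6 + [1] * 2 + [0] * 18
labels = ["A"] * 20 + ["B"] * 20

observed = fraction_difference(flags, labels)
p_value = permutation_p_value(flags, labels, fraction_difference, n_perm=1000)
print(f"observed difference={observed:.2f}, p={p_value:.4f}")
```

With 1000 permutations the smallest reportable p-value is 1/1001, so finer resolution needs more shuffles.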
Could Subspace DC patterns have been discovered in full-space?
• 88 statistically significant size-3 patterns (stars)
• [Figure: subspace DC measures plotted against full-space DC measures, with a phenotype-permutation-based significance cutoff for the full-space measure; patterns fall into two groups, “can NOT be found in full-space” and “can also be found in full-space”]
• DC (Differential Coexpression)
A 10-gene Subspace DC Pattern
• Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: 1.4 × 10^-5)
• Suggests that the dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
• [Figure: enriched Ingenuity subnetwork (www.ingenuity.com); labels ≈ 60% and ≈ 10%]
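An overlap such as 6/10 is often scored with a hypergeometric tail probability, sketched below. The slide does not say which test produced the reported P-value, and the background and pathway sizes used here are hypothetical placeholders, so this does not attempt to reproduce 1.4 × 10^-5.

```python
from math import comb

def hypergeom_tail(population, pathway_size, pattern_size, overlap):
    """P(overlap or more pathway genes in a random pattern_size-gene draw).

    population:   number of background genes
    pathway_size: number of background genes in the pathway
    pattern_size: number of genes in the pattern
    overlap:      observed number of pattern genes in the pathway
    """
    upper = min(pattern_size, pathway_size)
    numerator = sum(
        comb(pathway_size, i) * comb(population - pathway_size, pattern_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(population, pattern_size)

# Hypothetical sizes: 10000 background genes, 100 of them in the pathway.
p_enrich = hypergeom_tail(population=10000, pathway_size=100,
                          pattern_size=10, overlap=6)
print(f"enrichment p-value under the assumed sizes: {p_enrich:.3g}")
```

The result depends strongly on the assumed background and pathway sizes, so the sizes used by the annotation source have to be plugged in before comparing against a reported value.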
Biological Interpretations
• Enriched cancer-related signaling pathways
  • TNF-α/NFkB
  • WNT
• Target gene sets of cancer-related microRNA & TFs
  • microRNA: miR-101 ({PIK3C2B, TSC22D1} + AKAP12)
  • Transcriptional factor (TF): ATF2 ({ETV4, PTHLH} + CBX5)
• Specific interpretation
  • miR-101 is shown to be down-regulated in cancer [Friedman et al., 2009]
  • Mutations of ATF2 are shown to be related to cancer [Woo et al., 2002]
Summary & Future Directions
• Summary
  • Proposed the problem definition & a systematic approach for subspace DC
  • Subspace DC analysis can identify many statistically significant & biologically relevant patterns that would have been missed in full-space
• Potential biomedical utility
  • Study the demographic and genetic differences within each class
  • Phenotype classification with subspace DC patterns
  • Combine DE and subspace DC patterns
  • Other types of data, e.g. SNP, metabolites, etc.
• DE (Differential Expression); DC (Differential Coexpression)
Acknowledgement
• Co-authors at Dept. Computer Science, Univ. of Minnesota: Michael Steinbach, Rui Kuang, Chad Myers, Gaurav Pandey, Vipin Kumar
• Data Mining for Biomedical Informatics Group; Comp. Bio. Group; Comp. Bio. & Func. Genomic Group
• Conference organizers
• NLM/NIH travel award
• NSF grants #IIS0916439, #CRI-0551551, #IIS-0308264, #ITR-0325949
• UMR, IBM, Mayo Clinic for BICB Fellowship
Thanks!
• Paper: Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing, 2010
• Source code: http://vk.cs.umn.edu/SDC
• Questions: Gang Fang, gangfang@cs.umn.edu