
Department of Computer Science and Engineering

Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar gangfang@cs.umn.edu http://www-users.cs.umn.edu/~kumar/dmbio/. Department of Computer Science and Engineering.


Presentation Transcript


  1. Subspace Differential Coexpression Analysis for the Discovery of Disease-related Dysregulations
     • Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar
     • gangfang@cs.umn.edu, http://www-users.cs.umn.edu/~kumar/dmbio/
     • Department of Computer Science and Engineering
     • 15th PSB, 01/08/2010

  2. Differential Expression (DE)
     • Traditional analysis targets changes in expression level between controls and cases.
     • [Figure: expression level over samples in controls and cases]
     • [Golub et al., 1999], [Pan, 2002], [Cui and Churchill, 2003], etc.

  3. Differential Coexpression (DC)
     • Targets changes in the coherence of expression between controls and cases.
     • Question: Is this gene interesting, i.e. associated with the phenotype?
     • Answer: No, in terms of differential expression (DE). However, what if there are another two genes ……? Yes!
     • [Figure: matrix of expression values (genes by samples); expression over samples in controls and cases] [Kostka & Spang, 2005]
     • Biological interpretations of DC: dysregulation of pathways, mutation of transcription factors, etc.
     • [Silva et al., 1995], [Li, 2002], [Kostka & Spang, 2005], [Rosemary et al., 2008], [Cho et al., 2009], etc.

  4. Existing work on differential coexpression (DC)
     • Pairs of genes with differential coexpression: [Silva et al., 1995], [Li, 2002], [Li et al., 2003], [Lai et al., 2004]
     • Clustering-based differential coexpression analysis: [Ihmels et al., 2005], [Watson, 2006]
     • Network-based analysis of differential coexpression: [Zhang and Horvath, 2005], [Choi et al., 2005], [Gargalovic et al., 2006], [Oldham et al., 2006], [Fuller et al., 2007], [Xu et al., 2008]
     • Beyond pair-wise (size-k) differential coexpression: [Kostka and Spang, 2004], [Prieto et al., 2006]
     • Gene-pathway differential coexpression: [Rosemary et al., 2008]
     • Pathway-pathway differential coexpression: [Cho et al., 2009]

  5. Existing DC work is “full-space”
     • Full-space differential coexpression measures, e.g. correlation difference, are computed over all samples of each class.
     • They may have limitations due to the heterogeneity of:
       • causes of a disease (e.g. genetic difference)
       • populations affected (e.g. demographic difference)
     • Motivation: patterns in which the genes are coexpressed on only a subset of the samples in a class (subspace patterns) may be missed by full-space models.
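As an illustration of the full-space measure named on this slide, the following minimal sketch computes a correlation difference for one gene pair. The function names and the toy expression values are assumptions made for this example; the slide does not specify which correlation coefficient the cited work uses, so Pearson correlation is used here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_difference(gene1_a, gene2_a, gene1_b, gene2_b):
    """Full-space DC score for a gene pair: |corr in class A - corr in class B|.

    Every sample of each class contributes, which is what makes it "full-space".
    """
    return abs(pearson(gene1_a, gene2_a) - pearson(gene1_b, gene2_b))


# Toy data: the pair is positively correlated in class A and
# negatively correlated in class B.
score = correlation_difference([1, 2, 3], [2, 4, 6], [1, 2, 3], [6, 4, 2])
```

Because the correlation is taken over all samples of a class, a gene pair that is coherent on only a fraction of the cases can contribute little to this score.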

  6. Definition of Subspace Differential Coexpression Pattern
     • A pattern is a set of k genes {g1, g2, …, gk}.
     • For class A: the fraction of samples in class A on which the k genes are coexpressed.
     • For class B: the fraction of samples in class B on which the k genes are coexpressed.
     • Extension to subspace differential coexpression: SDC as a measure of subspace differential coexpression.
     • Problem: given n genes, find all subsets of genes such that SDC ≥ d.
     • Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010]
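The slide's formula did not survive in the transcript, so the sketch below is only one plausible reading of the definition, under two stated assumptions: each sample is represented as the set of genes that are in a common "coexpressed" state in that sample (for example after discretizing expression values), and SDC is the absolute difference between the two class fractions. The names `support_fraction` and `sdc` are hypothetical.

```python
def support_fraction(samples, genes):
    """Fraction of samples on which all of the given genes are coexpressed.

    Assumption: each sample is a set of the genes that are "on" in it.
    """
    if not samples:
        return 0.0
    gene_set = set(genes)
    return sum(gene_set <= sample for sample in samples) / len(samples)


def sdc(class_a, class_b, genes):
    """Assumed SDC: |fraction in class A - fraction in class B|."""
    return abs(support_fraction(class_a, genes) - support_fraction(class_b, genes))


# Toy data with hypothetical gene names.
class_a = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1"}, {"g2", "g3"}]
class_b = [{"g1"}, {"g2"}, {"g1", "g2", "g3"}, set()]
pattern_score = sdc(class_a, class_b, ["g1", "g2", "g3"])
```

In this toy data the three genes are coexpressed on half of class A and a quarter of class B, so the pattern would qualify for any threshold d up to 0.25.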

  7. Computational Challenge
     • Problem: given n genes, find all subsets of genes such that SDC ≥ d.
     • Given n genes, there are 2^n candidate SDC patterns!
     • How to effectively handle the combinatorial search space?
     • Similar motivation and challenge as biclustering, but here it is differential biclustering!
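To make the size of the search space concrete, the short sketch below counts and (for tiny inputs only) enumerates the candidate gene sets. The function names are assumptions for this example; the count includes the empty set and single genes, matching the slide's 2^n figure.

```python
from itertools import combinations


def count_candidate_gene_sets(n):
    """Number of subsets of n genes, including the empty set."""
    return 2 ** n


def enumerate_gene_sets(genes):
    """Brute-force enumeration of every subset; only feasible for small n."""
    for k in range(len(genes) + 1):
        for subset in combinations(genes, k):
            yield subset


small_space = list(enumerate_gene_sets(["g1", "g2", "g3", "g4"]))
```

Four genes give 16 candidates, twenty genes already give 1,048,576, and the count doubles with every additional gene, so exhaustive enumeration is not an option at microarray scale.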

  8. Direct Mining of Differential Patterns
     • Refined SDC measure: “direct” mining
     • A measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
     • Details in [Fang, Kuang, Pandey, Steinbach, Myers and Kumar, PSB 2010] and [Fang, Pandey, Gupta, Steinbach and Kumar, TR 09-011, CS@UMN]
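The antimonotonicity condition can be checked on a toy example. The sketch below assumes the set-based sample representation and a plain difference of class fractions, neither of which is spelled out in the transcript; it shows that a single-class fraction is antimonotonic while a difference of two such fractions need not be, which is one plausible reason a refined measure is needed. All names and data are hypothetical.

```python
def support_fraction(samples, genes):
    """Fraction of samples (sets of "on" genes) containing all given genes."""
    gene_set = set(genes)
    return sum(gene_set <= sample for sample in samples) / len(samples)


# Toy classes: g1 is on everywhere, g2 only in class A.
class_a = [{"g1", "g2"}, {"g1", "g2"}]
class_b = [{"g1"}, {"g1"}]


def support_difference(genes):
    """Difference of class fractions (an assumed, unrefined DC score)."""
    return support_fraction(class_a, genes) - support_fraction(class_b, genes)


smaller, larger = {"g1"}, {"g1", "g2"}  # smaller is a subset of larger

# A single-class fraction can only stay equal or drop when genes are added.
single_class_ok = support_fraction(class_b, smaller) >= support_fraction(class_b, larger)

# The difference grows from 0.0 to 1.0 here, violating M(A) >= M(B).
difference_violates = support_difference(smaller) < support_difference(larger)
```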

  9. An Association-analysis Approach: systematic and efficient combinatorial search
     • Refined SDC measure: a measure M is antimonotonic if ∀ A, B: A ⊆ B ⇒ M(A) ≥ M(B)
     • Once a gene set is disqualified, prune all of its supersets [Agrawal et al., 1994]
     • Advantages: 1) Systematic & direct 2) Completeness 3) Efficiency
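The pruning idea on this slide can be sketched as a generic level-wise (Apriori-style) search. This is not the authors' implementation: the refined SDC measure is not given in the transcript, so the example plugs in a single-class support fraction, which is antimonotonic, as a stand-in. `level_wise_search` and `support_fraction` are hypothetical names.

```python
from itertools import combinations


def support_fraction(samples, genes):
    """Fraction of samples (sets of "on" genes) containing all given genes."""
    gene_set = set(genes)
    return sum(gene_set <= sample for sample in samples) / len(samples)


def level_wise_search(items, measure, threshold):
    """Return every non-empty itemset with measure >= threshold.

    Correct only if `measure` is antimonotonic: a set is evaluated only
    when all of its subsets one item smaller have already qualified.
    """
    current = [frozenset([i]) for i in sorted(items)]
    current = [s for s in current if measure(s) >= threshold]
    qualified = list(current)
    size = 1
    while current:
        survivors = set(current)
        candidates = set()
        for a in current:
            for b in current:
                union = a | b
                if len(union) != size + 1:
                    continue
                # Prune: skip supersets of any disqualified set.
                if all(frozenset(sub) in survivors
                       for sub in combinations(union, size)):
                    candidates.add(union)
        current = [c for c in candidates if measure(c) >= threshold]
        qualified.extend(current)
        size += 1
    return qualified


samples = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g3"}, {"g2"}]
found = level_wise_search(
    {"g1", "g2", "g3"}, lambda s: support_fraction(samples, s), 0.5
)
```

In this toy run the pair {g2, g3} falls below the threshold, so the triple {g1, g2, g3} is never evaluated. That skipped evaluation is the source of the efficiency, and the antimonotonic property is what guarantees nothing qualifying is lost.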

  10. Validation
     • Three lung cancer datasets: [Bhattacharjee et al., 2001], [Stearman et al., 2005], [Su et al., 2007]
       • All are from Affymetrix microarrays (first two: HG-U95A; third: HG-U133A)
       • Lung cancer samples & normal samples
     • Combined dataset
       • More samples: lung cancer samples (102), normal samples (67)
       • Proper normalizations before combining: RMA [Irizarry et al., 2003], DWD [Benito et al., 2004], XPN [Shabalin et al., 2008]

  11. Statistical Significance
     • Phenotype permutation test (n = 1000)
     • [Figure: panels A, B, C]
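A phenotype permutation test of the kind named on this slide can be sketched as follows. The transcript gives only the number of permutations, so the details here are assumptions: class labels are shuffled while class sizes are kept, the statistic is recomputed on each shuffle, and the p-value uses the common add-one correction. `permutation_p_value` and `mean_gap` are hypothetical names, and `mean_gap` is a stand-in statistic rather than the SDC measure.

```python
import random


def permutation_p_value(samples_a, samples_b, statistic,
                        n_permutations=1000, seed=0):
    """Estimate how often shuffled labels give a statistic >= the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(samples_a, samples_b)
    pooled = list(samples_a) + list(samples_b)
    n_a = len(samples_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign phenotype labels at random
        if statistic(pooled[:n_a], pooled[n_a:]) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def mean_gap(group_a, group_b):
    """Stand-in statistic: absolute difference of group means."""
    return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))


p_separated = permutation_p_value([1.0] * 20, [0.0] * 20, mean_gap)
p_identical = permutation_p_value([1.0] * 5, [1.0] * 5, mean_gap)
```

With perfectly separated groups almost no shuffle reproduces the observed gap, so the estimate is close to its floor of 1/1001; with identical groups every shuffle ties the observed value and the estimate is 1.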

  12. Could Subspace DC patterns have been discovered in full-space?
     • 88 statistically significant size-3 patterns (stars)
     • [Figure: subspace DC measures versus full-space DC measures, with a phenotype-permutation-based significance cutoff for the full-space measure; patterns are marked either as "can also be found in full-space" or as "can NOT be found in full-space"]
     • DC: Differential Coexpression

  13. A 10-gene Subspace DC Pattern
     • Enriched with the TNF-α/NFkB signaling pathway (6/10 overlap with the pathway, P-value: 1.4 × 10^-5)
     • Suggests that the dysregulation of the TNF-α/NFkB pathway may be related to lung cancer
     • [Figure: enriched Ingenuity subnetwork, www.ingenuity.com; annotations: ≈ 60%, ≈ 10%]
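Overlap-based enrichment P-values like the one quoted on this slide are commonly computed with a one-sided hypergeometric test. The sketch below shows that generic calculation; it is not confirmed to be the procedure behind the slide's figure, and the pathway size and background size in the usage line are made-up placeholders, since the transcript does not report them.

```python
from math import comb


def enrichment_p_value(overlap, pattern_size, pathway_size, background_size):
    """P(at least `overlap` pathway genes in a random gene set).

    The random set has `pattern_size` genes drawn without replacement
    from `background_size` genes, `pathway_size` of which are in the pathway.
    """
    total = comb(background_size, pattern_size)
    upper = min(pattern_size, pathway_size)
    tail = sum(
        comb(pathway_size, i)
        * comb(background_size - pathway_size, pattern_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# 6 of 10 pattern genes in the pathway; 200 and 12000 are hypothetical sizes.
p_example = enrichment_p_value(6, 10, 200, 12000)
```

The result depends strongly on the assumed pathway and background sizes, so this call is not expected to reproduce the value reported on the slide.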

  14. Specific interpretation
     • Enriched cancer-related signaling pathways: TNF-α/NFkB, WNT
     • Target gene sets of cancer-related microRNA & TFs
       • microRNA: miR-101 ({PIK3C2B, TSC22D1} + AKAP12)
       • Transcription factor (TF): ATF2 ({ETV4, PTHLH} + CBX5)
     • Biological interpretations
       • miR-101 is shown to be down-regulated in cancer [Friedman et al., 2009]
       • Mutations of ATF2 are shown to be related to cancer [Woo et al., 2002]

  15. Summary & Future Directions
     • Summary
       • Proposed the problem definition & a systematic approach for subspace DC
       • Subspace DC analysis can identify many statistically significant & biologically relevant patterns that would have been missed in full-space
     • Potential biomedical utility
       • Study the demographic and genetic differences within each class
       • Phenotype classification with subspace DC patterns
       • Combine DE and subspace DC patterns
       • Other types of data, e.g. SNP, metabolites, etc.
     • DE: Differential Expression; DC: Differential Coexpression

  16. Acknowledgement
     • Co-authors at Dept. of Computer Science, Univ. of Minnesota: Michael Steinbach, Rui Kuang, Chad Myers, Gaurav Pandey, Vipin Kumar
     • Groups: Data Mining for Biomedical Informatics Group; Comp. Bio. Group; Comp. Bio. & Func. Genomic Group
     • Conference organizers
     • NLM/NIH travel award
     • NSF grants #IIS0916439, #CRI-0551551, #IIS-0308264, #ITR-0325949
     • UMR, IBM, Mayo Clinic for BICB Fellowship

  17. Thanks!
     • Paper: Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing, 2010
     • Source codes: http://vk.cs.umn.edu/SDC
     • Questions: Gang Fang, gangfang@cs.umn.edu
