
Motif-directed Network Component Analysis for Regulatory Network Inference

Chen Wang, Lily Chen, Yue Wang, (Jason) Jianhua Xuan* (Virginia Tech, USA); Po Zhao, Eric Hoffman (Children’s National Medical Center, USA); Robert Clarke (Georgetown University Medical Center, USA).


Presentation Transcript


  1. Motif-directed Network Component Analysis for Regulatory Network Inference Chen Wang, Lily Chen, Yue Wang, (Jason) Jianhua Xuan* Virginia Tech, USA Po Zhao, Eric Hoffman Children’s National Medical Center, USA Robert Clarke Georgetown University Medical Center, USA

  2. Outline • Background & Motivation • Proposed Approach • Motif-directed network component analysis (mNCA) • Stability analysis • Experimental Results • Muscle regeneration • Conclusion & Discussion

  3. Background & Motivation • High-throughput biological data (e.g., microarray data, proteomic data) provide a great opportunity to study genome systems. • Identify gene modules, interactions and pathways. • Gene regulatory network modeling • Clustering or biclustering • Decomposition • The whole gene population is regulated by a few key transcription factors (TFs). • TFs and their interactions can form a skeleton of the regulatory networks.

  4. Background • However, decomposition methods that rely on microarray data alone, such as Independent Component Analysis (ICA) and Non-negative Matrix Factorization (NMF), often produce results that are difficult to interpret biologically. • Network Component Analysis (NCA): an integrative approach • Microarray gene expression data • Protein binding data (i.e., ChIP-on-chip data), which supply the network connections (topology) • Available in the yeast model system

  5. Motivation • Limitations of NCA: • ChIP-on-chip data are often not available for species like mouse and human; • When different data sources are integrated, the consistency is often not guaranteed; • ChIP-on-chip data come from biological experiments, which might contain false-positives leading to incorrect network inference. • Proposed solution - motif-directed network component analysis (mNCA) • Motif information derived from DNA sequence for initial network topology. • With the awareness of false-positives in motif information, stability analysis procedures shall be developed to combat the inconsistency between motif information and microarray data.

  6. Motivation - Pathway Building • Emery-Dreifuss Muscular Dystrophy (EDMD) [Bakay, M., et al., Brain (129), 2006]

  7. Network Component Analysis (NCA) (Figure: diagram linking TFs to mRNA through connections.)

  8. Mathematical Formulation of NCA • A linear model: E = AT, where E is the microarray expression data, A holds the connection strengths, and T holds the transcription factor activities (TFAs). • Example: CHRNG = a1 MYOD1 + a2 MYOG • A criterion infers the TFAs and the regulation relationships according to both expression and topology.
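
The formula for the criterion is not reproduced in the transcript. Assuming the talk follows the standard NCA formulation (the symbol Z_0, for the set of matrices consistent with the topology, is notation introduced here and not taken from the slides), the linear model and the criterion can be sketched as:

```latex
E = A\,T, \qquad
\min_{A,\,T}\; \lVert E - A\,T \rVert_F^{2}
\quad \text{subject to} \quad A \in Z_0 ,
```

where the entry A_{ij} is forced to zero whenever the topology has no connection between gene i and TF j. Under this reading, the slide's example is the row of the model for one gene: E_{CHRNG}(t) = a_1 T_{MYOD1}(t) + a_2 T_{MYOG}(t).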

  9. Illustration of NCA • E = AT: microarray data (E) = regulation strength (A) × transcription factor activities (TFAs, T). (Figure: matrix diagram of the decomposition, with rows indexed by gene.)
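
A minimal sketch of how such a decomposition could be fitted, assuming an alternating least-squares scheme. The function name `nca_als`, the boolean `mask` encoding of the topology, and the iteration count are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def nca_als(E, mask, n_iter=200, seed=0):
    """Fit E ~= A @ T by alternating least squares (illustrative sketch).

    E    : (genes x samples) expression matrix
    mask : (genes x TFs) boolean topology; A is kept at zero where mask is False
    """
    mask = np.asarray(mask, dtype=bool)
    rng = np.random.default_rng(seed)
    n_genes, n_tfs = mask.shape
    # Random start that already respects the topology.
    A = np.where(mask, rng.standard_normal((n_genes, n_tfs)), 0.0)
    T = np.zeros((n_tfs, E.shape[1]))
    for _ in range(n_iter):
        # Step 1: with A fixed, solve for the TFAs of all TFs at once.
        T, *_ = np.linalg.lstsq(A, E, rcond=None)
        # Step 2: with T fixed, refit each gene using only its connected TFs.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                coef, *_ = np.linalg.lstsq(T[idx].T, E[g], rcond=None)
                A[g, idx] = coef
    return A, T
```

Each TF's column of A and row of T are only determined up to a scale factor (including sign), so in practice some normalization is needed before comparing estimates across runs.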

  10. mNCA - Motif Information • Transcription Factors (TFs) • Proteins that bind to the promoter regions of genes • Activate or inhibit gene expression. • Motif (DNA sequence motif) • Common pattern in binding sites for a TF • Short sequences (5-25 bp) • Up to 1000 bp (or farther) from the gene • Inexactly repeating patterns (Figure: binding sites for a TF shown across Gene 1 through Gene 5.)

  11. Motif Representation • Consensus sequence, MyoD (M00001): SRACAGGTGKYG • Position-Weighted Matrices (PWMs), MyoD (M00001) (matrix shown as a figure) • Sequence Logo, MyoD (M00001) (shown as a figure): • graphical depiction of a profile • conservation of elements in a motif
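
To make the consensus notation concrete, here is a small sketch that expands the IUPAC ambiguity codes in the MyoD consensus into a regular expression and scans a sequence with it. This is only an illustration of the notation: the talk itself uses PWM-based search, the helper names are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_pattern(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[c] for c in consensus.upper())

def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead lets the scan report overlapping occurrences.
    lookahead = re.compile("(?=" + consensus_to_pattern(consensus) + ")")
    return [m.start() for m in lookahead.finditer(sequence.upper())]

MYOD_CONSENSUS = "SRACAGGTGKYG"  # consensus quoted on the slide (M00001)
```

For example, `find_sites(MYOD_CONSENSUS, "TTTCAACAGGTGTCGAAA")` reports a single site starting at position 3. A consensus match is all-or-nothing, which is one reason PWMs, which score partial matches, are preferred for actual motif search.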

  12. Motif Identification • Input: • Promoter region of a gene g (2000 bp upstream) • Muscle-specific binding site s • Match™ search algorithm • Minimize false positives [Kel, A.E., et al., Nucleic Acids Res, 2003. 31(13): p. 3576-9.] • Output: • Initial connection strength: the motif score, i.e., the average of the matrix similarity and core similarity scores

  13. Stability Analysis for mNCA • The information sources: • mRNA microarray data (specific but noisy) • Motif information (general, and with false positives) • The questions we want to answer: • Which TFs play a relevant role in the experiment? • Which genes are regulated by a particular TF? (downstream targets) • Stability analysis: if small perturbations are applied, • A bad TFA estimate tends to be altered easily, even destroyed; • A good TFA estimate tends to keep its activity pattern throughout the perturbation.

  14. Testing Stability by Perturbations • Method 1: Thresholding the motif score • A TF-gene connection is deleted if its motif score is below a cut-off threshold. Setting different cut-off thresholds changes the number of connections and hence the network topology. • Method 2: Deleting/inserting connections • TF-gene connections are altered randomly, either by deleting existing connections or by inserting new ones, at some small percentage (e.g., 10%).
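
The two perturbation methods can be sketched as follows. The function names and the matrix layout (genes in rows, TFs in columns) are hypothetical, and the even split between deletions and insertions in Method 2 is an assumption made for illustration; the slides do not specify it.

```python
import numpy as np

def threshold_topology(scores, cutoff):
    """Method 1: keep a TF-gene connection only if its motif score >= cutoff."""
    return np.asarray(scores) >= cutoff

def perturb_topology(mask, fraction=0.10, seed=None):
    """Method 2: per TF, alter about `fraction` of its connections at random.

    Assumption: the alterations are split evenly between deleting existing
    connections and inserting new ones.
    """
    mask = np.asarray(mask, dtype=bool)
    rng = np.random.default_rng(seed)
    out = mask.copy()
    for tf in range(mask.shape[1]):
        on = np.flatnonzero(mask[:, tf])    # genes currently connected to this TF
        off = np.flatnonzero(~mask[:, tf])  # genes currently not connected
        n_alter = int(round(fraction * on.size))
        n_del = min(n_alter // 2, on.size)
        n_ins = min(n_alter - n_del, off.size)
        if n_del:
            out[rng.choice(on, size=n_del, replace=False), tf] = False
        if n_ins:
            out[rng.choice(off, size=n_ins, replace=False), tf] = True
    return out
```

Sweeping `cutoff` over a range of values yields a family of topologies for Method 1, while repeated calls to `perturb_topology` with different seeds yield the random variants for Method 2.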

  15. Understanding of Stability Analysis • Obtain a confidence measure for an estimate by comparing it with the estimate obtained after perturbation: • e.g., absolute correlation coefficient 0.92: highly confident • e.g., absolute correlation coefficient 0.52: less confident

  16. Stability Measurement • Stability measurements from perturbations (Figure: boxplot of the stability measurements, marking the 75% quantile, the median, and the 25% quantile.)
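
A minimal sketch of the confidence measure and its boxplot summary, assuming each TFA estimate is a 1-D activity profile over the same time points. The function names are hypothetical. Taking the absolute value of the correlation is consistent with the slides and also sidesteps the sign ambiguity of a factorization such as E = AT.

```python
import numpy as np

def tfa_stability(reference, perturbed_runs):
    """Absolute Pearson correlation between a reference TFA profile
    and the profile re-estimated under each perturbation."""
    reference = np.asarray(reference, dtype=float)
    return np.array([
        abs(np.corrcoef(reference, np.asarray(run, dtype=float))[0, 1])
        for run in perturbed_runs
    ])

def boxplot_summary(values):
    """The three statistics a boxplot draws: 25% quantile, median, 75% quantile."""
    q25, median, q75 = np.percentile(values, [25, 50, 75])
    return {"q25": float(q25), "median": float(median), "q75": float(q75)}
```

A stable TFA estimate would show a median close to 1 with a narrow box, while an unstable one would show lower and more widely spread values.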

  17. Experimental Results • Dataset Description: Staged skeletal muscle degeneration/regeneration was induced by injection of cardiotoxin (CTX). Over a time range of up to 40 days, 27 time points were sampled, each with duplicate mice. The time-course microarray data set was acquired with Affymetrix’s Murine Genome U74v2 Set in an expression profiling study at Children’s National Medical Center (CNMC). We obtained expression measurements of 7570 probesets in each sample. (Figure: sampling timeline in days, running from day 0.5 to day 40.)

  18. Muscle Related TFs • 24 Muscle related TF binding sites from TRANSFAC:

  19. Muscle Related TFs • Some muscle related TF binding sites from TRANSFAC:

  20. Stability Analysis (Method I) • Thresholding the motif score: • The threshold of the motif score was set from low to high, making the number of connections vary gradually from 12,000 to 18,000, which results in more than 30% topology alterations. (Figure panels: YY1, MyoD, myogenin.)

  21. Stability Analysis (Method II) • Deleting or inserting connections: • For each transcription factor, 10% of connections were altered randomly regardless of the motif score, by deleting existing connections or inserting new connections, to test the stability of the TFA estimates. (Figure panels: YY1, MyoD, myogenin.)

  22. Stable TFA Estimates • The most stable TFA - YY1: • Observed expression shows almost no change; • Estimated TFA is related to muscle regeneration. (Figure panels: YY1’s gene expression, probe id 98767_at; estimated YY1 TFA.)

  23. YY1’s TFA Estimate • The difference between YY1’s mRNA level and protein level is supported by biological experiments [Walowitz, J.L., et al., “Proteolytic Regulation of the Zinc Finger Transcription Factor YY1, a Repressor of Muscle-restricted Gene Expression,” J Biol Chem, Vol. 273, Issue 12, 6656-6661, March 20, 1998]. (Figure panels: YY1 expression level; YY1 protein level.)

  24. YY1 – A Repressor in Muscle Regeneration • Underlying regulation mechanism (Figure: diagram of YY1, its targets, and Calpain II; panels show Calpain II’s gene expression, probe id 101040_at, and the estimated YY1 TFA.)

  25. Stable TFA estimates • Some other stable TFAs - myogenin & MyoD (Figure panels: expression and estimated TFA for MyoD, probe id 102986_at, and myogenin, probe id 103053_at.)

  26. Identifying TF’s Downstream Targets • Stability Analysis: • Similarly, we can test the stability of the regulation strength A under small perturbations, and hence rank the most likely targets of a specific TF. • Ranking downstream targets by frequency count (confidence measure): • Perform multiple independent perturbations, each deleting a connection with some probability. • Count how many times a TF-gene pair falls in the top-ranked group (defined by a preset threshold), based on its regulation strength A.

  27. Stability Analysis of MyoD’s Targets • MyoD’s downstream targets ranking: • 1000 independent perturbations are carried out. • Each connection is deleted with a probability (e.g., 0.3). • The top-ranking threshold is set to 100 in this case: if a gene’s regulation strength by MyoD is in the top 100 in a given perturbation, the gene is counted once.
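
The frequency-count ranking described above can be sketched as follows. The names `rank_targets` and `fit` are hypothetical: `fit` stands for whatever routine re-estimates the strength matrix A on a perturbed topology. Ranking by the absolute value of the strength is an assumption made here so that strong repression also counts.

```python
import numpy as np

def rank_targets(mask, fit, tf, n_runs=1000, p_delete=0.3, top_k=100, seed=0):
    """Count how often each gene lands in the top_k strongest targets of `tf`.

    mask : (genes x TFs) boolean topology
    fit  : callable mapping a perturbed mask to a (genes x TFs) strength matrix
    """
    mask = np.asarray(mask, dtype=bool)
    rng = np.random.default_rng(seed)
    counts = np.zeros(mask.shape[0], dtype=int)
    for _ in range(n_runs):
        # Each existing connection is deleted independently with probability p_delete.
        keep = rng.random(mask.shape) >= p_delete
        perturbed = mask & keep
        A = fit(perturbed)
        connected = np.flatnonzero(perturbed[:, tf])
        # Order the surviving targets by |strength|, strongest first.
        order = np.argsort(np.abs(A[connected, tf]))[::-1]
        counts[connected[order[:top_k]]] += 1
    return counts
```

Genes with the highest counts are the most consistently top-ranked targets; dividing the counts by `n_runs` gives a frequency that can serve as the confidence measure.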

  28. MyoD’s Downstream Targets • MyoD’s downstream genes from Ingenuity Pathway Analysis: • Among the top 100 genes, 16 are directly related to MyoD, including several key muscle regeneration TFs: MYC, MYOG, and MEF2C.

  29. YY1’s Downstream Targets • YY1’s downstream genes from Ingenuity Pathway Analysis:

  30. Conclusions • A new computational approach, namely motif-directed network component analysis (mNCA), has been developed to integrate motif information and microarray data for regulatory network inference. • Motif information has been utilized to derive the initial topology information for mNCA. • With the awareness of many false-positives in motif information, stability analysis procedures have been developed to extract stable TFAs and TFs’ downstream targets. • The experimental results have demonstrated that mNCA can help reveal key regulators in muscle regeneration.

  31. Future Work – New Hypothesis & Validation • Integrative approaches to pathway building (Figure: pathway diagram with nodes myogenin, CYBB, MCM5, …, RRM1, MyoD, c-Myc, YY1, Calpain II, MYL4, TNNC1, MYBPH, PAX2, …, DES, DYS. One edge style marks interactions from databases and knowledge; the other marks interactions derived from computational methods.)

  32. Acknowledgement • NIH Grants: • NS2925-13A, CA 096483 & CA109872 • DoD/CDMRP Grant • BC030280

  33. Thank you very much!
