Identification of Highly Synchronized Subnetworks from Gene Expression Data

Identification of Highly Synchronized Subnetworks from Gene Expression Data. Shouguo Gao, Xujing Wang. From the 8th International Symposium on Bioinformatics Research and Applications (ISBRA’12). BMC Bioinformatics 2013, 14(Supp 9):S5. Presented by Pak Kan, WONG.


Presentation Transcript


  1. Identification of Highly Synchronized Subnetworks from Gene Expression Data • Shouguo Gao, Xujing Wang • From the 8th International Symposium on Bioinformatics Research and Applications (ISBRA’12) • BMC Bioinformatics 2013, 14(Supp 9):S5 • Presented by Pak Kan, WONG

  2. Contents • Motivation • Overview • Mathematical formulation • Experiments • Simulated study • Study on yeast data • Conclusion

  3. Motivation • Transient dynamic phenomena • Pulsatile or relaxed oscillations • Increasing number of time course data • Example: temporal gene expression pattern • How to capture the gene interactions? http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553322/ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553322/figure/F2/

  4. How to capture the gene interactions? • Identifying significant patterns • Considering the interdependence among the time points • Considering each time point independently

  5. Non-linear dynamics • If two time series interact with each other, the interaction causes a process of rhythmic adjustment that leads to phase locking. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553322/figure/F2/

  6. Overview • Input: gene expression data across cell cycles • Phase locking analysis gives the correlation among genes • Combined network (example nodes g0, g1, g2, g3, g4) • Activity measurement • Output: predicted connections of genes in a PPI subnetwork • Goal: maximize the score

  7. Mathematical Formulation

  8. Methods for Analysis of Phase Synchronization • [Gabor, 1946] For an arbitrary continuous real-valued function $s(t)$, its analytic signal is a complex-valued function defined as $\zeta(t) = s(t) + i\,s_H(t)$, where • $s_H(t) = \frac{1}{\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} \frac{s(\tau)}{t-\tau}\,d\tau$ is the Hilbert transform of $s(t)$ • P.V. symbolizes that the integral is taken in the sense of the Cauchy principal value. Cauchy principal value: a method for assigning values to certain improper integrals which would otherwise be undefined. [Wikipedia]

  9. Polar form • $\zeta(t) = A(t)\,e^{i\phi(t)}$, where • $A(t)$ is the instantaneous amplitude of $s(t)$ • $\phi(t)$ is the instantaneous phase of $s(t)$ • The instantaneous phase is sensitive to low-frequency trends • Use the Matlab detrend function to remove low-frequency trends in the data • Figure: the analytic signal in the complex plane (axes Re, Im)
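As an illustration of slides 8 and 9, the following is a minimal sketch of how the analytic signal, instantaneous amplitude, and instantaneous phase could be computed for a short, evenly sampled series. It is not the authors' code: the function names `detrend_linear` and `analytic_signal` are hypothetical, the linear least-squares detrend stands in for the Matlab detrend step, and the Hilbert transform is approximated with a naive discrete Fourier transform using only the Python standard library.

```python
import cmath
import math


def detrend_linear(x):
    """Subtract the least-squares straight line from x (assumed stand-in for Matlab detrend)."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    slope = (sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]


def analytic_signal(x):
    """Discrete analytic signal x + i*H[x] via a naive O(n^2) DFT (fine for short series)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(1, n):
        if k < n / 2:
            spec[k] *= 2      # double the positive frequencies
        elif k > n / 2:
            spec[k] = 0       # zero the negative frequencies
        # k == n / 2 (Nyquist bin, even n only) is left unchanged
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


# Hypothetical expression profile: an oscillation riding on a slow upward drift.
profile = [math.cos(2 * math.pi * 3 * t / 40) + 0.05 * t for t in range(40)]
zeta = analytic_signal(detrend_linear(profile))
amplitude = [abs(v) for v in zeta]        # instantaneous amplitude
phase = [cmath.phase(v) for v in zeta]    # instantaneous phase in (-pi, pi]
```

For real data a library routine such as an FFT-based Hilbert transform would be the more practical choice; the explicit DFT is used here only to keep the sketch self-contained.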

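  10. Phase locking • Two signals with instantaneous phases $\phi_1(t)$ and $\phi_2(t)$ • Define the cyclic relative phase $\Delta\phi(t) = (\phi_1(t) - \phi_2(t)) \bmod 2\pi$ • Without noise: phase locking means $\Delta\phi(t) = \text{const}$ • With noise: phase locking means $\Delta\phi(t)$ is distributed around a constant value • Figure: relative phase in the complex plane (axes Re, Im)

  11. Phase locking (cont.) • To evaluate the significance of phase locking, use the circular mean of the phase difference, $\lambda = \left|\langle e^{i\Delta\phi(t)}\rangle_t\right|$ • $\lambda = 1$ in perfect locking • $\lambda \approx 0$ when $\Delta\phi$ is randomly distributed • Infer potential interactions between gene pairs from $\lambda$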
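The phase locking index described on slides 10 and 11 can be sketched as the length of the mean unit vector of the cyclic phase difference. This is a minimal illustration, assuming two already-extracted instantaneous phase series of equal length; the function name `phase_locking_index` is hypothetical.

```python
import cmath
import math


def phase_locking_index(phi1, phi2):
    """Length of the mean resultant vector of the cyclic phase difference (0 to 1)."""
    diffs = [(a - b) % (2 * math.pi) for a, b in zip(phi1, phi2)]
    return abs(sum(cmath.exp(1j * d) for d in diffs) / len(diffs))


# Two hypothetical phase series with a constant lag: close to perfect locking.
locked = phase_locking_index([0.3 * t for t in range(50)],
                             [0.3 * t - 0.7 for t in range(50)])

# Phase differences spread evenly around the circle: no locking.
unlocked = phase_locking_index([2 * math.pi * k / 8 for k in range(8)], [0.0] * 8)
```

Gene pairs whose index is close to 1 would then be the candidates for an inferred interaction.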

  12. Guess the PPI network • Adjacency matrix of genes in a PPI subnetwork • $\lambda_{ij}$ is the circular mean of the phase difference between gene $i$ and gene $j$

  13. Score the network • For each gene $i$, use EDGE (http://www.genomine.org/edge/) to calculate $p_i$, the significance of its expression changes during the time course study (the smaller the value, the more significant the change) • Z-score $z_i$ is obtained from $p_i$ using $\Phi^{-1}$, the inverse normal CDF • TopoPL defines the overall activity score of a subnetwork • Figure: y = sgn(x)
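The exact z-score formula did not survive in this transcript. A common convention, assumed here and not confirmed by the slides, maps a p-value to a z-score through the inverse normal CDF applied to one minus the p-value, so that smaller p-values give larger scores. The function name `p_to_z` and the example genes are hypothetical.

```python
from statistics import NormalDist


def p_to_z(p):
    """Map a p-value in (0, 1) to a z-score; assumes the convention z = inverse CDF of (1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)


# Hypothetical per-gene significance values, such as those a tool like EDGE might report.
p_values = {"geneA": 0.001, "geneB": 0.05, "geneC": 0.5}
z_scores = {gene: p_to_z(p) for gene, p in p_values.items()}
```

Under this convention a gene with p = 0.5 scores zero, and increasingly significant genes receive increasingly positive scores.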

  14. High level understanding • Encourage meaningful links • Penalize meaningless links • Diagram labels: dynamic topological property, hub genes, adjusted score

  15. Search Algorithm (simulated annealing) • For i=1 to N • Calculate the current temperature Ti=Ti*0.81/N • Gtry ← Gout • If (Ti<Tend) break; • Randomly pick a node v • If (v is in Gtry) remove v from Gtry, else add v to Gtry • Calculate the score for the largest connected component of Gtry • If the score of Gtry is higher than the score of Gout • Gout ← Gtry • Else • Accept Gtry with a temperature-dependent probability
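The search loop on slide 15 can be sketched as follows. Several details are assumptions, because the transcript does not preserve them: the geometric cooling schedule, the Metropolis acceptance rule exp((s_try - s_out) / T), the start from the full network, and the scoring function. In particular, `aggregate_z` is a simple sum of z-scores divided by the square root of the subnetwork size, used as a stand-in, and is not the TopoPL score. The toy network and all names are hypothetical.

```python
import math
import random


def largest_component(nodes, adj):
    """Largest connected component of the subgraph induced by `nodes`."""
    best, seen = set(), set()
    for start in sorted(nodes):            # sorted for deterministic tie-breaking
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def anneal(adj, score, n_iter=2000, t_start=1.0, t_end=0.01, seed=0):
    """Toggle one random node per step; return the best component seen and its score."""
    rng = random.Random(seed)
    genes = sorted(adj)
    cooling = (t_end / t_start) ** (1.0 / n_iter)    # assumed geometric schedule
    g_out = set(genes)                                # assumed start: full network
    best = largest_component(g_out, adj)
    s_out = s_best = score(best)
    temp = t_start
    for _ in range(n_iter):
        temp *= cooling
        if temp < t_end:
            break
        g_try = set(g_out)
        g_try ^= {rng.choice(genes)}                  # remove the node if present, else add it
        comp = largest_component(g_try, adj)
        if not comp:
            continue                                  # never accept an empty subnetwork
        s_try = score(comp)
        # Assumed Metropolis rule: always accept improvements, sometimes accept losses.
        if s_try > s_out or rng.random() < math.exp((s_try - s_out) / temp):
            g_out, s_out = g_try, s_try
            if s_try > s_best:
                best, s_best = comp, s_try
    return best, s_best


# Hypothetical toy network: a path g0-g1-g2-g3-g4 with made-up z-scores.
toy_adj = {"g0": ["g1"], "g1": ["g0", "g2"], "g2": ["g1", "g3"],
           "g3": ["g2", "g4"], "g4": ["g3"]}
toy_z = {"g0": 3.0, "g1": 2.5, "g2": -2.0, "g3": -1.5, "g4": 0.1}


def aggregate_z(nodes):
    """Stand-in score (not TopoPL): sum of z-scores over sqrt(size)."""
    return sum(toy_z[g] for g in nodes) / math.sqrt(len(nodes))


subnetwork, subnetwork_score = anneal(toy_adj, aggregate_z)
```

Tracking the best component seen so far is a design choice of this sketch; it means the returned subnetwork does not depend on where the walk happens to stop.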

  16. Experiments

  17. Simulation Study • Sample expression data gal80R from Cytoscape (http://www.cytoscape.org) • 331 genes and 361 interactions in the network • Randomly selected subnetworks of size n = 40, 60, 80 were treated as condition responsive • Fraction of active genes m = 80, 90, 100% • Active genes were randomly assigned the top n×m% significance values in gal80R; the remaining genes were assigned the remaining values • Phase locking index λ • RespNet: N(0.8, 0.5) • RemNet: N(0.4, 0.3) • Based on the distributions of λ for gene pairs in protein complexes (from MIPS annotation) and for randomly selected gene pairs.

  18. Simulation Study (cont.) • A gene of the predefined responsive subnetworks that is in the TopoPL-identified subnetwork is considered a successful identification. • Repeat 10 times • F-measure • ROC curve
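The evaluation on slide 18 can be illustrated with a short sketch that compares an identified gene set against the predefined responsive set. It assumes the balanced F-measure (the harmonic mean of precision and recall); the slides do not state which weighting was used. The function name and gene labels are hypothetical.

```python
def precision_recall_f(predicted, truth):
    """Precision, recall, and balanced F-measure of a predicted gene set."""
    hits = len(predicted & truth)          # successful identifications
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical example: 3 of the 4 identified genes belong to the 5 responsive genes.
identified = {"a", "b", "c", "d"}
responsive = {"b", "c", "d", "e", "f"}
precision, recall, f_measure = precision_recall_f(identified, responsive)
```

Averaging these values over the repeated runs would give one summary figure per method and simulation setting.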

  19. Simulation Study: Results • Similar sensitivity, but TopoPL has higher precision. TAPPA: Topological Analysis of Pathway-Phenotype Association

  20. Simulation Study: Results • TopoPL has the highest AUC. Results are from the simulated data.

  21. Gene expression and protein-protein interaction data • A time course study of the yeast cell cycle • EMBL’s Huber group http://www.ebi.ac.uk/huber-srv/scercycle/ • Arrested using alpha factor or cdc28 • Alpha factor dataset: 41 time points • Cdc28 dataset: 44 time points • 5-minute resolution • Provides strand-specific profiles of temporal expression during the mitotic cell cycle of S. cerevisiae, monitored for more than three complete cell divisions [14]

  22. Results • Identifies a subnetwork of 524 genes and 2078 edges with the alpha factor dataset. • Similar results for cdc28 dataset

  23. Most significant “Biological Process” • Top 10 GO Biological Process terms significantly enriched in the subnetwork identified during the yeast cell cycle. • GO term enrichment analysis was performed with the topGO package in Bioconductor

  24. Hub genes and High betweenness genes Top 30 genes with highest degrees or betweenness in the identified subnetwork

  25. Core of the identified subnetwork Rectangles denote cell cycle genes Thicker lines indicate higher synchronization

  26. Highly synchronized protein complex: protein complex 56 • Interaction network of protein complex 56's core components. • Top 20 most synchronized interactions (corresponding to ~1% of interactions in the identified subnetwork)

  27. Distribution of PL index mean λ

  28. Protein complex 56 Expression profiles of genes in the core components of protein complex 56. Left are the expression profiles in the alpha factor experiment, and right are those in the cdc28 experiment.

  29. TF binding motif analysis • Does high synchronization imply that the genes are regulated by the same TFs? • Transcription factor binding sites overrepresented in genes of the identified subnetwork and of its core.

  30. Discussion • Limitations • Same frequency

  31. Conclusion • Extract interactions using phase locking analysis • Propose the TopoPL scoring method with phase locking analysis • Measure the coherence of the dynamics • Apply simulated annealing search • Incorporating dynamic data gives robust results • Identify the relevant interactions in the proposed network

  32. Reference • Gabor, D.: Theory of communication. Proc. IEEE Lond. 93, 429 (1946)

  33. Further Reading • http://cognitrn.psych.indiana.edu/busey/erp/Moss_book.pdf • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778057/ • http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=6091161 • http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5961631
