Identification of Highly Synchronized Subnetworks from Gene Expression Data. Shouguo Gao, Xujing Wang. From 8th International Symposium on Bioinformatics Research and Applications (ISBRA’12). BMC Bioinformatics 2013, 14(Supp 9):S5. Presented by Pak Kan, WONG.
Contents • Motivation • Overview • Mathematical formulation • Experiments • Simulated study • Study on yeast data • Conclusion
Motivation • Transient dynamic phenomenon • Pulsatile or relaxed oscillations • Increasing number of time course data • Example: Temporal gene expression pattern • How to capture the gene interactions? http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553322/ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553322/figure/F2/
How to capture the gene interactions? • Identifying significant patterns • Considering the interdependence among the time points • Considering each time point independently
Non-linear dynamics • If two time series interact with each other, the interaction drives a process of rhythmic adjustment that leads to phase locking. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2553322/figure/F2/
Overview • Gene expression data across cell cycles • Phase locking analysis • Correlation among genes • Activity measurement • Combined network (genes g0, g1, g2, g3, g4) • Predicted connections of genes in a PPI subnetwork • Goal: Maximize the score
Methods for Analysis of Phase Synchronization • [Gabor, 1946] For an arbitrary continuous real-valued function $s(t)$, its analytic signal is a complex-valued function defined as $\zeta(t) = s(t) + i\,s_H(t)$, where • $s_H(t) = \frac{1}{\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} \frac{s(\tau)}{t-\tau}\,d\tau$ is the Hilbert transform of $s(t)$ • P.V. symbolizes that the integral is taken in the sense of Cauchy Principal Value. Cauchy principal value: a method for assigning values to certain improper integrals which would otherwise be undefined. [Wikipedia]
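Polar form • $\zeta(t) = A(t)\,e^{i\phi(t)}$, where • $A(t)$ is the instantaneous amplitude of $s(t)$ • $\phi(t)$ is the instantaneous phase of $s(t)$ • The phase estimate is sensitive to low-frequency trends • Use the MATLAB detrend function to remove low-frequency trends in the data • [Figure: analytic signal in the complex plane; axes Re, Im]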
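The two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: it assumes a uniformly sampled series, uses a naive $O(n^2)$ DFT so that it needs only the standard library, and builds the discrete analytic signal by zeroing negative frequencies and doubling positive ones. The function names `detrend_linear` and `analytic_signal` and the toy series are hypothetical.

```python
import cmath
import math

def detrend_linear(x):
    """Subtract the least-squares straight line from x
    (the default behaviour of MATLAB's detrend)."""
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    slope = sxy / sxx
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]

def analytic_signal(x):
    """Discrete analytic signal zeta = s + i*s_H of a real series."""
    n = len(x)
    # Forward DFT (naive; fine for a few dozen time points).
    spectrum = [
        sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]
    for f in range(n):
        if f == 0 or (n % 2 == 0 and f == n // 2):
            gain = 1.0          # DC and Nyquist bins are kept once
        elif f < (n + 1) // 2:
            gain = 2.0          # positive frequencies are doubled
        else:
            gain = 0.0          # negative frequencies are removed
        spectrum[f] *= gain
    # Inverse DFT gives the complex analytic signal.
    return [
        sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n) for f in range(n)) / n
        for k in range(n)
    ]

# Toy series (made up): an oscillation riding on a slow linear trend.
n_points = 48
expression = [math.cos(2 * math.pi * 3 * k / n_points) + 0.02 * k
              for k in range(n_points)]
zeta = analytic_signal(detrend_linear(expression))
amplitude = [abs(z) for z in zeta]        # instantaneous amplitude A(t)
phase = [cmath.phase(z) for z in zeta]    # instantaneous phase phi(t)
```

Detrending first matters because a slow drift shifts the trajectory away from the origin of the complex plane, which distorts the phase angle.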
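Phase locking • Two signals with instantaneous phases $\phi_1(t)$ and $\phi_2(t)$ • Define the cyclic relative phase $\psi(t) = (\phi_1(t) - \phi_2(t)) \bmod 2\pi$ • Without noise: phase locking means $\psi(t)$ is a constant • With noise: under phase locking, assume $\psi(t)$ is distributed around a constant value • [Figure: relative phase in the complex plane; axes Re, Im]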
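Phase locking (cont.) • To evaluate the significance of phase locking, use the circular mean of the phase difference, $\lambda = \left|\langle e^{i\psi(t)} \rangle_t\right|$ • In a perfect locking, $\lambda = 1$ • $\lambda \approx 0$ when $\psi(t)$ is randomly distributed • Use $\lambda$ to infer potential interaction between gene pairs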
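As a rough illustration of the index described above, the sketch below computes the length of the mean unit vector of the relative phase for two phase series. It assumes 1:1 locking and equally sampled phases; the function name `phase_locking_index` and the toy inputs are hypothetical.

```python
import cmath
import math

def phase_locking_index(phase_a, phase_b):
    """Length of the circular mean of the relative phase (between 0 and 1)."""
    psi = [(a - b) % (2 * math.pi) for a, b in zip(phase_a, phase_b)]
    mean_vector = sum(cmath.exp(1j * p) for p in psi) / len(psi)
    return abs(mean_vector)

# Two made-up phase series with a constant lag: every unit vector points
# the same way, so the index is close to 1.
phi_1 = [0.1 * k for k in range(50)]
phi_2 = [p - 0.7 for p in phi_1]
locked = phase_locking_index(phi_1, phi_2)

# Relative phases spread evenly around the circle cancel out,
# so the index is close to 0.
spread = [2 * math.pi * k / 50 for k in range(50)]
unlocked = phase_locking_index(spread, [0.0] * 50)
```

Averaging unit vectors instead of raw angles avoids the wrap-around problem at $0$ and $2\pi$.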
Guess the PPI network • $A = (a_{ij})$ is the adjacency matrix of genes in a PPI subnetwork • $\lambda_{ij}$ is the circular mean of the phase difference between gene $i$ and gene $j$
Score the network • For each gene $i$, use EDGE to calculate $p_i$, the significance of its expression changes during the time course study (the smaller the value, the stronger the significance) • Z-score $z_i = \Phi^{-1}(1 - p_i)$, where $\Phi^{-1}$ is the inverse normal CDF • TopoPL defines the overall activity of a subnetwork with a score built from $z_i$, $a_{ij}$, $\lambda_{ij}$ and the sign function $y = \mathrm{sgn}(x)$ http://www.genomine.org/edge/
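The p-value to z-score step can be sketched with the standard library alone. The mapping $z = \Phi^{-1}(1 - p)$ is assumed here as the usual convention for turning a significance value into a score that grows with significance; the function name `z_score` and the clamping constant `eps` are hypothetical, and the TopoPL subnetwork score itself is not reproduced.

```python
from statistics import NormalDist

def z_score(p_value, eps=1e-12):
    """Map a p-value to a z-score with the inverse standard normal CDF.

    p_value is clamped to (eps, 1 - eps) because inv_cdf is undefined at 0 and 1.
    """
    p = min(max(p_value, eps), 1.0 - eps)
    return NormalDist().inv_cdf(1.0 - p)

# Made-up p-values for three genes: smaller p-values give larger z-scores.
p_values = {"geneA": 0.001, "geneB": 0.05, "geneC": 0.5}
z_values = {gene: z_score(p) for gene, p in p_values.items()}
```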
High level understanding • Encourage meaningful links • Penalize meaningless links • Adjusted score combines the dynamic topological property and hub genes
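Search Algorithm (simulated annealing) • For $i = 1$ to $N$ • Calculate the current temperature $T_i = T_{i-1} \times 0.8^{1/N}$ • $G_{try} \leftarrow G_{out}$ • If $T_i < T_{end}$, break • Randomly pick a node $v$ • If $v \in G_{try}$, remove $v$ from $G_{try}$; else add $v$ to $G_{try}$ • Calculate the score $S_{try}$ for the largest connected component of $G_{try}$ • If $S_{try} > S_{out}$ • $G_{out} \leftarrow G_{try}$ • Else • Accept $G_{try}$ with probability $\exp\left((S_{try} - S_{out})/T_i\right)$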
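The search loop above can be sketched as follows. This is an illustrative toy, not the TopoPL implementation: `toy_score` is a stand-in (sum of z-scores divided by the square root of the component size) rather than the TopoPL score, the cooling parameters `t_start` and `t_end` are assumptions, and the best subnetwork seen so far is tracked as an extra convenience. All names are hypothetical.

```python
import math
import random

def largest_component(nodes, adjacency):
    """Largest connected component of the subgraph induced by `nodes`."""
    remaining = set(nodes)
    best = set()
    while remaining:
        start = remaining.pop()
        component, frontier = {start}, [start]
        while frontier:
            u = frontier.pop()
            for v in adjacency[u]:
                if v in remaining:
                    remaining.remove(v)
                    component.add(v)
                    frontier.append(v)
        if len(component) > len(best):
            best = component
    return best

def toy_score(nodes, z):
    """Stand-in score (NOT TopoPL): aggregate z-score of the node set."""
    return sum(z[v] for v in nodes) / math.sqrt(len(nodes)) if nodes else 0.0

def anneal(adjacency, z, n_iter=2000, t_start=1.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    all_nodes = sorted(adjacency)
    current = {v for v in all_nodes if rng.random() < 0.5}
    current_score = toy_score(largest_component(current, adjacency), z)
    best, best_score = set(current), current_score
    cooling = (t_end / t_start) ** (1.0 / n_iter)  # geometric schedule (assumed)
    temperature = t_start
    for _ in range(n_iter):
        temperature *= cooling
        if temperature < t_end:
            break
        trial = set(current)
        trial ^= {rng.choice(all_nodes)}           # toggle one random node
        trial_score = toy_score(largest_component(trial, adjacency), z)
        delta = trial_score - current_score
        # Always accept improvements; accept worse moves with prob exp(delta/T).
        if delta > 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = trial, trial_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return largest_component(best, adjacency), best_score

# Made-up path graph 0-1-2-3-4-5 where only genes 0, 1, 2 are active.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
z = {0: 3.0, 1: 3.0, 2: 3.0, 3: -3.0, 4: -3.0, 5: -3.0}
subnetwork, score = anneal(adjacency, z)
```

Accepting some score-lowering moves while the temperature is high lets the search leave local optima; as the temperature drops, the search becomes increasingly greedy.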
Simulation Study • Sample expression data gal80R from Cytoscape (http://www.cytoscape.org) • 331 genes and 361 interactions in the network • Randomly selected subnetworks • Size n = 40, 60, 80 as condition responsive • Active genes m = 80, 90, 100% • Active genes were randomly assigned the top n×m% significance values in gal80R; the remaining genes were assigned the rest of the values • Phase locking index λ • RespNet: N(0.8, 0.5) • RemNet: N(0.4, 0.3) • Based on the distributions of λ for gene pairs in protein complexes (from MIPS annotation) and for randomly selected gene pairs.
Simulation Study (cont’) • A gene of the predefined responsive subnetworks that is in the TopoPL-identified subnetwork is considered a successful identification. • Repeat 10 times • F-measure • ROC curve
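For reference, the gene-level evaluation can be sketched as below, using the standard definitions of precision, recall and the balanced F-measure $F = 2PR/(P+R)$. The gene names are made up and the function name `f_measure` is hypothetical; the slides do not state which F-measure variant was used, so the balanced form is an assumption.

```python
def f_measure(predicted, truth):
    """Return (precision, recall, F) for a predicted gene set against the truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Made-up example: 2 of the 4 predicted genes are in the 3-gene responsive set.
precision, recall, f = f_measure({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g5"})
```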
Simulation Study: Results • Similar sensitivity, but TopoPL has higher precision. TAPPA: Topological Analysis of Pathway-Phenotype Association
Simulation Study: Results • TopoPL has the highest AUC. Results are from the simulated data.
Gene expression and protein-protein interaction data • A time course study of the yeast cell cycle • EMBL’s Huber group http://www.ebi.ac.uk/huber-srv/scercycle/ • Cells arrested using alpha factor or cdc28 • Alpha factor dataset: 41 time points • Cdc28 dataset: 44 time points • 5-minute resolution • Provides strand-specific profiles of temporal expression during the mitotic cell cycle of S. cerevisiae, monitored for more than three complete cell divisions [14]
Results • Identifies a subnetwork of 524 genes and 2078 edges with the alpha factor dataset. • Similar results for cdc28 dataset
Most significant “Biological Process” Top 10 GO Biological Process terms significantly enriched in the subnetwork identified during yeast cell cycle. Use GO term enrichment analysis with the topGO package in Bioconductor
Hub genes and High betweenness genes Top 30 genes with highest degrees or betweenness in the identified subnetwork
Core of the identified subnetwork Rectangles denote cell cycle genes Thicker lines indicate higher synchronization
Highly synchronized protein complex: protein complex 56 Interaction network of protein complex 56's core components. Top 20 most synchronized interactions (corresponding to ~1% of interactions in the identified subnetwork)
Distribution of PL index mean λ
Protein complex 56 Expression profiles of genes in the core components of protein complex 56. Left are the expression profiles in the alpha factor experiment, and right are those in the cdc28 experiment.
TF binding motif analysis Does high synchronization imply that the genes are regulated by the same TFs? • Transcription factor binding sites overrepresented in genes of the identified subnetwork and of its core.
Discussion • Limitations • Same frequency
Conclusion • Extract interactions using phase locking analysis • Propose the TopoPL scoring method with phase locking analysis • Measure the coherence of the dynamics • Apply simulated annealing search • Incorporate dynamic data, giving robust results • Identify the relevant interactions in the proposed network
Reference • Gabor, D.: Theory of communication. Proc. IEEE Lond. 93, 429 (1946)
Further Reading • http://cognitrn.psych.indiana.edu/busey/erp/Moss_book.pdf • http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778057/ • http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=6091161 • http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5961631