Andrej Bugrim, GeneGo, Inc.
Protein scoring based on significance in biological networks
Two problems of systems biology
• How to reconstruct condition-specific networks in a biologically robust way
• How to utilize reconstructed networks in day-to-day laboratory practice
Still need to answer questions centered on individual genes/proteins:
• Which genes are most important for a condition/disease?
• What are the best drug targets?
• What are the most robust biomarkers?
Sources of the problems
• Biological networks are highly interconnected due to the presence of hubs. Hubs almost always provide “shortest path” connectivity
• Multiple paths can be generated to connect a pair of nodes, with no way to discriminate between alternative hypotheses
• Resulting networks are often large and biologically intractable. It is hard to understand the roles of individual nodes
Some earlier solutions
• Use “canonical pathways” as the basis for reconstruction
  • Limited to known pathways
• Penalize hubs when reconstructing networks
  • Does not discriminate between individual hubs
Our solution
Find nodes that are significant in providing connectivity in a condition-specific dataset
Finding topologically significant nodes
[Figure: network diagram with nodes labeled A, B and C. B is marked “Topologically significant”, C “Not topologically significant”. Legend: differentially expressed genes; non-differentially expressed genes]
• 4 out of 6 nodes regulated by B are differentially expressed: more than a random share = significant
• Only 1 out of 6 nodes regulated by C is differentially expressed: could be due to a random event = not significant
• In reality the algorithm also considers nodes beyond first-degree neighbors
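The first-degree-neighbor intuition above can be sketched as a hypergeometric tail test. This is only an illustration, not the presented algorithm: the background numbers (1000 genes in the network, 100 of them differentially expressed) and the names `hypergeom_tail`, `BACKGROUND_GENES` and `BACKGROUND_DE` are assumptions, since the slide does not state a background rate.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, of which `successes` are marked."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical background (NOT from the slides): 1000 genes in the
# network, 100 of them differentially expressed.
BACKGROUND_GENES = 1000
BACKGROUND_DE = 100

# Node B: 4 of its 6 targets are differentially expressed.
p_b = hypergeom_tail(BACKGROUND_GENES, BACKGROUND_DE, 6, 4)
# Node C: 1 of its 6 targets is differentially expressed.
p_c = hypergeom_tail(BACKGROUND_GENES, BACKGROUND_DE, 6, 1)

print(f"node B: p = {p_b:.4g}")
print(f"node C: p = {p_c:.4g}")
```

Under this assumed background, B's share is very unlikely to arise by chance while C's is not, which matches the significant / not significant labels on the slide.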
Why is JAK1 significant in this dataset?
[Figure: network diagram; labels: “Regulation via JAK1”, “Feedback loops”]
• JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1
• Topological significance helps to find important links in pathways that do not come up in HT screens
Node scoring algorithm
• Let K be a set of experimentally derived nodes of interest (e.g. nodes representing differentially expressed genes). K is a subset of the global network, which has N nodes. Below, K also denotes the number of nodes in the experimental set.
• Calculate the shortest path network S by building directed paths from each node in K to other nodes in K, wherever possible. S is a subset of the global network and may contain nodes in addition to those in K. Also, some nodes from K may become part of S.
• Consider a node i ∈ S and one of the nodes of the experimental set, j ∈ K.
• Calculate the shortest path networks between j and every other node in the global network (N-1 pairs) and count how many of them contain i. This number is Nij ≤ N-1.
• Calculate the shortest path networks between j and all other nodes in the experimental set and count how many of them contain node i. This number is Kij ≤ K-1.
• The probability that node i would be present Kij times or more in the shortest path networks of j by chance follows a hypergeometric distribution.
• Repeat the procedure for all nodes j in the experimental set, calculating K p-values for node i (pij), each showing the relevance of node i to an individual member of the set K. Because we want to identify nodes that are statistically significant to at least one member of the experimental set, we define the p-value associated with node i as the minimum of the pij values.
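The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the GeneGo implementation. It reads "the shortest path network from j to t contains i" as "i lies on at least one shortest directed path from j to t", and it takes the hypergeometric tail to be pij = Σ_{k=Kij}^{min(Nij, K-1)} C(Nij, k)·C(N-1-Nij, K-1-k) / C(N-1, K-1), which is the standard form for the quantities defined on the slide (population N-1, Nij marked items, K-1 draws, Kij observed). The toy graph and all function names are hypothetical.

```python
from collections import deque
from math import comb


def bfs_distances(graph, source):
    """Directed shortest-path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for a hypergeometric random variable."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / comb(population, draws)


def score_nodes(graph, experimental):
    """Return {node: min over j of p_ij}. Nodes outside the shortest path
    network S get K_ij = 0 for every j and therefore a p-value of 1."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    dist = {n: bfs_distances(graph, n) for n in nodes}
    n_total, k_total = len(nodes), len(experimental)

    def on_shortest_path(j, i, t):
        # Assumption: i is "in the shortest path network" from j to t
        # when dist(j, i) + dist(i, t) equals dist(j, t).
        return (t in dist[j] and i in dist[j] and t in dist[i]
                and dist[j][i] + dist[i][t] == dist[j][t])

    scores = {}
    for i in nodes:
        p_values = []
        for j in experimental:
            if j == i:  # assumption: a node is not scored against itself
                continue
            n_ij = sum(on_shortest_path(j, i, t) for t in nodes if t != j)
            k_ij = sum(on_shortest_path(j, i, t)
                       for t in experimental if t != j)
            p_values.append(
                hypergeom_tail(n_total - 1, n_ij, k_total - 1, k_ij))
        if p_values:
            scores[i] = min(p_values)
    return scores


# Hypothetical toy network: "x" bridges the experimental nodes, while the
# hub "h" mostly reaches nodes outside the experimental set.
graph = {
    "a": ["x", "h"],
    "x": ["b", "c", "d"],
    "h": ["d", "n1", "n2", "n3", "n4", "n5", "n6"],
}
experimental = ["a", "b", "c", "d"]

scores = score_nodes(graph, experimental)
for node, p in sorted(scores.items(), key=lambda kv: kv[1])[:3]:
    print(f"{node}: p = {p:.4f}")
```

In this toy network the bridging node "x" scores 4/165 (about 0.024), while the hub "h" scores close to 1. That illustrates how a score of this kind can tell a condition-relevant connector apart from a generic hub. The all-pairs breadth-first search is chosen for clarity and would need a more economical strategy on a genome-scale network.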
Algorithm validation: PSORIASIS
• Psoriasis is recognized as the most common T cell-mediated inflammatory disease in humans.
• Genetic linkage to as many as six distinct disease loci has been established but the molecular etiology and genetics remain unknown.
• To begin to identify psoriasis disease-related genes and construct in vivo pathways of the implicated processes, genome-wide expression screens of psoriasis patients need to be undertaken
• The disease-related gene map may provide new insights into the pathogenesis of psoriasis
Data
• 4 samples from 4 psoriasis patients were taken at 2 different times
  • At the time of a developed psoriatic lesion (P)
  • At the time of its complete healing (N)
• The samples were taken from the exact same spot on the same patient, which eliminates a great deal of experimental bias and uncertainty.
• Affymetrix Human U95A microarray technology was then used to evaluate the expression data
• Only the genes differentially expressed between the lesion sample (P) and the normal sample (N) were then used for comprehensive analysis with the new algorithm and in MetaCore 4.0
Algorithm validation
• As the “experimental set” we use 266 differentially expressed genes identified in the paper
• The shortest path network connecting these genes is built using the global network of protein interactions from MetaCore™. The statistical significance of each node in this network is calculated as described above
• To evaluate whether the nodes deemed significant by our method are indeed likely to be disease-related, we perform an automated search of PubMed abstracts for co-occurrence of the corresponding gene name and the word “psoriasis” for every gene in the shortest path network. Different statistical measures are plotted as a function of the node’s p-value
• Functional analysis of high-scoring genes is performed in MetaCore™
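One of the "statistical measures plotted as a function of p-value" could be the fraction of selected genes that co-occur with "psoriasis" in the literature. The sketch below shows one way to compute it. The gene names, p-values and literature hits are placeholders invented purely for illustration, the function name `fraction_linked` is hypothetical, and the PubMed search itself is not shown.

```python
def fraction_linked(node_pvalues, linked_genes, threshold):
    """Fraction of genes with p-value <= threshold that are in `linked_genes`.

    Returns None when no gene passes the threshold.
    """
    selected = [gene for gene, p in node_pvalues.items() if p <= threshold]
    if not selected:
        return None
    hits = sum(gene in linked_genes for gene in selected)
    return hits / len(selected)


# Placeholder data (NOT real results): topological p-values per gene and
# the set of genes assumed to co-occur with "psoriasis" in abstracts.
node_pvalues = {
    "GENE_A": 0.001,
    "GENE_B": 0.004,
    "GENE_C": 0.03,
    "GENE_D": 0.2,
    "GENE_E": 0.6,
    "GENE_F": 0.9,
}
linked_genes = {"GENE_A", "GENE_B", "GENE_D"}

for threshold in (0.01, 0.05, 1.0):
    fraction = fraction_linked(node_pvalues, linked_genes, threshold)
    print(f"p <= {threshold}: fraction linked = {fraction:.2f}")
```

If the scoring is informative, this fraction should rise as the threshold is tightened.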
Fraction of genes related to “psoriasis” scales with significance
VEGF – key pathway identified!
Simonetti O, Lucarini G, Goteri G, Zizzi A, Biagini G, Lo Muzio L, Offidani A. VEGF is likely a key factor in the link between inflammation and angiogenesis in psoriasis: results of an immunohistochemical study. Int J Immunopathol Pharmacol. 2006 Oct-Dec;19(4):751-760.
Conclusions from algorithm validation
• High-scoring nodes are significantly enriched in disease-related genes
• Important disease-related pathways are identified
• Important drug targets are highly scored
Integration of genomic and proteomic sets
• LNCaP prostate cell lines
  • Treated with Androgen
  • Untreated (control)
• Data:
  • Proteomic data: ~70 proteins exclusively present in treated cells
  • Gene expression profiling of Androgen-treated cells
• Analysis
  • Topological analysis of the Androgen-specific protein network
  • Correlation between topologically significant nodes and gene expression
  • Functional analysis in MetaCore™
  • Network analysis in MetaCore™
Revealing regulation of LNCaP cell response to Androgen
• Topologically significant nodes reveal regulation
• Gene expression and proteomic data reveal target pathways
[Figure; panel labels: by differentially expressed genes; by Androgen-specific proteins; by topologically significant nodes]
Correlation between expression and significance
Among topologically significant genes the fraction of differentially expressed genes is high
[Figure; axes: p-value related to differential expression; p-value related to topological significance]
Androgen receptor signaling
[Figure legend: 1 - differentially expressed gene; 2 - Androgen-specific protein; 3 - topologically significant node]
Regulation of lipid metabolism
[Figure labels: topologically significant nodes revealed by the new algorithm; differentially expressed genes identified by microarray and confirmed by proteomic screen]
Possible regulation of PBEF by AR
PBEF occurs in both the expression and proteomic datasets; it is possibly activated by the androgen receptor via HIF1 or HNF4
Conclusions
• The presented method assigns priority to nodes in biological networks built on condition-specific datasets
• The presented method predominantly selects genes with high relevance to the condition of interest
• The presented method could be used for cross-validation of different data types, identification of novel drug targets and validation of existing targets
Putting it all together: network activity inference
• Identifying causal relations between putative input and output signals
• Tracking effects of molecular perturbation through activation/inhibition cascades
[Figure labels: Predicted input; Scoring intermediary nodes; Experimental data; Experimental data: terminate cascade; Predicted target; Experimental data: start cascade; Inferred activity]
Acknowledgements
GeneGo: Zoltan Dezso, Yuri Nikolsky, Tatiana Nikolskaya
University of Michigan: Adaikkalam Vellaichamy, Saravana M Dhanasekaran, Arun Sreekumar, Arul Chinnaiyan, Gilbert Omenn