Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Authors: Nabieva, et al. Source: Bioinformatics 2005. Reviewed by BH Shen.
Goal of the Research Problem • To predict the functions of uncharacterized proteins, given a set of protein interaction networks in which some proteins are already annotated.
Problem instance • A protein interaction network • Node: represents a protein, which is either annotated or not annotated • An annotated node is labeled with a particular function. • Undirected edge: indicates an interaction between its two endpoints. Such data come from experimental or computational results.
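The problem instance above can be sketched as a small data structure. This is a minimal illustration, not the representation used in the reviewed paper: the function name `build_network`, the protein IDs, and the function labels are all hypothetical, and labels are stored as sets so that a protein could carry more than one annotation.

```python
from collections import defaultdict

def build_network(interactions, annotations):
    """Build an undirected interaction network.

    interactions: iterable of (protein, protein) pairs (hypothetical IDs).
    annotations:  dict mapping some proteins to a set of function labels.
    Returns (adjacency, labels); unannotated proteins get an empty label set.
    """
    adjacency = defaultdict(set)
    for u, v in interactions:
        adjacency[u].add(v)  # undirected edge: store both directions
        adjacency[v].add(u)
    labels = {p: set(annotations.get(p, ())) for p in adjacency}
    return dict(adjacency), labels

# Toy instance: a path P1 - P2 - P3 - P4 with only the endpoints annotated.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
annotations = {"P1": {"transport"}, "P4": {"metabolism"}}
adjacency, labels = build_network(interactions, annotations)
unannotated = sorted(p for p, funcs in labels.items() if not funcs)
```

The prediction task is then to rank candidate functions for each protein in `unannotated`.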
Assumptions • Postulate: protein interaction networks provide hints about the higher-level organization of the cell. • The closer two proteins are in the network, the more likely they are to share a function. • Interaction networks can be partitioned into functional modules.
Prior work • Grouping not based on functional annotations • Shared interactions (Brun et al., 2003; Schlitt et al., 2003; Strong et al., 2003; von Mering et al., 2003b; Lee et al., 2004) • Shortest path vectors (Rives and Galitski, 2003) • Clique-based (Spirin and Mirny, 2003)
Prior work • Neighboring interactions: predict for an unknown protein the 3 most frequent annotations among its direct neighbors, by majority vote (Schwikowski et al., 2000) • Disadvantage: limited use of the underlying graph structure.
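The majority-vote idea can be sketched in a few lines. This is an illustrative reading of the bullet above, not the cited authors' code: the name `majority_predict`, the alphabetical tie-breaking, and the toy data are assumptions.

```python
from collections import Counter

def majority_predict(protein, adjacency, labels, top_k=3):
    """Return up to top_k functions most frequent among direct neighbors."""
    counts = Counter()
    for neighbor in adjacency.get(protein, ()):
        counts.update(labels.get(neighbor, ()))
    # Sort by descending count; break ties alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [func for func, _ in ranked[:top_k]]

# Hypothetical protein X with three annotated neighbors.
adjacency = {"X": {"A", "B", "C"}}
labels = {"A": {"transport"}, "B": {"transport", "signaling"}, "C": {"metabolism"}}
prediction = majority_predict("X", adjacency, labels)
```

Only direct neighbors are consulted, which is the limitation the slide points out: anything two or more edges away has no influence on the vote.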
Prior work • Neighborhood: extends the neighboring method to all nodes within a certain radius (Hishigaki et al., 2001). • Disadvantage: does not consider the network topology within the neighborhood.
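A rough sketch of the radius-based neighborhood follows. It only counts annotations inside the radius; the scoring statistic used in the cited work is not reproduced here. The names `within_radius` and `neighborhood_counts` and the toy network are hypothetical.

```python
from collections import Counter, deque

def within_radius(start, adjacency, radius):
    """Breadth-first search: all nodes at distance 1..radius from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    seen.discard(start)
    return seen

def neighborhood_counts(protein, adjacency, labels, radius):
    """Count each annotation among proteins within the given radius."""
    counts = Counter()
    for other in within_radius(protein, adjacency, radius):
        counts.update(labels.get(other, ()))
    return counts

# Path P1 - P2 - P3 - P4 with annotated endpoints.
adjacency = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2", "P4"}, "P4": {"P3"}}
labels = {"P1": {"transport"}, "P4": {"metabolism"}}
counts = neighborhood_counts("P2", adjacency, labels, radius=2)
```

In this toy case P1 (distance 1) and P4 (distance 2) contribute equally to the counts for P2, which illustrates why ignoring the topology inside the neighborhood is a weakness.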
Prior work • Generalized multiway k-cut (Vazquez et al., 2003; Karaoz et al., 2004) • Minimizes the number of different annotations associated with neighboring proteins in a group. • A more general version of the multiway k-cut problem. • Assigns a unique function to each unannotated node so as to minimize the sum of the costs of the edges joining nodes with no function in common. • Disadvantage: does not reward local proximity.
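The objective described above can be made concrete with a cost function. This sketch only evaluates a candidate assignment; it does not perform the optimization, and the name `cut_cost`, the unit edge costs, and the toy data are assumptions.

```python
def cut_cost(assignment, labels, edges):
    """Sum the costs of edges whose endpoints share no function.

    assignment: dict unannotated protein -> single assigned function.
    labels:     dict annotated protein -> set of known functions.
    edges:      iterable of (u, v, cost) triples.
    A protein with neither a label nor an assignment shares nothing,
    so every edge touching it is counted.
    """
    def functions_of(protein):
        if labels.get(protein):
            return set(labels[protein])
        if protein in assignment:
            return {assignment[protein]}
        return set()

    return sum(cost for u, v, cost in edges
               if not (functions_of(u) & functions_of(v)))

# Path P1 - P2 - P3 - P4 with unit costs and annotated endpoints.
edges = [("P1", "P2", 1), ("P2", "P3", 1), ("P3", "P4", 1)]
labels = {"P1": {"transport"}, "P4": {"metabolism"}}
cost_a = cut_cost({"P2": "transport", "P3": "transport"}, labels, edges)
cost_b = cut_cost({"P2": "transport", "P3": "metabolism"}, labels, edges)
cost_c = cut_cost({"P2": "metabolism", "P3": "transport"}, labels, edges)
```

Assignments (a) and (b) both cut a single edge, so the objective cannot distinguish them even though (b) gives each unannotated protein the function of its nearer annotated neighbor. This is one way to read the "does not reward local proximity" remark.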
Functional Flow – Nodes / Proteins • Uses the idea of network flow. • Each protein with a known functional annotation is treated as a ‘flow source’. • The amount of flow available at a source is infinite. • Unlike the usual network flow problem, there are no sinks.
Functional Flow – Edges / Interactions • Flow from a source propagates to neighboring unannotated nodes for a predetermined number of steps. • At each iteration, the flow from each node to its neighbors is updated. • Each edge has a ‘capacity’, which incorporates a distance effect. • Multiple paths between two proteins result in more flow.
Functional Flow – Edge reliability as the weight/capacity • Integrates multiple sources of experimental and computational results (5 in this research). • The reliability r_i of an interaction from source i: the fraction of that source's interactions that connect proteins with a known shared function. • The reliabilities from all sources are combined.
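The reliability estimate can be sketched as follows. Two points are assumptions rather than facts taken from the slide: the denominator is every interaction reported by the source (one reading of the bullet), and the combining rule is a noisy-or, 1 minus the product of (1 - r_i), since the slide does not reproduce the combining formula. The names `source_reliability` and `combined_reliability` are hypothetical.

```python
def source_reliability(pairs, labels):
    """Fraction of a source's interactions whose endpoints share a known function."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    shared = sum(1 for u, v in pairs
                 if labels.get(u, set()) & labels.get(v, set()))
    return shared / len(pairs)

def combined_reliability(reliabilities):
    """Noisy-or combination (an assumption): the interaction is treated as
    wrong only if every source that reports it is wrong."""
    prob_all_wrong = 1.0
    for r in reliabilities:
        prob_all_wrong *= (1.0 - r)
    return 1.0 - prob_all_wrong

# Toy source reporting two interactions, one of which joins same-function proteins.
labels = {"A": {"x"}, "B": {"x"}, "C": {"y"}}
r_source = source_reliability([("A", "B"), ("A", "C")], labels)
edge_weight = combined_reliability([0.5, 0.5])
```

Under this combining rule an interaction seen by two sources of reliability 0.5 gets weight 0.75, so independent confirmation raises the capacity of an edge.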
Functional Flow – Objective functions • The score of a function for an unannotated node is the total amount of flow entering the node. • The amount of flow leaving the node is irrelevant. • The locality effect is similar in some ways to the locally constrained diffusion kernel, but the flow in this proposed method is limited by capacities on edges.
Functional Flow – Rule for flow • Initial flow of function a at a node u at time 0. • The flow of function a at node u at time t.
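The formulas for these two rules did not carry over from the slide. A reconstruction consistent with the bullets (infinite sources, no sinks) is given below, assuming R_t^a(u) denotes the amount of flow for function a held at node u at time t, g_t^a(u,v) the flow on the edge from u to v, and E the edge set. The symbol names are assumptions, and the original slide may have used different notation.

```latex
R_0^a(u) =
\begin{cases}
\infty, & \text{if } u \text{ is annotated with } a,\\
0, & \text{otherwise,}
\end{cases}
\qquad
R_t^a(u) = R_{t-1}^a(u) + \sum_{v:(u,v)\in E} \bigl( g_t^a(v,u) - g_t^a(u,v) \bigr)
```

In words: a node's amount at time t is its previous amount plus everything that flowed in during the step, minus everything that flowed out.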
Functional Flow – flow propagation and score • The flow of function a on the edge from u to v at time t. • The functional score of a for node u (the total amount of flow that enters the node).
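Since the slide's formulas are missing, the following is only a sketch of a flow procedure that matches the bullets: infinite sources, no sinks, capacity-limited edges, a fixed number of steps, and a score equal to total inflow. The specific edge rule is an assumption: no flow goes from a node to a neighbor holding more, and otherwise a node sends each neighbor the smaller of the edge capacity and its current amount split in proportion to edge weights. The name `functional_flow` and the toy network are hypothetical.

```python
import math

def functional_flow(weights, sources, steps):
    """Propagate flow for one function and return the total inflow per node.

    weights: dict node -> dict neighbor -> capacity (symmetric).
    sources: set of nodes annotated with the function.
    steps:   number of propagation iterations.
    """
    reservoir = {u: (math.inf if u in sources else 0.0) for u in weights}
    score = {u: 0.0 for u in weights}
    for _ in range(steps):
        inflow = {u: 0.0 for u in weights}
        outflow = {u: 0.0 for u in weights}
        for u, neighbors in weights.items():
            total = sum(neighbors.values())
            if reservoir[u] == 0.0 or total == 0.0:
                continue  # nothing to send
            for v, w in neighbors.items():
                if reservoir[u] < reservoir[v]:
                    continue  # assumed rule: never send toward a fuller node
                if math.isinf(reservoir[u]):
                    flow = w  # a source can always fill the edge
                else:
                    flow = min(w, reservoir[u] * w / total)
                inflow[v] += flow
                outflow[u] += flow
        for u in weights:
            reservoir[u] += inflow[u] - outflow[u]  # sources stay infinite
            score[u] += inflow[u]
    return score

# Path S - A - B; S is the only source; capacities 1.0 and 0.5.
weights = {"S": {"A": 1.0}, "A": {"S": 1.0, "B": 0.5}, "B": {"A": 0.5}}
score = functional_flow(weights, sources={"S"}, steps=2)
```

After two steps A, which is adjacent to the source, has received far more flow than B, which is two edges away. This shows how the step limit and capacities together produce the distance effect described on the earlier slides.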
Metrics for Experimental Results • N-fold cross-validation • 2-, 3-, 5-, and 10-fold cases were tested. • The performance of an algorithm is evaluated by whether the top-scoring prediction above some threshold is a known functional annotation (true positive, TP) or not (false positive, FP). • For multiple predictions (a tricky situation), count a protein’s prediction as a TP if more than half of the predictions made for it are correct, and as an FP otherwise.
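The TP/FP rule above, including the "more than half" convention for multiple predictions, can be written as a short function. The names `classify_prediction` and `k_fold_splits`, and the choice to return `None` when no prediction is made, are assumptions for illustration.

```python
def classify_prediction(predicted, known):
    """Label one protein's prediction as 'TP' or 'FP'.

    predicted: functions predicted for the protein (above the threshold).
    known:     the protein's held-out true annotations.
    Returns None if nothing was predicted (an assumption of this sketch).
    """
    predicted = list(predicted)
    if not predicted:
        return None
    correct = sum(1 for func in predicted if func in known)
    # TP only if strictly more than half of the predictions are correct.
    return "TP" if correct > len(predicted) / 2 else "FP"

def k_fold_splits(proteins, k):
    """Deterministically split annotated proteins into k held-out folds."""
    proteins = sorted(proteins)
    return [proteins[i::k] for i in range(k)]

single = classify_prediction(["a"], {"a"})
half_right = classify_prediction(["a", "b"], {"a"})
two_of_three = classify_prediction(["a", "b", "c"], {"a", "b"})
folds = k_fold_splits(["P1", "P2", "P3", "P4", "P5"], 2)
```

With a single prediction the rule reduces to the ordinary case: one correct prediction out of one is more than half, so it counts as a TP.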
Conclusion • The proposed algorithm uses indirect network interactions, network topology, network distances, and edges weighted by reliability estimated from multiple data sources. • The simplest methods, such as Majority, perform well if there are enough direct neighbors with known function. • Only a simple reliability estimate was used. • The proposed algorithm was applied only to baker’s yeast in this research, but it is likely to be useful for analyzing less well-characterized proteomes.