Lecture 4 Protein Function prediction using network concepts

Presentation Transcript


  1. Lecture 4 • Protein Function prediction using network concepts • Application of network concepts in DNA sequencing

  2. The topology of a protein-protein interaction (PPI) network is informative, but further analysis can reveal additional information. A popular assumption, which holds in many cases, is that proteins with similar functions interact with each other. Based on this assumption, we have developed methods, mainly based on cluster analysis, to predict protein functions and protein complexes from PPI networks.

  3. Cluster Analysis Cluster analysis, also called data segmentation, means grouping or segmenting a collection of objects into subsets or "clusters", such that objects within each cluster are more closely related to one another than to objects assigned to different clusters. In the context of a graph, densely connected nodes are considered clusters. Visually, we can detect two clusters in this graph.

  4. K-cores of Protein-Protein Interaction Networks Definition: Let a graph $G=(V, E)$ consist of a finite set of nodes $V$ and a finite set of edges $E$. A subgraph $S=(V', E')$, where $V' \subseteq V$ and $E' \subseteq E$, is a k-core (or a core of order $k$) of $G$ if and only if $\forall v \in V'$: $\deg(v) \geq k$ within $S$, and $S$ is the maximal subgraph with this property.
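The definition above can be sketched in code. This is a minimal illustration, not the authors' implementation: it finds the k-core by repeatedly deleting nodes whose degree inside the remaining subgraph is below k. The function name `k_core` and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph.

    `edges` is an iterable of (u, v) pairs; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    changed = True
    # Keep pruning until every remaining node has degree >= k
    # counted only among the remaining nodes.
    while changed:
        changed = False
        for node in list(alive):
            if len(adj[node] & alive) < k:
                alive.remove(node)
                changed = True
    return alive


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
core1 = k_core(edges, 1)
core2 = k_core(edges, 2)
core3 = k_core(edges, 3)
print(sorted(core1), sorted(core2), sorted(core3))
```

Because pruning only ever removes nodes that cannot belong to any subgraph of minimum degree k, what remains is the maximal such subgraph, as the definition requires.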

  5. Graph G 1-core graph: every node has degree one or more

  6. 1-core graph: every node has degree one or more

  7. 2-core graph: every node has degree two or more

  8. 1-core graph: every node has degree one or more

  9. Graph G 3-core graph: every node has degree three or more. The 3-core is the highest k-core subgraph of the graph G.

  10. Analyzing protein-protein interaction data obtained from different sources, G. D. Bader and C.W.V. Hogue, Nature biotechnology, Vol 20, 2002

  11. Prediction of Protein Functions Based on K-cores of Protein-Protein Interaction Networks “Prediction of Protein Functions Based on K-cores of Protein-Protein Interaction Networks and Amino Acid Sequences”, Md. Altaf-Ul-Amin, Kensaku Nishikata, Toshihiro Koma, Teppei Miyasato, Yoko Shinbo, Md. Arifuzzaman, Chieko Wada, Maki Maeda, Taku Oshima, Hirotada Mori, Shigehiko Kanaya, The 14th International Conference on Genome Informatics, December 14-17, 2003, Yokohama, Japan.

  12. In total, 3007 proteins and 11531 interactions. Around 2000 are function-unknown proteins. The highest k-core of this total graph is not very helpful.

  13. 10-core graph

  14. We separate 1072 interactions (out of 11531) involving protein synthesis and function-unknown proteins. [Figure node labels: P. S. (protein synthesis), U. F. (unknown function)]

  15. Function-unknown proteins of this 6-core graph are likely to be involved in protein synthesis.

  16. We separate 193 interactions (out of 11531) involving electron transport and function-unknown proteins.

  17. Highest k-core (the 2-core subgraph) of the graph on the previous page. Function-unknown proteins of this 2-core graph are likely to be involved in electron transfer. Further sub-classification may be possible by applying other information to the k-core subgraph.
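The k-core based prediction steps on the preceding slides can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' code. It assumes that "separating" interactions means keeping pairs in which both proteins are either annotated with the target function or unannotated. The function names, protein names and annotations are hypothetical.

```python
from collections import defaultdict


def prune_to_k_core(adj, k):
    """Nodes left after repeatedly deleting nodes of degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(adj[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive


def highest_k_core(edges):
    """Return (k, nodes) for the largest k whose k-core is non-empty."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    k, best = 0, set(adj)
    while True:
        nxt = prune_to_k_core(adj, k + 1)
        if not nxt:
            return k, best
        k, best = k + 1, nxt


def candidate_unknowns(interactions, function_of, target):
    """Unknown proteins in the highest k-core of the target subnetwork.

    Assumption: a protein missing from `function_of` is function-unknown.
    """
    def allowed(protein):
        return function_of.get(protein) in (target, None)

    sub = [(u, v) for u, v in interactions if allowed(u) and allowed(v)]
    k, core = highest_k_core(sub)
    return k, sorted(p for p in core if p not in function_of)


# Hypothetical toy data.
function_of = {
    "P1": "protein synthesis",
    "P2": "protein synthesis",
    "P3": "protein synthesis",
    "E1": "electron transport",
}
interactions = [
    ("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
    ("X1", "P1"), ("X1", "P2"), ("X1", "P3"),
    ("X2", "P1"), ("E1", "X1"), ("E1", "P1"),
]
k, predicted = candidate_unknowns(interactions, function_of, "protein synthesis")
print(k, predicted)
```

In this toy network X1 sits in the densest core together with the protein synthesis proteins, while the loosely attached X2 is pruned away.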

  18. “Prediction of Protein Functions Based on Protein-Protein Interaction Networks: A Min-Cut Approach”, Md. Altaf-Ul-Amin, Toshihiro Koma, Ken Kurokawa, Shigehiko Kanaya, Proceedings of the Workshop on Biomedical Data Engineering (BMDE), Tokyo, Japan, pp. 37-43, April 3-4, 2005.

  19. Outline • Introduction • The concept of Min-Cut • Problem Formulation • A Heuristic Method • Evaluation of the Proposed Method • Conclusions

  21. Introduction After the complete sequencing of several genomes, the challenging problem now is to determine the functions of proteins: • determining protein functions experimentally • using various computational methods based on a) sequence b) structure c) gene neighborhood d) gene fusions e) cellular localization f) protein-protein interactions

  22. Introduction The present work predicts protein functions based on the protein-protein interaction network. For the purpose of prediction, we consider, in the context of the whole network, the interactions of • function-unknown proteins with function-known proteins and • function-unknown proteins with function-unknown proteins.

  23. Introduction Schwikowski, B., Uetz, P. and Fields, S. A network of protein-protein interactions in yeast. Nature Biotech. 18, 1257-1261 (2000). This work deals with a network of 2039 proteins and 2709 interactions; 65% of the interactions occurred between protein pairs with at least one common function. Hishigaki, H., Nakai, K., Ono, T., Tanigami, A., and Takagi, T. Assessment of prediction accuracy of protein function from protein-protein interaction data. Yeast 18, 523-531 (2001). This work reported similar results.
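A statistic of this kind can be computed along the following lines. This is only a sketch of the idea: it is not the procedure used in the cited papers, and restricting the count to pairs in which both proteins are annotated is an assumption. The function name and the toy annotations are hypothetical.

```python
def shared_function_fraction(interactions, functions):
    """Fraction of annotated interacting pairs sharing >= 1 function.

    `functions` maps a protein name to a set of function labels.
    Pairs with an unannotated protein are skipped (an assumption).
    """
    annotated = [
        (u, v) for u, v in interactions
        if functions.get(u) and functions.get(v)
    ]
    if not annotated:
        return 0.0
    shared = sum(1 for u, v in annotated if functions[u] & functions[v])
    return shared / len(annotated)


# Hypothetical toy annotations; protein E has no annotation.
functions = {
    "A": {"transport"},
    "B": {"transport", "signaling"},
    "C": {"signaling"},
    "D": {"metabolism"},
}
interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
fraction = shared_function_fraction(interactions, functions)
print(round(fraction, 3))
```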

  24. Introduction So, the majority of protein-protein interactions are between protein pairs with similar functions. Therefore, we assign function-unknown proteins to different functional groups in such a way that the number of inter-group interactions is minimized. Hence we call the proposed approach a Min-Cut approach.

  25. Outline • Introduction • The concept of Min-Cut • Problem Formulation • A Heuristic Method • Evaluation of the Proposed Method • Conclusions

  26. The concept of Min-Cut [Figure: network with unknown proteins U1-U4, known proteins K1-K6 and K8, and groups G1 and G2] A typical and small network of known and unknown proteins.

  27. The concept of Min-Cut [Figure: network of known (K) and unknown (U1-U4) proteins with groups G1 and G2] Unknown proteins are assigned to known groups based on the majority of their interactions.
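Majority-based assignment can be sketched as below. This is a minimal illustration under the assumption that each unknown protein takes the group most common among its directly interacting known proteins. The function name and the toy network are hypothetical, since the network in the figure is not recoverable from the transcript.

```python
from collections import Counter, defaultdict


def majority_assignment(edges, known_group):
    """Assign each unknown protein to its most frequent neighbor group.

    `known_group` maps known proteins to their group. Unknown proteins
    with no known neighbor stay unassigned. Ties go to the group that
    was encountered first in `edges`.
    """
    votes = defaultdict(Counter)
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a not in known_group and b in known_group:
                votes[a][known_group[b]] += 1
    return {p: c.most_common(1)[0][0] for p, c in votes.items()}


# Hypothetical toy network.
known_group = {"K1": "G1", "K2": "G1", "K3": "G2"}
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "K3"),
         ("U2", "K3"), ("U1", "U2")]
assignment = majority_assignment(edges, known_group)
print(assignment)
```

This rule looks only at each protein's direct known neighbors, so it ignores interactions between unknown proteins, which is what the whole-network view is meant to capture.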

  28. The concept of Min-Cut [Figure: network of known (K) and unknown (U1-U4) proteins with groups G1 and G2] Number of CUT = 4

  29. The concept of Min-Cut [Figure: network of known (K) and unknown (U1-U4) proteins with groups G1 and G2] An alternative assignment of unknown proteins.

  30. The concept of Min-Cut [Figure: network of known (K) and unknown (U1-U4) proteins with groups G1 and G2] Number of CUT = 2. For every assignment of unknown proteins, there is a value of CUT. The Min-Cut approach looks for an assignment for which the number of CUT is minimum.
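For a very small network, the idea can be illustrated by trying every assignment and keeping the one with the smallest CUT. This is a brute-force sketch for illustration only and not the method proposed in the lecture. Following the Problem Formulation slides, interactions between two known proteins are not counted. The function names and the toy network are hypothetical.

```python
from itertools import product


def cut_size(edges, group_of, known):
    """Count inter-group interactions, skipping known-known pairs."""
    return sum(
        1 for u, v in edges
        if not (u in known and v in known) and group_of[u] != group_of[v]
    )


def brute_force_min_cut(edges, known_group, unknown, groups):
    """Try every assignment of `unknown` proteins to `groups`."""
    best_cut, best_assignment = None, None
    for choice in product(groups, repeat=len(unknown)):
        group_of = dict(known_group)
        group_of.update(zip(unknown, choice))
        c = cut_size(edges, group_of, known_group)
        if best_cut is None or c < best_cut:
            best_cut, best_assignment = c, dict(zip(unknown, choice))
    return best_cut, best_assignment


# Hypothetical toy network with two groups and two unknown proteins.
known_group = {"K1": "G1", "K2": "G1", "K3": "G2", "K4": "G2"}
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "U2"),
         ("U2", "K3"), ("U2", "K4"), ("K1", "K3")]
best_cut, best_assignment = brute_force_min_cut(
    edges, known_group, ["U1", "U2"], ["G1", "G2"]
)
print(best_cut, best_assignment)
```

The number of assignments grows as (number of groups) to the power of (number of unknown proteins), so exhaustive search is only feasible for toy examples.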

  31. Outline • Introduction • The concept of Min-Cut • Problem Formulation • A Heuristic Method • Evaluation of the Proposed Method • Conclusions

  32. Problem Formulation Here we explain some points with a typical example.

  33. Problem Formulation V = set of all nodes; E = set of all edges; G = {K1, K2, K3, K4, K5, K6, K7, K8, K9, K10}; U = {U1, U2, U3, U4, U5, U6, U7, U8}

  34. Problem Formulation We generate U´ ⊆ U such that each protein of U´ is connected in N with at least one protein of group G by a path of length 1 or length 2. U´ = {U1, U2, U3, U4, U5, U6, U7}
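Constructing U´ can be sketched as follows: keep every unknown protein that has a known protein among its neighbors (path of length 1) or among its neighbors' neighbors (path of length 2). The function name and the toy chain network are assumptions made for illustration.

```python
from collections import defaultdict


def reachable_unknowns(edges, known):
    """Unknown proteins within path length 1 or 2 of a known protein.

    `known` is a set of function-known protein names; every other
    node that appears in `edges` is treated as unknown.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    result = set()
    for node in adj:
        if node in known:
            continue
        if adj[node] & known:
            result.add(node)  # path of length 1
        elif any(adj[nb] & known for nb in adj[node]):
            result.add(node)  # path of length 2
    return result


# Hypothetical chain K1 - U1 - U2 - U3 - U4.
edges = [("K1", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "U4")]
u_prime = reachable_unknowns(edges, {"K1"})
print(sorted(u_prime))
```

In the chain, U3 and U4 are more than two steps away from any known protein, so they are left out, in the same way that one protein of U is left out of U´ in the slide's example.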

  35. Problem Formulation We can assign the proteins of U´ to different groups and calculate the CUT. Interactions between known protein pairs can never be part of the CUT. For this assignment of unknown proteins, CUT = 6.

  36. Problem Formulation The problem we are trying to solve is to assign the proteins of set U´ to the known groups G1, G2, ..., G3 in such a way that the CUT is minimized.
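One possible way to write this objective formally is given below. The notation is introduced here for illustration and does not appear on the slides: $g$ is an assignment map, $m$ is the number of functional groups, and $G$ is the set of known proteins as on the earlier Problem Formulation slide. The restriction to edges with at least one endpoint in $U'$ reflects the statement that interactions between known protein pairs are never part of the CUT.

```latex
% g assigns every protein in G \cup U' to one of the m groups;
% g is fixed on the known proteins in G and free on U'.
g : G \cup U' \to \{G_1, G_2, \dots, G_m\}

\mathrm{CUT}(g) =
  \Bigl|\bigl\{\, \{u,v\} \in E \;:\;
      u, v \in G \cup U',\;
      \{u,v\} \cap U' \neq \emptyset,\;
      g(u) \neq g(v) \,\bigr\}\Bigr|

g^{*} = \arg\min_{g} \; \mathrm{CUT}(g)
  \quad \text{subject to } g(k) \text{ being the given group of } k
  \text{ for every } k \in G
```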

  37. Outline • Introduction • The concept of Min-Cut • Problem Formulation • A Heuristic Method • Evaluation of the Proposed Method • Conclusions

  38. A Heuristic Method • The problem at hand is a variant of the network partitioning problem. • Network partitioning problems are known to be NP-hard. • Therefore, we resort to heuristics to find as good a solution as possible.

  39. A Heuristic Method

  40. A Heuristic Method U1 has one path of length 1 to G2 and two paths of length 2 to G1.

  41. A Heuristic Method U4 has two paths of length 1 to G1, one path of length 1 to G2 and one path of length 2 to G3.
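Path counts of this kind can be computed as sketched below. This covers only the counting step. How the lecture turns the counts into group priorities is not stated in the transcript, so no priority rule is implemented here. The function name is hypothetical, and the toy edges were chosen so that U1 reproduces the counts quoted on the slide (one path of length 1 to G2, two paths of length 2 to G1).

```python
from collections import Counter, defaultdict


def path_counts(edges, known_group, protein):
    """Count paths of length 1 and 2 from `protein` to each group.

    A path of length 2 goes through any intermediate protein
    (known or unknown) and ends at a known protein (an assumption).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    len1, len2 = Counter(), Counter()
    for nb in adj[protein]:
        if nb in known_group:
            len1[known_group[nb]] += 1
        for far in adj[nb]:
            if far != protein and far in known_group:
                len2[known_group[far]] += 1
    return dict(len1), dict(len2)


# Hypothetical edges arranged to match the counts quoted for U1.
known_group = {"K1": "G1", "K2": "G1", "K3": "G2"}
edges = [("U1", "K3"), ("U1", "U2"), ("U2", "K1"), ("U2", "K2")]
len1, len2 = path_counts(edges, known_group, "U1")
print(len1, len2)
```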

  42. A Heuristic Method

  43. A Heuristic Method

  44. A Heuristic Method By assigning all the unknown proteins to their respective highest-priority groups, CUT = 6.

  45. A Heuristic Method For this assignment of unknown proteins, CUT = 7.

  46. A Heuristic Method For this assignment of unknown proteins, CUT = 4.

  47. Outline • Introduction • The concept of Min-Cut • Problem Formulation • A Heuristic Method • Evaluation of the Proposed Method • Conclusions

  48. Evaluation of the Proposed Approach • The proposed method is general and can be applied to any organism and any type of functional classification. • Here we apply it to the protein-protein interaction network of the yeast Saccharomyces cerevisiae. • We obtain the protein-protein interaction data from ftp://ftpmips.gsf.de/yeast/PPI/, which contains 15613 genetic and physical interactions.

  49. Evaluation of the Proposed Approach We discard self-interactions and extract a set of 12487 unique binary interactions involving 4648 proteins. Sample pairs: YAR019c-YMR001c; YAR019c-YNL098c; YAR019c-YOR101w; YAR019c-YPR111w; YAR027w-YAR030c; YAR027w-YBR135w; YAR031w-YBR217w; ... Total 12487 pairs.
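The filtering step can be sketched as follows, assuming the interaction data has been read as lines of two whitespace-separated protein names (the actual file format of the source data is not described in the transcript). The function name is hypothetical. The first, second and fifth sample lines are pairs shown on the slide, while the reversed duplicate and the self-interaction are made up to exercise the filter.

```python
def unique_binary_interactions(lines):
    """Drop self-interactions and duplicate pairs (in either order).

    Returns (pairs, proteins): a set of sorted name tuples and the
    set of all proteins that take part in them.
    """
    pairs = set()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = parts
        if a == b:
            continue  # self-interaction
        pairs.add(tuple(sorted((a, b))))
    proteins = {p for pair in pairs for p in pair}
    return pairs, proteins


sample = [
    "YAR019c YMR001c",
    "YAR019c YNL098c",
    "YMR001c YAR019c",   # hypothetical reversed duplicate
    "YAR027w YAR027w",   # hypothetical self-interaction
    "YAR027w YAR030c",
]
pairs, proteins = unique_binary_interactions(sample)
print(len(pairs), len(proteins))
```

Sorting each pair before storing it makes the pair order irrelevant, so A-B and B-A count as a single interaction.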
