
Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Inferring strengths of protein-protein interactions from experimental data using linear programming


Presentation Transcript


  1. Inferring strengths of protein-protein interactions from experimental data using linear programming Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

  2. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  3. Background (1/3) • Understanding protein-protein interactions helps in understanding protein functions. • Transcription factors • Proteins interact with a factor. • Regulate the gene. • Receptors, etc.

  4. Background (2/3) • Various methods have been developed for inference of protein-protein interactions • Gene fusion/Rosetta stone (Enright et al. and Marcotte et al. 1999) • Applicable to only a limited number of genes. • Molecular dynamics • Long CPU time • Difficult to predict precisely

  5. Background (3/3) • A model based on domain-domain interactions has been proposed. • Use domains defined by databases like InterPro or Pfam. [Figure labels: Domain, Domain]

  6. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  7. Probabilistic model of interaction (1/2) • Model (Deng et al., 2002) • Two proteins interact if and only if at least one pair of their domains interacts. • Interactions between domains are independent events. [Figure: proteins P1 and P2 with domains D1, D2, D3, D4]

  8. Probabilistic model of interaction (2/2) • Pij = 1: Proteins Pi and Pj interact • Dmn = 1: Domains Dm and Dn interact • (Dm, Dn) ∈ Pi × Pj: Domain pair (Dm, Dn) is included in protein pair Pi × Pj
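
The two assumptions of the model (a protein pair interacts exactly when at least one of its domain pairs interacts, and domain-domain interactions are independent) imply Pr(Pij=1) = 1 - Π (1 - Pr(Dmn=1)), with the product taken over the domain pairs included in Pi × Pj. A minimal sketch of that calculation follows; the function name and the probability values are hypothetical and chosen only for illustration.

```python
from math import prod


def interaction_probability(domain_pair_probs):
    """Pr(Pij=1) = 1 - prod(1 - Pr(Dmn=1)) over the domain pairs in Pi x Pj."""
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)


# Hypothetical Pr(Dmn=1) values for the three domain pairs of one protein pair.
probs = [0.5, 0.2, 0.0]
p_interact = interaction_probability(probs)

# A protein pair with no domain pairs gets probability 0 under this model.
p_empty = interaction_probability([])
```

A single domain pair with Pr(Dmn=1) = 1 forces Pr(Pij=1) = 1, which is why the model is sensitive to overestimated domain-pair probabilities.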

  9. Overview • Background • Probabilistic model • Related work • Association method (Sprinzak et al., 2001) • EM method (Deng et al., 2002) • Biological experimental data • Proposed methods • Results of computational experiments • Conclusion

  10. Related work • INPUT: • interacting protein pairs (positive examples) • non-interacting protein pairs (negative examples) • OUTPUT: Pr(Dmn=1) for all domain pairs

  11. Association method (Sprinzak et al., 2001) • Inference of probabilities of domain-domain interactions using ratios of frequencies • Pr(Dmn=1) is estimated as the number of interacting protein pairs that include (Dm, Dn) divided by the number of all protein pairs that include (Dm, Dn)
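
The frequency ratio used by the Association method can be sketched as below. The names `I_mn` and `N_mn` in the comments, the function name, and the toy proteins and domains are assumptions made for this illustration. Domain pairs are treated as unordered, and each is counted once per protein pair.

```python
from collections import Counter
from itertools import product


def association_scores(protein_domains, pairs, interacting):
    """Estimate Pr(Dmn=1) as I_mn / N_mn for every observed domain pair."""
    n_all = Counter()  # N_mn: protein pairs that include (Dm, Dn)
    n_int = Counter()  # I_mn: interacting protein pairs that include (Dm, Dn)
    for pi, pj in pairs:
        # Unordered domain pairs contained in Pi x Pj.
        dpairs = {tuple(sorted(dp))
                  for dp in product(protein_domains[pi], protein_domains[pj])}
        for dp in dpairs:
            n_all[dp] += 1
            if (pi, pj) in interacting:
                n_int[dp] += 1
    return {dp: n_int[dp] / n_all[dp] for dp in n_all}


# Hypothetical toy data.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D3", "D4"]}
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]
interacting = {("P1", "P2")}
scores = association_scores(protein_domains, pairs, interacting)
```

Here (D1, D3) appears in two protein pairs, one of which interacts, so its score is 0.5, while (D1, D4) appears only in a non-interacting pair and scores 0.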

  12. EM method (Deng et al.,2002) • Probability (likelihood L) that experimental data {Oij={0,1}} are observed. • Use EM algorithm in order to (locally) maximize L. • Estimate Pr(Dmn=1)
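
The likelihood formula itself did not survive in the transcript. For independent binary observations Oij it would take the usual Bernoulli product form shown below. The second line, which links Pr(Oij=1) to Pr(Pij=1) through false-positive and false-negative rates (written here as fp and fn), is stated from our recollection of Deng et al. and should be read as an assumption rather than as the content of the slide.

```latex
\begin{align*}
L &= \prod_{i,j} \Pr(O_{ij}=1)^{O_{ij}} \,\bigl(1-\Pr(O_{ij}=1)\bigr)^{1-O_{ij}} \\
\Pr(O_{ij}=1) &= \Pr(P_{ij}=1)\,(1-\mathit{fn}) + \bigl(1-\Pr(P_{ij}=1)\bigr)\,\mathit{fp} \\
\Pr(P_{ij}=1) &= 1-\prod_{(D_m,D_n)\in P_i\times P_j}\bigl(1-\Pr(D_{mn}=1)\bigr)
\end{align*}
```

With fp = fn = 0 the observation probability reduces to Pr(Pij=1), so L depends on the data only through the domain-pair probabilities Pr(Dmn=1).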

  13. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  14. Biological experimental data • Related methods (Association and EM) use only binary data (interact or not). • Experimental data from yeast two-hybrid assays • Ito et al. (2000, 2001) • Uetz et al. (2001) • For many protein pairs, different results (Oij = {0,1}) were observed. • We developed new methods using raw numerical data.

  15. Numerical data • Ito et al. (2000,2001) • For each protein pair, experiments were performed multiple times. • IST (Interaction Sequence Tag) • Number of observed interactions • By using a threshold, we obtain binary data.
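
Turning the numerical data into binary data with a threshold can be sketched as follows. The threshold value, the function name, and the counts are hypothetical; the slide does not state which threshold was used.

```python
def binarize(ist_counts, threshold):
    """Map each protein pair's number of observed interactions to 1 or 0."""
    return {pair: int(hits >= threshold) for pair, hits in ist_counts.items()}


# Hypothetical numbers of observed interactions (IST hits) per protein pair.
ist_counts = {("P1", "P2"): 5, ("P1", "P3"): 1, ("P2", "P3"): 3}
binary = binarize(ist_counts, threshold=3)
```

Binarization discards the difference between a pair observed 3 times and one observed 5 times, which is the information the numerical methods try to use.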

  16. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  17. Proposed methods • It seems difficult to modify the EM method for numerical data. • Linear programming • For binary data • LPBN • Combined methods: LPEM, EMLP • SVM-based method • For numerical data • ASNM • LPNM

  18. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  19. LPBN (LP-based method) (1/2) • Transformation into linear inequalities • Pi and Pj interact

  20. LPBN (LP-based method)(2/2) • Linear programming for inference of protein-protein interactions
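
The inequalities on these two slides were lost in the transcript. The derivation below shows one way the probabilistic model becomes linear: taking logarithms turns the product into a sum. The threshold Θ, the variables x_mn and the slack variables α_ij are notation assumed for this sketch, and the linear program is a plausible soft-constraint formulation, not necessarily the one shown on the slides.

```latex
\begin{align*}
\Pr(P_{ij}=1) \ge \Theta
 &\iff 1-\prod_{(D_m,D_n)\in P_i\times P_j}\bigl(1-\Pr(D_{mn}=1)\bigr) \ge \Theta \\
 &\iff \sum_{(D_m,D_n)\in P_i\times P_j} \ln\bigl(1-\Pr(D_{mn}=1)\bigr) \le \ln(1-\Theta)
\end{align*}
% With x_{mn} = \ln(1-\Pr(D_{mn}=1)) the condition is linear in x_{mn}.
\begin{align*}
\text{minimize}\quad & \sum_{i,j} \alpha_{ij} \\
\text{subject to}\quad
 & \sum_{(D_m,D_n)\in P_i\times P_j} x_{mn} \le \ln(1-\Theta) + \alpha_{ij}
   && \text{if } P_i \text{ and } P_j \text{ interact} \\
 & \sum_{(D_m,D_n)\in P_i\times P_j} x_{mn} \ge \ln(1-\Theta) - \alpha_{ij}
   && \text{if } P_i \text{ and } P_j \text{ do not interact} \\
 & x_{mn} \le 0, \qquad \alpha_{ij} \ge 0
\end{align*}
```

Each α_ij measures how far a training pair is from satisfying its inequality, so minimizing their sum looks for domain-pair probabilities that violate as few observations as possible. The probabilities are recovered as Pr(Dmn=1) = 1 - exp(x_mn).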

  21. Combination of EM and LPBN • LPEM method • Use the results of LPBN as initial parameter values for EM. • EMLP method • Add the following inequalities as constraints to LPBN so that LP solutions stay close to EM solutions.

  22. Simple SVM-based method • Feature vector • Simple linear kernel with • Interacting pairs = Positive examples • Non-interacting pairs = Negative examples
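
The definition of the feature vector is missing from the transcript. A natural choice, assumed here, is a binary indicator vector with one entry per known domain pair, set to 1 when that domain pair is included in the protein pair. The function names and the index below are hypothetical.

```python
from itertools import product


def domain_pairs(domains_i, domains_j):
    """Unordered domain pairs contained in Pi x Pj."""
    return {tuple(sorted(dp)) for dp in product(domains_i, domains_j)}


def feature_vector(domains_i, domains_j, index):
    """Binary vector: entry index[(Dm, Dn)] is 1 if (Dm, Dn) is in Pi x Pj."""
    vec = [0] * len(index)
    for dp in domain_pairs(domains_i, domains_j):
        if dp in index:  # domain pairs unseen in training are ignored
            vec[index[dp]] = 1
    return vec


# Hypothetical index of domain pairs observed in the training data.
index = {("D1", "D3"): 0, ("D2", "D3"): 1, ("D1", "D4"): 2}
x_positive = feature_vector(["D1", "D2"], ["D3"], index)
x_other = feature_vector(["D4"], ["D1", "D5"], index)
```

Vectors built this way can be passed to any linear-kernel SVM implementation, with interacting pairs labeled positive and non-interacting pairs labeled negative.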

  23. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  24. Strength of protein-protein interaction • For each protein pair, experiments were performed multiple times. • The ratio Kij/Mij can be considered as strength. • Kij : Number of observed interactions for a protein pair (Pi,Pj) • Mij : Number of experiments for (Pi,Pj)

  25. LPNM method (1/2) • Minimize the gap between Pr(Pij=1) and Kij/Mij using LP.

  26. LPNM method (2/2) • Linear programming for inference of strengths of protein-protein interactions
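
The linear program on this slide was lost in the transcript. One way to make the gap linear, sketched below, is to compare the two quantities after the same logarithmic substitution used for binary data. The symbols ρ_ij, x_mn and α_ij are notation assumed for this sketch, and measuring the gap in log space is an assumption rather than a confirmed detail of the authors' formulation.

```latex
% Observed strength: \rho_{ij} = K_{ij} / M_{ij}
% Target: 1 - \prod (1 - \Pr(D_{mn}=1)) \approx \rho_{ij},
% i.e. \sum x_{mn} \approx \ln(1-\rho_{ij}) with x_{mn} = \ln(1-\Pr(D_{mn}=1)).
\begin{align*}
\text{minimize}\quad & \sum_{i,j} \alpha_{ij} \\
\text{subject to}\quad
 & \sum_{(D_m,D_n)\in P_i\times P_j} x_{mn} - \ln(1-\rho_{ij}) \le \alpha_{ij} \\
 & \ln(1-\rho_{ij}) - \sum_{(D_m,D_n)\in P_i\times P_j} x_{mn} \le \alpha_{ij} \\
 & x_{mn} \le 0, \qquad \alpha_{ij} \ge 0
\end{align*}
```

The two inequalities together encode an absolute value, so each α_ij bounds the gap for one protein pair. Pairs with ρ_ij = 1 would need special handling because ln 0 is undefined.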

  27. ASNM • Modified Association method for numerical data • For binary data (Sprinzak et al., 2001)
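
The ASNM formula is missing from the transcript. A natural generalization of the Association ratio, assumed here, replaces the count of interacting protein pairs by the sum of their strengths Kij/Mij, so each domain pair receives the average strength of the protein pairs that include it. The function name and the data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def asnm_scores(protein_domains, observations):
    """Average strength Kij/Mij over the protein pairs including each domain pair.

    observations maps (Pi, Pj) to (Kij, Mij) with Mij > 0.
    """
    total = defaultdict(float)  # sum of strengths per domain pair
    count = defaultdict(int)    # number of protein pairs per domain pair
    for (pi, pj), (k_ij, m_ij) in observations.items():
        rho = k_ij / m_ij
        dpairs = {tuple(sorted(dp))
                  for dp in product(protein_domains[pi], protein_domains[pj])}
        for dp in dpairs:
            total[dp] += rho
            count[dp] += 1
    return {dp: total[dp] / count[dp] for dp in count}


# Hypothetical data: (number of observed interactions, number of experiments).
protein_domains = {"P1": ["D1"], "P2": ["D2"], "P3": ["D2"]}
observations = {("P1", "P2"): (3, 4), ("P1", "P3"): (1, 4)}
asnm = asnm_scores(protein_domains, observations)
```

When every strength is 0 or 1, this average equals the frequency ratio of the binary Association method.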

  28. Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

  29. Computational experiments for binary data • DIP database (Xenarios et al., 2002) • 1767 protein pairs as positive • 2/3 of the pairs for training, 1/3 for test • Computational environment • Xeon processor 2.8 GHz • LP solver: loqo

  30. Results on training data (binary data) [Figure comparing EM, Association, LPBN, SVM]

  31. Results on test data (binary data) [Figure comparing EM, EMLP, LPEM, SVM, Association]

  32. Computational experiments for numerical data • YIP database (Ito et al., 2000, 2001) • IST (Interaction Sequence Tag) • 1586 protein pairs • 4/5 for training, 1/5 for test • Computational environment • Xeon processor 2.8 GHz • LP solver: lp_solve

  33. Results on test data (numerical data) [Figure comparing ASNM, LPNM, EM, Association]

  34. Results on test data (numerical data) • LPNM is the best. • EM and Association methods classify Pr(Pij=1) into either 0 or 1.

  35. Conclusion • We have defined a new problem to infer strengths of protein-protein interactions. • We have proposed LP-based methods. • For binary data • LPBN, LPEM, EMLP • SVM-based method • For numerical data • ASNM • LPNM • LPNM outperformed the other methods.

  36. Future work • Improve the methods to avoid overfitting. • Improve the probabilistic model to understand protein-protein interactions more accurately.
