
Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature

This presentation explores reranking approaches for improving the extraction of protein-protein interactions from biomedical literature. The authors rerank the candidate parses produced by a Hidden Vector State (HVS) model and report experiments showing a 4% relative improvement in F-measure.



Presentation Transcript


  1. Effective Reranking for Extracting Protein-protein Interactions from Biomedical Literature. Deyu Zhou, Yulan He and Chee Keong Kwoh, School of Computer Engineering, Nanyang Technological University, Singapore. 30 August 2007

  2. Outline • Protein-protein interactions (PPIs) extraction • Hidden Vector State (HVS) model for PPIs extraction • Reranking approaches • Experimental results • Conclusions

  3. Protein-Protein Interactions Extraction. Example sentence: “Spc97p interacts with Spc98 and Tub4 in the two-hybrid system”. Extracted interactions (Protein, Interact, Protein): • Spc97p interact Spc98 • Spc97p interact Tub4
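
The extraction target on this slide can be read as a list of (protein, relation, protein) triples. A minimal Python sketch of that output format follows; the `Interaction` type and the `pair_up` helper are hypothetical names introduced here for illustration and are not part of the authors' system.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    """One extracted interaction (hypothetical container type)."""
    agent: str
    relation: str
    target: str


def pair_up(agent: str, relation: str, targets: List[str]) -> List[Interaction]:
    """Expand one agent and several coordinated targets into separate triples."""
    return [Interaction(agent, relation, t) for t in targets]


# The slide's sentence mentions Spc97p interacting with two partners.
interactions = pair_up("Spc97p", "interact", ["Spc98", "Tub4"])
for item in interactions:
    print(item.agent, item.relation, item.target)
```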

  4. Existing Approaches (arranged from simple to complicated) • Statistics Methods • Pattern Matching • Parsing-Based

  5. An example

  6. Statistics-Based Approaches

  7. Pattern Matching Approaches

  8. Parsing-Based Approaches

  9. Semantic Parser. For each candidate word string Wn, we need to compute the most likely set of embedded concepts: $\hat{C} = \arg\max_{C} P(C \mid W_n) = \arg\max_{C} P(C)\,P(W_n \mid C)$, where P(C) is the semantic model and P(Wn|C) is the lexical model
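
The second equality on this slide follows from Bayes' rule, because P(Wn) does not depend on C and can be dropped from the argmax. The sketch below illustrates the decision rule over a small candidate set. The candidate labels and all probability values are made-up toy numbers, and `best_concepts` is a hypothetical helper, not the authors' decoder.

```python
import math
from typing import Dict, Iterable


def best_concepts(candidates: Iterable[str],
                  log_prior: Dict[str, float],
                  log_likelihood: Dict[str, float]) -> str:
    """Return the candidate C that maximises log P(C) + log P(W|C)."""
    return max(candidates, key=lambda c: log_prior[c] + log_likelihood[c])


# Toy numbers (assumptions): semantic model P(C) and lexical model P(W|C).
log_prior = {
    "PROTEIN(INTERACT(PROTEIN))": math.log(0.2),
    "PROTEIN(DUMMY)": math.log(0.8),
}
log_likelihood = {
    "PROTEIN(INTERACT(PROTEIN))": math.log(0.5),
    "PROTEIN(DUMMY)": math.log(0.05),
}

best = best_concepts(log_prior.keys(), log_prior, log_likelihood)
print(best)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.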

  10. P(C) P(Wn|C): We could use a simple finite state tagger … … it can be robustly trained using EM, but the model is too weak to represent embeddings in natural language

  11. P(C) P(Wn|C): Perhaps use some form of hierarchical HMM in which each state is a terminal or a nested HMM … … but when using EM, models rarely converge on good solutions and, in practice, direct maximum-likelihood estimation from “tree-bank” data is needed to train the models

  12. Hidden Vector State Model

  13. P(C) P(Wn|C): The HVS model is an HMM in which the states correspond to the stack of a push-down automaton with a bounded stack size … … this is a very convenient framework for applying constraints

  14. HVS model transition constraints: • finite stack depth D • push only one non-terminal semantic concept onto the stack at each step. $\hat{C} = \arg\max_{C,N} \prod_{t} P(n_t \mid C_{t-1})\, P(C_t[1] \mid C_t[2..D_t])\, P(W_t \mid C_t)$ … model defined by three simple probability tables
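
To make the three-table factorisation on this slide concrete, the sketch below scores one fixed parse as a sum of log probabilities. It is only an illustration under stated assumptions: the table layout (dictionaries keyed by tuples), the function name `hvs_log_prob`, and every probability value are hypothetical, and stacks are written as tuples with the top of the stack first.

```python
import math
from typing import Dict, List, Tuple

Stack = Tuple[str, ...]  # top of stack first, e.g. ("INTERACT", "PROTEIN", "SS")


def hvs_log_prob(words: List[str],
                 stacks: List[Stack],
                 pop_counts: List[int],
                 p_pop: Dict[Tuple[int, Stack], float],
                 p_push: Dict[Tuple[str, Stack], float],
                 p_word: Dict[Tuple[str, Stack], float]) -> float:
    """Log of prod_t P(n_t|C_{t-1}) * P(C_t[1]|C_t[2..D_t]) * P(W_t|C_t).

    stacks[0] is the initial state; stacks[t] is the state that emits words[t-1].
    """
    total = 0.0
    for t, word in enumerate(words, start=1):
        prev, cur = stacks[t - 1], stacks[t]
        total += math.log(p_pop[(pop_counts[t - 1], prev)])  # stack shift
        total += math.log(p_push[(cur[0], cur[1:])])          # pushed concept
        total += math.log(p_word[(word, cur)])                # emitted word
    return total


# Toy tables (all values are assumptions).
words = ["Spc97p", "interacts"]
stacks = [("SS",), ("PROTEIN", "SS"), ("INTERACT", "PROTEIN", "SS")]
pop_counts = [0, 0]
p_pop = {(0, ("SS",)): 0.5, (0, ("PROTEIN", "SS")): 0.5}
p_push = {("PROTEIN", ("SS",)): 0.4, ("INTERACT", ("PROTEIN", "SS")): 0.5}
p_word = {("Spc97p", ("PROTEIN", "SS")): 0.1,
          ("interacts", ("INTERACT", "PROTEIN", "SS")): 0.2}

log_p = hvs_log_prob(words, stacks, pop_counts, p_pop, p_push, p_word)
print(log_p)
```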

  15. Parsing with the HVS model: 1) Pop n elements from the previous stack state (here n = 1): P(nt|Ct-1) 2) Push 1 pre-terminal semantic concept onto the stack: P(Ct[1]|Ct[2..Dt]) 3) Generate the next word: P(Wt|Ct). Example vector states (top of stack first) for “… with Spc98 and Tub4 …”: [INTERACT PROTEIN SS], [DUMMY INTERACT PROTEIN SS], [PROTEIN INTERACT PROTEIN SS]
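
The pop-then-push transition on this slide can be sketched as a small function over stack tuples. This is a minimal illustration, not the authors' implementation: `next_stack` and the `max_depth` value are hypothetical, and stacks are written with the top of the stack first.

```python
from typing import Optional, Tuple

Stack = Tuple[str, ...]  # top of stack first


def next_stack(prev: Stack, n_pop: int, concept: str, max_depth: int) -> Optional[Stack]:
    """Pop n_pop elements from prev, then push exactly one concept.

    Returns None when the move is not allowed: popping more elements than
    the stack holds, or exceeding the bounded stack depth.
    """
    if n_pop < 0 or n_pop > len(prev):
        return None
    new = (concept,) + prev[n_pop:]
    if len(new) > max_depth:
        return None
    return new


# Pop 1 element (DUMMY) and push PROTEIN, as in the slide's example with n = 1.
prev = ("DUMMY", "INTERACT", "PROTEIN", "SS")
cur = next_stack(prev, n_pop=1, concept="PROTEIN", max_depth=4)
print(cur)

# Pushing without popping would exceed the assumed depth bound of 4.
too_deep = next_stack(prev, n_pop=0, concept="PROTEIN", max_depth=4)
print(too_deep)
```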

  16. Train using EM and apply constraints. Diagram components: Data, Constraints, EM Parameter Estimation, Parse Statistics, HVS Model Parameters. Training text: “CUL-1 was found to interact with SKR-1, SKR-2, SKR-3, and SKR-7 in yeast two-hybrid system”. Abstract semantic annotation: PROTEIN ( INTERACT ( PROTEIN ) ). Limit forward-backward search to only include states which are consistent with the constraints
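
One way to picture the constraint step is to expand the abstract annotation into the stack states it permits and then filter candidate states against that list. The sketch below does this under simplifying assumptions that are not taken from the slides: the root symbol is SS, a DUMMY concept may sit on top of any permitted stack, and the helper names `annotation_to_stacks` and `is_consistent` are hypothetical.

```python
import re
from typing import List, Tuple

Stack = Tuple[str, ...]  # top of stack first


def annotation_to_stacks(annotation: str, root: str = "SS") -> List[Stack]:
    """Expand e.g. 'PROTEIN ( INTERACT ( PROTEIN ) )' into permitted stacks."""
    tokens = re.findall(r"[()]|[^\s()]+", annotation)
    path = [root]            # enclosing concepts, bottom of stack first
    current = None           # most recently read concept
    stacks: List[Stack] = []
    for tok in tokens:
        if tok == "(":
            path.append(current)       # descend below the last concept
        elif tok == ")":
            path.pop()                 # leave the current nesting level
        else:
            current = tok
            stacks.append((tok,) + tuple(reversed(path)))
    return stacks


def is_consistent(stack: Stack, allowed: List[Stack]) -> bool:
    """Accept a stack if it is permitted, optionally with DUMMY on top (assumption)."""
    core = stack[1:] if stack and stack[0] == "DUMMY" else stack
    return core in allowed


allowed = annotation_to_stacks("PROTEIN ( INTERACT ( PROTEIN ) )")
for s in allowed:
    print(s)
```

During training, a filter of this kind would be applied inside the forward-backward search so that states failing the check receive no probability mass.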

  17. Reranking Methodology • Reranking approaches attempt to improve upon an existing probabilistic parser by reranking the parser's output. • Reranking has benefited applications such as named-entity extraction, semantic parsing and semantic labeling. • Here, reranking is applied to the parses generated by the HVS model for protein-protein interaction extraction

  18. Architecture

  19. Reranking approaches • Features for Reranking: suppose sentence Si has the corresponding parse set Ci = {Cij, j = 1, …, N} • Parsing Information • Structure Information • Complexity Information

  20. Reranking approaches Score is defined as • log-linear regression model • Neural Network • Support Vector Machines
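
The score formula itself is not preserved in this transcript. As a rough illustration of the log-linear option only, the sketch below scores each candidate parse as a weighted sum of its features and sorts the candidates by that score. The three feature slots (parse log-probability, a structure feature, a complexity feature), the weights and all numeric values are assumptions made for this example, not the authors' features or trained parameters.

```python
from typing import List, Sequence, Tuple

Parse = Tuple[str, Sequence[float]]  # (parse id, feature vector)


def score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear score: sum over k of w_k * f_k."""
    return sum(w * f for w, f in zip(weights, features))


def rerank(parses: List[Parse], weights: Sequence[float]) -> List[Parse]:
    """Return the candidate parses sorted from highest to lowest score."""
    return sorted(parses, key=lambda p: score(p[1], weights), reverse=True)


# Hypothetical weights for [parse log-prob, structure feature, complexity feature].
weights = [1.0, 0.5, -0.2]
candidates = [
    ("parse_1", [-12.0, 1.0, 6.0]),   # best according to the base parser
    ("parse_2", [-12.5, 3.0, 4.0]),
]

ranked = rerank(candidates, weights)
print([parse_id for parse_id, _ in ranked])
```

In this toy case the second candidate overtakes the parser's first choice because its structure and complexity features outweigh its slightly lower parse probability, which is the effect reranking is meant to exploit.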

  21. Experiments • Setup • Corpus I • comprises 300 abstracts randomly retrieved from the GENIA corpus • GENIA is a collection of research abstracts selected from MEDLINE database search results for the keywords (MeSH terms) “human, blood cells and transcription factors” • split into two parts: • Part I contains 1500 sentences (training data) • Part II consists of 1000 sentences (test data)

  22. Experimental Results Figure 1: F-measure vs number of candidate parses.

  23. Experimental Results (cont’d) Table 3: Results based on the interaction category.

  24. Conclusions • Three reranking methods were applied to the HVS model for extracting protein-protein interactions from biomedical literature. • Experimental results show that a 4% relative improvement in F-measure can be obtained by reranking the semantic parse results. • Incorporating other semantic or syntactic information might give further gains.
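
For readers unfamiliar with the metric, the sketch below shows how a balanced F-measure and a relative improvement are computed. It assumes the standard F1 definition (harmonic mean of precision and recall); the precision, recall and baseline numbers are invented for illustration and are not the paper's results.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, improved: float) -> float:
    """Gain expressed as a fraction of the baseline score."""
    return (improved - baseline) / baseline


# Hypothetical scores: a baseline F-measure of 0.50 rising to 0.52
# corresponds to a 4% relative improvement (0.02 / 0.50).
baseline_f = f_measure(precision=0.50, recall=0.50)
gain = relative_improvement(baseline_f, 0.52)
print(round(gain * 100, 1), "% relative improvement")
```

Note that a relative improvement of 4% is a smaller change than 4 absolute F-measure points whenever the baseline is below 100%.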
