Automated Gene Summary: Let the Computer Summarize the Knowledge

Xu Ling, Department of Computer Science, University of Illinois at Urbana-Champaign.

Presentation Transcript


  1. Automated Gene Summary: Let the Computer Summarize the Knowledge • Xu Ling, Department of Computer Science, University of Illinois at Urbana-Champaign

  2. The Reality of Scientific Literature • Manual curation is hard to keep up!

  3. Automated Gene Summarization • Gene summary aspects: Gene product, Expression, Sequence, Interactions, Mutations, General functions

  4. Goal • To retrieve and summarize all the knowledge about a particular gene from the literature • Compressing knowledge: enables biologists to quickly understand the target gene. • Automated curation: explicitly covers multiple aspects of a gene, such as the sequence information, mutant phenotypes etc.

  5. Our Solution • Semi-structured summary on multiple aspects: Gene products, Expression pattern, Sequence information, Phenotypic information, Genetic/physical interactions, … • 2-stage summarization: (1) retrieve relevant articles by gene name search; (2) extract the most informative and relevant sentences for each aspect.

  6. Text Summary of Gene Abl

  7. System Overview: 2-stage • Stage 1: Gene name recognition • Stage 2: Sentence categorization

  8. Gene Name Recognition • v1: Dictionary-based string match • High recall, low precision • v2: Machine learning methods for gene name recognition • High precision, low recall • v3: v2 + dictionary-based synonym expansion • Improved both recall and precision
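The dictionary side of gene name recognition (the string match of v1 and the synonym expansion added in v3) can be sketched roughly as below. This is only an illustration, not the system's actual recognizer: the `GENE_SYNONYMS` table, its entries, and the helper names `build_matchers` and `find_genes` are hypothetical, and the machine-learning recognizer of v2 is not shown.

```python
import re

# Hypothetical synonym dictionary: canonical gene symbol -> known names.
# The entries are illustrative only; they are not taken from the slides.
GENE_SYNONYMS = {
    "Abl": ["Abl", "Abelson", "Abl tyrosine kinase"],
    "dpp": ["dpp", "decapentaplegic"],
}

def build_matchers(synonyms):
    """Compile one case-sensitive, whole-word pattern per canonical gene."""
    matchers = {}
    for gene, names in synonyms.items():
        # Longest names first so multi-word synonyms win over their prefixes.
        alternatives = sorted((re.escape(n) for n in names), key=len, reverse=True)
        matchers[gene] = re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)")
    return matchers

def find_genes(sentence, matchers):
    """Return the sorted canonical symbols of all genes mentioned in the sentence."""
    return sorted(gene for gene, pattern in matchers.items() if pattern.search(sentence))

MATCHERS = build_matchers(GENE_SYNONYMS)
mentions = find_genes("The Abelson kinase interacts with decapentaplegic signaling.", MATCHERS)
```

Matching on whole words and keeping case sensitivity is one simple way to trade a little recall for precision; a pure substring match would, for example, find "dpp" inside unrelated tokens.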

  9. Categorization of Retrieved Sentences • Collect “example sentences” from FlyBase • v1: apply a vector space model to construct an aspect “profile” • v2: apply probabilistic models to factor out context-specific language • v3: v2 + biologist-labeled training examples. Real sentences! • Many thanks to Susan Brown’s “Beetle group” for their help!

  10. Example 1

  11. Example 2

  12. Gene Summary in BeeSpace v4 • To add

  13. General Entity Summarization • General and applicable to summarize other entities: pathways, protein family, … • General settings: • Space: A set of documents to be summarized. • Aspects: A set of aspects to define the structure of the summary. • Examples: Training sentences for each aspect.

  14. Further Generalization … • Limitations of the categorization problem with training examples • Predefined aspects may not fit the needs of a particular user • Only works for a predefined domain and set of topics • Training examples for each aspect are often unavailable • More Realistic New Setup • Allow a user to flexibly describe each facet with 1-2 keywords: let the user determine what they want • Generate the summary in a semi-supervised way: no training examples needed
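One way to picture the keyword-driven setup is a single bootstrapping round: start from the user's 1-2 seed keywords per facet, add the terms that co-occur with them most often, and collect the sentences that match the expanded term set. The sketch below is an assumption made for illustration; the slides do not say which semi-supervised method is used, and the example sentences, facet names, stopword list, and helper names are all invented here.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "a", "is", "are", "and", "of", "on", "but", "it", "has"}

def tokens(sentence):
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return [w.strip(".,!?").lower() for w in sentence.split()]

def match(sentences, terms):
    """Sentences containing at least one of the given terms."""
    return [s for s in sentences if terms & set(tokens(s))]

def expand(sentences, seeds, top_k=2):
    """One bootstrapping round: add the top_k terms co-occurring with the seeds."""
    counts = Counter(
        t
        for s in match(sentences, seeds)
        for t in dict.fromkeys(tokens(s))  # count each term once per sentence
        if t not in STOPWORDS and t not in seeds
    )
    return seeds | {t for t, _ in counts.most_common(top_k)}

def summarize(sentences, facets, top_k=2):
    """Map each facet name to the sentences matching its expanded keyword set."""
    return {
        name: match(sentences, expand(sentences, set(keywords), top_k))
        for name, keywords in facets.items()
    }

# Invented review sentences and user-supplied facet keywords.
SENTENCES = [
    "The engine is smooth and fuel economy is great.",
    "Fuel economy on the highway is excellent.",
    "The interior feels roomy and the seats are comfortable.",
    "Comfortable seats but the interior has road noise.",
    "Seats are comfortable on long trips.",
]
FACETS = {"performance": ["engine", "economy"], "comfort": ["interior"]}
summary = summarize(SENTENCES, FACETS)
```

In this toy run the last sentence never mentions the seed keyword "interior", yet it lands in the "comfort" facet because "seats" and "comfortable" were picked up during expansion; no labeled training sentences are involved.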

  15. Example (1): Consumer vs. Editor • Honda Accord 2006

  16. Example (2): Different Aspects • What if the users want an overview with different facets?

  17. Conclusion • The generated summaries are • directly useful to biologists, • and also serve as entry points that let them quickly navigate the relevant literature, • via the BeeSpace analysis environment available at www.beespace.uiuc.edu

  18. Start from Here … • The reverse of automated entity summarization: automated entity retrieval • Profiling of entities using entity summaries, e.g., what genes are associated with …? • Build a powerful knowledge base … • Enriched entities under a certain context, e.g., what are the significantly enriched genes in …? • Entities involved in certain biomedical relations, e.g., what genes interact with gene X? • BeeSpace v5!

  19. Acknowledgement • Bruce Schatz, Gene Robinson, Chengxiang Zhai, Xin He, Jing Jiang, Qiaozhu Mei, Moushumi Sarma

  20. Vector Space Model (VSM) • Construct a corresponding term vector $V_c$ using the training sentences for the aspect • The weight of term $t_i$ in the aspect term vector for aspect $j$: $w_{ij} = TF_{ij} \cdot IDF_i$, where $TF_{ij}$ is the term frequency and $IDF_i = 1 + \log(N/n_i)$ is the inverse document frequency ($N$ = total number of documents, $n_i$ = number of documents containing term $t_i$). • Construct a sentence term vector $V_s$ for each sentence, with the same IDF and TF = number of times a term occurs in the sentence • Aspect relevance score $S = \cos(V_c, V_s)$.
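The scoring on this slide can be sketched in a few lines of Python. The weighting follows the slide (TF times IDF, with IDF = 1 + log(N/n), and cosine similarity as the relevance score); everything else is an assumption made for illustration: the whitespace tokenizer, treating each training sentence as one "document" when computing IDF, the two toy aspects, and all function names.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real preprocessing."""
    return text.lower().split()

def idf_table(documents):
    """IDF_i = 1 + log(N / n_i): N documents, n_i of them containing term i."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: 1.0 + math.log(n_docs / n) for term, n in doc_freq.items()}

def tfidf_vector(text, idf):
    """Weight each term by TF * IDF; terms missing from the IDF table are dropped."""
    tf = Counter(tokenize(text))
    return {t: count * idf[t] for t, count in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def aspect_scores(sentence, aspect_examples, idf):
    """Score S = cos(V_c, V_s) of one sentence against every aspect profile."""
    v_s = tfidf_vector(sentence, idf)
    return {
        aspect: cosine(tfidf_vector(" ".join(examples), idf), v_s)
        for aspect, examples in aspect_examples.items()
    }

# Invented training sentences for two aspects (illustrative only).
ASPECT_EXAMPLES = {
    "expression": [
        "the gene is expressed in the embryo",
        "transcripts are expressed in larval tissue",
    ],
    "interactions": [
        "the protein interacts with actin",
        "the kinase binds and interacts with its substrate",
    ],
}
ALL_SENTENCES = [s for group in ASPECT_EXAMPLES.values() for s in group]
IDF = idf_table(ALL_SENTENCES)
scores = aspect_scores("Abl is expressed in the embryo", ASPECT_EXAMPLES, IDF)
best_aspect = max(scores, key=scores.get)
```

Concatenating an aspect's training sentences before counting terms gives the aspect vector its term frequencies; a sentence is then assigned to, or ranked within, the aspect whose profile it is closest to.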
