
An Incremental Approach to MEDLINE MeSH Indexing

An Incremental Approach to MEDLINE MeSH Indexing. Presenter: Hongfang Liu. Team Members: Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu; University of Delaware: Dongqing Zhu, Ben Carterette. BioASQ 2013.



Presentation Transcript


  1. An Incremental Approach to MEDLINE MeSH Indexing • Presenter: Hongfang Liu • Team Members: Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu; University of Delaware: Dongqing Zhu, Ben Carterette • BioASQ 2013

  2. Outline • Motivation & Task • Incremental Systems • MetaMap-based • Search-based • LLDA-based • Experiment Setup • Evaluation • Conclusion

  3. Motivation of BioASQ Task • Reduce human effort in MeSH indexing • Increasing number of new articles • Low consistency among annotators [Funk and Reid] • Automatic MeSH indexing • Suggest MeSH terms for a given new article

  4. Motivation of Mayo’s Participation • Information retrieval (IR)-based ontology annotation • Traditional approach has been information extraction-based • Three levels of intelligence in artificial intelligence • Knowledge-base intelligence • Data intelligence • User intelligence > Explore the use of topic modeling and distant supervision for ontology annotation

  5. Proposed Approaches • MetaMap-based • Search-based • LLDA-based • The three approaches can work either independently or together in an incremental way [Diagram: each approach outputs DUIs]

  6. MetaMap-based System • MetaMap restricted to the MeSH ontology • Example input: Title: Age-period-cohort effect on mortality from cervical cancer. Abstract: to estimate the effect of age, period and birth cohort … • Parameters: Title_score, score threshold, top DUI • A ranked list of CUIs => a ranked list of DUIs

  7. MetaMap-based System • Parameter Tuning • Title weight: title concepts are more important • Score threshold: a low threshold roughly leads to high precision/recall • Top DUI: tradeoff between P/R
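
The three tuned parameters (title weight, score threshold, top DUI) can be pictured with a minimal sketch. The slides do not give the exact scoring formula, so the additive combination below, the function name `rank_duis`, and the placeholder identifiers are assumptions for illustration only; the concepts are assumed to be already mapped from CUI to DUI.

```python
from collections import defaultdict

def rank_duis(title_concepts, abstract_concepts,
              title_weight=2.0, score_threshold=0.0, top_k=10):
    """Rank DUIs from (dui, score) pairs found in the title and abstract.

    Assumption: concept scores are summed per DUI, with title scores
    multiplied by title_weight. The slides do not confirm this formula.
    """
    totals = defaultdict(float)
    for dui, score in title_concepts:
        if score >= score_threshold:          # drop low-scoring concepts
            totals[dui] += title_weight * score
    for dui, score in abstract_concepts:
        if score >= score_threshold:
            totals[dui] += score
    # Highest total first; ties broken by DUI for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dui for dui, _ in ranked[:top_k]]   # keep only the top DUIs

# Placeholder identifiers and scores, not taken from the slides.
title = [("DUI_A", 900)]
abstract = [("DUI_A", 800), ("DUI_B", 700), ("DUI_C", 400)]
top = rank_duis(title, abstract, title_weight=2.0, score_threshold=500, top_k=2)
```

Raising `score_threshold` or lowering `top_k` shortens the output list, which is one way to read the precision/recall tradeoff noted on the slide.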

  8. Search-based System • Retrieval Model • defined over a query term, query weight, matching function, document, and Dirichlet parameter • DUI Aggregation • DUIs of the retrieved docs (D01, D02, D03 …; D08, D03, D01 …; D02, D03, D01 …) • ranked by tf * score(Q, D)
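
The formula on this slide did not survive transcription. As a rough illustration of a retrieval model with the listed ingredients (query terms, query weights, a document, and a Dirichlet parameter), here is a sketch of weighted query likelihood with Dirichlet smoothing. It is a standard textbook form and is not confirmed to be the exact model used by the team; `dirichlet_score`, the collection statistics, and the default `mu` are assumptions.

```python
import math
from collections import Counter

def dirichlet_score(weighted_query, doc_tokens, collection_tf,
                    collection_len, mu=2000.0):
    """Weighted query log-likelihood with Dirichlet smoothing.

    weighted_query: list of (term, weight) pairs.
    collection_tf:  term -> frequency in the whole collection (hypothetical).
    mu:             Dirichlet parameter; smaller mu means less smoothing.
    """
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term, weight in weighted_query:
        p_coll = collection_tf.get(term, 0) / collection_len
        p = (tf[term] + mu * p_coll) / (doc_len + mu)
        if p > 0:                 # terms unseen everywhere are skipped
            score += weight * math.log(p)
    return score

# Toy collection statistics, for illustration only.
collection_tf = {"actin": 5, "cow": 5, "cell": 10}
query = [("actin", 2.0)]
s_match = dirichlet_score(query, ["actin", "actin", "cell"], collection_tf, 100)
s_miss = dirichlet_score(query, ["cow", "cell", "cell"], collection_tf, 100)
```

A document that contains the query term receives a higher (less negative) score than one that does not.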

  9. Search-based System • Term Query (TQ) • is a single-word expression • concept-related words in title and abstract • Phrase Query (PQ) • is a multi-word expression • concept-related phrases in title and abstract • Long Query • mix of TQ and PQ • Example queries: #weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess) #weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)
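
The query strings on this slide follow a visible pattern: single words appear bare, and multi-word phrases are wrapped in an unordered-window operator whose size equals the number of words. A small sketch that assembles such a string is given below. The helper name `build_weighted_query` is hypothetical, and how the weights themselves are computed is not stated in the slides, so they are passed in as given.

```python
def build_weighted_query(weighted_expressions):
    """Assemble a #weight(...) query string from (weight, expression) pairs.

    Single-word expressions are emitted as plain terms; multi-word
    expressions are wrapped in #uwN(...), with N set to the word count,
    mirroring the examples on the slide.
    """
    parts = []
    for weight, expression in weighted_expressions:
        words = expression.split()
        if len(words) == 1:
            parts.append(f"{weight} {words[0]}")
        else:
            parts.append(f"{weight} #uw{len(words)}({' '.join(words)})")
    return "#weight(" + " ".join(parts) + ")"

# A mix of term and phrase expressions, using fragments from the slide.
long_query = build_weighted_query([
    (3.5, "hiv-1 infection"),
    (2.0, "actin"),
    (4.5, "naive t cells"),
])
```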

  10. Search-based System • Parameter Tuning • Dirichlet smoothing parameter: less smoothing => better performance • Top-ranked documents: a small set of highly relevant documents • Top-ranked DUI: tradeoff between P/R
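
To make the roles of "top-ranked documents" and "top-ranked DUI" concrete, here is a minimal sketch of aggregating the labels of retrieved documents into a ranked DUI list. It assumes each DUI accumulates the retrieval scores of the top documents that carry it, which is one reading of "tf * score(Q, D)" and is not confirmed by the slides. It also assumes positive scores; the function name and toy data are hypothetical.

```python
from collections import defaultdict

def aggregate_duis(ranked_docs, doc_labels, top_docs=10, top_duis=5):
    """Turn a ranked document list into a ranked DUI list.

    ranked_docs: list of (doc_id, score), best first, scores assumed positive.
    doc_labels:  doc_id -> list of DUIs assigned to that training document.
    """
    totals = defaultdict(float)
    for doc_id, score in ranked_docs[:top_docs]:      # small, highly relevant set
        for dui in doc_labels.get(doc_id, ()):
            totals[dui] += score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dui for dui, _ in ranked[:top_duis]]       # cut-off trades P for R

# Toy retrieval result and labels, for illustration only.
ranked_docs = [("doc1", 0.5), ("doc2", 0.3), ("doc3", 0.2)]
doc_labels = {"doc1": ["D01", "D03"], "doc2": ["D03", "D08"], "doc3": ["D08"]}
suggested = aggregate_duis(ranked_docs, doc_labels, top_docs=2, top_duis=2)
```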

  11. Systems • LLDA-based • LDA Process • Each document is a mixture of topics • Each topic is a multinomial word distribution • Labeled LDA • Incorporate label information

  12. Systems • LLDA-based • Top categories in MeSH root Top-level categories as topics (e.g., Anatomy Category, Chemicals and Drugs Category, etc.) … Each label below is converted to corresponding top-level labels
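
The label conversion described here can be sketched using MeSH tree numbers, whose first letter identifies the top-level category. The mapping below lists only four categories for brevity, and the tree numbers in the example are illustrative; the function name is hypothetical and the slides do not say how the conversion was implemented.

```python
# Partial mapping from the first letter of a MeSH tree number to its
# top-level category (only four categories shown for brevity).
TOP_LEVEL = {
    "A": "Anatomy",
    "B": "Organisms",
    "C": "Diseases",
    "D": "Chemicals and Drugs",
}

def top_level_categories(tree_numbers):
    """Convert a document's tree numbers to distinct top-level categories."""
    categories = []
    for tree_number in tree_numbers:
        category = TOP_LEVEL.get(tree_number[:1])
        if category is not None and category not in categories:
            categories.append(category)     # keep first-seen order, no repeats
    return categories

# Illustrative tree numbers: two under Diseases, one under Anatomy.
topics = top_level_categories(["C04.588", "A05.360", "C13.351"])
```

Using the top-level categories as topics keeps the label space small enough for a Labeled LDA model, at the cost of discarding the finer levels of the hierarchy.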

  13. Systems • LLDA-based • DUI candidate list pruning [Diagram: doc → search-based system → DUI candidates; doc → LLDA-based system → categories; DUI candidates + categories → a pruned ranked list of DUIs]
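
One plausible reading of the pruning step is sketched below: a candidate DUI from the search-based system is kept only if at least one of its top-level categories is among those predicted by the LLDA-based system. The filtering rule, the function name, and the toy data are assumptions, since the slide shows only a diagram.

```python
def prune_candidates(ranked_duis, dui_categories, predicted_categories):
    """Filter a ranked DUI list by LLDA-predicted top-level categories.

    ranked_duis:          DUIs from the search-based system, best first.
    dui_categories:       DUI -> list of its top-level categories.
    predicted_categories: categories predicted for the document.
    """
    allowed = set(predicted_categories)
    return [
        dui for dui in ranked_duis
        if allowed & set(dui_categories.get(dui, ()))   # keep on any overlap
    ]

# Placeholder identifiers, for illustration only.
candidates = ["DUI_A", "DUI_B", "DUI_C"]
categories = {
    "DUI_A": ["Diseases"],
    "DUI_B": ["Chemicals and Drugs"],
    "DUI_C": ["Anatomy", "Diseases"],
}
pruned = prune_candidates(candidates, categories, ["Diseases"])
```

The relative order of the surviving candidates is unchanged, so the output remains a ranked list.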

  14. Data Training -- <PMID, title, abstract, labels> Testing -- input:<PMID, title, abstract> output: <PMID, labels>

  15. Evaluation • MM: MetaMap-based system • Mi: micro • LCA: lowest common ancestor
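
For readers unfamiliar with the "micro" measures, here is a minimal sketch of micro-averaged precision, recall, and F-measure over a set of documents. Counts are pooled across all documents before the ratios are taken. The function name and toy labels are hypothetical, and this is not the official evaluation code.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over all documents.

    gold, predicted: PMID -> collection of labels.
    """
    tp = fp = fn = 0
    for pmid, gold_labels in gold.items():
        g = set(gold_labels)
        p = set(predicted.get(pmid, ()))
        tp += len(g & p)       # correctly suggested labels
        fp += len(p - g)       # suggested but not in gold
        fn += len(g - p)       # in gold but not suggested
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels, for illustration only.
gold = {"1": {"a", "b"}, "2": {"c"}}
pred = {"1": {"a", "x"}, "2": {"c"}}
mi_p, mi_r, mi_f = micro_prf(gold, pred)
```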

  16. Conclusion and Future Work • Three Systems • MetaMap-based, search-based, LLDA-based • Research findings • Explored the impact of various parameters on performance • Promising results from search-based labeling • Future Direction • Better concept weighting strategies • E.g., corpus-level statistics, external resources • Comprehensive comparisons with existing methods • A better strategy for incorporating hierarchical information into LLDA

  17. Questions & Discussion
