An Incremental Approach to MEDLINE MeSH Indexing
Presenter: Hongfang Liu
Team Members:
• Mayo Clinic: Wu Stephen, James Masanz, and Hongfang Liu
• University of Delaware: Dongqing Zhu, Ben Carterette
BioASQ 2013

Outline
• Motivation & Task
• Incremental Systems
  – MetaMap-based
  – Search-based
  – LLDA-based
• Experiment Setup
• Evaluation
• Conclusion

Motivation of the BioASQ Task
• Reduce human effort in MeSH indexing
  – Increasing number of new articles
  – Low consistency among annotators [Funk and Reid]
• Automatic MeSH indexing
  – Suggest MeSH terms for a given new article

Motivation of Mayo’s Participation
• Information retrieval (IR)-based ontology annotation
  – Traditional approach has been information extraction-based
• Three levels of intelligence in artificial intelligence
  – Knowledge-base intelligence
  – Data intelligence
  – User intelligence
=> Explore the use of topic modeling and distant supervision for ontology annotation

Proposed Approaches
• MetaMap-based
• Search-based
• LLDA-based
The three approaches can work either independently or together in an incremental way.
(Diagram: the approaches produce DUIs, i.e. MeSH descriptor unique identifiers.)

MetaMap-based System
• Input example
  – Title: Age-period-cohort effect on mortality from cervical cancer.
  – Abstract: to estimate the effect of age, period and birth cohort …
• MetaMap, restricted to the MeSH ontology
• Parameters: Title_score, score threshold, top DUI
• A ranked list of CUIs => a ranked list of DUIs

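The ranking step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes MetaMap output has already been parsed into `(cui, score, in_title)` tuples and that a CUI-to-DUI lookup table exists. The function name, the tuple layout, the multiplicative title boost, and the default parameter values are all hypothetical.

```python
def rank_duis(candidates, cui_to_dui, title_weight=2.0,
              score_threshold=0.0, top_k=10):
    """Turn scored CUI candidates into a ranked list of DUIs.

    candidates: iterable of (cui, score, in_title) tuples (assumed layout).
    cui_to_dui: dict mapping a CUI to a MeSH DUI (assumed lookup table).
    """
    best = {}
    for cui, score, in_title in candidates:
        # Assumption: title concepts are boosted by a multiplicative weight.
        if in_title:
            score *= title_weight
        # Drop candidates below the score threshold.
        if score < score_threshold:
            continue
        dui = cui_to_dui.get(cui)
        if dui is None:
            continue  # concept has no MeSH descriptor in the lookup table
        # Keep the best score seen for each DUI.
        if dui not in best or score > best[dui]:
            best[dui] = score
    # Sort by descending score, breaking ties by DUI for determinism.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dui for dui, _ in ranked[:top_k]]
```

Raising `score_threshold` or lowering `top_k` shortens the output list, which is one way to trade recall for precision.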
MetaMap-based System
• Parameter Tuning
  – Title weight: title concepts are more important
  – Score threshold: low threshold roughly leads to high precision/recall
  – Top DUI: tradeoff between P/R

Search-based System
• Retrieval Model
  – [Formula not preserved. Legend: query term, query weight, matching function, document, Dirichlet parameter]
• DUI Aggregation
  – [Diagram labels: Docs: D01, D02, D03 …; DUI: D08, D03, D01 …; ranked by tf * score(Q, D): D02, D03, D01 …]

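Since the retrieval formula itself did not survive, the following is only a sketch of one standard form consistent with the legend: a weighted query log-likelihood with Dirichlet smoothing, followed by a simple DUI aggregation over the top-ranked documents. Both function names, the data layouts, and the reading of "tf * score(Q, D)" as a sum of document scores over the top documents that carry a DUI are assumptions. The aggregation expects positive document scores, so log-likelihoods would need rescaling first.

```python
import math
from collections import Counter, defaultdict

def dirichlet_score(query, doc_tokens, collection_prob, mu=2000.0):
    """Weighted query log-likelihood with Dirichlet smoothing.

    query: list of (term, weight) pairs.
    doc_tokens: list of tokens in the document.
    collection_prob: dict term -> P(term | collection), > 0 for query terms.
    mu: Dirichlet smoothing parameter (smaller means less smoothing).
    """
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    total = 0.0
    for term, weight in query:
        p = (tf[term] + mu * collection_prob[term]) / (doc_len + mu)
        total += weight * math.log(p)
    return total

def aggregate_duis(ranked_docs, doc_labels, top_docs=10, top_duis=5):
    """Aggregate DUIs from the top-ranked documents.

    ranked_docs: list of (doc_id, score), sorted by descending score;
                 scores are assumed to be positive.
    doc_labels: dict doc_id -> list of DUIs assigned to that document.
    """
    dui_scores = defaultdict(float)
    for doc_id, score in ranked_docs[:top_docs]:
        for dui in doc_labels.get(doc_id, ()):
            # Summing scores rewards DUIs that occur in many top documents.
            dui_scores[dui] += score
    ranked = sorted(dui_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [dui for dui, _ in ranked[:top_duis]]
```

In this sketch `mu`, `top_docs`, and `top_duis` stand in for the three tuning parameters named on the slides: the Dirichlet smoothing parameter, the number of top-ranked documents, and the number of top-ranked DUIs.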
Search-based System
• Term Query (TQ)
  – is a single-word expression
  – concept-related words in title and abstract
• Phrase Query (PQ)
  – is a multi-word expression
  – concept-related phrases in title and abstract
• Long Query
  – mix of TQ and PQ
Example query (single-word terms only):
#weight(2.0 examination 2.0 cow 2.0 ultrasonographic 3.0 navel 3.0 urachal 3.0 extra-abdominal 2.0 pathologic 2.0 abscess)
Example query (phrases and single-word terms):
#weight(3.5 #uw2(hiv-1 infection) 4.5 #uw2(differential susceptibility) 2.0 #uw2(actin dynamics) 2.0 actin 4.5 #uw2(cortical actin) 4.5 #uw3(naive t cells) 2.5 dichotomy 3.5 #uw2(human memory) 3.5 #uw3(chemotactic actin activity) 2.0 cd45ro)

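The example queries use `#weight` and `#uwN`, which look like the weighting and unordered-window operators of the Indri query language. A small sketch of how such a query string could be assembled from weighted expressions is shown below. The function name and input layout are hypothetical. The window size is set to the number of words in the phrase, as in the examples on the slide, but how the weights themselves were chosen is not stated there.

```python
def build_weighted_query(weighted_expressions):
    """Build a '#weight(...)' query string.

    weighted_expressions: list of (weight, expression) pairs, where an
    expression is a single word or a space-separated phrase (non-empty).
    """
    parts = []
    for weight, expression in weighted_expressions:
        words = expression.split()
        if len(words) > 1:
            # Multi-word phrase: unordered window as wide as the phrase.
            term = f"#uw{len(words)}({' '.join(words)})"
        else:
            term = words[0]
        parts.append(f"{weight:.1f} {term}")
    return "#weight(" + " ".join(parts) + ")"
```

For example, `build_weighted_query([(2.0, "actin"), (4.5, "cortical actin")])` yields `#weight(2.0 actin 4.5 #uw2(cortical actin))`.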
Search-based System
• Parameter Tuning
  – Dirichlet smoothing parameter: less smoothing => better performance
  – Top-ranked documents: a small set of highly relevant documents
  – Top-ranked DUI: tradeoff between P/R

Systems: LLDA-based
• LDA Process
  – Each document is a mixture of topics
  – Each topic is a multinomial word distribution
• Labeled LDA
  – Incorporate label information

Systems: LLDA-based
• Top categories in MeSH root
  – Top-level categories as topics (e.g., Anatomy Category, Chemicals and Drugs Category, etc.) …
  – Each label below the top level is converted to its corresponding top-level label

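The label conversion can be sketched using MeSH tree numbers, whose first letter identifies the top-level category (for example, A for Anatomy and D for Chemicals and Drugs). The slides do not say how the conversion was implemented, so this is only one plausible approach. The function names are hypothetical, and the lookup table in the usage note contains made-up placeholder entries rather than real MeSH records.

```python
def top_level_categories(dui, dui_to_tree_numbers):
    """Return the sorted top-level category letters for one DUI.

    dui_to_tree_numbers: dict DUI -> list of MeSH tree numbers
    (assumed lookup table; a descriptor can sit in several trees).
    """
    tree_numbers = dui_to_tree_numbers.get(dui, ())
    return sorted({tn[0] for tn in tree_numbers if tn})

def convert_labels(duis, dui_to_tree_numbers):
    """Map a document's DUI labels to the set of top-level categories."""
    categories = set()
    for dui in duis:
        categories.update(top_level_categories(dui, dui_to_tree_numbers))
    return sorted(categories)
```

With a placeholder table such as `{"DUI_1": ["A08.186"], "DUI_2": ["D02.455", "D09.100"]}`, the call `convert_labels(["DUI_1", "DUI_2"], table)` returns `["A", "D"]`.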
Systems: LLDA-based
• DUI candidate list pruning
  – Search-based: doc => DUI candidates
  – LLDA-based: doc => categories
  – Categories + DUI candidates => a pruned ranked list

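One way to read the pruning diagram is that the search-based system proposes a ranked DUI list and the LLDA-based system proposes top-level categories, which then filter that list. The sketch below implements that reading. The function name, the arguments, and the rule of keeping a DUI when it shares at least one category with the prediction are assumptions rather than details taken from the slides.

```python
def prune_candidates(ranked_duis, predicted_categories, dui_to_categories):
    """Keep only candidate DUIs that fall under a predicted category.

    ranked_duis: DUIs in ranked order (e.g., from the search-based system).
    predicted_categories: top-level categories predicted for the document.
    dui_to_categories: dict DUI -> iterable of its top-level categories.
    """
    allowed = set(predicted_categories)
    pruned = []
    for dui in ranked_duis:
        # A DUI survives if any of its categories was predicted.
        if allowed & set(dui_to_categories.get(dui, ())):
            pruned.append(dui)
    return pruned  # original ranking order is preserved
```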
Data
• Training: <PMID, title, abstract, labels>
• Testing
  – input: <PMID, title, abstract>
  – output: <PMID, labels>

Evaluation
• MM: MetaMap-based system
• Mi: micro
• LCA: lowest common ancestor

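For reference, micro-averaged precision, recall, and F1 over per-article label sets can be computed as in the sketch below. This is the standard flat micro-average, not the LCA-based hierarchical measure, and it is not the official evaluation code. The function name and the dictionary layout keyed by PMID are assumptions.

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1.

    gold: dict PMID -> iterable of gold labels.
    predicted: dict PMID -> iterable of predicted labels.
    Articles missing from `predicted` count as having no predictions;
    predictions for PMIDs absent from `gold` are ignored.
    """
    tp = fp = fn = 0
    for pmid, gold_labels in gold.items():
        g = set(gold_labels)
        p = set(predicted.get(pmid, ()))
        tp += len(g & p)   # correctly predicted labels
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because counts are pooled over all articles before dividing, articles with many labels weigh more than articles with few.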
Conclusion and Future Work
• Three Systems
  – MetaMap-based, search-based, LLDA-based
• Research findings
  – Explored the impact of various parameters on performance
  – Promising results from search-based labeling
• Future Direction
  – Better concept weighting strategies
    – E.g., corpus-level statistics, external resources
  – Comprehensive comparisons with existing methods
  – A better strategy for incorporating hierarchical information into LLDA