NLDB 2004 Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights Kwangcheol Shin, Sang-Yong Han School of CSE, Chung-Ang Univ. Seoul, Korea
Contents • MEDLINE and MeSH • Vector Space Model • Modulating MeSH Term Weights • Experimental Results • Conclusion
MEDLINE and MeSH • MEDLINE is the premier bibliographic database of the National Library of Medicine (NLM). • Medical Subject Headings (MeSH) is the NLM's authority list of controlled vocabulary terms used for subject analysis of biomedical literature. • Expert NLM annotators manually assign MeSH keywords to each MEDLINE document to support effective retrieval. • This manual annotation with MeSH terms is a distinctive feature of MEDLINE. • MEDLINE is supplied with its own search engine based on the Boolean model.
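Vector Space Model • In the VSM, each document is represented as a vector whose coordinates are proportional to the number of occurrences of the corresponding terms in the document. • The similarity between two vectors $\mathbf{x}$ and $\mathbf{y}$ is measured using the cosine measure: $\cos(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}$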
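A minimal Python sketch of the model described above, assuming raw occurrence counts as coordinates and simple lower-cased whitespace tokenization (both illustrative choices; the function names `term_vector`, `cosine`, `rank` and the toy documents are hypothetical). It builds count vectors and ranks documents by their cosine similarity to a query:

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    # Coordinates are raw occurrence counts of each lower-cased, whitespace-split term.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms divided by the product of the Euclidean norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    # Score every document against the query and sort by descending similarity.
    q = term_vector(query)
    scores = [(doc_id, cosine(q, term_vector(text))) for doc_id, text in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": "cystic fibrosis lung infection",
    "d2": "pancreatic enzyme therapy",
    "d3": "lung function in cystic fibrosis patients",
}
for doc_id, score in rank("cystic fibrosis lung", docs):
    print(doc_id, round(score, 3))
```

Unlike a Boolean match, every document gets a graded score: d1 and d3 both contain all three query terms, but d1 ranks higher because it is shorter, so the shared terms make up a larger share of its vector.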
Suggested Method • We show that a search engine based on the Vector Space Model gives much better results on MEDLINE data than the Boolean model-based one. • More importantly, balancing the weights of the manually assigned MeSH keywords against those of the text words improves the results further.
Modulating MeSH Term Weights • MEDLINE documents contain MeSH keywords in addition to their text. • Our idea is to increase the weights of MeSH terms in each document's vector.
Modulating MeSH Term Weights • We use the following procedure: 1. Assign the weights $w_{ij}$ as in the standard vector space model. 2. Increase the weights of the MeSH terms using a formula parameterized by ρ, which regulates how sensitive the formula is to the MeSH terms.
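The two-step procedure above can be sketched in Python. The exact boosting formula is not given in this text, so the sketch swaps in a hypothetical one: every MeSH word's weight is multiplied by (1 + ρ), so that ρ = 0 reduces to the plain vector space model. Splitting MeSH headings into words, using raw counts as the step-1 weights, and the names `doc_vector`, `cosine`, and the toy documents are likewise illustrative assumptions, not the authors' method:

```python
import math
from collections import Counter


def doc_vector(text: str, mesh_headings: list[str], rho: float) -> dict[str, float]:
    # Step 1: ordinary VSM weights, i.e. raw occurrence counts of the text words
    # plus the words of the manually assigned MeSH headings.
    mesh_words = Counter(w for h in mesh_headings for w in h.lower().split())
    counts = Counter(text.lower().split())
    counts.update(mesh_words)
    # Step 2 (HYPOTHETICAL boost, not the formula from the slides):
    # multiply the weight of every MeSH word by (1 + rho); rho = 0 gives the plain VSM.
    return {t: w * (1.0 + rho) if t in mesh_words else float(w) for t, w in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine measure over sparse term -> weight dictionaries.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = {t: float(c) for t, c in Counter("lung infection".split()).items()}
docs = {
    "A": ("bacterial infection in children", ["Lung Diseases"]),
    "B": ("lung infection outcomes and treatment costs in hospital care", ["Health Care Costs"]),
}
for rho in (0.0, 1.0):
    scores = {d: cosine(query, doc_vector(text, mesh, rho)) for d, (text, mesh) in docs.items()}
    print(rho, {d: round(s, 3) for d, s in scores.items()})
```

In this toy example, raising ρ lifts document A, which matches "lung" only through its MeSH heading, and lowers B, whose MeSH words do not match the query but still inflate its vector norm. Under this assumed formula, that is why ρ has to be tuned rather than simply maximized.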
Experimental Results - Test Collection • We experimented with the well-known Cystic Fibrosis (CF) reference collection, a subset of MEDLINE. • It contains 1,239 medical records and 100 queries, each supplied with its set of relevant documents.
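A collection with per-query relevant documents lets a ranked output be scored automatically. The slides do not say which measure was used, so the following Python sketch assumes precision at k (the fraction of the top k results judged relevant), averaged over queries, purely as an illustration; the function names and toy run data are hypothetical:

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Fraction of the top-k retrieved documents that are judged relevant.
    return sum(1 for d in ranked[:k] if d in relevant) / k


def mean_precision_at_k(
    runs: dict[str, list[str]], judgments: dict[str, set[str]], k: int
) -> float:
    # Average P@k over every query that has relevance judgments.
    return sum(precision_at_k(runs[q], judgments[q], k) for q in judgments) / len(judgments)


runs = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d5", "d4", "d9", "d8"]}
judgments = {"q1": {"d1", "d2"}, "q2": {"d5"}}
print(mean_precision_at_k(runs, judgments, k=2))
```

Comparing two systems, for example the Boolean engine and a weighted VSM, then amounts to computing the same measure over the same queries and judgments for each system's rankings.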
Conclusions • The vector space model gives better results than the Boolean model-based system. • Increasing the weights of MeSH terms relative to the standard vector space model improves retrieval accuracy. • The optimal weights are balanced: both MeSH terms and text terms are taken into account. • We obtain results up to 2.4 times better than those of the system currently provided with MEDLINE.