Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights

NLDB 2004. Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights. Kwangcheol Shin, Sang-Yong Han, School of CSE, Chung-Ang Univ., Seoul, Korea.

Presentation Transcript


  1. NLDB 2004. Improving Information Retrieval in MEDLINE by Modulating MeSH Term Weights. Kwangcheol Shin, Sang-Yong Han, School of CSE, Chung-Ang Univ., Seoul, Korea

  2. Contents • MEDLINE and MeSH • Vector Space Model • Modulating MeSH Term Weights • Experimental Results • Conclusion

  4. MEDLINE and MeSH • MEDLINE is the premier bibliographic database of the National Library of Medicine (NLM). • Medical Subject Headings (MeSH) is the authority list of controlled vocabulary terms used for subject analysis of biomedical literature at the NLM. • Expert annotators at the NLM assign MeSH keywords to each MEDLINE document for effective retrieval. • Manual annotation with MeSH terms is a distinctive feature of MEDLINE. • MEDLINE is supplied with its own search engine based on the Boolean model.

  6. Vector Space Model • In the VSM, documents are represented as vectors whose coordinates are proportional to the number of occurrences of each term. • The similarity between two vectors is measured using the cosine measure: $\mathrm{sim}(d_i, d_j) = \frac{d_i \cdot d_j}{\lVert d_i \rVert \, \lVert d_j \rVert}$
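
The cosine measure on this slide can be illustrated with a minimal Python sketch. It assumes raw term counts as coordinates and whitespace tokenization; the helper names `term_vector` and `cosine` and the sample texts are hypothetical and do not come from the presentation.

```python
import math
from collections import Counter


def term_vector(text):
    """Map each lowercased, whitespace-separated token to its occurrence count."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        # An empty vector has no direction; treat it as matching nothing.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical document and query, for illustration only.
doc = term_vector("cystic fibrosis lung infection in cystic fibrosis patients")
query = term_vector("cystic fibrosis infection")
score = cosine(doc, query)
```

A search engine built on this model would rank documents by `cosine(doc, query)` in descending order, in contrast to a Boolean engine, which only reports whether a document matches.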

  7. Suggested Method • We show that applying a search engine based on the Vector Space Model to MEDLINE data gives much better results than a Boolean-based one. • More importantly, balancing the weights of the manually assigned MeSH keywords and the text words further improves the quality of the results.

  9. Modulating MeSH Term Weights • MEDLINE documents contain MeSH keywords. • Our idea is to increase the weights of MeSH terms in each document's vector.

  10. Modulating MeSH Term Weights • We use the following procedure: 1. Assign the weights w_ij as in the vector space model. 2. Apply a formula that increases the weights of MeSH terms [formula not preserved in this transcript], where ρ is a parameter regulating the sensitivity of the formula to the MeSH terms.
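
Since the slide's formula is not available in this transcript, the two-step procedure can only be sketched under an assumption. The Python example below uses a hypothetical multiplicative boost, scaling each MeSH term weight by (1 + ρ) and then renormalizing the vector; this is a stand-in chosen for illustration and not necessarily the authors' formula. The function name `modulate_mesh_weights` and the sample weights are likewise invented.

```python
import math


def modulate_mesh_weights(weights, mesh_terms, rho):
    """Return a unit-length copy of `weights` with MeSH term weights boosted.

    ASSUMPTION: the multiplicative (1 + rho) boost is a hypothetical stand-in;
    the formula shown on the original slide is not available here.
    """
    boosted = {
        term: w * (1.0 + rho) if term in mesh_terms else w
        for term, w in weights.items()
    }
    norm = math.sqrt(sum(w * w for w in boosted.values()))
    if norm == 0:
        return boosted
    # Renormalize so documents with many MeSH terms do not simply grow longer.
    return {term: w / norm for term, w in boosted.items()}


# Step 1 (assumed already done): weights w_ij assigned as in the vector space model.
doc_weights = {"fibrosis": 0.5, "lung": 0.5, "patients": 0.5, "sweat": 0.5}
mesh = {"fibrosis", "sweat"}  # hypothetical MeSH annotations for this document

# Step 2: modulate the MeSH term weights with the sensitivity parameter rho.
plain = modulate_mesh_weights(doc_weights, mesh, rho=0.0)
boosted = modulate_mesh_weights(doc_weights, mesh, rho=1.0)
```

With `rho=0.0` the vector is unchanged apart from normalization, so the standard vector space model is recovered as a special case; larger values shift more of the vector's length onto the MeSH terms.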

  12. Experimental Results - Test Collection • We experimented with the well-known Cystic Fibrosis (CF) reference collection, which is a subset of MEDLINE. • It contains 1,239 medical records and 100 queries, each supplied with its relevant documents.

  13. Experimental Results

  14. Results: Vector Space Model

  15. Results: Boolean vs. Vector

  17. Conclusions • The vector space model gives better results than the Boolean model-based system. • Increasing the weights of MeSH terms, as compared with the standard vector space model, improves retrieval accuracy. • Optimal weights are balanced: both MeSH terms and text terms are taken into account. • We get as much as 2.4 times better results than the system currently provided with MEDLINE.

  18. Thank you!
