Automatic Document Indexing in Large Medical Collections

Advisor: Dr. Hsu. Presenter: Shu-Ya Li. Authors: Angelos Hliaoutakis, Kalliopi Zervanou, Euripides G.M. Petrakis, Evangelos E. Milios. 2006. HIKM.


Presentation Transcript


  1. Automatic Document Indexing in Large Medical Collections
     Advisor: Dr. Hsu
     Presenter: Shu-Ya Li
     Authors: Angelos Hliaoutakis, Kalliopi Zervanou, Euripides G.M. Petrakis, Evangelos E. Milios
     2006. HIKM

  2. Outline
     • Motivation
     • Objective
     • Current Approach: MMTx
     • Method: AMTEx
       • C/NC-value method
       • Use of MeSH Thesaurus as lexical resource
     • Experiments
     • Conclusion
     • Personal Opinions

  3. Motivation
     • MMTx, the U.S. NLM approach
       • maps biomedical documents to UMLS term concepts
     • The limitations of MMTx in term extraction:
       • term over-generation
       • term concept diffusion
       • unrelated terms added to the final candidate list
     • MMTx focuses on UMLS rather than MeSH
       • but MEDLINE indexing is based on MeSH
     • Goal: improve the efficiency of automatic indexing of medical documents

  4. Objective
     • We propose a new method, AMTEx
       • Improves the efficiency of automatic term extraction by using the C/NC-value method
       • Supports indexing and retrieval of MEDLINE documents, based on the extraction and mapping of document terms to the MeSH Thesaurus

  5. Current Approach: MMTx
     • Maps arbitrary text to UMLS Metathesaurus concepts:
       • Parsing (syntactic analysis, linguistic filter)
       • Variant Generation (uses SPECIALIST Lexicon)
       • Candidate Retrieval (mapping process to Metathesaurus concepts)
       • Candidate Evaluation (criteria: centrality, variation, coverage, cohesiveness)

  6. MMTx Example
     • Parsing
       • Shallow syntactic analysis of the input text
       • Linguistic filtering: isolates noun phrases
         e.g. the term “ocular complications” is analysed as:
     • Variant Generation
         e.g. “obstructive sleep apnea” has variants: obstructive sleep apnea, sleep apnea, sleep, apnea, osa, …
     • Candidate Retrieval
         Candidate Metathesaurus concepts for the variant “osa”:
           osa [osa antigen]
           osa [osa gene product]
           osa [osa protein]
           osa [obstructive sleep apnea]
     • Candidate Evaluation
           Obstructive Sleep Apnea   1000
           Sleep Apnea                901
           Apnea                      827
           …                          …
           Sleeping                   793
           Sleepy                     755
     • The limitations of MMTx in term extraction:
       • term over-generation
       • term concept diffusion
       • unrelated terms added to the final candidate list

  7. Method - AMTEx
     • Input: Document d, MeSH Ontology
     • C/NC-value Multi-word Term Extraction & Term Ranking
     • Term Mapping
     • Single-word Term Extraction
     • Term Variant Generation
     • Term Expansion
     • MeSH Thesaurus Resource
     • Output: MeSH Term Lists

  8. Step 1 & 2: C/NC-value - Multi-word Term Extraction & Ranking
     • Part-of-Speech Tagging
     • Linguistic filtering:
     • Term Extraction - C-value
     • Term Ranking - NC-value
     • Keep terms up to threshold T1
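The slide names the C-value measure but does not give its formula. The sketch below assumes the commonly cited definition: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, and the result is scaled by the log of the candidate's length in words. The function names and the toy frequencies are illustrative and do not come from the slides; NC-value re-ranking with context words is not shown.

```python
import math

def is_nested(short, long):
    """True if `short` occurs as a contiguous word sequence inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))

def c_value(freq):
    """Score multi-word candidates.

    `freq` maps each candidate term (a tuple of words) to its frequency.
    Assumed formula (not stated on the slide):
      not nested: log2(|a|) * f(a)
      nested:     log2(|a|) * (f(a) - mean frequency of longer terms containing a)
    """
    scores = {}
    for term, f in freq.items():
        containing = [f_b for b, f_b in freq.items()
                      if len(b) > len(term) and is_nested(term, b)]
        if containing:
            adjusted = f - sum(containing) / len(containing)
        else:
            adjusted = f
        # log2(1) == 0, so single-word candidates always score 0 here.
        scores[term] = math.log2(len(term)) * adjusted
    return scores

# Toy frequencies, invented for illustration only.
toy_freq = {
    ("sleep", "apnea"): 5,
    ("obstructive", "sleep", "apnea"): 3,
}
scores = c_value(toy_freq)
```

Because the length factor is zero for one-word candidates, this measure only ranks multi-word terms, which fits the later step that handles single-word terms separately.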

  9. Step 3: Term Mapping
     • Candidate terms are mapped to terms of the MeSH Thesaurus (simple string matching).
     • Only candidate terms matching MeSH are retained.
     • Multi-word candidates not matching MeSH may contain (shorter) MeSH terms.
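The mapping step above can be sketched as a dictionary lookup. This is a minimal illustration, assuming that "simple string matching" means exact matching after lowercasing and whitespace normalization; that normalization, the function name `map_to_mesh`, and the sample terms are assumptions rather than details from the slides.

```python
def map_to_mesh(candidates, mesh_terms):
    """Split candidates into those that match a MeSH term and those that do not.

    Matching is exact after lowercasing and collapsing whitespace
    (an assumed reading of "simple string matching").
    """
    mesh_index = {" ".join(t.lower().split()): t for t in mesh_terms}
    matched, unmatched = [], []
    for cand in candidates:
        key = " ".join(cand.lower().split())
        if key in mesh_index:
            matched.append(mesh_index[key])   # keep the thesaurus spelling
        else:
            unmatched.append(cand)            # may still contain shorter MeSH terms
    return matched, unmatched

# Illustrative inputs only.
mesh = ["Sleep", "Apnea", "Blood Pressure"]
matched, unmatched = map_to_mesh(["blood  pressure", "severe sleep apnea"], mesh)
```

The unmatched list is returned rather than discarded because the next step looks inside those candidates for shorter MeSH terms.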

  10. Step 4: Single-word Term Extraction
     • For multi-word terms not matching MeSH:
       • Multi-word terms are split into single-word terms
       • Single-word terms are validated against MeSH
       • Matched MeSH terms are added to the term list
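A minimal sketch of this fallback, under the assumption that splitting happens on whitespace and validation is a case-insensitive exact lookup. The function name `single_word_mesh_terms` and the sample data are hypothetical.

```python
def single_word_mesh_terms(unmatched_multiword, mesh_terms):
    """Split unmatched multi-word candidates and keep the words found in MeSH."""
    mesh_index = {t.lower(): t for t in mesh_terms}
    found = []
    for term in unmatched_multiword:
        for word in term.lower().split():
            hit = mesh_index.get(word)
            # Preserve first-seen order and avoid duplicates.
            if hit is not None and hit not in found:
                found.append(hit)
    return found

# Illustrative inputs only.
mesh = ["Sleep", "Apnea", "Blood Pressure"]
extra_terms = single_word_mesh_terms(["severe sleep apnea", "sleep quality"], mesh)
```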

  11. Step 5: Term Variant Generation
     • Inflectional variants of the extracted terms are identified during term extraction (C/NC-value)
     • Stemmed term-forms are also available in MeSH and are added to the list of terms

  12. Step 6: Term Expansion
     • Each term in the list is expanded with neighbor terms in MeSH
     • The expansion may include terms more than one level higher or lower than the original term, depending on threshold T2
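The expansion step can be illustrated on a toy hierarchy. This sketch assumes that T2 acts as a maximum number of levels to walk up and down from the original term, which is one possible reading of the slide, not a confirmed detail. It also assumes each term has a single parent, whereas a real MeSH term can appear in several tree positions. The hierarchy below is invented for illustration and is not copied from MeSH.

```python
def expand_term(term, parent_of, t2):
    """Return `term` plus its ancestors and descendants within `t2` levels.

    `parent_of` maps child -> parent (single-parent simplification).
    Treating T2 as a level count is an assumption.
    """
    children_of = {}
    for child, parent in parent_of.items():
        children_of.setdefault(parent, []).append(child)

    expanded = {term}

    # Walk up at most t2 levels.
    node = term
    for _ in range(t2):
        node = parent_of.get(node)
        if node is None:
            break
        expanded.add(node)

    # Walk down at most t2 levels.
    frontier = [term]
    for _ in range(t2):
        frontier = [c for n in frontier for c in children_of.get(n, [])]
        expanded.update(frontier)

    return expanded

# Toy hierarchy, illustrative only.
toy_parent_of = {
    "Apnea": "Respiration Disorders",
    "Sleep Apnea": "Apnea",
    "Obstructive Sleep Apnea": "Sleep Apnea",
}
one_level = expand_term("Sleep Apnea", toy_parent_of, 1)
two_levels = expand_term("Sleep Apnea", toy_parent_of, 2)
```

Walking up and down separately keeps siblings out of the result; a variant that also admits siblings would search the hierarchy as an undirected graph instead.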

  13. Experiments
     • Precision and Recall measures
     • Dataset
       • 61 full MEDLINE documents, from the PMC database of NCBI PubMed
       • MEDLINE documents are paired to their respective MeSH index terms, manually assigned by experts
     • Ground Truth
       • the set of MeSH document index terms
     • Benchmark method
       • MMTx against AMTEx
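For one document, precision and recall against the expert-assigned index terms can be computed as set overlaps. The sketch below uses the standard definitions; the term labels are placeholders and the numbers are not results from the paper.

```python
def precision_recall(extracted, ground_truth):
    """Precision and recall of extracted terms against the ground-truth index terms."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    hits = len(extracted & ground_truth)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Placeholder term labels, not data from the experiments.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Here two of the four extracted terms are correct (precision 0.5) and two of the three ground-truth terms are found (recall 2/3). How the per-document values are aggregated over the 61 documents is not stated on the slide.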

  14. Experiments

  15. Conclusion - AMTEx
     • designed for indexing and retrieval of MEDLINE documents
     • focuses on multi-word term extraction using valid linguistic & statistical criteria
     • based on MeSH, similarly to human indexing
     • selectively expands to term variants & synonyms
     • outperforms the current benchmark MMTx method, reaching better precision & recall

  16. Personal Opinions
     • Advantage
     • Drawback
       • …
     • Application
       • …
