
A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library

A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library. Wesley W. Chu Computer Science Dept, UCLA wwc@cs.ucla.edu. NIH Program Project Grant. A 5 year $ 10M joint interdisciplinary project between Medical School & CS faculty


Presentation Transcript


  1. A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library • Wesley W. Chu, Computer Science Dept, UCLA, wwc@cs.ucla.edu

  2. NIH Program Project Grant • A 5-year, $10M joint interdisciplinary project between Medical School and CS faculty • Project 1-- teleradiology infrastructure • Project 2-- neuroradiology workstation • Project 3-- multimedia information architecture • Project 4-- natural language processing for medical reports • Project 5-- medical digital library

  3. Project 5 Personnel • Project leader: Wesley W. Chu • Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou • Consultants: Hooshang Kangarloo, M.D., Denise Aberle, M.D.

  4. Data in a Medical Digital Library • Structured data (patient lab data, demographic data, …)--CoBase • Images (X-rays, MRI, CT scans)--KMeD • Free-text: patient reports, teaching files, literature, news articles

  5. System Overview • [Diagram labels: Ad-hoc query; Patient report for content correlation; Medical Digital Library (MDL), holding news articles, patient reports, medical literature, and teaching materials; Query results]

  6. A Sample Patient Report • … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. …

  7. Scenario Specific Retrieval • Patient report: … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. … • How to treat the disease? Treatment-related articles • How to diagnose the disease? Diagnosis-related articles

  8. Challenge I: Indexing • Extracting domain-specific key concepts in the free text for indexing • Free-text: Lung cancer, small cell, stage II • Concept term in knowledge source: stage II small cell lung cancer • Conventional methods use NLP • Not scalable • Cannot adapt to various forms of word permutation

  9. Challenge II: Terms used in the query are too general • Expanding the general terms in the query to specific terms that are used in the document • Original query: lung cancer, diagnosis options (?) • Expanded query: lung cancer, chest x-ray, bronchography, … (√) • Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …

  10. Challenge III: Mismatch between terms used in the query and the documents • Example query: … lung cancer, … • Document 1: … lung carcinoma … (?) • Document 2: … lung neoplasm … (?) • Document 3: … anti-cancer drug combinations … (?)

  11. Challenge I: Indexing • Challenge II: Terms in the query are too general • Challenge III: Mismatch between terms in the query and the documents

  12. IndexFinder: Extracting domain-specific key concepts • Technique • Permute words from the text to generate concept candidates. • Use the knowledge base to select the valid candidates. • Problem • Valid candidates may be irrelevant to domain-specific indexing.

  13. Eliminating irrelevant concepts • Syntactic filter: • Limit word permutation to the words within a single sentence. • Semantic filter: • Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts. • Use the ISA relationship to filter out general concepts and keep the specific ones.
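The technique and filters on slides 12 and 13 can be sketched roughly as follows. This is a minimal illustration, not the IndexFinder implementation: the tiny `CONCEPTS` table, the `ISA` links, and all function names are assumptions standing in for UMLS. Testing whether a concept's words are a subset of a sentence's words has the same effect as trying every word permutation of that sentence.

```python
import re

# Toy knowledge base: concept phrase -> semantic type (assumed entries, not UMLS data).
CONCEPTS = {
    "stage ii small cell lung cancer": "disease",
    "small cell lung cancer": "disease",
    "lung cancer": "disease",
    "lung": "body part",
    "fine needle aspiration": "diagnosis",
    "cell": "cell component",
}
# Child -> parent ISA links (assumed for illustration).
ISA = {
    "stage ii small cell lung cancer": "small cell lung cancer",
    "small cell lung cancer": "lung cancer",
}

def words(text):
    """Lower-cased set of alphanumeric tokens in the text."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))

def ancestors(concept):
    """All more general concepts reachable through ISA links."""
    found = set()
    while concept in ISA:
        concept = ISA[concept]
        found.add(concept)
    return found

def index_sentence(sentence, allowed_types):
    sentence_words = words(sentence)
    # Word order is ignored: a concept is a candidate if all its words occur.
    matched = [c for c in CONCEPTS if words(c) <= sentence_words]
    # Semantic filter 1: keep only the semantic types of interest.
    matched = [c for c in matched if CONCEPTS[c] in allowed_types]
    # Semantic filter 2: drop concepts that are ancestors of another match.
    general = set()
    for c in matched:
        general |= ancestors(c)
    return sorted(c for c in matched if c not in general)

def index_text(text, allowed_types):
    # Syntactic filter: words are only combined within one sentence.
    result = set()
    for sentence in re.split(r"[.\n]+", text):
        result.update(index_sentence(sentence, allowed_types))
    return sorted(result)
```

With these toy tables, `index_text("Lung cancer, small cell, stage II.", {"disease"})` keeps only the most specific disease concept, while "lung" and "cancer" appearing in two different sentences are never combined.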

  14. IndexFinder Performance • Two orders of magnitude faster than conventional approaches • No NLP • Knowledge base (UMLS) and index files reside in main memory • Time complexity is linear in the number of distinct words in the text • Preliminary Evaluation • IndexFinder generates • 4% more concepts than conventional approaches (using a single noun phrase) • All concepts are relevant
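One way an in-memory index could avoid enumerating permutations is an inverted index from words to concepts, sketched below. The slides do not describe IndexFinder's actual data structures, so this is only an assumed design: the concept ids (`K1`, `K2`, `K3`) and function names are hypothetical. Each distinct word of the text is visited once; the work per word depends on how many concepts contain it.

```python
from collections import defaultdict

def build_index(concepts):
    """Map each word to the ids of the concepts that contain it."""
    postings = defaultdict(list)
    sizes = {}
    for concept_id, phrase in concepts.items():
        distinct = set(phrase.lower().split())
        sizes[concept_id] = len(distinct)
        for word in distinct:
            postings[word].append(concept_id)
    return postings, sizes

def match_concepts(text_words, postings, sizes):
    """Return ids of concepts whose words all occur in the text."""
    hits = defaultdict(int)
    for word in set(text_words):              # each distinct word is visited once
        for concept_id in postings.get(word, ()):
            hits[concept_id] += 1
    # A concept matches when every one of its words was seen.
    return sorted(cid for cid, n in hits.items() if n == sizes[cid])

# Hypothetical concept ids and phrases.
postings, sizes = build_index({
    "K1": "lung cancer",
    "K2": "small cell lung cancer",
    "K3": "lung neoplasm",
})
matched = match_concepts("lung cancer small cell".split(), postings, sizes)
```

Here `matched` contains `K1` and `K2` but not `K3`, since "neoplasm" never appears in the text.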

  15. Challenge I: Indexing • Challenge II: Terms in the query are too general • Challenge III: Mismatch between terms in the query and the documents

  16. Query Expansion (QE) • Queries in the following form benefit from expansion: <key concept> + <general supporting concept(s)>, e.g. lung cancer + diagnosis options • Expansion turns them into: <key concept> + <specific supporting concept(s)>, e.g. lung cancer + chest x-ray, bronchography

  17. Traditional QE • Appends all terms that statistically co-occur with the key terms in the query • Not semantically focused • Original query: lung cancer, diagnosis options • Expanded query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate

  18. Knowledge-based QE • Knowledge source: UMLS, by the NLM • Semantic Network: semantic types (Sign or Symptom, Pharmacologic Substance, Body Parts, Injury or Poisoning, Disease or Syndrome, Diagnostic Procedure) connected by relations such as “diagnoses” • Metathesaurus: concepts, e.g. lung cancer (key concept) and chest x-ray (specific supporting concept) • A semantic type is a class of concepts
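The contrast between slides 17 and 18 can be illustrated with a small sketch: co-occurring terms are kept only if their semantic type stands in the scenario's relation to the key concept's type. The tables below are toy stand-ins (assumed, not real UMLS content), and `expand` is a hypothetical helper name.

```python
# (source type, relation, target type) triples, as in a semantic network.
SEMANTIC_NETWORK = {
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
}
# Concept -> semantic type (assumed assignments for illustration).
SEMANTIC_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "cisplatin": "Pharmacologic Substance",
    "survival rate": "Quantitative Concept",
}
# Terms that co-occur with the key concept (what traditional QE would append).
CO_OCCURS = {
    "lung cancer": ["chest x-ray", "bronchography", "cisplatin", "survival rate"],
}

def expand(key_concept, relation):
    """Keep co-occurring concepts whose type has `relation` to the key concept's type."""
    key_type = SEMANTIC_TYPE[key_concept]
    wanted_types = {
        source for source, rel, target in SEMANTIC_NETWORK
        if rel == relation and target == key_type
    }
    return [
        concept for concept in CO_OCCURS.get(key_concept, [])
        if SEMANTIC_TYPE.get(concept) in wanted_types
    ]

diagnosis_terms = expand("lung cancer", "diagnoses")
treatment_terms = expand("lung cancer", "treats")
```

For the diagnosis scenario only the diagnostic procedures survive; "cisplatin" and "survival rate" are dropped even though they co-occur with the key concept.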

  19. Challenge I: Indexing • Challenge II: Terms in the query are too general • Challenge III: Mismatch between terms in the query and the documents

  20. Phrase-based Vector Space Model (VSM) • Query: … lung cancer, … • Document: … lung carcinoma … (knowledge source: lung cancer = lung carcinoma) • Document: … lung neoplasm … (knowledge source: parent_of) • Document: … anti-cancer drug combinations … (relation missing from the knowledge source)

  21. Phrase-based VSM Examples • Query: “lung cancer …” • Phrases: [(C0242379); “lung” “cancer”] … • Document: “anti-cancer drug combinations …” • Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …
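A phrase, as shown on slide 21, pairs a concept identifier with its word stems. The sketch below shows one plausible way to compare such phrases: identical concepts match fully, concepts related in the knowledge source match partially, and otherwise shared word stems provide a fallback. The weight of 0.5, the placeholder id `LUNG_NEOPLASM`, and the scoring formula are assumptions for illustration, not the formula used in the project.

```python
import math

# Concept pairs linked by parent_of. "LUNG_NEOPLASM" is a placeholder id
# (assumption), not a real UMLS identifier.
PARENT_OF = {("LUNG_NEOPLASM", "C0242379")}

def stem_cosine(stems_a, stems_b):
    """Cosine similarity between two binary word-stem vectors."""
    a, b = set(stems_a), set(stems_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def phrase_similarity(p, q, related_weight=0.5):
    (concept_p, stems_p), (concept_q, stems_q) = p, q
    if concept_p == concept_q:
        return 1.0                      # same concept, e.g. synonyms
    related = (concept_p, concept_q) in PARENT_OF or (concept_q, concept_p) in PARENT_OF
    concept_part = related_weight if related else 0.0
    # Fall back on shared word stems when the knowledge source has no link.
    return max(concept_part, stem_cosine(stems_p, stems_q))

def score(query, document):
    """Sum, over query phrases, of the best-matching document phrase."""
    return sum(
        max((phrase_similarity(q, d) for d in document), default=0.0)
        for q in query
    )

query = [("C0242379", ["lung", "cancer"])]
doc_carcinoma = [("C0242379", ["lung", "carcinoma"])]
doc_neoplasm = [("LUNG_NEOPLASM", ["lung", "neoplasm"])]
doc_drugs = [("C0003393", ["anti", "cancer", "drug", "combin"])]
```

Under these assumptions all three documents from slide 20 receive a non-zero score, ranked carcinoma, then neoplasm, then the drug-combination document, which matches only through the shared stem “cancer”.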

  22. Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS) • 16% (100 queries) vs. 5% (50 queries)

  23. System Overview • [Diagram labels: Ad-hoc query; Patient report for content correlation; Medical Digital Library (MDL), holding news articles, patient reports, medical literature, and teaching materials; Query results]

  24. Application: Query Answering via Templates • Sample templates: “<disease>, treatment”, “<disease>, diagnosis” • Example: template “<disease>, treatment” with lung cancer gives the query “lung cancer, treatment” • [Diagram components: IndexFinder; Query Expansion (lung cancer; radiotherapy, chemotherapy, …, cisplatin); Phrase-based VSM; relevant documents]
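The template step on slide 24 amounts to filling a slot and then handing the query to expansion. A minimal sketch follows; the `EXPANSIONS` table is a hypothetical stand-in for the output of the knowledge-based query expansion step, and the function names are assumptions.

```python
# The two sample templates from the slide, keyed by scenario.
TEMPLATES = {
    "treatment": "<disease>, treatment",
    "diagnosis": "<disease>, diagnosis",
}
# Hypothetical result of query expansion for one (key concept, scenario) pair.
EXPANSIONS = {
    ("lung cancer", "treatment"): ["radiotherapy", "chemotherapy", "cisplatin"],
}

def fill_template(scenario, disease):
    """Instantiate the scenario's template with a disease name."""
    return TEMPLATES[scenario].replace("<disease>", disease)

def expand_query(query):
    """Split '<key concept>, <supporting concept>' and replace the general
    supporting concept with specific ones when an expansion is known."""
    key, _, supporting = (part.strip() for part in query.partition(","))
    specific = EXPANSIONS.get((key, supporting), [supporting])
    return [key] + specific

query = fill_template("treatment", "lung cancer")
expanded = expand_query(query)
```

The expanded term list would then be passed to the retrieval step; a query with no known expansion is returned unchanged.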

  25. Applications (cont’d) • Scenario-specific content correlation • [Diagram components: Patient Report; IndexFinder; Scenario Selection (e.g. treatment, diagnosis, etc.); Query Templates; Query Expansion; Phrase-based VSM; relevant documents]

  26. Conclusion • A knowledge-based (UMLS) approach provides scenario-specific medical free-text retrieval • IndexFinder: uses word permutation as well as syntactic and semantic filtering to extract domain-specific key concepts in the free text for indexing • Knowledge-based query expansion: transforms general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents • Phrase-based indexing: transforms document indexing into a phrase paradigm (a concept and its word stems) to improve retrieval effectiveness

  27. Acknowledgement • This research is supported in part by NIC/NIH Grant #4442511-33780

  28. Demo http://fargo.cs.ucla.edu/umls/search.aspx • Test Texts • Technically successful left lower lobe nodule biopsy. • Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus. • CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule. • Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted. • There may be a tiny left apical air collection in the pleural space lateral to the apical bulla. • Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.
