A Knowledge-based Approach to Retrieve Scenario Specific Free-text in a Medical Digital Library
Wesley W. Chu
Computer Science Dept, UCLA
wwc@cs.ucla.edu

NIH Program Project Grant
• A 5-year, $10M joint interdisciplinary project between Medical School & CS faculty
• Project 1 -- teleradiology infrastructure
• Project 2 -- neuroradiology workstation
• Project 3 -- multimedia information architecture
• Project 4 -- natural language processing for medical reports
• Project 5 -- medical digital library

Project 5 Personnel
• Project leader: Wesley W. Chu
• Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
• Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.

Data in a Medical Digital Library
• Structured data (patient lab data, demographic data, …) -- CoBase
• Images (X rays, MRI, CT scans) -- KMeD
• Free-text
  • Patient reports
  • Teaching files
  • Literature
  • News articles

System Overview
[Diagram: an ad-hoc query, or a patient report for content correlation, is submitted to the Medical Digital Library (MDL), which returns query results. The MDL holds news articles, patient reports, medical literature, and teaching materials.]

A Sample Patient Report
…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…

Scenario Specific Retrieval
Patient report: … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. …
• How to treat the disease? Retrieve treatment-related articles
• How to diagnose the disease? Retrieve diagnosis-related articles

Challenge I: Indexing
• Extracting domain-specific key concepts in the free text for indexing
  • Free-text: Lung cancer, small cell, stage II
  • Concept term in knowledge source: stage II small cell lung cancer
• Conventional methods use NLP
  • Not scalable
  • Cannot adapt to various forms of word permutation

Challenge II: Terms used in the query are too general
• Expand the general terms in the query into the specific terms that are used in the document
• Original query (marked "?"): lung cancer, diagnosis options
• Expanded query (marked "√"): lung cancer, chest x-ray, bronchography, …
• Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …

Challenge III: Mismatch between terms used in the query and the documents
• Example query: … lung cancer, …
• Document 1 (match "?"): … lung carcinoma …
• Document 2 (match "?"): … lung neoplasm …
• Document 3 (match "?"): … anti-cancer drug combinations …

• Challenge I: Indexing
• Challenge II: Terms in the query are too general
• Challenge III: Mismatch between terms in the query and the documents

IndexFinder: Extracting domain-specific key concepts
• Technique
  • Permute words from the text to generate concept candidates.
  • Use the knowledge base to select the valid candidates.
• Problem
  • Valid candidates may be irrelevant to domain-specific indexing.

Eliminating irrelevant concepts
• Syntactic filter:
  • Limit permutation to words within a sentence.
• Semantic filter:
  • Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts.
  • Use the ISA relationship to filter out general concepts and yield specific concepts.

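The permutation and filtering steps above can be sketched as follows. This is only an illustration under stated assumptions, not the IndexFinder implementation: the tiny `KB` and `ISA` tables are hypothetical stand-ins for UMLS, the function name `extract_concepts` is invented here, and the brute-force enumeration of word subsets does not have the linear-time behaviour reported for IndexFinder.

```python
import re
from itertools import combinations

# Hypothetical toy knowledge base: word set -> (concept name, semantic type).
# Using word sets makes the lookup independent of word order (permutation).
KB = {
    frozenset({"lung", "cancer"}): ("lung cancer", "disease"),
    frozenset({"small", "cell", "lung", "cancer"}): ("small cell lung cancer", "disease"),
    frozenset({"stage", "ii", "small", "cell", "lung", "cancer"}):
        ("stage II small cell lung cancer", "disease"),
    frozenset({"lung"}): ("lung", "body part"),
}

# Hypothetical ISA links: specific concept -> its more general parent.
ISA = {
    "stage II small cell lung cancer": "small cell lung cancer",
    "small cell lung cancer": "lung cancer",
}

def extract_concepts(text, wanted_types, max_len=6):
    """Return the most specific KB concepts of the wanted semantic types."""
    matched = set()
    # Syntactic filter: candidates are built only from words of one sentence.
    for sentence in re.split(r"[.!?]", text):
        words = sorted(set(re.findall(r"[a-z0-9]+", sentence.lower())))
        for n in range(1, min(max_len, len(words)) + 1):
            for combo in combinations(words, n):
                hit = KB.get(frozenset(combo))
                # Semantic filter: keep only the wanted semantic types.
                if hit and hit[1] in wanted_types:
                    matched.add(hit[0])
    # ISA filter: drop concepts that are ancestors of another matched concept.
    ancestors = set()
    for name in matched:
        parent = ISA.get(name)
        while parent:
            ancestors.add(parent)
            parent = ISA.get(parent)
    return sorted(matched - ancestors)

print(extract_concepts("Lung cancer, small cell, stage II.", {"disease"}))
```

With these toy tables, the report phrase "Lung cancer, small cell, stage II" resolves to the single specific concept "stage II small cell lung cancer", while words from different sentences are never combined.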
IndexFinder Performance
• Two orders of magnitude faster than conventional approaches
  • No NLP
  • Knowledge base (UMLS) and index files reside in main memory
  • Time complexity is linear in the number of distinct words in the text
• Preliminary evaluation
  • IndexFinder generates 4% more concepts than conventional approaches (using a single noun phrase)
  • All concepts are relevant

Query Expansion (QE)
• Queries in the following form benefit from expansion:
  <key concept> + <general supporting concept(s)>
  e.g. lung cancer + diagnosis options
• Expansion rewrites them into the form:
  <key concept> + <specific supporting concept(s)>
  e.g. lung cancer + chest x-ray, bronchography

Traditional QE
• Appends all terms that statistically co-occur with the key terms in the query
• Not semantically focused
• Original query: lung cancer, diagnosis options
• Expanded query: lung cancer, radiotherapy, chemotherapy, antineoplastic agents, survival rate

Knowledge-based QE
• Knowledge source: UMLS, by the NLM
[Diagram: the Semantic Network links semantic types (Sign or Symptom, Pharmacologic Substance, Body Parts, Injury or Poisoning, Disease or Syndrome, Diagnostic Procedure) through relations such as "diagnoses". The Metathesaurus holds the concepts, e.g. chest x-ray and lung cancer. The key concept (lung cancer) is expanded with specific supporting concepts: a class of concepts that belong to a semantic type.]

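A minimal sketch of this kind of expansion is given below, assuming a toy semantic network and a toy concept table in place of UMLS. All table contents, the mapping from scenario terms to relations, and the function name `expand_query` are illustrative assumptions rather than the system's actual data or API.

```python
# Hypothetical semantic network: (source type, relation, target type).
SEMANTIC_NETWORK = {
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
    ("Pharmacologic Substance", "treats", "Disease or Syndrome"),
    ("Therapeutic Procedure", "treats", "Disease or Syndrome"),
}

# Hypothetical concept table: concept -> semantic type.
SEMANTIC_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "chest x-ray": "Diagnostic Procedure",
    "bronchography": "Diagnostic Procedure",
    "radiotherapy": "Therapeutic Procedure",
    "chemotherapy": "Therapeutic Procedure",
    "cisplatin": "Pharmacologic Substance",
}

# Hypothetical list of terms associated with each key concept
# (e.g. what a co-occurrence based expansion might propose).
RELATED = {
    "lung cancer": ["chest x-ray", "bronchography", "radiotherapy",
                    "chemotherapy", "cisplatin", "survival rate"],
}

# Assumed mapping from a general supporting term to a network relation.
SCENARIO_RELATION = {"diagnosis options": "diagnoses", "treatment": "treats"}

def expand_query(key_concept, general_term):
    """Replace a general term with related concepts of the right semantic type."""
    relation = SCENARIO_RELATION[general_term]
    key_type = SEMANTIC_TYPE[key_concept]
    # Semantic types that stand in the scenario's relation to the key concept's type.
    allowed = {src for (src, rel, dst) in SEMANTIC_NETWORK
               if rel == relation and dst == key_type}
    specific = [c for c in RELATED.get(key_concept, [])
                if SEMANTIC_TYPE.get(c) in allowed]
    return [key_concept] + specific

print(expand_query("lung cancer", "diagnosis options"))
```

Unlike the co-occurrence expansion on the previous slide, only terms whose semantic type fits the scenario survive, so "survival rate" and the treatment terms are left out of a diagnosis query.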
Phrase-based Vector Space Model (VSM)
[Diagram: the query "… lung cancer, …" is matched against documents containing "… lung carcinoma …", "… lung neoplasm …", and "… anti-cancer drug combinations …". The knowledge source supplies a synonym link (lung cancer = lung carcinoma) and parent_of links; one link is marked "missing!!!". Matches are marked "√" or "?".]

Phrase-based VSM Examples
• Query: “lung cancer …”
  Phrases: [(C0242379); “lung” “cancer”] …
• Document: “anti-cancer drug combinations …”
  Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …

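One way to picture matching on such phrases is sketched below. The scoring rule is an assumption made for illustration, not the formula used in the phrase-based VSM: identical concept IDs score 1.0, concepts linked in the knowledge source score an assumed 0.5, and otherwise the score falls back to word-stem overlap. The identifier `CUI-LUNG-NEOPLASM` and the `PARENT_OF` table are hypothetical placeholders; only C0242379 and C0003393 come from the slide.

```python
from typing import NamedTuple

class Phrase(NamedTuple):
    concept: str        # concept identifier
    stems: frozenset    # word stems of the phrase

# Hypothetical parent_of links taken from a knowledge source.
PARENT_OF = {("CUI-LUNG-NEOPLASM", "C0242379")}

def phrase_sim(p, q, related_weight=0.5):
    """Similarity of two phrases from concept identity, KB links, or stem overlap."""
    if p.concept == q.concept:
        return 1.0  # same concept, e.g. synonyms sharing one identifier
    linked = (p.concept, q.concept) in PARENT_OF or (q.concept, p.concept) in PARENT_OF
    concept_sim = related_weight if linked else 0.0
    union = p.stems | q.stems
    stem_sim = len(p.stems & q.stems) / len(union) if union else 0.0
    return max(concept_sim, stem_sim)

def score(query, document):
    """Average, over query phrases, of the best-matching document phrase."""
    if not query or not document:
        return 0.0
    return sum(max(phrase_sim(q, d) for d in document) for q in query) / len(query)

query = [Phrase("C0242379", frozenset({"lung", "cancer"}))]
doc_carcinoma = [Phrase("C0242379", frozenset({"lung", "carcinoma"}))]
doc_neoplasm = [Phrase("CUI-LUNG-NEOPLASM", frozenset({"lung", "neoplasm"}))]
doc_drugs = [Phrase("C0003393", frozenset({"anti", "cancer", "drug", "combin"}))]

for name, doc in [("carcinoma", doc_carcinoma), ("neoplasm", doc_neoplasm), ("drugs", doc_drugs)]:
    print(name, score(query, doc))
```

Keeping the stems alongside the concept lets a document still receive some credit when the knowledge source has no link between the two concepts, which is the situation the stem component is meant to cover.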
Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
[Chart not recovered. Surviving labels: 16%, 100 queries vs. 5%, 50 queries]

Application: Query Answering via Templates
• Sample templates: “<disease>, treatment”, “<disease>, diagnosis”
[Diagram: the template “<disease>, treatment” is filled in as "lung cancer, treatment". IndexFinder identifies "lung cancer"; Query Expansion adds radiotherapy, chemotherapy, …, cisplatin; the phrase-based VSM returns the relevant documents.]

Applications (cont’d)
• Scenario-specific content correlation
[Diagram: a patient report is processed by IndexFinder; scenario selection (e.g. treatment, diagnosis, etc.) picks the query templates; the resulting queries pass through Query Expansion and the phrase-based VSM, which returns the relevant documents.]

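The template step shared by both applications can be sketched as below. The `TEMPLATES` table mirrors the two sample templates from the slides; the function names and the idea of splitting a filled template into a key concept and a general term are assumptions made for illustration.

```python
# The two sample templates, keyed by scenario name (keys are an assumption).
TEMPLATES = {
    "treatment": "<disease>, treatment",
    "diagnosis": "<disease>, diagnosis",
}

def build_queries(disease, scenarios):
    """Fill each selected scenario's template with the disease from the report."""
    return {s: TEMPLATES[s].replace("<disease>", disease) for s in scenarios}

def split_query(query):
    """Split a filled template into (key concept, general supporting term)."""
    key, general = (part.strip() for part in query.split(",", 1))
    return key, general

queries = build_queries("lung cancer", ["treatment", "diagnosis"])
for scenario, q in queries.items():
    print(scenario, "->", split_query(q))
```

In the pipeline shown on the slides, the disease would come from the concepts extracted from the patient report, and each filled template would then go through query expansion and retrieval.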
Conclusion
• A knowledge-based (UMLS) approach provides scenario-specific medical free-text retrieval
• IndexFinder: uses word permutation as well as syntactic and semantic filtering to extract domain-specific key concepts in the free text for indexing
• Knowledge-based query expansion: transforms general terms in the query into the scenario-specific terms used in the documents, giving the query a higher probability of matching the relevant documents
• Phrase-based indexing: transforms document indexing into a phrase paradigm (a concept and its word stems) to improve retrieval effectiveness

Acknowledgement
This research is supported in part by NIC/NIH Grant #4442511-33780

Demo
http://fargo.cs.ucla.edu/umls/search.aspx
• Test texts
  • Technically successful left lower lobe nodule biopsy.
  • Preliminary localization CT images again demonstrate a left lower lobe nodule adjacent to the posterior segmental bronchus.
  • CT scans obtained during biopsy demonstrate the coaxial cannula adjacent to the proximal aspect of the nodule.
  • Surrounding pulmonary parenchymal hemorrhage as a result of the biopsy is also noted.
  • There may be a tiny left apical air collection in the pleural space lateral to the apical bulla.
  • Formal cytologic evaluation of the withdrawn specimen is pending at this time, although abnormal appearing "spindle" cells were identified during on-site cytopathologic evaluation of specimen adequacy.