
Intelligent Information Directory System for Clinical Documents

Intelligent Information Directory System for Clinical Documents. Qinghua Zou, 6/3/2005. Advisor: Dr. Wesley W. Chu. Keyword search problems when searching clinical reports: good keywords are hard to compose, there is no outlook of the content, and words are interchangeable. Intelligent Directory System.


Presentation Transcript


  1. Intelligent Information Directory System for Clinical Documents Qinghua Zou 6/3/2005 Dr. Wesley W. Chu (Advisor)

  2. Keyword Search Problems (when searching clinical reports) • Hard to compose good keywords • Lack of an outlook of the content • Interchangeable words

  3. Intelligent Directory System • 1. Overview • 2. Extracting Key Concepts • 3. Mining Topics • 4. Building Directories • 5. Searching • 6. Conclusion

  4. 1. System Overview

  5. 2. Concept Extraction • 2.1 Introduction • 2.2 Our approach: IndexFinder • Index Phase (Offline) • Search Phase (Real Time) • 2.3 Experiments • 2.4 Summary

  6. 2.1 Motivation • Clinical texts are valuable in medical practice • Search relevant reports • Search similar patients • What is key information? • UMLS provides key medical concepts • Our goal • Extract UMLS concepts from clinical texts • [Figure labels: Clinical Texts; Extract key info.; Standard terms]

  7. 2.1 Previous Approaches • [Figure: free text → NLP parser → noun phrases → mapping against UMLS → UMLS concepts; example parse tree for “lambs will eat oats” (nodes ip, dp, i1, i0, vp, v0, dp) yielding the noun phrases “lambs” and “oats”]

  8. 2.1 Problems of Previous Approaches • Concepts cannot be discovered if they are not in a single noun phrase. • E.g., in “second, third, and fourth ribs”, “second rib” cannot be discovered. • Difficult to scale to large volumes of text. • Natural language processing requires significant computing resources.

  9. 2.2 Our Approach: IndexFinder • Previous: free text → UMLS (free text → NLP parser → noun phrases → mapping → concepts) • Our approach: UMLS → free text • Index phase (offline): UMLS (2 GB) → indexing → index data (~80 MB) • Search phase (real time): free text → extracting → filtering → concepts • Suppose UMLS contains only “Lung cancer”: we would discard all words in the text except “lung” and “cancer”.

  10. 2.2 Our Approach: What’s New? • Knowledge-based approach • Using the compact index data without using any database system • Permuting words in a sentence to generate UMLS concept candidates. • Using filters to eliminate irrelevant concepts.

  11. 2.2 Concept Candidates Generation Assumptions • Knowledge base provides a phrase table. • Each phrase (concept) is a set of words. • An input text T is represented as a set of words. Goal • Combining words in T to generate concept candidates Example • T={D,E,F} • Answer: 5
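
The candidate generation described above can be illustrated with a minimal sketch. It assumes a hypothetical three-entry phrase table (the real one would be derived from UMLS), an inverted index from words to phrase ids, and input words that are already lower-cased and normalized to singular form. The names `PHRASES`, `build_index` and `candidates` are illustrative and do not come from the slides. A phrase becomes a candidate when every one of its words occurs in T.

```python
from collections import defaultdict

# Hypothetical phrase table: phrase id -> set of words (not real UMLS data).
PHRASES = {
    1: {"lung", "cancer"},
    2: {"second", "rib"},
    3: {"rib", "fracture"},
}


def build_index(phrases):
    """Index phase (offline): map each word to the ids of phrases containing it."""
    index = defaultdict(set)
    for pid, words in phrases.items():
        for word in words:
            index[word].add(pid)
    return index


def candidates(text_words, phrases, index):
    """Search phase: return ids of phrases whose words all occur in the text."""
    hits = defaultdict(int)
    for word in set(text_words):
        # index.get avoids creating empty entries for unknown words.
        for pid in index.get(word, ()):
            hits[pid] += 1
    # A phrase is a candidate only if every one of its words was seen.
    return sorted(pid for pid, n in hits.items() if n == len(phrases[pid]))


INDEX = build_index(PHRASES)
print(candidates("second third and fourth rib".split(), PHRASES, INDEX))
```

With this toy table, the words of “second, third, and fourth rib” yield the phrase “second rib” even though the two words are not adjacent, which is the case the noun-phrase approach misses.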

  12. 2.2 Search Phase: Filtering Use filters to eliminate irrelevant concepts • Syntactic filter: • Word combination is limited to within a sentence. • Semantic filter: • Filter out irrelevant concepts using semantic types (e.g. body part, disease, treatment, diagnosis). • Filter out general concepts using the ISA relationship and keep the more specific ones.
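
The two semantic filters above could be sketched as follows. This is only an illustration under stated assumptions: the candidate names, the `SEM_TYPE` mapping and the `ISA_PARENTS` mapping are made-up toy data, not UMLS content, and the function names are hypothetical. The syntactic filter is not shown; it would amount to generating candidates one sentence at a time.

```python
# Toy data (assumed for illustration, not taken from UMLS).
SEM_TYPE = {"mass": "disease", "lung mass": "disease", "left": "spatial"}
ISA_PARENTS = {"lung mass": ["mass"]}  # "lung mass" ISA "mass"


def semantic_type_filter(cands, sem_type, allowed):
    """Keep only candidates whose semantic type is in the allowed set."""
    return [c for c in cands if sem_type.get(c) in allowed]


def specificity_filter(cands, isa_parents):
    """Drop candidates that are ISA-ancestors of another candidate."""

    def ancestors(concept):
        seen, stack = set(), list(isa_parents.get(concept, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(isa_parents.get(parent, ()))
        return seen

    general = set()
    for c in cands:
        general |= ancestors(c)
    return [c for c in cands if c not in general]


kept = semantic_type_filter(["mass", "lung mass", "left"], SEM_TYPE,
                            {"disease", "body part"})
print(specificity_filter(kept, ISA_PARENTS))
```

Here “left” is removed by its semantic type, and “mass” is removed because the more specific “lung mass” is also a candidate.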

  13. 2.3 Experiment: Comparison with MetaMap [3] • Input: “A small mass was found in the left hilum of the lung.” • [Figure: outputs of MetaMap and IndexFinder side by side]

  14. 2.4 Summary • An efficient method that maps from UMLS to free text for extracting concepts without using any database system. • Syntactic and semantic filters are used to eliminate irrelevant candidates. • IndexFinder is able to find more specific concepts than NLP approaches. • IndexFinder is scalable and can be operated in real time.

  15. 3. Mining Topics: SmartMiner • 3.1 Introduction • 3.2 Search Space • 3.3 SmartMiner • 3.4 Experiment • 3.5 Summary

  16. 3.1 Introduction • A Topic (assumption) • a set of concepts • a frequent pattern • Finding topics by data mining • Frequent patterns, or • Maximal frequent patterns • Require efficient data mining

  17. 3.1 Data Mining Problem • Dataset (id: itemset): 1: a b c d e; 2: a b c d; 3: b c d; 4: b e; 5: c d e • MinSup = 2 • What itemsets are frequent itemsets (FI)? a, b, c, d, e, ab, ac, ad, bc, bd, be, cd, ce, de, abc, abd, acd, bcd, cde, abcd • Maximal frequent itemset (MFI): no superset is frequent. • MFI: abcd, be, cde
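
The frequent and maximal frequent itemsets of this five-transaction dataset can be checked with a brute-force sketch. It enumerates every subset, which is feasible only for a toy example and is not how SmartMiner works; the function names are illustrative.

```python
from itertools import combinations

# The dataset and minimum support from the slide.
DATASET = {1: "abcde", 2: "abcd", 3: "bcd", 4: "be", 5: "cde"}
MIN_SUP = 2


def support(itemset, dataset=DATASET):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in dataset.values() if set(itemset) <= set(t))


def frequent_itemsets(dataset=DATASET, min_sup=MIN_SUP):
    """All itemsets with support >= min_sup, by size and then alphabetically."""
    items = sorted(set("".join(dataset.values())))
    return ["".join(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(c, dataset) >= min_sup]


def maximal_itemsets(fis):
    """Frequent itemsets that have no frequent proper superset."""
    return [x for x in fis if not any(set(x) < set(y) for y in fis)]


fi = frequent_itemsets()
print(len(fi), sorted(maximal_itemsets(fi)))
```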

  18. 3.1 Why MFI not FI? • Mining FI is infeasible when long FI exist. • E.g., suppose we have a 20-item frequent set a1 a2 … a20. All of its subsets are frequent, i.e., 2^20 = 1,048,576 itemsets. • Mining MFI is fast, and all the FI can be generated from the MFI.

  19. 3.1 Previous work • Superset checking. • A study shows that the CPU spends 40% of its time on superset checking. • The search tree is too large. • A large number of support-counting operations. • A more efficient method is needed.

  20. 3.2 Search space • Given 5 items: a, b, c, d, e. What is the search space? Ø, a, b, c, d, e, ab, ac, ad, ae, bc, …, abcde • We use “head:tail” to denote the space: Ø:abcde, simplified to :abcde • What is the space of ab:cd? ab, abc, abd, abcd
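
The “head:tail” notation can be made concrete with a short sketch: a space consists of the head joined with every subset of the tail. The function name `space` and the use of strings for itemsets are choices made here for illustration.

```python
from itertools import combinations


def space(head, tail):
    """Itemsets denoted by head:tail, i.e. the head plus any subset of the tail."""
    return [head + "".join(sub)
            for k in range(len(tail) + 1)
            for sub in combinations(tail, k)]


print(space("ab", "cd"))
print(len(space("", "abcde")))
```

The first call lists the space ab:cd; the second counts the full space :abcde, which includes the empty itemset.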

  21. 3.2 Space decomposition For a space :abcde, if abcg is frequent, • Then the known space is: • any subset of abc is frequent • the known space is :abc • The unknown spaces are: • any itemset containing d or e • d:abce and e:abc • :abcde = d:abce + e:abc + :abc
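
The decomposition :abcde = d:abce + e:abc + :abc can be verified by enumeration. The sketch below represents each itemset as a frozenset and checks that the three sub-spaces are pairwise disjoint and together cover the whole space; the helper names are illustrative.

```python
from itertools import combinations


def space(head, tail):
    """Set of itemsets denoted by head:tail (head plus any subset of tail)."""
    return {frozenset(head) | frozenset(sub)
            for k in range(len(tail) + 1)
            for sub in combinations(tail, k)}


FULL = space("", "abcde")
PARTS = [space("d", "abce"),  # every itemset containing d
         space("e", "abc"),   # itemsets containing e but not d
         space("", "abc")]    # the known space

# Sizes add up (16 + 8 + 8 = 32), so the parts do not overlap.
print([len(p) for p in PARTS], len(FULL))
```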

  22. 3.3 The basic idea • (a) Previous approach: A1 branches into B1, B2, …, Bn; B2 is created before exploring B1 • (b) SmartMiner strategy: A1 branches into B1, B’, …; B’ is created after exploring B1, using information from B1 to prune the space at B’ • SmartMiner takes advantage of the information from previous steps.

  23. 3.3 The tail information • For the space :abcde, if we know abcf, abcg and abfg are frequent, then we project them to the space. • abcf → abc • abcg → abc • abfg → ab • Thus • Tinf(abcf, abcg, abfg | :abcde) = {abc}
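
The projection in this example can be sketched as: intersect each known frequent itemset with the items of the space, then keep only the maximal projections. The slide shows only the empty-head case; treating a non-empty head by requiring the known itemset to contain it is an assumption made here, and `tail_info` is a hypothetical name.

```python
def tail_info(known, head, tail):
    """Project known frequent itemsets onto head:tail and keep the maximal ones.

    Assumption: an itemset contributes only if it contains the whole head,
    and its projection is its intersection with the tail.
    """
    head_s, tail_s = set(head), set(tail)
    projected = {frozenset(set(k) & tail_s) for k in known if head_s <= set(k)}
    maximal = [p for p in projected if not any(p < q for q in projected)]
    return sorted("".join(sorted(p)) for p in maximal)


print(tail_info(["abcf", "abcg", "abfg"], "", "abcde"))
```

The projection ab is dropped because it is a subset of abc, which leaves the single entry shown on the slide.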

  24. 3.4 Running time on Mushroom

  25. 3.5 Summary • SmartMiner uses tail information to guide the mining. It is efficient because of: • A smaller search tree. • No superset checking. • Fewer support-counting operations.

  26. 4. Building Directories • 4.1 Introduction • 4.2 Knowledge Hierarchies • 4.3 User Specification • 4.4 Directory Generation • 4.5 Integrating various directories • 4.6 Summary

  27. 4.1 Introduction • Three inputs • Topics: key content • Knowledge trees: meaningful • User specs: customized

  28. 4.2 Knowledge Hierarchies • UMLS concept hierarchies • PA: parent-child relationship • RA: rather-than relationship • Problems • A concept: several parents, different granularity • [lung cancer] → [Neoplasms, Respiratory Tract] • [lung cancer] → [Neoplasms, Respiratory System] • A concept: hundreds of paths to roots • [lung cancer]: 233 different paths in UMLS by PA

  29. 4.2 Select Proper Hierarchies • Set source preference order, e.g. • [disease]: ICD9 > SNOMED > MeSH • [body part]: SNOMED > ICD9 • Select proper granularity • C: a set of concepts; n: a path node • Score function for selecting the node n • S(n) = |{ci | ci → n, ci ∈ C}| • Expert review
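
The two selection steps above could be sketched as follows. The preference orders are the ones listed on the slide; everything else is assumed for illustration: the toy `PATHS` table, the function names, and in particular the reading of the score S(n) as “the number of concepts in C that have n on their path to the root”.

```python
# Source preference order per knowledge tree, as listed on the slide.
PREFERENCE = {
    "disease": ["ICD9", "SNOMED", "MeSH"],
    "body part": ["SNOMED", "ICD9"],
}

# Toy paths from each concept up to the root (hypothetical, not UMLS data).
PATHS = {
    "c1": ["n1", "n3", "root"],
    "c2": ["n2", "n3", "root"],
    "c3": ["n4", "root"],
}


def pick_source(tree, available, preference=PREFERENCE):
    """Return the most preferred source that is available, or None."""
    for source in preference.get(tree, ()):
        if source in available:
            return source
    return None


def score(node, concepts, paths=PATHS):
    """S(n): number of concepts in C whose path to the root passes through n."""
    return sum(1 for c in concepts if node in paths.get(c, ()))


print(pick_source("disease", {"MeSH", "SNOMED"}))
print({n: score(n, {"c1", "c2", "c3"}) for n in ("n1", "n3", "root")})
```

Under this reading, a node close to the root covers many concepts but is coarse, while a node close to the leaves is specific but covers few, so the score gives a basis for picking a granularity before expert review.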

  30. 4.3 User Specifications • A good directory ~ usage pattern • User spec → usage pattern • Users may have different specs • A spec: a series of knowledge names • [disease] + [body part], or • [body part] + [disease] • Build a directory for a spec by the ordering
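
Building a directory “by the ordering” of a spec can be pictured as nesting documents under one level per knowledge name. The sketch below is an assumption-laden illustration, not the algorithm from the talk: it uses two made-up reports, gives each document exactly one concept per knowledge name, and stores document ids under a `_docs` key.

```python
# Hypothetical documents: one concept per knowledge name.
DOCS = {
    "r1": {"disease": "cancer", "body part": "lung"},
    "r2": {"disease": "cancer", "body part": "liver"},
}


def build_directory(docs, spec):
    """Nest documents by the knowledge names in spec, in the given order."""
    root = {}
    for doc_id, concepts in docs.items():
        node = root
        for name in spec:
            # Descend one level per knowledge name, creating nodes as needed.
            node = node.setdefault(concepts[name], {})
        node.setdefault("_docs", []).append(doc_id)
    return root


print(build_directory(DOCS, ["disease", "body part"]))
print(build_directory(DOCS, ["body part", "disease"]))
```

The same two reports end up under cancer → lung and cancer → liver for the first spec, and under lung → cancer and liver → cancer for the second.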

  31. 4.4 Directory Generation: An example • User spec 1: d + p ([disease] + [body part]) • User spec 2: p + d ([body part] + [disease])

  32. 4.4 ~ An example • [Figure: directories generated for d + p and for p + d]

  33. 4.4 ~ Algorithm

  34. 4.5 Integrating various directories • For each document Di, get all directory paths to Di • Each Di is a tree: XML • Keywords can be associated with tree nodes • Query: XPath • Redundant information exists

  35. 4.5 Simplified model • Keep only the first-level knowledge trees • For //d6//p6, we use the XPath query //doc[.//d6 and .//p6] • Smaller size, but requires some computation
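
The conjunctive query of the simplified model selects every doc element that contains both a d6 node and a p6 node. The sketch below runs that selection on a made-up three-document XML file using Python's standard `xml.etree.ElementTree`. Because ElementTree supports only a limited XPath subset, the “and” is evaluated in Python rather than inside a predicate. The element names `docs`, `disease` and `bodypart`, and the `id` attribute, are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: each doc keeps only its first-level knowledge trees.
XML = """
<docs>
  <doc id="r1"><disease><d6/></disease><bodypart><p6/></bodypart></doc>
  <doc id="r2"><disease><d6/></disease><bodypart><p2/></bodypart></doc>
  <doc id="r3"><disease><d1/></disease><bodypart><p6/></bodypart></doc>
</docs>
""".strip()


def docs_with(root, *tags):
    """Ids of doc elements that have a descendant for every given tag."""
    return [doc.get("id")
            for doc in root.iter("doc")
            if all(doc.find(f".//{tag}") is not None for tag in tags)]


ROOT = ET.fromstring(XML)
print(docs_with(ROOT, "d6", "p6"))
```

Note that the descendant tests are relative to each doc (`.//d6`); a test written as `//d6` inside an XPath predicate would be evaluated from the document root and would match every doc as soon as any d6 exists anywhere.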

  36. 4.6 Summary • Build directories from • Topics • Knowledge hierarchies • User specifications • Map directories to XML • By collecting directory paths for each document • Leverage existing XML technologies
