Knowledge-based Information Management for Biomedical Applications. Wesley Chu Computer Science Department University of California Los Angeles, CA wwc@cs.ucla.edu www.kmed.cs.ucla.edu. Outline. Data types Uses of knowledge bases to enhance information management Sample systems
Knowledge-based Information Management for Biomedical Applications Wesley Chu Computer Science Department University of California Los Angeles, CA wwc@cs.ucla.edu www.kmed.cs.ucla.edu
Outline • Data types • Uses of knowledge bases to enhance information management • Sample systems • Structured data • Multi-media • Free-text • Conclusion
Information Formats Used in Biomedical Applications • Structured data • Multi-media images • Semi-structured data • Free-text
Uses of Knowledge Bases to Enhance Information Management • Approximate matching • Query conditions • Image features • Similar conceptual terms
Uses of Knowledge Bases to Enhance Information Management • KB query processing • Similarity query answering • Associative query answering • Scenario-specific query answering • Sentinel: triggering and alerting
Examples of KB Information Systems • CoBase (1990-1998), DARPA • A database that cooperates with the user for structured data • KMeD (1991-2000), NSF • A knowledge-based medical multi-media database • Medical Digital Library (2001-2005), NIH • A knowledge-based digital file room for patient care, education, and research
CoBase www.cobase.cs.ucla.edu • Project leader: Wesley W. Chu • Graduate students: K. Chiang, C. Larson, R. Lee, M. Merzbacher, M. Minock, Frank Meng, Wenlei Mao, Mark Yang, K. Zhang • Staff: Q. Chen, Gladys Chow, Hua Yang
CoBase: Cooperative Databases • Conventional query answering • Need to know the detailed database schema • Cannot get approximate answers • Cannot answer conceptual queries • Cooperative query answering • Derive approximate answers • Answer conceptual queries • Provide additional relevant answers that the user does not (or does not know how to) ask for
Cooperative Queries • Example queries: • Find a nearby friendly airport that can land F-15 • Find hospitals with facility similar to St. John’s near LAX • Find a seaport with railway facility in Los Angeles • CoBase provides: Relaxation, Approximation, Association, Explanation • [Figure: cooperative queries are sent to CoBase servers, which draw on domain knowledge and heterogeneous information sources]
Generalization and Specialization • Generalization: Specific Query → Conceptual Query → More Conceptual Query • Specialization: More Conceptual Query → Conceptual Query → Specific Query
Cooperative Querying for Medical Applications • Query • Find the treatment used for the tumor similar-to (loc, size) X1 on 12-year-old Korean males. • Relaxed Query • Find the treatment used for the tumor Class X on preteen Asians. • Association • The success rate, side effects, and cost of the treatment.
Type Abstraction Hierarchies for Medical Domain • Tumor (location, size) • Class X [loc1 loc3] [s1 s3] • X1 [loc1 s1] • X2 [loc2 s2] • X3 [loc3 s3] • Class Y [locY sY] • Age • Preteens: 9, 10, 11, 12 • Teen • Adult • Ethnic Group • Asian: Japanese, Filipino, Korean, Chinese • African • European
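The ethnic-group hierarchy on this slide can be sketched as a small tree to show how generalization and specialization work together during relaxation. This is only an illustration: the `PARENT` table and the `generalize`, `specialize`, and `relax` helpers are hypothetical names, not part of CoBase.

```python
# Hypothetical TAH stored as child -> parent links (node names come from the slide).
PARENT = {
    "Japanese": "Asian", "Filipino": "Asian", "Korean": "Asian", "Chinese": "Asian",
    "Asian": "Ethnic Group", "African": "Ethnic Group", "European": "Ethnic Group",
}


def generalize(value):
    """Move one level up the hierarchy; the root generalizes to itself."""
    return PARENT.get(value, value)


def specialize(concept):
    """Return every leaf value under a concept (a leaf specializes to itself)."""
    children = [child for child, parent in PARENT.items() if parent == concept]
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(specialize(child))
    return leaves


def relax(value):
    """Relax a value by generalizing one level, then specializing back to leaves."""
    return specialize(generalize(value))
```

Under these assumptions, relaxing "Korean" widens the condition to every value grouped under "Asian", which mirrors the relaxed query "preteen Asians" on the previous slide.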
KB: Type Abstraction Hierarchy • Uses clustering techniques to group similar: • Attribute values • Image features • Spatial relationships among objects • Provides a multi-level (conceptual) knowledge representation
Data Mining for TAH for Numerical Attribute Values • Clustering metric: relaxation error • Difference between the exact value and the returned approximate value • Relaxation error is weighted by the probability of occurrence of each value • Can be extended to multiple attributes
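One way to read the relaxation-error metric is as the expected absolute difference between a requested value and the value returned in its place, with both weighted by how often they occur in the cluster. The sketch below assumes that reading; the slide does not give the formula, and `relaxation_error` and `best_binary_split` are hypothetical helper names.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| when x_j is returned in place of x_i.

    Both values are weighted by their relative frequency in the cluster
    (an assumed reading of "weighted by the probability of occurrence").
    """
    n = len(values)
    counts = Counter(values)
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in counts.items()
        for xj, cj in counts.items()
    )


def best_binary_split(values):
    """Choose the cut on the sorted values with the lowest size-weighted error."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for cut in range(1, n):
        left, right = ordered[:cut], ordered[cut:]
        if left[-1] == right[0]:
            continue  # never separate equal values into different clusters
        cost = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best
```

Applying `best_binary_split` recursively to each sub-cluster would yield a hierarchy for one numerical attribute; extending it to several attributes would need a multi-dimensional distance in place of the absolute difference.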
Query Relaxation • [Figure: flowchart] Query → Database → Answers? • Yes: Display • No: Relax Attribute (using TAHs) → Query Modification → Query
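The relaxation loop in this flowchart can be sketched as follows: run the query, and if no answers come back, widen one condition using a TAH and try again. The in-memory `PATIENTS` list, the `AGE_TAH` levels, and both function names are hypothetical stand-ins for a real database and knowledge base.

```python
# Hypothetical in-memory "database" and a one-attribute TAH for age.
PATIENTS = [
    {"id": 1, "age": 10, "treatment": "A"},
    {"id": 2, "age": 11, "treatment": "B"},
    {"id": 3, "age": 15, "treatment": "C"},
]

# Each level widens the accepted range: Preteens first, then any age.
AGE_TAH = {"Preteens": range(9, 13), "Any": range(0, 130)}


def run_query(accepted_ages):
    """Return the rows whose age satisfies the current (possibly relaxed) condition."""
    return [row for row in PATIENTS if row["age"] in accepted_ages]


def relaxed_query(age):
    """Try the exact condition first, then climb the TAH until answers appear."""
    levels = [("exact", [age])]
    levels += [(name, ages) for name, ages in AGE_TAH.items() if age in ages]
    for level, accepted in levels:
        answers = run_query(accepted)
        if answers:
            return level, answers
    return None, []
```

With this data, a query for age 12 has no exact match, so it is relaxed to the Preteens level and returns patients 1 and 2.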
Summary: CoBase • Derive Approximate Answers • Answer Conceptual Queries • Provide Associative Query Answers
KMeD www.kmed.cs.ucla.edu • PI: Wesley Chu, Ph.D., Computer Science Department • Co-PIs: • A. Cardenas, Ph.D., Computer Science Department • Ricky Taira, Ph.D., School of Medicine • Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong • Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D.
KMeD Goal: Retrieval of Images by Features & Content • Features • size, shape, texture, density, histology • Spatial Relations • angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction • Evolution of Object Growth • fusion, fission
Characteristics of Medical Queries • Multimedia • Temporal • Evolutionary • Spatial • Imprecise
Knowledge-Based Image Model • Knowledge level: TAH Lateral Ventricle, TAH Tumor Size, TAH SR(t,b), TAH SR(t,l) • Schema level: Lateral Ventricle, Tumor, Brain, linked by SR(t,l) and SR(t,b) • Representation level (features and content) • Legend: SR: Spatial Relation; b: Brain; t: Tumor; l: Lateral Ventricle
Knowledge-Based Query Processing • [Figure: flow] Queries → Query Analysis and Feature Selection → Knowledge-Based Content Matching via TAHs → Query Relaxation → Query Answers
User Model • Customizes query answering to each user's interests, preferences, needs, and goals (e.g., query conditions, relaxation control) • User type • Default Parameter Values • Feature and Content Matching Policies • Complete Match • Partial Match
User Model (cont.) • Relaxation Control Policies • Relaxation Order • Unrelaxable Object • Preference List • Measure for Ranking • Triggering conditions
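Two of the relaxation control policies listed here, relaxation order and unrelaxable objects, can be sketched as a small data structure. This is an assumed representation for illustration only; `RelaxationPolicy`, its fields, and the attribute names in the usage note are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RelaxationPolicy:
    """Hypothetical per-user relaxation control settings."""

    relaxation_order: list  # attributes to relax, most expendable first
    unrelaxable: set = field(default_factory=set)  # attributes that must match exactly

    def next_attribute(self, already_relaxed):
        """Return the next attribute to relax, or None when nothing is left."""
        for attribute in self.relaxation_order:
            if attribute not in self.unrelaxable and attribute not in already_relaxed:
                return attribute
        return None
```

For example, `RelaxationPolicy(["age", "ethnic_group", "tumor_location"], {"tumor_location"})` would relax age first, then ethnic group, and would never relax the tumor location.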
Query Preprocessing • Segment and label contours for objects of interest • Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects • Organize the features and spatial relationships of objects into a feature database • Classify the feature database into a Type Abstraction Hierarchy (TAH)
Similarity Query Answering • Determine relevant features based on query input • Select TAH based on these features • Traverse through the TAH nodes to match all the images with similar features in the database • Present the images and rank their similarity (e.g., by mean square error)
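The final ranking step can be sketched with mean square error over feature vectors, as the slide suggests. The feature values and image identifiers in the usage note are made up for illustration, and `rank_by_similarity` is a hypothetical helper; it assumes every vector has the same length and comparable scales.

```python
def mean_square_error(a, b):
    """Mean of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_by_similarity(query_features, candidates):
    """Rank candidate images by MSE against the query, best match first.

    `candidates` maps an image id to its feature vector.
    """
    scored = [
        (mean_square_error(query_features, features), image_id)
        for image_id, features in candidates.items()
    ]
    return [image_id for _, image_id in sorted(scored)]
```

Given a query vector `[1.0, 2.0]` and candidates `{"img1": [1.0, 2.0], "img2": [5.0, 5.0], "img3": [1.0, 2.5]}`, the exact match ranks first and the most distant image last. In practice the candidates would be the images reached by traversing the selected TAH nodes rather than the whole database.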
Visual Query Language and Interface • Point-click-drag interface • Objects may be represented by icons • Spatial relationships among objects are represented graphically
Visual Query Example Retrieve brain tumor cases where a tumor is located in the region as indicated in the picture
Summary: KMeD • Image retrieval by feature and content • Matching images based on features • Processing of queries based on spatial relationships among objects • Answering of imprecise queries • Expression of queries via visual query language • Integrated view of temporal multimedia data in a timeline metaphor
Medical Digital Library www.kmed.cs.ucla.edu • Project leader: Wesley W. Chu • Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou • Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.
Data Types Used in a Medical Digital Library • Structured data (patient lab data, demographic data, …): CoBase • Images (X-rays, MRI, CT scans): KMeD • Free-text (patient reports, teaching files, literature, news articles): FTRS (Free-Text Retrieval System)
A Free-Text Retrieval System (FTRS) • Inputs: ad hoc query, or a patient report for content correlation • Knowledge-based Free-Text Retrieval System (FTRS) • Sources searched: news articles, patient reports, medical literature, teaching materials • Output: query results
A Sample Patient Report … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. …
Scenario-Specific Retrieval • Patient report: … Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE) … FINAL DIAGNOSIS: - LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION): - LUNG CANCER, SMALL CELL, STAGE II. … • How to treat the disease? → Treatment-related articles • How to diagnose the disease? → Diagnosis-related articles
Challenge I: Indexing for Free-Text • Extracting key concepts in the free-text for indexing • Free-text: Lung cancer, small cell, stage II • Concept terms in knowledge source: stage II small cell lung cancer • Conventional methods use NLP • Not scalable
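The example on this slide differs from the knowledge-source term only in word order and punctuation, so one lightweight alternative to full NLP is to compare phrases as unordered word sets. The sketch below is an assumption for illustration, not the method the presentation goes on to describe; `CONCEPTS`, `tokens`, and `match_concept` are hypothetical names.

```python
import re

# Hypothetical concept list standing in for a knowledge source.
CONCEPTS = ["stage II small cell lung cancer", "lung neoplasm"]


def tokens(text):
    """Lower-case the text and return its words as an order-free set."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


# Index each concept term by its word set so lookup ignores word order.
INDEX = {tokens(concept): concept for concept in CONCEPTS}


def match_concept(free_text):
    """Return the concept term with the same words as the phrase, or None."""
    return INDEX.get(tokens(free_text))
```

Here `match_concept("Lung cancer, small cell, stage II")` finds "stage II small cell lung cancer" because both contain the same six words. Exact word-set matching misses synonyms and partial phrases, which is the mismatch problem raised on the next slide.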
Challenge II: Mismatch between terms used in query and documents • Example query: … lung cancer, … • Should it match? (?) Document 1: … lung carcinoma … • Should it match? (?) Document 2: … lung neoplasm … • Should it match? (?) Document 3: anti-cancer drug combinations …
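One common way to handle this mismatch is to map different surface terms to a shared concept and compare at the concept level. The sketch below assumes, for illustration only, that "lung cancer", "lung carcinoma", and "lung neoplasm" are treated as one concept; the `TERM_TO_CONCEPT` table and the concept label are hypothetical, not real knowledge-source identifiers.

```python
# Hypothetical synonym table; a real system would draw this from a knowledge source.
TERM_TO_CONCEPT = {
    "lung cancer": "LUNG_CANCER",
    "lung carcinoma": "LUNG_CANCER",
    "lung neoplasm": "LUNG_CANCER",
}


def concepts_in(text):
    """Return the set of concept labels whose terms appear in the text."""
    lowered = text.lower()
    return {concept for term, concept in TERM_TO_CONCEPT.items() if term in lowered}


def matches(query, document):
    """A document matches when it shares at least one concept with the query."""
    return bool(concepts_in(query) & concepts_in(document))
```

Under these assumptions the query "lung cancer" matches Documents 1 and 2 but not Document 3, whose text mentions cancer drugs without naming a lung-cancer term.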
Challenge III: Terms used in the query are too general • Expand the general terms in the query into the specific terms that are used in the document • Original query (?): lung cancer, diagnosis options • Expanded query (√): lung cancer, chest x-ray, bronchography, … • Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …
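The expansion step can be sketched as a lookup from a general (concept, scenario) pair to the specific terms documents tend to use. The `EXPANSIONS` table below holds only the example from this slide and is a hypothetical stand-in for a knowledge base; `expand_query` is likewise an illustrative name.

```python
# Hypothetical expansion table: (concept, scenario) -> specific terms.
EXPANSIONS = {
    ("lung cancer", "diagnosis options"): ["chest x-ray", "bronchography"],
}


def expand_query(concept, scenario):
    """Replace a general scenario term with specific terms when any are known.

    Falls back to the original scenario term if no expansion is available.
    """
    specific_terms = EXPANSIONS.get((concept, scenario), [scenario])
    return [concept] + specific_terms
```

Calling `expand_query("lung cancer", "diagnosis options")` yields the three terms of the expanded query on the slide; an unknown pair is returned unchanged.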
A Medical KB: Unified Medical Language System (UMLS) • Metathesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts) • Semantic Network: classifies concepts into classes (e.g., disease and syndrome, treated by, therapeutic procedure) • SPECIALIST Lexicon