Enhancing Quality of Retrieval Through Concept Edit History -- EVS Update • Frank Hartel • Sherri De Coronado • Gilberto Fragoso • Iris Guo • Kim Ong NCICB Jamboree
Outline • Terminology development -- concept creation, modification, split, merge, retirement • Edit history usage • TDE Ontylog editor extension • Next steps • Summary
Elementary Edit Actions in Terminology Development (Create, Modify, Split, Merge, Retire) [Figure: Versions 1 to 4 of the terminology, each shaped by Create, Split, Modify, Retire, and Merge actions. Caption: Evolution of versions/baseline over time.]
Scientific Reasons for Concept Splits • The ras oncogene was discovered through sequence homology (hybridization) to the v-onc gene of the Harvey strain of murine sarcoma virus. • It was subsequently found that there were multiple related ras genes, Ha-ras and Ki-ras. • Later, a new ras gene, N-ras, was found.
Scientific Reasons for Concept Merges • The BCL1 gene was discovered in the vicinity of a t(11;14) translocation involved in the malignant transformation of B cells. • The PRAD1 gene was found in parathyroid adenomas bearing chromosomal abnormalities. • CCND1 codes for one of a set of proteins, the cyclins, that regulate cell cycle progression. • All three names were later recognized as referring to the same gene, so the corresponding concepts are merged.
Concept-Based Retrieval [Figure: A user supplies the concepts used for retrieval (C1, C2) to a search engine, which returns the relevant documents. Each document carries concept indexing terms, e.g. D1<C1, C2> and D2<C1, C3, C4>.]
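The retrieval flow on this slide can be illustrated with a small inverted index. This is a minimal sketch, not the search engine described in the talk: the document and concept identifiers come from the slide, while `build_index` and `retrieve` are hypothetical helper names, and matching on "at least one shared concept" is an assumption.

```python
from collections import defaultdict

# Document -> concept indexing terms, taken from the slide's example.
DOCUMENTS = {
    "D1": {"C1", "C2"},
    "D2": {"C1", "C3", "C4"},
}


def build_index(documents):
    """Invert document -> concepts into concept -> documents."""
    index = defaultdict(set)
    for doc_id, concepts in documents.items():
        for concept in concepts:
            index[concept].add(doc_id)
    return index


def retrieve(index, query_concepts):
    """Return documents indexed with at least one query concept (assumed rule)."""
    hits = set()
    for concept in query_concepts:
        hits |= index.get(concept, set())
    return hits


INDEX = build_index(DOCUMENTS)
```

With the slide's data, a query on C1 and C2 reaches both documents, because D2 shares C1 with the query.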
Edit History Usage [Figure: Thesaurus versions 1 to 4, repositories of pre-indexed documents R1 to R4, and the edit history (new, modify, retire, merge, split) linking the versions; the search engine combines the edit history with the concepts used for retrieval.] • Documents are often indexed using different versions of a terminology. • Re-indexing documents to keep pace with changes to the terminology is impractical and can be very costly. • Edit history lets the search engine relate concepts across versions, which can improve precision and recall.
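One way to use edit history at query time, instead of re-indexing, is to expand a current concept with the older concepts it descends from. The sketch below is an assumption about how that could work, not the EVS implementation: the record layout `(action, source, result)`, the concept codes, and the function name `expand_with_predecessors` are all hypothetical.

```python
# Hypothetical edit-history records: (action, source concept, resulting concept).
EDIT_HISTORY = [
    ("split", "RAS", "HA-RAS"),
    ("split", "RAS", "KI-RAS"),
    ("merge", "BCL1", "CCND1"),
    ("merge", "PRAD1", "CCND1"),
]


def expand_with_predecessors(concept, history):
    """Return the concept plus every older concept it derives from.

    Only split and merge events are followed, and only backwards in time,
    so documents indexed under an earlier version can still be matched.
    """
    expanded = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for action, source, result in history:
            if action in ("split", "merge") and result == current:
                if source not in expanded:
                    expanded.add(source)
                    frontier.append(source)
    return expanded
```

Following events only backwards is a deliberate choice in this sketch: a query on one product of a split picks up the original concept but not its siblings, which limits the loss of precision.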
Edit History Storage
Terminology Development Environment
Terminology Development Environment • Previously, only three types of edit action were logged: add, modify, and delete. • Concepts created through split actions could not be told apart from newly created concepts. • Concepts merged into other concepts were indistinguishable from retired concepts. • Failure to explicitly track merge and split edit actions may result in a low recall* rate in information retrieval. * Recall is the number of relevant documents retrieved as a fraction of all relevant documents.
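The recall definition in the footnote can be made concrete with a few lines of code. The document sets below are made-up numbers chosen only to illustrate the calculation; they are not results from the talk.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were actually retrieved."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(retrieved & relevant) / len(relevant)


# Illustrative data only (an assumption, not measured results).
relevant = {"D1", "D2", "D3", "D4"}
without_history = {"D1"}            # only documents indexed with the current concept
with_history = {"D1", "D2", "D3"}   # also documents indexed with merged or split concepts

recall_without = recall(without_history, relevant)
recall_with = recall(with_history, relevant)
```

In this toy case recall rises from 1 of 4 to 3 of 4 relevant documents once documents indexed under earlier concepts become reachable.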
Approach Taken to Extend TDE • Create reusable concept edit tree Java bean • Develop user interface for processing split, merge, and retirement edit actions • Log edit events in TDE history database with clarity and precision
Extend Ontylog Editor With Plug-Ins • Use the Concept Edit Tree widget to build plug-ins
TDE Extension - Split Panel • Roles and properties may be transferred from one concept to another using drag and drop. • A concept is created as a result of a split. • The edit action is explicitly logged in the TDE History database as a split event.
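The split behavior described here can be sketched as a small data operation. This is only an illustration under stated assumptions: the `Concept` class, the `split_concept` function, and the tuple written to the log are hypothetical and do not reflect the actual TDE data model or its history database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical stand-in for a terminology concept."""
    name: str
    roles: set = field(default_factory=set)
    properties: set = field(default_factory=set)


def split_concept(original, new_name, roles_to_move, properties_to_move, history_log):
    """Create a new concept from `original`, move the selected items, log a split."""
    new_concept = Concept(
        name=new_name,
        roles=original.roles & roles_to_move,
        properties=original.properties & properties_to_move,
    )
    # Moved items leave the original concept, mirroring a drag-and-drop transfer.
    original.roles -= new_concept.roles
    original.properties -= new_concept.properties
    # Record the event as a split rather than as a plain "add".
    history_log.append(("split", original.name, new_name))
    return new_concept
```

Logging the event as `split` with both concept names is what later allows the new concept to be distinguished from an unrelated newly created one.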
TDE Extension - Merge Panel [Panel labels: Concept to stay, Concept to retire] • Non-redundant roles and properties are transferred from the retiring concept to the resultant merged concept. • The edit action is explicitly logged in the TDE History database as a merge event.
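The transfer of non-redundant roles and properties amounts to a set difference followed by a union. The sketch below assumes concepts are plain dictionaries of sets; `merge_concepts`, the role and property names, and the log tuple are hypothetical and not taken from TDE.

```python
def merge_concepts(surviving, retiring, history_log):
    """Move items the surviving concept lacks, then log a merge event.

    Returns the items that were actually transferred, keyed by kind.
    """
    transferred = {}
    for key in ("roles", "properties"):
        # Non-redundant means: present on the retiring concept only.
        new_items = retiring[key] - surviving[key]
        surviving[key] |= new_items
        transferred[key] = new_items
    # Record a merge, so the retiring concept is not mistaken for a plain retirement.
    history_log.append(("merge", retiring["name"], surviving["name"]))
    return transferred
```

A usage example with made-up role names:

```python
ccnd1 = {"name": "CCND1", "roles": {"regulates_cell_cycle"}, "properties": {"symbol"}}
bcl1 = {"name": "BCL1", "roles": {"regulates_cell_cycle", "near_translocation"}, "properties": set()}
log = []
moved = merge_concepts(ccnd1, bcl1, log)
```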
TDE Extension - Preretirement [Panel label: Concept to retire] • Sub-concepts are re-treed. • Role relationships that target (i.e., point to) the retiring concept are either removed or re-targeted. • A concept can be retired only if all preconditions are met.
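The preconditions listed on this slide can be expressed as a simple check. This sketch makes several assumptions that are not in the talk: each concept has a single parent stored in a `parent_of` mapping, role relationships are `(source, target)` pairs, and the names `retirement_blockers` and `can_retire` are hypothetical.

```python
def retirement_blockers(concept, parent_of, role_targets):
    """List the reasons the concept cannot yet be retired (empty if none)."""
    blockers = []
    # Precondition 1: no sub-concept may still hang under the retiring concept.
    children = sorted(child for child, parent in parent_of.items() if parent == concept)
    if children:
        blockers.append(f"sub-concepts not re-treed: {children}")
    # Precondition 2: no role relationship may still point to the retiring concept.
    inbound = sorted(source for source, target in role_targets if target == concept)
    if inbound:
        blockers.append(f"roles still targeting concept from: {inbound}")
    return blockers


def can_retire(concept, parent_of, role_targets):
    """A concept can be retired only if every precondition is met."""
    return not retirement_blockers(concept, parent_of, role_targets)
```

Returning the list of blockers, rather than only a boolean, is a design choice of this sketch so that a user interface could show what remains to be fixed.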
TDE Extension - Retire Panel • A non-editable tree shows concept definition information pertinent to the retiring concept. • The edit action is explicitly logged in the TDE History database as a retire event.
Next Steps • Consolidate the edit history logged by individual modelers in the terminology development environment (TDE) into concept history data useful to Distributed Terminology System (DTS) users
Next Steps • Extend caBIO and DTS Server capability to facilitate high-quality information retrieval [Figure: Architecture diagram showing caBIO.jar, XMLRPC server, XMLRPC client, DTS History API, DTS Extension, DTS Server, the edit history database, repositories of indexed documents, EVS, external databases, and end-user applications supplying the concepts used for retrieval; part of the diagram is marked "to be developed".]
Summary • Explicitly tracking edit actions in TDE is essential to terminology- and concept-based information retrieval. • We have extended the TDE Ontylog editor to explicitly track split, merge, and retirement edit events. • Concept history data and supporting APIs will soon become available to DTS users and developers through caBIO (Cancer Bioinformatics Infrastructure Objects).
EVS Team • Frank Hartel • Sherri De Coronado • Gilberto Fragoso • Margaret Haber • Larry Wright • Jim Oberthaler • Northrop Grumman, Inc. • Kevric Corporation • Aspen Inc. • Apelon, Inc. • Kim Ong • Iris Guo • Bob Dione
Contact • Dr. Francis W. Hartel, Center for Bioinformatics, National Cancer Institute, 6116 Executive Blvd., Rockville, MD 20892-8335 • Phone: (301) 435-3869 • Fax: (301) 480-4222 • Email: hartel@mail.nih.gov