Introducing ODIE
NCBO Seminar Series, February 18, 2009
Example
IE using ontologies
OE using documents: punch biopsy, junctional component, pagetoid spread, dermal melanocytes, Breslow depth, lymphocytic infiltrates, regression, microscopic satellites, vascular invasion, tumor infiltrating lymphocytes, Spitz nevus, epithelioid nevus
Two Tasks, One Problem
• Information Extraction (IE): uses the ontology as a source of concepts and relationships for extracting information from text (Specific Aims 1, 3, 5)
• Ontology Enrichment (OE): uses text as a source of concepts and relationships to enrich and validate the ontology (Specific Aims 2, 3, 4)
Specific Aims
• Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:
  - Named Entity Recognition (NER)
  - Co-reference Resolution (CR)
  - Discourse Reasoning (DR)
  - Attribute Value Extraction (AVE)
• Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:
  - Concept Discovery (CD)
  - Concept Clustering (CC)
  - Taxonomic Positioning (TP)
• Specific Aim 3: Develop reusable software for performing information extraction and ontology development, leveraging existing NCBO tools and compatible with the NCBO architecture.
• Specific Aim 4: Enhance the National Cancer Institute Thesaurus ontology using the ODIE toolkit.
• Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.
Ontology Enrichment
• Machine assisted
  - Extraction
  - Filtering and organization
  - Visualization
  - Suggestions
• Human decision-maker (developer, curator)
• Feedback and improvement of OE
Project Organization
• Concept Discovery: study and compare methods for ontology enrichment; design methods for evaluation (Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell)
• Coreference Resolution: develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement and test new algorithms (Wendy Chapman, Guergana Savova, Melissa Castine)
• ODIE 0.5: develop and implement architecture and UI; create framework for using results of research; implement work of research groups (Rebecca Crowley, Kevin Mitchell, Girish Chavan, Eugene Tseytlin)
Domain
• We will attempt to develop general tools whenever possible
• Priorities for evaluating components:
  - Radiology and pathology reports
  - NCIT as well as clinically relevant OBO ontologies (e.g., RadLex, FMA)
  - Cancer domains (including hematologic oncology)
Progress
• ODIE 0.5 pre-release on NCBO SourceForge
• Annotation software and document sets
• Research Project 1: LSP annotation project
• Research Project 2: Coreference resolution annotation
• Starting Research Project 3: Discourse reasoning
ODIE Software
• Toolkit for developers of NLP applications and ontologies
• Pre-released on NCBO SourceForge as ODIE 0.5
• Current release focuses on NER and CD
• Supports interaction and experimentation
• Packages systems at the conclusion of work with ODIE
• Fosters the cycle of enrichment and extraction needed to advance development of NLP systems
• Ontology enrichment as opposed to de novo development
• Human-machine collaboration as opposed to fully automated learning
ODIE Download/Info
• ODIE Installer: http://caties.cabig.upmc.edu/ODIE/odieinstaller.exe
• GForge Site: https://bmir-gforge.stanford.edu/gf/project/odie/
• User Forums: https://bmir-gforge.stanford.edu/gf/project/odie/forum/
• ODIE on NCBO Tools Page: http://bioontology.org/tools/ODIE.html
Users/Workflow
ODIE is intended for:
• users who want to use NCBO ontologies to perform various NLP tasks (they may need to add concepts locally to achieve sufficient performance)
• users who want to enrich ontologies using concepts derived from documents (very early in the process of ontology development)
Plans for ODIE 1.0
• Ability to import additional ontologies from BioPortal or from OWL files
• Ability to export proposal/enriched ontologies
• Ability to add and configure new processing resources (UIMA or GATE based)
• Ability to build processing pipelines from processing resources
• Will ship out of the box with a processing pipeline and processing resources for NER, CD and COREF
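To make "processing pipelines built from processing resources" concrete, here is a minimal Python sketch. It is not the UIMA or GATE API: `Document`, `ProcessingResource`, `tokenizer`, `make_dictionary_ner` and `run_pipeline` are hypothetical names. It only models the idea that each resource adds annotations to a shared document and that resources run in a configured order.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical shared annotation container (similar in spirit to a UIMA CAS
# or a GATE Document, but NOT their API).
@dataclass
class Document:
    text: str
    annotations: List[Tuple[str, int, int]] = field(default_factory=list)  # (type, start, end)

# A processing resource is modeled as any callable that adds annotations.
ProcessingResource = Callable[[Document], None]

def tokenizer(doc: Document) -> None:
    """Toy resource: annotate whitespace-delimited tokens."""
    pos = 0
    for tok in doc.text.split():
        start = doc.text.index(tok, pos)
        end = start + len(tok)
        doc.annotations.append(("Token", start, end))
        pos = end

def make_dictionary_ner(terms: List[str]) -> ProcessingResource:
    """Toy NER resource: exact, case-insensitive matches against a term list."""
    lowered = [t.lower() for t in terms]

    def ner(doc: Document) -> None:
        text = doc.text.lower()
        for term in lowered:
            start = text.find(term)
            while start != -1:
                doc.annotations.append(("Concept", start, start + len(term)))
                start = text.find(term, start + 1)

    return ner

def run_pipeline(doc: Document, resources: List[ProcessingResource]) -> Document:
    """Apply each resource in order; later resources can see earlier annotations."""
    for resource in resources:
        resource(doc)
    return doc

doc = run_pipeline(Document("Spitz nevus with pagetoid spread"),
                   [tokenizer, make_dictionary_ner(["Spitz nevus", "pagetoid spread"])])
concepts = [doc.text[s:e] for (t, s, e) in doc.annotations if t == "Concept"]
print(concepts)  # ['Spitz nevus', 'pagetoid spread']
```

Keeping resources as plain callables makes a pipeline just an ordered list, so adding or reordering a resource is a configuration change rather than a code change.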
Research Project 1: Ontology Enrichment
• Nearly completed survey of lexical, statistical and hybrid methods for ontology enrichment
• Methodology to study the "utility" of various approaches (Liu, PhD thesis in progress)
• The first project underway involves the simplest of the methods to be studied: lexico-syntactic patterns (LSP), i.e. regular expressions over part-of-speech (POS) tags
Concept Discovery: study and compare methods for ontology enrichment; design methods for evaluation (Kaihong Liu, Rebecca Crowley, Wendy Chapman, Kevin Mitchell)
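A sketch of what "regular expressions over POS tags" can look like, assuming the sentence has already been tagged with Penn Treebank tags (which tag "such as" as JJ IN) and using a deliberately simple noun-phrase definition (adjectives followed by nouns). The function names are hypothetical, and a real LSP inventory would include more patterns and richer phrase grammars.

```python
import re
from typing import List, Optional, Tuple

# Input is assumed to be tagged already (Penn Treebank tags); the tagger is
# out of scope here. Each token is a (word, tag) pair.
Tagged = List[Tuple[str, str]]

# Simplified noun phrase: optional adjectives (JJ), then one or more nouns (NN/NNS).
# (?<!\S) anchors the phrase at a token boundary.
NP = r"(?<!\S)(?:\S+/JJ )*(?:\S+/NNS? )*\S+/NNS?"

# "NP0 [,] such as NP1" suggests NP1 is a hyponym of NP0.
SUCH_AS = re.compile(rf"(?P<hyper>{NP})(?: ,/,)? such/JJ as/IN (?P<hypo>{NP})")

def to_tag_string(tagged: Tagged) -> str:
    """Encode tokens as 'word/TAG' so one regex can constrain both words and tags."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)

def strip_tags(span: str) -> str:
    """Turn 'nodular/JJ hidroadenoma/NN' back into 'nodular hidroadenoma'."""
    return " ".join(tok.rsplit("/", 1)[0] for tok in span.split())

def find_such_as(tagged: Tagged) -> Optional[Tuple[str, str]]:
    """Return (hyponym, hypernym) for the first 'such as' match, or None."""
    m = SUCH_AS.search(to_tag_string(tagged))
    if m is None:
        return None
    return strip_tags(m.group("hypo")), strip_tags(m.group("hyper"))

sentence = [("benign", "JJ"), ("eccrine", "JJ"), ("neoplasia", "NN"), (",", ","),
            ("such", "JJ"), ("as", "IN"), ("nodular", "JJ"), ("hidroadenoma", "NN")]
print(find_such_as(sentence))  # ('nodular hidroadenoma', 'benign eccrine neoplasia')
```

Encoding each token as `word/TAG` lets a single pattern pin the cue words ("such", "as") while constraining the surrounding phrases only by their tags.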
LSP Patterns
The presence of certain "lexico-syntactic patterns" can indicate a particular semantic relationship between two noun phrases.
Example: DIFFERENTIAL DIAGNOSIS INCLUDES, BUT IS NOT LIMITED TO, SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN (SUCH AS SCHWANNOMA) AND SPINDLE CELL MALIGNANT MELANOMA
Here "such as" indicates a hyponym relationship between two noun phrases: SCHWANNOMA is given as an instance of SPINDLE CELL NEOPLASM OF PERINEURIAL ORIGIN.
Technique 1: LSP
• PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
• COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
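Without POS tags, a surface-level matcher can still locate the two LSPs shown above and return the raw text on either side. This Python sketch is illustrative only: the pattern list, the relation labels and `find_lsps` are assumptions. The left context it returns for the second example still includes COMPATIBLE WITH, which is why phrase boundaries are left to domain experts in Step 1.

```python
import re
from typing import List, NamedTuple

class LSPMatch(NamedTuple):
    pattern: str    # which lexico-syntactic pattern fired
    relation: str   # relation the pattern suggests between the two sides
    before: str     # raw text left of the pattern (candidate phrase, untrimmed)
    after: str      # raw text right of the pattern (candidate phrase, untrimmed)

# Only the two cues from the examples; a real inventory would be larger.
PATTERNS = [
    # "X, such as Y" or "X (such as Y)": Y is a candidate hyponym of X
    ("such as", "hyponym",
     re.compile(r"^(?P<before>.*?),?\s*\(?such as\s+(?P<after>[^)]*)\)?", re.I)),
    # "X (aka Y)": Y is a candidate synonym of X
    ("aka", "synonym",
     re.compile(r"^(?P<before>.*?)\s*\(aka\s+(?P<after>[^)]*)\)", re.I)),
]

def find_lsps(sentence: str) -> List[LSPMatch]:
    """Return the first match of each pattern with its raw left and right context."""
    found = []
    for name, relation, regex in PATTERNS:
        m = regex.search(sentence)
        if m:
            found.append(LSPMatch(name, relation,
                                  m.group("before").strip(), m.group("after").strip()))
    return found

for s in ["PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
          "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA"]:
    print(find_lsps(s))
```

On the differential-diagnosis sentence, the left context runs all the way back to DIFFERENTIAL DIAGNOSIS INCLUDES, a good illustration of why surface matching only proposes candidates.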
LSP distribution results
[Chart: number of sentences containing lexico-syntactic patterns]
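Counts like the ones in this chart could be produced along these lines. The sketch assumes a sentence-split corpus and uses only the two cue phrases from the examples; `LSP_CUES` and `lsp_distribution` are hypothetical names.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Cue phrases taken from the examples; a real pattern inventory would be larger.
LSP_CUES: Dict[str, re.Pattern] = {
    "such as": re.compile(r"\bsuch\s+as\b", re.I),
    "aka": re.compile(r"\baka\b", re.I),
}

def lsp_distribution(sentences: Iterable[str]) -> Counter:
    """Count sentences containing each pattern (at most once per sentence),
    plus the number of sentences containing any pattern."""
    counts: Counter = Counter()
    for sentence in sentences:
        hit = False
        for name, regex in LSP_CUES.items():
            if regex.search(sentence):
                counts[name] += 1
                hit = True
        if hit:
            counts["any LSP"] += 1
    return counts

corpus = ["PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)",
          "COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA",
          "NO EVIDENCE OF MALIGNANCY"]
dist = lsp_distribution(corpus)
```

Counting sentences rather than matches keeps one sentence with a repeated cue from inflating the distribution.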
Step 1: Domain Expert Annotation
Annotation task: mark the meaningful medical phrases (MMPs) that can stand alone before and after each LSP. The phrases before and after an LSP must be related.
Legend: LSP • Before LSP • After LSP
• PRURIGO NODULE (aka LICHEN SIMPLEX CHRONICUS)
• COMPATIBLE WITH BENIGN ECCRINE NEOPLASIA, SUCH AS NODULAR HIDROADENOMA
Calculate: total number of MMPs; number of MMPs per LSP
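The two Step 1 statistics are simple counts. A sketch, assuming each expert-marked phrase is recorded with the id of the LSP occurrence it belongs to (the `MMPAnnotation` record and `mmp_stats` are hypothetical) and reading "number of MMPs per LSP" as the mean over all LSP occurrences. The example phrases are illustrative choices, not actual annotations from the project.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class MMPAnnotation:
    lsp_id: int   # hypothetical id of the LSP occurrence the phrase belongs to
    side: str     # "before" or "after" the LSP
    phrase: str   # meaningful medical phrase marked by the domain expert

def mmp_stats(annotations: List[MMPAnnotation],
              all_lsp_ids: Iterable[int]) -> Tuple[int, Dict[int, int], float]:
    """Return (total MMPs, MMPs per LSP occurrence, mean MMPs per LSP).
    LSP occurrences with no annotated phrase count as zero."""
    per_lsp: Dict[int, int] = {lsp_id: 0 for lsp_id in all_lsp_ids}
    for ann in annotations:
        per_lsp[ann.lsp_id] = per_lsp.get(ann.lsp_id, 0) + 1
    total = len(annotations)
    mean = total / len(per_lsp) if per_lsp else 0.0
    return total, per_lsp, mean

anns = [MMPAnnotation(1, "before", "PRURIGO NODULE"),
        MMPAnnotation(1, "after", "LICHEN SIMPLEX CHRONICUS"),
        MMPAnnotation(2, "before", "BENIGN ECCRINE NEOPLASIA"),
        MMPAnnotation(2, "after", "NODULAR HIDROADENOMA")]
total, per_lsp, mean = mmp_stats(anns, all_lsp_ids=[1, 2, 3])
```

Passing the full list of LSP ids matters: an LSP where the expert found no standalone phrase should lower the average rather than disappear from it.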
Step 2: Curator Judgment
For each term:
• Is the concept in the ontology?
• If not, should it be added to the ontology?
• If not, what is the reason?
For each pair of terms:
• What is the relationship between them?
• Does this relationship exist in the ontology?
• If not, should it be added to the ontology?
• If not, what is the reason?
Outcome measures: new concept and relationship suggestion rates; new concept and relationship acceptance rates
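The slide does not define the two rates. One plausible reading, stated here as an assumption: the suggestion rate is the share of candidate concepts (or relationships) not already in the ontology, and the acceptance rate is the share of those new candidates the curator decides to add. A Python sketch under that reading, with a hypothetical `Judgment` record mirroring the curator questions:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass
class Judgment:
    in_ontology: bool            # Is the concept (or relationship) already in the ontology?
    should_add: Optional[bool]   # If not, should it be added? (None when already present)
    reason: str = ""             # If not added, the curator's reason

def rates(judgments: Iterable[Judgment]) -> Tuple[float, float]:
    """Return (suggestion rate, acceptance rate) under the ASSUMED definitions:
    suggestion rate = share of candidates not yet in the ontology;
    acceptance rate = share of those new candidates the curator would add."""
    js = list(judgments)
    new = [j for j in js if not j.in_ontology]
    suggestion = len(new) / len(js) if js else 0.0
    acceptance = sum(1 for j in new if j.should_add) / len(new) if new else 0.0
    return suggestion, acceptance

concept_judgments = [Judgment(True, None),
                     Judgment(False, True),
                     Judgment(False, False, "hypothetical reason"),
                     Judgment(False, True)]
suggestion_rate, acceptance_rate = rates(concept_judgments)
```

The same function can be applied separately to concept judgments and to relationship judgments.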
Research Project 2: Coreference Resolution
Anaphoric relations are relations between linguistic expressions in which the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent).
Examples of types of anaphoric relations: identity (or coreference), set/subset, part/whole
Anaphora resolution is a computational technique for discovering anaphoric relations.
Coreference Resolution: develop annotation scheme; create Reference Standard; consider and test existing algorithms; design, implement and test new algorithms (Wendy Chapman, Guergana Savova, Melissa Castine)
Definitions
Anaphoric relations are relations between linguistic expressions in which the interpretation of one linguistic expression (the anaphor) relies on the interpretation of another linguistic expression (the antecedent).
Types of anaphoric relations: identity (or coreference), set/subset, part/whole, other
Anaphora resolution is a computational technique for the discovery of anaphoric relations.
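Identity links are commonly consolidated into coreference chains by taking their transitive closure, while set/subset and part/whole links connect distinct entities and are kept separate. A minimal union-find sketch, assuming mentions carry hypothetical integer ids and links are recorded as (anaphor, antecedent, relation) triples:

```python
from collections import defaultdict
from enum import Enum
from typing import Dict, List, Tuple

class Relation(Enum):
    IDENTITY = "identity"        # coreference
    SET_SUBSET = "set/subset"
    PART_WHOLE = "part/whole"
    OTHER = "other"

# (anaphor id, antecedent id, relation type); ids are hypothetical.
Link = Tuple[int, int, Relation]

def coreference_chains(links: List[Link]) -> List[List[int]]:
    """Group mentions into chains using only identity links (union-find).
    Mentions that appear only in non-identity links are not reported."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    for anaphor, antecedent, relation in links:
        if relation is Relation.IDENTITY:
            union(anaphor, antecedent)

    chains: Dict[int, List[int]] = defaultdict(list)
    for mention in list(parent):
        chains[find(mention)].append(mention)
    return sorted(sorted(chain) for chain in chains.values())

links = [(2, 1, Relation.IDENTITY), (3, 2, Relation.IDENTITY),
         (5, 4, Relation.PART_WHOLE), (6, 4, Relation.IDENTITY)]
print(coreference_chains(links))  # [[1, 2, 3], [4, 6]]
```

Mention 5 is linked to 4 only by a part/whole relation, so it does not join 4's chain.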
Progress
Completed and ongoing:
• Annotation schema and guidelines development
• Training of annotators (4 training sessions)
  - Inter-annotator agreement (IAA) after session 1: in the 40s
  - IAA after session 3: in the 60s
Planned:
• Complete Reference Standard (RS)
• Algorithm testing and further development
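The slides do not say which agreement measure produced these figures. For coreference annotation, one common choice is exact-match F1 between the two annotators' sets of links; the sketch below assumes that measure and a hypothetical link encoding. A score of 0.45 would then correspond to "in the 40s".

```python
from typing import Set, Tuple

# A link is (anaphor span, antecedent span, relation type); spans are
# hypothetical (start, end) character offsets.
Link = Tuple[Tuple[int, int], Tuple[int, int], str]

def agreement_f1(a: Set[Link], b: Set[Link]) -> float:
    """Exact-match F1 between two annotators' link sets: 2|A∩B| / (|A| + |B|).
    Symmetric, so it does not matter which annotator is the reference."""
    if not a and not b:
        return 1.0  # both annotators marked nothing: trivially in agreement
    return 2 * len(a & b) / (len(a) + len(b))

annotator_a = {((10, 14), (0, 5), "identity"), ((30, 34), (20, 25), "identity")}
annotator_b = {((10, 14), (0, 5), "identity"), ((30, 34), (20, 25), "part/whole")}
score = agreement_f1(annotator_a, annotator_b)
```

Exact matching is strict: a link counts only if both spans and the relation type agree, so span-boundary disagreements lower the score even when annotators linked the same entities.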
Data Sets for RS
• 50 clinical notes (named entities annotated)
• 50 Pathology (disorders, tumors)
• 20 Pathology (conditions)
• 20 Radiology (conditions)
• 20 Discharge summaries (conditions)
• 20 ED (conditions)
• 20 ED (respiratory conditions)
Sites: Mayo, Pitt