Discovering Severity and Body Site Modifiers Dmitriy Dligach, Ph.D. Boston Children’s Hospital and Harvard Medical School
Acknowledgements Relation Extraction team: • Steve Bethard • Lee Becker • Wei-Te Chen • Guergana Savova Annotation team • David Harris, Glenn Zaramba, Donna Ihrke • Dann Albright and his team
Motivation • Clinical Element Model template has attributes/modifiers for body site and severity • Critical to discover these modifiers to normalize to and populate a CEM template • Body site modifiers: • “diverticulosis of sigmoid colon” • “LUNGS: Equal AE bilaterally, no rales, no rhonchi.” • Severity: • “low-grade fever” • “severe headache”
Relation Extraction • Cast as Relation Extraction for two types of UMLS relations LocationOf(Anatomical Site, Disease/Disorder) DegreeOf(Modifier, Disease/Disorder) • Example: “LUNGS: Equal AE bilaterally, no rales, no rhonchi.” LocationOf(LUNGS, rhonchi) LocationOf(LUNGS, rales)
Prerequisites: Entities • LocationOf • Automatic entity discovery • cTAKES extracts entities of these UMLS semantic types: • Drug • Disorder • Sign/Symptom • Procedure • Anatomical Site • DegreeOf • Modifiers • Entities
Prerequisites: Modifiers • Modifier discovery module • Implemented in cTAKES • BIO (Begin, Inside, Outside) representation • Word features • Algorithm: SVM • Informal evaluation results • All automatically discovered modifiers appear to be valid
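The BIO representation mentioned above can be illustrated with a minimal sketch. It is not the cTAKES module itself: the function name `bio_tags`, the tokenization of "low-grade fever", and the span offsets are assumptions made for illustration only.

```python
def bio_tags(tokens, spans):
    """Encode modifier spans as BIO tags.

    spans: list of (start, end) token offsets, end exclusive (hypothetical format).
    """
    tags = ["O"] * len(tokens)          # every token starts Outside
    for start, end in spans:
        tags[start] = "B"               # first token of a modifier
        for i in range(start + 1, end):
            tags[i] = "I"               # remaining tokens of the same modifier
    return tags


# Assumed tokenization of a sentence containing the modifier "low-grade".
tokens = ["Patient", "reports", "low", "-", "grade", "fever", "."]
tags = bio_tags(tokens, [(2, 5)])
```

A sequence classifier such as the SVM named on the slide would then predict one of these tags per token from word features.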
Approach • Supervised learning • Input: a pair of entities • Output: relation / no relation label • Sample sentence • “LUNGS: Equal AE bilaterally, no rales, no rhonchi.” ?(LUNGS, rhonchi) ?(LUNGS, rales) ?(rales, rhonchi)
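As a rough sketch of how the candidate pairs in the sample sentence arise, every unordered pair of entities in a sentence can be enumerated. The plain-string entity representation here is a simplification assumed for illustration; the actual system works with cTAKES annotations.

```python
from itertools import combinations

# Entities found in "LUNGS: Equal AE bilaterally, no rales, no rhonchi."
entities = ["LUNGS", "rales", "rhonchi"]

# Each unordered pair becomes one classification instance: relation or no relation.
candidates = list(combinations(entities, 2))

for arg1, arg2 in candidates:
    print(f"?({arg1}, {arg2})")
```

The number of candidates grows quadratically with the number of entities in a sentence, and most of them carry no relation.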
Learning • Training • Pair up all entity pairs • Assign a gold relation label (including NONE) • Downsample • Train an SVM model • Testing • Pair up all entities in test set • Pass to the model • Assign label
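The labeling and downsampling steps of training might look like the sketch below. The helper names `label_pairs` and `downsample`, the dictionary of gold relations, and the `keep_prob` parameter name are hypothetical; the slides only state that negative examples are kept with some probability.

```python
import random


def label_pairs(pairs, gold):
    """Attach the gold relation label to each pair, or NONE if no relation is annotated."""
    return [(pair, gold.get(pair, "NONE")) for pair in pairs]


def downsample(examples, keep_prob, rng):
    """Keep every positive example; keep each NONE example with probability keep_prob."""
    kept = []
    for pair, label in examples:
        if label != "NONE" or rng.random() < keep_prob:
            kept.append((pair, label))
    return kept


pairs = [("LUNGS", "rales"), ("LUNGS", "rhonchi"), ("rales", "rhonchi")]
gold = {("LUNGS", "rales"): "LocationOf", ("LUNGS", "rhonchi"): "LocationOf"}

examples = label_pairs(pairs, gold)
training_set = downsample(examples, keep_prob=0.5, rng=random.Random(0))
```

Downsampling applies to training only; at test time all pairs are passed to the model.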
Features • Word features • Words of mentions • Context words • Distance • Named entity features • Entity types • Entity context • POS features • POS tags of entities • POS tags between entities • Dependency features • Distance to common ancestor • Dependency path features • Chunking features • Head words of phrases between entities • Phrase head context • Wikipedia features • Entity similarity • Article titles
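A few of the simpler feature groups (words of mentions, entity types, distance, words between entities) can be sketched as follows. This is an illustrative stand-in, not the ClearTK feature extractors used in the project; the mention dictionaries, feature names, and tokenization are assumptions.

```python
def extract_features(tokens, arg1, arg2):
    """Build a feature dictionary for one candidate pair.

    arg1/arg2: dicts with 'start' and 'end' token offsets (end exclusive)
    and a 'type' string (hypothetical mention format).
    """
    first, second = sorted((arg1, arg2), key=lambda m: m["start"])
    between = tokens[first["end"]:second["start"]]
    features = {
        "arg1_text": " ".join(tokens[arg1["start"]:arg1["end"]]).lower(),
        "arg2_text": " ".join(tokens[arg2["start"]:arg2["end"]]).lower(),
        "arg1_type": arg1["type"],
        "arg2_type": arg2["type"],
        "token_distance": len(between),   # number of tokens separating the mentions
    }
    for word in between:
        features["between_word=" + word.lower()] = 1   # bag of words between mentions
    return features


tokens = ["LUNGS", ":", "Equal", "AE", "bilaterally", ",",
          "no", "rales", ",", "no", "rhonchi", "."]
lungs = {"start": 0, "end": 1, "type": "AnatomicalSite"}
rhonchi = {"start": 10, "end": 11, "type": "SignSymptom"}  # type label is illustrative

features = extract_features(tokens, lungs, rhonchi)
```

POS, dependency, chunking, and Wikipedia features would require a parser or external resources and are omitted from this sketch.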
ClearTK Integration • Tutorial by Steve Bethard today • Feature extraction • Common interface for feature extractors • Many commonly used feature extractors available • Training • Commonly used machine learning packages • Training data writers • Evaluation framework • N-fold cross validation • Training and testing
Gold Annotated Data • SHARP • All entity types • Total notes: 80 • Total instances of LocationOf: 1852 • Total instances of DegreeOf: 308 • ShARe (Shared Annotation Resources, PIs: Chapman, Elhadad, Savova) • Anatomical Sites and Disease/Disorders • Total notes: 130 • Total instances of LocationOf: 2190 • Total instances of DegreeOf: 702
Evaluation • Two-fold cross validation • LibSVM • Linear kernel • Parameter search • Kernel (Linear/RBF) • SVM Cost parameter • RBF gamma parameter • Probability of keeping a negative example • Evaluation is on gold entities as input to the relation classifier
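The parameter search and two-fold setup could be organized roughly as below. All candidate values are placeholders, not the values actually searched, and splitting by note is an assumption; the slides do not state the unit of the split.

```python
from itertools import product


def parameter_grid(costs, gammas, keep_probs):
    """Enumerate configurations; gamma is only relevant for the RBF kernel."""
    grid = []
    for cost, keep_prob in product(costs, keep_probs):
        grid.append({"kernel": "linear", "cost": cost, "keep_prob": keep_prob})
        for gamma in gammas:
            grid.append({"kernel": "rbf", "cost": cost,
                         "gamma": gamma, "keep_prob": keep_prob})
    return grid


def two_fold_splits(note_ids):
    """Return (train, test) pairs so that each half serves once as the test set."""
    half_a = note_ids[0::2]
    half_b = note_ids[1::2]
    return [(half_a, half_b), (half_b, half_a)]


# Placeholder search values (assumptions for illustration).
grid = parameter_grid(costs=[0.1, 1.0, 10.0], gammas=[0.01, 0.1], keep_probs=[0.5, 1.0])
splits = two_fold_splits(["note1", "note2", "note3", "note4"])
```

Each configuration would be trained and scored on both splits, and the configuration with the best average F1 retained.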
Current Results • SHARP data • LocationOf: F1 = 0.72 • DegreeOf: F1 = 0.93 • ShARe data • LocationOf: F1 = 0.88 • DegreeOf: F1 = 0.94 • Best parameters • Linear kernel • Downsampling rate: 0.5 • Best features • Entity features • Word features
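For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall over predicted relations. The sketch below computes it on toy data invented for illustration; it does not reproduce the reported scores, and exact-match scoring of (label, arg1, arg2) triples is an assumption about the evaluation.

```python
def f1_score(gold, predicted):
    """F1 over sets of (label, arg1, arg2) triples, using exact match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: one correct prediction, one spurious, one missed.
gold = [("LocationOf", "LUNGS", "rhonchi"), ("LocationOf", "LUNGS", "rales")]
predicted = [("LocationOf", "LUNGS", "rales"), ("LocationOf", "rales", "rhonchi")]

score = f1_score(gold, predicted)   # precision 0.5, recall 0.5
```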
Future Directions • Evaluation on a held-out test set • Evaluate on cTAKES-generated entities • Other types of relations relevant for CEM • Detailed error analysis • Release cTAKES module (July 2012) with trained models • Publish our findings
Funding • The Strategic Health IT Advanced Research Projects (SHARP) Program (90TR002) administered by the Office of the National Coordinator for Health Information Technology • Integrating Informatics and Biology to the Bedside (i2b2) NCBO U54LM008748