Identifying Negation/Uncertainty Attributes for SHARPn NLP
Presentation to SHARPn Summit “Secondary Use”, June 11-12, 2012
Cheryl Clark, PhD
MITRE Corporation

The Challenge: Text Mentions versus Clinical Facts
• Negation: event has not occurred or entity does not exist
  She had no fever yesterday.
• Uncertainty: a measure of doubt
  The symptoms are not inconsistent with renal failure.
• Conditional: could exist or occur under certain circumstances
  The patient should come back to the ED if any rash occurs.
• Subject: person the observation is on; experiencer
  Mother had lung cancer.
• Generic: no clear subject/experiencer
  E. coli is sensitive to Cipro but enterococcus is not.
[Slide callouts. Cues: no; not inconsistent with; if; Mother. Entities: fever, renal infarction, rash, lung cancer, Cipro, … Attribute values: no, uncertain, conditional, family member, generic]

Background: Assertion Analysis Tool, Version 1
[Pipeline diagram; box labels as extracted from the slide]
• Input docs
• Identify sections
• Extract words, concepts, locations
• Negation & Uncertainty Cue/Scope Tagger
• Compute scope enclosures by rule
• Identify word classes and ordering
• Assertion Classifier (Maximum Entropy)
• i2b2 concepts
• i2b2 assertions
Independent Evaluation: i2b2/VA 2010 Clinical NLP Challenge, Assertion Status Task, F Score = 0.93

Assertion Status Integration within SHARPn Clinical Document Pipeline
[Pipeline diagram; box labels as extracted from the slide]
• Input docs
• cTAKES analysis engines …
• Identify sections
• Extract words, concepts, locations
• Annotations
• Negation & Uncertainty Cue/Scope Tagger
• Compute scope enclosures by rule
• Identify word classes and ordering
• Assertion Classifier (Maximum Entropy)
• Updated attribute annotations
All annotations are UIMA Common Analysis Structure (CAS)

i2b2 Assertion Categories
• Assertion classification system designed to meet requirements of 2010 i2b2/VA Challenge Assertion subtask
• Present: default category
  Patient had a stroke
• Absent: problem does not exist in the patient
  History inconsistent with stroke
• Possible: uncertainty expressed
  We are unable to determine whether she has leukemia
• Conditional: patient experiences the problem only under certain conditions
  Patient reports shortness of breath upon climbing stairs
• Hypothetical: medical problems the patient may develop (corresponds to SHARPn conditional)
  If you experience wheezing or shortness of breath
• Not Patient: problem associated with someone who is not the patient
  Family history of prostate cancer

Re-architecting Assertions
• i2b2 assertion output values
  • defined for medical problems
  • closed set of values
  • mutually exclusive (fixed priority when multiple values apply)
  • single, multi-way classifier
  • values: present, absent, possible, hypothetical, not patient, conditional (no SHARPn equivalent)
• SHARPn assertion attributes
  • apply to various entities, events, relations
  • independent
  • attributes can have multiple values
  • additional attributes may be added
  • multiple classifiers, some binary
  • attributes: negation yes/no; uncertainty yes/no; conditional yes/no; subject multi-valued (patient, family, donor, other…); …

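The contrast between the two representations can be sketched as a pair of data types. This is a minimal illustration, not the MASTIF or cTAKES type system: the class names, field names, and default values are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class I2b2Assertion(Enum):
    """Closed, mutually exclusive label set: exactly one value per problem mention."""
    PRESENT = "present"
    ABSENT = "absent"
    POSSIBLE = "possible"
    CONDITIONAL = "conditional"
    HYPOTHETICAL = "hypothetical"
    NOT_PATIENT = "not patient"


@dataclass
class SharpnAttributes:
    """Independent attributes; names and defaults here are illustrative assumptions."""
    negation: bool = False
    uncertainty: bool = False
    conditional: bool = False
    subject: str = "patient"  # multi-valued: patient, family, donor, other, ...


# A negated family-history mention must collapse to a single i2b2 label.
# Which label wins is fixed by priority; NOT_PATIENT is used here only as an example.
i2b2_label = I2b2Assertion.NOT_PATIENT

# Independent attributes can record both facts at once.
sharpn = SharpnAttributes(negation=True, subject="family")
print(i2b2_label.value, sharpn)
```

Adding a further attribute in this sketch means adding one field, which leaves the existing attributes and their values untouched.
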
Assertion Module Refactoring: Phase 1
• Simple mapping from i2b2 assertion classes to SHARPn attributes
  • Uses existing i2b2-trained single classifier model
  • Identifies i2b2/SHARPn equivalences
  • Maps to SHARPn attribute values
Example: Please call physician if you develop [shortness of breath].
  i2b2 assertion status = “hypothetical”
  SHARPn conditional attribute = “true”

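A Phase 1 style mapping could look roughly like the lookup table below. The absent/negation, possible/uncertainty, and hypothetical/conditional pairings follow the equivalences named in these slides; the dictionary layout, the default values, the use of "family_member" for every "not patient" case, and the handling of i2b2 "conditional" (left at defaults because the slides list no SHARPn equivalent) are assumptions of this sketch.

```python
# Default SHARPn-style attribute values (assumed for illustration).
DEFAULTS = {
    "negation": False,
    "uncertainty": False,
    "conditional": False,
    "subject": "patient",
}

# Overrides applied for each i2b2 assertion class.
I2B2_TO_SHARPN = {
    "present": {},
    "absent": {"negation": True},
    "possible": {"uncertainty": True},
    "hypothetical": {"conditional": True},
    "not patient": {"subject": "family_member"},  # assumption: one non-patient value
    "conditional": {},  # no SHARPn equivalent; left at defaults (assumption)
}


def map_i2b2_to_sharpn(i2b2_label):
    """Return a dict of SHARPn-style attributes for one i2b2 assertion label."""
    attributes = dict(DEFAULTS)
    attributes.update(I2B2_TO_SHARPN[i2b2_label])
    return attributes


# "Please call physician if you develop shortness of breath."
print(map_i2b2_to_sharpn("hypothetical"))
```

Because the input is a single mutually exclusive label, a mapping like this can set at most one non-default attribute per mention.
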
Assertion Module Refactoring: Phase 2
• Direct assignment of SHARPn attribute values
  • Will use multiple classifiers trained on SHARPn data
  • Will identify attribute values directly
• Benefits
  • Aligns with SHARPn concept attributes requirements
  • Aligns with SHARPn clinical data annotation
  • Enables more accurate meaning representation
Example: He does not smoke, has no hypertension, and has no family history of coronary artery disease.
  Cues: “no” (negator), “family”
  i2b2 2010 Paradigm: choose one of present, absent, possible, hypothetical, conditional, not patient (here: absent or not patient)
  SHARPn Attribute Paradigm: negation = present; subject = family_member

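The idea of one classifier per attribute can be sketched as follows. The two "classifiers" here are toy cue lookups standing in for trained models, and the cue lists, function names, and comma-based clause splitting are all assumptions of the example. The sentence is the slide example with its cue words restored.

```python
import re

# Toy cue lexicons (assumptions; not the trained SHARPn models).
NEGATION_CUES = {"no", "not", "denies", "without"}
FAMILY_CUES = {"family", "mother", "father"}


def clause_before(sentence, entity):
    """Return lowercase tokens between the last comma and the entity mention."""
    prefix = sentence[: sentence.index(entity)]
    clause = prefix.split(",")[-1]
    return re.findall(r"[a-z]+", clause.lower())


def classify_negation(tokens):
    return any(token in NEGATION_CUES for token in tokens)


def classify_subject(tokens):
    return "family_member" if any(t in FAMILY_CUES for t in tokens) else "patient"


# Each attribute has its own classifier; none constrains the others.
CLASSIFIERS = {"negation": classify_negation, "subject": classify_subject}


def assign_attributes(sentence, entity):
    """Run every attribute classifier independently on the same context."""
    tokens = clause_before(sentence, entity)
    return {name: classifier(tokens) for name, classifier in CLASSIFIERS.items()}


sentence = ("He does not smoke, has no hypertension, "
            "and has no family history of coronary artery disease.")
print(assign_attributes(sentence, "hypertension"))
print(assign_attributes(sentence, "coronary artery disease"))
```

The second mention receives both a negation value and a non-patient subject, which a single mutually exclusive label cannot express.
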
System Errors => Need for Better Linguistic Analysis for Assertions
• Need for phrasal structure; scope extent not always enough
  She had [no chest pain or chest pressure] with this and this was deemed a negative test.
  [Slide callouts: negated; not negated]

Syntactic Approaches*
• Insert a signifier node into constituency parse above entity
• Use tree kernel methods to compare similarity with negated sentences in training data (can be used on other modifiers as well with varying degrees of success)
* Slide courtesy of Tim Miller, Children’s Hospital Boston

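The first bullet, inserting a signifier node above the entity, can be illustrated on a hand-built parse. This is a sketch under stated assumptions: the nested-list tree encoding, the `ENTITY` label, and the example parse (including its `NML` node) are invented for the illustration, and the tree kernel comparison itself is not shown.

```python
def leaves(tree):
    """Return the words under a node; a tree is [label, child, ...] or a word string."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def insert_signifier(tree, entity, label="ENTITY"):
    """Copy the tree, wrapping the lowest constituent that spans exactly `entity`."""
    if isinstance(tree, str):
        return tree
    node = [tree[0]] + [insert_signifier(child, entity, label) for child in tree[1:]]
    child_spans_entity = any(
        leaves(child) == entity for child in tree[1:] if not isinstance(child, str)
    )
    if leaves(tree) == entity and not child_spans_entity:
        return [label, node]
    return node


def to_bracket(tree):
    """Render a tree in bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + tree[0] + " " + " ".join(to_bracket(c) for c in tree[1:]) + ")"


# Hand-built parse of "She had no chest pain" (an assumed analysis).
parse = ["S",
         ["NP", ["PRP", "She"]],
         ["VP", ["VBD", "had"],
          ["NP", ["DT", "no"], ["NML", ["NN", "chest"], ["NN", "pain"]]]]]

marked = insert_signifier(parse, ["chest", "pain"])
print(to_bracket(marked))
```

With the marker in place, the position of the entity relative to the negation cue becomes part of the tree structure that a kernel can compare.
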
Tree kernel fragment mining*
• Use TK model to extract tree fragment features (Pighin & Moschitti 07)
• Allows interaction with other feature types
• Faster to find fragments than do whole-tree comparisons
* Slide courtesy of Tim Miller, Children’s Hospital Boston

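How explicit fragment features can sit alongside other feature types is sketched below. Note the substitution: instead of mining fragments from a trained tree kernel model as the slide describes, this example simply enumerates depth-one fragments (production rules). The tree encoding, the feature-name prefixes, and the example cue feature are assumptions.

```python
from collections import Counter


def productions(tree):
    """Yield one 'PARENT -> CHILD ...' string per internal node of a nested-list tree."""
    if isinstance(tree, str):
        return
    child_labels = [c if isinstance(c, str) else c[0] for c in tree[1:]]
    yield tree[0] + " -> " + " ".join(child_labels)
    for child in tree[1:]:
        yield from productions(child)


def featurize(tree, other_features):
    """Merge fragment counts with any other features into one flat feature dict."""
    features = Counter("frag:" + p for p in productions(tree))
    features.update(other_features)
    return dict(features)


# Assumed parse fragment for "no chest pain" with an entity marker node.
tree = ["NP", ["DT", "no"], ["ENTITY", ["NML", ["NN", "chest"], ["NN", "pain"]]]]
features = featurize(tree, {"cue:no": 1})
print(features)
```

Once fragments are plain named features, they can be fed to the same classifier as lexical or section features without running a whole-tree comparison at prediction time.
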
Next Steps: Assertions for Relations
• Some assertion attributes apply to relations, too.
  • negation
  • uncertainty
  • conditional
Example: The [fundal AVMs] are a potential [site of bleeding] although do not explain the extent of [bleeding].
  location relation: uncertain (cue: “potential”)
  causal relation: negated (cue: “not”)

Next Steps: Classifier Retraining and Component Evaluation
• Model Retraining
  • Models for individual attributes
  • Linguistic features based on parser output
  • Training on SHARPn data
  • Enhancements to parsers
• Evaluation
  • Accuracy on i2b2 gold annotations vs. accuracy on SHARPn gold annotations
    • i2b2 absent vs. SHARPn negated
    • i2b2 possible vs. SHARPn uncertainty
    • i2b2 hypothetical vs. SHARPn conditional
  • Evaluation based on system-generated entity annotations
  • Evaluation on CEM concept rather than on individual mentions

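Per-attribute evaluation against gold annotations could be computed along these lines. This is a generic precision/recall/F1 sketch, not the scoring used in the i2b2 challenge or by the SHARPn team; the label lists are made-up toy values for illustration only.

```python
def precision_recall_f1(gold, predicted):
    """Score one binary attribute; `gold` and `predicted` are aligned lists of booleans."""
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g and p)
    false_pos = sum(1 for g, p in pairs if not g and p)
    false_neg = sum(1 for g, p in pairs if g and not p)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels for four mentions (not real evaluation data).
gold = {"negation": [True, False, True, False],
        "uncertainty": [False, False, True, False]}
predicted = {"negation": [True, False, False, False],
             "uncertainty": [False, True, True, False]}

for attribute in gold:
    p, r, f = precision_recall_f1(gold[attribute], predicted[attribute])
    print(f"{attribute}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Scoring each attribute separately makes comparisons such as i2b2 absent versus SHARPn negated straightforward, since each pairing reduces to one binary decision per mention.
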
SHARPn Negation/Uncertainty Team
John Aberdeen, David Carrell, Cheryl Clark, Matt Coarr, Scott Halgrim, Lynette Hirschman, Donna Ihrke, Tim Miller, Guergana Savova, Ben Wellner
Thank you!

Clarifying Definitions
• Negation and temporal
  The patient had the tumor removed.
  The text span “removed” indicates the tumor was there but does not exist anymore.
  Originally annotated as negated. No longer annotated as negated.
  Course: degree_of (tumor, CHANGED (span for “removed”))
• Circumstantial negation (i2b2 calls this conditional)
  While smoking, he does not use his nicotine patch
  Annotated as negated
• Allergens
  ALLERGIES: PCN, Sulpha, Zocor, Asendin, Rocephin
  Medications mentioned as allergens originally negated
  • Allergen status distinguished from negation
  • Allergy_indicator_class

System Errors => Need for Better Linguistic Analysis for Assertions
• She had no signs of infection on her leg wounds and she did have some mild erythema around her right great toe
  Issue is structure and not simply span extent:
  present = should not be negated
  absent = negated
• She had [no chest pain or chest pressure] with this and this was deemed a negative test.
  [Slide callouts: negated; not negated]

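The failure on the first sentence can be reproduced with a deliberately naive scope rule. Both functions below are illustrative stand-ins and not the MASTIF cue/scope tagger: the cue list, the terminator list, and the function names are assumptions.

```python
import re

NEGATION_CUES = {"no", "not"}
# Assumed scope terminators; a real system would use phrase structure instead.
SCOPE_TERMINATORS = {"and", "but"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def find(tokens, sub):
    """Return the index where the token sequence `sub` first occurs in `tokens`."""
    for i in range(len(tokens) - len(sub) + 1):
        if tokens[i:i + len(sub)] == sub:
            return i
    raise ValueError("entity not found")


def negated_naive(sentence, entity):
    """Span-only rule: any cue earlier in the sentence negates the entity."""
    tokens = tokenize(sentence)
    start = find(tokens, tokenize(entity))
    return any(token in NEGATION_CUES for token in tokens[:start])


def negated_with_terminators(sentence, entity):
    """Scan leftward from the entity; stop at a terminator before reaching a cue."""
    tokens = tokenize(sentence)
    start = find(tokens, tokenize(entity))
    for token in reversed(tokens[:start]):
        if token in SCOPE_TERMINATORS:
            return False
        if token in NEGATION_CUES:
            return True
    return False


sentence = ("She had no signs of infection on her leg wounds and she did have "
            "some mild erythema around her right great toe")
for entity in ("signs of infection", "mild erythema"):
    print(entity, negated_naive(sentence, entity),
          negated_with_terminators(sentence, entity))
```

The terminator list repairs this one sentence, but a word list cannot tell a conjunction that joins clauses from one that joins noun phrases inside the negated span, which is the slide's argument for using phrasal structure.
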
MASTIF-Generated SHARPn attributes in cTAKES Output
• [Add screenshot]
[Slide callouts: default values; calculated value]

Assertions for Different Concept Types
[Slide callouts: polarity = -1; negated]

Issues: Differences in training data annotation
• UMLS CUI-driven annotation (SHARPn)
• UMLS contains some concept-internal negation; concept-internal subject
  Cigarette smoker: Concept [C0337667] (finding)
  Never smoked: Concept [C0425293] Never smoked tobacco (finding)
  Non-smoker: Concept [C0337672] Non-smoker (finding)
  Mother smokes: Concept [C0424969] (finding)
  Father smokes: Concept [C0424968] (finding)
  Mother does not smoke: Concept [C2586137] (finding)
  Father does not smoke: Concept [C2733448] (finding)
• i2b2 concept excludes contextual cues; SHARPn concept includes them.
  The patient has never smoked.
  i2b2 concept: smoked (negated)
  SHARPn concept: never smoked (not negated)

Issue: Differences in training data annotation
• No known allergies
  Concept: [C0262580] No known allergies
  i2b2: concept = known allergies; type = problem; assertion = absent
  SHARPn: concept = no known allergies; type = disease/disorder (finding in UMLS); assertion = present
• NKA
  i2b2: concept = nka; type = problem; assertion = absent

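One way to see why the two annotation styles disagree on the surface but agree in meaning is to combine concept-internal negation with the negation attribute. The helper below is hypothetical and not part of MASTIF or cTAKES; the CUI set is this sketch's reading of which concepts on the slides carry internal negation, and the XOR combination is an assumption.

```python
# CUIs from the slides read here as carrying concept-internal negation.
CONCEPT_INTERNAL_NEGATION = {
    "C0425293",  # Never smoked tobacco
    "C0337672",  # Non-smoker
    "C2586137",  # Mother does not smoke
    "C2733448",  # Father does not smoke
    "C0262580",  # No known allergies
}


def effective_negation(cui, negation_attribute):
    """Hypothetical helper: XOR concept-internal negation with the attribute value.

    `cui` may be None when the annotation scheme does not assign a concept ID.
    """
    return (cui in CONCEPT_INTERNAL_NEGATION) != negation_attribute


# "The patient has never smoked."
i2b2_style = effective_negation(None, True)            # concept "smoked", negated
sharpn_style = effective_negation("C0425293", False)   # concept "never smoked", not negated
print(i2b2_style, sharpn_style)
```

Under this reading both annotations describe the same clinical fact, so a direct comparison of raw negation labels across the two corpora would count an agreement as a disagreement.
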
Abstract
We describe a methodology for identifying negation and uncertainty in clinical documents and a system that uses that information to assign assertion values to medical problems mentioned in clinical text. This system was among the top-performing systems in the assertion subtask of the 2010 i2b2/VA community evaluation Challenges in natural language processing for clinical data, and has subsequently been packaged as a UIMA module called the MITRE Assertion Status Tool for Interpreting Facts (MASTIF), which can be integrated with cTAKES. We describe the process of extending MASTIF, which uses a single multi-way classifier to select among a closed set of mutually exclusive assertion categories, to a system that uses individual, independent classifiers to assign values to independent negation and uncertainty attributes associated with a variety of clinical concepts (e.g., medications, procedures, and relations) as specified by SHARPn requirements. We discuss the benefits that result from this new representation and the challenges associated with generating it automatically. We compare the accuracy of MASTIF on i2b2 data with accuracy on a subset of SHARPn clinical documents, and discuss the contribution of linguistic features to accuracy and generalizability of the system. Finally, we discuss our plans for future development.