Extraction of Adverse Drug Effects from Clinical Records
Material: discharge summaries
E. ARAMAKI * Ph.D., Y. MIURA **, M. TONOIKE ** Ph.D., T. OHKUMA ** Ph.D., H. MASUICHI ** Ph.D., K. WAKI * Ph.D. M.D., K. OHE * Ph.D. M.D.
* University of Tokyo, Japan ** Fuji Xerox, Japan
Background
• The use of Electronic Health Records (EHR) in hospitals is increasing rapidly
• They contain much clinical information about a patient’s health
• BUT: much of it is natural language text!
• Extracting clinical information from the reports is difficult because they are written in natural language
NLP-based Adverse Effect Detection System
• We are developing an NLP system that extracts medical information, especially adverse effects, from the natural language parts of clinical records
• INPUT
  • a medical text (discharge summary)
• OUTPUT
  • Date/Time
  • Medication Event
  • Adverse Effect Event
• The task is similar to (≒) the i2b2 Medication Challenge, but our target focuses only on adverse effects: the Adverse Effect Relation (AER)
Why Adverse Effect Relations?
• Clinical trials usually target only a single drug.
• BUT: real patients sometimes take multiple medications, which leaves a gap between clinical trials and the actual use of drugs.
• To ensure patient safety, it is extremely important to capture new or unknown AEs at an early stage.
A demo is available at http://mednlp.jp
Adverse Effect Relation Estimation: System Demo
• Example input: has no complications at the time of diagnosis 6/23-25 FOLFOX6 2nd. 6/24, 25: moderate fever (38℃) again. a fever reducer….
• Labels shown in the demo: Medication, Adverse Effect, Relation
The Point of This Study
• (1) Preliminary Investigation: How much information actually exists?
  • We annotated adverse effect information in discharge summaries
• (2) NLP Challenge: Can current NLP retrieve it?
  • We investigated the accuracy with which current techniques can extract adverse effect information
Outline • Introduction • Preliminary Investigation • How much information actually exists in discharge summaries? • NLP Challenge • Conclusions
Material & Method
• Material: 3,012 Japanese discharge summaries
• 3 human annotators marked possible adverse effects in the following 2 steps
Step 1: Event Annotation (XML tag = Event)
<D>Lasix</D> for <S>hypertension</S> is stopped due to <S>his headache</S>.
Step 2: Relation Annotation (XML attribute = Relation)
<D rel="1">Lasix</D> for <S>hypertension</S> is stopped due to <S rel="1">his headache</S>.
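The two annotation layers can be read back programmatically. The following is a minimal sketch, assuming well-formed tags (each `<D>` and `<S>` closed by `</D>` and `</S>`) and assuming that a shared `rel` value links a medication to its adverse effect. The function name `parse_annotation` and the wrapper element `<s>` are hypothetical and not part of the original annotation tooling.

```python
import xml.etree.ElementTree as ET

def parse_annotation(sentence):
    """Return (events, relations) for one annotated sentence.

    events:    list of (tag, text, rel) for every <D>/<S> element
    relations: list of (medication, symptom) pairs sharing a rel value
    """
    # Wrap the sentence in a dummy root so it parses as one XML document.
    root = ET.fromstring("<s>" + sentence + "</s>")
    events = [(el.tag, el.text, el.get("rel")) for el in root]
    drugs = [(text, rel) for tag, text, rel in events if tag == "D" and rel]
    symptoms = [(text, rel) for tag, text, rel in events if tag == "S" and rel]
    relations = [(d, s) for d, rd in drugs for s, rs in symptoms if rd == rs]
    return events, relations

annotated = ('<D rel="1">Lasix</D> for <S>hypertension</S> is stopped '
             'due to <S rel="1">his headache</S>.')
events, relations = parse_annotation(annotated)
print(events)
print(relations)
```

Only the pair that shares `rel="1"` is returned as a relation; the untagged symptom (hypertension) stays an event without a relation.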
Annotation Policy & Process
• We regard only MedDRA/J (adverse effect terminology) terms as events.
• We regard even a suspicion of an adverse effect as positive data.
• Annotating the entire data set is time-consuming → we split the data into 2 sets
  • SET-A (event-rich parts): summaries containing keywords such as Stop, Change, Adverse effect, Side effect; fully annotated
  • SET-B: the rest; randomly sampled and annotated
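Estimated proportion of summaries with adverse effect information, combining SET-A and SET-B:
14.5% × 53.5% (SET-A) + 85.5% × 11.3% (SET-B) = 17.4%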
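The 17.4% figure is a weighted average over the two sets. The sketch below reproduces the arithmetic, assuming that 14.5% and 85.5% are the shares of the corpus falling into SET-A and SET-B, and that 53.5% and 11.3% are the proportions of summaries with adverse effect information within each set. That reading is an interpretation of the slide, and the function name `weighted_rate` is hypothetical.

```python
def weighted_rate(strata):
    """strata: list of (share of corpus, positive rate within that share)."""
    return sum(share * rate for share, rate in strata)

set_a = (0.145, 0.535)  # assumed: SET-A share of corpus, rate within SET-A
set_b = (0.855, 0.113)  # assumed: SET-B share of corpus, rate within SET-B

overall = weighted_rate([set_a, set_b])
print(f"{overall:.1%}")
```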
Results of Preliminary Investigation
• About 17% of discharge summaries contain adverse effect information.
• Even allowing for the fact that this figure includes mere suspicions of adverse effects, the summaries are a valuable source of AE information.
• We can say that discharge summaries are a suitable resource for our purpose.
Outline • Introduction • Preliminary Investigation • NLP Challenge • Could the current NLP technique retrieve the AEs? • Conclusions
Combination of 2 NLP Steps
• The 2 NLP steps correspond directly to the 2 annotation steps
• Example: Lasix for hyperpiesia is stopped due to the pain in the head.
  • Event Annotation: Lasix = medication; hyperpiesia = symptom; the pain in the head = symptom
  • Relation Annotation: Adverse Effect Relation
• Event Annotation ≒ Named Entity Recognition task
• Relation Annotation = Relation Extraction task, one of the hottest NLP research topics
Step 1: Event Identification
• Machine Learning Method
  • CRF (Conditional Random Field) based Named Entity Recognition, a state-of-the-art method in the i2b2 de-identification task
• Features (standard feature set)
  • Lexicon (stemming), POS, dictionary-based feature (MedDRA), window size = 5
• Material
  • SET-A corpus with event annotations
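To make the feature set concrete, here is a rough sketch of per-token feature extraction of the kind a CRF tagger consumes. Several things are assumptions rather than the authors' setup: the window of 5 is read as the current token plus two tokens on each side, lowercasing stands in for stemming, POS features are omitted because they need a tagger, and `MEDDRA_TERMS` is a tiny hypothetical stand-in for a real MedDRA lookup.

```python
MEDDRA_TERMS = {"headache", "fever", "edema"}  # hypothetical dictionary stand-in

def token_features(tokens, i, window=5):
    """Feature dict for tokens[i], using a symmetric context window."""
    half = window // 2
    feats = {"bias": 1.0}
    for off in range(-half, half + 1):
        j = i + off
        if 0 <= j < len(tokens):
            word = tokens[j].lower()          # lowercase as a crude stem
            feats[f"w[{off}]={word}"] = 1.0   # lexical feature
            if word in MEDDRA_TERMS:
                feats[f"dict[{off}]"] = 1.0   # dictionary-based feature
        else:
            feats[f"pad[{off}]"] = 1.0        # position outside the sentence
    return feats

tokens = "Lasix for hypertension is stopped due to his headache".split()
sentence_features = [token_features(tokens, i) for i in range(len(tokens))]
print(sorted(sentence_features[-1]))
```

A list of such dicts per sentence, paired with per-token labels, is the usual input shape for CRF toolkits.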
Step 1: Result of Event Identification
• Result Summary
  Category of Event    Precision (%)    Recall (%)    F-measure
  Medication Event     86.99            81.34         0.84
  AE Event             85.56            80.24         0.82
• Precision and recall both exceed 80% and F > 0.80, demonstrating the feasibility of our approach
• Considering that the corpus is small (435 summaries), we can say that event detection is a relatively easy task
Step 2: Relation Extraction Method
• Basic approach ≒ Protein-Protein Interaction (PPI) task [BioNLP 2009 Shared Task]
• Procedure
  • For each m (medication)
    • For each a (adverse effect candidate)
      • judge_it_has_AER(m, a)
• Example: Lasix for hypertension is stopped due to his headache
  • (1) judge_it_has_AER(Lasix, hypertension)
  • (2) judge_it_has_AER(Lasix, headache)
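The nested loop above amounts to testing every medication and symptom pair in a sentence. A minimal sketch follows; `extract_relations` is a hypothetical name, and the `judge` argument is a placeholder for whichever judgment method is plugged in. The lambda used here is a toy stand-in, not a real classifier.

```python
def extract_relations(medications, symptoms, judge):
    """Return every (medication, symptom) pair that the judge accepts."""
    return [(m, a) for m in medications for a in symptoms if judge(m, a)]

# Toy judge for illustration only: accepts the pair the slide marks as related.
toy_judge = lambda m, a: (m, a) == ("Lasix", "headache")

pairs = extract_relations(["Lasix"], ["hypertension", "headache"], toy_judge)
print(pairs)
```

Because every pair is tested, the number of candidates grows with the product of the two event counts, and most candidates are negative.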
Two Judgment Methods
• (1) PTN-BASED: heuristic rules using a set of keywords and word distance
  • Example: ..is on ACTOS but stopped for relief of the edema .
  • Pattern: <medication> (n=1) keyword (n=4) <adverse effect>
  • judge_it_has_AER(m, a, keyword=stopped, window size=5)
• (2) SVM-BASED: machine learning approach
  • Features: distance and words between the two events (medication & adverse effect)
• See the proceedings for details
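The pattern-based rule can be sketched as follows. This is one plausible reading, not the authors' rule set: n is taken to be the number of words between the keyword and each event, the window size is taken as the upper bound on each n, only the order medication, keyword, adverse effect is handled, and the `KEYWORDS` list is hypothetical.

```python
KEYWORDS = {"stopped", "changed", "discontinued"}  # hypothetical keyword list

def judge_it_has_aer(tokens, med_idx, ae_idx, keywords=KEYWORDS, window=5):
    """True if a keyword sits between the two events, close enough to both."""
    if med_idx >= ae_idx:
        return False  # simplification: medication must precede the effect
    for k in range(med_idx + 1, ae_idx):
        if tokens[k].lower() in keywords:
            gap_left = k - med_idx - 1    # words between medication and keyword
            gap_right = ae_idx - k - 1    # words between keyword and effect
            if gap_left <= window and gap_right <= window:
                return True
    return False

tokens = "is on ACTOS but stopped for relief of the edema .".split()
med, ae = tokens.index("ACTOS"), tokens.index("edema")
result = judge_it_has_aer(tokens, med, ae)
print(result)
```

On the slide's example the gaps are 1 and 4 words, matching the n=1 and n=4 annotations, so the pair is accepted with a window of 5 and rejected with a window of 3.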
Step 2: Result of Relation Extraction
               Precision    Recall    F-measure
  PTN-BASED    41.1%        91.7%     0.650
  SVM-BASED    57.6%        62.3%     0.598
• Both PTN and SVM accuracies are low (F ≤ 0.65) → the relation extraction task is difficult!
• SVM accuracy is significantly (p=0.05) lower than PTN
• (1) The corpus is small (2) positive data << negative data → machine learning suffers from such small, imbalanced data
Outline • Introduction • Preliminary Investigation • NLP Challenge • Discussions • (1) Overall Accuracy • (2) Controllable Performance • (3) Event Distribution • Conclusions
Discussion (1/3) Overall Accuracy
• The overall accuracy is estimated by combining the accuracies of step 1 & step 2
  Overall (= step 1 × step 2)
  Precision    0.289 (= 0.855 × 0.869 × 0.390)
  Recall       0.597 (= 0.802 × 0.813 × 0.917)
• Neither NLP step is perfect, so combining their imperfect results leads to low overall accuracy (especially many false positives, i.e. low precision)
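The pipeline estimate above is a plain product of the per-step figures, which implicitly treats the errors of the steps as independent. A small sketch of that arithmetic, using the factors quoted on the slide (the variable names are hypothetical):

```python
from math import prod

# Factors as quoted on the slide: two event-identification scores
# from step 1, then the relation-extraction score from step 2.
precision_factors = [0.855, 0.869, 0.390]
recall_factors = [0.802, 0.813, 0.917]

overall_precision = prod(precision_factors)
overall_recall = prod(recall_factors)
print(f"precision {overall_precision:.4f}, recall {overall_recall:.4f}")
```

The products agree with the slide's 0.289 and 0.597 up to rounding in the last digit.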
Discussion (2/3) Performance is Controllable
• The balance between recall and precision can be controlled
• [Figure: precision-recall curve for the SVM, with a high-precision setting and a high-recall setting marked]
• That is a strong advantage of NLP
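One common way to obtain such a trade-off is to move the decision threshold applied to a classifier's scores. The sketch below illustrates the idea only: the scores and labels are made up for the example and are not the study's data, and the slides do not state that thresholding is the mechanism the authors used.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up decision values and gold labels, for illustration only.
scores = [1.2, 0.8, 0.3, -0.1, -0.4, -0.9]
labels = [True, True, False, True, False, False]

high_precision = precision_recall(scores, labels, threshold=0.5)
high_recall = precision_recall(scores, labels, threshold=-0.5)
print(high_precision, high_recall)
```

Raising the threshold keeps only confident predictions (higher precision, lower recall); lowering it accepts more candidates (higher recall, lower precision).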
Discussion (3/3) Event Distribution
• We investigated the overall AE frequency for each medication category.
• [Figure: AE frequency distribution of Drug #1, comparing the distribution acquired from annotated real data with the distribution acquired from our system results]
Discussion (3/3) AER Distribution
• Then we ran a goodness-of-fit test, which measures the similarity between two distributions
             P-value
  Med. 1     0.023
  Med. 2     0.013
  Med. 3     0.010
  Med. 4     0.006
  Med. 5     0.005
  Total      0.011
• A high p-value (p=0.011 > 0.01) indicates that the two distributions are similar.
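The slides do not name the goodness-of-fit test that was used. As one possible illustration, the sketch below computes a Pearson chi-square statistic comparing system counts against the proportions seen in annotated data. The choice of test, the function name, and all counts are assumptions made for this example.

```python
def chi_square_statistic(observed, expected_props):
    """Pearson chi-square statistic of observed counts vs expected proportions."""
    total = sum(observed)
    expected = [total * p for p in expected_props]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up AE frequency counts per AE type, for illustration only.
annotated_counts = [40, 30, 20, 10]   # from annotated data
system_counts = [38, 33, 18, 11]      # from system output

reference = [c / sum(annotated_counts) for c in annotated_counts]
stat = chi_square_statistic(system_counts, reference)
print(round(stat, 3))
```

A smaller statistic means the system's distribution is closer to the annotated one; turning it into a p-value requires the chi-square distribution with (number of categories − 1) degrees of freedom.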
Outline • Introduction • Preliminary Investigation • NLP Challenge • Discussions • Conclusions
Conclusions (1/2)
• Preliminary Investigation:
  • About 17% of discharge summaries contain adverse effect information.
  • We can say that discharge summaries are a suitable resource for AERs
• NLP Challenge:
  • Could NLP retrieve the AE information?
  • Difficult! Overall accuracy is low
Conclusions (2/2)
• BUT: 2 positive findings:
  • (1) We can control the balance between precision and recall
  • (2) Even though the accuracy is low, the aggregated results are similar to the real distribution
• IN THE FUTURE:
  • A practical system using the above advantages
  • A more accurate method for relation extraction
Thank you Contact Info • Eiji ARAMAKI Ph.D. • University of Tokyo • eiji.aramaki@gmail.com • http://mednlp.jp