The Problem with Probabilistic Parsing Kari Baker Arizona State University
What Will We Be Learning Today?
• The Task
• i2b2 Bake-Off
• Concepts
• Parsing
• Motivation for New Models
• Creating a Model
• Text Normalization
• POS Constraints
• Phrase Constraints
• Bake-Off Results
• SNoW
• Reranker
• Other
Parser Models --- Baker
i2b2/VA Challenges in Natural Language Processing for Clinical Data
• Three-Part Shared Task: Concepts, Assertion, Relation
• Concept Extraction: Problem, Test, Treatment
Concept Examples
• Problem — Concept: "The man was obese." / Not a Concept: "The obese man was admitted."
• Test — Concept: "Blood Pressure 130/80" / Not a Concept: "The patient has high blood pressure."
• Treatment — Concept: "The patient underwent surgery." / Not a Concept: "The patient arrived in the surgery suite."
What does a parse look like?
[Tree diagram of the parse of "The man was obese." — S1 over S, with NP, VP, and "." daughters]
What does a parse look like?
(S1 (S (NP (DET The) (NN man)) (VP (VBD was) (ADJP (JJ obese))) (. .)))
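A bracketed parse like the one above can be read into a nested structure with a few lines of Python. This is an illustrative sketch, not part of the toolchain used in the talk:

```python
def parse_tree(s):
    """Read a Penn Treebank-style bracketed parse into nested lists.

    '(NP (DET The) (NN man))' -> ['NP', ['DET', 'The'], ['NN', 'man']]
    """
    # Pad parentheses with spaces so split() yields clean tokens.
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()

    def read(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        node, i = [label], i + 2
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
                node.append(child)
            else:                 # leaf word
                node.append(tokens[i])
                i += 1
        return node, i + 1        # skip the closing ')'

    tree, _ = read(0)
    return tree

tree = parse_tree(
    "(S1 (S (NP (DET The) (NN man)) (VP (VBD was) (ADJP (JJ obese))) (. .)))"
)
```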
Concept Examples
[Tree diagrams of the parses of "The man was obese." and "The obese man was admitted."]
Concept Examples
Problem:
(S1 (S (NP (DET The) (NN man)) (VP (VBD was) (ADJP (JJ obese))) (. .)))
(S1 (S (NP (DET The) (JJ obese) (NN man)) (VP (AUX was) (VBD admitted)) (. .)))
Concept Examples
Test:
(S1 (FRAG (NP (NN Blood) (NN Pressure)) (QP (CD 130/80))))
(S1 (S (NP (DET The) (NN patient)) (VP (VB has) (NP (JJ high) (NN blood) (NN pressure))) (. .)))
Treatment:
(S1 (S (NP (DET The) (NN patient)) (VP (VBD underwent) (NP (NN surgery))) (. .)))
(S1 (S (NP (DET The) (NN patient)) (VP (VBD arrived) (PP (IN in) (NP (DET the) (NN surgery) (NN suite)))) (. .)))
• Sodium 139 , potassium 3.8 , chloride 101 , bicarb 26 , BUN 9 , creatinine 0.7 , glucose 141 , albumin 4.1 , calcium 8.9 , LDH 665 , AST 44 , ALT of 57 , amylase 41 , CK 32 .
• 1. Post endoscopic retrograde cholangiopancreatography pancreatitis .
• FLANK PAIN URI ?
• A/P : 48yo man with h/o HCV , bipolar DO , h/o suicide attempts , a/w overdose of Inderal , Klonopin , Geodon , s/T Jackson stay with intubation for airway protection , with question of L retrocardiac infiltrate , now doing well .
• Please note the patient is only Caucasian speaking and information is second hand .
• 16) Robituss in AC five to ten milliliters p.o. q.h.s. p.r.n. cough .
• Pt has h/o colon can to liver , s/p resxn with serosal implants in 9/03 .
• She received ASA , nitro SL then gtt , morphine , metoprolol , and heparin gtt .
• 5. Dulcolax 10 to 20 mg PR b.i.d. p.r.n. constipation .
• The pt is a 55yo F s / p Roux en Y GBP in 12/20 presenting to the ED this AM c / o mod severe midepigastric pain .
• Her electrocardiogram revealed normal sinus rhythm , left atrial enlargement , left axis deviation , poor R-wave progression in V1 through V4 , consistent with marked clockwise rotation , cannot rule out an old anteroseptal wall myocardial infarction .
The Problem
CT scan normal
(S1 (S (NP (NNP CT)) (VP (VB scan) (S (ADJP (JJ normal))))))
By-Hand Parses
• 57 sentences parsed by hand
• Necessary to understand the structure of the sentences
The Problem
• No VP: CT scan normal
• Lists: 1. Bactrim double strength
• Fragment construction: (S1 (FRAG (NP (NN Blood) (NN Pressure)) (QP (CD 130/80))))
…among others
How does the Charniak Parser work?
• Uses a trained model
• Models can be trained on different corpora (e.g., the WSJ Penn Treebank corpus)
• Defines probabilistic productions
• Example: S 99%, FRAG 1%
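The idea of probabilistic productions can be sketched as a table mapping each left-hand side to expansions with probabilities estimated from treebank counts. The grammar and numbers below are invented for illustration, not taken from the actual Charniak model:

```python
# Toy probabilistic grammar: each nonterminal maps to candidate
# right-hand sides with probabilities (made-up values for illustration).
grammar = {
    "S1":   {("S",): 0.99, ("FRAG",): 0.01},
    "S":    {("NP", "VP", "."): 0.7, ("NP", "VP"): 0.3},
    "FRAG": {("NP", "ADJP"): 0.6, ("NP", "QP"): 0.4},
}

def most_likely_production(lhs):
    """Return the highest-probability expansion of a nonterminal."""
    return max(grammar[lhs].items(), key=lambda kv: kv[1])

best, prob = most_likely_production("S1")
```

With a WSJ-trained model, S vastly outweighs FRAG at the root, which is exactly why fragment-heavy clinical sentences get forced into S analyses.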
The Problem *WSJ corpus has 39,832 by-hand Parses Parser Models --- Baker
The Problem
CT scan normal
Desired Parse: (S1 (FRAG (NP (NN CT) (NN scan)) (ADJP (JJ normal))))
Parser Output: (S1 (S (NP (NNP CT)) (VP (VB scan) (S (ADJP (JJ normal))))))
The Problem
[Side-by-side tree diagrams of the desired FRAG parse and the parser's S parse of "CT scan normal"]
How are Desirable Parses Obtained?
• Text Normalization
• Part-of-Speech Constraints
• Phrase Constraints
Text Normalization
• Pt 's labs were checked
• Only minimal exertion such as " walking across the room "
• The patient is a **AGE[in 50s]- year - old female well until **DATE[Jan 2007]
• The MRI was performed here at **INSTITUTION
• she does have a Foley catheter in for I& ; O measurement
Text Normalization
&gt; → >
Before: If you experience fever &gt; 100.4 , return to the hospital .
After: If you experience fever > 100.4 , return to the hospital .
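One step of this kind of normalization, undoing HTML-style escaping before the text reaches the parser, can be sketched in Python. This is a minimal illustration of a single rewrite, not the talk's full normalization pipeline:

```python
import html

def normalize(sentence):
    """Replace HTML entities such as '&gt;' with their literal
    characters so the parser sees ordinary tokens.
    (Sketch of one normalization step only.)"""
    return html.unescape(sentence)

raw = "If you experience fever &gt; 100.4 , return to the hospital ."
clean = normalize(raw)
```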
Text Normalization
Note: F-Score is taken from the parser output compared against the by-hand parses of the i2b2 data
Medical Acronyms/Abbreviations
Constraining with Parts of Speech
qn ("nightly") = adverb
He was placed on Unasyn 3 grams qn.
(S1 (XX He) (XX was) (XX placed) (XX on) (XX Unasyn) (XX 3) (XX grams) (RB qn) (XX .))
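Building a constraint string of this shape can be sketched as follows: tokens with a known part of speech from a domain lexicon keep their tag, and everything else gets the XX wildcard, leaving the parser free to choose. The function and lexicon here are illustrative, not the actual constraint interface:

```python
def pos_constraint(tokens, known_tags):
    """Emit a constraint string in the format shown above:
    known tokens get their tag, unknown tokens get the XX wildcard.
    (Illustrative sketch; `known_tags` stands in for a domain lexicon.)"""
    tagged = [f"({known_tags.get(t, 'XX')} {t})" for t in tokens]
    return "(S1 " + " ".join(tagged) + ")"

s = pos_constraint(
    ["He", "was", "placed", "on", "Unasyn", "3", "grams", "qn", "."],
    {"qn": "RB"},  # domain knowledge: 'qn' (nightly) is an adverb
)
```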
Constraining with Parts of Speech
*Note: There were 5 failed parses for the POS Constraints whereas the Normalized Text had zero.
Constraining with Phrases
Patient has swollen painful L side face .
Concept = swollen painful L side face
(S1 (XX Patient) (XX has) (NP-problem (XX swollen) (XX painful) (XX L) (XX side) (XX face)) (XX .))
Constraining with Phrases
What Next? Train Model!
• No true concepts on test day
• Treat phrase-constrained parser output as truth
• Train model on that data
Phrase-Constrained Model
Concept Extraction: SNoW
(S1 (S (NP (DET The) (NN patient)) (VP (VBD underwent) (NP (NN surgery))) (. .))) → SNoW
The patient: .99 None | .01 Problem | .00 Test | .00 Treatment
surgery: .01 None | .09 Problem | .51 Test | .49 Treatment
⇒ surgery = Test
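The final decision step shown above is simply an argmax over the classifier's per-label scores. A sketch, using the scores from the slide:

```python
def classify(scores):
    """Pick the label with the highest classifier score
    (the argmax decision rule illustrated on the slide)."""
    return max(scores, key=scores.get)

# Scores for 'surgery' as shown above.
surgery_scores = {"None": 0.01, "Problem": 0.09, "Test": 0.51, "Treatment": 0.49}
label = classify(surgery_scores)
```

With Test at .51 narrowly beating Treatment at .49, the classifier alone makes the wrong call, which motivates the reranker on the next slide.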
Concept Extraction: SNoW
Note: These F-Scores are from our predicted concepts compared to the "gold" concepts.
Concept Extraction: Reranker
The patient: .99 None | .01 Problem | .00 Test | .00 Treatment
surgery: .01 None | .09 Problem | .51 Test | .49 Treatment
(S1 (S (NP (DET The) (NN patient)) (VP (VBD underwent) (NP (NN surgery))) (. .))) → Reranker
The patient: 1. None
surgery: 1. Treatment 2. Test 3. Problem
⇒ surgery = Treatment
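The reranker's effect can be sketched as adjusting the classifier scores with a parse-derived feature: here, "surgery" is the object of "underwent" in the parse, which favors Treatment. The verb list and the boost value below are invented purely for illustration; the real reranker learns its feature weights from data:

```python
def rerank(scores, governing_verb):
    """Re-rank labels using a parse-derived feature: objects of
    'treatment' verbs get a Treatment boost.
    (Toy feature and weight, invented for illustration.)"""
    treatment_verbs = {"underwent", "received"}  # hypothetical list
    adjusted = dict(scores)
    if governing_verb in treatment_verbs:
        adjusted["Treatment"] += 0.1             # hypothetical weight
    return sorted(adjusted, key=adjusted.get, reverse=True)

ranking = rerank(
    {"None": 0.01, "Problem": 0.09, "Test": 0.51, "Treatment": 0.49},
    governing_verb="underwent",
)
```

The parse context flips the narrow Test-vs-Treatment decision, matching the slide's outcome.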
Concept Extraction: Reranker
Other Results from i2b2
• Concept: Dependency Parse + External Medical Dictionary, F-Score = 53.8
• Relation: Used Dependency Parses
Recap
• Domain mismatch is bad
• Constraining the parser decreases domain mismatch
• Training new models decreases domain mismatch
Acknowledgments
• Kristy Hollingshead
• Brian Roark
• Richard Sproat
• Margit Bowler
• Aaron Cohen
• Jianji Yang
• Kyle Ambert
Thank You…
• Kristy Hollingshead
• Christian Monson
• Kevin Burger
• Isaac Wallis
• The Interns
• All OGI Faculty, Staff, and Students
Questions?
Hierarchical Phrases
There is akinesis / dyskinesis and thinning of the mid to distal inferior septum and the apex.
(S (NP (EX There)) (VP (VB is) (NP (NP-problem (NN akinesis)) (CC /) (NP-problem (NN dyskinesis))) (CC and) (NP-problem (NN thinning) (PP (IN of) (NP (DT the) (ADJP (JJ mid) (IN to) (JJ distal)) (JJ inferior) (NN septum))) (CC and) (NP (DT the) (NN apex)))))
Statistical Evaluations
Recall = (# correct) / (total)
Precision = (# correct) / (# predicted)
F-Score = (2 × Recall × Precision) / (Precision + Recall)
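These definitions are easy to check with a small worked example. The counts below are invented for illustration:

```python
def evaluate(num_correct, num_predicted, num_gold):
    """Compute recall, precision, and balanced F-score
    as defined above."""
    recall = num_correct / num_gold
    precision = num_correct / num_predicted
    f = 2 * recall * precision / (precision + recall)
    return recall, precision, f

# Hypothetical counts: 8 correct out of 10 predicted, 16 gold concepts.
r, p, f = evaluate(num_correct=8, num_predicted=10, num_gold=16)
```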