
Presented by Chaveevan Pechsiri

Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D., Department of Medical Informatics, Columbia University, New York, NY, 1998). Presented by Chaveevan Pechsiri. Outline: Objective, Methodologies, Results, Discussion.


Presentation Transcript


  1. Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D., Department of Medical Informatics, Columbia University, New York, NY, 1998). Presented by Chaveevan Pechsiri

  2. Outline • Objective • Methodologies • Results • Discussion • Suggestion

  3. Objective • Generate queries and rules to interpret the output of the MedLEE processor at Columbia-Presbyterian Medical Center • Techniques: NLP; data mining (classification using C5.0) • Chest radiograph reports + clinic encounters

  4. Methodologies • NLP: findings with modifiers • Generate a vector report: flattening = finding + modifier; coding = flattening + modifier value • Classification: the C5.0 decision tree (ID3 family)

  5. NLP: the MedLEE processor • Input: narrative report • Resources: clinical dictionary (e.g. congestive heart failure, heart failure, CHF; left pleural effusion … new pleural effusion) and grammar rules • Steps: word and phrase recognition • Standard term generation • Classify terms into semantic categories • Parse sequences of semantic categories into structures • Output: findings with modifiers

  6. NLP processor output (3 findings with modifiers) • Narrative report: “Probable mild pulmonary vascular congestion with new left pleural effusion, question mild congestive changes” • MedLEE output: • Pulmonary vascular congestion (certainty: high; degree: low) • Pleural effusion (region: left; status: new) • Congestive change (certainty: moderate; degree: low)

  7. Coding finding-modifier pairs • Processor output: • Pulmonary vascular congestion (certainty: high; degree: low) • Pleural effusion (region: left; status: new) • Congestive change (certainty: moderate; degree: low) • Finding vector report: • pulmonary vascular congestion = present • pulmonary vascular congestion:certainty = high • pulmonary vascular congestion:degree = low • pleural effusion = present • pleural effusion:region = left • pleural effusion:status = new • congestive change = present • congestive change:certainty = moderate • congestive change:degree = low
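The flattening and coding step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' implementation: the function name `flatten` and the input layout (a list of finding/modifier pairs) are assumptions made for the example.

```python
def flatten(findings):
    """Turn findings with modifiers into a flat attribute -> value vector.

    `findings` is a list of (finding_name, {modifier: value}) pairs.
    This input layout is hypothetical; MedLEE's real output format differs.
    """
    vector = {}
    for name, modifiers in findings:
        # Every finding that appears in the report is coded as present.
        vector[name] = "present"
        # Each finding-modifier pair becomes its own attribute.
        for modifier, value in modifiers.items():
            vector[f"{name}:{modifier}"] = value
    return vector


# The three findings from the example report on the slide.
findings = [
    ("pulmonary vascular congestion", {"certainty": "high", "degree": "low"}),
    ("pleural effusion", {"region": "left", "status": "new"}),
    ("congestive change", {"certainty": "moderate", "degree": "low"}),
]

vector = flatten(findings)
for attribute, value in vector.items():
    print(f"{attribute} = {value}")
```

Each report then becomes one row of attribute values, which is the form a decision-tree learner such as C5.0 expects.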

  8. C5.0 decision table: diagnosing hypothyroidism

     Attribute                  Assay 1   Assay 2               Assay 3                   .....
     age                        32        63                    19
     sex                        F         M                     M
     on thyroxine               t         f                     f
     query on thyroxine         f         f                     f
     on antithyroid medication  f         f                     f
     sick                       f         f                     f
     pregnant                   t         N/A                   N/A
     thyroid surgery            f         f                     f
     I131 treatment             f         f                     f
     query hypothyroid          f         f                     t
     query hyperthyroid         t         f                     f
     lithium                    f         f                     f
     tumor                      f         f                     f
     goitre                     f         f                     f
     hypopituitary              f         f                     f
     psych                      f         f                     f
     TSH                        0.025     108                   9
     T3                         3.7       .4                    2.2
     TT4                        139       14                    117
     T4U                        1.34      .98                   -
     FTI                        104       14                    -
     referral source            other     SVI                   other
     diagnosis                  negative  primary hypothyroid   compensated hypothyroid

  9. C5.0 if-then rules

     Rule 1: (31, lift 42.7)
         thyroid surgery = f
         TSH > 6
         TT4 <= 37
         -> class primary [0.970]

     Rule 2: (63/6, lift 39.3)
         TSH > 6
         FTI <= 65
         -> class primary [0.892]

     Rule 3: (270/116, lift 10.3)
         TSH > 6
         -> class compensated [0.570]

     Rule 4: (2225/2, lift 1.1)
         TSH <= 6
         -> class negative [0.999]

     Rule 5: (296, lift 1.1)
         on thyroxine = t
         FTI > 65
         -> class negative [0.997]
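To show how such a ruleset classifies a case, here is a minimal Python sketch that encodes the five rules above and applies them to the three assays from the decision table. It assumes a simplified strategy: among the rules whose conditions all hold, the one with the highest confidence wins, and a hypothetical default class is returned when no rule fires. C5.0 itself may resolve conflicts between rules differently.

```python
def gt(case, attr, threshold):
    """True if the attribute is known and greater than the threshold."""
    value = case.get(attr)
    return value is not None and value > threshold


def le(case, attr, threshold):
    """True if the attribute is known and at most the threshold."""
    value = case.get(attr)
    return value is not None and value <= threshold


# (class, confidence, condition) triples transcribed from the slide.
RULES = [
    ("primary", 0.970, lambda c: c.get("thyroid surgery") == "f"
        and gt(c, "TSH", 6) and le(c, "TT4", 37)),
    ("primary", 0.892, lambda c: gt(c, "TSH", 6) and le(c, "FTI", 65)),
    ("compensated", 0.570, lambda c: gt(c, "TSH", 6)),
    ("negative", 0.999, lambda c: le(c, "TSH", 6)),
    ("negative", 0.997, lambda c: c.get("on thyroxine") == "t"
        and gt(c, "FTI", 65)),
]


def classify(case, default="negative"):
    """Return the class of the most confident matching rule.

    The `default` class is an assumption made for this sketch.
    """
    matches = [(conf, label) for label, conf, cond in RULES if cond(case)]
    return max(matches)[1] if matches else default


# Attribute values taken from the decision table; missing values are None.
assay_1 = {"on thyroxine": "t", "thyroid surgery": "f",
           "TSH": 0.025, "TT4": 139, "FTI": 104}
assay_2 = {"on thyroxine": "f", "thyroid surgery": "f",
           "TSH": 108, "TT4": 14, "FTI": 14}
assay_3 = {"on thyroxine": "f", "thyroid surgery": "f",
           "TSH": 9, "TT4": 117, "FTI": None}

for name, case in [("Assay 1", assay_1), ("Assay 2", assay_2), ("Assay 3", assay_3)]:
    print(name, "->", classify(case))
```

Under this simplified strategy the three assays come out as negative, primary and compensated, which agrees with the diagnosis row of the table.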

  10. Error Measurement • TP = True Positive • FN = False Negative • TN = True Negative • FP = False Positive
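The four counts are typically combined into sensitivity and specificity, the two measures the Discussion and Suggestion slides refer to. The sketch below uses the standard definitions; the counts in the example are invented purely for illustration and are not results from the study.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that the classifier detects."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative cases that the classifier rejects."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only (not data from the paper).
tp, fn, tn, fp = 8, 2, 90, 10

print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```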

  11. results

  12. results

  13. Discussion • The automated method did not reach the level of the physicians • High noise in the training set • The training set was too small to properly train the system to detect positive findings • The training set labeled with ICD9 codes was not accurate enough to create rules • Ambiguities cause C5.0 errors or a lack of strong specificity

  14. Suggestion • A larger training set is needed to generate a sensitive classifier • An ontology should be added to the clinical dictionary • The ICD9 codes need to be modified • The discovered knowledge should be generalized knowledge • Try other classifiers: Bayesian belief networks, backpropagation neural networks, the sequential covering algorithm
