Presented by Chaveevan Pechsiri

Knowledge Discovery and Data Mining to Assist Natural Language Understanding (Adam Wilcox, M.A., George Hripcsak, M.D., Department of Medical Informatics, Columbia University, New York, NY, 1998)

Outline
• Objective
• Methodologies
• Results
• Discussion
• Suggestion

Objective
• Generate queries and rules
• Interpret the output from the MedLEE processor at Columbia-Presbyterian Medical Center
• Techniques:
  • NLP
  • Data mining: classification using C5.0
• Chest radiograph reports + clinic encounters

Methodologies
• NLP
  • Findings with modifiers
• Generate a vector report
  • Flattening = finding + modifier
  • Coding = flattening + modifier value
• Classification
  • The C5.0 decision tree (a successor to ID3)

NLP
Input: narrative report
MedLEE processor (uses a clinical dictionary and grammar rules):
• Word and phrase recognition
• Standard term generation
• Classify terms into semantic categories
• Parse sequences of semantic categories into structures
Output: findings with modifiers
Example terms: congestive heart failure, heart failure, CHF; left pleural effusion … new pleural effusion

NLP processor output (3 findings with modifiers)
Narrative report (input to MedLEE): "Probable mild pulmonary vascular congestion with new left pleural effusion, question mild congestive changes"
Processor output:
• Pulmonary vascular congestion
  certainty: high
  degree: low
• Pleural effusion
  region: left
  status: new
• Congestive change
  certainty: moderate
  degree: low

Coding finding-modifier pairs
Processor output (findings):
• Pulmonary vascular congestion
  certainty: high
  degree: low
• Pleural effusion
  region: left
  status: new
• Congestive change
  certainty: moderate
  degree: low
Vector report:
pulmonary vascular congestion = present
pulmonary vascular congestion:certainty = high
pulmonary vascular congestion:degree = low
pleural effusion = present
pleural effusion:region = left
pleural effusion:status = new
congestive change = present
congestive change:certainty = moderate
congestive change:degree = low
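The flattening and coding steps on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not the authors' implementation: the function name `to_vector_report` and the dictionary layout are assumptions made for the example.

```python
def to_vector_report(findings):
    """Flatten findings-with-modifiers into attribute/value pairs.

    Hypothetical helper: each finding becomes "<finding> = present", and each
    modifier becomes "<finding>:<modifier> = <modifier value>".
    """
    vector = {}
    for finding, modifiers in findings.items():
        vector[finding] = "present"
        for modifier, value in modifiers.items():
            # Flattening joins finding and modifier; coding attaches the value.
            vector[f"{finding}:{modifier}"] = value
    return vector


# Processor output for the example narrative report on the slide.
findings = {
    "pulmonary vascular congestion": {"certainty": "high", "degree": "low"},
    "pleural effusion": {"region": "left", "status": "new"},
    "congestive change": {"certainty": "moderate", "degree": "low"},
}

vector = to_vector_report(findings)
for attribute, value in vector.items():
    print(f"{attribute} = {value}")
```

Each report then becomes one row of attribute/value pairs, which is the form a classifier such as C5.0 expects.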

C5.0 decision table: diagnosing hypothyroidism

Attribute                  Assay 1    Assay 2            Assay 3                .....
age                        32         63                 19
sex                        F          M                  M
on thyroxine               t          f                  f
query on thyroxine         f          f                  f
on antithyroid medication  f          f                  f
sick                       f          f                  f
pregnant                   t          N/A                N/A
thyroid surgery            f          f                  f
I131 treatment             f          f                  f
query hypothyroid          f          f                  t
query hyperthyroid         t          f                  f
lithium                    f          f                  f
tumor                      f          f                  f
goitre                     f          f                  f
hypopituitary              f          f                  f
psych                      f          f                  f
TSH                        0.025      108                9
T3                         3.7        .4                 2.2
TT4                        139        14                 117
T4U                        1.34       .98                -
FTI                        104        14                 -
referral source            other      SVI                other
diagnosis                  negative   primary hypothyr   compensated hypothyr

C5.0 if-then rules

Rule 1: (31, lift 42.7)
    thyroid surgery = f
    TSH > 6
    TT4 <= 37
    -> class primary [0.970]

Rule 2: (63/6, lift 39.3)
    TSH > 6
    FTI <= 65
    -> class primary [0.892]

Rule 3: (270/116, lift 10.3)
    TSH > 6
    -> class compensated [0.570]

Rule 4: (2225/2, lift 1.1)
    TSH <= 6
    -> class negative [0.999]

Rule 5: (296, lift 1.1)
    on thyroxine = t
    FTI > 65
    -> class negative [0.997]
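To make the rule listing concrete, the following Python sketch applies the five rules to the three assays from the decision table. It is a simplified illustration under stated assumptions, not C5.0 itself: the bracketed confidences are reproduced with the Laplace ratio (n - m + 1) / (n + 2), where n is the number of training cases a rule covers and m the number it misclassifies; conflicts between rules are resolved by summing the confidences per class; a missing value is treated as failing the condition; and the fallback class `negative` is an assumption chosen for the example.

```python
def laplace_confidence(n, m=0):
    """Laplace ratio for a rule covering n cases, m of them misclassified."""
    return (n - m + 1) / (n + 2)


def gt(case, attr, threshold):
    """True if the attribute is known and greater than the threshold."""
    value = case.get(attr)
    return value is not None and value > threshold


def le(case, attr, threshold):
    """True if the attribute is known and at most the threshold."""
    value = case.get(attr)
    return value is not None and value <= threshold


# (condition, predicted class, n covered, m misclassified), taken from the slide.
RULES = [
    (lambda c: c.get("thyroid surgery") == "f" and gt(c, "TSH", 6)
     and le(c, "TT4", 37), "primary", 31, 0),
    (lambda c: gt(c, "TSH", 6) and le(c, "FTI", 65), "primary", 63, 6),
    (lambda c: gt(c, "TSH", 6), "compensated", 270, 116),
    (lambda c: le(c, "TSH", 6), "negative", 2225, 2),
    (lambda c: c.get("on thyroxine") == "t" and gt(c, "FTI", 65),
     "negative", 296, 0),
]


def classify(case, default="negative"):
    """Sum confidence votes of all matching rules; fall back to `default`."""
    votes = {}
    for condition, label, n, m in RULES:
        if condition(case):
            votes[label] = votes.get(label, 0.0) + laplace_confidence(n, m)
    if not votes:
        return default  # assumed default class, not taken from the source
    return max(votes, key=votes.get)


# Only the attributes the rules test; None marks a missing value ("-").
assays = {
    "Assay 1": {"thyroid surgery": "f", "on thyroxine": "t",
                "TSH": 0.025, "TT4": 139, "FTI": 104},
    "Assay 2": {"thyroid surgery": "f", "on thyroxine": "f",
                "TSH": 108, "TT4": 14, "FTI": 14},
    "Assay 3": {"thyroid surgery": "f", "on thyroxine": "f",
                "TSH": 9, "TT4": 117, "FTI": None},
}

for name, case in assays.items():
    print(name, "->", classify(case))
```

Under these assumptions the three assays come out as negative, primary, and compensated, which agrees with the diagnosis row of the decision table.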

Error Measurement
TP = True Positive
FN = False Negative
TN = True Negative
FP = False Positive
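These four counts are what the sensitivity and specificity mentioned in the Discussion and Suggestion slides are built from. A minimal sketch using the standard definitions follows; the counts in the usage lines are hypothetical and are not results from the study.

```python
def sensitivity(tp, fn):
    """Fraction of truly positive cases that the classifier detects."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of truly negative cases that the classifier rejects."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only (not taken from the paper).
tp, fn, tn, fp = 8, 2, 85, 5

print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"specificity = {specificity(tn, fp):.3f}")
```

A classifier trained on few positive examples tends to miss positives, which shows up as a low sensitivity even when specificity is high.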

Results

Discussion
• The automated method did not reach the level of the physicians
• High noise in the training set
• The training set was too small to properly train the system to detect positive findings
• The training set with ICD9 codes was not accurate enough to create rules
• Ambiguities caused C5.0 errors or a lack of strong specificity

Suggestion
• A larger training set is needed to generate a sensitive classifier
• An ontology should be incorporated into the clinical dictionary
• The ICD9 codes need to be modified
• The discovered knowledge should be generalized knowledge
• Try other classifiers: Bayesian belief networks, backpropagation neural networks, the sequential covering algorithm
