Multiple hierarchical classification of free-text clinical guidelines.

Advisor: Dr. Hsu • Presenter: Yu-San Hsieh • Authors: Robert Moskovitch, Shiva Cohen-Kashi, Uzi Dror, Iftah Levy, Amit Maimon, Yuval Shahar • 2006. AIM. 177-190

Outline • Motivation • Objective • Method • Evaluation • Results • Discussion • Conclusions

Motivation • Manual classification of free-text documents within a predefined hierarchy, commonly required in the medical domain, is a highly time-consuming task.

Objective • We present an approach to automate the classification of clinical guidelines into predefined hierarchical conceptual categories.

Method
Text representation and preprocessing
• Guideline documents were represented using the vector space model: each document Di is a vector over t different index terms, where dij represents the term frequency.
• Document terms were extracted and stop words were removed.
• Each term was stemmed to its root using the Porter Stemming Algorithm.
Feature selection
• The terms are sorted according to their frequency value.
• Mutual information (MI) between a term (t) and a class (c): features with a low MI value are removed.
• Hierarchical Mutual Information (HMI): deals with a hierarchical conceptual structure.
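The MI-based filtering step can be illustrated with a minimal sketch. The slide does not give the formula, so this assumes the common pointwise form MI(t, c) = log(P(t, c) / (P(t) P(c))) estimated from document counts. The toy documents, class labels, and function name are hypothetical, and the terms are assumed to be already stop-word filtered and stemmed.

```python
import math

def mutual_information(doc_sets, labels, term, cls):
    """Pointwise MI between a term and a class, estimated from document counts.

    Assumed form: log(P(t, c) / (P(t) * P(c))); the slide does not state the formula.
    """
    n = len(doc_sets)
    n_t = sum(1 for d in doc_sets if term in d)        # documents containing the term
    n_c = sum(1 for y in labels if y == cls)           # documents in the class
    n_tc = sum(1 for d, y in zip(doc_sets, labels)     # documents with both
               if term in d and y == cls)
    if n_tc == 0:
        return float("-inf")                           # term never occurs in the class
    return math.log((n_tc * n) / (n_t * n_c))

# Hypothetical, already stemmed toy documents.
docs = [
    ["asthma", "inhal", "treatment"],
    ["asthma", "steroid", "treatment"],
    ["fractur", "cast", "treatment"],
    ["fractur", "surgeri", "treatment"],
]
labels = ["respiratory", "respiratory", "orthopedic", "orthopedic"]
doc_sets = [set(d) for d in docs]

mi_asthma = mutual_information(doc_sets, labels, "asthma", "respiratory")
mi_treatment = mutual_information(doc_sets, labels, "treatment", "respiratory")
```

In this toy set, "treatment" appears in every document and so carries no information about the class, which is the kind of feature a low-MI cutoff would remove.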

Method
The training method
• Bottom-up training: starts from the leaves of the concept hierarchy and yields a centroid for each concept.
• Top-down training: starts with the root of the concept hierarchy and calculates each concept centroid.
• Concept representation: <tf1…tfk>, where each entry is the frequency of one of the k terms; it is calculated for each concept node.
• Example: a CPG from the training set is classified by concepts 9 and 11; in the bottom-up approach, concept 2 is counted twice.
[Figure: centroid update flowchart. Labels: Start; current concept centroid; new training CPG vector; nth iteration; End.]
The classification method
• Threshold for a concept (C) and document (d): uses the standard deviation over all documents classified to concept C and a constant k. As k increases, the threshold decreases and the error rate increases.
• If the test evaluates as false: 1. Best fit 2. Parent fit
[Figure: threshold example. Labels: number of documents classified to concept 9; vectors of concepts 5, 2, 1; number of documents classified under concepts 5, 2, 1; 0.5.]
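A minimal sketch of centroid-based classification with a deviation-scaled threshold may help make the role of k concrete. The slide does not give the exact formulas, so everything here is an assumption for illustration: the centroid is taken as the mean term-frequency vector, similarity as cosine similarity, and the threshold as the mean training similarity minus k standard deviations (which matches the slide's remark that a larger k lowers the threshold). All names and vectors are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length term-frequency vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def threshold(train_vectors, center, k):
    """Assumed rule: mean training similarity minus k standard deviations."""
    sims = [cosine(v, center) for v in train_vectors]
    mean = sum(sims) / len(sims)
    std = math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))
    return mean - k * std

# Hypothetical term-frequency vectors of training CPGs assigned to one concept.
train = [[2, 0, 1], [4, 0, 1], [3, 1, 1]]
center = centroid(train)

t_low_k = threshold(train, center, 0.1)   # stricter threshold
t_high_k = threshold(train, center, 0.5)  # more permissive threshold

# A new CPG is accepted by the concept if its similarity reaches the threshold.
accepted = cosine([3, 0, 1], center) >= t_low_k
```

Under these assumptions, raising k admits documents that are less similar to the centroid, which is one plausible reading of why the error rate grows with k.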

Method
The evaluation method
• Hierarchical precision and recall measures
• Coverage: the similarity between the two sub-trees
• Micro-averaging: the precision and recall are averaged over the entire test document collection.
• Macro-averaging: precision and recall are evaluated for each concept class and then averaged.
Example
• (A) Actual classification path
• (C) Classified path
• (B) Intersection of (A) and (C)
• (D) Union of (A) and (C)
• Hierarchical precision = |B| / |A| = 4 / 5
• Hierarchical recall = |B| / |C| = 4 / 7
• Coverage = |B| / |D| = 4 / 8
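The three measures reduce to set operations on the nodes of the two paths, which a short sketch can reproduce. The formulas follow the slide as written (many texts instead divide precision by the classified set). The node identifiers are hypothetical and chosen only so that the set sizes match the slide's example (|A| = 5, |C| = 7, |B| = 4, |D| = 8).

```python
def hierarchical_scores(actual, classified):
    """Hierarchical precision, recall, and coverage as defined on the slide."""
    a, c = set(actual), set(classified)
    b = a & c  # (B) intersection of actual and classified nodes
    d = a | c  # (D) union of actual and classified nodes
    return {
        "precision": len(b) / len(a),
        "recall": len(b) / len(c),
        "coverage": len(b) / len(d),
    }

# Hypothetical concept-node ids for one guideline.
actual_path = {1, 2, 5, 9, 11}
classified_path = {1, 2, 5, 9, 3, 6, 12}

scores = hierarchical_scores(actual_path, classified_path)
```

Because the measures work on whole paths rather than single labels, a classification that lands on a sibling or parent of the correct concept still receives partial credit.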

Evaluation • Data set • NGC (National Guideline Clearinghouse) CPG (clinical practice guideline) collection • Training set: 1136 guidelines • Test set: 1038 CPGs • Tree-like structure with an average depth of around 4-6 • Folding concept trees, e.g. folding tuberculosis into the higher-level concept of an infectious pulmonary disease.

Results • Determining the feature selection method: HMI versus MI

Results • Training method • Stop-criteria methods • Does not require a precise match of a set of particular concepts • Higher • Better result

Results • The influence of the classification threshold • Optimal value of k is 0.1 • [0.1, ∞]: below and above this range the precision is very low • The impact of tree-folding • High • Consider that folding might cause deviation

Discussion • The focus of this study is on the hierarchical classification and evaluation methods. • Feature selection • The HMI measure proved to be a more effective feature selection method compared to the other measures. • Training method • The top-down approach outperformed the bottom-up approach. • Classification phase • The best-fit criterion outperformed the parent-fit criterion. • The precision was higher when a lower number of standard deviations was set.

Conclusions • When considering the low ratio of guidelines to classification concepts in the evaluation data set used here.

My opinion • Advantage: … • Disadvantage: … • Application: search engine
