Enhancing Text Classifiers to Identify Disease Aspect Information Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan
Outline
• Research background
• Problem definition
• The proposed approach: IDAI
• Empirical evaluation
• Conclusion
Disease Aspect Classification
Research Background
Disease Aspect Information (DAI)
• An example from MedlinePlus: Several passages about three aspects of kidney cancer: treatment, symptom and sign, and etiology. It also contains several passages not related to any aspect.
You have two kidneys ... Kidney cancer forms in the … Risk factors include smoking, having certain genetic conditions and …. Often, kidney cancer doesn't have early symptoms. However, see your health care provider if you notice
Blood in your urine
A lump in your abdomen …
Pain in your side …
Treatment depends on your age, …. It might include surgery, radiation, chemotherapy …
Disease Knowledge Map: An Application of DAI
[Figure: disease knowledge map. Labels recovered from the diagram: medical information provider; medical texts for specific diseases; identification of DAI; classifier; disease aspects (diagnosis, prevention, etiology, symptoms, treatment); disease aspect information; healthcare professionals and consumers (query and aspect, disease info., aspect info.); verified info.; cross-disease query; healthcare decision support system.]
Problem Definition
Goals
• Modeling the identification of DAI as a text classification problem
  • Disease aspects are predefined categories of interest, not brief descriptions of information needs
• Developing a technique to enhance various kinds of text classifiers
  • Given medical texts, the enhanced classifier better identifies those that discuss aspects of diseases
Related Work
• Text classification (TC)
  • Weakness: multi-aspect information in a text introduces noise into text classifiers
• Segment extraction for topic detection
  • Weakness: designed for specific descriptions (not for categories)
• Passage extraction for TC
  • Weakness: determining the location and length of the passages relevant to a specific category becomes another TC problem
The Proposed Approach: IDAI
IDAI: Revising Term Frequency (TF) to Improve Classifiers
[Figure: IDAI and the underlying text classifier. Labels recovered from the diagram: training (training texts; categories (aspects); identifying term-category correlation type; classifier development); testing (a text (d); assessing term frequencies (TF); TF of terms w.r.t. each category; classification).]
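The training stage named on this slide, identifying the term-category correlation type, could be sketched in Python as follows. The slide does not say how the correlation type is decided, so the rule used here (comparing how often a term appears in texts inside versus outside a category) is purely an assumption for illustration, and the function name `correlation_types` is hypothetical.

```python
from typing import Dict, List, Tuple


def correlation_types(
    training: List[Tuple[str, List[str]]]
) -> Dict[Tuple[str, str], str]:
    """Label each (term, category) pair as 'positive' or 'negative'.

    ASSUMPTION: a term is positively correlated to a category when it
    occurs in a larger share of that category's texts than of the other
    texts, and negatively correlated when the share is smaller. The
    slides do not specify the actual test used by IDAI.
    """
    categories = sorted({cat for cat, _ in training})
    vocab = {tok for _, toks in training for tok in toks}
    result: Dict[Tuple[str, str], str] = {}
    for c in categories:
        inside = [set(toks) for cat, toks in training if cat == c]
        outside = [set(toks) for cat, toks in training if cat != c]
        for t in vocab:
            rate_in = sum(t in d for d in inside) / len(inside)
            rate_out = (
                sum(t in d for d in outside) / len(outside) if outside else 0.0
            )
            if rate_in > rate_out:
                result[(t, c)] = "positive"
            elif rate_in < rate_out:
                result[(t, c)] = "negative"
            # Equal rates: the pair is left unlabeled (no correlation).
    return result


training = [
    ("treatment", ["surgery", "cancer"]),
    ("etiology", ["smoking", "cancer"]),
]
types = correlation_types(training)
```

In this toy example "surgery" comes out positive for treatment and negative for etiology, while "cancer" appears equally in both categories and receives no label.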
Two Strategies for TF Revision
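\[
\mathrm{RevisedTF}(t,d,c) =
\begin{cases}
\mathrm{WindowTF}(t,d,c), & \text{if } t \text{ is positively correlated to } c \text{ (Strategy I)} \\
\max_{c' \neq c}\{\mathrm{WindowTF}(t,d,c')\} - \mathrm{InconsistencyTF}(t,d,c), & \text{if } t \text{ is negatively correlated to } c \text{ (Strategy II)}
\end{cases}
\]
• \(\mathrm{WindowTF}(t,d,c) = \sum_k \left(0.5 + P_{\mathrm{window},k}\right)\), summed over each occurrence of \(t\) at position \(k\), where \(P_{\mathrm{window},k}\) = distance-based sum of weights of other positively correlated terms in a window at \(k\)
• \(\mathrm{InconsistencyTF}(t,d,c) = \sum_k P_{\mathrm{inconsistency},k}\), summed over each occurrence of \(t\) at position \(k\), where \(P_{\mathrm{inconsistency},k} = 0.5 \times\) (how much the text segment before \(k\) is dominated by the terms positively correlated to \(c\))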
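The two strategies can be illustrated with a minimal Python sketch. Several details are not given on the slide and are assumptions here: the window size (5 tokens each side), the distance-based weight (1/distance), the measure of "dominated" (the fraction of preceding tokens that are positively correlated to c), and the fallback to the raw count for terms with no correlation label. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Set


def window_tf(tokens: List[str], term: str, positive: Set[str],
              window: int = 5) -> float:
    """Sum of (0.5 + P_window,k) over every occurrence k of `term`."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        lo, hi = max(0, k - window), min(len(tokens), k + window + 1)
        p_window = 0.0
        for j in range(lo, hi):
            other = tokens[j]
            if j != k and other != term and other in positive:
                p_window += 1.0 / abs(j - k)  # ASSUMED distance weight
        total += 0.5 + p_window
    return total


def inconsistency_tf(tokens: List[str], term: str,
                     positive: Set[str]) -> float:
    """Sum of P_inconsistency,k over every occurrence k of `term`."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        before = tokens[:k]
        # ASSUMED dominance measure: share of preceding tokens that are
        # positively correlated to the category.
        share = sum(b in positive for b in before) / len(before) if before else 0.0
        total += 0.5 * share
    return total


def revised_tf(tokens: List[str], term: str, c: str,
               positive: Dict[str, Set[str]],
               negative: Dict[str, Set[str]]) -> float:
    if term in positive[c]:  # Strategy I
        return window_tf(tokens, term, positive[c])
    if term in negative[c]:  # Strategy II
        best_other = max(window_tf(tokens, term, positive[o])
                         for o in positive if o != c)
        return best_other - inconsistency_tf(tokens, term, positive[c])
    return float(tokens.count(term))  # ASSUMED fallback: raw count


positive = {"treatment": {"surgery", "chemotherapy"},
            "etiology": {"smoking", "risk"}}
negative = {"treatment": {"smoking"}, "etiology": {"surgery"}}
tokens = ["smoking", "risk", "surgery", "chemotherapy"]

tf_treatment = revised_tf(tokens, "surgery", "treatment", positive, negative)
tf_etiology = revised_tf(tokens, "surgery", "etiology", positive, negative)
```

In this toy text, "surgery" sits next to "chemotherapy", so its revised TF for treatment is 0.5 + 1.0 = 1.5. For etiology, where "surgery" is negatively correlated, the value is 1.5 (the best WindowTF among the other categories) minus 0.5 (the preceding segment consists entirely of etiology terms), giving 1.0.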
Empirical Evaluation
Experimental Data
• Top-10 fatal diseases and top-20 cancers in Taiwan
  • Total # of diseases: 28
• Source: Web sites of hospitals, healthcare associations, and the department of health in Taiwan
• Disease aspects (categories): 5 aspects: etiology, diagnosis, treatment, prevention, and symptom
• Splitting the texts into aspects: 4669 texts about individual aspects
• Test data: Randomly sampling 10% of the 4669 texts and merging them into test texts of 1 to 5 aspects
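The test-data construction described on this slide might look roughly like the Python sketch below. The slide does not say how the sampled texts are grouped before merging, so drawing a random group size between 1 and 5 is an assumption, as are the function name `build_test_texts`, the fixed seed, and the toy input.

```python
import random
from typing import List, Set, Tuple

Labeled = Tuple[str, str]  # (aspect, text)


def build_test_texts(
    texts: List[Labeled],
    test_fraction: float = 0.1,
    max_merge: int = 5,
    seed: int = 0,
) -> Tuple[List[Labeled], List[Tuple[Set[str], str]]]:
    """Hold out a random fraction of single-aspect texts and merge them.

    Returns the remaining training texts and a list of merged test
    texts, each paired with the set of aspects it covers.
    ASSUMPTION: each merged test text joins 1..max_merge held-out texts,
    with the group size drawn uniformly at random.
    """
    rng = random.Random(seed)
    shuffled = list(texts)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    held_out, train = shuffled[:n_test], shuffled[n_test:]

    test: List[Tuple[Set[str], str]] = []
    i = 0
    while i < len(held_out):
        size = rng.randint(1, max_merge)
        group = held_out[i:i + size]
        i += size
        aspects = {aspect for aspect, _ in group}
        merged = " ".join(text for _, text in group)
        test.append((aspects, merged))
    return train, test


ASPECTS = ["etiology", "diagnosis", "treatment", "prevention", "symptom"]
toy = [(ASPECTS[i % 5], f"t{i}") for i in range(100)]
train, test = build_test_texts(toy)
```

With 100 toy texts, 10 are held out and merged, and each merged test text covers between 1 and 5 aspects. Because a group may contain two texts of the same aspect, the number of aspects in a merged text can be smaller than the group size.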
Underlying Classifiers & Experimental Baselines
• Underlying classifier
  • The Support Vector Machine (SVM) classifier
• Baseline enhancer
  • CTFA (Liu, 2010), which employs Strategy I for better TC
  • CTFA does not consider Strategy II
Results
Conclusion
• Disease knowledge map (Dmap)
  • Supporting evidence-based medicine, health education, and healthcare decision support
• A key step to build a Dmap: Automatic identification of disease aspect information (DAI)
• Identification of DAI as a text classification problem
• Term proximity as key information to enhance existing classifiers to classify DAI