
Enhancing Text Classifiers to Identify Disease Aspect Information

Rey-Long Liu, Dept. of Medical Informatics, Tzu Chi University, Taiwan. Outline: research background, problem definition, the proposed approach (IDAI), empirical evaluation, and conclusion.


Presentation Transcript


  1. Enhancing Text Classifiers to Identify Disease Aspect Information. Rey-Long Liu, Dept. of Medical Informatics, Tzu Chi University, Taiwan

  2. Outline • Research background • Problem definition • The proposed approach: IDAI • Empirical evaluation • Conclusion Disease Aspect Classification

  3. Research Background

  4. Disease Aspect Information (DAI) • An example from MedlinePlus: several passages about three aspects of kidney cancer (treatment, symptoms and signs, and etiology), together with several passages not related to any aspect. Example text: "You have two kidneys ... Kidney cancer forms in the … Risk factors include smoking, having certain genetic conditions and …. Often, kidney cancer doesn't have early symptoms. However, see your health care provider if you notice: blood in your urine; a lump in your abdomen; … pain in your side … Treatment depends on your age, …. It might include surgery, radiation, chemotherapy …"

  5. Disease Knowledge Map: An Application of DAI

  6. [Diagram: identification of disease aspect information. Recoverable labels: medical information provider; medical texts for specific diseases; identification of DAI; classifier; disease aspects (diagnosis, prevention, etiology, symptoms, treatment); disease info.; aspect info.; verified info.; query & aspect; cross-disease query; healthcare professionals & consumers; healthcare decision support system]

  7. Problem Definition

  8. Goals • Modeling the identification of DAI as a text classification problem • Disease aspects are predefined categories of interest, not brief descriptions of information needs • Developing a technique to enhance various kinds of text classifiers • Given a medical text, the enhanced classifier should be better at identifying which aspects of diseases the text talks about

  9. Related Work • Text classification (TC) • Weakness: multi-aspect information in a text introduces noise for text classifiers • Segment extraction for topic detection • Weakness: designed for specific descriptions (not for categories) • Passage extraction for TC • Weakness: determining the location and length of the passages relevant to a specific category becomes another problem of TC

  10. The Proposed Approach: IDAI

  11. IDAI: Revising Term Frequency (TF) to Improve Classifiers [Diagram. Recoverable labels: IDAI; underlying text classifier. Training: training texts; categories (aspects); identifying term-category correlation type; classifier development. Testing: a text (d); assessing term frequencies (TF); TF of terms w.r.t. each category; classification]

  12. Two Strategies for TF Revision

  13. Revised TF

$$
\mathrm{RevisedTF}(t,d,c) =
\begin{cases}
\mathrm{WindowTF}(t,d,c), & \text{if } t \text{ is positively correlated to } c \text{ (Strategy I)} \\
\max_{c' \neq c}\{\mathrm{WindowTF}(t,d,c')\} - \mathrm{InconsistencyTF}(t,d,c), & \text{if } t \text{ is negatively correlated to } c \text{ (Strategy II)}
\end{cases}
$$

  • $\mathrm{WindowTF}(t,d,c) = \sum_k (0.5 + P_{\mathrm{window},k})$, summed over each occurrence of $t$ at position $k$, where $P_{\mathrm{window},k}$ is the distance-based sum of weights of other positively correlated terms in a window at $k$
  • $\mathrm{InconsistencyTF}(t,d,c) = \sum_k P_{\mathrm{inconsistency},k}$, summed over each occurrence of $t$ at position $k$, where $P_{\mathrm{inconsistency},k} = 0.5 \times$ (how much the text segment before $k$ is dominated by the terms positively correlated to $c$)
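The revised TF on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the slide does not give the window size, the term weights, the exact distance weighting, or the dominance measure, so the sketch assumes a symmetric window of `window` tokens, a weight divided by token distance, and dominance measured as the share of preceding tokens that are positively correlated to c. All function and parameter names are hypothetical.

```python
from typing import Dict, List

# term -> weight, for the terms positively correlated to one category (assumed input)
Weights = Dict[str, float]


def window_tf(tokens: List[str], term: str, pos_weights: Weights, window: int = 5) -> float:
    """Sum of (0.5 + P_window,k) over every occurrence k of `term`."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        p_window = 0.0
        lo, hi = max(0, k - window), min(len(tokens), k + window + 1)
        for j in range(lo, hi):
            other = tokens[j]
            # Only *other* positively correlated terms contribute.
            # Assumption: weight decays as 1 / distance.
            if other != term and other in pos_weights:
                p_window += pos_weights[other] / abs(j - k)
        total += 0.5 + p_window
    return total


def inconsistency_tf(tokens: List[str], term: str, pos_weights: Weights) -> float:
    """Sum of P_inconsistency,k over every occurrence k of `term`."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term or k == 0:
            continue
        # Assumption: dominance = share of tokens before k that are
        # positively correlated to the category.
        dominance = sum(1 for w in tokens[:k] if w in pos_weights) / k
        total += 0.5 * dominance
    return total


def revised_tf(tokens: List[str], term: str, category: str,
               weights_by_category: Dict[str, Weights],
               positively_correlated: bool, window: int = 5) -> float:
    """Strategy I for positively correlated terms, Strategy II otherwise."""
    if positively_correlated:
        return window_tf(tokens, term, weights_by_category[category], window)
    best_other = max(
        (window_tf(tokens, term, w, window)
         for c, w in weights_by_category.items() if c != category),
        default=0.0,
    )
    return best_other - inconsistency_tf(tokens, term, weights_by_category[category])
```

Under these assumptions, a term that sits close to other terms positively correlated to the same category receives a higher TF for that category, while a negatively correlated term that follows a segment dominated by that category's terms is discounted.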

  14. Empirical Evaluation

  15. Experimental Data • Top-10 fatal diseases and top-20 cancers in Taiwan • Total # of diseases: 28 • Source: Web sites of hospitals, healthcare associations, and the department of health in Taiwan • Disease aspects (categories): 5 aspects: etiology, diagnosis, treatment, prevention, and symptom • Splitting the texts into aspects: 4669 texts about individual aspects • Test data: Randomly sampling 10% of the 4669 texts and merging them into test texts of 1 to 5 aspects
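The test-data construction described above (sample 10% of the single-aspect texts, then merge them into test texts covering 1 to 5 aspects) might look roughly like the following sketch. The slide does not say how texts are grouped for merging, so this assumes that each merged text draws at most one passage per aspect and that the number of aspects per merged text is chosen uniformly at random; `build_test_texts` and its parameters are hypothetical names.

```python
import random
from typing import FrozenSet, List, Tuple

Labeled = Tuple[str, str]  # (aspect, text)


def build_test_texts(texts: List[Labeled], test_fraction: float = 0.1,
                     max_aspects: int = 5, seed: int = 0
                     ) -> Tuple[List[Labeled], List[Tuple[FrozenSet[str], str]]]:
    """Split single-aspect texts into training texts and merged multi-aspect test texts."""
    rng = random.Random(seed)
    indices = list(range(len(texts)))
    rng.shuffle(indices)
    n_test = round(len(texts) * test_fraction)
    sampled = [texts[i] for i in indices[:n_test]]
    training = [texts[i] for i in indices[n_test:]]

    merged = []
    while sampled:
        size = rng.randint(1, max_aspects)  # assumed: uniform choice of aspect count
        group, seen, remaining = [], set(), []
        for aspect, text in sampled:
            # Assumption: a merged text holds at most one passage per aspect.
            if len(group) < size and aspect not in seen:
                group.append(text)
                seen.add(aspect)
            else:
                remaining.append((aspect, text))
        sampled = remaining
        merged.append((frozenset(seen), " ".join(group)))
    return training, merged
```

Each merged test text carries the set of aspects it contains, which is what a multi-label evaluation of the classifier would compare against.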

  16. Underlying Classifiers & Experimental Baselines • Underlying classifier • The Support Vector Machine (SVM) classifier • Baseline enhancer • CTFA (Liu, 2010), which employs Strategy I for better TC • CTFA does not consider Strategy II

  17. Results

  18. Disease Aspect Classification

  19. Conclusion

  20. Disease knowledge map (Dmap) • Supporting evidence-based medicine, health education, and healthcare decision support • A key step to build a Dmap: Automatic identification of disease aspect information (DAI) • Identification of DAI as a text classification problem • Term proximity as key information to enhance existing classifiers to classify DAI
