Improving Health Question Classification by Word Location Weights
Rey-Long Liu
Dept. of Medical Informatics, Tzu Chi University, Taiwan
Outline
• Background
• Problem definition
• The proposed approach: WLW
• Empirical evaluation
• Conclusion
Classification of Health Questions
• Why health questions?
  • Health questions provide both reliable and readable health information
• Why classification of health questions?
  • Given a health question q, retrieve related questions (and their answers)
Goal & Motivation
• Goal
  • Target: Chinese Health Questions (CHQs)
  • Contribution: developing a technique, WLW (Word Location Weight), that estimates the weight of each word in a CHQ from the word's location
• Motivation
  • Location weights can be used by classifiers (e.g., SVM) to improve classification
    • Classifying in-space CHQs (cause, diagnosis, process)
    • Filtering out-space CHQs (which may be about anything)
Basic Idea
• Words that are more related to the category of a CHQ tend to appear at the beginning and the end of the CHQ
• Examples:
  • 如何(how to)克服(deal with)緊張(nervous)的情緒(mood)? (category: process)
  • 嬰兒(infant)體溫(body temperature)太低(too low)怎麼辦(what to do)? (category: process)
Related Work
• Recognition of question types (e.g., when, where)
  • Weakness: Question types differ from the intended categories of CHQs
• Classification by parsing
  • Weakness I: Parsing Chinese is still challenging
  • Weakness II: CHQs are NOT always well-formed
• Classification by pattern matching
  • Weakness: It is difficult to construct the string patterns
Main Challenges
(1) Defining the two weights of a location p in a CHQ q
Main Challenges (cont.)
(2) Encoding the location weights of a word w into two features for the underlying classifier
Interesting Behaviors of WLW
• A word w in a question q has two features
  • Fvalue_front and Fvalue_rear
  • Applicable to different categories and languages (e.g., English)
• When w is far from both the front and the rear
  • Both features reduce to the term frequency (TF) of w
  • WLW reduces to the traditional feature-encoding approach (using TF as the features)
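The slide text does not preserve the actual weight formulas, so the following is only a minimal sketch of the idea: each word gets a front feature and a rear feature, and both fall back to plain TF once the word is far from either end. The linear `window`/`boost` weighting, the function names, and the word segmentation of the example question are all assumptions made for illustration, not the WLW definitions.

```python
from collections import defaultdict


def _weight(distance, window, boost):
    """Assumed weighting (not from the slides): boosted near an end, 1 elsewhere."""
    if distance < window:
        return 1.0 + boost * (1.0 - distance / window)
    return 1.0  # far from the end: the occurrence counts as in plain TF


def location_features(tokens, window=1, boost=1.0):
    """Return {word: (front_feature, rear_feature)} for one tokenized question."""
    n = len(tokens)
    feats = defaultdict(lambda: [0.0, 0.0])
    for i, word in enumerate(tokens):
        d_front = i          # distance from the first token
        d_rear = n - 1 - i   # distance from the last token
        feats[word][0] += _weight(d_front, window, boost)
        feats[word][1] += _weight(d_rear, window, boost)
    return {w: tuple(v) for w, v in feats.items()}


# Second example question from the Basic Idea slide, segmented as glossed there.
question = ["嬰兒", "體溫", "太低", "怎麼辦"]
features = location_features(question)
```

With these assumed settings, only the first and last words receive a boosted feature; the two middle words get (1.0, 1.0), which is exactly their term frequency.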
Experimental Design
• CHQs were downloaded from a health information provider
  • 864 in-space CHQs
    • cause (category 1): 313
    • diagnosis (category 2): 92
    • process (category 3): 459
  • 100 out-space CHQs
    • whatever (general description)
• Five-fold cross-validation
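As a rough illustration of the five-fold setup, the sketch below splits the 864 in-space CHQs into five folds using the category counts given on the slide. Stratifying by category, the fixed random seed, and the helper name `stratified_folds` are assumptions; the slides do not say how the folds were drawn or how the 100 out-space CHQs were handled during cross-validation.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each item index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices round-robin across folds
    return folds


# Category counts of the in-space CHQs as reported on the slide.
labels = ["cause"] * 313 + ["diagnosis"] * 92 + ["process"] * 459
folds = stratified_folds(labels)
```

In each round, one fold would serve as the test set and the remaining four as training data.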
Underlying Classifier
• The Support Vector Machine (SVM) classifier
Results: Classification of In-Space CHQs
• Evaluation criteria
  • Micro-averaged F1 (MicroF1)
  • Macro-averaged F1 (MacroF1)
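For readers unfamiliar with the two criteria, here is a small sketch of how micro- and macro-averaged F1 are commonly computed. It assumes each question receives exactly one predicted category and that every label appears in `categories`; the function name and the toy labels are illustrative only and are not results from the experiments.

```python
def f1_scores(gold, predicted, categories):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp = {c: 0 for c in categories}
    fp = {c: 0 for c in categories}
    fn = {c: 0 for c in categories}
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # the gold category missed this question
            fp[p] += 1  # the predicted category wrongly accepted it

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-category F1 values (each category counts equally).
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
    # Micro: pool the counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


categories = ["cause", "diagnosis", "process"]
gold = ["cause", "cause", "diagnosis", "process", "process", "process"]
pred = ["cause", "process", "diagnosis", "process", "process", "cause"]
micro, macro = f1_scores(gold, pred, categories)
```

Macro-averaging gives the small diagnosis category the same influence as the much larger process category, whereas micro-averaging is dominated by the larger categories.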
Results: Filtering of Out-Space CHQs
• Evaluation criteria
  • Filtering ratio (FR) = (# out-space CHQs successfully rejected by all categories) / (# out-space CHQs)
  • Average number of misclassifications (AM) = (# misclassifications for the out-space CHQs) / (# out-space CHQs)
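The two filtering criteria can be sketched as follows. The input format (one set of accepting categories per out-space CHQ) and the reading that each accepting category counts as one misclassification are assumptions; the function name and the toy data are hypothetical and do not reproduce the reported results.

```python
def filtering_metrics(accepted_by):
    """Return (FR, AM) given, per out-space CHQ, the set of categories that accepted it."""
    total = len(accepted_by)
    if total == 0:
        raise ValueError("need at least one out-space CHQ")
    # A CHQ is successfully filtered only if every category rejects it.
    rejected = sum(1 for cats in accepted_by if not cats)
    # Assumed: each category that accepts an out-space CHQ is one misclassification.
    misclassifications = sum(len(cats) for cats in accepted_by)
    return rejected / total, misclassifications / total


# Toy data: four out-space CHQs, two rejected by all categories.
accepted = [set(), {"cause"}, set(), {"cause", "process"}]
fr, am = filtering_metrics(accepted)
```

A higher FR and a lower AM both indicate better filtering of out-space CHQs.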
Conclusion
• Healthcare consumers often read health information on the Internet
• Health questions are valuable resources for healthcare consumers
  • They provide both reliable and readable health information
• Classification of health questions is the basis for retrieving related questions
  • Categories: cause, diagnosis, process, whatever
• WLW can help SVM improve the classification of CHQs