Multi-label Associative Classification of Medical Documents from MEDLINE

Multi-label Associative Classification of Medical Documents from MEDLINE Advisor : Dr. Hsu Presenter : Chih-Ling Wang Author : Rafal Rak, Lukasz Kurgan, Marek Reformat ACM GECCO 2005

Outline • Motivation • Objection • Proposed approach • Experiments • Conclusion • My opinion

Motivation • Ability to provide convenient access to scientific documents becomes a difficult problem due to large and constantly increasing number of incoming documents and extensive manual work associated with their storage, description and classification.

Objective • This research aims to provide an automated method for classification of articles into the structure of medical document repositories, which would support currently performed extensive manual work.

Proposed approach • MEDLINE and MeSH • Each article reference in MEDKINE database includes information such as a unique identifier, author, title, journal information, abstract, and 10 to 15 manually assigned MeSH keywords. • MeSH is an annually updated controlled vocabulary thesaurus of medical terms. 15 general concepts(keywords) Keyword can occur more than once in a tree

System overview Porter’s algorithm • Assigning MeSH keywords to documents • While pruning is based on one parameter, minimum confidence, ranking and selection can be performed in many ways. • Simple. • Confidence factor. • Cosine factor. • Simple, non-recurrent. • Confidence factor, non-recurrent. Assign keywords User-defined minimum support and minimum confidence Pruning, ranking, and selecting rules

Experimentation • Experimental setup: • OHSUMED: 1987 to 1991 and documents have both title and abstract. • Training set: 183229 documents dated 1987 through 1990. • Testing set: 50216 documents dated 1991. • Quality evaluation: accuracy, precision, recall, F1 contingency matrix • Macro averaging V.S. micro averaging

Conclusion • This paper describes the development of the system for classification of medical article references. • The proposed system is based on associative classification technology. • We used OHSUMED, a subset of the MEDLINE database, as the source of documents and the MeSH tree as class labels. • If the goal is to classify the largest number of documents, one should choose a configuration that maximizes micro F1. • If one wishes our system to work well for categories with small number of documents a configuration that maximizes macro F1 should be chosen.

My opinion • Advantage: consider the whole spectrum of MeSH categories generalized to the second level of the tree. • Disadvantage:… • Apply: many fields.

Multi-label Associative Classification of Medical Documents from MEDLINE

Multi-label Associative Classification of Medical Documents from MEDLINE

Presentation Transcript

Multi-label Classification without Multi-label Cost - Multi-label Random Decision Tree Classifier

An Ensemble-based Approach to Fast Classification of Multi-label Data Streams

Large Scale Multi-Label Classification

Seeking Abbreviations From MEDLINE

Multi-Label Collective Classification

Multi-label Relational Neighbor Classification using Social Context Features

Classification of Commercial and Regulatory Documents

Effective Multi-Label Active Learning for Text Classification

A Lazy Approach to Associative Classification

Multi-Label Feature Selection for Graph Classification

Music Information Retrieval based on multi-label cascade classification system

SUPERVISED CLASSIFICATION OF TEXT DOCUMENTS

Lazy Associative Classification

PARTIALLY SUPERVISED CLASSIFICATION OF TEXT DOCUMENTS

SIMD, Associative, and Multi-Associative Computing

SIMD, Associative, and Multi-Associative Computing

Multi-Label Collective Classification

An Ensemble-based Approach to Fast Classification of Multi-label Data Streams

A k -Nearest Neighbor Based Algorithm for Multi-Label Classification

Classification of Business Documents

Partially Supervised Classification of Text Documents