Multi-label Associative Classification of Medical Documents from MEDLINE






Presentation Transcript


  1. Multi-label Associative Classification of Medical Documents from MEDLINE • Advisor: Dr. Hsu • Presenter: Chih-Ling Wang • Authors: Rafal Rak, Lukasz Kurgan, Marek Reformat • ACM GECCO 2005

  2. Outline • Motivation • Objective • Proposed approach • Experiments • Conclusion • My opinion

  3. Motivation • Providing convenient access to scientific documents is becoming difficult because of the large and constantly increasing number of incoming documents and the extensive manual work associated with their storage, description, and classification.

  4. Objective • This research aims to provide an automated method for classifying articles into the structure of medical document repositories, in support of the extensive manual work currently performed.

  5. Proposed approach • MEDLINE and MeSH • Each article reference in the MEDLINE database includes information such as a unique identifier, author, title, journal information, abstract, and 10 to 15 manually assigned MeSH keywords. • MeSH is an annually updated controlled-vocabulary thesaurus of medical terms. • MeSH tree: 15 general concepts (keywords). • A keyword can occur more than once in the tree.

  6. System overview • Diagram labels: Porter’s algorithm (stemming); user-defined minimum support and minimum confidence; pruning, ranking, and selecting rules; assign keywords. • Assigning MeSH keywords to documents: while pruning is based on one parameter, minimum confidence, ranking and selection can be performed in many ways: • Simple. • Confidence factor. • Cosine factor. • Simple, non-recurrent. • Confidence factor, non-recurrent.
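
The pipeline on this slide (mine rules under a minimum support and minimum confidence, rank them, then assign keywords) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `mine_rules` and `assign_keywords`, the toy documents, and the scoring scheme (summing the confidence of all matching rules) are assumptions made for illustration, and the slide's "confidence factor" and "cosine factor" variants are not defined in the transcript, so they are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def mine_rules(docs, min_support, min_confidence, max_terms=2):
    """Mine rules of the form (term set) -> keyword.

    docs is a list of (set_of_stemmed_terms, set_of_keywords) pairs.
    Returns (antecedent, keyword, support, confidence) tuples that pass
    both thresholds, ranked by confidence and then support.
    """
    n = len(docs)
    antecedent_count = Counter()  # documents containing the term set
    rule_count = Counter()        # documents containing term set AND keyword
    for terms, keywords in docs:
        for size in range(1, max_terms + 1):
            for antecedent in combinations(sorted(terms), size):
                antecedent_count[antecedent] += 1
                for kw in keywords:
                    rule_count[(antecedent, kw)] += 1

    rules = []
    for (antecedent, kw), count in rule_count.items():
        support = count / n
        confidence = count / antecedent_count[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, kw, support, confidence))
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


def assign_keywords(terms, rules, top_k=2):
    """Hypothetical selection step: score each keyword by the summed
    confidence of the rules whose antecedent occurs in the document."""
    scores = Counter()
    for antecedent, kw, _support, confidence in rules:
        if set(antecedent) <= terms:
            scores[kw] += confidence
    return [kw for kw, _ in scores.most_common(top_k)]


# Toy training data (invented for this sketch, terms already stemmed).
docs = [
    ({"tumor", "cell"}, {"Neoplasms"}),
    ({"tumor", "therapi"}, {"Neoplasms", "Therapeutics"}),
    ({"heart", "therapi"}, {"Cardiovascular Diseases", "Therapeutics"}),
    ({"heart", "cell"}, {"Cardiovascular Diseases"}),
]
rules = mine_rules(docs, min_support=0.5, min_confidence=0.9, max_terms=1)
assigned = assign_keywords({"tumor", "therapi"}, rules)
```

Because a document can match rules for several keywords, the sketch naturally yields multiple labels per document, which is the multi-label setting the title refers to.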

  7. Experimentation • Experimental setup: • OHSUMED: documents from 1987 to 1991 that have both a title and an abstract. • Training set: 183229 documents dated 1987 through 1990. • Testing set: 50216 documents dated 1991. • Quality evaluation: accuracy, precision, recall, and F1, computed from the contingency matrix • Macro averaging vs. micro averaging
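
The difference between macro and micro averaging listed above can be made concrete with a short Python sketch. The per-category counts are invented for illustration and are not results from the paper; the helper names `f1`, `macro_f1`, and `micro_f1` are likewise assumptions. Each category contributes true positives, false positives, and false negatives from its contingency matrix.

```python
def f1(tp, fp, fn):
    """F1 from contingency counts; 0.0 when precision or recall is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(counts):
    """Average the per-category F1 scores: every category weighs the same."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts):
    """Pool the counts over all categories first: every decision weighs the same."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical (tp, fp, fn) counts for one large and one small category.
counts = {
    "large_category": (90, 10, 10),
    "small_category": (1, 1, 9),
}
macro = macro_f1(counts)
micro = micro_f1(counts)
```

In this toy case the micro-averaged score stays high because the large category dominates the pooled counts, while the macro-averaged score drops because the poorly classified small category counts as much as the large one.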

  8. Conclusion • This paper describes the development of a system for classifying medical article references. • The proposed system is based on associative classification technology. • We used OHSUMED, a subset of the MEDLINE database, as the source of documents and the MeSH tree as class labels. • If the goal is to classify the largest number of documents, one should choose a configuration that maximizes micro F1. • If one wishes the system to work well for categories with a small number of documents, a configuration that maximizes macro F1 should be chosen.

  9. My opinion • Advantage: considers the whole spectrum of MeSH categories, generalized to the second level of the tree. • Disadvantage:… • Application: many fields.
