1 / 14

Knowledge discovery with classification rules in a cardiovascular dataset

This paper explores the use of the AREX algorithm to discover knowledge and patterns in a cardiovascular dataset, allowing for improved understanding and decision-making in the medical field.

bhibner
Download Presentation

Knowledge discovery with classification rules in a cardiovascular dataset

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Knowledge discovery with classification rules in a cardiovascular dataset Advisor : Dr. Hsu Presenter : Zih-Hui Lin Author :Viii Podgorelec a,*, Peter Kokol a, Milojka Molan Sti81ic b, Marjan Heri :ko a, Ivan Rozrnan a Computer Methods and Programs in Biomedicine, Volume 80, Supplement 1, December 2005, Pages S39-S49

  2. Outline • Motivation • Objective • The AREX algorithm • Experiment • Conclusions

  3. Motivation • Modern medicine generates huge amounts of data and there is an acute and widening gap between data collection and data comprehension. • it is very difficult for a human to make use of such amount of information (i.e. hundreds of attributes, thousand of images, several channels of 24 hours of ECG or EEG signals)

  4. Objective • confirm their existing knowledge about medical problem (ie. hundreds of attributes) • enable searching for new facts, which should reveal some new interesting patterns and possibly improve the existing medical knowledge.

  5. The AREX algorithm 2 1 3 1 multi-population self-adapting genetic algorithm for the induction of decision trees. 1.1Build N decision trees upon objects from S Oi 1.2Classify object with nt randomly chosen trees s 1.4From all N decision create M initial classification rules 2 evolution of programs in an arbitrary programming language, which is used to evolve classification N 2.1create m/2+1 rules (randomly) 2.2 S* • 2.3 If s is not empty • Add |s| randomly chosen objects from s* to s • ct=ct+1 • repeat 1.1 1.3if frequency of the most frequent decision class classified by nt trees > nt - ct (ct=nt/2) 3 an optimal set of classification rules is determined with a simple genetic algorithm

  6. root Xi Xi null null Xi null null null null null null Genetic algorithm for the construction of decision trees 1.Number of attribute nodes M that will be in the tree 2. Select an attribute Xi (1)Continuous attributes →split constant (2)Discrete attributes →randomly defined two disjunctive sets M attributes population 3. 選一空節點,(tree深度愈高,選中機率愈低) 4. Randomly select an attribute Xi (還沒被選過的機率較高) • For each empty leaf the following algorithm • determines the appropriate decision class

  7. proGenesys system & Finding the optimal set of rules

  8. Introduction • Advantage • transparency of the classification process that one can easily interpret, understand and criticize. • Disadvantages • poor processing of incomplete, noisy data, • inability to build several trees for the same dataset • inability to use the preferred attributes, etc.

  9. Dataset • contains data of 100 patients from Maribor Hospital. • The attributes include • general data (age, sex, etc.) • health status (data from family history and child's previous illnesses), • general cardiovascular data (blood pressure, pulse, chest pain, etc.) • more specialized cardiovascular data - data from child's cardiac history and clinical examinations (with findings of ultrasound, ECG, etc.). • dataset five different diagnoses are possible: • innocent heart murmur良性雜音 • congenital heart disease with left-to-right shunt先天性心臟病(左向右分流) • aortic valve disease with aorta coarctation,主動脈辨疾病(主動脈縮窄) • arrhythmias心律不整 • chest pain.心悸

  10. Classification result –training set Overfitting

  11. Classification result –testing set

  12. Classification result

  13. Conclusions • One of the most evident advantages of AREX is the simultaneous very good • Generalization → high and similar overall accuracy on both training set and test set • Specialization → high and very similar accuracy of all decision classes, also the least frequent ones. • equip physicians with a powerful technique to • (1) confirm their existing knowledge about some medical problem • (2) enable searching for new facts, which should reveal some new interesting patterns and possibly improve the existing medical knowledge.

  14. My opinion • Advantage: 依屬性給予權重 • Disadvantage: • Apply: 台中醫院?

More Related