Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Contents • Problem description • Methods • Feature extraction • Classification and evaluation • HRV records • Results • Discussion • Conclusion
Problem description • Heart rate variability (HRV) analysis examines fluctuations in the sequence of cardiac interbeat (RR) intervals • Each cardiac rhythm has a pattern (regular or irregular) in these RR interval fluctuations • HRV is a strong predictor of arrhythmic mortality
Problem description • Some rhythms have very similar patterns of HRV, e.g. normal and (accelerated) junctional rhythm • Some patterns cannot be efficiently detected by HRV analysis alone, e.g. bundle branch blocks, differentiating atrial disorders (atrial fibrillation vs. wandering atrial pacemaker) • Many rhythms and anomalies can be automatically detected and classified using HRV analysis alone • The main questions are: • How accurately can a rhythm be classified? • What should the optimal length of the analyzed segment be? • Which features should be used for which type of rhythm?
Motivation • Nonlinear phenomena are involved in the genesis of HRV: • Many nonlinear features are described in the literature, but there are very few comparisons of different feature combinations on the same dataset • The aim of this work is to evaluate a number of different combinations of (sometimes interrelated) linear and nonlinear HRV features in the classification of several types of cardiac rhythms
Feature extraction • We consider a feature linear if it is unable to take into account the nonlinear dynamics of the HRV signal • Examples of linear features include: • Time domain statistical and geometric measures • Frequency domain spectral features • Nonlinear features try to encompass and quantify the observed complexity of the HRV signal changes • Most of the employed nonlinear features make no assumptions on whether the changes are deterministic or stochastic in origin • Some of the features are specifically designed for HRV analysis, others have broader areas of application
Feature extraction • Most of the linear and nonlinear features were implemented in our own framework for HRV called ECG Chaos Extractor • The only exceptions were frequency domain features, which were extracted in Matlab using the autoregressive function • Some newly proposed nonlinear features • ASTA, Carnap 1D (tessellation) entropy – both methods require more elaborate further research • Not all of the nonlinear HRV features covered in literature were inspected (e.g. Lyapunov exponents, spectral entropy...)
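The time domain statistical measures mentioned above can be illustrated with a minimal Python sketch. This is not the ECG Chaos Extractor implementation; the function name `time_domain_features`, the millisecond units, and the use of the sample standard deviation for SDNN are assumptions made for illustration, following the common definitions of SDNN, RMSSD, pNN50 and pNN20.

```python
import math
import statistics


def time_domain_features(rr_ms):
    """Common time domain HRV statistics from RR intervals given in milliseconds.

    Hypothetical helper for illustration; not taken from ECG Chaos Extractor.
    """
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between successive RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # SDNN: standard deviation of all RR intervals (sample estimate assumed here)
    sdnn = statistics.stdev(rr_ms)
    # RMSSD: root mean square of successive differences
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # pNNx: percentage of successive differences larger than x milliseconds
    pnn50 = 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)
    pnn20 = 100.0 * sum(abs(d) > 20 for d in diffs) / len(diffs)
    return {"SDNN": sdnn, "RMSSD": rmssd, "pNN50": pnn50, "pNN20": pnn20}


features = time_domain_features([800, 810, 790, 850, 845])
```

All four values depend only on the distribution of the intervals and of their successive differences, which is why such measures count as linear in the sense used on these slides.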
Classification and evaluation • For the best results on a large number of features, a strong classification algorithm is required • We opted for Random Forests (RF), an ensemble of random decision trees developed by Breiman in 2001 • Its internal mechanism for feature selection makes it a valuable tool when there is a large number of potentially insignificant features • We also tried other classifiers (ANN, SVM, and the C4.5 decision tree); however, none of them gave better results in terms of accuracy and speed
Classification and evaluation • RF was constructed with 40 trees for each feature scheme • Stratified 10x10-fold cross-validation evaluation procedure was executed • Analysis overview: [Diagram. Stages: Databases, Feature extraction, Feature selection and classification, Results. Components: HRV records, annotations, ECG Chaos Extractor, Matlab, Weka, classification accuracy]
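The evaluation procedure on this slide (10 repetitions of stratified 10-fold cross-validation) can be sketched as follows. The authors ran the evaluation in Weka; this standard-library Python sketch only illustrates how such splits might be generated and does not train a Random Forest. The function names, the seed, and the round-robin stratification strategy are assumptions, and class labels are assumed to be sortable (e.g. strings).

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k, rng):
    """Return k folds (lists of sample indices) with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def repeated_stratified_cv(labels, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) for repeats x k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        # A fresh shuffle per repetition gives a different partition each time
        folds = stratified_kfold(labels, k, rng)
        for f, test_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, test_idx


# Toy labels only; the real dataset has four rhythm classes
labels = ["normal"] * 40 + ["arrhythmia"] * 20
splits = list(repeated_stratified_cv(labels))
```

In each of the 100 train/test splits, a classifier would be fitted on `train_idx` and scored on `test_idx`, and the accuracies averaged over all splits.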
HRV records • Four types of cardiac rhythms (seven databases) • Segments of 500 RR intervals analyzed, with overlap between segments • A total of 2216 feature vectors
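Cutting a record into overlapping 500-interval segments might look like the sketch below. The slides do not state the amount of overlap, so the `step` of 250 intervals is a placeholder assumption, as is the function name `segment_rr`.

```python
def segment_rr(rr, length=500, step=250):
    """Split an RR series into fixed-length windows; step < length gives overlap.

    The default step of 250 is an illustrative assumption, not the authors' setting.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    # A trailing remainder shorter than `length` is discarded
    return [rr[s:s + length] for s in range(0, len(rr) - length + 1, step)]


# Toy series of 1200 values standing in for RR intervals
segments = segment_rr(list(range(1200)))
```

Each resulting segment would then be turned into one feature vector.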
Discussion • Good performance was achieved with schemes: 4, 10, 3, and 9 • The most promising combination is the one in scheme 4 consisting of the following features: • SDNN, pNN20, pNN50, RMSSD, HTI, PSD, VLF, LF, HF, LF/HF, SD1/SD2 ratio, Fano factor, Allan factor • A large increase in the number of nonlinear features did not significantly improve classification accuracy • For further research, each segment should be labeled based on the beats or rhythm it contains, and not based on the database it originated from • Improvement in accuracy is to be expected • Drawback is the time needed for careful labeling of the rhythms • Additional research is required for useful applicability of certain methods (ASTA, Carnap entropy)
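Of the scheme 4 features, the SD1/SD2 ratio is simple to illustrate. The sketch below uses the common Poincaré plot definitions, where SD1 and SD2 are derived from the variance of the differences and sums of successive RR intervals. The function name and the choice of sample variance are assumptions; the authors' own implementation may differ.

```python
import math
import statistics


def poincare_sd_ratio(rr):
    """Return (SD1, SD2, SD1/SD2) from the Poincare plot of successive RR intervals."""
    if len(rr) < 3:
        raise ValueError("need at least three RR intervals")
    pairs = list(zip(rr, rr[1:]))
    diffs = [b - a for a, b in pairs]
    sums = [b + a for a, b in pairs]
    # SD1: dispersion perpendicular to the identity line (short-term variability)
    sd1 = math.sqrt(statistics.variance(diffs) / 2)
    # SD2: dispersion along the identity line (longer-term variability)
    sd2 = math.sqrt(statistics.variance(sums) / 2)
    if sd2 == 0:
        raise ValueError("SD2 is zero; ratio undefined")
    return sd1, sd2, sd1 / sd2


sd1, sd2, ratio = poincare_sd_ratio([800, 810, 790, 850, 845])
```

A series that changes by the same amount at every beat has zero SD1, since all points fall on a line parallel to the identity line, so its ratio is 0.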
Conclusion • Results suggest high efficiency of linear features in the classification problems • Some of the nonlinear features contribute to greater accuracy of the models • Random forest proved valuable for: • Finding the most relevant subset of features • Efficient classification of different cardiac rhythms • The authors recommend a combination of several time domain, frequency domain and nonlinear features for the best results on medium-sized HRV segments
Thank you! • Questions?