KMM’2005, Kaiserslautern, Germany, April 10-13, 2005
Knowledge Discovery in Microbiology Data: Analysis of Antibiotic Resistance in Nosocomial Infections
Mykola Pechenizkiy, Seppo Puuronen, Department of Computer Science, University of Jyväskylä, Finland
Alexey Tsymbal, Department of Computer Science, Trinity College Dublin, Ireland
Michael Shifrin, Irina Alexandrova, N.N. Burdenko Institute of Neurosurgery, Russian Academy of Medical Sciences, Moscow, Russia
Contents
• Introduction
  • Antibiotic Resistance in Nosocomial Infections
  • Knowledge Discovery in Databases
• Data Collection and Organization, Dataset’s characteristics
• Experimental results in this paper (pilot studies)
  • Association and classification rules
  • Classifiers
• Experimental results of our further studies (up-to-date)
  • Many-sided analysis
    • Basic classifiers
    • Feature selection
    • Clustering
  • Local dimensionality reduction within natural clusters: feature selection (FS) and feature extraction (FE)
    • Conventional PCA and class-conditional FE
    • Sequential FS
  • Tracking Concept Drift
    • 3 evaluation strategies
    • Dynamic integration of classifiers
• Conclusions and future work
Antibiotic Resistance in Nosocomial Infections
• 3-40% of patients admitted to hospital acquire an infection during their stay, and the risk of hospital-acquired (nosocomial) infection has risen steadily in recent decades.
• The frequency depends mostly on the type of operation performed: it is greater for “dirty” operations (10-40%) and smaller for “clean” operations (3-7%). For example, a serious infectious complication such as postoperative meningitis is often the result of nosocomial infection.
• Antibiotics are the drugs commonly used to fight infections caused by bacteria.
• According to statistics from the Centers for Disease Control and Prevention (CDC), more than 70% of the bacteria that cause hospital-acquired infections are resistant to at least one of the antibiotics most commonly used to treat infections.
• Analysis of the microbiological data in antibiograms collected in different institutions over different periods of time is considered one of the most important activities for restraining the spread of antibiotic resistance and avoiding its negative consequences.
How antibiotics work
• Inhibition of nucleic acid synthesis
  • Rifampicin; Chloroquine
• Inhibition of protein synthesis
  • Tetracyclines; Chloramphenicol
• Action on cell membrane
  • Polyenes; Polymyxin
• Interference with enzyme system
  • Sulphamethoxazole
• Action on cell wall
  • Penicillin; Vancomycin
  • Penicillin works by blocking the formation of peptide cross-links in the bacterial cell wall, which weakens the wall and leaves the bacterium susceptible to osmotic lysis.
Antibiotic sensitivity of different bacteria
• Figure: comparing the antibiotic sensitivity of different bacteria (© Jim Deacon, Institute of Cell and Molecular Biology, The University of Edinburgh)
The emergence of antibiotic resistance
• Figure: effects of different antibiotics on the growth of a Bacillus strain. The right-hand image shows a close-up of the novobiocin disk (marked by an arrow on the whole plate). In this case some individual mutant cells in the bacterial population were resistant to the antibiotic and have given rise to small colonies in the zone of inhibition. (© Jim Deacon, Institute of Cell and Molecular Biology, The University of Edinburgh)
How Antibiotic Resistance Happens
How Antibiotic Resistance Happens
• Spontaneous DNA mutation: bacterial DNA may mutate spontaneously. Drug-resistant tuberculosis arises this way.
• Transformation, a form of microbial sex: one bacterium may take up DNA from another bacterium. Penicillin-resistant gonorrhea results from transformation.
• Plasmids: resistance can be acquired from a plasmid, a small circle of DNA that can flit from one type of bacterium to another.
  • A single plasmid can provide a slew of different resistances.
  • In 1968, 12,500 people in Guatemala died in an epidemic of Shigella diarrhea. The microbe harbored a plasmid carrying resistances to four antibiotics.
How Antibiotic Resistance Happens
• Figure: Horizontal Gene Transfer (© Grace Yim and Fan Sozzi)
Mechanisms of Antibiotic Resistance
• Figure (© Grace Yim and Fan Sozzi)
Mechanisms of Antibiotic Resistance
Problem Formulation
• More global, e.g. for pharmaceutical companies
  • Maintain a pool of effective drugs on the market
  • Research, develop and test new antimicrobials
  • Widespread misuse of antibiotics
• More local, e.g. for a hospital
  • Maintain a pool of effective drugs in the hospital
  • Monitoring and researching … (concept drift, seasons)
  • Predict the sensitivity to a certain antibiotic for a certain patient with a certain disease
• Various intelligent techniques, including KM, KDD and DM, ML, DSS, etc.
Knowledge Discovery in Databases
• Knowledge discovery in databases (KDD) is a combination of data warehousing, decision support, and data mining that represents an innovative approach to information and knowledge management.
• KDD is an emerging area that considers the process of finding previously unknown and potentially interesting patterns and relations in large databases.
• We apply KDD techniques to a selected part of a real clinical database to evaluate whether interesting patterns of antibiotic resistance can be revealed.
The Knowledge Management Process
• Figure label: KDD
The Task of Classification
• Given n training instances (x_i, y_i), where x_i are the values of the attributes and y_i is the class
• Goal: given a new instance x_0, predict its class y_0
• J classes, n training observations, p features
• Diagram labels: Training Set; New instance to be classified; CLASSIFICATION; Class membership of the new instance
• Examples:
  • prognostics of recurrence of breast cancer
  • diagnosis of thyroid diseases
  • antibiotic resistance prediction
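The task defined on this slide can be illustrated with a minimal nearest-neighbour sketch. It is not the classifier configuration used in the study; the two numeric features and all values below are made up for illustration, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train, x0, k=3):
    """Predict the class of x0 by majority vote among its k nearest training instances."""
    # train is a list of (x_i, y_i) pairs; each x_i is a tuple of numeric attribute values
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], x0))
    votes = Counter(y for _, y in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up training set: (days_in_hospital, age) -> S (sensitive) or R (resistant)
train = [
    ((2.0, 30.0), "S"),
    ((3.0, 35.0), "S"),
    ((4.0, 28.0), "S"),
    ((20.0, 60.0), "R"),
    ((25.0, 65.0), "R"),
    ((22.0, 58.0), "R"),
]

# Class membership y_0 of a new instance x_0
y0 = knn_predict(train, (21.0, 61.0), k=3)
print(y0)
```

Here n = 6, p = 2 and J = 2; a real antibiogram dataset would also need an encoding for categorical attributes such as pathogen and antibiotic names.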
The Task of Classification
• Predicting Antibiotic Resistance
  • Predict the sensitivity of a pathogen to an antibiotic based on data about the antibiotic, the isolated pathogen, and the demographic and clinical features of the patient.
Data Collection
• N.N. Burdenko Institute of Neurosurgery
  • Bacterial analyzer “Vitek-60” (developed by “bioMérieux”)
  • Information systems
    • "Microbiologist" (developed by the Medical Informatics Lab of the institute)
    • "Microbe" (developed by the Russian company "MedProject-3")
• Each instance of the data used in the analysis represents one sensitivity test and contains the following features:
  • the pathogen isolated during the bacterial identification analysis,
  • the antibiotic used in the sensitivity test,
  • the result of the sensitivity test itself (sensitive, resistant or intermediate), obtained from “Vitek” according to the NCCLS guidelines.
Data Organization
• The information about each sensitivity analysis is linked to the patient, his or her demographic data (sex, age) and hospitalization in the Institute (main department, days spent in ICU, days spent in the hospital before the test, etc.).
• Each instance of a microbiological test in the database corresponds to a single specimen of cerebrospinal fluid (liquor).
• Pilot exploratory analysis: 1423 sensitivity tests covering the meningitis cases of the year 2002.
Dataset’s characteristics
Grouping of Pathogens
Grouping of Antibiotics
Results of Pilot Studies
• On the whole set of features, nonparametric approaches such as the 3-Nearest Neighbor (3NN) classifier resulted in better accuracy than parametric approaches such as Naïve Bayes.
• The classes of instances related to sensitive and resistant cases of pathogens are balanced (47% and 48% respectively) and easier to predict. In contrast, there were very few instances of sensitivity tests where the pathogen’s sensitivity was intermediate (5%), and it was difficult for classifiers to make good predictions for this group of instances. Some algorithms treated instances with intermediate (I) sensitivity as noise.
• Naïve Bayes could achieve much higher accuracy when FS is undertaken and the classification model is built on the selected subset of features.
• Feature ranking according to the Relief measure shows that most of the information is concentrated in the features related to antibiotics, much less in the features that describe the pathogen, and even less in the features that describe the demographics of the patients and the hospitalization context.
Examples of Classification Rules
1: (7.2 < years_old <= 14.4) & (main_dept = 1) => pat_ab_sens = S (81/24)
2: (days_fefore_test < 16) & (main_dept = 2) => pat_ab_sens = S (47/7)
3: (pathogen_name = p_aeruginosa) & (recurring = FALSE) & (sex = M) & (days_in_ICU < 21) => pat_ab_sens = S (82/14)
4: (antibiotic_name = vancomycin) => pat_ab_sens = S (44/1)
5: (antibiotic_name = tic_clav) & (pathogen_name = a_calc_baumannii) => pat_ab_sens = S (6/0)
The numbers in brackets denote the number of instances satisfying the left-hand side of the rule (support) and the number of exceptions found for this rule.
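The bracketed counts can be turned into a per-rule accuracy as a short worked example. This sketch assumes accuracy is read as (support - exceptions) / support, which is one natural interpretation of the slide rather than a measure the authors report; `rule_stats` is a hypothetical helper.

```python
import re

# Matches the trailing "(support/exceptions)" pair of a rule
RULE_COUNTS = re.compile(r"\((\d+)/(\d+)\)\s*$")

def rule_stats(rule_text):
    """Return (support, exceptions, accuracy) parsed from the trailing '(n/m)' pair."""
    match = RULE_COUNTS.search(rule_text)
    if match is None:
        raise ValueError("rule has no trailing (support/exceptions) pair")
    support, exceptions = int(match.group(1)), int(match.group(2))
    # Assumed reading: share of covered instances that the rule labels correctly
    return support, exceptions, (support - exceptions) / support

rules = [
    "(antibiotic_name = vancomycin) => pat_ab_sens = S (44/1)",
    "(7.2 < years_old <= 14.4) & (main_dept = 1) => pat_ab_sens = S (81/24)",
]
for rule in rules:
    support, exceptions, accuracy = rule_stats(rule)
    print(f"support={support} exceptions={exceptions} accuracy={accuracy:.3f}")
```

Under this reading, rule 4 is correct for 43 of the 44 instances it covers, while rule 1 is correct for 57 of 81, so the antibiotic-based rule is considerably more reliable than the demographic one.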
Experimental Results of Further Studies
• Many-sided data analyses
• Basic classifiers
  • Naïve Bayes (NB), Bayesian Network (BN)
  • Three nearest neighbor classifiers (1NN, 3NN, and 15NN)
  • Decision tree classifier (C4.5)
  • Rule-based classifier JRip
• Dimensionality reduction (local and global)
  • Feature Selection
    • Sequential search strategies: FFS, BFE, BiS
  • Feature Extraction
    • PCA, class-conditional parametric and nonparametric
• Clustering
  • Natural clustering
  • Classical techniques like kMeans, EM
• Tracking Concept Drift
Classification Accuracies (1 of 2)
• 4430 sensitivity tests
• meningitis cases, Jan 2002 - Jul 2004
Classification Accuracies (2 of 2)
• 4430 sensitivity tests
• meningitis cases, Jan 2002 - Jul 2004
Problem Representation
• Figure: high vs. low quality representation spaces (RS) for concept learning (Michalski, 1995)
Feature selection or transformation
• Features are often correlated (not independent of each other)
  • Feature selection techniques that just assign weights to individual features are insensitive to interacting or correlated features.
  • That is why transforming the given representation before weighting the features is often preferable.
• Data is often not homogeneous
  • For some problems a feature subset may be useful in one part of the instance space and, at the same time, useless or even misleading in another part of it.
  • Therefore, it may be difficult or even impossible to remove irrelevant and/or redundant features from a data set and leave only the useful ones by means of feature selection.
Feature extraction process and eigenvalue-based approaches
• Feature extraction (FE) is a dimensionality reduction technique that extracts a subset of new features from the original set by means of some functional mapping, keeping as much information in the data as possible (Fukunaga 1990).
• Conventional Principal Component Analysis (PCA) is one of the most commonly used feature extraction techniques. It is based on extracting the axes on which the data shows the highest variability (Jolliffe 1986).
• PCA has the following properties:
  (1) it maximizes the variance of the extracted features;
  (2) the extracted features are uncorrelated;
  (3) it finds the best linear approximation in the mean-square sense;
  (4) it maximizes the information contained in the extracted features.
The main problem of PCA in classification
• PCA gives high weights to features with higher variability, regardless of whether they are useful for classification or not.
• Figure: PCA for classification: a) effective work of PCA; b) the case where a principal component that is irrelevant from the classification point of view was chosen.
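The problem stated on this slide can be reproduced on a toy example. The sketch below finds the first principal component by power iteration on the sample covariance matrix (a simple stand-in for a full eigendecomposition, chosen to keep the example dependency-free); the data are made up so that the high-variance feature carries no class information.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (unit direction, variance) of the first principal component.

    Uses power iteration on the sample covariance matrix; adequate for a toy
    example, not a replacement for a numerical library.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))
    return v, variance

# Made-up data: feature 0 varies widely but is unrelated to the class,
# feature 1 varies little but determines the class.
rows = [(-4.0, 0.1), (-2.0, -0.1), (0.0, 0.1), (2.0, -0.1), (4.0, 0.1)]
labels = ["S", "R", "S", "R", "S"]

direction, variance = first_principal_component(rows)
projected = [sum(d * x for d, x in zip(direction, row)) for row in rows]
for value, label in sorted(zip(projected, labels)):
    print(f"{value:+.2f} {label}")
```

The first component aligns with feature 0, and after projection the two classes alternate along the extracted axis, so a classifier working on that single component cannot separate them. This is case b) of the figure.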
Data Heterogeneity
• A feature subset may be useful in one part of the instance space and, at the same time, useless or even misleading in another part
• Search for homogeneous regions
  • Different clustering/partitioning techniques
    • kMeans, EM
  • Natural clusters
    • use of contextual features for splitting
    • contextual features are not useful for classification by themselves but are useful in combination with other (context-sensitive) features
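Splitting on a contextual feature can be sketched as follows. The feature name `gram`, its values, and all instances are assumptions made for illustration, and the per-cluster "model" is just the majority class so that the example stays short; in practice any base classifier would be trained inside each cluster.

```python
from collections import Counter, defaultdict

def split_by_context(instances, context_feature):
    """Partition instances (dicts) into natural clusters by a contextual feature value."""
    clusters = defaultdict(list)
    for inst in instances:
        clusters[inst[context_feature]].append(inst)
    return dict(clusters)

def majority_class(instances, target):
    """Trivial stand-in for a classifier: the most frequent class among the instances."""
    return Counter(inst[target] for inst in instances).most_common(1)[0][0]

# Made-up sensitivity tests; "gram" is a hypothetical contextual feature
data = [
    {"gram": "gram+", "sens": "S"},
    {"gram": "gram+", "sens": "S"},
    {"gram": "gram+", "sens": "R"},
    {"gram": "gram-", "sens": "R"},
    {"gram": "gram-", "sens": "R"},
    {"gram": "gram-", "sens": "R"},
    {"gram": "gram-", "sens": "S"},
]

clusters = split_by_context(data, "gram")
local_models = {name: majority_class(members, "sens") for name, members in clusters.items()}
global_model = majority_class(data, "sens")
print(global_model, local_models)
```

On this toy data the global model predicts R for everything, while the local models predict S inside one cluster and R inside the other, which is the kind of gain the search for homogeneous regions aims at.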
Classification within natural clusters
• 4430 sensitivity tests
• meningitis cases, Jan 2002 – Jul 2004
• Classification accuracies for two main pathogen clusters
Local DR within Natural Clusters
• Figure: comparison of local vs. global 7-NN accuracy results for a) the whole data, b) the ‘gram+’ cluster and c) the ‘gram–’ cluster, with and without applying FE (top part) and FS (bottom part).
• (accepted to IEEE CBMS 2005)
Antibiotic Resistance as Concept Drift
• A difficult problem with learning in many real-world domains is that the concept of interest may depend on some hidden context that is not given explicitly in the form of predictive features.
• Changes in the hidden context can induce more or less radical changes in the target concept, which is generally known as concept drift.
• Even in the most strictly controlled environments some unexpected changes may happen
  • due to failure and replacement of some medical equipment, or
  • due to changes in personnel, making it necessary to change the model.
• An effective learner should be able to track such changes and adapt to them quickly.
Types of Concept Drift (CD)
• Changes in hidden context may cause
  • a change of the target concept (real concept drift)
  • a change of the underlying data distribution.
• The need to change the current model because of a change in the data distribution is called virtual concept drift.
• Virtual concept drift and real concept drift often occur together.
• Real or virtual, or both: the model needs to be changed.
Tracking Concept Drift
• 3 strategies were used
  • Train on every but the 1st chunk and test on the last chunk
  • Train on the i-th chunk and test on the (i+1)-th chunk
  • Train on the 1st chunk and test on every but the 1st chunk
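The three strategies can be sketched as a generator of train/test splits over chronologically ordered chunks. The second and third strategies follow the slide directly. The first is implemented here under an assumed reading, training on all chunks that precede the last one, because the slide wording leaves the exact training range open; the strategy labels and the function name are hypothetical.

```python
def evaluation_splits(chunks):
    """Yield (strategy, train_chunks, test_chunk) for chronologically ordered chunks."""
    last = len(chunks) - 1
    # Strategy A (assumed reading): train on all chunks before the last, test on the last
    yield "earlier->last", chunks[:last], chunks[last]
    # Strategy B: train on the i-th chunk, test on the (i+1)-th chunk
    for i in range(last):
        yield "i->i+1", [chunks[i]], chunks[i + 1]
    # Strategy C: train on the first chunk, test on every later chunk
    for i in range(1, len(chunks)):
        yield "first->later", [chunks[0]], chunks[i]

# Chunks would be time-ordered blocks of sensitivity tests; names stand in for data here
chunks = ["chunk1", "chunk2", "chunk3", "chunk4"]
for strategy, train, test in evaluation_splits(chunks):
    print(strategy, train, "->", test)
```

If accuracy under the third strategy falls as the test chunk moves further from the first chunk, while the second strategy stays stable, that pattern is consistent with gradual drift in the data.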
Approaches to Handling CD
• (1) instance selection
  • select instances relevant to the current concept
  • generalize from a window that moves over recently arrived instances and use the learnt concepts for prediction only in the immediate future
• (2) instance weighting
  • weight instances according to their “age” and their competence with regard to the current concept
  • instance weighting techniques handle CD worse than analogous instance selection techniques because they overfit the data
• (3) ensemble learning
  • see next slide
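The first two approaches can be contrasted in a few lines. This is a sketch under stated assumptions: the window is a fixed-size window over the most recent instances, and the weighting uses only age with a geometric decay (the `decay` parameter is hypothetical, and the competence component mentioned on the slide is not modelled).

```python
def window_select(stream, window_size):
    """Instance selection: keep only the most recent window_size instances."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return stream[-window_size:]

def age_weights(n_instances, decay=0.9):
    """Instance weighting: weights decay geometrically with age; the newest gets 1.0."""
    return [decay ** (n_instances - 1 - i) for i in range(n_instances)]

# A stream of instance ids in arrival order (oldest first)
stream = list(range(10))
print(window_select(stream, 3))
print(age_weights(3, decay=0.5))
```

Selection discards old instances entirely, whereas weighting keeps them with reduced influence; a learner that accepts per-instance weights is needed for the second option.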
Ensembles & Dynamic Integration of Classifiers
• Ensemble learning is among the most popular and effective approaches to handling concept drift: a set of concept descriptions built over different time intervals is maintained, and either their predictions are combined using a form of voting or the most relevant description is selected.
• The problem with current ensemble approaches is that they are not able to deal with local concept drift, which is a common case:
  • only particular bacteria may develop resistance to certain antibiotics, while resistance to the others can remain the same;
  • or the data distribution can change for particular bacteria only.
Dynamic Integration of Classifiers (DIC)
• Basic idea of DIC techniques: two main phases
• The learning phase
  • local classification errors of each base classifier for each instance of the training set are estimated according to the 0/1 loss function using cross-validation (CV)
  • the learning phase finishes with training the base classifiers on the whole training set
• The application phase
  • begins with determining the k nearest neighbours of a new instance
  • weighted nearest neighbour (WNN) regression is used to predict the local classification errors of each base classifier for the new instance
• Dynamic Selection (DS)
  • the classifier with the least predicted local classification error is selected
• Dynamic Voting (DV)
  • each base classifier receives a weight proportional to its estimated local accuracy, and the final classification is produced as in weighted voting (WV)
• Dynamic Voting with Selection (DVS)
  • the base classifiers with the highest local classification errors are discarded (errors that fall into the upper half of the error interval) and DV is applied to the remaining classifiers
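The application phase and the three integration schemes can be sketched as below. Two points are assumptions rather than details from the slide: WNN regression is implemented as an inverse-distance weighted average of the neighbours' 0/1 errors, and the DVS cut-off is taken as the midpoint between the smallest and largest predicted error. All function names and numbers are illustrative.

```python
from collections import defaultdict

def predict_local_error(neighbour_errors, neighbour_distances):
    """WNN regression (assumed inverse-distance weights) over the neighbours' 0/1 errors."""
    weights = [1.0 / (d + 1e-9) for d in neighbour_distances]
    return sum(w * e for w, e in zip(weights, neighbour_errors)) / sum(weights)

def dynamic_selection(predictions, local_errors):
    """DS: return the prediction of the classifier with the lowest predicted local error."""
    best = min(range(len(predictions)), key=lambda i: local_errors[i])
    return predictions[best]

def dynamic_voting(predictions, local_errors):
    """DV: weighted vote, each classifier weighted by its local accuracy (1 - error)."""
    scores = defaultdict(float)
    for label, error in zip(predictions, local_errors):
        scores[label] += 1.0 - error
    return max(scores, key=scores.get)

def dynamic_voting_with_selection(predictions, local_errors):
    """DVS: drop classifiers in the upper half of the error interval, then apply DV."""
    threshold = (min(local_errors) + max(local_errors)) / 2.0
    kept = [(p, e) for p, e in zip(predictions, local_errors) if e <= threshold]
    return dynamic_voting([p for p, _ in kept], [e for _, e in kept])

# Three base classifiers vote on one new instance (made-up values)
predictions = ["S", "R", "R"]
local_errors = [0.1, 0.4, 0.45]
print(dynamic_selection(predictions, local_errors))
print(dynamic_voting(predictions, local_errors))
print(dynamic_voting_with_selection(predictions, local_errors))
```

On these values DS and DVS follow the single locally accurate classifier, while DV is outvoted by the two weaker ones, which shows how the schemes can disagree on the same instance.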
DIC for Tracking CD
• Figure: classification accuracy over sequential data blocks (ensembles of C4.5 decision trees)
• Dynamic integration techniques improve ensemble accuracy by more than 10% on average.
Conclusions and Future Work
• Contribution
  • Many-sided analyses of microbiological data
  • A number of KD techniques are applied
    • Locally and globally
    • Data analysed as static DB content and as a stream
• Further work
  • Communicating the results to the experts
  • Identifying other potential cases for application (hopefully one from CBMS’05) and applying many-sided analyses
  • Comparing the dependencies found with those for other contexts (hospitals, countries, sources of pathogens, etc.)
IEEE CBMS 2005
• The 18th IEEE Symposium on Computer-Based Medical Systems
• Trinity College Dublin, June 23-24
• http://conferences.computer.org/CBMS2005/index.html
Contact info
• Mykola Pechenizkiy, Seppo Puuronen, Department of Computer Science, University of Jyväskylä, Finland: mpechen@cs.jyu.fi & sepi@cs.jyu.fi
• Alexey Tsymbal, Dept of Computer Science, Trinity College Dublin, Ireland: Alexey.Tsymbal@cs.tcd.ie
• Michael Shifrin, Irina Alexandrova, N.N. Burdenko Institute of Neurosurgery, Russian Academy of Medical Sciences, Moscow, Russia: Shifrin@nsi.ru