Datamining @ ARTreat. Veljko Milutinović vm@etf.rs, Zoran Babović zbabovic@gmail.com, Nenad Korolija nenadko@gmail.com, Goran Rakočević g.rakocevic@gmail.com, Marko Novaković atisha34@yahoo.com
Agenda • ARTreat – the project • Atherosclerosis – the basics • Plaque classification • Hemodynamic analysis • Data mining for the hemodynamic problem • Data mining from patient records
ARTreat – the project • ARTreat aims to provide a patient-specific computational model of the cardiovascular system, used to improve the quality of prediction of atherosclerosis progression and its propagation into life-threatening events. • FP7 Large-scale Integrating Project (IP) • 16 partners • Funding: 10,000,000 €
Atherosclerosis • Atherosclerosis is the condition in which an artery wall thickens as the result of a build-up of fatty materials such as cholesterol
Atherosclerotic plaque • Begins as a fatty streak, an ill-defined yellow lesion (fatty plaque), which develops edges and evolves into fibrous plaques: whitish lesions with a grumous lipid-rich core
Plaque components • Fibrous, Lipid, Calcified, Intra-plaque Hemorrhage
Plaque classification • Different types of plaque pose different risks • Manual plaque classification (done by doctors) is a difficult and error-prone task • Idea: develop an AI algorithm to distinguish between different types of plaque • Visual data mining
Plaque classification (2) • Developed by Foundation for Research and Technology • Based on Support Vector Machines • Uses images produced by IVUS and MRI that are hand-labeled by physicians • Up to 90% accurate
Data mining task in Belgrade • Two separate paths: • Data mining from the results of hemodynamic simulations • Data mining from medical patient records • Goal: to provide input regarding the progression of the disease, to be used for medical decision support
Hemodynamics – the basics • Study of the flow of blood through the blood vessels • Maximum Wall Shear Stress (MWSS) – an important parameter for plaque development prognosis
Hemodynamics - CFD • Classical methods for hemodynamic calculations employ Computational Fluid Dynamics (CFD) • This involves solving the Navier-Stokes equations • …but solving them millions of times! • One simulation can take weeks
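For reference, a common textbook form of the Navier-Stokes equations is given here. It assumes blood is modelled as an incompressible Newtonian fluid with body forces neglected; the slides do not state which form the project actually solved. Here u is the velocity field, p the pressure, ρ the density, and μ the dynamic viscosity.

```latex
\rho \left( \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\,\mathbf{u} \right)
  = -\nabla p + \mu \nabla^{2} \mathbf{u},
\qquad
\nabla \cdot \mathbf{u} = 0
```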
Data mining from hemodynamic simulations (first path) • Idea: use the results of previously run simulations • Train a data mining AI system capable of regression analysis • Use the system to estimate the desired values in a much shorter time
Neural Networks - background • Systems that are inspired by the principle of operation of biological neural systems (brain)
Neural Networks – the basics • A parallel, distributed information processing structure • Each processing element has a single output which branches (“fans out”) into as many collateral connections as desired • One input layer, one output layer, and one or more hidden layers
Artificial neurons • Each node (neuron) consists of two segments: • Integration function • Activation function • Common activation function: sigmoid
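The two segments of a neuron can be sketched as below. This is a minimal illustration, assuming the integration function is a weighted sum plus a bias (the most common choice; the slides do not name one). The function names are hypothetical.

```python
import math

def integrate(inputs, weights, bias):
    """Integration function: weighted sum of the inputs plus a bias (assumed form)."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    """Sigmoid activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Full neuron: integrate the inputs, then apply the activation."""
    return sigmoid(integrate(inputs, weights, bias))
```

With all inputs at zero and no bias, the integration step yields 0 and the sigmoid returns its midpoint value of 0.5.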
Neural Networks - backpropagation • A training method for neural networks • Tries to minimize the error function by adjusting the weights • Gradient descent: • Calculate the “blame” of each input for the output error • Adjust each weight by a step proportional to its error gradient, scaled by the learning rate γ
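The formulas on this slide did not survive in the text. In the usual textbook formulation they read as follows; the symbols t_k (target output), o_k (network output), and w_ij (weight from unit i to unit j) are assumptions and may differ from the original notation. Only γ, the learning rate, is named on the slide.

```latex
E = \frac{1}{2} \sum_{k} \left( t_k - o_k \right)^{2},
\qquad
\Delta w_{ij} = -\gamma \, \frac{\partial E}{\partial w_{ij}}
```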
Input data set • Carotid artery • 11 geometric parameters and the MWSS value
The model • One hidden layer • Input layer: linear • Hidden and output: sigmoid • Learning rate 0.6 • 500K training cycles • Decay and momentum
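A network of this shape can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's implementation: the class name `TinyMLP`, the weight initialization range, and the momentum value of 0.5 are hypothetical, a single output unit is assumed, and weight decay is left out. Only the single hidden layer, the sigmoid units, and the learning rate of 0.6 come from the slide.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer; sigmoid hidden and output units; a single output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # One row per hidden unit; the last entry of each row is the bias.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        # Output weights, one per hidden unit, plus a bias.
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, kept for the momentum term.
        self.dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_o = [0.0] * (n_hidden + 1)

    def forward(self, x):
        xb = list(x) + [1.0]  # inputs pass through unchanged (linear input layer)
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_h]
        out = sigmoid(sum(w * v for w, v in zip(self.w_o, hidden + [1.0])))
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr=0.6, momentum=0.5):
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Output delta for the squared error E = 0.5 * (target - out)^2.
        delta_o = (target - out) * out * (1.0 - out)
        # Hidden deltas: the share of the output error assigned to each unit,
        # computed before the output weights are changed.
        delta_h = [delta_o * self.w_o[j] * hidden[j] * (1.0 - hidden[j])
                   for j in range(len(hidden))]
        for j in range(len(hb)):
            self.dw_o[j] = lr * delta_o * hb[j] + momentum * self.dw_o[j]
            self.w_o[j] += self.dw_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(xb)):
                self.dw_h[j][i] = (lr * delta_h[j] * xb[i]
                                   + momentum * self.dw_h[j][i])
                row[i] += self.dw_h[j][i]
        return 0.5 * (target - out) ** 2
```

Because the output unit is a sigmoid, targets such as MWSS would need to be scaled into the interval (0, 1) before training and scaled back afterwards.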
Current results • Average error: 8.6% • Maximum error: 16.9%
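The slides do not say how these errors were defined. One plausible reading, assumed here, is the absolute relative error of each predicted MWSS value against the CFD result, averaged and maximized over the test set. The function names are hypothetical.

```python
def relative_errors(predicted, actual):
    """Absolute relative error of each prediction, in percent."""
    return [abs(p - a) / abs(a) * 100.0 for p, a in zip(predicted, actual)]

def summarize(predicted, actual):
    """Return (average error, maximum error), both in percent."""
    errs = relative_errors(predicted, actual)
    return sum(errs) / len(errs), max(errs)
```

Reporting both figures matters: the average can look acceptable while a single badly predicted geometry dominates the maximum.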
The “dreaded” line 4 • Line 4 of the original test set proved difficult to predict • Error was over 30% • Turned out to be an outlier • Combination of parameters was such that it couldn’t • But the CFD worked, NN worked • Visually the geometry looked fine • Goes to show how challenging the data preprocessing can be
Dataset analysis • Two distinct areas of MWSS values: • the subset with lower values of MWSS, where a similar clear pattern can be seen against all of the input variables, • scattered cloud of values in the subset with higher MWSS values. • Histogram shows the majority of values grouped in the lower half of the values in the set, with only a small number of points in the higher half.
MWSS value prediction • Two approaches: • Single model • Two models: • one for the low MWSS value data, • one for higher values, • classifier to choose the appropriate model • Models based on Linear Regression and SVM
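The two-model approach can be sketched as a router in front of two regressors. This is a toy version under loud assumptions: it uses a single input variable instead of the 11 geometric parameters, ordinary least squares for both models, and a fixed threshold on the input as the classifier. The slides name Linear Regression and SVM but do not describe the classifier, so `TwoRegimeModel` and `split_x` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single input."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

class TwoRegimeModel:
    """Routes each sample to one of two linear models."""

    def __init__(self, split_x):
        # Hypothetical classifier: a fixed threshold on the input value.
        self.split_x = split_x
        self.low = None
        self.high = None

    def fit(self, xs, ys):
        low = [(x, y) for x, y in zip(xs, ys) if x < self.split_x]
        high = [(x, y) for x, y in zip(xs, ys) if x >= self.split_x]
        # Each regime needs at least two distinct x values to fit a line.
        self.low = fit_line(*zip(*low))
        self.high = fit_line(*zip(*high))
        return self

    def predict(self, x):
        a, b = self.low if x < self.split_x else self.high
        return a * x + b
```

The classifier has to work from the inputs alone, since the MWSS value is what is being predicted. Any misrouted sample is evaluated by the wrong model, so the classifier's accuracy bounds the accuracy of the whole scheme.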
Results • Poor results for higher values of MWSS – insufficient values to train a model
MWSS position • A few outliers and “strange” values in the data set • After elimination: • Further investigation needed into the data and the “outlier” values, although it is only a small number of them
Genetic data • Single coronary angiography • Blood chemistry • Medications • Single Nucleotide Polymorphism (SNP) data on selected DNA sequences