Datamining @ ARTreat

Datamining @ ARTreat • Veljko Milutinović, vm@etf.rs • Zoran Babović, zbabovic@gmail.com • Nenad Korolija, nenadko@gmail.com • Goran Rakočević, g.rakocevic@gmail.com • Marko Novaković, atisha34@yahoo.com

Agenda • ARTreat – the project • Atherosclerosis – the basics • Plaque classification • Hemodynamic analysis • Data mining for the hemodynamic problem • Data mining from patient records

ARTreat – the project • ARTreat aims to provide a patient-specific computational model of the cardiovascular system, used to improve the quality of prediction of atherosclerosis progression and its propagation into life-threatening events • FP7 Large-scale Integrating Project (IP) • 16 partners • Funding: 10,000,000 €

Atherosclerosis • Atherosclerosis is the condition in which an artery wall thickens as the result of a build-up of fatty materials such as cholesterol

Atherosclerotic plaque • Begins as a fatty streak, an ill-defined yellow lesion (fatty plaque), which develops edges and evolves into fibrous plaques: whitish lesions with a grumous lipid-rich core

Plaque components • Fibrous, Lipid, Calcified, Intra-plaque Hemorrhage

Plaque classification • Different types of plaque pose different risks • Manual plaque classification (done by doctors) is difficult and error-prone • Idea: develop an AI algorithm to distinguish between different types of plaque • Visual data mining

Plaque classification (2) • Developed by Foundation for Research and Technology • Based on Support Vector Machines • Uses images produced by IVUS and MRI that are hand-labeled by physicians • Up to 90% accurate

Data mining task in Belgrade • Two separate paths: • Data mining from the results of hemodynamic simulations • Data mining from medical patient records • Goal: to provide input regarding the progression of the disease, to be used for medical decision support

Hemodynamics – the basics • Study of the flow of blood through the blood vessels • Maximum Wall Shear Stress (MWSS) – an important parameter for plaque development prognoses

Hemodynamics - CFD • Classical methods for hemodynamic calculations employ Computational Fluid Dynamics (CFD) • Involves solving the Navier-Stokes equations • …but involves solving them millions of times! • One simulation can take weeks
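The equation shown on the original slide is not preserved here. As a reminder, one common form of the Navier-Stokes equations, assuming an incompressible Newtonian fluid (an assumption; the deck does not state which blood model the CFD solver uses), is:

```latex
\rho \left( \frac{\partial \mathbf{u}}{\partial t}
          + (\mathbf{u} \cdot \nabla)\,\mathbf{u} \right)
  = -\nabla p + \mu \nabla^{2} \mathbf{u},
\qquad
\nabla \cdot \mathbf{u} = 0
```

Here u is the velocity field, p the pressure, ρ the density and μ the dynamic viscosity. A CFD solver discretizes these equations over the vessel geometry and solves them at every mesh point and time step, which is where the "millions of times" comes from.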

Data mining from hemodynamic simulations (first path) • Idea: use results of previously done simulations • Train a data mining AI system capable of regression analysis • Use the system to estimate the desired values in a much shorter time

Neural Networks - background • Systems inspired by the principle of operation of biological neural systems (the brain)

Neural Networks – the basics • A parallel, distributed information processing structure • Each processing element has a single output which branches (“fans out”) into as many collateral connections as desired • One input layer, one output layer, and one or more hidden layers

Artificial neurons • Each node (neuron) consists of two segments: • Integration function • Activation function • Common activation function • Sigmoid
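The two segments of a neuron can be sketched in a few lines of Python. This is a minimal illustration, assuming the integration function is a weighted sum plus a bias (the slide does not say which integration function is used); the function names are hypothetical.

```python
import math

def integrate(inputs, weights, bias):
    """Integration function (assumed here): weighted sum of inputs plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    """Activation function: maps any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """A neuron applies the activation function to the integrated input."""
    return sigmoid(integrate(inputs, weights, bias))
```

With all-zero inputs and zero bias the integrated value is 0, so the sigmoid neuron outputs exactly 0.5, the midpoint of its range.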

Neural Networks - backpropagation • A training method for neural networks • Try to minimize the error function by adjusting the weights • Gradient descent: • Calculate the “blame” of each input for the output error • Adjust the weights by a step scaled by γ, the learning rate
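The formulas from the original slide are not preserved. A commonly used pair, given here as an assumption rather than as the exact expressions the authors showed, is the squared-error function and the gradient descent update:

```latex
E = \frac{1}{2} \sum_{k} \left( t_k - o_k \right)^{2},
\qquad
\Delta w_{ij} = -\gamma \, \frac{\partial E}{\partial w_{ij}}
```

Here t_k is the target value and o_k the network output for output node k, w_ij is the weight on the connection from node i to node j, and γ is the learning rate. The partial derivative is the "blame" assigned to each weight, propagated backwards from the output layer.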

Input data set • Carotid artery • 11 geometric parameters and the MWSS value

The model • One hidden layer • Input layer: linear • Hidden and output: sigmoid • Learning rate 0.6 • 500K training cycles • Decay and momentum
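A network of this shape can be sketched in plain Python. This is an illustrative toy, not the ARTreat model: the class name `TinyMLP`, the hidden layer size, the momentum value 0.5 and the toy training data are all assumptions, and decay is left out because the slide does not say which kind of decay was used. Only the learning rate 0.6, the single hidden layer and the sigmoid hidden and output nodes come from the slide.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer; linear (pass-through) inputs; sigmoid hidden and output."""

    def __init__(self, n_in, n_hidden, lr=0.6, momentum=0.5, seed=0):
        rng = random.Random(seed)
        # Each row holds n_in weights plus a trailing bias weight.
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        # Previous weight changes, used by the momentum term.
        self.v_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.v_o = [0.0] * (n_hidden + 1)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = list(x) + [1.0]                      # inputs plus bias term
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_h]
        hb = h + [1.0]                            # hidden outputs plus bias term
        y = sigmoid(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, y

    def train_step(self, x, target):
        xb, hb, y = self.forward(x)
        # Output delta for the squared error E = 0.5 * (target - y)^2.
        d_out = (y - target) * y * (1.0 - y)
        # Hidden deltas, computed before the output weights are changed.
        d_hid = [d_out * self.w_o[j] * hb[j] * (1.0 - hb[j])
                 for j in range(len(self.w_h))]
        for j in range(len(self.w_o)):
            self.v_o[j] = self.momentum * self.v_o[j] - self.lr * d_out * hb[j]
            self.w_o[j] += self.v_o[j]
        for j, row in enumerate(self.w_h):
            for i in range(len(row)):
                self.v_h[j][i] = (self.momentum * self.v_h[j][i]
                                  - self.lr * d_hid[j] * xb[i])
                row[i] += self.v_h[j][i]
        return 0.5 * (target - y) ** 2

def mean_error(net, data):
    return sum(0.5 * (t - net.forward(x)[2]) ** 2 for x, t in data) / len(data)

# Toy data (hypothetical): two inputs in [0, 1], target kept inside (0, 1).
data = [((a / 4, b / 4), 0.2 + 0.6 * (a / 4) * (b / 4))
        for a in range(5) for b in range(5)]
net = TinyMLP(n_in=2, n_hidden=4)
before = mean_error(net, data)
for _ in range(200):                              # far fewer than 500K cycles
    for x, t in data:
        net.train_step(x, t)
after = mean_error(net, data)
```

Because the output node is a sigmoid, targets such as MWSS would have to be scaled into the interval (0, 1) before training and scaled back after prediction.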

Current results • Average error: 8.6% • Maximum error: 16.9%
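The slide does not say how the error is defined. A minimal sketch, assuming it is the absolute relative error of the predicted MWSS against the CFD value, expressed in percent (the function names are hypothetical):

```python
def relative_errors(predicted, reference):
    """Absolute relative error of each prediction, in percent (assumed metric)."""
    return [100.0 * abs(p - r) / abs(r) for p, r in zip(predicted, reference)]

def summarize(predicted, reference):
    """Return (average error, maximum error) over the test set, in percent."""
    errs = relative_errors(predicted, reference)
    return sum(errs) / len(errs), max(errs)
```

Reporting the maximum next to the average matters here: a single badly predicted geometry can hide behind a good average.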

The “dreaded” line 4 • Line 4 of the original test set proved difficult to predict • Error was over 30% • Turned out to be an outlier • Combination of parameters was such that it couldn’t • But the CFD worked, NN worked • Visually the geometry looked fine • Goes to show how challenging the data preprocessing can be

Dataset analysis • Two distinct areas of MWSS values: • the subset with lower values of MWSS, where a similar clear pattern can be seen against all of the input variables, • scattered cloud of values in the subset with higher MWSS values. • Histogram shows the majority of values grouped in the lower half of the values in the set, with only a small number of points in the higher half.

MWSS value prediction • Two approaches: • Single model • Two models: • one for the low MWSS value data, • one for higher values, • classifier to choose the appropriate model • Models based on Linear Regression and SVM
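The two-model approach amounts to routing each input through a classifier and then to one of two regressors. A minimal sketch of that routing, with placeholder callables standing in for the trained classifier and the Linear Regression or SVM models (all names and toy functions below are hypothetical, not fitted to ARTreat data):

```python
def make_two_model_predictor(classifier, low_model, high_model):
    """Build a predictor that first classifies the input as "low" or "high"
    MWSS and then delegates to the matching regression model."""
    def predict(features):
        model = low_model if classifier(features) == "low" else high_model
        return model(features)
    return predict

# Placeholder components, for illustration only.
toy_classifier = lambda f: "low" if f[0] < 1.0 else "high"
toy_low_model = lambda f: 2.0 * f[0]
toy_high_model = lambda f: 10.0

predict = make_two_model_predictor(toy_classifier, toy_low_model, toy_high_model)
```

The overall accuracy of this design depends on the classifier as much as on the regressors: a misrouted sample is predicted by a model that was never trained on its region.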

Results • Poor results for higher values of MWSS: too few data points in that range to train a model

MWSS position • A few outliers and “strange” values in the data set • After elimination: • Further investigation is needed into the data and the “outlier” values, although there are only a few of them

Genetic data • Single coronary angiography • Blood chemistry • Medications • Single Nucleotide Polymorphism (SNP) data on selected DNA sequences

…and now for something completely different

Questions
