Optimization and Data Mining in Epilepsy Research

Optimization and Data Mining in Epilepsy Research • W. Art Chaovalitwongse, Assistant Professor, Industrial and Systems Engineering, Rutgers University

Acknowledgements • Comprehensive Epilepsy Center, St. Peter’s University Hospital • Rajesh C. Sachdeo, MD • Deepak Tikku, MD • Brain Institute, University of Florida • Panos M. Pardalos, PhD • J. Chris Sackellares, MD • Paul R. Carney, MD • Bioengineering, Arizona State University • Leonidas D. Iasemidis, PhD

Agenda • Background: Epilepsy • Electroencephalogram (EEG) Time Series • Chaos Theory: Dimensionality Reduction • Seizure Prediction • Feature Selection • Process Monitoring • Concluding Remarks

Facts About Epilepsy • At least 2 million Americans and another 40-50 million people worldwide (about 1% of the population) suffer from epilepsy. • Epilepsy is the second most common brain disorder (after stroke). • The hallmark of epilepsy is recurrent seizures. • Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.

Epileptic Seizures • Seizures usually occur spontaneously, in the absence of external triggers. • Seizures cause temporary disturbances of brain functions such as motor control, responsiveness and recall, which typically last from seconds to a few minutes. • Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.

Rationale • Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion in the U.S. in associated health care costs and losses in employment, wages, and productivity. • Cost per patient ranged from $4,272 for persons with remission after initial diagnosis and treatment to $138,602 for persons with intractable and frequent seizures.

How To Fight Epilepsy • Anti-Epileptic Drugs (AEDs) • Mainstay of epilepsy treatment • Approximately 25 to 30% of patients remain unresponsive • Epilepsy surgery • Requires long-term invasive EEG monitoring • 50% of pre-surgical candidates do not undergo resective surgery • Multiple epileptogenic zones • Epileptogenic zone located in functional brain tissue • Only 60% of surgery cases result in seizure freedom • Electrical Stimulation (Vagus nerve stimulator) • Parameters (amplitude and duration of stimulation) arbitrarily adjusted • As effective as one additional AED dose • Side Effects • Seizure Prediction?

Vagus Nerve Stimulator

Open Problems • Is the seizure occurrence random? • If not, can seizures be predicted? • If yes, are there seizure pre-cursors preceding seizures? • If yes, what measurement can be used to indicate these pre-cursors? • Does normal brain activity differ from abnormal brain activity?

Electroencephalogram (EEG) • …is a tool for evaluating the physiological state of the brain. • …offers excellent spatial and temporal resolution to characterize the rapidly changing electrical activity of the brain. • …captures voltage potentials produced by brain cells while communicating. • In an EEG, electrodes are implanted deep in the brain or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.

From Microscopic to Macroscopic Level (Electroencephalogram - EEG)

Depth and Subdural electrode placement for EEG recordings (electrode labels: ROF, LOF, RTD, LTD, RST, LST)

Scalp EEG Data Acquisition

EEG Data Acquisition

Typical EEG Time Series Data

Goals of Research • Test the hypothesis that seizures are not a random process. • Employ data mining techniques to differentiate normal and abnormal EEGs • Employ quantitative analysis to identify seizure pre-cursors • Demonstrate that seizures could be predicted • Develop a closed-loop seizure control device (Brain Pacemaker)

10-second EEGs: Seizure Evolution (panels: Normal, Pre-Seizure, Seizure, Post-Seizure)

Dimensionality Reduction • The brain is a non-stationary system. • EEG time series is non-stationary. • With 200 Hz sampling, 1 hour of EEGs comprises 200*60*60*30 = 21,600,000 data points = 43.2 MB (assuming 16 bits per data point) • 1 day = 1 hour*24 • 1 week = 1 hour*168 • 20 patients = 1 hour*3360 • Terabytes → Gigabytes → Megabytes → Kilobytes
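The storage arithmetic on this slide can be checked with a short sketch. It assumes that the factor of 30 in the product is the number of recorded channels and that each data point takes 2 bytes; neither is stated explicitly on the slide.

```python
# Back-of-the-envelope EEG storage estimate.
SAMPLING_HZ = 200             # sampling rate given on the slide
SECONDS_PER_HOUR = 60 * 60
CHANNELS = 30                 # assumption: the factor of 30 is the channel count
BYTES_PER_SAMPLE = 2          # assumption: 16 bits per data point


def samples_per_hour(channels=CHANNELS):
    """Number of data points recorded in one hour across all channels."""
    return SAMPLING_HZ * SECONDS_PER_HOUR * channels


def megabytes(hours, channels=CHANNELS):
    """Raw storage in megabytes (10**6 bytes) for the given recording length."""
    return samples_per_hour(channels) * BYTES_PER_SAMPLE * hours / 1e6


for label, hours in [("1 hour", 1), ("1 day", 24), ("1 week", 168),
                     ("20 patients x 1 week", 3360)]:
    print(f"{label:>22}: {megabytes(hours):>12,.1f} MB")
```

Under these assumptions the 20-patient, one-week total comes to roughly 145 GB of raw samples, which is the motivation for reducing each epoch to a few summary measures.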

Dimensionality Reduction Using Chaos Theory • Chaos in Brain? • Chaos in Stock Market? • Chaos in Foreign Exchanges (Swedish Currency)? • Measure the brain dynamics from EEG time series. • Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds = 2048 points. • Maximum Short-Term Lyapunov Exponent • measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space • Measures the chaoticity of the brain waves
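As a minimal sketch of the epoching step, the snippet below cuts one channel into non-overlapping windows of 2048 points. The function name `split_into_epochs` and the zero-filled placeholder signal are illustrative only; dropping a trailing partial epoch is an assumption, not something the slide specifies.

```python
SAMPLING_HZ = 200
EPOCH_POINTS = 2048   # 2048 / 200 Hz = 10.24 seconds


def split_into_epochs(signal, epoch_len=EPOCH_POINTS):
    """Split a 1-D sequence into non-overlapping epochs.

    Any trailing samples that do not fill a whole epoch are dropped
    (an assumption made for this sketch).
    """
    n_epochs = len(signal) // epoch_len
    return [signal[k * epoch_len:(k + 1) * epoch_len] for k in range(n_epochs)]


one_minute = [0.0] * (SAMPLING_HZ * 60)     # placeholder samples, not real EEG
epochs = split_into_epochs(one_minute)
print(len(epochs), "epochs of", len(epochs[0]) / SAMPLING_HZ, "seconds")
```

Each epoch would then be reduced to a single dynamical measure, so one minute of a channel becomes a handful of numbers rather than 12,000 samples.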

1. Embed the data set (EEG): Xi = (x(ti), x(ti+τ), …, x(ti+(p-1)τ))T, where τ is the selected time lag between the components of each vector in the phase space, p is the selected dimension of the embedding phase space, and ti ∈ [1, T-(p-1)τ].
2. Pick a point x(t0) somewhere in the middle of the trajectory. Find that point's nearest neighbor. Call that point z0(t0).
3. Compute |z0(t0) - x(t0)| = L0.
4. Follow the "difference trajectory" (the dashed line) forwards in time, computing |z0(ti) - x(ti)| = L0(i) and incrementing i, until L0(i) > ε. Call that value L0' and that time t1.
5. Find z1(t1), the "nearest neighbor" of x(t1), and go to step 3. Repeat the procedure to the end of the fiducial trajectory t = tn, keeping track of the Li and Li'.
The exponent is then estimated as $\lambda_{\max} = \frac{1}{N\,\Delta t}\sum_{i=0}^{M-1}\log_2\frac{L_i'}{L_i}$, where M is the number of times we went through the loop above, and N is the number of time-steps in the fiducial trajectory, so that NΔt = tn - t0.
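The first three steps (embedding, nearest-neighbor search, initial separation L0) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sine-wave input stands in for an EEG epoch, and the `min_time_sep` guard, which skips neighbors that are merely adjacent in time, is a hypothetical addition.

```python
import math


def delay_embed(x, p, tau):
    """Return the vectors X_i = (x[i], x[i+tau], ..., x[i+(p-1)*tau])."""
    n_vectors = len(x) - (p - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this p and tau")
    return [tuple(x[i + k * tau] for k in range(p)) for i in range(n_vectors)]


def nearest_neighbor(vectors, i0, min_time_sep=1):
    """Return (index, distance) of the vector closest to vectors[i0].

    Vectors closer than min_time_sep in time are skipped; with the default
    of 1 only the point itself is excluded.
    """
    best_j, best_d = None, math.inf
    for j, v in enumerate(vectors):
        if abs(j - i0) < min_time_sep:
            continue
        d = math.dist(vectors[i0], v)   # Euclidean distance in phase space
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


series = [math.sin(0.3 * t) for t in range(200)]   # stand-in signal, not EEG
X = delay_embed(series, p=3, tau=4)
i0 = len(X) // 2                                   # a point mid-trajectory
j0, L0 = nearest_neighbor(X, i0, min_time_sep=10)
print(len(X), j0, L0)
```

The remaining steps would advance both points in time until their separation exceeds ε, record the ratio of final to initial separation, and restart from a new nearest neighbor.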

2-D Example: Circle of initial conditions evolves into an ellipse.

STLmax Profiles (Pre-Ictal, Ictal, Post-Ictal)

Hidden Synchronization Patterns

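How similar are they? Statistics to quantify the convergence of STLmax • By paired-T statistic: per electrode, for EEG signal epochs i and j, suppose their STLmax values in the epochs (of length 60 points, 10 minutes) are $L_i = \{L_i^1, \dots, L_i^{60}\}$ and $L_j = \{L_j^1, \dots, L_j^{60}\}$. • Then, we calculate the average value, $\bar{D}_{ij}$, and the sample standard deviation, $\hat{\sigma}_{ij}$, of the differences $D_{ij} = \{L_i^t - L_j^t : t = 1, \dots, 60\}$. • The T-index between EEG signal epochs i and j is defined as $T_{ij} = \sqrt{60}\,|\bar{D}_{ij}| / \hat{\sigma}_{ij}$.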
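A minimal sketch of the T-index computation follows, assuming the usual paired-t form: the square root of the window length times the absolute mean of the pairwise STLmax differences, divided by their sample standard deviation. The function name `t_index` and the handling of a zero standard deviation are choices made for this illustration.

```python
import math
import statistics


def t_index(stl_i, stl_j):
    """Paired-t style index between two equally long STLmax sequences."""
    if len(stl_i) != len(stl_j) or len(stl_i) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    diffs = [a - b for a, b in zip(stl_i, stl_j)]
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    if sd_d == 0:
        # Constant difference: identical profiles give 0, otherwise unbounded.
        return 0.0 if mean_d == 0 else math.inf
    return math.sqrt(len(diffs)) * abs(mean_d) / sd_d


# Toy values for illustration only (not patient data).
print(t_index([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]))
```

A small value means the two profiles are statistically hard to tell apart within the window, which is how convergence is read off in the slides that follow.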

Statistically Quantifying the Convergence

IID (Independent and Identically Distributed) Test • Assumption 1: Within a window of 30 STLmax points, the differences of STLmax values (Dij) between two electrode sites i and j are independent. To verify this assumption, we employed the "portmanteau" test of white noise developed by Ljung and Box. • Assumption 2: Within a window of 60 points, the differences of STLmax values between two electrode sites i and j are normally distributed. To check this assumption, we employed the Shapiro-Wilk W test, which is a well-established and powerful test of departure from normality.
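For the independence check, the Ljung-Box portmanteau statistic is Q = n(n+2) Σ ρ_k² / (n-k), summed over lags k = 1..h, where ρ_k is the lag-k sample autocorrelation. The sketch below computes Q only; the number of lags is an assumption left to the caller, and the p-value (from a chi-square distribution with h degrees of freedom) is not computed here. For the normality check, SciPy's `scipy.stats.shapiro` is one readily available implementation of the Shapiro-Wilk test.

```python
def autocorr(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# A strongly alternating toy series: far from white noise, so Q is large.
alternating = [1.0, -1.0] * 5
print(ljung_box_q(alternating, max_lag=1))
```

A large Q relative to the chi-square critical value would argue against independence of the differences within the window.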

Convergence of STLmax

Models • Homoclinic Chaos (Silnikov’s Theorem): Rössler systems, Lorenz systems, population dynamical systems • w, a, b and g are intrinsic parameters. • e and e’ are directional coupling strengths. • N = number of oscillators

STLmax versus time and coupling

Why Feature Selection? • Not every electrode site shows the convergence. • Feature Selection: Select the electrodes that are most likely to show the convergence preceding the next seizure.

Optimization Problem • Optimization: • We apply optimization techniques to find a group of electrode sites such that … • They are the most converged (in STLmax) electrode sites during 10-min window before the seizure • They show the dynamical resetting (diverged in STLmax) during 10-min window after the seizure. • Such electrode sites are defined as “critical electrode sites”. • Hypothesis: • The critical electrode sites should be most likely to show the convergence in STLmax again before the next seizure.

Multi-Quadratic Integer Programming • To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with … • objective function to minimize the average T-index among electrode sites • a linear constraint to identify the number of critical electrode sites • a quadratic constraint to ensure that the selected electrode sites show the dynamical resetting
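Written out, the three ingredients listed above correspond to a model of the following shape. This is a sketch assembled from the slide's verbal description, using Q for the pre-seizure T-index matrix, D for the post-seizure T-index matrix, b for the number of critical sites, and α for the resetting threshold; the exact form used by the authors is not shown in the transcript.

```latex
\begin{aligned}
\min_{x}\quad & x^{T} Q x \\
\text{s.t.}\quad & \sum_{i=1}^{n} x_i = b, \\
& x^{T} D x \ge \alpha, \\
& x \in \{0,1\}^{n}.
\end{aligned}
```

With exactly b sites selected and zero diagonals, $x^{T} Q x$ sums the T-index over the b(b-1) ordered pairs of selected sites, so minimizing it is equivalent to minimizing their average T-index.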

Notation and Modeling • x is an n-dimensional column vector (decision variables), where each xi represents the electrode site i. • xi = 1 if electrode i is selected to be one of the critical electrode sites. • xi = 0 otherwise. • Q is an (n×n) matrix, whose each element qij represents the T-index between electrode i and j during the 10-minute window before a seizure. • b is an integer constant (the number of critical electrode sites). • D is an (n×n) matrix, whose each element dij represents the T-index between electrode i and j during the 10-minute window after a seizure. • α = 2.662*b*(b-1), a constant. 2.662 is the critical value of the T-index, as previously defined, to reject H0: "two brain sites acquire identical STLmax values within a 10-minute window"
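For very small n the selection problem can be solved by plain enumeration, which makes the roles of Q, D, b and α concrete. The sketch below is a brute-force illustration, not the MILP approach of the talk, and the matrices are made-up toy values.

```python
from itertools import combinations

T_CRITICAL = 2.662   # critical T-index value from the slide


def quad_form(M, chosen):
    """x^T M x for the 0-1 vector whose ones are at the chosen indices."""
    return sum(M[i][j] for i in chosen for j in chosen)


def select_critical_sites(Q, D, b):
    """Enumerate all groups of b sites; keep the feasible one with least x^T Q x."""
    alpha = T_CRITICAL * b * (b - 1)
    best_sites, best_value = None, float("inf")
    for chosen in combinations(range(len(Q)), b):
        if quad_form(D, chosen) < alpha:     # no dynamical resetting: infeasible
            continue
        value = quad_form(Q, chosen)
        if value < best_value:
            best_sites, best_value = chosen, value
    return best_sites, best_value


# Toy symmetric T-index matrices (illustrative numbers only).
Q = [[0, 1, 5, 4], [1, 0, 2, 6], [5, 2, 0, 3], [4, 6, 3, 0]]   # before seizure
D = [[0, 1, 4, 4], [1, 0, 4, 4], [4, 4, 0, 4], [4, 4, 4, 0]]   # after seizure
print(select_critical_sites(Q, D, b=2))
```

In this toy case sites 0 and 1 are the most converged pair before the seizure but fail the resetting constraint, so the next-best pair is selected. Enumeration grows combinatorially with n, which is why a reformulation that standard solvers can handle matters.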

Conventional Linearization Approach for Multi-Quadratic 0-1 Problem

KKT Conditions Approach • Consider the quadratic 0-1 programming problem • eT = (1,1,…,1) • Relaxing to x ≥ 0, we then have the following KKT conditions • Q is an (n×n) matrix. • b is an integer constant. • x is an n-dimensional column vector.

KKT Conditions Approach • Add slack variables a and define s = u.e + a • Minimizing slack variables, we can formulate this problem as a mixed-integer linear program. • Note that this problem formulation is an efficient approach, as n increases, because it has the SAME number of 0-1 variables (n), and 2n additional continuous variables. • Fix x ∈ {0,1}

Connections Between QIP problems and MILP problems • For any matrix Q where qij ≥ 0 • We want to prove that the QIP problem and its MILP counterpart are equivalent.

Theoretical Results: MILP formulation for MQIP problem • Consider the MQIP problem • We proved that the MQIP program is EQUIVALENT to a MILP problem with the SAME number of integer variables.
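The formulations themselves are not preserved in this transcript, so the following is only a sketch of one linearization of this kind, stated as an assumption: for nonnegative Q, replace the objective x^T Q x by the sum of continuous variables s, with constraints Qx - y - s = 0, 0 ≤ y ≤ μ(1 - x), s ≥ 0, where μ is any constant at least as large as every row sum of Q. This keeps n binary variables and adds 2n continuous ones (y and s). The code checks, by enumerating all binary x for a small toy Q, that the best linear objective for fixed x equals the quadratic value.

```python
from itertools import product


def quadratic_value(Q, x):
    """x^T Q x for a 0-1 vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def linearized_value(Q, x, mu):
    """Minimum of sum(s) for fixed binary x under the assumed constraints
    Qx - y - s = 0, 0 <= y <= mu * (1 - x), s >= 0."""
    n = len(x)
    total = 0.0
    for i in range(n):
        row = sum(Q[i][j] * x[j] for j in range(n))   # (Qx)_i >= 0 since q_ij >= 0
        y = min(row, mu * (1 - x[i]))                 # absorb as much as allowed in y_i
        total += row - y                              # s_i = (Qx)_i - y_i
    return total


Q = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]                 # toy nonnegative matrix
mu = max(sum(row) for row in Q)                       # big-M bound on (Qx)_i
for x in product((0, 1), repeat=len(Q)):
    assert linearized_value(Q, x, mu) == quadratic_value(Q, x)
print("linear and quadratic objectives agree on all", 2 ** len(Q), "binary vectors")
```

When x_i = 1 the bound forces y_i = 0, so s_i carries the full row value; when x_i = 0 the row value can be absorbed by y_i and s_i drops to zero. Summing over i therefore reproduces x^T Q x.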

Reference: • P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2): 365-385, 2004.

Empirical Results: Performance on Larger Problems • Reference: • W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Reduction of Multi-Quadratic 0-1 Programming Problems to Linear Mixed 0-1 Programming Problems. Operations Research Letters, 32(6): 517-522, 2004.

Empirical Results: Performance on Larger Problems

Hypothesis Testing - Simulation • Hypothesis: • The critical electrode sites should be most likely to show the convergence in STLmax (drop in T-index below the critical value) again before the next seizure. • The critical electrode sites are electrode sites that • are the most converged (in STLmax ) electrode sites during 10-min window before the seizure • show the dynamical resetting (diverged in STLmax ) during 10-min window after the seizure • Simulation: • Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (drop in T-index below the critical value) before the next seizure between the electrode sites, which are • Critical electrode sites • Randomly selected (5,000 times)

Optimal VS Non-Optimal

Simulation - Results

How to automate the system

Automated Seizure Warning System (ASWA) • EEG Signals • Continuously calculate STLmax from multi-channel EEG. • Select critical electrode sites after every subsequent seizure. • Monitor the average T-index of the critical electrodes. • Give a warning when the T-index value is greater than 5, then drops to a value of 2.662 or less.
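The warning rule can be read as a two-state monitor: it arms once the average T-index exceeds 5 and fires when the value subsequently falls to 2.662 or below. The sketch below follows that reading; disarming after each warning, so that a new excursion above 5 is required before the next one, is an assumption of this illustration.

```python
UPPER_THRESHOLD = 5.0     # T-index must first exceed this value
CRITICAL_VALUE = 2.662    # ...and then drop to this value or less


def warning_indices(t_index_series, upper=UPPER_THRESHOLD, critical=CRITICAL_VALUE):
    """Return the positions in the series at which a warning is issued."""
    armed = False
    warnings = []
    for k, t in enumerate(t_index_series):
        if t > upper:
            armed = True                 # sites are clearly disentrained
        elif armed and t <= critical:
            warnings.append(k)           # convergence after disentrainment
            armed = False                # assumption: re-arm only after a new rise
    return warnings


# Toy average T-index profile (illustrative values only).
profile = [6.0, 4.0, 2.5, 2.0, 6.1, 3.0, 2.662]
print(warning_indices(profile))
```

Requiring the rise above 5 first prevents a group of sites that simply stays converged from triggering warnings continuously.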

Data Characteristics

Performance Evaluation for ASWS • To test this algorithm, a warning was considered to be true if a seizure occurred within 3 hours after the warning. • Sensitivity = (number of seizures correctly predicted) / (total number of seizures) • False Prediction Rate = average number of false warnings per hour
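These two measures can be computed from lists of warning and seizure times. The sketch below assumes that sensitivity is the fraction of seizures preceded by a warning within the 3-hour horizon, and that a false warning is one not followed by any seizure within that horizon; times are in hours from the start of the recording, and all values are made up for illustration.

```python
HORIZON_HOURS = 3.0   # a warning is true if a seizure follows within 3 hours


def evaluate(warning_times, seizure_times, total_hours, horizon=HORIZON_HOURS):
    """Return (sensitivity, false prediction rate per hour)."""

    def warning_is_true(w):
        return any(0 < s - w <= horizon for s in seizure_times)

    def seizure_is_predicted(s):
        return any(0 < s - w <= horizon for w in warning_times)

    false_warnings = sum(1 for w in warning_times if not warning_is_true(w))
    predicted = sum(1 for s in seizure_times if seizure_is_predicted(s))
    sensitivity = predicted / len(seizure_times) if seizure_times else float("nan")
    return sensitivity, false_warnings / total_hours


# Toy example: three warnings and two seizures in a 40-hour recording.
print(evaluate([1.0, 10.0, 20.0], [3.5, 30.0], total_hours=40.0))
```

The horizon trades the two measures against each other: a longer horizon turns more warnings into true ones and raises sensitivity, but makes each warning less specific about when the seizure will occur.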
