Analyzing Brain Signals by Combinatorial Optimization

Analyzing Brain Signalsby Combinatorial Optimization Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN September 25, 2008

Topics • Mathematicalproblem • Similarity of Multiple Point Processes • Motivation/Application • Diagnosis of Alzheimer’sdiseasefrom EEG signals Collaborators François Vialatte*, Theophane Weber+, and AndrzejCichocki* (*RIKEN, +MIT) Financial Support

Alzheimer's disease One disease, many symptoms Evolution of the disease (stages) EEG data • 2 to 5 years before • mild cognitive impairment (MCI) • 6 to 25 % progress to Alzheimer‘s memory, language, executive functions, apraxia, apathy, agnosia, etc… • Mild (early stage) • becomes less energetic or spontaneous • noticeable cognitive deficits • still independent (able to compensate) Memory (forgetting relatives) • Moderate (middle stage) • Mental abilities decline • personality changes • become dependent on caregivers Apathy • Severe (late stage) • complete deterioration of the personality • loss of control over bodily functions • total dependence on caregivers Video sources: Alzheimer society • 2% to 5% of people over 65 yearsold • up to 20% of people over 80 • Jeong 2004 (Nature) GOAL: Diagnosis of MCI based on EEG • EEG isrelativelysimple and inexpensivetechnology • Earlydiagnosis: medication more effective, more time to prepare future care of patient, etc.

Overview • Alzheimer’s Disease (AD) decrease in EEG synchrony • Similarity of Point Processes • Two 1-D point processes • Two multi-D point processes • Multiple multi-D point processes • Numerical Results • Conclusion

Alzheimer's disease Inside glimpse: abnormal EEG EEG system: inexpensive, mobile, useful for screening Decrease of synchrony • AD vs. MCI (Hogan et al. 203; Jiang et al., 2005) • AD vs. Control (Hermann, Demilrap, 2005, Yagyu et al. 1997; Stam et al., 2002; Babiloni et al. 2006) • MCI vs. mildAD(Babiloni et al., 2006). Images: www.cerebromente.org.br

Spontaneous (scalp) EEG Time-frequency |X(t,f)|2 (wavelet transform) f (Hz) Time-frequency patterns (“bumps”) Fourier |X(f)|2 Fourier power t (sec) amplitude EEG x(t)

Sparse representation: bump model f(Hz) f(Hz) Bumps Sparse representation t (sec) f(Hz) t (sec) 104- 105 coefficients t (sec) • Assumptions: • time-frequency map is suitable representation • oscillatory bursts (“bumps”) convey key information about 102 parameters F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).

Similarity of bump models How “similar”are n ≥ 2 bump models? Similarity of multiplemulti-dimensional point processes with and “point” / ”event”

Two one-dimensional point processes x 0 t x’ 0 t

Generative model non-coincident x x T0 0 0 T0 v 0 T0 -δt /2 T0 x δt /2 0 x’ non-coincident 0 T0 Stochastic event synchrony (SES): delayδt, jitterst, non-coincidenceρ

Generative model non-coincident x x T0 i.i.d. deletions with probpd 0 Gaussian offsets with mean -δt /2 and variance st/2 0 T0 v geometric prior for lenght 0 T0 -δt /2 events i.u.d. in [0,T0] Gaussian offsets with mean δt /2 and variance st/2 T0 x δt /2 0 x’ i.i.d. deletions with probpd non-coincident 0 T0 Marginalizing over v:

Generative model (2) Model Cost function unit cost of non-coincident event unit cost of coincident pair

Probabilistic inference PROBLEM: Given 2point processes x and x’, compute ρandθ = δt ,st APPROACH: (j*, j’*,θ*) = argmaxj,j’,θ log p(x, x’, j, j’,θ) SOLUTION: Coordinate descent (j(i+1), j’(i+1)) = argmaxj,j’ log p(x, x’, j , j’ , θ(i)) θ(i+1) = argmaxx log p(x, x’, j(i+1), j’(i+1) , θ) DYNAMIC PROGRAMMING PARAMETER ESTIMATION x’6 x’5 x’4 x’3 x’2 x’1 x’k’non-coincident xknon-coincident (xkx’k’ ) coincident pair 0 0 x1 x2 x3 x4 x5 x6

Application: spike trains Low reliability Small timing dispersion High reliability Large timing dispersion jitterst= (3ms)2, non-coincidenceρ = 27% jitterst= (15ms)2, non-coincidenceρ = 3%

Similarity of two bump models...

... by matching bumps • Bumps in one model, but NOT in other • → fraction of “non-coincident” bumps ρ • Bumps in both models, but with offset • → Average time offset δt(delay) • → Timing jitter with variance st • → Average frequency offset δf • → Frequencyjitter with variance sf Stochastic Event Synchrony (SES) =(ρ,δt,st, δf, sf) PROBLEM: Given two bump models, compute (ρ,δt,st, δf, sf)

Generative model yhidden • Generate bump model (hidden) • geometric prior for number of bumps • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. • Generate two “noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( ±δt /2, ±δf /2) • covariance diag(st/2, sf/2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd y y’ ( -δt /2, -δf/2) ( δt /2, δf/2) • Binary variables ckk’:ckk’ = 1 if k and k’ are observations of same hidden bump, else ckk’ = 0 • Constraints: sums Σk’ckk’ and Σkckk’ are binary (“matching constraints”)

Probabilistic inference PROBLEM: Given two bump models, compute (ρ,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING POINT ESTIMATION

Probabilistic inference (2) MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) EQUIVALENT to (imperfect) bipartite max-weight matching problem c(i+1) = argmaxclog p(y, y’, c, θ(i)) = argmaxcΣkk’wkk’(i)ckk’ s.t. Σk’ckk’ ≤ 1and Σkckk’ ≤ 1 and ckk’ 2{0,1} find heaviest set of disjoint edges not necessarily perfect • ALGORITHMS • Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm) • Linear programming relaxation: gives optimal solution if unique [Sanghavi (2007)] • Max-product algorithmgives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]

Max-product algorithm MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) μ↓ μ↓ μ↑ μ↑ • At convergence, compute marginals p(ckk’) = μ↓(ckk’)μ↓(ckk’)μ↑(ckk’) • Decisions: c*kk’ = argmaxckk’ p(ckk’) (optimal if solution unique)

Summary PROBLEM: Given two bump models, compute (ρ,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING → max-product ESTIMATION → closed-form

Average synchrony 3. SES for each pair of models 4. Average the SES parameters • Group electrodes in regions • Bump model for each region

Beyond pairwise interactions... Multi-variate similarity Pairwise similarity

Similarity of multiple bump models y2 y1 y3 y4 y5 • Models similar if • few deletions/large clusters • little jitter y2 y1 y3 y4 y5 Constraint: in each cluster at most one bump from each signal pc(i) = p(cluster size = i |y) (i = 1,2,…,M) Parameters: θ = δt,m, δf,m , st,m, sf,m,pc

Generative model yhidden • Generate bump model (hidden) • geometric prior for number n of bumps • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. y2 y1 y3 y4 y5 • Generate M“noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( δt,m /2, δf,m /2) • covariance diag(st,m/2, sf,m /2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd pc(i) = p(cluster size = i |y) (i = 1,2,…,M) Parameters: θ = δt,m, δf,m , st,m, sf,m,pc

Exemplar-based formulation yhidden y2 y1 y3 y4 y5 • Exemplars= identical copies of hidden bumps = cluster “center” • Other bumps in cluster = non-identical copies of exemplars • Is event an exemplar? • If not, which exemplar is it associated with? • Several constraints Integer program

Exemplar-based formulation: IP Binary Variables Integer Program: LINEAR objective function/constraints Equivalent to k-dim matching: for k = 2: in P but for k > 2: NP-hard!

Probabilistic inference PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m,pc APPROACH: (b*,θ*) = argmaxb,θ log p(y, y’, b, θ) SOLUTION: Coordinate descent b(i+1)= argmaxc log p(y, y’, b, θ(i)) θ(i+1) = argmaxx log p(y, y’, b(i+1),θ) CLUSTERING (Integer Program) ESTIMATION OF PARAMETERS • Integer programming methods (e.g., LP relaxation) • IP with 10.000 variables solved in about 1s • CPLEX: commercial toolbox for solving IPs (combines several algorithms) • NOTE:Max-product algorithm SUBOPTIMAL • sometimes converged to “bad” solutions (how to fix??)

Summary Similarity of multiplemulti-dimensional point processes Step 1: TWO ONE-dimensional point processes Dynamic programming Step 2: TWO MULTI-dimensional point processes Max-product/LP relaxation/Edmund-Karp Step 3: MULTIPLE MULTI-dimensional point processes Integer Programming

EEG Data • EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched • control subjects (CTR) recorded while in rest with closed eyes • →spontaneous EEG • All 22 MCI patients suffered from Alzheimer’s disease (AD) later on • Electrodes located on 21 sites according to 10-20 international system • Electrodes grouped into 5 zones (reduces number of pairs) • 1 bump model per zone • Band pass filtered between 4 and 30 Hz EEG data provided by Prof. T. Musha

Sensitivity (average synchrony) Corr/Coh Granger Info. Theor. State Space Phase SES Significant differences for ffDTF and SES (more unmatched bumps, but same amount of jitter) Mann-Whitney test: small p value suggests large difference in statistics of both groups

Classification (bi-SES) ± 85% correctly classified ffDTF • Clearseparation, but not yet useful as diagnostic tool • Additionalindicators needed (fMRI, MEG, DTI, ...) • Can be used for screening population (inexpensive, simple, fast)

Conclusions • Measure for similarity of point processes • Key idea: matching of events • Application: EEG synchrony of MCI patients • About 85% correctly classified perhaps useful for screening a large population • Future work: • Combination with other modalities (MEG, fMRI,...) • Alternative inference techniques (variations on max-product, Monte-Carlo) • More sophisticated models (e.g., interaction between events)

Analyzing Brain Signalsby Combinatorial Optimization Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN September 25, 2008

References + software Software MATLAB implementation of the synchrony measures References Quantifying Statistical Interdependence by Message Passing on Graphs: Algorithms and Application to Neural Signals, Neural Computation (under revision) A Comparative Study of Synchrony Measures for the Early Diagnosis of Alzheimer's Disease Based on EEG, NeuroImage (under revision) Measuring Neural Synchrony by Message Passing, NIPS 2007 Quantifying the Similarity of Multiple Multi-Dimensional Point Processes by Integer Programming with Application to Early Diagnosis of Alzheimer's Disease from EEG, EMBC 2008 (submitted)

Estimation Simple closed form expressions Deltas: average offset Sigmas: var of offset artificial observations (conjugate prior) ...where

Large-scale synchrony Apparently, all brain regions affected...

Alzheimer's disease Outside glimpse: the future (prevalence) USA (Hebert et al. 2003) • 2% to 5% of people over 65 years old • Up to 20% of people over 80 • Jeong 2004 (Nature) Million of sufferers World (Wimo et al. 2003) Million of sufferers

Ongoing and future work Applications • Fluctuations of EEG synchrony • Caused by auditory stimuli and music (T. Rutkowski) • Caused by visual stimuli (F. Vialatte) • Yoga professionals (F. Vialatte) • Professional shogi players (RIKEN & Fujitsu) • Brain-Computer Interfaces (T. Rutkowski) • Spike data from interacting monkeys (N. Fujii) • Calcium propagation in gliacells (N. Nakata) • Neural growth (Y. Tsukada & Y. Sakumura) • ... Algorithms • alternative inference techniques (e.g., MCMC, linear programming) • time dependent (Gaussian processes) • multivariate (T.Weber)

Adaptation After adaptation Initialisation Bump Fitting bump models  Signal gradient method F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).

Boxplots • SURPRISE! • No increase in jitter, but significantly lessmatched activity! • Physiological interpretation • neural assemblies more localized? • harder to establish large-scale synchrony?

Generative model yhidden • Generate bump model (hidden) • geometric prior for number n of bumps • p(n) = (1- λ S) (λ S)-n • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. • Generate two “noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( ±δt /2, ±δf /2) • covariance diag(st/2, sf /2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd y y’ ( -δt /2, -δf/2) ( δt /2, δf/2) Easily extendable to more than 2 observations…

Probabilistic inference PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING POINT ESTIMATION

Alzheimer's disease Inside glimpse: abnormal EEG EEG system: inexpensive, mobile, useful for screening Brain “slow-down” slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz) (Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al. 1996, Jeong 2004; Dierks et al., 1993). focus of this project Decrease of synchrony • AD vs. MCI (Hogan et al. 203; Jiang et al., 2005) • AD vs. Control (Hermann, Demilrap, 2005, Yagyu et al. 1997; Stam et al., 2002; Babiloni et al. 2006) • MCI vs. mildAD(Babiloni et al., 2006). Images: www.cerebromente.org.br

Comparing EEG signal rhythms ? 2 signals PROBLEM I: Signals of 3 seconds sampled at 100 Hz ( 300 samples) Time-frequency representation of one signal = about 25 000 coefficients

Analyzing Brain Signals by Combinatorial Optimization

Analyzing Brain Signals by Combinatorial Optimization

Presentation Transcript

Combinatorial Optimization in Computational Biology

Combinatorial Optimization

Analyzing Multivariable Change: Optimization

Combinatorial Optimization and Computer Vision

Combinatorial Optimization in Container Scheduling

Combinatorial Optimization and Combinatorial Structure in Computational Biology

Approximations for Combinatorial Optimization

Distributed Combinatorial Optimization

Combinatorial Optimization for Graphical Models

Guiding Combinatorial Optimization with UCT

Probabilistic Modeling for Combinatorial Optimization

EAs for Combinatorial Optimization Problems

Discovering Combinatorial Optimization with the ILOG Optimization Suite

Combinatorial Optimization for Text Layout

Combinatorial Optimization

IE 635 Combinatorial Optimization

IE 635 Combinatorial Optimization

Analyzing Brain Signals by Combinatorial Optimization

Combinatorial Optimization and Combinatorial Structure in Computational Biology

Introduction to Nonlinear Combinatorial Optimization

Problems in Combinatorial Optimization