Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

Machine learning techniques for quantifying neural synchrony:application to the diagnosis of Alzheimer's diseasefrom EEG Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

Acknowledgments Collaborators François Vialatte*, Theo Weber+, Shun-ichi Amari*, Andrzej Cichocki* (*RIKEN, +MIT) Financial Support • RIKEN Wako Campus (near Tokyo) • about 400 researchers and staff (20% foreign) • 300 research fellows and visiting scientists • about 60 laboratories • research covers most aspects of brain science

Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Outlook

Alzheimer's disease Outside glimpse: clinical perspective Evolution of the disease (stages) One disease, many symptoms • 2 to 5 years before • mild cognitive impairment (often unnoticed) • 6 to 25 % progress to Alzheimer's per year memory, language, executive functions, apraxia, apathy, agnosia, etc… • Mild (early stage) • becomes less energetic or spontaneous • noticeable cognitive deficits • still independent (able to compensate) Memory (forgetting relatives) • Moderate (middle stage) • Mental abilities decline • personality changes • become dependent on caregivers Apathy • Severe (late stage) • complete deterioration of the personality • loss of control over bodily functions • total dependence on caregivers Loss of Self-control • 2% to 5% of people over 65 years old • up to 20% of people over 80 • Jeong 2004 (Nature) Video sources: Alzheimer society

Alzheimer's disease Inside glimpse: brain atrophy amyloid plaques and neurofibrillary tangles Video source: Alzheimer society Images: Jannis Productions. (R. Fredenburg; S. Jannis) Video source: P. Thompson, J.Neuroscience, 2003

Alzheimer's disease Inside glimpse: abnormal EEG EEG system: inexpensive, mobile, useful for screening Brain “slow-down” slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz) (Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al. 1996, Jeong 2004; Dierks et al., 1993). focus of this project Decrease of synchrony • AD vs. MCI (Hogan et al. 203; Jiang et al., 2005) • AD vs. Control (Hermann, Demilrap, 2005, Yagyu et al. 1997; Stam et al., 2002; Babiloni et al. 2006) • MCI vs. mildAD (Babiloni et al., 2006). Images: www.cerebromente.org.br

Spontaneous EEG Time-frequency |X(t,f)|2 (wavelet transform) f (Hz) Time-frequency patterns (“bumps”) Fourier |X(f)|2 Fourier power t (sec) amplitude EEG x(t)

Comparing EEG signals 2 signals PROBLEM I: Signals of 3 seconds sampled at 100 Hz ( 300 samples) Time-frequency representation of one signal = about 25 000 coefficients

One pixel Numerous neighboring pixels Comparing EEG signals (2) PROBLEM II: Shifts in time-frequency!

Sparse representation: bump model f(Hz) f(Hz) Bumps Sparse representation t (sec) f(Hz) t (sec) 104- 105 coefficients t (sec) about 102 parameters • Assumptions: • time-frequency map is suitable representation • oscillatory bursts (“bumps”) convey key information F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).

Similarity of bump models... How “similar”or “synchronous”are two bump models?

... by matching bumps y2 y1 Some bumps match Offset between matched bumps SIMILAR bump models if: Many matches Strongly overlapping matches

... by matching bumps (2) • Bumps in one model, but NOT in other • → fraction of “spurious” bumps ρspur • Bumps in both models, but with offset • → Average time offset δt(delay) • → Timing jitter with variance st • → Average frequency offset δf • → Frequencyjitter with variance sf • Synchrony: only st and ρspur relevant Stochastic Event Synchrony (SES) =(ρspur,δt,st, δf, sf) PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf)

Generative model yhidden • Generate bump model (hidden) • geometric prior for number n of bumps • p(n) = (1- λ S) (λ S)-n • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. • Generate two “noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( ±δt /2, ±δf /2) • covariance diag(st/2, sf /2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd y y’ ( -δt /2, -δf/2) ( δt /2, δf/2) Easily extendable to more than 2 observations…

Generative model (2) y y’ i ( -δt /2, -δf/2) j’ i’ ( δt /2, δf/2) • Binary variables ckk’ • ckk’ = 1 if k and k’ are observations of same hidden bump, else ckk’ = 0 (e.g., cii’ = 1 cij’ = 0) • Constraints: bk = Σk’ ckk’ and bk’ = Σk ckk’ are binary (“matching constraints”) • Generative Model p(y, y’, yhidden ,c, δt, δf , st,sf ) (symmetric in y and y’) • Eliminate yhidden→ offset is Gaussian RV with mean = ( δt , δf ) and covariance diag (st , sf) • Probabilistic Inference: θ p(y, y’, c, θ) = ∫p(y, y’, yhidden ,c, θ) dyhidden (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)

Summary • Bumps in one model, but NOT in other • → fraction of “spurious” bumps ρspur • Bumps in both models, but with offset • → Average time offset δt(delay) • → Timing jitter with variance st • → Average frequency offset δf • → Frequencyjitter with variance sf PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)

Objective function y y’ i ( -δt /2, -δf/2) j’ i’ ( δt /2, δf/2) • Logarithm of model: log p(y, y’, c, θ) = Σkk’wkk’ckk’ + log I(c) + log pθ(θ) + γ wkk’ = -(1/2st (t k’ – tk –δt)2 + 1/2sf (f k’ – fk–δf)2) - 2log β- 1/2 log st - 1/2 log sf β = pd (λ/V)1/2 Euclidean distance between bump centers • Largewkk’ if : a) bumps are close b) small pd c) few bumps per volume element • No need to specify pd , λ, and V, they only appear through β= knob to control # matches

Generative model p(y, y’, c, θ) /I(c) pθ(θ) Πkk’(N(t k’ – tk ;δt ,st,kk’)N(f k’ – fk ;δf ,sf, kk’)β-2)ckk’

Prior for parameters • Expect bumps to appear at about same frequency, but delayed Frequency shift requires non-linear transformation, less likely than delay • Conjugate priors for st and sf(scaled inverse chi-squared): • Improper prior for δtandδt : p(δt) = 1 = p(δf)

Probabilistic inference PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING POINT ESTIMATION Minx2X,y2Yd(x,y) X Y

Probabilistic inference POINT ESTIMATION: θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) Uniform prior p(θ): δt, δf= average offset, st, sf= variance of offset Conjugate prior p(θ): still closed-form expression Other kind of prior p(θ): numerical optimization (gradient method)

Probabilistic inference MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) EQUIVALENT to (imperfect) bipartite max-weight matching problem (belongs to P) c(i+1) = argmaxc log p(y, y’, c, θ(i)) = argmaxc Σkk’wkk’(i)ckk’ s.t. Σk’ ckk’ ≤ 1and Σk ckk’ ≤ 1 with ckk’ 2{0,1} find heaviest set of disjoint edges not necessarily perfect • ALGORITHMS • Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm) • Linear programming relaxation gives optimal solution if unique [Sanghavi (2007)] • Max-product algorithmgives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]

Max-product algorithm MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) Generative model p(y, y’, c, θ) /I(c) pθ(θ) Πkk’(N(t k’ – tk ;δt ,st,kk’)N(f k’ – fk ;δf ,sf, kk’)β-2)ckk’

Max-product algorithm MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) Conditioning on θ μ↓ μ↓ μ↑ μ↑

Max-product algorithm (2) • Iteratively compute messages • At convergence, compute marginals p(ckk’) = μ↓(ckk’)μ↓(ckk’)μ↑(ckk’) • Decisions: c*kk’ = argmaxckk’ p(ckk’)

Summary PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING → max-product ESTIMATION → closed-form

Average synchrony 3. SES for each pair of models 4. Average the SES parameters • Group electrodes in regions • Bump model for each region

Beyond pairwise interactions... Multi-variate similarity Pairwise similarity

...by clustering y2 y1 y3 y4 y5 • Models similar if • few deletions/large clusters • little jitter y2 y1 y3 y4 y5 Constraint: in each cluster at most one bump from each signal

Generative model yhidden • Generate bump model (hidden) • geometric prior for number n of bumps • p(n) = (1- λ S) (λ S)-n • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. y2 y1 y3 y4 y5 • Generate M“noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( δt,m /2, δf,m /2) • covariance diag(st,m/2, sf,m /2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd pc(i) = p(cluster size = i |y) (i = 1,2,…,M) Parameters: θ = δt,m , δf,m , st,m , sf,m,pc

Inference • SOLUTION 1 • ITERATE: • Infer hidden bump model (= cluster centers) • Compute parameters δt,m , δf,m , st,m , sf,m,pc • = adaption of K-means clustering • = ESTIMATION problem • PROBLEM: LOCAL extrema! • SOLUTION 2 • Assumption: one bump in each cluster is hidden bump (“exemplar”) • ITERATE: • Find exemplars (= cluster centers) and non-exemplars • Compute parameters δt,m , δf,m , st,m , sf,m,pc • = DETECTION problem (combinatorial optimization/integer program) • = adaption of “affinity propagation” [Frey et al., Science, 2007] • or convex clustering algorithm [Lashkari et al., NIPS 2007] • ADVANTAGE: we can find GLOBAL OPTIMUM!

Exemplar-based formulation yhidden y2 y1 y3 y4 y5 pc(i) = p(cluster size = i |y) (i = 1,2,…,M) Parameters: θ = δt,m , δf,m , st,m , sf,m,pc Set of EXEMPLARS = “average” point process → compression

Exemplar-based formulation: IP Binary Variables Integer Program: LINEAR objective function/constraints Equivalent to k-dim matching: for k = 2: in P but for k > 2: NP-hard!

Probabilistic inference PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m,pc APPROACH: (b*,θ*) = argmaxc,θ log p(y, y’, b, θ) SOLUTION: Coordinate descent b(i+1)= argmaxc log p(y, y’, b, θ(i)) θ(i+1) = argmaxx log p(y, y’, b(i+1),θ) CLUSTERING (IP or MP) POINT ESTIMATION • Integer program • Max-product algorithm (MP) on sparse graph: FAILED! • Integer programming methods (e.g., LP relaxation): GREAT! • IP with 10.000 variables solved in about 1s, total run time about 5s • CPLEX: commercial toolbox for solving IPs (combines several algorithms)

EEG Data • EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched • control subjects (CTR) recorded while in rest with closed eyes • →spontaneous EEG • All 22 MCI patients suffered from Alzheimer’s disease (AD) later on • Electrodes located on 21 sites according to 10-20 international system • Electrodes grouped into 5 zones (reduces number of pairs) • 1 bump model per zone • Used continuous “artifact-free” intervals of 20s • Band pass filtered between 4 and 30 Hz EEG data provided by Prof. T. Musha

Similarity measures • Correlation and coherence • Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ... • Phase Synchrony: compareinstantaneous phases (wavelet/Hilbert transform) • State space based measures • sync likelihood, S-estimator, S-H-N-indices, ... • Information-theoretic measures • KL divergence, Jensen-Shannon divergence, ... FREQUENCY TIME No Phase Locking Phase Locking

Sensitivity (average synchrony) Corr/Coh Granger Info. Theor. State Space Phase SES Mann-Whitney test: small p value suggests large difference in statistics of both groups Significant differences for ffDTF and ρ!

Classification ffDTF • Clearseparation, but not yet useful as diagnostic tool • Additionalindicators needed (fMRI, MEG, DTI, ...) • Can be used for screening population (inexpensive, simple, fast)

Correlations Strong (anti-) correlations „families“ of sync measures

Recent results for multivariate SES Cluster size = 1 Cluster size = 2 Cluster size = 3 Cluster size = 4 SMALLER clusters in MCI patients! Cluster size = 5

Recent results for multivariate SES (2) ± 85% correctly classified Average cluster size ± 90% correctly classified Average cluster size

Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Ongoing Work and Outlook

Ongoing work no stimulus no stimulus stimulus high st low st high st high st low st high st Time-varying similarity parameters st

Probabilistic inference PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m,pc APPROACH: (b*,θ*) = argmaxc,θ log p(y, y’, b, θ) SOLUTION: Coordinate descent b(i+1)= argmaxc log p(y, y’, b, θ(i)) θ(i+1) = argmaxx log p(y, y’, b(i+1),θ) CLUSTERING (IP or MP) CUBIC SPLINE SMOOTHING PRIOR: POINT ESTIMATION:

Future work MODEL yhidden • Less trivial prior (not i.i.d.) • Multiple hidden processes • independent • dependent y2 y1 y3 y4 y5 More complex transformations = “observation model” INFERENCE: infer distributions instead of point estimates

Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

Presentation Transcript

SEC Hot Topics Institute June 2008

Responsive Homes + Energy Behaviour Change MIT House_n Research Consortium Review MIT, Cambridge, MA June 3, 2008

Reno, Nevada - June 11, 2008

IDEM UPDATE June 11, 2008

ECSU-NAM 2008 Summer Research Institute in Computational Science-Science Visualization

Justin Dauwels LIDS, MIT LMI, Harvard Medical School

ECSU-NAM 2008 Summer Research Institute in Computational Science-Science Visualization

June 4-11, 2008

HR Symposium June 11, 2008

A RIKEN BNL Research Center Workshop A RIKEN BNL Research Center Workshop

RIKEN Genomic Science Center Fumikazu KONISHI

i-LIDS

11 June 2008 Cape Town

Southern Company Visit June 11, 2008

CFFO MEETING JUNE 11, 12, 2008

Research Committee – June 2008

June 9-11, 2008

Cambridge-MIT Institute

June 11, 2008

Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

IDEM UPDATE June 11, 2008

Summer Science Research Institute