Learning to Identify Overlapping and Hidden Cognitive Processes from fMRI Data

Rebecca Hutchinson, Tom Mitchell, Indra Rustandi
Carnegie Mellon University

How can we track hidden cognitive processes?
[Figure: cognitive processes (read sentence, view picture, decide whether consistent) shown against the observed fMRI signal in cortical regions 1 and 2 and the observed button press.]

Typical BOLD response
• At left is a typical averaged BOLD response.
• Here, the subject reads a word, decides whether it is a noun or a verb, and pushes a button, all in less than 1 second.
[Figure: averaged BOLD response; axes: signal amplitude vs. time (seconds).]

Related Work
• General linear model (GLM) applied to fMRI
  • E.g., [Dale 1999]; SPM
  • Accommodates multiple, overlapping processes
  • But not unknown process timing
• Dynamic Bayesian Networks
  • Family of probabilistic models for time series
  • E.g., factorial HMMs [Ghahramani & Jordan 1998]
  • Accommodate hidden timings/states
  • But do not capture convolution of overlapping states
  • Require learning a detailed next-state function

Approach: Hidden Process Models
• Probabilistic model
  • Can evaluate P(model | data), P(data | model)
• Describe hidden processes by their
  • Type, duration, start time, fMRI signature
• Algorithms for learning the model and interpreting data
  • Learn maximum-likelihood models and data interpretations

¢ 2 ¢ 1 ¢ 1 Time landmarks: ¢ 3 Hidden Process Models ID: 1 Timing: P(start=+O) Response: ID: 2 Timing: P(start=+O) Response: ID: 3 Timing: P(start=+O) Response: Processes: Process ID = 1 Process ID = 1 Process Instances: Process ID = 2 View picture Process ID = 3 Decide whether consistent Observed fMRI: 

Hidden Process Models
Process h: ViewPicture
• Duration d: 11 sec.
• P(offset times): Ω, Θ
• Response signature W
Timing landmarks λ: λ1, λ2, λ3
Process instance π2:
• Process h: ViewPicture
• Timing landmark λ: λ2
• Offset time O: 1 sec
• Start time = λ + O
Configuration C of process instances ⟨π1, π2, …⟩
Observed data Y
[Figure: input stimuli (sentence, sentence, picture) on a timeline with the landmarks, the process instances π1 to π4, and the observed data Y.]

HPMs More Formally…
Process h = ⟨d, Ω, Θ, W⟩
Process instance π = ⟨h, λ, O⟩
Configuration C = set of process instances
Hidden Process Model HPM = ⟨H, Φ, C, Σ⟩
• H: set of processes
• Φ: prior probabilities over H
• C: set of candidate configurations
• Σ: ⟨σ1 … σv⟩, voxel noise model

HPM Generative Model
Probabilistically generate data using a configuration of N process instances with known landmarks:
• Generate a configuration C of process instances: for i = 1 to N, generate process instance i
  • Choose a process h_i according to the prior probabilities over processes
  • Choose an offset O_i according to the offset-time distribution of h_i
• Generate all observed fMRI data y_tv (time t, voxel v) given C
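The generative steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: time is measured in image indices, overlapping responses are assumed to add linearly, and each voxel is assumed to have independent Gaussian noise with its own standard deviation. The function names and data layout are hypothetical.

```python
import random

def sample_configuration(landmarks, process_prior, offset_dists, rng):
    """Draw one process instance (process, landmark, offset) per landmark.

    process_prior: {process_name: probability}           (assumed layout)
    offset_dists:  {process_name: {offset: probability}} (assumed layout)
    """
    names = list(process_prior)
    config = []
    for landmark in landmarks:
        h = rng.choices(names, weights=[process_prior[n] for n in names])[0]
        offsets = list(offset_dists[h])
        o = rng.choices(offsets, weights=[offset_dists[h][k] for k in offsets])[0]
        config.append((h, landmark, o))
    return config

def predicted_mean(config, signatures, n_time, n_voxels):
    """Sum the response signatures of all instances (assumed linear overlap).

    signatures[h][tau][v] is the response of process h, tau images after
    its start time, in voxel v.
    """
    mean = [[0.0] * n_voxels for _ in range(n_time)]
    for process, landmark, offset in config:
        start = landmark + offset  # start time = landmark + offset
        for tau, row in enumerate(signatures[process]):
            t = start + tau
            if 0 <= t < n_time:
                for v in range(n_voxels):
                    mean[t][v] += row[v]
    return mean

def sample_data(config, signatures, sigmas, n_time, seed=0):
    """Generate y[t][v] as predicted mean plus Gaussian noise per voxel."""
    rng = random.Random(seed)
    n_voxels = len(sigmas)
    mean = predicted_mean(config, signatures, n_time, n_voxels)
    return [[rng.gauss(mean[t][v], sigmas[v]) for v in range(n_voxels)]
            for t in range(n_time)]
```

With two instances whose responses overlap in time, `predicted_mean` adds their signatures at the shared time points, which is the overlap behavior the slides emphasize.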

HPM Inference
• Given:
  • An HPM, including a set of candidate configurations
    • We typically assume the processes are known, but not their timing
  • Observed data Y
• Determine:
  • The most probable process instance configuration c
  • P(C=c | Y, HPM) ∝ P(Y | C=c, HPM) P(C=c | HPM)
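As a rough illustration of the proportionality above, the sketch below scores each candidate configuration by log P(Y | C=c) + log P(C=c) and returns the best one. It assumes a single voxel, Gaussian noise with a known standard deviation, linearly added responses, and brute-force enumeration of the candidates; the slides do not say how the search is carried out, and all names here are hypothetical.

```python
import math

def predict(config, signatures, n_time):
    """Single-voxel predicted signal: sum of the overlapping responses."""
    mean = [0.0] * n_time
    for process, start in config:
        for tau, w in enumerate(signatures[process]):
            if 0 <= start + tau < n_time:
                mean[start + tau] += w
    return mean

def log_posterior(y, config, prior, signatures, sigma):
    """Unnormalized log P(C=c | Y): Gaussian log-likelihood plus log prior."""
    mean = predict(config, signatures, len(y))
    log_lik = sum(
        -0.5 * ((yt - mt) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
        for yt, mt in zip(y, mean)
    )
    return log_lik + math.log(prior)

def most_probable_configuration(y, candidates, signatures, sigma):
    """candidates: list of (config, prior_probability) pairs."""
    best = max(candidates,
               key=lambda cp: log_posterior(y, cp[0], cp[1], signatures, sigma))
    return best[0]
```

Each configuration is a list of (process, start time) pairs, so two candidates can contain the same processes in a different order or at different start times.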

Inference: Example
Configuration 1: ProcessID=1, S=1; ProcessID=2, S=17; ProcessID=3, S=21
Configuration 2: ProcessID=2, S=1; ProcessID=1, S=17; ProcessID=3, S=23
[Figure: observed data compared with Prediction 1 and Prediction 2.]

Learning HPMs with unknown timing O, known processes h
EM (Expectation-Maximization) algorithm
• E-step
  • Estimate the conditional distribution over start times of the process instances given the observed data, P(O(1)…O(N) | Y, h(1)…h(N), HPM).
• M-step
  • Use the distribution from the E-step to get maximum-likelihood estimates of the HPM parameters.
* In real problems, some timings are often known.
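To make the two steps concrete, here is a heavily simplified EM sketch. It is not the authors' algorithm: it assumes one voxel, one process instance per trial with its landmark at time 0, a fixed and known noise level `sigma`, and a small set of candidate offsets. The names `em_offsets`, `place`, `theta`, and `W` are hypothetical; `W` stands for the response signature and `theta` for the offset probabilities.

```python
import math

def place(W, offset, n_time):
    """Predicted signal: signature W starting at `offset`, zero elsewhere."""
    pred = [0.0] * n_time
    for tau, w in enumerate(W):
        pred[offset + tau] = w
    return pred

def em_offsets(trials, offsets, duration, sigma=1.0, n_iter=20):
    """Learn a response signature W and offset probabilities theta by EM."""
    n_time = len(trials[0])
    if max(offsets) + duration > n_time:
        raise ValueError("every candidate response must fit inside a trial")
    n = len(trials)
    W = [0.0] * duration
    theta = [1.0 / len(offsets)] * len(offsets)
    for _ in range(n_iter):
        # E-step: posterior over the offset of each trial given W and theta.
        gammas = []
        for y in trials:
            logp = []
            for k, o in enumerate(offsets):
                pred = place(W, o, n_time)
                sq_err = sum((yt - pt) ** 2 for yt, pt in zip(y, pred))
                # Floor theta to avoid log(0) once an offset is ruled out.
                logp.append(math.log(max(theta[k], 1e-300))
                            - sq_err / (2 * sigma ** 2))
            m = max(logp)
            weights = [math.exp(lp - m) for lp in logp]
            z = sum(weights)
            gammas.append([w / z for w in weights])
        # M-step: maximum-likelihood theta and W under those posteriors.
        theta = [sum(g[k] for g in gammas) / n for k in range(len(offsets))]
        W = [
            sum(g[k] * y[o + tau]
                for y, g in zip(trials, gammas)
                for k, o in enumerate(offsets)) / n
            for tau in range(duration)
        ]
    return W, theta
```

In this toy setting the M-step for `W` is a posterior-weighted average of the data windows, and the M-step for `theta` is the average posterior over offsets. A full HPM would also have to handle several overlapping instances per trial and many voxels.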

HPMs are learnable from realistic amounts of data

[Figure panels: true signal; observed noisy signal; true response W and learned W for Process 1, Process 2, and Process 3.]
Figure 1. The learner was given 80 training examples with known start times for only the first two processes. It chooses the correct start time (26) for the third process, in addition to learning the HDRs for all three processes.

fMRI Study: Pictures and Sentences
• Each trial: determine whether the sentence correctly describes the picture
• 40 trials per subject
• Picture first in 20 trials, sentence first in the other 20
• Images acquired every 0.5 seconds
[Figure: trial timeline with segments Read Sentence, Fixation, View Picture, Press Button, Rest; time marks at t=0, 4 sec., 8 sec.]

HPM model for Picture-Sentence Comparison
[Figure: cognitive processes (read sentence, view picture, decide whether consistent) shown against the observed fMRI signal in cortical regions 1 and 2 and the observed button press.]

Learned models
• Learned HPM with 3 processes (S, P, D), and R=13 sec (TR=500 msec)
• D start time chosen by program as t+18
[Figure: observed and reconstructed signals, with the learned responses for processes S, P, and D.]

HPMs provide more accurate classification of unknown processes than earlier methods (e.g., Gaussian Naïve Bayes (GNB) classifier)

Standard classifier formulation
Standard formulation of the classification problem (e.g., Gaussian Naïve Bayes (GNB)):
• Train on labeled data: known processes, known start times
• Test on unlabeled data: unknown processes, known start times
[Figure: trial timeline (View Picture or Read Sentence, Fixation, Read Sentence or View Picture, Press Button, Rest; t=0, 4 sec., 8 sec., 16 sec.) with the GNB question "picture or sentence?" for each stimulus.]
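For readers unfamiliar with the baseline, the standard formulation can be sketched as a small Gaussian Naïve Bayes classifier: each labeled example is a feature vector (for instance, the fMRI values in a window after a known start time) with a process label. The feature layout, the variance floor of 1e-6, and the function names are assumptions made for this illustration.

```python
import math

def train_gnb(examples):
    """examples: list of (feature_vector, label) with known labels."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    model = {}
    for label, rows in by_label.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (n / len(examples), means, variances)
    return model

def classify_gnb(model, x):
    """Return the label with the highest posterior for feature vector x."""
    def log_post(stats):
        prior, means, variances = stats
        log_lik = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_lik
    return max(model, key=lambda label: log_post(model[label]))
```

Because each window is classified on its own, this formulation has no way to account for a response that carries over from the previous stimulus.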

HPM classifier accounts for overlap
[Figure: the same trial timeline (t=0, 4 sec., 8 sec., 16 sec.), with both GNB and HPM asked "picture or sentence?" for each of the two stimuli.]

Results
• HPM with overlapping processes improves accuracy by 15% on average.
[Figure: the same trial timeline (t=0, 4 sec., 8 sec., 16 sec.), with both GNB and HPM asked "picture or sentence?" for each of the two stimuli.]

HPMs allow detecting and examining hidden processes with unknown timing

Two cognitive processes, or three?
[Figure: cognitive processes (read sentence, view picture, decide whether consistent?) shown against the observed fMRI signal in cortical regions 1 and 2 and the observed button press.]

Choosing Between Alternative HPM Models
• Train 2-process HPM2 on training data
• Train 3-process HPM3 on training data
• Test HPM2 and HPM3 on separate test data
  • Which predicts process identities better?
  • Which has higher probability given the test data?
  • (Use n-fold cross-validation for the test)
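The comparison procedure above is ordinary n-fold cross-validation, which can be sketched generically. In the sketch below, `fit` and `score` are hypothetical callables standing in for "train an HPM" and "log-probability of a held-out trial under the trained model"; the fold assignment by striding is also just one simple choice.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k folds by striding."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validated_score(data, fit, score, k=5):
    """Sum of held-out scores over k folds.

    fit(train_items) -> model           (hypothetical training routine)
    score(model, item) -> float         (e.g. held-out log-probability)
    """
    total = 0.0
    for held_out in k_fold_indices(len(data), k):
        held = set(held_out)
        train = [x for i, x in enumerate(data) if i not in held]
        model = fit(train)
        total += sum(score(model, data[i]) for i in held_out)
    return total
```

To compare two model families, call `cross_validated_score` once per family with the same data and `k`, and prefer the family with the higher held-out score.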

[Figure: comparison of the 2-process HPM, the 3-process HPM, and GNB.]

Summary
• Hidden Process Model formalism
• Superiority over earlier classification methods
• Basis for studying hidden cognitive processes

Future Directions
• Add temporal and/or spatial smoothness constraints to process fMRI signatures
• Allow variable-duration processes
• Give processes input arguments, output results
• Feature selection for HPMs
• Process libraries, hierarchies
