Hidden Process Models

Hidden Process Models Rebecca Hutchinson Joint work with Tom Mitchell and Indra Rustandi

Talk Outline • fMRI (functional Magnetic Resonance Imaging) data • Prior work on analyzing fMRI data • HPMs (Hidden Process Models) • Preliminary results • HPMs and BodyMedia

functional MRI

fMRI Basics • Safe and non-invasive • Temporal resolution ~ 1 3D image every second • Spatial resolution ~ 1 mm • Voxels: 3mm x 3mm x 3-5mm • Measures the BOLD response: Blood Oxygen Level Dependent • Indirect indicator of neural activity

The BOLD response • Ratio of deoxy-hemoglobin to oxy-hemoglobin (different magnetic properties). • Also called hemodynamic response function (HRF). • Common working assumption: responses sum linearly.

More on BOLD response • At left is a typical BOLD response to a brief stimulation. • (Here, subject reads a word, decides whether it is a noun or verb, and pushes a button in less than 1 second.) Signal Amplitude Time (seconds)

Lots of features! … • 10,000-15,000 voxels per image

Study: Pictures and Sentences Press Button View Picture Read Sentence • 13 normal subjects. • 40 trials per subject. • Sentences and pictures describe 3 symbols: *, +, and $, using ‘above’, ‘below’, ‘not above’, ‘not below’. • Images are acquired every 0.5 seconds. Read Sentence Fixation View Picture Rest t=0 4 sec. 8 sec.

The star is not below the plus.

+ --- *

fMRI Summary • High-dimensional time series data. • Considerable noise on the data. • Typically small number of examples (trials) compared with features (voxels). • BOLD responses sum linearly.

It’s not hopeless! • Learning setting is tough, but we can do it! • Feature selection is key. • Learn fMRI(t,t+8)->{Picture,Sentence} Press Button View Picture Read Sentence Read Sentence Fixation View Picture Rest t=0 4 sec. 8 sec.

Results Subject: Accuracy: Subject: Accuracy: • Gaussian Naïve Bayes Classifier. • 95% confidence intervals per subject are +/- 10%-15%. • Accuracy of default classifier is 50%. • Feature selection: Top 240 most active voxels in brain.

Why is this interesting? • Cognitive architectures like ACT-R and 4CAPS predict cognitive processes involved in tasks, along with cortical regions associated with the processes. • Machine learning can contribute to these architectures by linking their predictions to empirical fMRI data.

Other Successes • We can distinguish between 12 semantic categories of words (e.g. tools vs. buildings). • We can train classifiers across multiple subjects.

What can’t we do? Press Button View Picture Read Sentence • Take into account that the responses for Picture and Sentence overlap. • What does the response for Decide look like and when does it start? Read Sentence Fixation View Picture Rest t=0 4 sec. 8 sec.

Motivation • Overlapping processes • The responses to Picture and Sentence could overlap in space and/or time. • Hidden processes • Decide does not directly correspond to the known stimuli. • Move to a temporal model.

Hidden Markov Models? t-1 t t+1 t+2 • Can’t do overlapping processes – states are mutually exclusive. • Markov assumption: given statet-1, statet is independent of everything before t-1. • BOLD response: Not Markov! CogProc {Picture, Sentence, Decide} fMRI

factorial HMMs? t-1 t t+1 t+2 • Have more flexibility than we need. • Picture state sequence should not be {0 1 0 1 0 1 0 1…} • Still have Markov assumption problem. Picture = {0,1} Sentence = {0,1} Decide = {0,1} fMRI

Hidden Process Models Name: Read sentence Process ID: 1 Response: Name: View Picture Process ID: 2 Response: Name: Decide whether consistent Process ID: 3 Response: Processes: Process ID = 1 Process ID = 1 Process Instances: Process ID = 2 View picture Process ID = 3 Decide whether consistent Observed fMRI: cortical region 1: cortical region 2:

HPM Parameters • Set of processes, each of which has: • a process ID • a maximum response duration R • emission weights for each voxel v [W(v,1),…,W(v,t),…,W(v,R)] • a multinomial distribution over possible start times within a trial [q1,…,qt,…,qT] • Set of standard deviations – one for each voxel [s1,…,sv,...,sV].

Interpreting data with HPMs • Data Interpretation (int) • Set of process instances, each of which has: • a process ID • a start time S • To predict fMRI data using an HPM and int: • For each active process, add the response associated with its processID to the prediction.

Synthetic Data Example Process 1: Process 2: Process 3: Process responses: ProcessID=1, S=1 Process instances: ProcessID=2, S=17 ProcessID=3, S=21 Predicted data

Our Assumptions • Processes, not states. • One hidden variable – process start time. • Known number of processes in the model. • e.g. Picture, Sentence, Decide – 3 processes • Known number of instantiations of those processes. • e.g. numTrials*3 processes • Each process has a unique signature. • Contributions of overlapping processes to the same output variable sum linearly.

The generative model • Together HPM and interpretation (int) define a probability distribution over sequences of fMRI images: P(yv,t|hpm,int) = N(mv,t,sv) where mv,t = S Wi.procID(v,t – start(i)) i Î active process instances

Inference • Given: • An HPM • A set of data interpretations (int) of processIDs and start times • Priors over the interpretations • P(int=i|Y) a P(Y|int=i)P(int=i) Choose the interpretation i with the highest probability.

Synthetic Data Example ProcessID=1, S=1 Interpretation 1: ProcessID=2, S=17 ProcessID=3, S=21 ProcessID=2, S=1 Interpretation 2: ProcessID=1, S=17 ProcessID=3, S=23 Observed data Prediction 1 Prediction 2

Learning the Model • EM (Expectation-Maximization) algorithm • E-step • Estimate a conditional distribution over the start times of the process instances given the observed data, P(S|fMRI). • M-step • Use the distribution from the E step to get maximum-likelihood estimates of the HPM parameters {q, W, s}.

More on the E-step • The start times of the process instances are not necessarily conditionally independent given the data. • Must consider joint configurations. • With no constraints, TnInstances configurations. • 2000120 configurations for typical experiment. • Can we consider a smaller set of start time configurations?

Reducing complexity • Prior knowledge • Landmarks • Events with known timing that “trigger” processes. • One per process instance. • Offsets • The interval of possible delays from a landmark to a process instance onset. • One vector of n offsets per process. • Conditional independencies • Introduced when no process instance could be active.

Before Prior Knowledge Read sentence Cognitive processes: View picture Decide whether consistent Observed fMRI: cortical region 1: cortical region 2:

Prior Knowledge Sentence Presentation Picture Presentation Landmarks: (Stimuli) Landmarks go to process instances. Offset values are determined by process IDs. Sentence offsets = {0,1} Picture offsets = {0,1} Read sentence Decide offsets = {0,1,2,3} Cognitive processes: View picture Decide whether consistent Observed fMRI: cortical region 1: cortical region 2:

Conditional Independencies Sentence Presentation Picture Presentation Sentence Presentation Picture Presentation Landmarks: (Stimuli) Sentence offsets = {0,1} Sentence offsets = {0,1} Picture offsets = {0,1} Picture offsets = {0,1} Read sentence Read sentence Decide offsets = {0,1,2,3} Decide offsets = {0,1,2,3} View picture View picture Decide whether consistent HERE Decide whether consistent Observed fMRI: cortical region 1: cortical region 2:

More on the M-step • Weighted least squares procedure • exact, but may become intractable for large problems • weights are the probabilities computed in the E-step • Gradient ascent procedure • approximate, but may be necessary when exact method is intractable • derivatives of the expected log likelihood of the data with respect to the parameters

HPM: picture or sentence? picture or sentence? Preliminary Results Press Button View Picture Or Read Sentence Read Sentence Or View Picture Fixation Rest t=0 4 sec. 8 sec. 16 sec. GNB: picture or sentence? picture or sentence?

GNB vs. HPM Classification • GNB: non-overlapping processes • HPM: simultaneous classification of multiple overlapping processes • Average improvement of 15% in classification error using HPM vs GNB • E.g., for one subject • GNB classification error: 0.14 • HPM classification error: 0.09

Comprehend sentence Learned models Comprehend picture trial 25

Model selection experiments • Model with 2 or 3 cognitive processes? • How would we know ground truth? • Cross validated data likelihood P(testData | HPM) • Better with 3 processes than 2 • Cross validated classification accuracy • Better with 3 processes than 2

Current work and challenges • Add temporal and/or spatial smoothness constraints. • Feature selection for HPMs. • Process libraries, hierarchies. • Process parameters (e.g. sentence negated or not). • Model process interactions. • Scaling parameters for response amplitudes to model habituation effects.

One idea… Name: Riding bus Process ID: 1 Response: Name: Eating Process ID: 2 Response: Name: Walking consistent Process ID: 3 Response: Processes: ProcessID=3 Process instances: ProcessID=2 ProcessID=1 Observed data: Sensor 1: Sensor 2:

Some questions • What processes are interesting? • What granularity/duration would processes have? • What would landmarks be? • Variable process durations needed? • Better way to parameterize process signatures?

Hidden Process Models