Medical Imaging Informatics Lecture #10: Clinical Perspective: Single Subject Classification

Susanne Mueller, M.D. Center for Imaging of Neurodegenerative Diseases, Dept. of Radiology and Biomedical Imaging. susanne.mueller@ucsf.edu

Overview • Single Subject Classification/Characterization: Motivation and Problems • Bayesian Networks for Single Subject Classification/Characterization

1. Single Subject Classification/Characterization

Quantitative Neuroimaging: Group Comparisons [Figure panels: Posttraumatic Stress Disorder, Temporal Lobe Epilepsy, Major Depression] • Implicit assumptions of group comparisons: - Abnormal regions are relevant/characteristic for the disease process. - Abnormalities are present in all patients, i.e., a subject showing abnormalities with a disease-specific distribution is likely to have the disease.

Quantitative Neuroimaging: Do the Assumptions hold up? [Figure panels: Posttraumatic Stress Disorder, Temporal Lobe Epilepsy, Major Depression]

Motivation • Identification of different variants and/or degrees of the disease process. • Translation into clinical application.

Requirements 1. Identification and extraction of discriminating feature: - Single region. - Combination of regions. 2. Definition of a threshold for “abnormality”. Goal: High sensitivity and specificity.

Sensitivity and Specificity: Definitions I Sensitivity: Probability that test is positive if the patient indeed has the disease. P (Test positive|Patient has disease) Test ideally always detects disease.

Sensitivity and Specificity: Definitions II Specificity: Probability that test is negative if the patient does not have the disease. P (Test negative|Patient does not have disease) Test ideally detects only this disease and not some other non-disease related state/other disease

Sensitivity and Specificity Sensitivity and specificity provide information about a test result given that the patient’s disease state is known. In the clinic, however, the patient’s disease state is unknown, which is why the test was done in the first place. => positive and negative predictive value of the test

Positive and Negative Predictive Value: Definition Positive predictive value (PPV): P (Patient has disease|Test positive) Negative predictive value (NPV): P (Patient does not have disease|Test negative)

Example Sensitivity: 0.80 Specificity: 0.94 PPV: 0.90 NPV: 0.86
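The relation between the four measures can be made concrete with a 2x2 table. The following is a minimal sketch: the counts are hypothetical and are not the data behind the example values above, and the helper `predictive_values` is an illustrative name. It also shows that PPV and NPV follow from sensitivity, specificity, and prevalence via Bayes' rule.

```python
# Hypothetical 2x2 table (illustrative counts, not taken from the lecture).
tp, fn = 40, 10   # diseased patients: test positive / test negative
fp, tn = 5, 45    # non-diseased subjects: test positive / test negative

sensitivity = tp / (tp + fn)   # P(test positive | disease)
specificity = tn / (tn + fp)   # P(test negative | no disease)
ppv = tp / (tp + fp)           # P(disease | test positive)
npv = tn / (tn + fn)           # P(no disease | test negative)


def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' rule)."""
    p_pos = sens * prevalence + (1 - spec) * (1 - prevalence)  # P(test positive)
    ppv_value = sens * prevalence / p_pos
    npv_value = spec * (1 - prevalence) / (1 - p_pos)
    return ppv_value, npv_value


# Prevalence in the hypothetical sample reproduces the table-based values.
prevalence = (tp + fn) / (tp + fn + fp + tn)
ppv_bayes, npv_bayes = predictive_values(sensitivity, specificity, prevalence)

# Same test applied where the disease is rare (assumed prevalence of 1%).
ppv_rare, npv_rare = predictive_values(sensitivity, specificity, 0.01)
print(sensitivity, specificity, ppv, npv, ppv_rare)
```

Unlike sensitivity and specificity, the predictive values change with prevalence: in the rare-disease case the same test yields a much lower PPV.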

Receiver Operating Characteristic (ROC) Curve Sensitivity and specificity are good candidates to assess test accuracy. However, they vary with the threshold (test pos/test neg) used. The ROC curve is a means to compare the accuracy of diagnostic tests over a range of thresholds. It plots the sensitivity against 1 - specificity of the test.

EXAMPLE: ROC [Figure: ROC plot, y-axis: sensitivity, x-axis: 1 - specificity] High threshold: good specificity 0.92, medium sensitivity 0.52. Medium threshold: medium specificity 0.7, medium sensitivity 0.83. Low threshold: low specificity 0.45, good sensitivity 0.95. Extremely low threshold: no specificity 0, perfect sensitivity 1.
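The four operating points of this example can be turned into ROC coordinates directly. The sketch below is only an illustration: the area under the curve is a coarse trapezoidal estimate from five points, and Youden's index (sensitivity + specificity - 1) is one common criterion for picking a threshold that the lecture itself does not name.

```python
# Operating points from the example, stored as (specificity, sensitivity).
points = {
    "high": (0.92, 0.52),
    "medium": (0.70, 0.83),
    "low": (0.45, 0.95),
    "extremely low": (0.00, 1.00),
}

# ROC coordinates: x = 1 - specificity, y = sensitivity; the origin is added
# for a threshold so high that no subject tests positive.
roc = sorted([(0.0, 0.0)] + [(1 - sp, se) for sp, se in points.values()])

# Trapezoidal estimate of the area under the ROC curve (0.5 = chance line).
auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(roc, roc[1:]))

# Youden's index per threshold (an assumption of this sketch, see lead-in).
youden = {name: se + sp - 1 for name, (sp, se) in points.items()}
best = max(youden, key=youden.get)

print(round(auc, 3), best)
```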

Example: ROC II [Figure: example ROC curve with chance line; optimal threshold indicated by arrow] The ROC curve of a good test approaches the upper left corner of the ROC plot.

Feature Definition: Information extracted from image. Usefulness of a feature to detect the disease is determined by: 1. Convenience of measurement. 2. Accuracy of measurement. 3. Specificity for the disease (e.g., CK-MB). 4. Number of features (single < several/feature map).

Features and Thresholds used in Imaging for Single-Subject Analyses I A. Single feature = Region of interest (ROI) analysis. Previous knowledge that the ROI is affected by the disease comes either from previous imaging studies or from other sources, e.g., histopathology. Approaches used to detect abnormality for ROI analyses ($x_s$: value of the single subject; $\bar{x}_c$, $SD_c$, $n$: mean, standard deviation, and size of the control group): z-scores: $z = (x_s - \bar{x}_c) / SD_c$. t-scores*: $t = (x_s - \bar{x}_c) / (SD_c \sqrt{(n+1)/n})$. Bayesian estimate**: $z^* = (x_s - \bar{x}_c) / \sqrt{q}$. *Crawford and Howell 1998; **Crawford and Garthwaite 2006

Example: ROI Analyses and Thresholds Hippocampal volumes corrected for intracranial volume obtained from T1 images of 49 age-matched healthy controls (mean: 3.92±0.60) and hippocampal volume of a patient with medial temporal lobe epilepsy: 3.29. z-score: -1.05, corresponds to one-tailed p = 0.147. t-score: -1.04, corresponds to one-tailed p = 0.152. Bayesian one-tailed probability: 0.152, i.e., about 15% of control hippocampal volumes are expected to fall below the patient’s volume.
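The z-score and t-score of this example can be recomputed from the quoted summary statistics. The sketch below uses only the Python standard library; because the standard library has no Student t distribution, the helper `t_cdf` (an illustrative name) integrates the t density numerically with Simpson's rule. The Bayesian estimate is not reproduced here.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), via Simpson integration of the density from 0 to |t|."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


# Summary statistics quoted in the example.
n, mean_c, sd_c = 49, 3.92, 0.60   # control group
x_s = 3.29                         # single patient

# z-score: treats the control mean and SD as population parameters.
z = (x_s - mean_c) / sd_c
p_z = NormalDist().cdf(z)          # one-tailed probability

# t-score with n - 1 degrees of freedom: accounts for the finite control sample.
t = (x_s - mean_c) / (sd_c * math.sqrt((n + 1) / n))
p_t = t_cdf(t, n - 1)              # one-tailed probability

print(round(z, 2), round(p_z, 3), round(t, 2), round(p_t, 3))
```

With 49 controls the two approaches nearly agree; the gap between them widens as the control sample gets smaller.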

Features and Thresholds used in Imaging for Single-Subject Analyses II B. Multiple features from the same source = map that encodes severity and distribution of the disease-associated abnormality. Previous knowledge about the distribution/severity of the abnormalities is not mandatory to generate an “abnormality” map, i.e., typically a whole-brain search strategy is employed. However, previous knowledge can be helpful for the correct interpretation. Approaches used to generate abnormality maps: 1. z-score maps (continuous or thresholded). 2. Single-case modification of the General Linear Model used for group analyses.
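A z-score map applies the single-ROI z-score at every voxel. The following is a minimal sketch on toy data: the four-voxel "images", the function name `z_map`, and the threshold of -2 are all hypothetical choices for illustration.

```python
from statistics import mean, stdev

# Hypothetical data: each image is a flat list of voxel values (4 voxels).
controls = [
    [1.00, 0.90, 1.10, 1.00],
    [1.10, 1.00, 0.90, 1.05],
    [0.90, 1.10, 1.00, 0.95],
    [1.00, 1.00, 1.00, 1.00],
]
patient = [1.00, 0.60, 1.05, 0.70]


def z_map(subject, control_images):
    """Voxel-wise z-scores of one subject relative to the control images."""
    scores = []
    for i, x in enumerate(subject):
        values = [img[i] for img in control_images]
        scores.append((x - mean(values)) / stdev(values))
    return scores


z = z_map(patient, controls)             # continuous map
threshold = -2.0                          # assumed cut-off for "abnormal"
abnormal = [1 if v < threshold else 0 for v in z]   # thresholded map
print([round(v, 2) for v in z], abnormal)
```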

Features and Thresholds used in Imaging for Single-Subject Analyses III • Problems: 1. Difference reflects normal individual variability rather than disease effects. 2. Assumption that the single subject represents the mean of a hypothetical population with the same variance as observed in the control group. 3. Higher number of comparisons (multiple ROI/voxel-wise) requires: a. Correction for multiple comparisons. b. Adjustment of result at ROI/voxel level for results in immediate neighborhood, e.g., correction at cluster level. 4. Interpretation of resulting maps.

Influence of Correction for Multiple Comparisons [Figure: maps of increases and decreases at FWE p < 0.05, FWE p < 0.01, and FWE p < 0.001] Scarpazza et al. Neuroimage 2013; 70: 175-188
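To give a feeling for how strongly a family-wise error (FWE) correction tightens the voxel-level threshold, the sketch below uses the Bonferroni correction. This is an assumption of the example: Bonferroni is only one simple way to control the FWE and is not necessarily the method used in the cited study. The voxel count is hypothetical as well.

```python
from statistics import NormalDist


def bonferroni_z_threshold(alpha, n_tests):
    """One-tailed z threshold keeping the family-wise error rate at or below alpha."""
    per_test_alpha = alpha / n_tests   # Bonferroni: split alpha across all tests
    return NormalDist().inv_cdf(per_test_alpha)


n_voxels = 100_000                                     # hypothetical number of voxels
uncorrected = NormalDist().inv_cdf(0.05)               # single test, p < 0.05
corrected = bonferroni_z_threshold(0.05, n_voxels)     # FWE p < 0.05
print(round(uncorrected, 2), round(corrected, 2))
```

A voxel must deviate by almost five control standard deviations to survive this correction, which illustrates why single-subject maps become sparse at stricter FWE levels.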

Interpretation of Single Subject Maps • Potential strategies for map interpretation: - Visual inspection using knowledge about the typical distribution of abnormalities in group comparisons. - Quantitative comparison with known abnormalities in group comparisons, e.g., calculation of Dice’s coefficient for the whole map. • Problems: - Requires existence of a “disease-typical pattern”. - Requires selection of a “threshold” indicating match with the typical pattern or not. - Difficult to interpret severe abnormalities that do not match the typical pattern. Atypical representation? Different disease?
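Dice's coefficient for two binary maps is twice the number of voxels abnormal in both maps, divided by the total number of abnormal voxels in the two maps. A minimal sketch with hypothetical eight-voxel maps follows; the value returned for two empty maps is a convention chosen for this example.

```python
def dice(map_a, map_b):
    """Dice's coefficient of two binary maps given as equally long 0/1 lists."""
    size_a = sum(map_a)
    size_b = sum(map_b)
    overlap = sum(1 for a, b in zip(map_a, map_b) if a and b)
    if size_a + size_b == 0:
        return 1.0   # convention of this sketch: two empty maps match perfectly
    return 2 * overlap / (size_a + size_b)


# Hypothetical thresholded maps (1 = abnormal voxel).
subject_map = [0, 1, 1, 0, 1, 0, 0, 1]   # single-subject abnormality map
group_map = [0, 1, 1, 1, 0, 0, 0, 1]     # pattern from a group comparison

print(dice(subject_map, group_map))
```

The result ranges from 0 (no overlap) to 1 (identical maps); deciding which value counts as a match is the threshold problem listed above.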

Examples Gray matter loss in TLE compared to controls

2. Bayesian Networks for Single Subject Classification/Characterization

Characteristics of an Ideal Classification System • Uses non-parametric, non-linear statistics. • Identifies characteristic severe and mild brain abnormalities distinguishing between two groups based on their spatial proximity and strength of association with a clinical variable (e.g., group membership). • Weights abnormal regions according to their ability to discriminate between two groups. • Provides probability of group membership and an objective threshold based on congruence of individual abnormalities with group-specific abnormalities. • Uses expert a priori knowledge to combine information from different sources (other imaging modalities, clinical information) for the determination of the final group membership.

Bayesian Networks: Basics Definition: Probabilistic graphical model defined as B = (G, Θ). G is a directed acyclic graph (DAG) defined as G = (n, e), where n represents the set of nodes C in the network and e the set of directed edges that describe the probabilistic association among the nodes. Θ is the set of all conditional probability states θ that the nodes in the network can assume.

Bayesian Networks: Basics: Simple Network [Figure: DAG G with the nodes Event A and Event B, shown together with its joint probability distribution]

Bayesian Networks: Basics: Slightly more complex Network [Figure: DAG with the nodes Event A, Event B, and Event C]

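Bayesian Networks: Basics: It is getting more complicated [Figure: DAG with the nodes A, B, C, D, E, F] Markovian assumptions of the DAG: I(V, Parents(V), Non-Descendants(V)), where V is any variable in the DAG, i.e., every variable is independent of its non-descendants given its parents.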

Bayesian Networks: Basics: It is getting more complicated [Figure: DAG with the nodes A, B, C, D, E]

Bayesian Networks: Inference I: Probability of Evidence Query [Figure: network with the nodes A, B, C, D, E; two nodes are set to True; probability of this evidence: 0.30]

Bayesian Networks: Inference II: Prior and Posterior Marginal Query Definition: Marginal: projection of the joint distribution onto a smaller set of variables. If the joint probability distribution is $\Pr(x_1, \ldots, x_n)$, then the marginal distribution $\Pr(x_1, \ldots, x_m)$, $m \le n$, is defined as: $\Pr(x_1, \ldots, x_m) = \sum_{x_{m+1}, \ldots, x_n} \Pr(x_1, \ldots, x_n)$ [Figure: network with the nodes A, B, C, D, E annotated with prior and posterior marginals after setting Evidence = True. Values in the order shown on the slide: True = 0.60, False = 0.40; True = 0.92, False = 0.08; True = 0.52, False = 0.48; True = 0.42, False = 0.58; True = 1.0, False = 0.00; True = 0.24, False = 0.76; True = 0.70, False = 0.30; True = 0.36, False = 0.64; True = 0.84, False = 0.16]
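The queries introduced so far (probability of evidence, prior marginal, posterior marginal) can be answered on a small network by brute-force summation over the joint distribution. The sketch below assumes a hypothetical three-node chain A -> B -> C with made-up conditional probabilities; it is not the five-node network of the slides, and enumeration is only practical for very small networks.

```python
from itertools import product

# Hypothetical network A -> B -> C with binary states (all numbers are made up).
p_a = {True: 0.6, False: 0.4}
p_b_given_a = {True: {True: 0.7, False: 0.3},      # p_b_given_a[a][b]
               False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.9, False: 0.1},      # p_c_given_b[b][c]
               False: {True: 0.25, False: 0.75}}


def joint(a, b, c):
    """Joint probability as the product of the conditional probabilities of the DAG."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def marginal(**fixed):
    """Sum the joint over all variables that are not fixed by the keyword arguments."""
    total = 0.0
    for a, b, c in product([True, False], repeat=3):
        state = {"a": a, "b": b, "c": c}
        if all(state[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


prior_a = marginal(a=True)                           # prior marginal Pr(A = True)
p_evidence = marginal(c=True)                        # probability of evidence Pr(C = True)
posterior_a = marginal(a=True, c=True) / p_evidence  # posterior marginal Pr(A = True | C = True)
print(round(prior_a, 3), round(p_evidence, 3), round(posterior_a, 3))
```

Observing C = True raises the probability of A = True from its prior value, which is the kind of update shown in the prior/posterior figure.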

Bayesian Networks: Inference III: Most Probable Explanation (MPE) and Maximum a Posteriori Hypothesis (MAP) Definition: MPE = given evidence for one network variable, the instantiation of all other network variables that has maximal probability given that evidence. MAP = given evidence for one network variable, the instantiation of a subset of the other network variables that has maximal probability given that evidence. [Figure: network with the nodes A, B, C, D, E, shown twice, each time with evidence D = true]
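The difference between the two queries is easiest to see on a toy network where they disagree. The sketch below again assumes a hypothetical chain A -> B -> C with made-up probabilities and evidence C = True: the MPE maximizes over A and B jointly, whereas the MAP for the subset {A} first sums out B.

```python
from itertools import product

# Hypothetical network A -> B -> C (numbers chosen so that MPE and MAP differ).
p_a = {True: 0.55, False: 0.45}
p_b_given_a = {True: {True: 0.5, False: 0.5},      # p_b_given_a[a][b]
               False: {True: 0.95, False: 0.05}}
p_c_given_b = {True: {True: 0.8, False: 0.2},      # p_c_given_b[b][c]
               False: {True: 0.7, False: 0.3}}


def joint(a, b, c):
    """Joint probability as the product of the conditional probabilities of the DAG."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


states = [True, False]
evidence_c = True

# MPE: most probable instantiation of ALL remaining variables (A and B).
# Maximizing the joint with the evidence is equivalent to maximizing the posterior.
mpe = max(product(states, repeat=2),
          key=lambda ab: joint(ab[0], ab[1], evidence_c))

# MAP for the subset {A}: B is summed out before maximizing over A.
map_a = max(states,
            key=lambda a: sum(joint(a, b, evidence_c) for b in states))

print("MPE (A, B):", mpe, "MAP A:", map_a)
```

Here the MPE assigns A = False while the MAP for A alone is True, so the value of a variable in the MPE cannot simply be read off as its MAP value.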

Bayesian Networks: Inference IV: Different algorithms have been developed to update the remaining network after observation of other network variables. Examples of exact inference algorithms: variable or factor elimination; recursive conditioning; clique tree propagation; belief propagation. Examples of approximate inference algorithms: generalized belief propagation; loopy belief propagation; importance sampling; mini-bucket elimination.

Bayesian Networks: Learning I: Parameter/Structure [Figure: network with the nodes A, B, C, D, E]

Bayesian Networks: Learning II: Parameter Learning: 1. Expert knowledge. 2. Data driven: a. Maximum likelihood (complete data). b. Expectation maximization (incomplete data). c. Bayesian approach. Structure Learning: 1. Expert knowledge. 2. Data driven: a. Local search approach. b. Constraint-based approach: greedy search (K2, K3), optimal search. c. Bayesian approach.
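For complete data, maximum likelihood parameter learning reduces to counting: each conditional probability is estimated by a relative frequency. The following is a minimal sketch for a hypothetical two-node network F -> V (group membership F, binary feature V) with made-up observations.

```python
from collections import Counter

# Hypothetical complete data set: one (F, V) pair per subject.
data = [
    ("patient", 1), ("patient", 1), ("patient", 0), ("patient", 1),
    ("control", 0), ("control", 0), ("control", 1), ("control", 0),
]

pair_counts = Counter(data)                    # N(F = f, V = v)
group_counts = Counter(f for f, _ in data)     # N(F = f)
n_subjects = len(data)

# Maximum likelihood estimates: relative frequencies.
p_f = {f: count / n_subjects for f, count in group_counts.items()}
p_v_given_f = {
    f: {v: pair_counts[(f, v)] / group_counts[f] for v in (0, 1)}
    for f in group_counts
}
print(p_f, p_v_given_f)
```

With few subjects such estimates can be exactly 0 or 1, which is one motivation for the Bayesian approach listed above.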

Bayesian Networks: Application to Image Analysis? YES 1. Identification of features distinguishing between groups. 2. Combination of different distinguishing imaging features, e.g., volumetric and functional imaging.

Bayesian Network: Basics: Feature Identification I Characterization of the problem: 1. Parameter and structure learning. a. Representative training data set. b. Information reduction. c. Definition of network nodes. d. Definition of possible node states. e. Calculation of the strength of association between image feature and variable of interest. 2. Network query. a. Calculation of group affiliation based on concordance with the feature set that had been identified during the learning process. [Slide annotations group the steps under 1 into preparatory steps, structure learning, and parameter learning.]

Bayesian Network: Basics: Feature Identification II GAMMA: Graphical Model-Based Morphometric Analysis* *Chen R, Herskovits E. IEEE Transactions on Medical Imaging 2005; 24: 1237-1248

GAMMA: Preparatory Steps I 1. Identification of the training set: images of patients and controls, or of subjects with and without the function variable of interest for the Bayesian network. The set must be representative of the population, i.e., encompass the variability typically found in each of the populations.

GAMMA: Preparatory Steps II 2. Data Reduction Use of prior knowledge regarding the nature of the feature, e.g., reduction of the information in the image to regions with relative volume loss if the disease is associated with atrophy. Creation of binary images: each individual image is compared to a mean image, and voxels with intensities below a predefined threshold, e.g., 1 SD below the control mean, are set to 1, all other voxels to zero.

GAMMA: Preparatory Steps II Data Reduction [Figure: mean image, SD image, original control, binarized control (1 SD below mean), original patient, binarized patient (1 SD below mean)] Each binary map can be represented as follows: {F, V1, V2, V3, …, Vm}, where F represents the state, i.e., patient or control, and Vi represents the voxel at location i. Given the above definition, a voxel Vi with the value 1 means that there is a volume loss. The choice of images used to generate the mean/SD image and the threshold for binarization are crucial for performance.
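The binarization step can be sketched as follows. This is a toy illustration under stated assumptions, not the GAMMA implementation: images are flat lists of three voxel values, the reference images used for the mean/SD image are the controls themselves, and the function name `binarize` and the record layout are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical reference (control) images, each a flat list of voxel values.
controls = [
    [1.0, 0.9, 1.1],
    [1.1, 1.0, 0.9],
    [0.9, 1.1, 1.0],
    [1.0, 1.0, 1.0],
]
patient = [0.8, 1.0, 0.85]


def binarize(image, reference_images, n_sd=1.0):
    """Set a voxel to 1 if it lies more than n_sd SDs below the reference mean."""
    binary = []
    for i, x in enumerate(image):
        reference = [img[i] for img in reference_images]
        cutoff = mean(reference) - n_sd * stdev(reference)
        binary.append(1 if x < cutoff else 0)
    return binary


# Representation {F, V1, ..., Vm}: state F plus one binary value per voxel.
patient_record = {"F": "patient", "V": binarize(patient, controls)}
control_record = {"F": "control", "V": binarize(controls[2], controls)}
print(patient_record, control_record)
```

With a threshold of 1 SD, some control voxels are also set to 1, which illustrates why the choice of threshold matters for the later learning steps.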

GAMMA: Structure Learning Theoretical Steps. 1. Generate Bayesian Network that identifies the probabilistic relationship among {Vi} and F. 2. Generate cluster(s) of representative voxels (R, output: label map) such that all voxels in a cluster have similar probabilistic associations with F (output: belief map). All clusters are independent from each other and each cluster corresponds to a node.

GAMMA: Structure Learning Practical I Step 1 a. Definition of search space V, e.g., all voxels where at least one subject has a value that differs from every other subject’s value for that voxel. b. Identification of the first search space voxel(s) that provide optimal distinction between states F, e.g. all controls 0, all patients 1. Assign voxel to putative group of representative voxels A.
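Steps a and b can be illustrated on toy binary maps. The sketch below makes two assumptions that are not taken from the published method: the search space is interpreted as all voxels that are not constant across subjects, and the ability of a voxel to distinguish the states F is scored by a simple agreement fraction, which stands in for the scoring metric of GAMMA.

```python
# Hypothetical binary maps: one row per subject, one column per voxel.
controls = [[0, 0, 0, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
patients = [[1, 0, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1]]

# Each subject as (F, voxels) with F = 0 for controls and F = 1 for patients.
subjects = [(0, row) for row in controls] + [(1, row) for row in patients]
n_voxels = len(controls[0])

# Step a (assumed interpretation): keep voxels that vary across subjects.
search_space = [
    i for i in range(n_voxels)
    if len({row[i] for _, row in subjects}) > 1
]


def separation(i):
    """Stand-in score: fraction of subjects whose voxel value equals their state F."""
    return sum(1 for f, row in subjects if row[i] == f) / len(subjects)


# Step b: voxel(s) with the best distinction between states start the group A.
best_score = max(separation(i) for i in search_space)
A = [i for i in search_space if separation(i) == best_score]
print(search_space, A, best_score)
```

Voxel 0 separates the groups perfectly (all controls 0, all patients 1) and therefore becomes the first member of A.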

GAMMA: Structure Learning Practical I [Figure: binary maps of Group A (n=10, “Controls”) and Group B (n=10, “Patients”); disease characterized by atrophy, i.e., “1” voxels, compared to controls; search space and representative voxels after the 1st iteration]

GAMMA: Structure Learning Practical II Step 1 cont. c. Identification of voxel(s) whose addition to A increases the ability of A to correctly distinguish between states F. Process is repeated until there is no voxel left that fulfills that condition. d. Identification of all those voxels Rn in A that maximize the distinction between states F. The Rn of the first iteration corresponds to R. (The Rn after the first iteration are added to R). Voxels belonging to Rn are removed from search space V.

GAMMA: Structure Learning Practical II

GAMMA: Structure Learning Practical III Step 2 (iteration 2 and higher) a. Calculation of the similarity s between voxels in A and voxels in $R_{n-1}$. The similarity s for one voxel $V_i$ in A is defined as $s(V_i, R_{n-1}) = P(V_i = 1, R_{n-1} = 1) + P(V_i = 0, R_{n-1} = 0)$. The similarity for all n voxels in A is expressed as a similarity map S: $S = \{s(V_i, R_{n-1}), s(V_j, R_{n-1}), \ldots, s(V_n, R_{n-1})\}$.
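The similarity defined above is the probability that a voxel and the representative voxels of the previous iteration are in the same state. In the sketch below both probabilities are estimated as relative frequencies over subjects, and the representative voxels are summarized as one binary value per subject; both are assumptions of this example, as are all names and data.

```python
def similarity(v, r):
    """s(V, R) = P(V=1, R=1) + P(V=0, R=0), estimated over subjects."""
    n = len(v)
    both_one = sum(1 for a, b in zip(v, r) if a == 1 and b == 1) / n
    both_zero = sum(1 for a, b in zip(v, r) if a == 0 and b == 0) / n
    return both_one + both_zero


# Hypothetical data for 8 subjects: state of the representative voxels from the
# previous iteration, and the states of two candidate voxels in A.
r_prev = [0, 0, 0, 0, 1, 1, 1, 1]
voxels_in_a = {
    "Vi": [0, 0, 0, 1, 1, 1, 1, 1],
    "Vj": [1, 1, 0, 0, 0, 0, 1, 1],
}

# Similarity map S: one similarity value per voxel in A.
S = {name: similarity(v, r_prev) for name, v in voxels_in_a.items()}
print(S)
```

A value near 1 means the voxel carries almost the same information as the representative voxels, while 0.5 corresponds to no systematic agreement in this toy setting.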

GAMMA: Structure Learning Practical III
