Returning Back …
A Big Thanks Again
• Prof. Matt Hibbs, Jackson Labs
• Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University
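Large-scale Correlation
• Quality control → set of 3041 genes
• Combine the gene volumes into one large voxel × gene matrix $M$
• Decompose $M$ using singular value decomposition (SVD), keeping the leading modes: $M \approx U \Sigma V^{T}$, where the columns of $U$ are spatial patterns (one value per voxel), the diagonal of $\Sigma$ holds the singular values ("weights"), and the rows of $V^{T}$ are gene patterns (one amplitude per gene)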
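A minimal NumPy sketch of the decomposition step, assuming a dense voxel × gene matrix `M`. The data here are random toy values, not ABA gene volumes, and whether the original analysis centered or normalized columns before the SVD is not stated, so this sketch decomposes the matrix as given.

```python
import numpy as np

# Hypothetical toy data: rows are voxels, columns are genes
# (a stand-in for the real voxel x gene matrix of 3041 genes).
rng = np.random.default_rng(0)
n_voxels, n_genes = 500, 40
M = rng.random((n_voxels, n_genes))

# Thin SVD: M = U @ diag(s) @ Vt
#   U[:, k]  -> spatial pattern of mode k (one value per voxel)
#   s[k]     -> singular value ("weight") of mode k, sorted descending
#   Vt[k, :] -> gene pattern of mode k (one amplitude per gene)
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Rank-r approximation keeping only the first r modes
r = 5
M_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
```

Keeping only the leading modes is what turns the exact factorization into the approximation $M \approx U_r \Sigma_r V_r^{T}$.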
Principal modes (SVD)
• All LH brain voxels plotted as projections on the first 3 modes, colored by region: cerebral cortex, olfactory areas, hippocampus, retrohippocampal region, striatum, pallidum, thalamus, hypothalamus, midbrain, pons, medulla, cerebellum
• N = 67 modes are needed to reach 80% of the variance
• N = 271 modes are needed to reach 90% of the variance
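A sketch of how the mode counts and the 3-mode projection could be computed from singular values. It assumes that "variance" means the share of squared singular values, which equals explained variance only when the columns of the matrix have been mean-centered; `modes_needed` and `project_on_modes` are hypothetical helper names.

```python
import numpy as np

def modes_needed(singular_values, fraction):
    """Smallest number of leading modes whose squared singular values
    account for at least `fraction` of the total."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # first index where the cumulative share reaches the target fraction
    return int(min(np.searchsorted(cumulative, fraction) + 1, energy.size))

def project_on_modes(M, n_modes=3):
    """Coordinates of every voxel (row of M) on the first n_modes modes."""
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    # M @ V = U @ diag(s), so this equals U[:, :n_modes] * s[:n_modes]
    return M @ Vt[:n_modes].T
```

For example, `modes_needed(s, 0.9)` applied to the full singular-value spectrum would be the quantity reported as N on this slide.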
Interpreting gene modes
• Spatial modes are easily visualized; gene modes are harder to interpret. Attempt to annotate the eigenmodes using Gene Ontology (GO) annotations:
• Each GO term partitions the gene list into two subsets:
  - IN genes: genes annotated by that GO term
  - OUT genes: genes not annotated by that GO term
• Each singular vector assigns an amplitude to every gene, and hence a set of amplitudes to each subset
• Compare these amplitudes, asking whether IN genes have larger magnitudes than OUT genes
• Use a Kolmogorov-Smirnov (K-S) test to assess whether the two amplitude distributions differ
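The IN/OUT comparison for one GO term and one gene mode can be sketched as a two-sample K-S statistic on absolute amplitudes. This is an illustrative NumPy version; `go_term_ks` and its arguments (a gene-mode vector and a boolean IN mask) are hypothetical, and for p-values `scipy.stats.ks_2samp` computes the same statistic together with a significance level.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def go_term_ks(gene_mode, in_mask):
    """Compare |amplitudes| of IN genes vs OUT genes for one gene mode
    (e.g. one row of Vt) and one GO term (boolean mask over genes)."""
    magnitudes = np.abs(np.asarray(gene_mode, dtype=float))
    in_mask = np.asarray(in_mask, dtype=bool)
    return ks_statistic(magnitudes[in_mask], magnitudes[~in_mask])
```

Taking absolute values matches the question of whether IN genes have larger magnitudes, since singular vectors are only defined up to sign.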
In this low-dimensional space:
• Cerebellum and striatum are separated - GABAergic interneurons and glutamatergic projection neurons in adult mouse forebrain
• Other regions cluster in the greatly reduced space, but with considerable overlap
• Anatomical regions do not, in general, correspond directly to individual SVD modes
• Even in a very low-dimensional subspace, clustering of gene expression profiles groups voxels drawn from the same brain regions
Component Annotations
• One component has distinctly high amplitude in the dentate gyrus of the hippocampus
• One shows enhanced specificity for the cerebellum
• One is particularly prominent in the cerebellum and the striatum
• The decomposition extracts correlated structure in expression profiles that corresponds to anatomical subdivisions
Gene clustering?
• Genes are somewhat less separable, and less categorical
• Build a gene-gene similarity graph, partition it, and color-code each point…
K-Means Segmentation
• What does gene expression tell us about regional brain organization?
• Use a simple cluster analysis: K-means clustering
  - Dimensionality reduced (to 271 modes) by truncating the SVD
  - Assign one of K labels to each voxel
  - Voxels assigned the same label should have more similar expression profiles than voxels with different labels
  - Similarity is defined by Euclidean distance
• Result: a data-driven parcellation of mouse brain anatomy, with the level of granularity set by K
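A NumPy sketch of the segmentation pipeline: truncate the SVD, then run Lloyd's K-means with Euclidean distance on the voxel coordinates. The farthest-point seeding is an assumption chosen here for determinism (the slides do not say how clusters were initialized), and `kmeans` / `segment_voxels` are hypothetical names.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means (Euclidean distance) with farthest-point seeding."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next seed: the point farthest from all seeds chosen so far
        dist_to_seeds = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist_to_seeds.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # squared distances via |x|^2 - 2 x.c + |c|^2 (memory: voxels x k)
        d2 = ((X ** 2).sum(axis=1)[:, None] - 2.0 * X @ centers.T
              + (centers ** 2).sum(axis=1)[None, :])
        labels = d2.argmin(axis=1)
        # move each center to the mean of its members (keep it if empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def segment_voxels(M, k, n_modes=271, seed=0):
    """Project voxels onto the leading SVD modes, then assign one of k labels."""
    U, s, Vt = np.linalg.svd(np.asarray(M, dtype=float), full_matrices=False)
    n_modes = min(n_modes, s.size)
    coords = U[:, :n_modes] * s[:n_modes]   # voxel coordinates in mode space
    labels, _ = kmeans(coords, k, seed=seed)
    return labels
```

Because K-means only finds a local optimum, results can depend on seeding; rerunning with several seeds and keeping the lowest within-cluster sum of squares is a common safeguard.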
Spatially Contiguous Clusters
• K = 2: clusters separate cerebral cortex and hippocampus (gray) from other areas (white)
• K = 8: cerebellum and striatum are clearly segmented; cortex is subdivided into distinct layers
• K = 16: thalamus has its own cluster; cortical layers are further differentiated; midbrain is separated from hindbrain
• Large K: more anatomical detail is observed, e.g. separation of the caudoputamen from the nucleus accumbens, and laminar and areal patterns in cortex
Clustering in Cerebral Cortex
• K = 40 (masked)
• Laminar clusters are broken into distinct groups along the anterior–posterior direction (bottom), at the border between auditory and somatosensory areas
• Validation against ARA area masks: the split divides auditory/visual areas from somatosensory areas
Relevant Questions
• For a given structure, at what value of K does it emerge as its own cluster?
• Relative prioritization of anatomical areas based on expression pattern similarity
• Dominant clustering of gene expression along cortical layers, consistent with the findings of Ng et al.
Compare with Reference Atlas
• Reference atlases here are “flat” parcellations with 12 or 94 regions
• Similarity index (S) ranges from 0 to 1
• Overlap saturates at K > 30
• Clusters for large K are subdivisions of those for low K
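The slide does not define the similarity index S. Purely as a stand-in, the sketch below uses the Rand index (the fraction of voxel pairs on which two parcellations agree, also ranging from 0 to 1), plus a hypothetical `is_refinement` check for the claim that large-K clusters are subdivisions of low-K clusters. Neither is claimed to be the measure used in the original work.

```python
import numpy as np

def _contingency(labels_a, labels_b):
    """Counts of voxels for every (label in a, label in b) combination."""
    _, a_idx = np.unique(np.ravel(labels_a), return_inverse=True)
    _, b_idx = np.unique(np.ravel(labels_b), return_inverse=True)
    a_idx, b_idx = np.ravel(a_idx), np.ravel(b_idx)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    return table

def _pairs(x):
    x = np.asarray(x, dtype=float)
    return x * (x - 1) / 2

def rand_index(labels_a, labels_b):
    """Fraction of voxel pairs that both parcellations put together,
    or both put apart (stand-in for S; ranges from 0 to 1)."""
    table = _contingency(labels_a, labels_b)
    total = _pairs(table.sum())
    same_both = _pairs(table).sum()
    same_a = _pairs(table.sum(axis=1)).sum()
    same_b = _pairs(table.sum(axis=0)).sum()
    return float((total + 2 * same_both - same_a - same_b) / total)

def is_refinement(fine, coarse):
    """True if every cluster in `fine` lies inside a single cluster of `coarse`."""
    table = _contingency(fine, coarse)
    return bool(np.all((table > 0).sum(axis=1) == 1))
```

Computing the index from a contingency table avoids enumerating all voxel pairs, which matters for tens of thousands of voxels.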
Compare with Reference Atlas (K = 12)
• Clusters 1, 2, 3, and 4 together correspond to the cerebral cortex
• Cluster 11 largely corresponds to the thalamus
• Cluster 9 is wholly contained in the cerebellum
• Cluster 10 is wholly contained in the striatum
Classification of Region Membership
• Supervised learning using a linear discriminant (25% test set, 10-fold cross-validation)
• 94.5% correct overall
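A sketch of linear discriminant classification of region membership with k-fold cross-validation, written in NumPy. It assumes Gaussian classes with a shared (pooled) covariance, adds a small ridge term so the covariance stays invertible in high dimensions, and only implements the 10-fold part; how the 25% test set was combined with cross-validation is not specified on the slide. All function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y, ridge=1e-3):
    """Class means, inverse pooled covariance, and log-priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / max(len(X) - len(classes), 1)
    cov += ridge * np.eye(X.shape[1])        # keeps the covariance invertible
    log_priors = np.log([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), log_priors

def lda_predict(model, X):
    classes, means, cov_inv, log_priors = model
    # discriminant: x^T S^-1 mu_c - 0.5 mu_c^T S^-1 mu_c + log pi_c
    W = means @ cov_inv
    b = -0.5 * np.sum(W * means, axis=1) + log_priors
    return classes[np.argmax(X @ W.T + b, axis=1)]

def cross_validated_accuracy(X, y, n_folds=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        model = lda_fit(X[train], y[train])
        correct += int(np.sum(lda_predict(model, X[test_idx]) == y[test_idx]))
    return correct / len(X)
```

Here `X` would hold voxel coordinates (for example in the truncated SVD space) and `y` the atlas region label of each voxel.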
What Next?
• Voxels are large relative to individual cell bodies
• Each voxel therefore contains a mixture of several cell types
• Discrete brain locations with different combinations of cell types have unique expression signatures
• Spatial co-expression may indicate functionally related or interacting genes
Localization of expression
• Localization is measured by the Kullback-Leibler (KL) divergence of a gene's normalized expression energy (over voxels) from spatial uniformity
• Figure: normalized expression energy vs. voxels, for a non-localized and a well-localized expression pattern
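The localization score can be written as $D_{KL}(p \,\|\, u) = \sum_v p_v \log(p_v M) = \log M - H(p)$, where $p_v$ is the gene's expression energy at voxel $v$ divided by its total and $M$ is the number of voxels. A minimal sketch follows; the logarithm base used on the slides is not stated, so the default here (natural log) is an assumption, and the KL > ~1.56 cutoff depends on that choice.

```python
import numpy as np

def kl_from_uniform(energy, base=np.e):
    """KL divergence of a gene's normalized expression energy over voxels
    from the uniform distribution over the same voxels."""
    e = np.asarray(energy, dtype=float)
    total = e.sum()
    if total == 0:
        return 0.0                       # no expression: treat as non-localized
    p = e / total                        # normalized expression energy
    nz = p > 0                           # convention: 0 * log 0 = 0
    return float(np.sum(p[nz] * np.log(p[nz] * p.size)) / np.log(base))
```

A perfectly uniform pattern scores 0, while all energy in a single voxel scores the maximum, log M.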
Gene Localization (figure panels: summed, thresholded)
• Select the most localized genes (KL > ~1.56) for further analysis
• Threshold voxels based on the intensity histogram of their summed expression
• The remaining LH mask (6102 voxels) essentially excludes the cerebral cortex
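A sketch of the selection and masking steps. The slide only says the voxel threshold comes from the intensity histogram of the summed expression; Otsu's method is swapped in here as one common histogram-based rule, not as the threshold actually used. The log base for KL is again assumed to be natural, and `localized_mask` is a hypothetical name.

```python
import numpy as np

def kl_from_uniform(energy):
    """KL divergence (nats) of normalized energy over voxels from uniform."""
    e = np.asarray(energy, dtype=float)
    if e.sum() == 0:
        return 0.0
    p = e / e.sum()
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] * p.size)))

def otsu_threshold(values, n_bins=256):
    """Histogram threshold maximizing between-class variance (Otsu)."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                     # voxels in bins 0..i
    w1 = hist.sum() - w0                     # voxels in bins i+1..end
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.maximum(w0, 1)
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[1:][np.argmax(between)]     # upper edge of the best split bin

def localized_mask(E, kl_threshold=1.56):
    """E: voxels x genes. Returns selected gene columns and a voxel mask."""
    E = np.asarray(E, dtype=float)
    kl = np.array([kl_from_uniform(E[:, g]) for g in range(E.shape[1])])
    selected = np.flatnonzero(kl > kl_threshold)
    summed = E[:, selected].sum(axis=1)      # summed expression of localized genes
    mask = summed > otsu_threshold(summed)
    return selected, mask
```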
Voxel Uniformity in Gene Space
• At each voxel, measure the KL divergence of expression across genes from a uniform density
• Brighter color indicates lower KL divergence (more uniform expression across genes)
• Note that the cortex is generally more uniform than subcortical areas, and the middle cortical layers are notably more uniform than the superficial and deepest layers
“Expression diversity”
• Defined as the average KL divergence across all voxels in a particular anatomical region
• Figure panels: expression diversity across gross structures; expression diversity across cortical layers and areas
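A sketch of both quantities: the per-voxel KL divergence of expression across genes from uniform, and its average within each anatomical region. Natural log is assumed, voxels with zero total expression are given a KL of 0 by convention, and the function names are hypothetical.

```python
import numpy as np

def kl_rows_from_uniform(E):
    """Per-voxel KL divergence (nats) of the expression distribution
    across genes from uniform. Rows are voxels, columns are genes."""
    E = np.asarray(E, dtype=float)
    totals = E.sum(axis=1, keepdims=True)
    P = np.divide(E, totals, out=np.zeros_like(E), where=totals > 0)
    n_genes = E.shape[1]
    with np.errstate(divide="ignore", invalid="ignore"):
        # 0 * log 0 is taken as 0; masked entries never reach the sum
        terms = np.where(P > 0, P * np.log(P * n_genes), 0.0)
    return terms.sum(axis=1)

def expression_diversity(E, region_labels):
    """Average per-voxel KL divergence within each anatomical region."""
    kl = kl_rows_from_uniform(E)
    labels = np.asarray(region_labels)
    return {r.item(): float(kl[labels == r].mean()) for r in np.unique(labels)}
```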
Biclustering Genes & Voxels
• Can we group genes that are each highly localized to common brain regions (sets of voxels)?
• Construct a bipartite graph with N (200) genes in vertex set V1 and M (~6000) mask voxels in vertex set V2
• Edge weights are the expression levels of each gene at each voxel
• Apply graph partitioning methods to cut the graph into connected components; each component contains both voxels and genes
• Here we used the isoperimetric algorithm (Grady and Schwartz, 2006)
• Figure: bipartite graph with genes (V1) and voxels (V2)
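A simplified sketch of the graph construction and of a single isoperimetric-style cut: ground one node, solve the grounded Laplacian system $L_0 x_0 = d_0$, then sweep thresholds over the sorted potentials and keep the cut with the smallest ratio of cut weight to the smaller side's volume. Using node degree as the volume, grounding the highest-degree node, and stopping after one cut are assumptions of this sketch; it is not a faithful reimplementation of Grady and Schwartz (2006) or of the code behind these slides.

```python
import numpy as np

def bipartite_adjacency(E):
    """Weighted adjacency of the gene-voxel bipartite graph.
    E: genes x voxels (non-negative). Nodes 0..N-1 are genes (V1),
    nodes N..N+M-1 are voxels (V2); no gene-gene or voxel-voxel edges."""
    E = np.asarray(E, dtype=float)
    n, m = E.shape
    W = np.zeros((n + m, n + m))
    W[:n, n:] = E
    W[n:, :n] = E.T
    return W

def isoperimetric_cut(W):
    """One bipartition (simplified). Assumes a connected graph so the
    grounded Laplacian is invertible."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    ground = int(np.argmax(d))            # assumption: ground the highest-degree node
    keep = np.arange(len(d)) != ground
    x = np.zeros(len(d))
    # solve L0 x0 = d0 with the ground node's row and column removed
    x[keep] = np.linalg.solve(L[np.ix_(keep, keep)], d[keep])
    order = np.argsort(x)
    total = d.sum()
    in_s = np.zeros(len(d), dtype=bool)
    boundary = vol = 0.0
    best_ratio, best_set = np.inf, None
    for v in order[:-1]:                  # sweep thresholds along sorted x
        # moving v into S: edges to S become internal, its other edges join the cut
        boundary += d[v] - 2.0 * W[v, in_s].sum()
        in_s[v] = True
        vol += d[v]
        ratio = boundary / min(vol, total - vol)
        if ratio < best_ratio:
            best_ratio, best_set = ratio, in_s.copy()
    return best_set, float(best_ratio)
```

Applying the cut recursively to each side would yield multiple components, each mixing genes and voxels.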
What is Biclustering?
• Finding submatrices of an n × m matrix that follow a desired pattern*
• Row/column order need not be consistent between different biclusters
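Bicluster properties
Biclustering of Expression Data: Cheng and Church, RECOMB 2001
For any submatrix $C_{IJ}$, where $I$ and $J$ are subsets of genes and conditions, the mean squared residue score is

$$H(I,J) = \frac{1}{|I||J|} \sum_{i \in I,\, j \in J} \left(c_{ij} - c_{iJ} - c_{Ij} + c_{IJ}\right)^2,$$

where $c_{iJ}$ is the mean of row $i$, $c_{Ij}$ the mean of column $j$, and $c_{IJ}$ the mean of all entries of $C_{IJ}$.
A bicluster is a submatrix $C_{IJ}$ that has a low mean squared residue score.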
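The Cheng and Church mean squared residue averages, over the submatrix, the squared value of each entry minus its row mean, minus its column mean, plus the overall mean. A direct NumPy transcription, with a hypothetical function name; a submatrix for row set `I` and column set `J` can be extracted with `A[np.ix_(I, J)]`.

```python
import numpy as np

def mean_squared_residue(C):
    """Cheng-Church mean squared residue H(I, J) of a submatrix C."""
    C = np.asarray(C, dtype=float)
    row_means = C.mean(axis=1, keepdims=True)   # c_iJ
    col_means = C.mean(axis=0, keepdims=True)   # c_Ij
    overall = C.mean()                          # c_IJ
    residue = C - row_means - col_means + overall
    return float(np.mean(residue ** 2))
```

Any purely additive submatrix (row effect plus column effect) has a residue of exactly zero, which is why a low score marks a coherent bicluster.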
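Cheng and Church
• Greedy approach
• Finds a large submatrix whose MSR falls below a chosen threshold δ, by deleting the rows and columns that contribute most to the residue (followed by a node-addition step)
• Biclusters (a) and (b) both satisfy this low-MSR criterion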
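A sketch of the greedy deletion phase only: while the mean squared residue exceeds δ, remove the single row or column with the largest average squared residue. This mirrors Cheng and Church's single-node deletion; their multiple-node deletion speed-up, the node-addition phase, and the masking of found biclusters are omitted, and `single_node_deletion` is a hypothetical name.

```python
import numpy as np

def single_node_deletion(A, delta):
    """Greedily drop the row or column with the largest mean squared residue
    until H(I, J) <= delta. Returns the kept row and column indices and H."""
    A = np.asarray(A, dtype=float)
    rows, cols = np.arange(A.shape[0]), np.arange(A.shape[1])
    while True:
        C = A[np.ix_(rows, cols)]
        R = C - C.mean(axis=1, keepdims=True) - C.mean(axis=0, keepdims=True) + C.mean()
        H = float(np.mean(R ** 2))
        if H <= delta or (len(rows) <= 1 and len(cols) <= 1):
            return rows, cols, H
        row_scores = np.mean(R ** 2, axis=1)   # residue contribution of each row
        col_scores = np.mean(R ** 2, axis=0)   # residue contribution of each column
        if len(cols) <= 1 or (len(rows) > 1 and row_scores.max() >= col_scores.max()):
            rows = np.delete(rows, row_scores.argmax())
        else:
            cols = np.delete(cols, col_scores.argmax())
```

On an additive matrix with one corrupted row, the corrupted row carries the largest residue and is the first (and only) node removed.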
Biclustering Localized Genes
• The resulting voxel clusters correspond well to individual anatomical regions, with functionally relevant gene lists
• Example: 97% of energy in the cerebellum (40 genes)
• Example: highly localized to the ventricle system (29 genes)
Biclustering Localized Genes
• Results shown are for 13 biclusters
• Example: 69% of energy in the dentate gyrus and 20% in Ammon’s horn (30 genes)
• Example: 99% of energy in the thalamus (11 genes)
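Percentages such as "97% of energy in the cerebellum" can be read as the share of a bicluster's summed expression energy falling in each atlas region. A minimal sketch under that reading; `energy_by_region` and its arguments are hypothetical.

```python
import numpy as np

def energy_by_region(E, genes, voxel_regions):
    """Fraction of a bicluster's summed expression energy in each region.
    E: voxels x genes; genes: column indices of the bicluster's genes;
    voxel_regions: one region label per voxel."""
    summed = np.asarray(E, dtype=float)[:, genes].sum(axis=1)
    labels = np.asarray(voxel_regions)
    total = summed.sum()
    return {r.item(): float(summed[labels == r].sum() / total) for r in np.unique(labels)}
```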
Cell-type expression model
• Question: do genes emerging from these biclusters represent preferential “markers” of cell types localized to the corresponding regions?
• Cell-type-specific microarray data are available (Okaty et al., 2009; 2011) to help answer this question
• Compare the microarray profiles of these cell types with the voxel-based transcriptomic data from the ABA
• 2131 overlapping genes (with high-quality ABA data)
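The slides do not say how microarray profiles were compared with voxel data. One simple possibility, shown here only as an assumption, is to restrict both data sets to their shared genes and compute, for every voxel, the Pearson correlation between that voxel's expression profile and a cell type's profile, giving a spatial map per cell type. `cell_type_map` is a hypothetical name.

```python
import numpy as np

def cell_type_map(voxel_expr, voxel_genes, celltype_expr, celltype_genes):
    """Correlate one cell type's profile with every voxel's profile over
    the genes both data sets share. voxel_expr: voxels x genes;
    celltype_expr: one value per gene in celltype_genes."""
    shared = sorted(set(voxel_genes) & set(celltype_genes))
    v_pos = {g: i for i, g in enumerate(voxel_genes)}
    c_pos = {g: i for i, g in enumerate(celltype_genes)}
    V = np.asarray(voxel_expr, dtype=float)[:, [v_pos[g] for g in shared]]
    c = np.asarray(celltype_expr, dtype=float)[[c_pos[g] for g in shared]]
    # Pearson correlation = cosine similarity of mean-centered profiles
    Vc = V - V.mean(axis=1, keepdims=True)
    cc = c - c.mean()
    denom = np.linalg.norm(Vc, axis=1) * np.linalg.norm(cc)
    r = np.divide(Vc @ cc, denom, out=np.zeros(len(V)), where=denom > 0)
    return shared, r
```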
Cell-type based expression
• Spatial patterns reflect organization within brain regions
• Panels: (A) granule cells, (B) Purkinje cells, (C) stellate cells, (D) mature oligodendrocytes
Biclusters and Cell Types
• Highly localized genes emerging from biclusters (usually) show selective expression in local cell types
• Panels: CP bicluster, Cb bicluster
Heritable “Disease Networks”
• Online Mendelian Inheritance in Man (OMIM) contains records of the genetic basis of ~4000 disorders
• Manually curated 94 unique entities that are of neurological/neuropsychiatric interest and intersect our gene set
• For each disorder, calculate the mean expression pattern across the orthologs of the implicated genes (MGI orthology)
• Calculate a distance matrix between disorders from the pairwise cosine distances between their expression profiles
• Cluster disorders using hierarchical cluster analysis
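A NumPy sketch of the three steps: mean expression profile per disorder, pairwise cosine distances, and agglomerative clustering with complete linkage (the distance between two clusters is the largest pairwise distance between their members). The data structures (`gene_index`, `disorder_genes`) are hypothetical; in practice `scipy.cluster.hierarchy.linkage(..., method="complete")` performs the same clustering far more efficiently than this naive loop.

```python
import numpy as np

def disorder_profiles(E, gene_index, disorder_genes):
    """Mean voxel expression pattern over each disorder's implicated genes.
    E: voxels x genes; gene_index: gene -> column; disorder_genes: name -> genes.
    Assumes every disorder has at least one gene present in gene_index."""
    E = np.asarray(E, dtype=float)
    names = sorted(disorder_genes)
    profiles = [E[:, [gene_index[g] for g in disorder_genes[n] if g in gene_index]].mean(axis=1)
                for n in names]
    return names, np.array(profiles)

def cosine_distance_matrix(P):
    unit = P / np.linalg.norm(P, axis=1, keepdims=True)
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)

def complete_linkage(D):
    """Naive agglomerative clustering; returns merges as (a, b, distance)."""
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: largest distance between any two members
                dist = max(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], float(dist)))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```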
OMIM Disease Clusters (complete-linkage clustering)
Autism Candidate Map (figure labels: T, Cb, Ctx; genes Fgd3, Lhx1, Doc2a, Ptpdc1)
• For a given gene list, embed expression similarity in a 2D space
• Example: ASD candidate genes from the Wigler lab (CSHL), 16 genes in the high-quality coronal data set
• Calculate the cosine distance matrix and apply metric MDS
• Provide sub-groupings based on expression locus
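A sketch of the 2D embedding step using classical (Torgerson) MDS, swapped in here as a simple closed-form stand-in for the metric (stress-minimizing) MDS named on the slide. It double-centers the squared distance matrix and keeps the top eigenvectors. For exactly Euclidean distances it reproduces the configuration; cosine distances are not Euclidean, so the embedding is only approximate.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points so Euclidean distances approximate the matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))  # drop negative eigenvalues
    return eigvecs[:, idx] * scale
```

Feeding it the gene-by-gene cosine distance matrix of the candidate list gives 2D coordinates in which genes with similar expression loci land near each other.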
Next?
• Figure: fMRI data decomposed into spatial components (components 1-4) and their time courses (x-axis: time, in TRs)