Spatial segmentation of imaging mass spectrometry data with edge-preserving image denoising and clustering. Theodore Alexandrov, Michael Becker, Sören Deininger, Günther Ernst, Liane Wehder, Markus Grasmair, Ferdinand von Eggeling, Herbert Thiele, and Peter Maass.
Outline • Background on MS Imaging and goals of paper • Methods • Results • Conclusions and Criticism
Background: what is MS imaging? • In the words of the all-mighty Wikipedia: • Mass spectrometry imaging is a technique used in mass spectrometry to visualize the spatial distribution of e.g. compounds, biomarkers, metabolites, peptides or proteins by their molecular masses. • Or in images:
Goals of this paper: • To propose a new procedure for spatial segmentation of MALDI-imaging datasets. • This procedure clusters all spectra into different groups based on their similarity. • This partition is represented by a segmentation map, which helps to understand the spatial structure of the sample.
Goal: in images… (it is MS Imaging after all)
Why? • Current multivariate algorithms (PCA) are not designed for MS data and cannot be used to interpret the data directly. • Current clustering algorithms do not take spatial information into account. • Here, we assume that spectra close to each other should be similar.
Datasets • Rat brain coronal section • 80 µm raster • 200 laser shots per position; 20185 spectra • Data acquired: 2.5 kDa-25 kDa • Data considered: 2.5 kDa-10 kDa; 3045 points • Section of neuroendocrine tumor (NET) invading the small intestine • 50 µm raster • 300 laser shots per position; 27360 spectra • Data acquired: 1 kDa-30 kDa • Data considered: 3.2 kDa-18 kDa; 5027 points
Spectra Preprocessing • Baseline correction • TopHat algorithm, minimal baseline width set to 10% (the default in ClinProTools) • No normalization • No binning • Spectra exported as ASCII and imported into Matlab
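To make the TopHat step concrete, here is a minimal sketch of top-hat baseline correction on a single spectrum: the baseline is estimated as a morphological opening (sliding minimum followed by sliding maximum) and subtracted. The function names, the toy spectrum, and the `half_width` parameter (given in samples, not as the 10% setting used in ClinProTools) are assumptions for illustration, not the ClinProTools implementation.

```python
def sliding_extreme(values, half_width, pick):
    """Apply `pick` (min or max) over a centered window, truncated at the ends."""
    n = len(values)
    return [pick(values[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def tophat_baseline_correct(spectrum, half_width):
    """Subtract the morphological opening (erosion, then dilation) from the spectrum."""
    eroded = sliding_extreme(spectrum, half_width, min)
    baseline = sliding_extreme(eroded, half_width, max)
    corrected = [s - b for s, b in zip(spectrum, baseline)]
    return corrected, baseline


# Hypothetical toy spectrum: a slowly decaying baseline plus one narrow peak.
spectrum = [10.0 - 0.1 * i for i in range(50)]
spectrum[25] += 5.0

corrected, baseline = tophat_baseline_correct(spectrum, half_width=3)
```

The window must be wider than the peaks of interest; otherwise the opening follows the peaks and they are subtracted along with the baseline.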
Peak-Picking • Part 1: conventional peak picking applied to every 10th spectrum. Select 10 peaks per spectrum. • Orthogonal Matching Pursuit (OMP), because it is fast and simple • Gaussian kernel deconvolution • Part 2: keep consensus peaks: • Only keep peaks that appear in at least 1% of the considered spectra • Omit spurious peaks
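The two-part scheme above can be sketched as follows. This is only an illustration under stated assumptions: the per-spectrum picker is a plain "highest local maxima" stand-in, not OMP with a Gaussian kernel, and the function names and toy data are hypothetical.

```python
from collections import Counter


def top_local_maxima(spectrum, n_peaks):
    """Stand-in for OMP: indices of the n_peaks highest strict local maxima."""
    candidates = [i for i in range(1, len(spectrum) - 1)
                  if spectrum[i - 1] < spectrum[i] > spectrum[i + 1]]
    candidates.sort(key=lambda i: spectrum[i], reverse=True)
    return candidates[:n_peaks]


def consensus_peaks(spectra, step=10, n_peaks=10, min_fraction=0.01):
    """Pick peaks in every `step`-th spectrum, keep those found often enough."""
    subset = spectra[::step]
    counts = Counter()
    for spectrum in subset:
        counts.update(top_local_maxima(spectrum, n_peaks))
    threshold = min_fraction * len(subset)
    kept = sorted(i for i, count in counts.items() if count >= threshold)
    return kept, len(subset)


# Hypothetical toy data: 2000 spectra with true peaks at indices 5 and 20;
# the first spectrum also carries one spurious peak at index 12.
spectra = []
for k in range(2000):
    s = [0.0] * 30
    s[5], s[20] = 10.0, 8.0
    if k == 0:
        s[12] = 3.0
    spectra.append(s)

peaks, n_used = consensus_peaks(spectra)
```

With 200 spectra in the subset, the 1% threshold is 2 occurrences, so the peak seen only once is dropped while the two recurring peaks survive.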
Edge-preserving denoising of m/z images • The imaging dataset is a reduced datacube with 3 coordinates: x, y, m/z (reduced in the m/z dimension by peak picking) • MALDI-imaging data is noisy • Denoising must keep fine anatomical or histological details • Grasmair's modification of Chambolle's total variation (TV) minimizing algorithm • Parameter θ between 0.5 and 1 controls the smoothness of the resulting image
Edge-preserving denoising of m/z images • Total variation (TV) ~ sum of absolute differences between neighboring pixels • Chambolle's algorithm searches for an approximation of the image with small TV • Chambolle's algorithm => smoothness is adjusted globally by manually choosing a parameter • Grasmair's modification adapts Chambolle's denoising parameter locally
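As a small illustration of the TV definition above, the sketch below computes the sum of absolute differences between horizontally and vertically adjacent pixels (the anisotropic form, chosen here for simplicity) for a toy image with and without one noisy pixel. It only evaluates TV; it is not the Chambolle or Grasmair algorithm, and the function name and images are made up for the example.

```python
def total_variation(image):
    """Sum of absolute differences between horizontal and vertical neighbors."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                tv += abs(image[r][c + 1] - image[r][c])
            if r + 1 < rows:
                tv += abs(image[r + 1][c] - image[r][c])
    return tv


# Two flat regions separated by one sharp edge.
clean = [[0, 0, 1, 1] for _ in range(4)]

# Same image with a single isolated noisy pixel.
noisy = [row[:] for row in clean]
noisy[1][0] = 1

tv_clean = total_variation(clean)  # 4.0: only the edge contributes
tv_noisy = total_variation(noisy)  # 7.0: the noisy pixel adds 3.0
```

A sharp edge between two flat regions costs little TV, while isolated noise raises it noticeably, which is why minimizing TV tends to remove noise without blurring edges.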
Clustering • Specify the number of clusters a priori • High Dimensional Discriminant Clustering (HDDC) • Available in a Matlab toolbox • Each cluster is modeled by a Gaussian distribution with its own covariance structure. • HDDC was developed for high-dimensional data (d > 10) • Note: in the Matlab toolbox, HDDC stands for high-dimensional data clustering
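To illustrate the idea of one Gaussian per cluster, here is a heavily simplified sketch of the assignment step only: each spectrum goes to the cluster whose Gaussian gives it the highest log-likelihood, and the labels are reshaped into a segmentation map. This is not HDDC: the cluster parameters are given rather than estimated, covariances are assumed diagonal, priors are assumed equal, and all names and numbers are hypothetical.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def assign_clusters(spectra, clusters):
    """clusters: list of (mean, var) pairs; returns the most likely cluster per spectrum."""
    labels = []
    for x in spectra:
        scores = [diag_gaussian_loglik(x, mean, var) for mean, var in clusters]
        labels.append(scores.index(max(scores)))
    return labels


def segmentation_map(labels, n_rows, n_cols):
    """Arrange per-spectrum labels (row-major order) on the pixel grid."""
    return [labels[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Hypothetical 2x3 image; each pixel holds the intensities of two consensus peaks.
spectra = [[1.1, 4.8], [0.9, 5.2], [4.7, 1.2],
           [1.0, 5.0], [5.1, 0.8], [5.0, 1.0]]
clusters = [([1.0, 5.0], [1.0, 1.0]),   # (mean, variance) of cluster 0
            ([5.0, 1.0], [1.0, 1.0])]   # (mean, variance) of cluster 1

labels = assign_clusters(spectra, clusters)
seg_map = segmentation_map(labels, n_rows=2, n_cols=3)  # [[0, 0, 1], [0, 1, 1]]
```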
Rat brain: peak picking • Used 2019 spectra out of 20185 (10%) • Potential peaks: 373 (red triangles) • Consensus peaks: 110 (green triangles) • Present in at least 20 of the 2019 spectra (1%) • Discarded peaks are mostly in the low m/z region • Hypothesis: they are noise peaks, because MALDI-imaging spectra have a high baseline in the low m/z region.
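The counts on this slide can be checked with a few lines of arithmetic. That the subset starts at the first spectrum and that the 1% threshold is rounded down are assumptions made here to reproduce the quoted numbers.

```python
n_spectra = 20185

# Every 10th spectrum, starting from the first one (assumed).
subset_indices = range(0, n_spectra, 10)
n_subset = len(subset_indices)        # 2019

# 1% of the subset, rounded down (assumed): 20.19 -> 20 spectra.
min_count = int(0.01 * n_subset)      # 20
```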
Rat brain: peak picking • OMP successfully detects major peaks • Gaussian function provides reasonable approximation of peak shape
Rat brain: noise in MALDI-imaging • Strong noise • Noise variance changes within m/z image and between m/z images • Noise variance is linearly proportional to peak intensity
Edge-preserving denoising • Apply Grasmair method to selected 110 consensus peaks • Efficiently removes the noise while not smoothing out edges
Rat brain: segmentation map • Shows anatomical features • Restricted to spatial resolution of MALDI-imaging dataset
Rat brain: importance of edge-preserving denoising • No denoising: borders do not match as well • 3x3 median smoothing: bad edge preservation • 5x5 median smoothing: lose many regions
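One way a median filter can lose structure is shown in the sketch below: a one-pixel-wide line disappears completely after 3x3 median smoothing. The filter implementation (with border pixels replicated) and the toy image are assumptions for illustration, not the code used for the comparison on the slide.

```python
from statistics import median


def median_filter(image, size=3):
    """size x size median filter; pixels outside the image replicate the border."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[min(max(r + dr, 0), rows - 1)]
                           [min(max(c + dc, 0), cols - 1)]
                      for dr in range(-half, half + 1)
                      for dc in range(-half, half + 1)]
            out[r][c] = median(window)
    return out


# A one-pixel-wide vertical structure on an empty background.
image = [[0, 0, 1, 0, 0] for _ in range(5)]

smoothed = median_filter(image, size=3)
line_survives = any(value != 0 for row in smoothed for value in row)  # False
```

Every 3x3 window contains at most three line pixels out of nine, so the median is always the background value and the thin structure is erased.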
Rat brain: co-localized masses • Find m/z values expressed in a given region
Rat brain: the role of parameters: peak picking • 3 main parameters in addition to peak width • Portion of spectra considered for peak picking (every 10th spectrum) • Number of peaks selected for each spectrum (10 peaks) • Percentage of spectra where a peak must be found to enter the consensus peak list (1%)
Rat brain: the role of parameters: peak picking • Robust to changes in the second and third parameters (figure panels: 5, 10, 20 peaks; 0.1%, 1%, 5%)
Rat brain: the role of parameters: peak picking • An increase of parameter 1 can be compensated by a higher value for parameter 2 (figure panels: every 20th spectrum; every 5th spectrum)
Rat brain: the role of parameters: denoising and number of clusters • Segmentation maps for • 3 levels of denoising (0.6, 0.7, 0.8) • 3 numbers of clusters (6, 8, 10) • Decreasing the number of clusters merges features • Too much denoising causes loss of structural detail
Rat brain: the role of parameters: denoising and number of clusters
Conclusions • Peak picking: usually done on the mean spectrum • The 1% consensus criterion is better for peaks present only in a small spatial area • Edge-preserving denoising • Prior work: one study with a moving-average window and one study applying smoothing post hoc to improve classification • Clustering methods • HDDC gives better results than k-means but is significantly slower • Currently, mostly hierarchical clustering is used, which is memory intensive • Importance to cancer studies • Represents a proteomic functional topographic map
Criticism • The authors didn’t explain why they discarded part of the m/z range for which the data was acquired • Dataset reduction by peak picking • Because it is done initially on a per-spectrum basis, it may discard lower-abundance peaks that still show interesting images • Also, because a peak must be present in 1% of the 10% of spectra selected, smaller regions of interest can be missed if that 10% is badly chosen • The high number of parameters and slow running time would make it hard to run many trials