Canadian Bioinformatics Workshops www.bioinformatics.ca
Module 5 Metabolomic Data Analysis Using MetaboAnalyst David Wishart Informatics and Statistics for Metabolomics June 15-16, 2015
Learning Objectives • To become familiar with the standard metabolomics data analysis workflow • To become aware of key elements such as: data integrity checking, outlier detection, quality control, normalization, scaling, etc. • To learn how to use MetaboAnalyst to facilitate data analysis
2 Routes to Metabolomics • Quantitative (Targeted) Methods [Figure: NMR spectrum (ppm axis, 7 to 1) with annotated peaks: TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate, fumarate] • Chemometric (Profiling) Methods [Figure: PCA scores plot (PC1 vs. PC2) separating the Control, ANIT and PAP groups]
Metabolomics Data Workflow
Chemometric Methods • Data Integrity Check • Spectral alignment or binning • Data normalization • Data QC/outlier removal • Data reduction & analysis • Compound ID
Targeted Methods • Data Integrity Check • Compound ID and quantification • Data normalization • Data QC/outlier removal • Data reduction & analysis
Data Integrity/Quality • LC-MS and GC-MS produce a high number of false-positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators • MZedDB: http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html • HMDB: http://www.hmdb.ca/search/spectra?type=ms_search
Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Not so important for NMR (pH variation) • Many programs available (XCMS, ChromA, MZmine) • Most based on time warping algorithms • MZmine: http://mzmine.sourceforge.net/ • ChromA: http://bibiserv.techfak.uni-bielefeld.de/chroma • XCMS: http://metlin.scripps.edu/xcms/
Binning (3000 pts to 14 bins) [Figure: spectrum divided into bins (bin1, bin2, ..., bin8, ...); each bin becomes one point (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #)]
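The binning step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (summing the intensity inside equal-width bins), not the routine used by any particular package; the function name `bin_spectrum` and the synthetic spectrum are assumptions made for the example.

```python
def bin_spectrum(intensities, n_bins):
    """Sum a list of spectral intensities into n_bins contiguous bins."""
    n_points = len(intensities)
    if not 0 < n_bins <= n_points:
        raise ValueError("n_bins must be between 1 and the number of points")
    binned = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder evenly across the bins.
        start = b * n_points // n_bins
        end = (b + 1) * n_points // n_bins
        binned.append(sum(intensities[start:end]))
    return binned


# Hypothetical spectrum: flat baseline of 1.0 with one peak at points 1400-1419.
spectrum = [1.0] * 3000
for i in range(1400, 1420):
    spectrum[i] += 50.0

bins = bin_spectrum(spectrum, 14)
print(len(bins), bins.index(max(bins)))
```

Each sample's 3000-point spectrum is reduced to 14 values, and the bin containing the peak stands out from the baseline bins. Real binning tools typically use fixed ppm widths or adaptive bin edges rather than equal point counts.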
Data Normalization/Scaling Same or different? • Can scale to sample or scale to feature • Scaling to whole sample controls for dilution • Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample) • Choice depends on sample & circumstances
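To make two of the sample-wise options above concrete, here is a minimal sketch of normalization to integrated area and of probabilistic quotient normalization (PQN). It follows the usual textbook description of PQN (median reference spectrum, median quotient) and is not MetaboAnalyst's own code; the function names and the toy data are assumptions.

```python
from statistics import median


def normalize_to_total(sample):
    """Divide every feature by the sample's total (integrated) intensity."""
    total = sum(sample)
    return [v / total for v in sample]


def pqn(samples):
    """Probabilistic quotient normalization of a list of samples (rows)."""
    # Step 1: normalize each sample to its total intensity.
    scaled = [normalize_to_total(s) for s in samples]
    # Step 2: build a reference spectrum from the per-feature medians.
    reference = [median(col) for col in zip(*scaled)]
    normalized = []
    for s in scaled:
        # Step 3: the median quotient estimates the sample's dilution factor.
        quotients = [v / r for v, r in zip(s, reference) if r > 0]
        factor = median(quotients)
        normalized.append([v / factor for v in s])
    return normalized


# Hypothetical data: sample 2 is sample 1 diluted 2x; sample 3 is sample 1
# concentrated 2x with the last metabolite genuinely elevated.
samples = [
    [10.0, 20.0, 30.0, 40.0],
    [5.0, 10.0, 15.0, 20.0],
    [20.0, 40.0, 60.0, 160.0],
]
normalized = pqn(samples)
for row in normalized:
    print([round(v, 3) for v in row])
```

In the third sample, total-area normalization alone would shrink the three unchanged metabolites because one large peak inflates the total. The quotient step brings them back in line with the other samples while keeping the elevated metabolite elevated.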
Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling • MetaboAnalyst http://www.metaboanalyst.ca • Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.
Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification) • Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity
MetaboAnalyst http://www.metaboanalyst.ca A comprehensive web server designed to process & analyze LC-MS, GC-MS or NMR-based metabolomic data
MetaboAnalyst History • 2009 v1.0 - Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA, colorful plots, with detailed explanations & summaries • 2012 v2.0 - Identifies significantly altered functions & pathways • 2015 v3.0 – Better performance, better graphical interactivity, biomarker analysis, power analysis, integration with gene expression data …
MetaboAnalyst Overview • Raw data processing • Data reduction & statistical analysis • Functional enrichment analysis • Metabolic pathway analysis • Power analysis and sample size estimation • Biomarker analysis • Integrative analysis
Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)
Data Set Selected • Here we have selected a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%) • Rumen samples were analyzed by NMR spectroscopy using quantitative metabolomic techniques • High-grain diets are thought to be stressful for cows
Data Normalization Samples = rows Compounds = columns
Data Normalization • At this point, the data has been transformed to a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization • Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)
Data Normalization • Column-wise normalization aims to make each variable (column) comparable in scale to each other, thereby generating a “normal” distribution • This procedure is useful when variables are of very different orders of magnitude • Four methods have been implemented for this purpose – log transformation, autoscaling, Pareto scaling and range scaling
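The four column-wise methods named above can be written out as a short sketch. These are the common textbook definitions, assumed here for illustration; MetaboAnalyst's own implementation may differ in detail (for instance in how it handles zeros before taking logs). All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log_transform(col):
    """Natural log of each value (assumes strictly positive data)."""
    return [math.log(v) for v in col]


def autoscale(col):
    """Mean-center and divide by the standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


def pareto_scale(col):
    """Mean-center and divide by the square root of the standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / math.sqrt(s) for v in col]


def range_scale(col):
    """Mean-center and divide by the range (max minus min)."""
    m = mean(col)
    span = max(col) - min(col)
    return [(v - m) / span for v in col]


def scale_columns(matrix, scaler):
    """Apply a scaler to every column of a samples-by-features matrix."""
    cols = [scaler(list(c)) for c in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


# Hypothetical data: two compounds that differ by two orders of magnitude.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
auto = scale_columns(data, autoscale)
ranged = scale_columns(data, range_scale)
print(auto)
print(ranged)
```

After autoscaling, both compounds are on the same scale even though their raw values differ a hundredfold. Pareto scaling is a milder variant that leaves large-variance features with somewhat more weight.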
Data Normalization • You cannot know a priori what the best normalization protocol will be • MetaboAnalyst allows you to interactively explore different normalization protocols and to visually inspect the degree of “normality” or Gaussian behavior • This example is nicely normalized
Next Steps • After normalization has been completed it is a good idea to look at your data a little further to identify outliers or noise that could/should be removed
Quality Control • Dealing with outliers: detected mainly by visual inspection; may be corrected by normalization; may be excluded • Noise reduction: more of a concern for spectral bins/peak lists; usually improves downstream results
Visual Inspection • What does an outlier look like? [Figures: finding outliers via PCA; finding outliers via heatmap]
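As a rough numerical companion to the PCA plot, the sketch below computes first-principal-component scores by power iteration and flags samples whose score is far from the rest. The flagging rule (absolute score more than 3 times the median absolute score) is an arbitrary assumption chosen for illustration; it does not replace visual inspection and is not how MetaboAnalyst identifies outliers.

```python
import math
from statistics import median


def first_pc_scores(matrix, n_iter=200):
    """Scores of each sample (row) on the first principal component."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Power iteration on X^T X converges to the leading loading vector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        new_v = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in new_v))
        v = [c / norm for c in new_v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


def flag_outliers(scores, threshold=3.0):
    """Indices of samples whose |score| exceeds threshold * median |score|."""
    cutoff = threshold * median(abs(s) for s in scores)
    return [i for i, s in enumerate(scores) if abs(s) > cutoff]


# Hypothetical data: five similar samples and one that is clearly different.
samples = [
    [1.0, 2.0], [1.1, 2.1], [0.9, 1.9],
    [1.0, 2.1], [1.05, 1.95], [5.0, 9.0],
]
scores = first_pc_scores(samples)
flagged = flag_outliers(scores)
print(flagged)
```

A flagged sample still needs a justification before removal, as noted on the earlier QC slide.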
Noise Reduction (cont.) • Characteristics of noise & uninformative features: low intensities; low variances (default)
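A low-variance filter of the kind described here might look like the following sketch. The fraction of features to drop and the use of plain sample variance as the ranking criterion are assumptions for the example; the function and feature names are hypothetical.

```python
from statistics import variance


def filter_low_variance(matrix, names, drop_fraction):
    """Drop the given fraction of features (columns) with the lowest variance."""
    variances = [variance(col) for col in zip(*matrix)]
    n_drop = int(len(names) * drop_fraction)
    # Rank feature indices from lowest to highest variance.
    order = sorted(range(len(names)), key=lambda j: variances[j])
    keep = sorted(order[n_drop:])  # restore the original column order
    kept_names = [names[j] for j in keep]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, kept_names


# Hypothetical bins: "a" and "c" barely change across samples.
names = ["a", "b", "c", "d"]
matrix = [
    [1.0, 5.0, 10.0, 2.0],
    [1.1, 9.0, 10.0, 4.0],
    [0.9, 1.0, 10.1, 6.0],
]
filtered, kept_names = filter_low_variance(matrix, names, 0.5)
print(kept_names)
```

Filtering like this is usually done before scaling, because autoscaling gives every feature unit variance and would hide the difference between informative and near-constant features.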
Common Tasks • To identify important features • To detect interesting patterns • To assess difference between the phenotypes • To facilitate classification or prediction • We will look at ANOVA, Multivariate Analysis (PCA, PLS-DA) and Clustering
ANOVA • Looking at 4 different dairy cow populations: 0%, 15%, 30% and 45% grain in diet • Try to identify those metabolites that are different between all groups or just between 0% and everything else
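For one metabolite measured in the four diet groups, the one-way ANOVA F statistic can be computed as sketched below. The concentrations are made-up numbers, not values from the rumen data set, and the sketch stops at the F statistic because the Python standard library has no F distribution for the p-value (a statistics package would supply it).

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical concentrations of one metabolite at 0%, 15%, 30%, 45% grain.
groups = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0],
]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F only says that at least one group differs from the others. Telling "all groups differ" apart from "only 0% differs" requires post-hoc pairwise comparisons.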
ANOVA • Click this to view the table • Click this spot and the 3-PP graph pops up
View Individual Compounds Click this to see the uracil graphs
What’s Next? • Click and compare different compounds to see which ones are most different or most similar between the 4 groups • Click on the Correlation link (under the ANOVA link) to generate a heat map that displays the pairwise compound correlations and compound clusters
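The pairwise compound correlations behind such a heat map can be sketched as a plain Pearson correlation matrix. This shows only the numbers that a heat map would color, under the assumption that Pearson correlation is the chosen measure; clustering and plotting are left out, and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant values)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(matrix):
    """Compound-by-compound correlations for a samples-by-compounds matrix."""
    cols = [list(c) for c in zip(*matrix)]
    return [[pearson(a, b) for b in cols] for a in cols]


# Hypothetical data: compound 2 rises with compound 1, compound 3 falls.
matrix = [
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 5.0],
    [4.0, 8.0, 3.0],
]
corr = correlation_matrix(matrix)
for row in corr:
    print([round(r, 2) for r in row])
```

Compounds with correlations near +1 would end up in the same cluster on the heat map, while values near -1 mark compounds that move in opposite directions.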
Overall Correlation Pattern Click this to save a high res. image
What’s Next? • When looking at >2 groups it is often useful to look for patterns or trends within particular metabolites • Use Pattern Hunter to find these trends
Pattern Matching • Looking for compounds showing interesting patterns of change • Essentially a method to look for linear trends or periodic trends in the data • Best for data that has 3 or more groups
Pattern Matching (cont.) • Strong positive linear correlation to grain % • Strong negative linear correlation to grain %
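The idea of pattern matching can be sketched as correlating each compound with a template that encodes the expected trend across groups, for example 1-2-3-4 for a linear increase with grain %. This is only an illustration of the concept, not the Pattern Hunter implementation; the function name, compound names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (non-constant values)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hunt_pattern(matrix, names, groups, template):
    """Rank compounds by correlation with a per-group template pattern."""
    # Expand the template so every sample gets its group's expected level.
    pattern = [template[g] for g in groups]
    columns = [list(c) for c in zip(*matrix)]
    scored = [(name, pearson(col, pattern)) for name, col in zip(names, columns)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical data: two samples per diet group (grain % as the group label).
groups = [0, 0, 15, 15, 30, 30, 45, 45]
compound_a = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2, 4.1, 3.9]  # rises with grain %
compound_b = [8.0, 7.8, 6.1, 5.9, 4.0, 4.2, 2.1, 1.9]  # falls with grain %
compound_c = [5.0, 4.0, 5.0, 4.0, 5.0, 4.0, 5.0, 4.0]  # no trend
matrix = [list(row) for row in zip(compound_a, compound_b, compound_c)]
names = ["compound_A", "compound_B", "compound_C"]

linear_up = {0: 1, 15: 2, 30: 3, 45: 4}
ranked = hunt_pattern(matrix, names, groups, linear_up)
for name, r in ranked:
    print(name, round(r, 3))
```

Compounds at the top of the ranking follow the template, and those at the bottom follow its mirror image. Changing the template (for example 1-2-1-2) would search for a periodic trend instead.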