Canadian Bioinformatics Workshops

Canadian Bioinformatics Workshops. www.bioinformatics.ca. Module 5: Metabolomic Data Analysis Using MetaboAnalyst. David Wishart. Informatics and Statistics for Metabolomics, June 15-16, 2015.


Presentation Transcript


  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  3. Module 5: Metabolomic Data Analysis Using MetaboAnalyst. David Wishart. Informatics and Statistics for Metabolomics, June 15-16, 2015

  4. Learning Objectives • To become familiar with the standard metabolomics data analysis workflow • To become aware of key elements such as: data integrity checking, outlier detection, quality control, normalization, scaling, etc. • To learn how to use MetaboAnalyst to facilitate data analysis

  5. A Typical Metabolomics Experiment

  6. 2 Routes to Metabolomics • Quantitative (Targeted) Methods • Chemometric (Profiling) Methods [Figure: NMR spectra (ppm 7 to 1) with labeled peaks for TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate and fumarate; PCA scores plot (PC1 vs. PC2) separating the ANIT, Control and PAP groups]

  7. Metabolomics Data Workflow Chemometric Methods: • Data Integrity Check • Spectral alignment or binning • Data normalization • Data QC/outlier removal • Data reduction & analysis • Compound ID Targeted Methods: • Data Integrity Check • Compound ID and quantification • Data normalization • Data QC/outlier removal • Data reduction & analysis

  8. Data Integrity/Quality • LC-MS and GC-MS produce a high number of false positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators: MZedDB http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html HMDB http://www.hmdb.ca/search/spectra?type=ms_search
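
The adduct check mentioned above can be sketched as follows. This is a minimal illustration, not the method used by MZedDB or HMDB: the function names, the 10 ppm tolerance and the short adduct list are assumptions, and the mass shifts are the commonly tabulated monoisotopic values for singly charged adducts.

```python
# Monoisotopic mass shifts (Da) for a few common singly charged adducts.
# The selection of adducts is an illustrative assumption.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -1.007276,
}


def adduct_mz(neutral_mass):
    """Return the expected m/z of each adduct for a neutral monoisotopic mass."""
    return {name: round(neutral_mass + shift, 4)
            for name, shift in ADDUCT_SHIFTS.items()}


def match_adducts(observed_mz, neutral_mass, tol_ppm=10.0):
    """List adducts whose expected m/z lies within tol_ppm of the observed peak."""
    hits = []
    for name, mz in adduct_mz(neutral_mass).items():
        if abs(observed_mz - mz) / mz * 1e6 <= tol_ppm:
            hits.append(name)
    return hits


# Glucose (C6H12O6, monoisotopic mass 180.06339 Da) as a worked example.
glucose_hits = match_adducts(203.0526, 180.06339)
```

A peak that matches an adduct of a compound already identified in the same sample is a candidate false positive rather than a new metabolite.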

  9. Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Not so important for NMR (pH variation) • Many programs available (XCMS, ChromA, MZmine) • Most are based on time warping algorithms http://mzmine.sourceforge.net/ http://bibiserv.techfak.uni-bielefeld.de/chroma http://metlin.scripps.edu/xcms/

  10. Binning (3000 pts to 14 bins) [Figure: a spectrum divided into bin1, bin2, ..., bin8...; each bin yields a pair (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #)]
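
The binning step above can be sketched in a few lines. This is a simplified illustration that assumes equal-width bins and sums the intensities in each bin; the function name is hypothetical and real tools may use other bin boundaries.

```python
def bin_spectrum(intensities, n_bins):
    """Sum consecutive intensity points into n_bins roughly equal-width bins."""
    n = len(intensities)
    if n_bins <= 0 or n_bins > n:
        raise ValueError("n_bins must be between 1 and the number of points")
    binned = []
    for b in range(n_bins):
        start = b * n // n_bins        # first point of this bin
        stop = (b + 1) * n // n_bins   # one past the last point of this bin
        binned.append(sum(intensities[start:stop]))
    return binned


# A flat placeholder spectrum of 3000 points reduced to 14 bins, as on the slide.
spectrum = [1.0] * 3000
bins = bin_spectrum(spectrum, 14)
```

Summing preserves the total integrated intensity, so the binned vector can go straight into sample-wise normalization.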

  11. Data Normalization/Scaling Same or different? • Can scale to sample or scale to feature • Scaling to whole sample controls for dilution • Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample) • Choice depends on sample & circumstances
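
Two of the sample-wise options named above, normalization to integrated area and the probabilistic quotient method, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, and the median spectrum across samples is used as the reference.

```python
from statistics import median


def normalize_to_sum(sample, total=1.0):
    """Scale one sample (row) so that its intensities sum to `total`."""
    s = sum(sample)
    return [total * v / s for v in sample]


def pqn(samples):
    """Probabilistic quotient normalization against the median spectrum."""
    # Step 1: integral (sum) normalization of every sample.
    normed = [normalize_to_sum(row) for row in samples]
    # Step 2: reference spectrum = per-feature median across samples.
    n_feat = len(normed[0])
    reference = [median(row[j] for row in normed) for j in range(n_feat)]
    out = []
    for row in normed:
        # Step 3: quotient of each feature to the reference (skip zero reference).
        quotients = [row[j] / reference[j]
                     for j in range(n_feat) if reference[j] > 0]
        # Step 4: divide the sample by its median quotient (dilution estimate).
        factor = median(quotients)
        out.append([v / factor for v in row])
    return out


# Three hypothetical samples that differ only by dilution.
diluted = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 20.0, 30.0]]
normalized = pqn(diluted)
```

Samples that differ only by dilution collapse onto the same profile, which is the effect sample-wise normalization is meant to achieve.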

  12. Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling MetaboAnalyst http://www.metaboanalyst.ca Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.

  13. Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification) • Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity

  14. MetaboAnalyst http://www.metaboanalyst.ca A comprehensive web server designed to process & analyze LC-MS, GC-MS or NMR-based metabolomic data

  15. MetaboAnalyst History • 2009 v1.0 - Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA, colorful plots, with detailed explanations & summaries • 2012 v2.0 - Identifies significantly altered functions & pathways • 2015 v3.0 – Better performance, better graphical interactivity, biomarker analysis, power analysis, integration with gene expression data …

  16. MetaboAnalyst Overview • Raw data processing • Data reduction & statistical analysis • Functional enrichment analysis • Metabolic pathway analysis • Power analysis and sample size estimation • Biomarker analysis • Integrative analysis

  17. MetaboAnalyst Modules

  18. MetaboAnalyst Modules

  19. Example Datasets

  20. Example Datasets

  21. Metabolomic Data Processing

  22. Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)

  23. Select a Module (Statistical Analysis)

  24. Data Upload

  25. Alternatively …

  26. Data Set Selected • Here we have selected a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%) • The rumen was analyzed by NMR spectroscopy using quantitative metabolomic techniques • High grain diets are thought to be stressful for cows

  27. Data Integrity Check

  28. Data Normalization Samples = rows Compounds = columns

  29. Data Normalization • At this point, the data has been transformed to a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization • Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)

  30. Data Normalization • Column-wise normalization aims to make each variable (column) comparable in scale to each other, thereby generating a “normal” distribution • This procedure is useful when variables are of very different orders of magnitude • Four methods have been implemented for this purpose – log transformation, autoscaling, Pareto scaling and range scaling
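
The four column-wise methods listed above can be sketched as follows, using their usual textbook definitions. This is a minimal illustration, not MetaboAnalyst's own code; the function names are hypothetical, and the log transform assumes strictly positive values.

```python
import math
from statistics import mean, stdev


def log_transform(column):
    """Log10 transform; assumes every value is strictly positive."""
    return [math.log10(v) for v in column]


def autoscale(column):
    """Mean-center and divide by the standard deviation (unit variance)."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def pareto_scale(column):
    """Mean-center and divide by the square root of the standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / math.sqrt(s) for v in column]


def range_scale(column):
    """Mean-center and divide by the range (max minus min)."""
    m = mean(column)
    span = max(column) - min(column)
    return [(v - m) / span for v in column]


def scale_columns(matrix, scaler):
    """Apply `scaler` to every column of a samples-by-variables matrix."""
    columns = [scaler(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Hypothetical matrix: 3 samples (rows) by 2 compounds (columns)
# whose values differ by two orders of magnitude.
data = [[2.0, 200.0], [4.0, 400.0], [6.0, 600.0]]
auto = scale_columns(data, autoscale)
```

After autoscaling both columns carry the same values even though the raw concentrations differed a hundredfold, so neither compound dominates the downstream multivariate analysis.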

  31. Normalization Result

  32. Data Normalization • You cannot know a priori what the best normalization protocol will be • MetaboAnalyst allows you to interactively explore different normalization protocols and to visually inspect the degree of “normality” or Gaussian behavior • This example is nicely normalized

  33. Next Steps • After normalization has been completed it is a good idea to look at your data a little further to identify outliers or noise that could/should be removed

  34. Quality Control • Dealing with outliers • Detected mainly by visual inspection • May be corrected by normalization • May be excluded • Noise reduction • More of a concern for spectral bins/ peak lists • Usually improves downstream results

  35. Visual Inspection • What does an outlier look like? [Figure panels: Finding outliers via PCA; Finding outliers via Heatmap]

  36. Outlier Removal (Data Editor)

  37. Noise Reduction (Data Filtering)

  38. Noise Reduction (cont.) • Characteristics of noise & uninformative features • Low intensities • Low variances (default)
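
A variance-based filter of the kind described above can be sketched as follows. This is only an illustration: the function name and the 25% drop fraction are assumptions, not MetaboAnalyst's actual settings.

```python
from statistics import variance


def filter_low_variance(matrix, names, drop_fraction=0.25):
    """Drop the given fraction of features (columns) with the lowest variance."""
    columns = list(zip(*matrix))
    variances = [variance(col) for col in columns]
    n_drop = int(len(names) * drop_fraction)
    # Rank column indices from lowest to highest variance.
    ranked = sorted(range(len(names)), key=lambda j: variances[j])
    # Keep the remaining columns in their original order.
    keep = sorted(ranked[n_drop:])
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [names[j] for j in keep]


# Hypothetical 3-sample matrix; feature "a" is constant and carries no information.
matrix = [[1, 10, 5, 0], [1, 20, 6, 3], [1, 30, 7, 9]]
kept_matrix, kept_names = filter_low_variance(matrix, ["a", "b", "c", "d"])
```

Near-constant features cannot discriminate between groups, so removing them reduces the multiple-testing burden without discarding signal.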

  39. Data Reduction and Statistical Analysis

  40. Common Tasks • To identify important features • To detect interesting patterns • To assess difference between the phenotypes • To facilitate classification or prediction • We will look at ANOVA, Multivariate Analysis (PCA, PLS-DA) and Clustering

  41. ANOVA • Looking at 4 different dairy cow populations • 0% grain in diet • 15% grain in diet • 30% grain in diet • 45% grain in diet • Try to identify those metabolites that are different between all groups or just between 0% and everything else
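
The statistic behind a one-way ANOVA across the four diet groups can be sketched as follows. The concentration values are hypothetical placeholders, not data from the cattle study, and the sketch computes only the F statistic for a single metabolite.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of samples
    grand = mean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical concentrations of one metabolite at 0%, 15%, 30% and 45% grain.
diet_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]]
f_stat = one_way_anova_f(diet_groups)
```

Turning F into a p-value requires the F distribution with (k - 1, n - k) degrees of freedom; a statistics library such as SciPy (`scipy.stats.f_oneway`) returns both in one call.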

  42. ANOVA • Click this to view the table • Click this spot and the 3-PP graph pops up

  43. View Individual Compounds Click this to see the uracil graphs

  44. What’s Next? • Click and compare different compounds to see which ones are most different or most similar between the 4 groups • Click on the Correlation link (under the ANOVA link) to generate a heat map that displays the pairwise compound correlations and compound clusters

  45. Overall Correlation Pattern Click this to save a high res. image

  46. High Resolution Image

  47. What’s Next? • When looking at >2 groups it is often useful to look for patterns or trends within particular metabolites • Use Pattern Hunter to find these trends

  48. Pattern Matching • Looking for compounds showing interesting patterns of change • Essentially a method to look for linear trends or periodic trends in the data • Best for data that has 3 or more groups
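
The idea of matching compounds against a trend can be sketched as a correlation with a per-group template. This is a simplified illustration, not the Pattern Hunter implementation; the function names, the "1-2-3-4" template and the sample values are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pattern_hunter(matrix, names, groups, template):
    """Rank compounds by correlation between their values and a group template."""
    # Expand the template so that each sample gets the value of its group.
    target = [template[g] for g in groups]
    columns = list(zip(*matrix))
    scores = [(name, pearson(col, target)) for name, col in zip(names, columns)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical data: one sample per grain level, two compounds.
groups = ["0%", "15%", "30%", "45%"]
template = {"0%": 1, "15%": 2, "30%": 3, "45%": 4}   # linear increase with grain
matrix = [[1.0, 8.0], [2.0, 6.0], [3.0, 4.0], [4.0, 2.0]]
ranking = pattern_hunter(matrix, ["rises", "falls"], groups, template)
```

Compounds at the top of the ranking rise with grain content and those at the bottom fall with it; a different template (for example 1-2-1-2) would target a periodic pattern instead.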

  49. Pattern Matching (cont.) • Strong positive linear correlation to grain % • Strong negative linear correlation to grain %
