
Canadian Bioinformatics Workshops

Learn the standard workflow for metabolomics data analysis including data integrity checking, outlier detection, normalization, and scaling using MetaboAnalyst. This workshop by David Wishart covers both quantitative and chemometric methods.

Presentation Transcript


  1. Canadian Bioinformatics Workshops www.bioinformatics.ca

  3. Module 7: Metabolomic Data Analysis Using MetaboAnalyst • David Wishart • Informatics and Statistics for Metabolomics • June 16-17, 2014

  4. Learning Objectives • To become familiar with the standard metabolomics data analysis workflow • To become aware of key elements such as: data integrity checking, outlier detection, quality control, normalization, scaling, etc. • To learn how to use MetaboAnalyst to facilitate data analysis

  5. A Typical Metabolomics Experiment

  6. 2 Routes to Metabolomics • Quantitative (Targeted) Methods [Figure: NMR spectrum (ppm axis, 7 to 1) with assigned peaks: TMAO, creatinine, hippurate, allantoin, taurine, citrate, urea, 2-oxoglutarate, water, succinate, fumarate] • Chemometric (Profiling) Methods [Figure: unassigned NMR spectrum (ppm axis, 7 to 1) and a PCA scores plot (PC1 vs. PC2) with ANIT, Control and PAP groups]

  7. Metabolomics Data Workflow • Chemometric Methods: Data integrity check; Spectral alignment or binning; Data normalization; Data QC/outlier removal; Data reduction & analysis; Compound ID • Targeted Methods: Data integrity check; Compound ID and quantification; Data normalization; Data QC/outlier removal; Data reduction & analysis

  8. Data Integrity/Quality • LC-MS and GC-MS produce a high number of false positive peaks • Problems with adducts (LC), extra derivatization products (GC), isotopes, breakdown products (ionization issues), etc. • Not usually a problem with NMR • Check using replicates and adduct calculators • MZedDB: http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html • HMDB: http://www.hmdb.ca/search/spectra?type=ms_search

  9. Data/Spectral Alignment • Important for LC-MS and GC-MS studies • Not so important for NMR (pH variation) • Many programs available (XCMS, ChromA, MZmine) • Most based on time warping algorithms • MZmine: http://mzmine.sourceforge.net/ • ChromA: http://bibiserv.techfak.uni-bielefeld.de/chroma • XCMS: http://metlin.scripps.edu/xcms/

  10. Binning (3000 pts to 14 bins) • Each bin becomes one data point (xi, yi), e.g. x = 232.1 (AOC), y = 10 (bin #) [Figure: spectrum divided into bin1, bin2, bin3, bin4, bin5, bin6, bin7, bin8...]
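
The binning step on this slide can be sketched in a few lines of Python. This is only an illustration, assuming equal-width bins and that each bin is summarized by the sum of its intensity points; the function name `bin_spectrum` and the flat example spectrum are hypothetical and are not taken from the slides or from MetaboAnalyst.

```python
def bin_spectrum(intensities, n_bins):
    """Sum consecutive intensity points into n_bins nearly equal-width bins."""
    n = len(intensities)
    bins = []
    for b in range(n_bins):
        # Integer arithmetic spreads any remainder points across the bins.
        start = b * n // n_bins
        stop = (b + 1) * n // n_bins
        bins.append(sum(intensities[start:stop]))
    return bins


# Hypothetical flat spectrum with 3000 points, reduced to 14 bins.
spectrum = [1.0] * 3000
binned = bin_spectrum(spectrum, 14)
```

Summing keeps the total signal unchanged, so the binned values can still be normalized to integrated area afterwards.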

  11. Data Normalization/Scaling Same or different? • Can scale to sample or scale to feature • Scaling to whole sample controls for dilution • Normalize to integrated area, probabilistic quotient method, internal standard, sample specific (weight or volume of sample) • Choice depends on sample & circumstances
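
As a rough sketch of two of the sample-wise options listed on this slide, the Python below first normalizes each sample to its integrated area and then applies a probabilistic quotient step. It assumes samples are rows with strictly positive totals and that the reference spectrum is the feature-wise median across samples; the function name `pqn_normalize` and the example values are hypothetical.

```python
from statistics import median


def pqn_normalize(samples):
    """Normalize to integrated area, then apply probabilistic quotient scaling."""
    # Step 1: divide every sample (row) by its total intensity.
    area_scaled = [[v / sum(row) for v in row] for row in samples]
    # Step 2: build a reference spectrum as the median of each feature (column).
    reference = [median(col) for col in zip(*area_scaled)]
    normalized = []
    for row in area_scaled:
        # Step 3: the median quotient estimates the sample's dilution factor.
        quotients = [v / r for v, r in zip(row, reference) if r != 0]
        factor = median(quotients)
        normalized.append([v / factor for v in row])
    return normalized


# Hypothetical samples: the same profile at three different dilutions.
diluted = [[2.0, 4.0, 6.0], [1.0, 2.0, 3.0], [4.0, 8.0, 12.0]]
normalized = pqn_normalize(diluted)
```

In this toy case all three rows come out identical, which is the point of scaling to the whole sample: differences caused only by dilution disappear.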

  12. Data Normalization/Scaling • Can scale to sample or scale to feature • Scaling to feature(s) helps manage outliers • Several feature scaling options available: log transformation, auto-scaling, Pareto scaling, probabilistic quotient, and range scaling • MetaboAnalyst: http://www.metaboanalyst.ca • Dieterle F et al. Anal Chem. 2006 Jul 1;78(13):4281-90.

  13. Data QC, Outlier Removal & Data Reduction • Data filtering (remove solvent peaks, noise filtering, false positives, outlier removal -- needs justification) • Dimensional reduction or feature selection to reduce number of features or factors to consider (PCA or PLS-DA) • Clustering to find similarity

  14. MetaboAnalyst • Web server designed to handle large sets of LC-MS, GC-MS or NMR-based metabolomic data • Supports both univariate and multivariate data processing, including t-tests, ANOVA, PCA, PLS-DA • Identifies significantly altered metabolites, produces colorful plots, provides detailed explanations & summaries • Links sig. metabolites to pathways via SMPDB http://www.metaboanalyst.ca

  15. MetaboAnalyst Workflow

  16. [MetaboAnalyst workflow diagram]
      Data input: • GC/LC-MS raw spectra • Peak lists • Spectral bins • Concentration table
      Data processing: • Spectra processing • Peak processing • Noise filtering • Missing value estimation
      Data integrity check
      Data normalization: • Row-wise normalization • Column-wise normalization • Combined approach
      Statistical Exploration
        Two/multi-group analysis: • Univariate analysis • Correlation analysis • Chemometric analysis • Feature selection • Cluster analysis • Classification
        Time-series analysis: • Data overview • Two-way ANOVA • ANOVA-SCA • Time-course analysis
      Functional Interpretation
        Enrichment analysis: • Over representation analysis • Single sample profiling • Quantitative enrichment analysis
        Pathway analysis: • Enrichment analysis • Topology analysis • Interactive visualization
      Outputs: • Processed data • Result tables • Analysis report • Images
      Image Center: • Resolution: 150/300/600 dpi • Format: png, tiff, pdf, svg, ps
      Quality checking: • Methods comparison • Temporal drift • Batch effect • Biological checking
      Other utilities: • Peak searching • Pathway mapping • Name/ID conversion • Lipidomics

  17. MetaboAnalyst Overview • Raw data processing • Using MetaboAnalyst • Data Reduction & Statistical analysis • Using MetaboAnalyst • Functional enrichment analysis • Using MSEA in MetaboAnalyst • Metabolic pathway analysis • Using MetPA in MetaboAnalyst

  18. Example Datasets

  19. Example Datasets

  20. Metabolomic Data Processing

  21. Common Tasks • Purpose: to convert various raw data forms into data matrices suitable for statistical analysis • Supported data formats • Concentration tables (Targeted Analysis) • Peak lists (Untargeted) • Spectral bins (Untargeted) • Raw spectra (Untargeted)

  22. Data Upload

  23. Alternatively …

  24. Data Set Selected • Here we will select a data set from dairy cattle fed different proportions of cereal grains (0%, 15%, 30%, 45%) • The rumen was analyzed by NMR spectroscopy using quantitative metabolomic techniques • High grain diets are thought to be stressful for cows

  25. Data Integrity Check

  26. Data Normalization

  27. Data Normalization • At this point, the data has been transformed to a matrix with the samples in rows and the variables (compounds/peaks/bins) in columns • MetaboAnalyst offers three types of normalization: row-wise normalization, column-wise normalization and combined normalization • Row-wise normalization aims to make the samples (rows) comparable to each other (e.g. urine samples with different dilution effects)

  28. Data Normalization • Column-wise normalization aims to make each variable (column) comparable in scale to each other, thereby generating a “normal” distribution • This procedure is useful when variables are of very different orders of magnitude • Four methods have been implemented for this purpose – log transformation, autoscaling, Pareto scaling and range scaling
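
The four column-wise methods named on this slide can be written out as follows. This is a minimal sketch using their usual textbook definitions, not MetaboAnalyst's own code: it assumes a base-10 log, the sample standard deviation, and mean-centering before each scaling; all function names are hypothetical.

```python
import math
from statistics import mean, stdev


def log_transform(column):
    """Log-transform one variable; assumes strictly positive values."""
    return [math.log10(v) for v in column]


def _center_and_divide(column, divisor):
    """Mean-center a column, then divide by the given scaling factor."""
    m = mean(column)
    return [(v - m) / divisor for v in column]


def autoscale(column):
    """Mean-center and divide by the standard deviation."""
    return _center_and_divide(column, stdev(column))


def pareto_scale(column):
    """Mean-center and divide by the square root of the standard deviation."""
    return _center_and_divide(column, math.sqrt(stdev(column)))


def range_scale(column):
    """Mean-center and divide by the range (max minus min)."""
    return _center_and_divide(column, max(column) - min(column))


# Hypothetical concentrations of one compound across three samples.
column = [2.0, 4.0, 6.0]
scaled = autoscale(column)
```

Autoscaling gives every variable unit variance, whereas Pareto scaling shrinks large variables less aggressively, so high-abundance compounds keep somewhat more weight.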

  29. Normalization Result

  30. Quality Control • Dealing with outliers • Detected mainly by visual inspection • May be corrected by normalization • May be excluded • Noise reduction • More of a concern for spectral bins/peak lists • Usually improves downstream results

  31. Visual Inspection • What does an outlier look like? Finding outliers via PCA Finding outliers via Heatmap

  32. Outlier Removal

  33. Noise Reduction

  34. Noise Reduction (cont.) • Characteristics of noise & uninformative features • Low intensities • Low variances (default)
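
A low-variance filter of the kind described here might look like the sketch below. It assumes samples in rows and features in columns, ranks features by sample variance, and drops a fixed fraction of the lowest-ranked ones; the function name `filter_low_variance`, the `drop_fraction` parameter, and its default are hypothetical and do not reflect MetaboAnalyst's actual settings.

```python
from statistics import variance


def filter_low_variance(matrix, names, drop_fraction=0.25):
    """Drop the given fraction of features with the lowest variance."""
    columns = list(zip(*matrix))
    # Feature indices ordered from lowest to highest variance.
    ranked = sorted(range(len(names)), key=lambda j: variance(columns[j]))
    n_drop = int(len(names) * drop_fraction)
    # Keep the remaining features in their original column order.
    keep = sorted(ranked[n_drop:])
    kept_names = [names[j] for j in keep]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_matrix, kept_names


# Hypothetical data: feature "a" is constant and carries no information.
data = [[1, 10, 5, 0], [1, 20, 6, 3], [1, 30, 7, 9]]
features = ["a", "b", "c", "d"]
filtered, kept = filter_low_variance(data, features)
```

Because variance depends on scale, a filter like this is sensitive to whether it runs before or after column-wise scaling.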

  35. Data Reduction and Statistical Analysis

  36. Common tasks • To identify important features • To detect interesting patterns • To assess difference between the phenotypes • To facilitate classification or prediction • NOW ON YOUR OWN

  37. ANOVA
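
For readers who want to see what the ANOVA step computes for a single compound, here is a minimal sketch of the one-way F statistic. It covers only the F ratio (between-group over within-group mean squares); obtaining a p-value needs the F distribution, which is left out. The function name and the example concentrations are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical concentrations of one compound in three diet groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A large F value says only that at least one group differs; comparing specific pairs of groups requires a post-hoc test.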

  38. View Individual Compounds

  39. Questions • Q: Which compounds show a significant difference among all the neighboring groups (0-15, 15-30, and 30-45)? • Q: For Uracil, are groups 15, 30, 45 significantly different from each other?

  40. Overall correlation pattern

  41. High resolution image Specify format Specify resolution Specify size

  42. Question • Q: In untargeted metabolomics using NMR, researchers often look for region(s) of the spectra showing the biggest change in their correlation patterns under different conditions. Can you do that in MetaboAnalyst? • Hint: check the available parameters of Correlation analysis

  43. Template Matching • Looking for compounds showing interesting patterns of change • Essentially a method to look for linear trends or periodic trends in the data • Best for data that has 3 or more groups

  44. Template Matching (cont.) Strong linear + correlation to grain % Strong linear - correlation to grain %
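
Template matching as described on these two slides can be approximated by correlating each compound's profile with a predefined pattern. The sketch below assumes Pearson correlation and one value per group (for example, group means); the template `[1, 2, 3, 4]` stands for a linear increase with grain percentage, and the compound names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def match_template(profiles, template):
    """Rank compounds by their correlation with the template, highest first."""
    scores = {name: pearson(values, template) for name, values in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Template for a linear increase across the 0%, 15%, 30%, 45% grain groups.
grain_template = [1, 2, 3, 4]
profiles = {
    "compound_A": [0.5, 1.0, 1.5, 2.0],  # rises with grain %
    "compound_B": [9.0, 7.0, 5.0, 3.0],  # falls with grain %
    "compound_C": [2.0, 5.0, 1.0, 4.0],  # no clear trend
}
ranked = match_template(profiles, grain_template)
```

Changing the template changes the question being asked: a pattern such as `[3, 2, 1, 2]` would instead favor compounds that fall over the first three groups and rise in the last.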

  45. Question • Q: Can you identify compounds that decrease across the first three groups but increase in the last group?

  46. PCA Scores Plot

  47. PCA Loading Plot Compounds most responsible for separation
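
To make the idea of loadings concrete, the sketch below estimates the first principal component by power iteration on the covariance matrix and ranks compounds by the absolute size of their loadings. It is a simplified stand-in for a full PCA, assuming samples in rows, at least one variable with non-zero variance, and a clearly dominant first component; the function name, starting vector, and example data are hypothetical.

```python
import math


def first_pc_loadings(matrix, n_iter=200):
    """Approximate the first principal component's loadings by power iteration."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix of the variables (columns).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    vec = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        new = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in new))
        vec = [v / norm for v in new]
    return vec


# Hypothetical data: the first compound separates the samples most strongly.
compounds = ["compound_X", "compound_Y", "compound_Z"]
data = [[1.0, 5.0, 3.0], [2.0, 5.1, 3.0], [9.0, 5.0, 3.0], [10.0, 5.1, 3.0]]
loadings = first_pc_loadings(data)
by_importance = sorted(zip(compounds, loadings), key=lambda t: abs(t[1]), reverse=True)
```

Compounds with the largest absolute loadings on a component contribute most to the separation seen along that axis of the scores plot.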

  48. 3D-PCA

  49. Question Q: Identify compounds that contribute most to the separation between group 15 and 45
