Canadian Bioinformatics Workshops www.bioinformatics.ca
Module 3 Metabolite Identification and Annotation – Part II David Wishart Informatics and Statistics for Metabolomics May 3-4, 2012
Goal of Metabolite Annotation [Figure: NMR spectrum, chemical shift axis in ppm from 7 to 1]
Metabolite ID by Spectral Deconvolution (NMR) Mixture Compound A Compound B Compound C
Alternatives to Chenomx • AMIX (Bruker) • AutoFit (automated fitting) • MetaboMiner (2D NMR) • HMDB (NMR spectral match) • PRIMe Spin Assign (NMR spectral matching server) • rNMR and BMRB Peaks Server • CCPN-MP
Performance of Autofit Synthetic Real P. Mercier et al. J Biomol NMR. 2011 Apr;49(3-4):307-23
NMR Compound ID from Mixtures - MetaboMiner ID’d Compounds Raw TOCSY Spectrum http://wishart.biology.ualberta.ca/metabominer/
MetaboMiner Software Design • Standard reference libraries • 225 TOCSY spectra • 488 HSQC spectra • Specialized sub-libraries for CSF, plasma and urine • Algorithms for automatic processing & compound identification • “Minimal signature peaks” • 1D 1H peak list as sanity check • Extra dimensional information for identification • Support for direct spectral annotation
NMR Compound ID - HMDB Peak list to HMDB NMR spectrum of mixture Phenyllactate Phenylpyruvate Phenylacetic acid Tropic acid Benzyl alcohol … http://www.hmdb.ca High scoring matches
PRIMe Spin Assign http://prime.psc.riken.jp/?action=nmr_search
rNMR http://rnmr.nmrfam.wisc.edu/
BMRB Peaks Server http://www.bmrb.wisc.edu/metabolomics/query_metab.php
CCPN - MP http://www.ccpn.ac.uk/ccpn/projects/metabolomics/
Metabolite ID by GC-MS GC-MS total ion chromatogram
Recall EI MS Generates Multiple Peaks Molecular ion EI Breaks up Molecules in Predictable Ways
Recall GC-MS Analytes are Derivatized Methoxime
Metabolite ID by GC-MS • GC-MS is often best for identification of amino acids, organic acids, sugars, fatty acids and molecules with MW<500 • GC has higher resolution and reproducibility than LC • EI-MS is more standardized than soft ionization methods, so EI spectra are more comparable • Most common route is to use AMDIS + NIST database
NIST 11 MS Database • 243,893 EI spectra of 212,961 cmpds • 9934 ion trap MS for 4649 cmpds • 91,557 Qtof & QqQ spectra for 3774 compounds • 224,038 RI values for 21,847 cmpds
AMDIS (Automated Mass Spectral Deconvolution and Identification System) • Noise analysis • Determines background noise level • Component perception • Identifies peaks by comparing to noise • Spectral deconvolution • Generates a “clean” or model spectrum • Compound identification • Identifies compounds via a library search using a match factor
Match Factor (MF) • Measures the similarity of the MS spectrum of the query to the MS spectrum in the reference database • Defined as the normalized dot product of the query and the reference spectra • Iref corresponds to the intensities of the reference spectrum, Iqry corresponds to the intensities of the query spectrum, M corresponds to the masses (m/z), and w is a weighting term to penalize uncertain peaks
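The idea of a weighted, normalized dot product can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact expression AMDIS uses: the function name `match_factor`, the mass-weighted square-root form, the default weight of 1.0, and the example spectra are all hypothetical choices made for this sketch.

```python
import math

def match_factor(query, reference, weights=None):
    """Mass-weighted, normalized dot product of two centroided spectra.

    query, reference: dict mapping m/z -> intensity.
    weights: optional dict mapping m/z -> weight in [0, 1], a stand-in
    for the w term that penalizes uncertain peaks (assumed default 1.0).
    Returns a score between 0 and 100.
    """
    weights = weights or {}
    numerator = 0.0
    for mz in set(query) | set(reference):
        w = weights.get(mz, 1.0)
        # A peak missing from either spectrum contributes nothing.
        numerator += w * mz * math.sqrt(query.get(mz, 0.0) * reference.get(mz, 0.0))
    q_norm = sum(mz * i for mz, i in query.items())
    r_norm = sum(mz * i for mz, i in reference.items())
    if q_norm == 0 or r_norm == 0:
        return 0.0
    return 100.0 * numerator ** 2 / (q_norm * r_norm)

# Hypothetical spectra, for illustration only.
query_spectrum = {73: 100.0, 144: 85.0, 218: 30.0}
reference_spectrum = {73: 100.0, 100: 5.0, 144: 90.0, 218: 25.0}
score = match_factor(query_spectrum, reference_spectrum)
```

Identical spectra score 100 and spectra with no shared m/z values score 0; the score does not depend on the absolute intensity scale of either spectrum.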
GC-MS Protocol • Prepare a set of external n-alkane standards (8-9 n-alkanes spanning octane to hexadecane) and run as an external calibration standard • Run a “blank sample” containing just the solvent and derivatization agents • Run the sample of interest (under the same conditions as the blank)
GC-MS Protocol External n-alkane standard used for RI calculation
GC-MS Protocol • Create a calibration file using the n-alkane mixture (sets retention indices [RIs] to the standard values) • Analyze the sample data file against the CAL (calibration) file for the alkane mixture (sets and recalculates RIs using the n-alkanes) • Search the NIST database for matches and display the results of the search • Get rid of “false” positives by comparing the “blank” against the sample spectrum
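The RI calibration step can be illustrated with a small Python sketch. It assumes the linear (temperature-programmed) retention index, in which an analyte's RI is interpolated between the two n-alkanes that bracket its retention time, with each n-alkane assigned 100 times its carbon number. Whether AMDIS computes RIs in exactly this way is not stated on the slides; the function name and the retention times below are hypothetical.

```python
from bisect import bisect_right

def retention_index(rt, alkanes):
    """Linear retention index by interpolation between bracketing n-alkanes.

    rt: retention time of the analyte (same units as the ladder).
    alkanes: list of (carbon_number, retention_time), sorted by time.
    """
    times = [t for _, t in alkanes]
    if not (times[0] <= rt <= times[-1]):
        raise ValueError("retention time outside the calibrated alkane range")
    # Index of the alkane eluting at or just before the analyte.
    i = min(bisect_right(times, rt) - 1, len(alkanes) - 2)
    (n_lo, t_lo), (n_hi, t_hi) = alkanes[i], alkanes[i + 1]
    return 100.0 * (n_lo + (n_hi - n_lo) * (rt - t_lo) / (t_hi - t_lo))

# Hypothetical retention times (minutes) for a C8 to C12 ladder.
ladder = [(8, 4.0), (9, 5.5), (10, 7.0), (11, 8.5), (12, 10.0)]
ri = retention_index(4.75, ladder)  # elutes halfway between C8 and C9
```

Retention times outside the ladder raise an error rather than extrapolating, since an RI is only calibrated within the range of the n-alkanes that were run.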
Step 3 – Search NIST Database for Matches GC Peak List AMDIS EI-MS Spectrum For 11.597
Step 3 – Search NIST Database for Matches (Zero in) 73 & 144 are 2 most abund. m/z Peak Spectrum MF = 84% Match To Valine Reference Spectrum Match factor ≥ 60% (if in doubt compare “blank” and your signal)
Other GC-MS Options • Alternatives to AMDIS • AnalyzerPro (SpectralWorks) • ChromaTOF (Leco) • Evaluated in TrAC Trends in Analytical Chemistry Volume 27, Issue 3, March 2008, Pages 215-227 • Alternatives to NIST 08 or NIST 11 • Golm Database (Open access) • FiehnLib (Leco, Agilent) • HMDB???
The Golm Database • GC-MS (Quad and TOF) database • Contains MSRI (MS + retention index) or MST data for 1450 identified metabolites • Includes 10,336 spectra linked to analytes • Downloadable libraries compatible with NIST08 and AMDIS software • Primary focus on plant metabolites • Supports compound name and MS queries • MS submissions via NIST08 or AMDIS format
Golm Database http://gmd.mpimp-golm.mpg.de/
The FiehnLib GC-MS Database • 2212 EI MS and RI data for quadrupole & TOF GC-MS • Over 1000 primary metabolites below 550 Da • Covers lipids, amino acids, fatty acids, amines, alcohols, sugars, amino-sugars, sugar alcohols, sugar acids, phosphates, hydroxyl acids, purines, and sterols
Metabolite ID by LC-MS LC-MS total ion chromatogram
Levels of Metabolite Identification in MS • 4 levels of metabolite identification • Positively identified compounds • Confirmed by match to known standard • Putatively identified compounds • Match to MS + RT or MS/MS + RT • Compounds putatively identified in a compound class • Unknown compounds
Metabolite ID by LC-MS • LC-MS is often best for identification of lipids, bases, amino acids, organic acids, fatty acids and other somewhat hydrophobic molecules • Metabolite ID typically requires both MS and MS/MS data (along with retention time information) and internal standards • Compound ID can be done by high accuracy mass matching and/or by MS/MS matching to spectral databases
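High-accuracy mass matching amounts to comparing an observed neutral mass against candidate monoisotopic masses within a parts-per-million (ppm) window. The sketch below is illustrative only: the function names, the 5 ppm default tolerance, and the small candidate table are assumptions made for this example, not part of any database's API.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match_by_mass(observed, candidates, tol_ppm=5.0):
    """Return (name, ppm error) for candidates within tol_ppm, best first."""
    hits = []
    for name, mass in candidates.items():
        err = ppm_error(observed, mass)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))

# Illustrative monoisotopic neutral masses (Da).
candidates = {
    "Valine": 117.078979,      # C5H11NO2
    "Leucine": 131.094629,     # C6H13NO2
    "Isoleucine": 131.094629,  # C6H13NO2, isomer of leucine
    "Glucose": 180.063388,     # C6H12O6
}
hits = match_by_mass(131.0947, candidates)
```

Here both leucine and isoleucine are returned, because isomers share the same exact mass. This is one reason accurate mass alone is rarely sufficient and retention time or MS/MS data are also needed.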
Simple MW Search DBs ChEBI (www.ebi.ac.uk/chebi/) PubChem (http://pubchem.ncbi.nlm.nih.gov/) ChemSpider (www.chemspider.com) HMDB (www.hmdb.ca)
PubChem MW Search Available Under “Advanced Search”
ChEBI MW Search http://www.ebi.ac.uk/chebi/advancedSearchForward.do
Advanced MS Search DBs NIST/AMDIS (http://chemdata.nist.gov) Metlin (http://metlin.scripps.edu/) HMDB (www.hmdb.ca) MassBank (www.massbank.jp)
Advanced MS Search DBs • These databases support not only MW or MW range searches, but also support parent ion searches (positive, negative, neutral), peak list searches (from MS or MS/MS data) as well as MS/MS spectral matching • These DBs are intended more for MS-based metabolomics and compound ID than the simple MW search tools
MS Compound ID - HMDB Peak list to HMDB LC-MS Spectrum Phenyllactate Phenylpyruvate Atrolactic acid Homovanillin Coumaric acid http://www.hmdb.ca High scoring matches
MS Compound ID - HMDB • Database of ~100,000 predicted masses from ~10,000 known metabolites • Includes adduct mass calculations for 30+ possible or expected metabolite adducts • Allows selection of different databases (DrugBank, HMDB, FooDB, T3DB), mass tolerance and ionization mode • Designed for mixture deconvolution (i.e. identification of multiple compounds at a time)
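Adduct mass calculation can be sketched as follows. This is a hedged illustration, not HMDB's implementation: only four singly charged adducts are shown (the slide mentions 30+), the mass shifts are commonly tabulated values assumed for this example, and the function names are hypothetical.

```python
# Mass shifts (Da) for a few singly charged adducts; assumed, commonly
# tabulated values that already account for the electron mass.
ADDUCT_SHIFTS = {
    "positive": {
        "[M+H]+": 1.007276,
        "[M+NH4]+": 18.033823,
        "[M+Na]+": 22.989218,
    },
    "negative": {
        "[M-H]-": -1.007276,
    },
}

def adduct_mz(neutral_mass, mode):
    """Expected m/z of each adduct for a neutral monoisotopic mass."""
    return {name: neutral_mass + shift
            for name, shift in ADDUCT_SHIFTS[mode].items()}

def neutral_masses_from_mz(observed_mz, mode):
    """Neutral mass implied by each adduct hypothesis for an observed m/z."""
    return {name: observed_mz - shift
            for name, shift in ADDUCT_SHIFTS[mode].items()}

# Glucose (C6H12O6), monoisotopic neutral mass 180.063388 Da.
glucose_positive = adduct_mz(180.063388, "positive")
implied = neutral_masses_from_mz(glucose_positive["[M+H]+"], "positive")
```

In a search, each implied neutral mass would be compared against the database within the chosen mass tolerance, so a single observed m/z generates one candidate list per adduct hypothesis.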