Welcome to the WCMC Metabolomics Course 2013, where Tobias Kind teaches about the close relationship between molecular structure and mass spectra in the field of cheminformatics. Learn about the important databases, search algorithms, and techniques used in mass spectral analysis.
Welcome! Mass Spectrometry meets ChemInformatics
WCMC Metabolomics Course 2013, Tobias Kind
Course 3: Mass spectral and molecular database search
Biology, Chemistry, Informatics
http://fiehnlab.ucdavis.edu/staff/kind
CC-BY License
Molecules and mass spectra
Close relationship between molecular structure and mass spectra
• Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)
• Mass spectra reflect a state of gas phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages)
Electron ionization (70 eV) mass spectra; Source: NIST05
Molecules and mass spectra
Similar structures may or may not have similar mass spectra
Electron ionization (70 eV) mass spectra; Source: NIST05; Created using structure similarity search in NIST MS Search program
Molecules and mass spectra
Similar mass spectra may or may not have similar structures
Electron ionization (70 eV) mass spectra; Source: NIST05; Created using spectral similarity search in NIST MS Search program
Large mass spectral databases
Name                   Spectra count   Type
NIST11 ($$)            244,000         electron ionization (EI 70 eV)
Wiley 10 ($$)          719,000         electron ionization (EI 70 eV)
MassBank (free)        12,000          electron ionization (EI 70 eV)
NIST12 MS/MS ($$)      95,400          MS/MS (ESI, +/-, 30-100V CID)
METLIN (open, $$)      57,972          MS2 (30-100V CID, QTOF)
MassBank (free)        18,000          MS2 (ESI, APCI)
MassFrontier ($$)      7,000           MSn, ESI (Spectral Tree Library)
RIKEN ReSpect (free)   9,000           MS2 (ESI)
LipidBlast (CC-BY)     212,516         MS2 (in-silico, computer generated); 119,200 compounds
Data quality is important: annotation with InChI, InChIKey, structure and formula + metadata
See: http://www.sisweb.com/software/ms/wiley.htm
Mass spectral databases II
Smaller specialized libraries
• Pfleger Maurer Weber (Drugs), MS+RI, 70 eV
• MassFinder (Volatiles), MS+RI, 70 eV
• RIZA DB (Toxicants), MS+RI, 70 eV
• Golm DB (primary Metabolites), MS+RI, 70 eV
• Fiehnlib (primary Metabolites), MS+RI, 70 eV
• AAFS (Drugs, Forensic, Toxicology), MS+RI, 70 eV
• ChemicalSoft (Drugs), MS/MS, MSE
For electron ionization (EI), the same GC column (DB-5, RTX-5, DB-1, OV-1) and temperature program must be used for matching retention indices.
For ESI and APPI spectra (LC-MS), the same mass spectrometer design and setup (triple-quad, ion-trap, TOF, Q-TOF) and the same collision energy should be used.
Mass spectral search
• Library search is always the first step during the identification process.
• Usually library search is not enough to assign unique isomer structures.
• Mass spectra must be clean and background free before search.
• For LC-MS and GC-MS this requires peak picking and deconvolution.
• Additional orthogonal information has to be used:
  - restriction of compound space to certain species or material
  - use of isotope pattern information
  - use of retention index if derived from GC-MS data
  - use of retention vs. logP or logD correlations in case of LC-MS
  - additional fragmentation at different voltages (MSE)
Only certain mass spectra can be predicted in silico (calculated), e.g. for peptides, lipids and carbohydrates; this is not the rule for other molecules.
Mass spectral search algorithms
• PBM - Probability Based Matching (McLafferty & Stauffer) – since 1976
• Dot Product (Finnigan/INCOS) – since 1978
• Weighted Dot Product (Stein) – since 1993
• Mass Spectral Tree Search (Mistrik) – since 21st century
Weighted Dot Product:
• Au and Ar: abundances of peaks in the user and reference mass spectra
• m: m/z values
• w: weighting term
Source: Stein S.E. see notes
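The formula itself is not reproduced in this text, so the following is only a minimal sketch of one common way to compute a weighted dot product (a cosine similarity over weighted peaks). The function name, the dict-based spectrum representation and the exponents `mass_power` and `intensity_power` are assumptions for illustration, not values taken from the slide or from the NIST software.

```python
import math


def weighted_dot_product(user, ref, mass_power=1.0, intensity_power=0.5):
    """Cosine similarity between two unit-mass spectra.

    user, ref: dicts mapping integer m/z -> abundance (Au and Ar).
    Each peak gets the weight w = (m/z)**mass_power * abundance**intensity_power.
    The default exponents are illustrative assumptions.
    """
    def weights(spectrum):
        return {
            mz: (mz ** mass_power) * (abundance ** intensity_power)
            for mz, abundance in spectrum.items()
            if abundance > 0
        }

    wu, wr = weights(user), weights(ref)
    shared = wu.keys() & wr.keys()  # only peaks present in both spectra contribute
    numerator = sum(wu[mz] * wr[mz] for mz in shared)
    denominator = math.sqrt(
        sum(v * v for v in wu.values()) * sum(v * v for v in wr.values())
    )
    return numerator / denominator if denominator else 0.0


# Hypothetical spectra, for illustration only
user_spectrum = {41: 30.0, 57: 100.0, 71: 45.0}
reference_spectrum = {41: 25.0, 57: 100.0, 85: 10.0}
print(round(weighted_dot_product(user_spectrum, reference_spectrum), 3))
```

Raising the mass exponent gives more influence to high m/z peaks, which tend to be more structure specific than the abundant low-mass fragments.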
NIST MS Search GUI
Search everything:
A) Library Search: Reverse, Normal, Similarity, Neutral Loss
B) Structure Similarity Search: find molecules similar to
C) Formula Search: find C11H13N3O3S
D) Constrained peak search: find peaks with m/z 122 and 188 and 266
E) Name search: find Stuntman (maleic hydrazide)
Search Connections:
• Import/Export molecular structures (msp, hpj, sdf)
• Interpret Structures (MSInterpreter.exe)
• Find substructures (expert algorithm)
• Import spectra from other programs (AMDIS, Chemstation, ChromaTOF)
[Download] – freely available (NIST12 MS Library is licensed ~ $1200)
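The constrained peak search (item D) can be pictured as a filter over library spectra. This is a minimal sketch and not the NIST implementation: the function name, the dict-based library layout and the three library entries are hypothetical.

```python
def constrained_peak_search(library, required_mz, min_abundance=0.0):
    """Return names of library entries whose spectra contain every required m/z.

    library: dict mapping name -> {integer m/z: abundance}.
    A peak counts as present when its abundance exceeds min_abundance.
    """
    hits = []
    for name, spectrum in library.items():
        if all(spectrum.get(mz, 0.0) > min_abundance for mz in required_mz):
            hits.append(name)
    return hits


# Hypothetical library entries, for illustration only
library = {
    "compound A": {122: 40.0, 188: 100.0, 266: 15.0},
    "compound B": {122: 100.0, 188: 5.0},
    "compound C": {91: 100.0, 266: 60.0},
}
print(constrained_peak_search(library, [122, 188, 266]))  # ['compound A']
```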
NIST MS mass spectral search
The NIST MS Search program is the “gold standard” for EI spectral search. It is used for all types of unit resolution spectra, including MS/MS, APCI and ESI-MS spectra.
NIST MS Search GUI and NIST12 DB 120,000 MS/MS spectra; 15,000 precursor ions (adducts); 7000 compounds MS2, MS3, MS4 data; up to 15 different ionization energies 12k iontrap, 9k QTOF/QQQ; 90% pos ionization, 10% neg ionization
MS/MS search for small molecules
General concept: MS/MS spectrum; 1) precursor match; 2) product ion search against an MS/MS DB (e.g. LipidBlast, 200k tandem mass spectra); results with annotation
Instruments, from low to high resolution: Iontrap, Triple-Quad, Q-TOF, TOF-TOF, Orbitrap, FT-ICR-MS
1) Precursor match: searches ±0.4 Da (iontrap) to ±0.005 Da (QTOF) windows. Powerful pre-filter, removes up to 99% of the wrong candidates.
2) Product ion match: matches ions according to old-school similarity.
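The two-step concept (precursor filter first, product ion scoring second) can be sketched as follows. This is an illustration under assumptions, not the algorithm of any particular program: the function names, the record layout, the simple cosine-style score and all library entries are hypothetical.

```python
import math


def precursor_filter(query_mz, library, tol_da):
    """Step 1: keep only entries whose precursor m/z lies within +/- tol_da."""
    return [e for e in library if abs(e["precursor_mz"] - query_mz) <= tol_da]


def product_ion_score(query_peaks, ref_peaks, tol_da):
    """Step 2: cosine-style score; each query peak pairs with the first
    unused reference peak within tol_da."""
    used = set()
    numerator = 0.0
    for q_mz, q_int in query_peaks:
        for i, (r_mz, r_int) in enumerate(ref_peaks):
            if i not in used and abs(q_mz - r_mz) <= tol_da:
                numerator += q_int * r_int
                used.add(i)
                break
    denominator = math.sqrt(
        sum(i * i for _, i in query_peaks) * sum(i * i for _, i in ref_peaks)
    )
    return numerator / denominator if denominator else 0.0


def msms_search(query, library, precursor_tol, product_tol):
    """Return (score, name) pairs for all precursor-matched entries, best first."""
    candidates = precursor_filter(query["precursor_mz"], library, precursor_tol)
    scored = [
        (product_ion_score(query["peaks"], c["peaks"], product_tol), c["name"])
        for c in candidates
    ]
    return sorted(scored, reverse=True)


# Hypothetical library and query, for illustration only
library = [
    {"name": "entry X", "precursor_mz": 609.280,
     "peaks": [(195.07, 100.0), (397.21, 40.0), (448.20, 60.0)]},
    {"name": "entry Y", "precursor_mz": 609.300,
     "peaks": [(174.09, 100.0), (448.20, 20.0)]},
    {"name": "entry Z", "precursor_mz": 500.000,
     "peaks": [(195.07, 100.0)]},
]
query = {"precursor_mz": 609.281, "peaks": [(195.07, 90.0), (448.20, 70.0)]}

print(msms_search(query, library, precursor_tol=0.4, product_tol=0.01))
print(msms_search(query, library, precursor_tol=0.005, product_tol=0.01))
```

With the wide ion trap window two candidates survive the precursor filter; with the narrow QTOF window only one does, which shows why the precursor match is such an effective pre-filter.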
Searching 10,000 MS/MS spectra as batch (NIST MS PepSearch GUI)
• Input: MGF file from QTOF or iontrap
• MS/MS library: DO NOT load into memory (Bug 2012)
• Output folder; output: EXCEL (tab separated file)
• Search speed: 1500 spectra / second
• Time demand setup: 30 seconds
• Run time per MGF: <5 seconds
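For readers who have not seen an MGF file, this is a minimal sketch of how such a batch file can be read. It assumes the common layout of `BEGIN IONS` / `END IONS` blocks with `KEY=value` lines followed by "m/z intensity" peak lines; the function name and the sample spectra are hypothetical, and real files may contain features this sketch ignores.

```python
def parse_mgf(text):
    """Parse MGF-formatted text into a list of spectra.

    Each spectrum is a dict with 'params' (KEY -> string value)
    and 'peaks' (list of (m/z, intensity) tuples).
    """
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                fields = line.split()
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((float(fields[0]), intensity))
    return spectra


# Hypothetical two-spectrum MGF text, for illustration only
sample = """BEGIN IONS
TITLE=scan 45
PEPMASS=609.281
CHARGE=1+
195.07 1000
448.20 640
END IONS

BEGIN IONS
TITLE=scan 46
PEPMASS=268.075
156.01 800
END IONS
"""

for spectrum in parse_mgf(sample):
    print(spectrum["params"]["TITLE"], len(spectrum["peaks"]))
```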
Excel output for 10k MS/MS spectra with metabolite name (NIST MS PepSearch GUI)
MS/MS annotations of similar molecules require retention time confirmation
Mass Spectral Trees in Mass Frontier
Mass Frontier searches MSn and CID mass spectra
Source: MassFrontier Helpfile
Mass Frontier MS search: MS Tree, Hitlist
[Figure: MS3 spectrum, Reserpine-iontree3 # 45-45, RT: 0.48-0.48, ITMS + c ESI d Full ms3 609.95@cid35.00 448.20@cid35.00 [110.00-460.00]; base peak m/z 195.00; axes: m/z vs. relative abundance]
[Figure: MS4 spectrum, Reserpine-iontree3 # 215, RT: 2.75, ITMS + c ESI d Full ms4 609.95@cid35.00 448.20@cid35.00 236.06@cid35.00 [50.00-250.00]; base peak m/z 204.10; axes: m/z vs. relative abundance]
Linear ion traps, Orbitraps and FT-ICRs can easily create MSn
• Ion Map: one MS/MS spectrum for every m/z value in the mass range 100-650 Da
• Ion Tree: perform data dependent MS2, MS3, MS4 scans over the whole mass range
Comprehensive ion mapping and ion tree experiments using diverse compound sets will solve many fragmentation mysteries
Conversion of mass spectral libraries
Usually a hassle. Always keep a copy of libraries in a non-proprietary format. Request export functions or converters from your mass spectrometer vendor.
• XCalibur LibraryManager.exe (Thermo Electron Fisher): Finnigan MAT ICIS/GCQ/ITS 40 (*.lib, *.lbr), AutoMass (*.spr, *.prs, *.nam, *.hdr, *.fsf, *.cfs), MassLab (*.idb) to NIST and vice versa
• NIST LIB2NIST.exe [LINK]: spectral files *.msd, *.hpj, *.sdf; HP LIB (*.LIB), NIST LIB, JCAMP-DX (*.jdx, *.hpj)
How to search molecules
• Exact search
• Substructure search
• Similarity search
• Ligand search
• R-group/Markush search
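To make the difference between similarity search and substructure search concrete, here is a minimal sketch based on fingerprints stored as sets of feature ids. It is an illustration only: the fingerprints are hypothetical, and the subset test is merely a pre-screen, since a real substructure search still needs graph matching by a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared features divided by all features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_search(query_fp, database, threshold=0.7):
    """Return (score, name) pairs with Tanimoto >= threshold, best first."""
    hits = [(tanimoto(query_fp, fp), name) for name, fp in database.items()]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)


def substructure_screen(query_fp, database):
    """Pre-screen: a molecule can only contain the query substructure
    if it has every feature of the query fingerprint."""
    query = set(query_fp)
    return [name for name, fp in database.items() if query <= set(fp)]


# Hypothetical fingerprints (sets of feature ids), for illustration only
database = {
    "mol A": {1, 2, 3, 4},
    "mol B": {1, 2, 3},
    "mol C": {7, 8},
}
query_fp = {1, 2, 3}

print(similarity_search(query_fp, database))    # [(1.0, 'mol B'), (0.75, 'mol A')]
print(substructure_screen(query_fp, database))  # ['mol A', 'mol B']
```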
ChemAxon Instant-JChem desktop database Search structures, formulae, properties, exact masses, adducts
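Exact mass and adduct searches rest on simple arithmetic, sketched here in plain Python rather than with the ChemAxon tools. The function names are hypothetical, the element table covers only C, H, N, O and S, and the formula parser handles only flat formulae without brackets or charges. The example formula C11H13N3O3S is the one used for the NIST formula search in this course.

```python
import re

# Monoisotopic masses of the most abundant isotopes (Da)
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "S": 31.9720707,
}

# Mass shifts for singly charged adducts (Da); the electron mass is accounted for
ADDUCT_SHIFT = {
    "[M+H]+": 1.00727646,
    "[M+Na]+": 22.98922070,
    "[M-H]-": -1.00727646,
}


def monoisotopic_mass(formula):
    """Sum element masses for a flat formula such as 'C11H13N3O3S'.

    Raises KeyError for elements missing from MONOISOTOPIC.
    """
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def adduct_mz(formula, adduct):
    """m/z of a singly charged adduct ion of the neutral molecule."""
    return monoisotopic_mass(formula) + ADDUCT_SHIFT[adduct]


print(round(monoisotopic_mass("C11H13N3O3S"), 4))  # 267.0678
print(round(adduct_mz("C11H13N3O3S", "[M+H]+"), 4))  # 268.075
```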
ChemAxon JCHEM for EXCEL Can be edited Property values are computed from structure Instant calculation and visualization using charts
NIST MS DB has structure similarity search Good for comparing mass spectra of similar compounds (may have similar mass spectra)
PubChem open molecule database
• 47,750,434 compounds
• 119,809,272 substances (salt forms, acids, modifications)
• 717,429 bioassays
All structures and properties can be searched and downloaded, and are linked
Picture source: PubChem
Searching Molecules on PubChem
18 million compound DB (++)
Go to PubChem Structure Search
Searching “everything” on ChemSpider Highly curated, literature, patents, properties, links to other DB
CAS SciFinder
• 73 million molecules
• 70 million commercially available products
• largest reaction DB (53 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good Import/Export
Download SciFinder
Structure search in SciFinder Retrieved 4000 papers (refine search only MS and MALDI)
How scientists publish mass spectra (*)
Today: Scientist A runs MS and publishes on paper (PDF) as a bitmap graphic. Scientist B needs a DB: OCR, DB curation, DB creation, sell DB.
Better: a central and open DB repository such as MassBank; electronic publishing in XML; computerized, free or paid curation.
OCR: optical character recognition; DB: database; (*): and structures and other spectral data
Open data repository for mass spectra and metabolomics data
• No loss of information (high resolution spectra)
• No truncated data (report five peaks only)
• No hamburger to cow algorithm needed (OCR)
• Fast and instant use with no restrictions
• New synergism for data interpretation
• Commercial use may be possible
NIH funded the METABOLOMICS DATA CENTER to collect and share metabolomics data ($6M)
DB: central and open repository … check out MassBank … check out MetaboLights
The Last Page - What is important to remember
• There are different search types for mass spectral data
  - similarity search, reverse search, neutral loss search, MS/MS search
• There are large libraries for electron ionization spectra (EI) from GC-MS
• There are no large open/commercial libraries for spectra from LC-MS
• For creation of mass spectral libraries a holistic approach is important
• Mass spectral trees can give further information (MSE or MSn)
• There are different types of searching structures
  - exact search, similarity search, substructure search
Before you start a research project, create target lists of possible candidates
Collect mass spectra or structures in libraries with references