Information-Theoretic Mass Spectral Library Search. Arvind Visvanathan. CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications. Outline: Introduction, Related Work, Method, Results and Discussion.
Information-Theoretic Mass Spectral Library Search
Arvind Visvanathan
CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications
CSCE 990 – GCxGC Seminar
Outline
• Introduction
  • Mass spectrum search types
• Related Work
  • Other techniques
  • NIST, PBM, DotMap
• Method
  • Probability and Information
  • Normalized distribution function
• Results
• Conclusion
Introduction – Mass Spectrum
[Figure: mass spectrum of decane; axes: m/z and intensity]
Introduction – Mass Spectrum Search
[Diagram labels: Unknown Spectrum, MS Library, Search Algorithm, Potential Matches]
Introduction – Search Types
• Identity search
  • Unknown mass spectrum present in library
  • Looking for the exact spectrum
• Similarity search
  • Unknown mass spectrum not present in library
  • Looking for a similar spectrum
Introduction – MS Search Applications
• Steroid detection in athletes
• Monitor patient breath during surgery
• Composition of molecular species found in space
• Honey adulterated with corn syrup
• Locate oil deposits
• Monitor fermentation process in the biotechnology industry
• Detect dioxins in contaminated fish
Related Work – NIST MS-Search [Stein '94]
• Pre-search the library for the unknown spectrum
  • Reduces the search domain (160K → 4K compounds)
• Compute a match factor for each compound in the pre-search result
• Match Factor (MF)
  • Range 0–999
  • Higher is better
• Sort the pre-search result by MF value
• Pick the topmost compounds as possible matches
Related Work – NIST MS-Search [Stein '94]
• Match Factor Computation [Stein '94]
  • Term 1 (F1) – Mass-weighted normalized dot product
  • Term 2 (F2) – Relative intensities of adjacent peaks in both spectra
  • MF is a combination of F1 and F2
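A minimal sketch of what a mass-weighted normalized dot product (Term 1) can look like. The slide does not give the weighting, so the exponents `mass_power` and `intensity_power` are illustrative assumptions, as is the representation of a spectrum as a dict from integer m/z to intensity. This is not the NIST implementation.

```python
import math


def weighted_cosine(unknown, library, mass_power=1.0, intensity_power=0.5):
    """Mass-weighted normalized dot product between two spectra.

    Each spectrum is a dict mapping integer m/z to intensity.
    The default exponents are assumptions chosen for illustration only.
    """
    def weight(mz, intensity):
        # Scale each peak by its mass and a power of its intensity.
        return (mz ** mass_power) * (intensity ** intensity_power)

    mzs = sorted(set(unknown) | set(library))
    u = [weight(mz, unknown.get(mz, 0.0)) for mz in mzs]
    v = [weight(mz, library.get(mz, 0.0)) for mz in mzs]

    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Normalizing by both vector lengths keeps the score in [0, 1].
    return dot / norm if norm > 0 else 0.0
```

Identical spectra score 1 and spectra with no shared peaks score 0. Mapping the score onto the 0–999 MF range and combining it with Term 2 are left out, since the slides do not specify them.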
Related Work – NIST MS-Search [Stein '94]
[Figure: labels C-1, C-2]
Related Work – Probability Based Matching [McLafferty et al. '75]
• Confidence Value (K) instead of MF
• Four components for each m/z
  • Term 1: U: Based on the uniqueness of an m/z value
  • Term 2: A: Intensity contribution to the confidence
  • Term 3: W: Window factor (measure of agreement)
  • Term 4: D: Dilution factor (measure of purity)
• K = ∑ (U + A + W – D), summed over each m/z
Related Work – DotMap [Synovec et al. '04]
[Figure: DotMap; labeled compounds: fumaric acid, adipic acid, lactic acid]
Related Work – DotMap [Synovec et al. '04]
• Inverse problem
  • DotMap computed across the image
  • Higher-valued areas indicate the presence of the compound of interest
• Multiple compounds of interest
  • Compute DotMap overlay
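A rough sketch of the "computed across the image" idea, assuming the per-pixel score is a plain normalized dot product between the target spectrum and the spectrum stored at that pixel. The data layout (`image[r][c]` is a list of intensities indexed by m/z channel) and the function names are hypothetical; the published DotMap algorithm may differ in its details.

```python
import math


def normalized_dot(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def dot_map(image, target):
    """Score every pixel of a 2D image against one target spectrum.

    image[r][c] is the spectrum at pixel (r, c). The result has the same
    row/column shape; higher values suggest the target compound is present.
    """
    return [[normalized_dot(pixel, target) for pixel in row] for row in image]
```

For several compounds of interest, one map per target could be computed and the maps overlaid.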
Method – Motivation
• NIST MS-Search [Stein '94]
  • No domain information utilized
• PBM Matching [McLafferty et al. '75]
  • Old technique ('75)
  • Ad hoc domain information utilization
• DotMap
  • No domain information utilized
Method – Entropy
• Entropy-based approach
  • Entropy: a measure of the amount of uncertainty
  • Based on probabilities
• Include domain-based knowledge (information) in computing the match factor
Method – Distribution Function
• Library
  • NIST EPA Library
  • 163K compounds
• Compute distribution function (DF)
  • 2-dimensional array
  • m/z vs. intensity
• DF[i][j] = number of compounds in the library with
  • m/z = i
  • Intensity = j
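A small sketch of how such a DF array could be tabulated. It assumes intensities are already quantized to integers in a fixed range and that a missing peak counts as intensity 0, so every m/z row sums to the library size. The function name and the dict-based spectrum representation are hypothetical.

```python
def build_distribution(library, max_mz, max_intensity):
    """Tabulate DF[i][j]: number of library spectra with intensity j at m/z i.

    library is a list of spectra, each a dict mapping integer m/z to an
    integer intensity in 0..max_intensity (assumed pre-quantized).
    An m/z absent from a spectrum is counted as intensity 0.
    """
    df = [[0] * (max_intensity + 1) for _ in range(max_mz + 1)]
    for spectrum in library:
        for mz in range(max_mz + 1):
            df[mz][spectrum.get(mz, 0)] += 1
    return df
```

With a three-spectrum toy library, each row of the result sums to 3, just as each row would sum to the number of compounds in the full library.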
Method – Distribution Function
[Figure: distribution function; axes: m/z and intensity]
Method – Normalized Distribution Function (NDF)
• Normalized Distribution Function
  • NDF[mz][int] = DF[mz][int] / ∑_i DF[mz][i]
  • Where ∑_i DF[mz][i] = 163K
• NDF values are probabilities in [0, 1]
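The normalization can be sketched as a per-row division. The function name is hypothetical, and the guard for an all-zero row is an added safety check, not something the slides mention.

```python
def normalize_distribution(df):
    """Turn counts DF[mz][int] into probabilities NDF[mz][int].

    Each row (one m/z) is divided by its own total, so every non-empty
    row of the result sums to 1.
    """
    ndf = []
    for row in df:
        total = sum(row)
        # An empty row cannot be normalized; leave it as zeros.
        ndf.append([count / total if total else 0.0 for count in row])
    return ndf
```

When DF counts a missing peak as intensity 0, every row total equals the library size, which matches the 163K denominator on the slide.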
Method – Assumptions
• Assumption: when the match factor is computed from the normalized distribution function, each m/z is treated independently
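One consequence of the independence assumption is that the joint probability of a spectrum factorizes into per-m/z terms, so log-probabilities (self-information) simply add. The sketch below illustrates only that consequence. It is not the match factor from the slides, whose formula is not given in the text, and the function name and data layout are hypothetical.

```python
import math


def spectrum_information(spectrum, ndf):
    """Total self-information, in bits, of a spectrum under independence.

    spectrum maps integer m/z to a quantized integer intensity; ndf[mz][j]
    is the probability of intensity j at that m/z. A missing m/z is
    treated as intensity 0.
    """
    bits = 0.0
    for mz, row in enumerate(ndf):
        p = row[spectrum.get(mz, 0)]
        if p == 0:
            # An intensity never seen in the library is infinitely surprising.
            return math.inf
        bits += -math.log2(p)
    return bits
```

Rare intensity values at a given m/z contribute many bits and common ones contribute few, which is one way library-wide (domain) knowledge can enter a score.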
Method – Match Factor
Results – Overview
• Technique
  • Take a compound from the library and add noise
  • Search for the noisy compound in the library
• Evaluation metric: Average Rank
  • Rank = position of the correct compound in the hit list
  • Repeat the above 3000 times and take the average rank
• Compared with
  • NIST
  • NISTDOT (first term in the NIST algorithm)
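The average-rank metric can be sketched as follows. The function names and the shape of `trials` are hypothetical, and any search algorithm could supply the scores.

```python
def rank_of(correct_id, scores):
    """1-based position of correct_id in the hit list.

    scores maps compound id to a match score; the hit list is the ids
    sorted by descending score.
    """
    hit_list = sorted(scores, key=scores.get, reverse=True)
    return hit_list.index(correct_id) + 1


def average_rank(trials):
    """Mean rank over trials, each a (correct_id, scores) pair."""
    ranks = [rank_of(correct_id, scores) for correct_id, scores in trials]
    return sum(ranks) / len(ranks)
```

An average rank of 1.0 means the correct compound was the top hit in every trial; larger values mean it was pushed further down the list.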
Results – Noise Models
• Additive: A_U = A_L + G(0, σ)
• Multiplicative: A_U = A_L + A_L * G(0, σ)
• Johnson Colored: A_U = A_L + G(0, σ * √m)
• Random spectrum: A_U = A_L + x * A_R
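A sketch of the four noise models, written directly from the formulas above. It assumes that `m` in the Johnson colored model is the m/z value of the peak, that `x` is a fraction (0.1 for 10%), and that spectra are dicts from m/z to intensity. Function names are hypothetical, and negative intensities are not clipped because the slides do not say how they are handled.

```python
import math
import random


def additive(spectrum, sigma, rng):
    """A_U = A_L + G(0, sigma) for each m/z in the library spectrum."""
    return {mz: a + rng.gauss(0.0, sigma) for mz, a in spectrum.items()}


def multiplicative(spectrum, sigma, rng):
    """A_U = A_L + A_L * G(0, sigma): noise scales with peak intensity."""
    return {mz: a + a * rng.gauss(0.0, sigma) for mz, a in spectrum.items()}


def johnson_colored(spectrum, sigma, rng):
    """A_U = A_L + G(0, sigma * sqrt(m)), with m assumed to be the m/z."""
    return {
        mz: a + rng.gauss(0.0, sigma * math.sqrt(mz))
        for mz, a in spectrum.items()
    }


def random_spectrum(spectrum, other, x):
    """A_U = A_L + x * A_R over every m/z present in either spectrum."""
    mzs = set(spectrum) | set(other)
    return {mz: spectrum.get(mz, 0.0) + x * other.get(mz, 0.0) for mz in mzs}
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes a noisy query reproducible across the compared algorithms.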
Results – Additive Noise
• Compound = Compound + Additive noise
• Additive Gaussian noise
  • Zero mean
  • Variable standard deviation
• For each m/z in the library spectrum: A_U = A_L + G(0, σ)
Results – Additive Noise (Example)
Results – Additive Noise (Performance)
Results – Multiplicative Noise
• Compound = Compound + Multiplicative noise
• Multiplicative Gaussian noise
  • Zero mean
  • Variable standard deviation
• For each m/z in the library spectrum: A_U = A_L + A_L * G(0, σ)
Results – Multiplicative Noise (Example)
Results – Multiplicative Noise (Performance)
Results – Johnson Colored Noise
• Compound = Compound + Colored noise
• Gaussian noise
  • Zero mean
  • Variable standard deviation
• For each m/z in the library spectrum: A_U = A_L + G(0, σ * √m)
Results – Johnson Colored Noise (Example)
Results – Johnson Colored Noise (Performance)
Results – Random Spectrum Noise
• Compound = Compound + Random spectrum
• Additive spectrum
  • Add x% of another random spectrum
• For each m/z in the library or random spectrum: A_U = A_L + x * A_R
Results – Random Spectrum Noise (Example)
Results – Random Spectrum Noise (Performance)
Results – Summary of Noise Models
• Additive: A_U = A_L + G(0, σ)
• Multiplicative: A_U = A_L + A_L * G(0, σ)
• Johnson Colored: A_U = A_L + G(0, σ * √m)
• Random Spectrum: A_U = A_L + x * A_R
Conclusion
• MS library search algorithm
  • Information theoretic
  • Domain knowledge incorporated
• Algorithm works well for various noise models
• Future work
  • Must improve performance for the random spectrum noise case
Questions & Suggestions?