A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青
Outline • Introduction • Methods • Data source • Methods of comparison • Results • Conclusion
Introduction • Using protein mass spectrometry to discriminate diseased from healthy individuals is becoming increasingly popular • MALDI-TOF and SELDI-TOF offer several advantages: • Speed • High throughput • Accuracy • Protein identification
Example: ovarian cancer data • Large scale • Very noisy • Nonlinear and non-stationary
Motivation • The analysis of mass spectra seems simple. However, it suffers from several problems that need to be solved.
Common preprocessing steps • Baseline subtraction • Denoising: very important and also complicated • Normalization • Peak detection • Peak alignment
Noise components • Chemical noise: • Comes from the matrix material and sample contamination • Is itself a kind of biochemical material • Electrical noise: • Arises from the physical characteristics of the machine • Carries no biological meaning
Chemical noise • A chemical component (an organic acid), called the matrix • Ionization • Provides H+ to peptides or proteins so they ionize and fly through the machine • Protection • Protects the peptides or proteins during the laser flash
Problem • Previous simulation models did not separate the chemical noise from the spectra. • Chemical noise mixed with machine noise makes the problem worse. • A novel preprocessing method should be developed
Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29.
Goal • Develop a method that satisfies two requirements: • Remove the electrical noise • Preserve the significant peaks, even those arising from the chemical noise
Methods • Hilbert Huang transform • Denoising • Modification • Baseline subtraction • Rescale • Peak detection
Hilbert Huang transform • Method: Hilbert Huang transform (HHT) • Wu, Z., N. E. Huang, et al. (2007). "On the trend, detrending, and variability of nonlinear and nonstationary time series." Proc Natl Acad Sci U S A 104(38): 14889-94. • An adaptive data analysis method for nonlinear and non-stationary processes • The main feature of HHT is the empirical mode decomposition (EMD) • EMD decomposes the signal into intrinsic mode functions (IMFs), several of which are removed as noise • Goal: denoising
Process of EMD (1/5) • Find the envelope of the local maxima
Process of EMD (2/5) • Find the envelope of the local minima
Process of EMD (3/5) • Compute the mean envelope from the maximum envelope and minimum envelope
Process of EMD (4/5) • We obtain h1, the first candidate for IMF1, by subtracting the mean envelope m1 from the original signal X(t)
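Steps 1 to 4 can be written compactly as below. The envelope symbols e_max(t) (through the local maxima) and e_min(t) (through the local minima) are our own notation; the slides only name m1, h1 and X(t).

```latex
m_1(t) = \frac{e_{\max}(t) + e_{\min}(t)}{2}, \qquad h_1(t) = X(t) - m_1(t)
```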
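Process of EMD (5/5) • We treat h1 as the new signal and repeat steps 1 to 4 (sifting) until it qualifies as an IMF: the numbers of extrema and zero crossings are equal or differ by at most one, and the mean envelope is close to zero • The IMF is then subtracted from X(t), and the residue is decomposed in the same way; the process terminates when the residue becomes monotonic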
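The sifting loop above can be sketched in pure Python. This is a minimal illustration, not the implementation used in the study, and it rests on stated assumptions: envelopes are built by linear interpolation between extrema (standard EMD uses cubic splines), sifting stops as soon as the extrema/zero-crossing test passes or after a hypothetical cap of 50 iterations, and the cap of 16 IMFs is chosen only to match the 16 IMFs shown in the slides.

```python
import bisect


def local_extrema(x):
    """Indices of interior local maxima and minima of the sequence x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def interp(t, xp, fp):
    """Linear interpolation at t through (xp, fp); held constant beyond the ends."""
    if t <= xp[0]:
        return fp[0]
    if t >= xp[-1]:
        return fp[-1]
    j = bisect.bisect_right(xp, t)  # xp[j - 1] <= t < xp[j]
    t0, t1 = xp[j - 1], xp[j]
    return fp[j - 1] + (fp[j] - fp[j - 1]) * (t - t0) / (t1 - t0)


def envelope(x, idx):
    """Envelope through the samples of x at indices idx (steps 1 and 2)."""
    values = [x[i] for i in idx]
    return [interp(t, idx, values) for t in range(len(x))]


def is_imf(h):
    """Simplified IMF test: #extrema and #zero crossings differ by at most one."""
    maxima, minima = local_extrema(h)
    zero_crossings = sum(1 for a, b in zip(h, h[1:]) if a * b < 0)
    return abs(len(maxima) + len(minima) - zero_crossings) <= 1


def sift(x, max_sift=50):
    """Extract one IMF from x by repeated sifting (steps 1 to 4)."""
    h = list(x)
    for _ in range(max_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        mean = [(u + l) / 2 for u, l in zip(upper, lower)]  # step 3: mean envelope
        h = [a - m for a, m in zip(h, mean)]  # step 4: subtract it
        if is_imf(h):
            break
    return h


def emd(x, max_imfs=16):
    """Decompose x into IMFs plus a residue (step 5)."""
    residue = list(x)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:  # residue is (nearly) monotonic
            break
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue
```

By construction the IMFs plus the residue add back up to the input, which is what makes the partial reconstruction used for denoising possible.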
HHT • IMFs 1 to 16 (figure)
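Because EMD writes the spectrum as the sum of its IMFs plus a residue, denoising amounts to a partial reconstruction that leaves some IMFs out. The slides do not say which IMFs were discarded; this sketch assumes, hypothetically, that the first `n_noise` IMFs (the highest-frequency ones) carry the electrical noise.

```python
def reconstruct_without_noise(imfs, residue, n_noise=2):
    """Drop the first n_noise IMFs and sum the remaining IMFs with the residue.

    n_noise is a hypothetical tuning parameter; n_noise=0 returns the full
    reconstruction of the original signal.
    """
    if not 0 <= n_noise <= len(imfs):
        raise ValueError("n_noise must be between 0 and the number of IMFs")
    denoised = list(residue)
    for imf in imfs[n_noise:]:
        denoised = [d + c for d, c in zip(denoised, imf)]
    return denoised


# Toy example: three "IMFs" and a constant residue.
imfs = [[1.0, -1.0, 1.0], [0.5, 0.5, -0.5], [0.1, 0.2, 0.3]]
residue = [10.0, 10.0, 10.0]
print(reconstruct_without_noise(imfs, residue, n_noise=1))
```

Choosing `n_noise` trades noise removal against peak preservation: dropping too many IMFs would also erase the narrow peaks the method is meant to keep.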
Modification • Baseline subtraction • Removes systematic artifacts • Rescale • Shifts the intensities to positive values • Peak detection • A key step of the preprocessing method • We compare several popular methods
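The slides do not specify how the baseline is estimated or how the rescaling is done. As one possible sketch, assume a baseline given by a moving minimum smoothed with a moving mean (the half-window width is a hypothetical parameter), and a rescale that shifts the whole spectrum up by its most negative value so that all intensities become non-negative.

```python
def moving_min(x, half_width):
    """Minimum of x over a window of +/- half_width samples (truncated at the ends)."""
    return [min(x[max(0, i - half_width): i + half_width + 1]) for i in range(len(x))]


def moving_mean(x, half_width):
    """Mean of x over a window of +/- half_width samples (truncated at the ends)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def subtract_baseline(x, half_width=50):
    """Assumed baseline: smoothed moving minimum; the result may dip below zero."""
    baseline = moving_mean(moving_min(x, half_width), half_width)
    return [a - b for a, b in zip(x, baseline)]


def rescale_to_positive(x):
    """Shift the spectrum so that its minimum is zero if it has negative values."""
    lowest = min(x)
    return [v - lowest for v in x] if lowest < 0 else list(x)


spectrum = [5.0] * 200
spectrum[100] = 15.0  # a single peak on a flat baseline
corrected = rescale_to_positive(subtract_baseline(spectrum, half_width=20))
print(corrected[100])
```

Smoothing the moving minimum keeps the baseline from following narrow dips, which is also why the corrected spectrum can go negative and needs the rescaling step.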
Peak detection • We use three popular algorithms for peak detection • MassSpecWavelet (Du, Kibbe et al. 2006) • SpecAlign (Wong, Cagney et al. 2005) • PROcess (Li 2005)
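To make "peak detection" concrete, here is a generic local-maximum detector with a robust noise threshold. It is deliberately simple and is not the algorithm of MassSpecWavelet (which uses a continuous wavelet transform), SpecAlign, or PROcess; the signal-to-noise factor and window size are hypothetical defaults.

```python
import statistics


def detect_peaks(mz, intensity, snr=3.0, half_window=5):
    """Return the m/z values of local maxima that exceed a robust noise threshold.

    Noise is estimated with the median absolute deviation (MAD). A point is kept
    if it is the largest value in its +/- half_window neighbourhood, is strictly
    larger than every earlier point in that neighbourhood (so a flat top is
    reported once), and exceeds median + snr * noise.
    """
    values = list(intensity)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    noise = 1.4826 * mad  # MAD scaled to a Gaussian standard deviation
    threshold = med + snr * noise
    peaks = []
    n = len(values)
    for i, v in enumerate(values):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        is_window_max = v == max(values[lo:hi])
        beats_left = v > max(values[lo:i], default=float("-inf"))
        if is_window_max and beats_left and v > threshold:
            peaks.append(mz[i])
    return peaks


mz = [1000.0 + 10.0 * i for i in range(200)]
intensity = [1.0 if i % 2 else 0.0 for i in range(200)]  # small alternating noise
intensity[50], intensity[150] = 20.0, 30.0  # two real peaks
print(detect_peaks(mz, intensity))
```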
Data source • Source: National Cancer Institute • Type: ovarian cancer data, 50 samples • Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88 • Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29
Methods of comparison • Criteria • Number of peaks detected • Actual locations of the peaks, by visual inspection • Interior comparison • HHT and modification + SpecAlign • HHT and modification + PROcess • HHT and modification + MassSpecWavelet
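The first criterion, the number of peaks detected, is easy to compute over the whole spectrum or over a sub-range such as the 6000 to 8000 m/z window used in some result slides. The sketch below also shows one way to check whether specific reference peaks were recovered; the tolerance and the reference list are hypothetical, since the slides judged peak locations visually.

```python
def count_in_range(peaks, lower=None, upper=None):
    """Number of peak m/z values within [lower, upper]; None means unbounded."""
    return sum(
        1
        for m in peaks
        if (lower is None or m >= lower) and (upper is None or m <= upper)
    )


def recovered(reference, detected, tol):
    """Reference peaks that have at least one detected peak within +/- tol (m/z)."""
    return [r for r in reference if any(abs(r - d) <= tol for d in detected)]


detected = [5000.0, 6500.0, 7200.0, 7999.5, 8100.0]  # hypothetical peak list
print(count_in_range(detected), count_in_range(detected, 6000, 8000))
print(recovered([6500.0, 9000.0], detected, tol=2.0))
```

A method that loses a significant peak shows up as a reference peak missing from the `recovered` list, independently of how many peaks it reports overall.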
Methods of comparison • Exterior comparison • SpecAlign • Abbreviation: SA • PROcess • Interpolation: PRO1 • Regression: PRO2 • MassSpecWavelet • Abbreviation: MSW • PRO2 + MSW • As suggested in Cruz-Marcelo, Guerra et al. 2008
Results of interior comparison
Modified HHT + MSW • Peaks detected: 218 • m/z range: whole region
Modified HHT + SpecAlign • Peaks detected: 80 • m/z range: whole region
Modified HHT + PROcess • Peaks detected: 108 • m/z range: whole region • Significant peak lost
Results of exterior comparison
PRO1 • Peaks detected: 145 • m/z range: whole region (Meuleman et al. 2008)
PRO2 • Peaks detected: 114 • m/z range: whole region (Meuleman et al. 2008)
MSW • Peaks detected: 188 • m/z range: whole region (Meuleman et al. 2008)
MSW • Peaks detected: 25 • m/z range: 6000 to 8000
PRO2 + MSW • Peaks detected: 198 • m/z range: whole region (Meuleman et al. 2008)
PRO2 + MSW • Peaks detected: 33 • m/z range: 6000 to 8000
Results • Interior comparison: • HHT and modification + MSW covers the most peaks • HHT and modification + SpecAlign picks the most important peaks • Exterior comparison: • PROcess misses significant peaks • MassSpecWavelet and PRO2 + MSW produce many redundant peaks
Results of validation • Validation • Data source: Cathay General Hospital • Experiments: • Divided into three experiments • Water only • VrD1
Water • Sample: water • Organic acid: CHCA (< 1000 Da)
VrD1 • Sample: VrD1 • Type: protein • Organic acid: CHCA (< 1000 Da) • Molecular weight: 5119 Da
Results of validation • Number of peaks detected
Peaks of VrD1 detected by MassSpecWavelet • Molecular weight: 5119 Da
Peaks of VrD1 detected by SpecAlign • Molecular weight: 5119 Da
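To check whether a method recovered VrD1, one can compute where its protonated ions should appear and look for detected peaks nearby. This sketch assumes that 5119 Da is the neutral mass, that the ions are [M+zH]z+ with a proton mass of 1.007276 Da, and a hypothetical tolerance of 500 ppm; the slides do not state the charge states or tolerance that were used.

```python
PROTON_MASS = 1.007276  # Da


def expected_mz(neutral_mass, charge):
    """m/z of the [M+zH]z+ ion for a neutral mass M and charge z."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def find_target(peaks, neutral_mass, charges=(1, 2), tol_ppm=500.0):
    """Detected peaks within tol_ppm of each expected charge state."""
    hits = {}
    for z in charges:
        target = expected_mz(neutral_mass, z)
        tol = target * tol_ppm * 1e-6
        hits[z] = [p for p in peaks if abs(p - target) <= tol]
    return hits


detected = [2560.4, 5120.5, 6000.0]  # hypothetical peak list
print(find_target(detected, 5119.0))
```

An empty list for z = 1 would suggest that the method missed the main VrD1 ion.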