
  1. A novel preprocessing method using Hilbert Huang transform for MALDI-TOF and SELDI-TOF mass spectrometry data 吳立青

  2. Outline • Introduction • Methods • Data source • Methods of comparison • Results • Conclusion

  3. Introduction • Using protein mass spectrometry to discriminate diseased from healthy individuals is becoming increasingly popular • MALDI-TOF and SELDI-TOF have several advantages: • Fast • High throughput • Accurate • Protein identification

  4. SELDI-TOF MS applications in clinical oncology

  5. The example of ovarian cancer data • Large scale • Full of noise • Nonlinear and non-stationary

  6. Motivation • The analysis of mass spectra seems simple. However, it suffers from several problems that need to be solved.

  7. Common preprocessing steps • Baseline subtraction • Denoising: very important and also complicated • Normalization • Peak detection • Peak alignment

  8. Noise components • Chemical noise: • Comes from the matrix material and sample contaminants • Is itself a kind of biochemical material • Electrical noise: • Comes from the physical characteristics of the instrument • Carries no actual meaning

  9. Chemical noise • Chemical component (organic acid) • We call it the matrix • Ionization • Provides H+ to the peptide or protein so that it is ionized and can fly through the instrument • Protection • Protects the peptide or protein during the laser pulse

  10. Problem • Earlier simulation models did not separate the chemical noise from the spectra. • The problem is worse when chemical noise is mixed with machine noise. • A novel preprocessing method should be developed.

  11. Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29.

  12. Goal • Develop a method that meets the following requirements • The electrical noise should be removed • The significant peaks should be preserved, and even the chemical noise

  13. Methods • Hilbert Huang transform • Denoising • Modification • Baseline subtraction • Rescale • Peak detection

  14. Flow chart

  15. Hilbert Huang transform • Method: Hilbert Huang transform (HHT) • Wu, Z., N. E. Huang, et al. (2007). "On the trend, detrending, and variability of nonlinear and nonstationary time series." Proc Natl Acad Sci U S A 104(38): 14889-94. • An adaptive data analysis method for nonlinear and non-stationary processes • The main feature of HHT is the empirical mode decomposition (EMD) • After EMD, we obtain the intrinsic mode functions (IMFs) and remove several of them as noise • Goal: denoising
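The denoising idea on slide 15 (decompose into intrinsic mode functions, then discard some of them as noise) can be sketched as below. This is only an illustration under assumptions: the slides do not say which intrinsic mode functions are removed, so the sketch assumes the first `n_drop` (the highest-frequency ones, which EMD extracts first) are treated as noise. The function name `denoise_from_imfs` and the toy components are hypothetical and not taken from the talk.

```python
import numpy as np

def denoise_from_imfs(imfs, n_drop):
    """Rebuild a signal from its IMFs, leaving out the first n_drop of them.

    Assumption: rows of `imfs` are ordered from highest to lowest frequency,
    and the leading rows are the ones treated as noise.
    """
    imfs = np.asarray(imfs, dtype=float)
    if not 0 <= n_drop <= imfs.shape[0]:
        raise ValueError("n_drop must be between 0 and the number of IMFs")
    return imfs[n_drop:].sum(axis=0)

# Toy components standing in for IMFs (not real EMD output):
# a fast, small oscillation (noise-like), a slower oscillation, and a trend.
t = np.linspace(0.0, 1.0, 200)
imfs = np.vstack([
    0.1 * np.sin(2 * np.pi * 60 * t),
    np.sin(2 * np.pi * 5 * t),
    0.5 * t,
])
signal = imfs.sum(axis=0)
denoised = denoise_from_imfs(imfs, n_drop=1)
```

With `n_drop=0` the function returns the original signal, which is a quick sanity check that the decomposition sums back to the data.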

  16. Process of EMD (1/5) • Find the envelope of the local maxima

  17. Process of EMD (2/5) • Find the envelope of the local minima

  18. Process of EMD (3/5) • Compute the mean envelope from the maximum envelope and minimum envelope

  19. Process of EMD (4/5) • We get IMF1 (i.e. h1) by subtracting the mean envelope m1 from the original signal X(t)

  20. Process of EMD (5/5) • We take IMF1 as X(t) and repeat the same process, and so on. • We repeat until the number of extrema and the number of zero-crossings of IMFn differ by at most one
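Slides 16 to 20 can be read as one sifting pass followed by a stopping check. The sketch below is a simplified illustration, not the implementation used in the talk: it assumes linear interpolation (`np.interp`) for the envelopes where EMD implementations normally use cubic splines, it holds the envelopes constant beyond the outermost extrema, and the names `local_extrema`, `sift_once`, `is_imf_like` and `extract_imf` are hypothetical. The check uses the usual IMF condition that the number of extrema and the number of zero-crossings differ by at most one.

```python
import numpy as np

def local_extrema(x):
    """Return indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def sift_once(x):
    """One sifting pass: h = X(t) - m, with m the mean of the two envelopes."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(len(x))
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        raise ValueError("too few extrema to build envelopes")
    upper = np.interp(idx, maxima, x[maxima])  # envelope of the local maxima
    lower = np.interp(idx, minima, x[minima])  # envelope of the local minima
    mean_envelope = 0.5 * (upper + lower)
    return x - mean_envelope

def is_imf_like(h):
    """True if extrema and zero-crossing counts differ by at most one."""
    h = np.asarray(h, dtype=float)
    maxima, minima = local_extrema(h)
    n_extrema = len(maxima) + len(minima)
    n_zero_crossings = int(np.sum(h[:-1] * h[1:] < 0))
    return abs(n_extrema - n_zero_crossings) <= 1

def extract_imf(x, max_iter=50):
    """Repeat sifting until the IMF check passes or max_iter is reached."""
    h = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        h = sift_once(h)
        if is_imf_like(h):
            break
    return h

# Toy signal: an oscillation riding on a linear trend.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 20 * t) + 2.0 * t
h1 = sift_once(x)
imf1 = extract_imf(x)
```

Away from the ends of the toy signal, `h1` is close to the pure oscillation because the mean envelope follows the trend. The ends are less reliable, since the envelopes are simply held flat there.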

  21. HHT • IMFs: 1~16

  22. Modification • Baseline subtraction • Remove the systematic artifacts • Rescale • Shift the scale to positive values • Peak detection • Key feature of the preprocessing method • We compare several popular methods
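The first two modification steps on slide 22 can be illustrated with a small sketch. The slides do not say how the baseline is estimated, so this example assumes a simple stand-in: take the minimum of each of `n_segments` equal segments and interpolate linearly between those minima. The rescale step is read here as shifting the intensities so that none is negative. Both function names and the synthetic spectrum are hypothetical.

```python
import numpy as np

def segment_min_baseline(y, n_segments=10):
    """Estimate a baseline by linear interpolation through per-segment minima.

    Assumption: this is a stand-in method; the talk does not name its
    baseline estimator.
    """
    y = np.asarray(y, dtype=float)
    idx = np.arange(len(y))
    knots_x, knots_y = [], []
    for segment in np.array_split(idx, n_segments):
        j = segment[np.argmin(y[segment])]  # position of the segment minimum
        knots_x.append(j)
        knots_y.append(y[j])
    return np.interp(idx, knots_x, knots_y)

def shift_to_nonnegative(y):
    """Rescale by shifting the whole curve up if its minimum is below zero."""
    y = np.asarray(y, dtype=float)
    return y - min(y.min(), 0.0)

# Synthetic spectrum: a decaying baseline plus one Gaussian peak.
t = np.linspace(0.0, 1.0, 1001)
true_baseline = 5.0 * np.exp(-3.0 * t)
peak = 10.0 * np.exp(-0.5 * ((t - 0.45) / 0.005) ** 2)
spectrum = true_baseline + peak

corrected = spectrum - segment_min_baseline(spectrum)
processed = shift_to_nonnegative(corrected)
```

Because the interpolated baseline can sit slightly above the true one between knots, `corrected` may dip below zero, which is why the shift comes after the subtraction.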

  23. Baseline subtraction

  24. Baseline subtraction

  25. Peak detection • We use three popular algorithms for peak detection • MassSpecWavelet (Du, Kibbe et al. 2006) • SpecAlign (Wong, Cagney et al. 2005) • PROcess (Li 2005)

  26. Data source • Source : National Cancer Institute • Type : 50 ovarian cancer data • Meuleman, W., J. Y. Engwegen, et al. (2008). "Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data." BMC Bioinformatics 9: 88 • Kwon, D., M. Vannucci, et al. (2008). "A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise." Proteomics 8(15): 3019-29

  27. Methods of comparison • Criteria • Number of peaks detected • Actual locations of the peaks by visual inspection • Interior comparison • HHT and modification+SpecAlign • HHT and modification+PROcess • HHT and modification+MassSpecWavelet
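The first criterion on slide 27, the number of peaks detected, is reported later both for the whole spectrum and for a restricted m/z window. A minimal sketch of that count is given below. The helper `count_peaks_in_range` and the two peak lists are hypothetical and are not data from the talk.

```python
def count_peaks_in_range(peak_mz, mz_min=None, mz_max=None):
    """Count peak positions inside [mz_min, mz_max]; None means no bound."""
    count = 0
    for mz in peak_mz:
        if mz_min is not None and mz < mz_min:
            continue
        if mz_max is not None and mz > mz_max:
            continue
        count += 1
    return count

# Made-up m/z positions from two imaginary peak detectors.
peaks_a = [2100.5, 4300.2, 6120.0, 6890.4, 7766.1, 9200.3]
peaks_b = [4300.0, 6121.1, 7765.0]

whole_a = count_peaks_in_range(peaks_a)
window_a = count_peaks_in_range(peaks_a, 6000, 8000)
whole_b = count_peaks_in_range(peaks_b)
window_b = count_peaks_in_range(peaks_b, 6000, 8000)
```

A raw count does not say whether the peaks are real, which is why the slide pairs it with a visual check of peak locations.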

  28. Methods of comparison • Exterior comparison • SpecAlign • Abbreviation : SA • PROcess • Interpolation : PRO1 • Regression : PRO2 • MassSpecWavelet • Abbreviation : MSW • PRO2+MSW • As suggested in Cruz-Marcelo, Guerra et al. 2008

  29. Raw data

  30. Results after HHT and modification

  31. Results of interior comparison

  32. Modified HHT+MSW • Peaks detected: 218 • m/z range: whole region

  33. Modified HHT+SpecAlign • Peaks detected: 80 • m/z range: whole region

  34. Modified HHT+PROcess • Peaks detected: 108 • m/z range: whole region • Significant peak lost

  35. Results of exterior comparison

  36. PRO1 • Peaks detected: 145 • m/z range: whole region • (Meuleman, Engwegen et al. 2008)

  37. PRO2 • Peaks detected: 114 • m/z range: whole region • (Meuleman, Engwegen et al. 2008)

  38. MSW • Peaks detected: 188 • m/z range: whole region • (Meuleman, Engwegen et al. 2008)

  39. MSW • Peaks detected: 25 • m/z range: 6000~8000

  40. PRO2+MSW • Peaks detected: 198 • m/z range: whole region • (Meuleman, Engwegen et al. 2008)

  41. PRO2+MSW • Peaks detected: 33 • m/z range: 6000~8000

  42. Results • Interior comparison: • HHT and modification+MSW covers most of the peaks • HHT and modification+SpecAlign picks the most important peaks • Exterior comparison: • PROcess misses the significant peaks • MassSpecWavelet and PRO2+MSW produce many redundant peaks

  43. Results of validation • Data source: Cathay General Hospital • Experiments: • Divided into three experiments • Water only • VrD1

  44. Water • Sample: water • Organic acid: CHCA (<1000 Da)

  45. VrD1 • Sample: VrD1 • Type: protein • Organic acid: CHCA (<1000 Da) • Molecular weight: 5119 Da

  46. Results of validation • Number of peaks detected

  47. The peaks of Water detected by MassSpecWavelet

  48. The peaks of VrD1 detected by MassSpecWavelet • Molecular weight: 5119 Da

  49. The peaks of Water detected by SpecAlign

  50. The peaks of VrD1 detected by SpecAlign • Molecular weight: 5119 Da
