Variable Penalty Dynamic Time Warping for Aligning Chromatography Data
David Clifford, Research Scientist
June 2009
Talk Outline
• Gas Chromatography Mass Spectrometry (GC-MS)
• Examples and properties
• Dynamic time warping (DTW) – origins in speech recognition
• Uses in the 21st century: aligning GC-MS data
• Central idea of the talk – variable penalty DTW, joint work with Glenn Stone
• Results of alignment and how to do it
CSIRO Issues in aligning multiple - MS spectra
Gas Chromatography
• Separates a gas into its constituent parts
• These elute from the machine over a period of 40 minutes
• Measures quantity several times a second
• Does not identify compounds
• Gold standard in analytical chemistry
• Slow process, expensive technology
Uses of Gas Chromatography
• Wine chemistry
• Meat quality
• Metabolomic studies
• Data format is similar to Liquid Chromatography-MS, etc.
Goal of this talk
• How can we align the two signals?
• How can we align many signals?
• Dynamic time warping – yes, but it overdoes the warping
• Variable penalty DTW – balances warping with alignment needs
• VPdtw package now available on CRAN
Before and After Alignment
Calling for a taxi…
• Matches what you say with a database of placenames
• Dynamic time warping (DTW) was invented in the late 1960s and early 1970s to do this kind of matching
• DTW can expand or contract your words to match placenames
• DTW is a natural choice for matching speech
• Speed of speech differs between individuals
• Ums and ahs need to be cut out, etc.
• DTW is a very fast algorithm and achieves the global optimum
No alignment
Alignment by Shift
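As a baseline for the methods that follow, alignment by a single shift can be sketched in a few lines. This is a minimal Python sketch, not code from the talk: the function name `best_shift`, the mean-absolute-difference criterion and the toy signals are all assumptions made for illustration.

```python
def best_shift(reference, query, max_shift):
    """Try every integer shift s in [-max_shift, max_shift] and return
    (s, cost) with the lowest mean absolute difference on the overlap.
    The shifted query is q'[t] = query[t - s]. Hypothetical helper."""
    best = None
    for s in range(-max_shift, max_shift + 1):
        # Keep only positions where both signals are defined.
        pairs = [(reference[t], query[t - s])
                 for t in range(len(reference))
                 if 0 <= t - s < len(query)]
        if not pairs:
            continue
        cost = sum(abs(r - q) for r, q in pairs) / len(pairs)
        if best is None or cost < best[1]:
            best = (s, cost)
    return best


# Toy signals: the query peak sits two samples later than the reference peak.
reference = [0, 0, 1, 5, 1, 0, 0, 0]
query = [0, 0, 0, 0, 1, 5, 1, 0]
shift, cost = best_shift(reference, query, max_shift=3)
print(shift, cost)
```

A single shift moves every point by the same amount, so it cannot correct retention-time drift that changes along the run; that limitation is what motivates the warping methods on the following slides.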
Linear Transformation (Shift and Stretch)
Parametric Time Warping
Asymmetric Dynamic Time Warping
Sakoe-Chiba DTW (bound on shift)
Memory-efficient variation of DTW – faster method
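The bound on shift can be illustrated with a banded dynamic program. The sketch below is a generic Python illustration, assuming an absolute-difference mismatch and symmetric steps; `dtw_banded` and the `band` parameter are hypothetical names. For clarity it allocates the full cost table, so it shows the constraint itself rather than the memory saving a real implementation would get by storing only the band.

```python
def dtw_banded(x, y, band):
    """DTW cost where cell (i, j) is allowed only if |i - j| <= band."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = best cost of aligning x[:i] with y[:j]; row/column 0 are borders.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns inside the band around the diagonal.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


# Toy signals: y is x delayed by one sample.
x = [0, 1, 5, 1, 0, 0]
y = [0, 0, 1, 5, 1, 0]
cost_no_warp = dtw_banded(x, y, band=0)   # forced onto the diagonal
cost_band_1 = dtw_banded(x, y, band=1)    # one sample of shift allowed
print(cost_no_warp, cost_band_1)
```

With `band=0` the path is forced onto the diagonal and the peaks stay misaligned; widening the band by one sample is enough to match this toy pair exactly.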
Dynamic Time Warping
Guaranteed global optimum, but lots of non-diagonal moves
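To make "non-diagonal moves" concrete, here is a minimal Python sketch of textbook DTW with path backtracking. It is a generic illustration under assumed choices (absolute-difference mismatch, symmetric step pattern), not the implementation used in the talk; the function names are hypothetical.

```python
def dtw(x, y):
    """Return (cost, path) for classic DTW; path is a list of (i, j) index pairs."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor
    # (ties are broken in favour of the diagonal).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


def count_non_diagonal(path):
    """Number of steps in the path that are not a (+1, +1) move."""
    return sum(1 for (a, b), (c, d) in zip(path, path[1:])
               if (c - a, d - b) != (1, 1))


# Toy signals: y is x delayed by one sample.
x = [0, 1, 5, 1, 0, 0]
y = [0, 0, 1, 5, 1, 0]
cost, path = dtw(x, y)
non_diag = count_non_diagonal(path)
print(cost, non_diag, path)
```

Nothing in this objective discourages non-diagonal moves, so the optimiser uses them wherever they reduce mismatch, including inside peaks.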
Paths found with two different penalties
Why do we need to care about this?
Analysis is based on peak area, and overwarping will affect peak shape and area.
Overwarping introduces artificial features into the data.
Overwarping occurs due to too many non-diagonal moves.
Solution #1: penalise non-diagonal moves
Solution #2: variable penalty dependent on size of peaks
Variable penalty DTW
• Minimise over paths w
• Choose penalty vector using a dilation of the signals
• Large penalty with large peaks
• Minimise this function using dynamic programming
• Easy to implement
• How does it compare to DTW, constant penalty DTW, and parametric time warping?
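The idea of charging for non-diagonal moves can be sketched as a small change to the usual DTW recursion. The Python below is an illustrative sketch only: it assumes a symmetric step pattern, an absolute-difference mismatch, and a penalty indexed by reference position, none of which is confirmed by the slides as the exact formulation in the VPdtw package. The name `penalised_dtw` is hypothetical.

```python
def penalised_dtw(reference, query, penalty):
    """DTW cost where each non-diagonal move into row i adds penalty[i].

    penalty has one entry per reference sample; a constant vector gives
    constant-penalty DTW, an all-zero vector gives plain DTW.
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lam = penalty[i - 1]
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - query[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1],        # diagonal: free
                              D[i - 1][j] + lam,      # non-diagonal: penalised
                              D[i][j - 1] + lam)      # non-diagonal: penalised
    return D[n][m]


# Toy signals: the query is the reference delayed by one sample.
reference = [0, 1, 5, 1, 0, 0]
query = [0, 0, 1, 5, 1, 0]

cost_plain = penalised_dtw(reference, query, [0] * 6)       # no penalty
cost_rigid = penalised_dtw(reference, query, [100] * 6)     # warping too costly
# Variable penalty: expensive around the peak, cheap in the baseline.
cost_variable = penalised_dtw(reference, query, [0.5, 0.5, 100, 100, 0.5, 0.5])
print(cost_plain, cost_rigid, cost_variable)
```

In this toy case the variable penalty still allows the one-sample correction, because the two non-diagonal moves it needs fall in the flat baseline where the penalty is small, while the large constant penalty forces the path back onto the diagonal and leaves the peaks misaligned.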
Key Ingredient for VPdtw
• Penalty vector – proportional to a dilation of the signal.
• There is some subjectivity here, to balance the need for alignment with the effect on raw signals.
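A dilation with a flat window is a running maximum, so a penalty vector of this kind can be sketched as follows. This is a hedged Python illustration: `running_max_dilation`, `half_width` and `scale` are hypothetical names and parameters, and the sketch is a stand-in for the idea rather than the package's own `dilation` function.

```python
def running_max_dilation(signal, half_width):
    """Flat morphological dilation: each point becomes the maximum of the
    signal within half_width samples on either side (window clipped at ends)."""
    n = len(signal)
    return [max(signal[max(0, t - half_width):min(n, t + half_width + 1)])
            for t in range(n)]


def penalty_vector(reference, half_width, scale):
    """Penalty proportional to the dilated reference (scale is a tuning choice)."""
    return [scale * v for v in running_max_dilation(reference, half_width)]


reference = [0, 0, 1, 5, 1, 0, 0, 0]
dilated = running_max_dilation(reference, half_width=1)
penalty = penalty_vector(reference, half_width=1, scale=0.5)
print(dilated)
print(penalty)
```

The dilation spreads each peak's height over its neighbourhood, so the penalty is high across the whole peak and low in the baseline. The window width and the scale factor are where the subjectivity mentioned above enters.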
Before Alignment – can’t see detail but
Check Alignment #1
Check Alignment #2
Check Alignment #3
How far are points moved by alignment?
VPdtw package – now on CRAN, GPL 2
• VPdtw, dilation, plot.VPdtw, print.VPdtw
• result <- VPdtw(reference, query, penalty, maxshift = 350)
• print(result)
• plot(result, "Before")
• plot(result, "After")
• plot(result, "Shifts")
• plot(result)
• Many queries, one penalty
• One query, many penalties
• Reference can be NULL
Comparisons – Time
Summary
• Introduced GC-MS data
• This talk is really about improving data quality
• Improvement via alignment
  • without data reduction
  • without unnatural features
  • via fast computation
• VPdtw available on CRAN
  • Faster
  • Better than available alternatives
References
DTW:
• Vintsyuk, T. K. Kibernetika 1968, 4, 81-88.
• Sakoe, H.; Chiba, S. Proceedings of the International Congress on Acoustics, Budapest, Hungary, 1971; paper 20 c 13.
Parametric Time Warping:
• Eilers, P. H. C. Anal. Chem. 2004, 76, 404-411.
Variable Penalty DTW:
• Clifford, Stone, Montoliu, Rezzi, Martin, Guy, Bruce and Kochhar. Alignment Using Variable Penalty Dynamic Time Warping. Anal. Chem. 2009, 81 (3), 1000-1007.
Statistical Bioinformatics - Agribusiness
David Clifford, Research Scientist
CSIRO Division of Mathematics, Informatics and Statistics
Phone: +61 2 9325 3210
Email: David.Clifford@csiro.au
Web: www.csiro.au/science/org/CMIS.html

Thank you

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: Enquiries@csiro.au
Web: www.csiro.au
VPdtw package – plot(result, "Before")
VPdtw package – plot(result, "After")
VPdtw package – print(result)
Reference is NULL.
Query column # 13 is chosen at random.
Query matrix is made up of 16 samples of length 5000.
Single Penalty vector supplied by user.
Max allowed shift is 150.

               Cost  Overlap  Max Obs Shift  # Diag Moves  # Expanded  # Dropped
Query #1:   1521.10     4994             51          4996          47          2
Query #2:   1708.30     4996             53          5000          49          0
Query #3:   1479.60     4998             59          5000          57          0
Query #4:   1302.30     4998             62          5000          60          0
Query #5:   1505.40     4996             61          5000          57          0
Query #6:   1296.80     4997             60          5000          57          0
Query #7:   1420.80     5000             61          5000          62          0
Query #8:   1484.20     5000             59          5000          60          0
Query #9:   1424.30     5000             51          5000          53          0
Query #10:  1306.30     4997             42          5000          39          0
Query #11:  1193.30     4994             29          4990          28          5
Query #12:   225.04     4999             13          4998          13          1
Query #13:     0.00     5000              0          5000           0          0
Query #14:   266.09     4944             56          4894           2         53
Query #15:   746.93     4937             63          4880           4         60
Query #16:   345.87     4914             86          4836           0         82