Dynamic Time Warping: Applications and Derivation
Charles Tappert, Seidenberg School of CSIS, Pace University
Dynamic Time Warping (DTW): non-linear/elastic matching, Viterbi algorithm
• Many applications
  • Speech recognition
  • Speech sound alignment
  • Speech sound generation
  • Online handwriting recognition
• Derivation of a DTW algorithm variation (speech recognition)
  • A speech utterance is represented as a time sequence of feature vectors
  • [Figure: example of an utterance as a time sequence of feature vectors]
• Consider a finite state machine model of a speech utterance prototype. The observable output of each transition between states is an acoustic feature vector, which is a probabilistic function of the transition's origin state
• Note: some transitions stretch and others compress the sequence of feature vectors produced
• Background information
  • Univariate (one-dimensional) normal density function
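For reference, the standard univariate normal density is written out below. The symbols x, μ (mean), and σ² (variance) are the conventional ones and are an assumption here, not notation taken from the slide.

```latex
\[
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
\]
```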
• Background information
  • Multivariate normal density function
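For reference, the standard multivariate normal density in n dimensions is written out below. The symbols x (an n-dimensional vector), μ (mean vector), and Σ (covariance matrix) are conventional and are an assumption here, not notation taken from the slide.

```latex
\[
p(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,
  (\mathbf{x}-\boldsymbol{\mu})^{\mathsf{T}}\,\Sigma^{-1}\,
  (\mathbf{x}-\boldsymbol{\mu})\right)
\]
```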
• In traversing each arc in this model, a feature vector is produced with an assumed underlying normal distribution
• where i indexes the unknown and j the prototype, V_i are the feature vectors of the unknown, M_j are the mean feature vectors, and Σ is the covariance matrix of the prototype
• This statistical characterization of prototypes would require multiple repetitions of the vocabulary to be recognized
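The normal distribution referred to above is not reproduced in this transcript. A plausible form, obtained by substituting V_i, M_j, and Σ into the multivariate normal density, is sketched below. The name Prob(i, j) follows its later use in the slides; the dimension n of the feature vectors is an assumed symbol.

```latex
\[
\mathrm{Prob}(i,j) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,
  (V_i - M_j)^{\mathsf{T}}\,\Sigma^{-1}\,(V_i - M_j)\right)
\]
```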
• To find the optimal overall probability of the model (prototype) generating the candidate, we take the maximum of the cumulative probability over all possible paths through the model
• Assuming statistical independence of the feature vectors, the best path to any point (i, j) and its probability P(i, j) can be computed recursively, starting with P(0, 0) = Prob(0, 0) and P(i, j) = 0 elsewhere, using the recursion relation
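The recursion relation itself is missing from this transcript. One form consistent with the description is sketched below, under two assumptions that are not confirmed by the slides: the allowed predecessors of state j are j, j-1, and j-2 (stay, advance, skip), and a_{j'j} denotes the probability of the transition from state j' to state j.

```latex
\[
P(i,j) = \mathrm{Prob}(i,j)\,
  \max_{j' \in \{j,\; j-1,\; j-2\}}
  \bigl[\, a_{j'j}\, P(i-1,\, j') \,\bigr]
\]
```

Staying in the same state while i advances stretches the prototype, and skipping a state compresses it, which matches the earlier note about transitions.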
• Taking the log of the terms in the previous equation, dropping constant terms, multiplying by -2, and assuming zero covariance terms yields the recursion relation
• where D(i, j) is considered a cumulative distance measure
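The resulting relation is not reproduced in this transcript. A sketch of what these steps produce is given below, assuming a probability recursion of the form P(i, j) = Prob(i, j) · max over j' of a_{j'j} P(i-1, j'), with predecessors j' in {j, j-1, j-2} and transition probabilities a_{j'j} (both assumptions). With zero covariance terms, Σ is diagonal with variances σ_l², l = 1, …, n, and D(i, j) stands for -2 ln P(i, j) up to the dropped constants.

```latex
\[
D(i,j) = \sum_{l=1}^{n} \frac{(V_{il} - M_{jl})^{2}}{\sigma_{l}^{2}}
  \;+\; \min_{j' \in \{j,\; j-1,\; j-2\}}
  \bigl[\, D(i-1,\, j') - 2\ln a_{j'j} \,\bigr]
\]
```

The normalizing constant of the normal density can be dropped because every path reaching (i, j) contains the same number of emission terms, so it shifts all competing paths equally.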
• Further, assume equal variances and equal transition probabilities, and include an index k indicating the prototype
• d(i, j; k) is the distance between feature vector i of the unknown and feature vector j of prototype k
• Note: because the log is a monotonically increasing function of its argument, and changing sign converts a maximizing relation into a minimizing one, this distance relation leads to the same decisions as the probability recursion relation, apart from the effect of the simplifying assumptions
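A minimal Python sketch of the simplified distance recursion follows. It assumes the form D(i, j; k) = d(i, j; k) + min{D(i-1, j; k), D(i-1, j-1; k), D(i-1, j-2; k)}, with d taken as squared Euclidean distance; the predecessor set and the function names `sq_dist`, `dtw_distance`, and `recognize` are illustrative assumptions, not taken from the slides.

```python
import math


def sq_dist(v, m):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(v, m))


def dtw_distance(unknown, prototype):
    """Cumulative distance D at the final point of both sequences.

    Assumed predecessors of (i, j): (i-1, j), (i-1, j-1), (i-1, j-2).
    Returns math.inf if the final point cannot be reached.
    """
    n, m = len(unknown), len(prototype)
    # D[i][j] is the best cumulative distance to point (i, j).
    D = [[math.inf] * m for _ in range(n)]
    D[0][0] = sq_dist(unknown[0], prototype[0])
    for i in range(1, n):
        for j in range(m):
            best = D[i - 1][j]                      # stay: stretches the prototype
            if j >= 1:
                best = min(best, D[i - 1][j - 1])   # advance by one state
            if j >= 2:
                best = min(best, D[i - 1][j - 2])   # skip: compresses the prototype
            if best < math.inf:
                D[i][j] = sq_dist(unknown[i], prototype[j]) + best
    return D[n - 1][m - 1]


def recognize(unknown, prototypes):
    """Return the label k whose prototype gives the smallest distance."""
    return min(prototypes, key=lambda k: dtw_distance(unknown, prototypes[k]))


unknown = [[0.0], [0.0], [1.0], [2.0]]
prototypes = {"A": [[0.0], [1.0], [2.0]], "B": [[2.0], [1.0], [0.0]]}
best_label = recognize(unknown, prototypes)
```

In this toy example the unknown is prototype "A" with its first vector repeated, so the stay transition absorbs the repetition at zero cost and "A" is selected.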
• This derivation shows the simplifying assumptions made in going from a probabilistic model to a greatly simplified distance model
• Currently, most commercial and research string-matching systems use probabilistic models, and the Hidden Markov Model (HMM) is probably the dominant one
• As computing power has increased over the years, more complex, primarily probabilistic models requiring large training corpora have come into use
• In his research at IBM’s T.J. Watson Research Center, your instructor worked in both the speech recognition and the pen computing/handwriting recognition groups
  • Founding member of the speech group (once over 50 workers)
  • Spearheaded development of the ThinkWrite handwriting recognizer in IBM’s pen-enabled ThinkPad product in the early 1990s
• The data in both the speech and online handwriting problems are time sequences
  • Speech is recorded as a time waveform and usually transformed via frequency analysis into a sequence of spectral time samples
  • Online handwriting is captured as a time sequence of x-y coordinates describing the trajectory of the handwriting
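As a rough illustration of turning a waveform into a sequence of spectral time samples, the sketch below cuts a signal into overlapping frames and takes the magnitude of a naive discrete Fourier transform of each frame. The function name `spectral_frames` and the parameters `frame_len` and `hop` are hypothetical; practical systems would typically apply a window function and use an FFT, neither of which is shown here.

```python
import cmath
import math


def spectral_frames(signal, frame_len=8, hop=4):
    """Return one magnitude spectrum (bins 0..frame_len//2) per frame."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT bin k of this frame (O(N^2), for illustration only).
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A cosine with exactly two cycles per 8-sample frame.
signal = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
frames = spectral_frames(signal)
```

Each element of `frames` is one feature vector, so the list as a whole is the kind of time sequence that the distance recursion compares against a prototype.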