Explore speech parameter generation using dynamic features and algorithms for HMM-based synthesis, including multi-space probability distribution and simultaneous modeling of spectrum, pitch, and duration.
Basics of HMM-based speech synthesis
• Contents
• Speech Parameter Generation Using Dynamic Features
• Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis
• Multi-Space Probability Distribution HMM
• Simultaneous Modelling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis
Contents
• Speech Parameter Generation
  - Dynamic Features
  - Algorithms
• Multi-Space Probability Distribution HMM
• Simultaneous modelling of Spectrum, Pitch and Duration
Speech Parameter Generation using Dynamic Features (1)
• 1. Problem
• $O = (o_1, o_2, \dots, o_T)$ … vector sequence of speech parameters
• $q = (q_1, q_2, \dots, q_T)$ … state sequence of an HMM λ
• $o_t = [c_t^\top, \Delta c_t^\top]^\top$ … vector of speech parameters at time t
• $c_t$ … static feature vector (e.g. cepstral coefficients)
• $\Delta c_t$ … dynamic feature vector (e.g. delta cepstral coefficients)
• Determine the parameter sequence $c = (c_1, c_2, \dots, c_T)$ that maximizes $P(q, O \mid \lambda)$
Speech Parameter Generation using Dynamic Features (2)
• 2. Solution to the Problem
• For a given q, maximizing $P(q, O \mid \lambda)$ with respect to c is equivalent to maximizing $P(O \mid q, \lambda)$ with respect to c, since $P(q \mid \lambda)$ does not depend on O.
• State output distributions … single-mixture Gaussian distributions
• Setting $\partial \log P(O \mid q, \lambda) / \partial c = 0$ to maximize $P(O \mid q, \lambda)$ yields a set of linear equations in c
• with coefficients given by the means and covariances of $c_t$ and $\Delta c_t$
Speech Parameter Generation using Dynamic Features (3)
• Solution to the Problem
• … mean vector of $c_t$
• … covariance matrix of $c_t$
• … mean vector of $\Delta c_t$
• … covariance matrix of $\Delta c_t$
• To obtain the optimal q and c, the set of equations has to be solved for every possible state sequence
Speech Parameter Generation using Dynamic Features (3)
• Solution to the Problem
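For a fixed state sequence, the maximization described above reduces to one linear system in the static parameters. The following is a minimal sketch of that idea, not the authors' implementation. It assumes scalar (1-D) features, diagonal covariances, and a delta window of the form Δc_t = 0.5 (c_{t+1} − c_{t−1}) with indices clamped at the edges; the window shape, the matrix name W (borrowed from the later O’ = WC’ formulation) and all function names are illustrative assumptions.

```python
def delta_row(t, T):
    """Coefficients of delta_c[t] = 0.5 * (c[t+1] - c[t-1]); indices clamped at the edges (assumed window)."""
    row = [0.0] * T
    row[min(t + 1, T - 1)] += 0.5
    row[max(t - 1, 0)] -= 0.5
    return row


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def generate(mean_c, var_c, mean_d, var_d):
    """Return the static sequence c maximizing the Gaussian likelihood of [c, delta c].

    mean_c/var_c: per-frame mean and variance of the static feature.
    mean_d/var_d: per-frame mean and variance of the delta feature.
    """
    T = len(mean_c)
    # W maps c to the stacked observation: T static rows, then T delta rows.
    W = [[1.0 if j == t else 0.0 for j in range(T)] for t in range(T)]
    W += [delta_row(t, T) for t in range(T)]
    mu = list(mean_c) + list(mean_d)
    prec = [1.0 / v for v in list(var_c) + list(var_d)]
    rows = range(2 * T)
    # Normal equations: (W^T U^-1 W) c = W^T U^-1 mu
    A = [[sum(W[r][i] * prec[r] * W[r][j] for r in rows) for j in range(T)]
         for i in range(T)]
    b = [sum(W[r][i] * prec[r] * mu[r] for r in rows) for i in range(T)]
    return solve(A, b)
```

With a very large delta variance the result falls back to the static means; with a small delta variance the step between two states is smoothed, which is the effect the dynamic features are meant to produce.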
Speech Parameter Generation using Dynamic Features (4)
• Example [1], [2]
Speech Parameter Generation Algorithms (1)
• Previously: state sequence was assumed to be known
• Now: (part of) state sequence is hidden
• Goal: determine the speech parameter vector sequence $O = (o_1, o_2, \dots, o_T)$ so that $P(O \mid \lambda) = \sum_{Q} P(O, Q \mid \lambda)$ is maximized with respect to O, where $Q = \{(q_1, i_1), (q_2, i_2), \dots, (q_T, i_T)\}$ is the state and mixture sequence, i.e. (q, i) indicates the i-th mixture of state q
• $o_t$ … speech parameter vector
Speech Parameter Generation Algorithms (2)
• Derived Algorithm
• Based on the EM algorithm
• Finds a critical point of the likelihood function P(O|λ)
• Q(O, O’) … auxiliary function of the current and new parameter vector sequences O and O’, respectively
• … occupancy probability
Speech Parameter Generation Algorithms (3)
• Derived Algorithm
• Under the condition O’ = WC’, the C’ which maximizes Q(O, O’) is given by a set of linear equations in C’
• Solved with a recursive algorithm for dealing with dynamic features
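The EM-style iteration can be illustrated in a deliberately reduced setting. The sketch below assumes a single frame, a scalar static feature only (so W is the identity and no dynamic features are involved), and a Gaussian mixture as output distribution. It alternates between computing mixture occupancy probabilities for the current value and maximizing the auxiliary function over the new value. All names and numbers are hypothetical; the full algorithm additionally sums over hidden states and solves the linear system that couples frames through W.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_likelihood(o, weights, means, variances):
    """P(o | lambda) for a single frame with a Gaussian mixture output distribution."""
    return sum(w * normal_pdf(o, m, v) for w, m, v in zip(weights, means, variances))


def em_step(o, weights, means, variances):
    """One EM update of the generated value o."""
    # E-step: occupancy probability of each mixture component given the current o.
    joint = [w * normal_pdf(o, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    gamma = [j / total for j in joint]
    # M-step: the o' maximizing sum_i gamma_i * log N(o'; mean_i, var_i)
    # is the precision-weighted average of the component means.
    num = sum(g * m / v for g, m, v in zip(gamma, means, variances))
    den = sum(g / v for g, v in zip(gamma, variances))
    return num / den


def em_generate(o_init, weights, means, variances, iterations=20):
    """Iterate em_step; also return the likelihood after each step."""
    o = o_init
    history = [mixture_likelihood(o, weights, means, variances)]
    for _ in range(iterations):
        o = em_step(o, weights, means, variances)
        history.append(mixture_likelihood(o, weights, means, variances))
    return o, history
```

Each step cannot decrease the likelihood, so the iteration settles at a critical point that depends on the starting value; with a single mixture component it reaches the component mean in one step.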
Speech Parameter Generation Algorithms (4)
• Example
• 5-state left-to-right HMMs
• Speech sampled at 16 kHz and windowed by a 25.6 ms Blackman window with a 5 ms shift
• Mel-cepstral coefficients obtained by mel-cepstral analysis
• Sentence fragment “kiNzokuhiroo” [3]
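The analysis conditions of this example translate into frame sizes as follows. This is a small sketch using only the figures given on the slide (16 kHz, 25.6 ms Blackman window, 5 ms shift); rounding the 409.6-sample window to 410 samples and dropping the incomplete final frame are assumptions, and the function names are illustrative.

```python
import math

SAMPLE_RATE_HZ = 16000
WINDOW_MS = 25.6
SHIFT_MS = 5.0


def blackman(n):
    """Symmetric Blackman window of length n."""
    if n == 1:
        return [1.0]
    return [0.42
            - 0.5 * math.cos(2.0 * math.pi * k / (n - 1))
            + 0.08 * math.cos(4.0 * math.pi * k / (n - 1))
            for k in range(n)]


def frame_signal(samples, sample_rate=SAMPLE_RATE_HZ,
                 window_ms=WINDOW_MS, shift_ms=SHIFT_MS):
    """Cut samples into overlapping, Blackman-windowed frames."""
    win_len = round(sample_rate * window_ms / 1000.0)  # 409.6 -> 410 (assumed rounding)
    shift = round(sample_rate * shift_ms / 1000.0)     # 80 samples
    window = blackman(win_len)
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - win_len + 1, shift):
        chunk = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each frame produced this way would then be passed to mel-cepstral analysis, which is outside the scope of this sketch.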
Multi-Space Probability Distribution HMM (1)
• Problem: neither the conventional discrete HMM nor the continuous HMM can be applied to observations which consist of both continuous values and discrete symbols [4]
• Example: observation o1 consists of a set of space indices X1 = {1, 2, G} and a 3D vector x1 ∈ R³
• x1 is drawn from one of the three spaces Ω1, Ω2, ΩG = R³
• Pdf for x1: w1N1(x) + w2N2(x) + wGNG(x) [4]
Multi-Space Probability Distribution HMM (2)
• Example: A man fishing in a pond (sample space Ω)
• Ω1 = R²: 2D space for length and height of red fish
• Ω2 = R²: 2D space for length and height of blue fish
• Ω3 = R¹: 1D space for diameters of tortoises
• Ω4 = R⁰: 0D space for articles of junk
• Weights w1 to w4 are determined by the ratio of red fish, blue fish, tortoises and junk in the pond
• N1, N2: 2D pdfs for lengths and heights of fish
• N3: 1D pdf for diameters of tortoises
• Man catches a red fish: observation o = ({1}, x) is made
• Same catch by night, when red and blue fish cannot be told apart: o = ({1, 2}, x)
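The fishing example can be turned into a short calculation of the multi-space observation probability, which sums w_g · N_g(x) over the space indices carried by the observation. The sketch below assumes diagonal-covariance Gaussians, and every weight, mean and variance in POND is invented purely for illustration; for the 0-dimensional junk space the density is taken to be 1, so only the weight contributes.

```python
import math


def gauss_diag(x, mean, var):
    """Diagonal-covariance Gaussian density; for a 0-D space the empty product is 1."""
    p = 1.0
    for xi, m, v in zip(x, mean, var):
        p *= math.exp(-0.5 * (xi - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return p


# Hypothetical pond: space index -> weight and Gaussian parameters.
POND = {
    1: {"w": 0.4, "mean": (30.0, 10.0), "var": (25.0, 4.0)},  # red fish: length, height
    2: {"w": 0.3, "mean": (20.0, 8.0), "var": (16.0, 4.0)},   # blue fish: length, height
    3: {"w": 0.2, "mean": (15.0,), "var": (9.0,)},            # tortoise: diameter
    4: {"w": 0.1, "mean": (), "var": ()},                     # junk: 0-D, no measurement
}


def msd_prob(indices, x, spaces=POND):
    """Observation probability: sum over the space indices of w_g * N_g(x)."""
    total = 0.0
    for g in indices:
        space = spaces[g]
        if len(x) != len(space["mean"]):
            raise ValueError("dimension of x does not match space %d" % g)
        total += space["w"] * gauss_diag(x, space["mean"], space["var"])
    return total
```

For a catch measured as x = (28.0, 9.0), `msd_prob({1}, x)` corresponds to the daytime observation of a red fish, while `msd_prob({1, 2}, x)` corresponds to the night-time observation and equals the sum of the red and blue terms.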
Multi-Space Probability Distribution HMM (3)
• Algorithm
• Each state i has G pdfs Ni1, …, NiG and their weights wi1, …, wiG
• The observation probability of O, P(O|λ), is calculated with the forward-backward algorithm
• Then the observation likelihood P(O|λ) is maximized over all model parameters
• Reestimation formulas for the maximum likelihood estimation are derived in analogy to the Baum-Welch algorithm [4]
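The forward half of this computation can be sketched for the case used later for pitch: each frame is either unvoiced (a 0-D space, represented here as None) or voiced with a scalar log-F0 value (a 1-D space). The code assumes two spaces per state, a single Gaussian for the voiced space, and no dedicated exit state, so the final probabilities are summed over all states. Parameter names such as `w_voiced` are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def output_prob(obs, state):
    """MSD output probability of one frame; obs is None for an unvoiced frame."""
    if obs is None:
        return state["w_unvoiced"]  # 0-D space: the density is 1
    return state["w_voiced"] * normal_pdf(obs, state["mean"], state["var"])


def forward(obs_seq, init, trans, states):
    """Return P(O | lambda) using the forward recursion."""
    n = len(states)
    alpha = [init[i] * output_prob(obs_seq[0], states[i]) for i in range(n)]
    for obs in obs_seq[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * output_prob(obs, states[j])
            for j in range(n)
        ]
    return sum(alpha)
```

The backward pass and the occupancy statistics needed for Baum-Welch-style reestimation follow the same pattern with `output_prob` in place of the usual continuous density; they are omitted here.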
Simultaneous modelling of Spectrum, Pitch and Duration (1)
• Simultaneous Modelling
• Pitch patterns modelled by MSD-HMM
  - observation sequence composed of continuous values and discrete symbols
• Spectrum modelled by continuous probability distributions
• State duration densities (SDD) modelled by single Gaussian distributions (dimension of SDD equal to number of HMM states) [5]
Simultaneous modelling of Spectrum, Pitch and Duration (2)
• Context Dependent Model
• Many contextual factors taken into account, such as:
  - Position of breath group in sentence
  - Position of current phoneme in current accentual phrase
  - {preceding, current, succeeding} part of speech
  - {preceding, current, succeeding} phoneme
  - (etc.)
• Decision-tree based context clustering applied because:
  - As contextual factors increase, their combinations increase exponentially → model parameters cannot be estimated with sufficient accuracy from limited training data
  - It is impossible to prepare a speech database which includes all combinations of contextual factors
Simultaneous modelling of Spectrum, Pitch and Duration (3)
• Context Dependent Model
• Spectrum, pitch and duration each have their own influential contextual factors
• → distributions of the respective parameters are clustered independently [5]
Simultaneous modelling of Spectrum, Pitch and Duration (4)
• 3. Text-to-Speech Synthesis System
• Arbitrary text converted to context-based label sequence
• Sentence HMM constructed according to label sequence
• State durations determined so as to maximize likelihood of state duration densities
• Sequence of mel-cepstrum coefficients and pitch values generated
• Speech synthesized directly by MLSA filter [5]
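The duration step can be made concrete with a small sketch. With Gaussian state duration densities, the likelihood is maximized by the state means when the total length is free; when a total number of frames T is imposed, the standard constrained solution is d_k = m_k + ρ σ_k² with ρ = (T − Σ m_k) / Σ σ_k². The function and argument names below are illustrative, and rounding the real-valued results to whole frames is left out.

```python
def state_durations(means, variances, total_frames=None):
    """Maximum-likelihood state durations under Gaussian duration densities.

    means, variances: per-state duration mean and variance (in frames).
    total_frames: optional total length constraint; None means unconstrained.
    """
    if total_frames is None:
        rho = 0.0  # unconstrained optimum: each duration equals its mean
    else:
        # Spread the surplus or deficit over states in proportion to variance.
        rho = (total_frames - sum(means)) / sum(variances)
    return [m + rho * v for m, v in zip(means, variances)]
```

For example, with means [3, 5, 4], variances [1, 4, 1] and a target of 18 frames, ρ is 1 and the durations become [4, 9, 5]: the state with the largest variance absorbs most of the extra length.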
References
• [1] K. Tokuda, T. Kobayashi and S. Imai, Speech Parameter Generation from HMM Using Dynamic Features, Proc. ICASSP, 1995
• [2] http://hts.sp.nitech.ac.jp/?Publications – see ‘Attach file’ tokuda_TTSworkshop2002.pdf
• [3] Tokuda, Yoshimura, Masuko, Kobayashi, Kitamura, Speech Parameter Generation Algorithms for HMM-based Speech Synthesis, Proc. ICASSP, 2000
• [4] Tokuda, Masuko, Miyazaki, Kobayashi, Multi-Space Probability Distribution HMM, IEICE Trans. Inf. & Syst., 2000
• [5] Yoshimura, Tokuda, Masuko, Kobayashi, Kitamura, Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, Proc. EUROSPEECH, 1999