Main Vowel Domain Tone Modeling With Lexical and Prosodic Analysis for Mandarin ASR
ICASSP 2009
Shilei Zhang, Qin Shi, Stephen M. Chu, and Yong Qin
Hsiao-Tsung Hung
Outline • Outlined features • Contextual tone models • CART • Lattice rescoring using tone models • Experiments
Outlined features • F0 contour features: the coefficients of a second-order polynomial fitted to the F0 contour
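A second-order polynomial fit reduces an F0 contour of any length to three coefficients. The sketch below shows this with NumPy's `np.polyfit`. Two choices here are assumptions, not details from the slides: the time axis is normalized to [0, 1] so that syllables of different durations give comparable coefficients, and the fit is applied to raw F0 values rather than log-F0. The function name `f0_curve_features` is hypothetical.

```python
import numpy as np

def f0_curve_features(f0):
    """Fit f(t) = a*t**2 + b*t + c to an F0 contour and return (a, b, c).

    Assumption: time is normalized to [0, 1] over the segment so that
    contours of different durations yield comparable coefficients.
    """
    f0 = np.asarray(f0, dtype=float)
    t = np.linspace(0.0, 1.0, len(f0))
    # np.polyfit returns coefficients ordered from highest to lowest degree.
    return np.polyfit(t, f0, deg=2)

# Usage: a falling-then-rising contour sampled at 20 frames.
t = np.linspace(0.0, 1.0, 20)
print(f0_curve_features(2 * t**2 - 3 * t + 200))
```

The three coefficients roughly capture curvature, slope, and offset.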
Outlined features • Which part of the F0 contour carries the tone information?
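The title frames tone modeling in the main vowel domain, meaning the contour features come from the main vowel rather than the whole syllable. The sketch below assumes three things: frame-level F0 is available, unvoiced frames are marked with 0, and the main-vowel boundaries (frame indices, end exclusive) come from a forced alignment. The function name and boundary format are hypothetical.

```python
import numpy as np

def main_vowel_f0(f0, vowel_start, vowel_end):
    """Return the voiced F0 frames inside the main-vowel segment.

    vowel_start / vowel_end: frame indices (end exclusive), assumed to come
    from a forced alignment. Frames with F0 == 0 are treated as unvoiced.
    """
    segment = np.asarray(f0, dtype=float)[vowel_start:vowel_end]
    return segment[segment > 0]

# Usage: unvoiced initial consonant frames (0) precede the main vowel.
print(main_vowel_f0([0, 0, 0, 180, 185, 190, 195, 0], 3, 7))
```

The resulting segment can then be passed to the curve-fitting step in place of the full syllable contour.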
Contextual Tone Models • Features used for classification and tone modeling: • duration and log-energy of the syllable • curve-fitting parameters of the syllable's F0 contour • pitch slopes of two equal-length subsections of the syllable • tone types of the neighboring syllables • mean log-pitch of the syllable and of the utterance • position of the syllable within the utterance and within the phrase • speech rate
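Several of these features can be computed directly from the F0 contour. The sketch below covers some of them. The subsection slopes come from a least-squares line fitted to each equal-length half. The log-pitch means are computed for the syllable and the utterance. Utterance position is encoded as a relative index. This encoding, the dictionary keys, and the function names are illustrative assumptions, and phrase position, neighboring tones, and speech rate are left out for brevity.

```python
import numpy as np

def half_slopes(f0):
    """Least-squares pitch slope (units per frame) of each equal-length half."""
    f0 = np.asarray(f0, dtype=float)
    if len(f0) < 4:
        raise ValueError("need at least 4 frames to fit two half slopes")
    mid = len(f0) // 2
    slopes = []
    for part in (f0[:mid], f0[mid:]):
        x = np.arange(len(part), dtype=float)
        slopes.append(float(np.polyfit(x, part, deg=1)[0]))
    return slopes

def contextual_features(f0, utterance_f0, duration, log_energy,
                        syl_index, n_syllables):
    """Assemble a subset of the listed features (names are hypothetical)."""
    s1, s2 = half_slopes(f0)
    return {
        "duration": duration,
        "log_energy": log_energy,
        "slope_first_half": s1,
        "slope_second_half": s2,
        "syl_log_pitch_mean": float(np.mean(np.log(np.asarray(f0, dtype=float)))),
        "utt_log_pitch_mean": float(np.mean(np.log(np.asarray(utterance_f0, dtype=float)))),
        # Relative position of the syllable within the utterance, in [0, 1].
        "utt_position": syl_index / max(n_syllables - 1, 1),
    }
```

A rise-fall contour such as `[100, 110, 120, 130, 130, 120, 110, 100]` gives half slopes of about +10 and -10 per frame.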
CART • Classification and regression tree • Steps of the algorithm for constructing a CART tree: • 1) At the root node, perform all possible splits on each predictor variable, and apply a predefined node measure to compute the reduction in entropy achieved by each split. • 2) Select the best variable and split the node into two child nodes according to the splitting criterion. • 3) Repeat steps 1) and 2) for each non-terminal node to grow the largest possible tree. • 4) Apply a pruning algorithm to the largest tree to produce a sequence of subtrees of different sizes, and select the optimal tree among them using k-fold cross-validation.
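Steps 1) to 3) can be sketched in plain NumPy. The code tries every threshold on every variable, keeps the split with the largest entropy reduction, and recurses until no split helps. Using midpoints between distinct values as candidate thresholds is an implementation choice. Step 4), pruning with k-fold cross-validation, is omitted to keep the example short. All function names are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of an integer label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(X, y):
    """Steps 1-2: return (feature, threshold, gain) of the split with the
    largest entropy reduction, or None if no split reduces entropy."""
    parent = entropy(y)
    best = None
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for thr in (values[:-1] + values[1:]) / 2:
            left = X[:, f] <= thr
            n_left = left.sum()
            child = (n_left * entropy(y[left])
                     + (len(y) - n_left) * entropy(y[~left])) / len(y)
            gain = parent - child
            if gain > 1e-12 and (best is None or gain > best[2]):
                best = (f, float(thr), gain)
    return best

def grow(X, y, min_size=1):
    """Step 3: split recursively until no split helps (largest tree)."""
    split = best_split(X, y) if len(y) > min_size else None
    if split is None:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": int(values[np.argmax(counts)])}  # majority label
    f, thr, _ = split
    left = X[:, f] <= thr
    return {"feature": f, "threshold": thr,
            "left": grow(X[left], y[left], min_size),
            "right": grow(X[~left], y[~left], min_size)}

def predict(tree, x):
    """Follow the splits from the root to a terminal node."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]
```

In the tone-modeling setting, `X` would hold the contextual features and `y` the tone labels.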
Context-dependent classifier • 5 unimodal Gaussians
CART • To minimize syllable-dependent variations, the likelihood is normalized in the log domain. Quantities involved: • the 3-dimensional curve-fitting vector • s: syllable • the tonal label • j: terminal node • the prior probability of each tone in terminal node j • 3-variate Gaussian densities
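The exact normalization formula does not survive on this slide. The sketch below shows one common form under that caveat. Within terminal node j, each tone's prior times its 3-variate Gaussian likelihood is divided by the sum over all five tones, computed in the log domain with a stable log-sum-exp. This gives log posteriors that sum to one in probability. It is an assumption, not necessarily the paper's exact formula, and the function names are hypothetical.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian N(x; mean, cov)."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (x.shape[0] * np.log(2 * np.pi) + logdet + maha)

def normalized_tone_scores(x, priors, means, covs):
    """Log-domain normalized scores for every tone in one terminal node.

    ASSUMPTION: score_t = log(P(t|j) N(x; mu_t, S_t))
                          - log sum_t' P(t'|j) N(x; mu_t', S_t').
    """
    joint = np.array([np.log(p) + log_gaussian(x, m, c)
                      for p, m, c in zip(priors, means, covs)])
    peak = joint.max()
    log_norm = peak + np.log(np.exp(joint - peak).sum())  # stable log-sum-exp
    return joint - log_norm
```

The tone with the highest score is the classifier's decision, and the normalized scores are what would later be combined with other lattice scores.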
Lattice rescoring using tone models • Word level • Integrating the tone likelihood into the lattice scores
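One way to integrate tone likelihoods into a word lattice is sketched below. Each arc's existing log score (acoustic plus language model) gets a weighted sum of per-syllable tone log-likelihoods added to it, and the best path is then found with dynamic programming over the lattice. The log-linear weighting, the `tone_weight` value, the arc tuple layout, and the assumption that node IDs are numbered in topological order are all illustrative.

```python
from collections import defaultdict

def rescore_best_path(arcs, start, end, tone_weight=0.5):
    """Best path through a word lattice after adding weighted tone scores.

    arcs: list of (from_node, to_node, word, base_log_score, tone_log_liks),
    where base_log_score is the existing acoustic+LM log score of the arc and
    tone_log_liks holds one tone log-likelihood per syllable of the word.
    Node IDs are assumed to be integers in topological order.
    Returns (total_log_score, word_sequence).
    """
    outgoing = defaultdict(list)
    for arc in arcs:
        outgoing[arc[0]].append(arc)
    best = {start: (0.0, [])}
    # Visiting nodes in increasing ID order finalizes every predecessor first.
    for node in sorted({a[0] for a in arcs} | {end}):
        if node not in best:
            continue  # unreachable from start
        score, words = best[node]
        for _, to, word, base, tones in outgoing[node]:
            new = score + base + tone_weight * sum(tones)
            if to not in best or new > best[to][0]:
                best[to] = (new, words + [word])
    return best[end]

# Usage: tone scores flip the choice between two homophone candidates.
arcs = [(0, 1, "ma1", -10.0, [-1.0]),
        (0, 1, "ma3", -9.5, [-4.0]),
        (1, 2, "le", -2.0, [-0.5])]
print(rescore_best_path(arcs, 0, 2))
```

Setting `tone_weight=0.0` recovers the original lattice decision, which makes the weight easy to tune on held-out data.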
Experiments • Connected-digit experiments • Different speaking rates • Strings of 2 to 5 digits
Experiments • Broadcast Speech Experiments • bn: broadcast news • bc: broadcast conversations