
Main Vowel Domain Tone Modeling With Lexical and Prosodic Analysis for Mandarin ASR

ICASSP 2009. Shilei Zhang, Qin Shi, Stephen M. Chu, and Yong Qin. Hsiao-Tsung Hung. Outline: outlined features, contextual tone models, CART, lattice rescoring using tone models, experiments.


Presentation Transcript


  1. Main Vowel Domain Tone Modeling With Lexical and Prosodic Analysis for Mandarin ASR • ICASSP 2009 • Shilei Zhang, Qin Shi, Stephen M. Chu, and Yong Qin • Hsiao-Tsung Hung

  2. Outline • Outlined features • Contextual tone models • CART • Lattice rescoring using tone models • Experiments

  3. Outlined features • F0 contour features: the coefficients of a second-order polynomial fitted to the F0 contour
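
The second-order polynomial fit named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fit_quadratic`, the normalization of time to [0, 1], and the plain least-squares solver are all assumptions made for the example. The three returned coefficients correspond to the 3-dimensional curve fitting vector mentioned later in the transcript.

```python
def fit_quadratic(f0):
    """Least-squares fit of f0[i] ~ a0 + a1*t + a2*t**2, with t scaled to [0, 1].

    `f0` is a list of F0 values (one per voiced frame) for the segment
    being modeled. Returns [a0, a1, a2].
    """
    n = len(f0)
    if n < 3:
        raise ValueError("need at least three frames to fit a quadratic")
    t = [i / (n - 1) for i in range(n)]

    # Normal equations A @ coeffs = b for the basis (1, t, t^2).
    A = [[sum(ti ** (r + c) for ti in t) for c in range(3)] for r in range(3)]
    b = [sum(y * ti ** r for ti, y in zip(t, f0)) for r in range(3)]

    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(3)]
```

Scaling time to [0, 1] makes the coefficients comparable across segments of different duration; whether the original system normalizes time this way is not stated in the slides.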

  4. Outlined features Which part of F0 contour carries tone information?

  5. Contextual Tone Models • Features used for classification and tone modeling • duration and log-energy of the syllable • curve fitting parameters of the syllable's F0 contour • pitch slopes of two equal-length subsections • tone types of neighboring syllables • log-pitch mean of the syllable and of the utterance • current location within the utterance and the phrase • speech rate

  6. CART • Classification and regression tree • Steps of the algorithm for constructing a CART tree: • 1) at the root node, perform all possible splits on each of the predictor variables, and apply a predefined node measure to determine the reduction in entropy for each split. • 2) select the best variable for splitting the node into two child nodes by applying the splitting criterion. • 3) repeat steps 1) and 2) for each non-terminal node to produce the largest possible tree. • 4) apply a pruning algorithm to the largest tree to produce a sequence of subtrees of different sizes, from which an optimal tree is selected using k-fold cross-validation.
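
Steps 1) and 2) above can be sketched as an exhaustive search for the split with the largest entropy reduction. This is a minimal illustration for numeric predictors only; the function names and the midpoint thresholds are assumptions for the example, and pruning and cross-validation (step 4) are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def entropy_reduction(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_split(rows, labels):
    """Try every feature and every midpoint between distinct values.

    `rows` is a list of equal-length numeric feature vectors.
    Returns (gain, feature_index, threshold); the last two are None
    if no split reduces the entropy.
    """
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        values = [row[f] for row in rows]
        distinct = sorted(set(values))
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            gain = entropy_reduction(values, labels, threshold)
            if gain > best[0]:
                best = (gain, f, threshold)
    return best
```

Growing the full tree (step 3) would apply `best_split` recursively to each child node until no split yields a positive gain.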

  7. Context-dependent classifier • 5 unimodal Gaussians

  8. CART • To minimize syllable-dependent variation, the likelihood is normalized in the log domain. • Quantities used: the 3-dimensional curve fitting vector • s: syllable • the tonal label • j: terminal node • the prior probability of each tone in terminal node j • 3-variate Gaussian densities
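
The formula on this slide did not survive in the transcript, so the following is only one plausible reading, offered as a sketch: each tone's prior in terminal node j is combined with a 3-variate Gaussian density of the curve fitting vector, and the result is normalized over all tones in the log domain. The exact normalization used by the authors, the diagonal covariance, and all names below are assumptions.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def normalized_tone_log_scores(x, node):
    """Normalized log score of each tone for curve fitting vector `x`.

    `node` maps a tone label to (prior, mean, var) for one terminal node j.
    Each score is log(prior * density) minus the log of the same quantity
    summed over all tones, so the exponentiated scores sum to one.
    """
    joint = {tone: math.log(prior) + log_gaussian_diag(x, mean, var)
             for tone, (prior, mean, var) in node.items()}
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(joint.values())
    log_total = peak + math.log(sum(math.exp(v - peak) for v in joint.values()))
    return {tone: v - log_total for tone, v in joint.items()}
```

Under this reading, the normalized scores are comparable across syllables because each one is expressed relative to the competing tones in the same terminal node.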

  9. Lattice rescoring using tone models • Word • Integrating the tone likelihood
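
The integration formula is missing from the transcript. As a rough sketch of the idea, the example below adds a weighted sum of tone log-likelihoods to each hypothesis's existing score and picks the best one. It works on a flat list of hypotheses rather than a true lattice, and the field names and the `tone_weight` parameter are hypothetical.

```python
def rescore(hypotheses, tone_weight=1.0):
    """Return the hypothesis with the best combined score.

    Each hypothesis is a dict with:
      'words'                - the word sequence
      'base_score'           - existing log score (acoustic plus language model)
      'tone_log_likelihoods' - one tone log-likelihood per syllable
    `tone_weight` is a hypothetical tuning parameter.
    """
    def total(hyp):
        return hyp["base_score"] + tone_weight * sum(hyp["tone_log_likelihoods"])

    return max(hypotheses, key=total)
```

With `tone_weight=0.0` the ranking reduces to the original scores, which gives a simple baseline for tuning the weight on held-out data.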

  10. Experiments • Connected Digits Experiments • Different speaking rates • 2 to 5 digits

  11. Experiments • Broadcast Speech Experiments • bn: broadcast news • bc: broadcast conversations
