Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005
Roadmap • Motivating Context • Data Collections & Processing • Modeling Context for Tone and Pitch Accent • Context in Recognition • Conclusion
Challenges • Tone and Pitch Accent Recognition • Key component of language understanding • Lexical tone carries word meaning • Pitch accent carries semantic, pragmatic, discourse meaning • Non-canonical form (Shen 90, Shih 00, Xu 01) • Tonal coarticulation modifies surface realization • In extreme cases, fall becomes rise • Tone is relative • To speaker range • High for male may be low for female • To phrase range, other tones • E.g. downstep
Strategy • Common model across languages, SVM classifier • Acoustic-prosodic model: no word label, POS, lexical stress info • No explicit tone label sequence model • English, Mandarin Chinese (also Cantonese) • Exploit contextual information • Features from adjacent syllables • Height, shape: direct, relative • Compensate for phrase contour • Analyze impact of • Context position, context encoding, context type • > 20% relative improvement over no context • Preceding context greater enhancement than following
Data Collection & Processing • English: (Ostendorf et al, 95) • Boston University Radio News Corpus, f2b • Manually ToBI annotated, aligned, syllabified • Pitch accent aligned to syllables • Unaccented, High, Downstepped High, Low • (Sun 02, Ross & Ostendorf 95) • Mandarin: • TDT2 Voice of America Mandarin Broadcast News • Automatically force aligned to anchor scripts (CUSonic) • High, Mid-rising, Low, High falling, Neutral
Local Feature Extraction • Uniform representation for tone, pitch accent • Motivated by Pitch Target Approximation Model • Tone/pitch accent target exponentially approached • Linear target: height, slope (Xu et al, 99) • Scalar features: • Pitch, Intensity max, mean (Praat, speaker normalized) • Pitch at 5 points across voiced region • Duration • Initial, final in phrase • Slope: • Linear fit to last half of pitch contour
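The local features on this slide can be sketched as follows. This is a minimal illustration, not the talk's implementation: the function and key names are hypothetical, speaker normalization is assumed to be a z-score over the speaker's pitch (the slide only says "speaker normalized"), intensity is left unnormalized, and pitch values are assumed to be already extracted (e.g. with Praat) as one list of f0 values per syllable's voiced region.

```python
from statistics import mean


def sample_points(values, n=5):
    """Pick n evenly spaced samples across a non-empty sequence."""
    if n == 1:
        return [values[len(values) // 2]]
    last = len(values) - 1
    return [values[round(i * last / (n - 1))] for i in range(n)]


def linear_slope(ys):
    """Least-squares slope of ys against frame index 0..len(ys)-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def local_features(pitch, intensity, duration, spk_mean, spk_std,
                   phrase_initial=False, phrase_final=False):
    """Build the per-syllable feature dict.

    pitch, intensity: per-frame values over the syllable's voiced region.
    spk_mean, spk_std: speaker pitch statistics (z-score is an assumption).
    """
    norm = [(p - spk_mean) / spk_std for p in pitch]
    return {
        "pitch_max": max(norm),
        "pitch_mean": mean(norm),
        "pitch_points": sample_points(norm, 5),   # pitch at 5 points
        "intensity_max": max(intensity),
        "intensity_mean": mean(intensity),
        "duration": duration,
        "phrase_initial": phrase_initial,
        "phrase_final": phrase_final,
        # slope: linear fit to the last half of the pitch contour
        "slope": linear_slope(norm[len(norm) // 2:]),
    }
```

Fitting the slope to the second half of the contour follows the slide's assumption that the target is approached over the course of the syllable, so the later frames sit closer to it.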
Context Features • Local context: • Extended features • Pitch max, mean, adjacent points of preceding, following syllables • Difference features • Difference between • Pitch max, mean, mid, slope • Intensity max, mean • Of preceding, following and current syllable • Phrasal context: • Compute collection average phrase slope • Compute scalar pitch values, adjusted for slope
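The two local-context encodings might look like the sketch below. All names are hypothetical. It assumes each syllable is a dict of scalar features, that "adjacent points" means the neighbour's pitch sample closest to the current syllable, and that differences are taken as current minus neighbour; the slide does not state the sign convention.

```python
# Features compared between the current syllable and each neighbour.
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "intensity_max", "intensity_mean")


def extended_features(prev, nxt):
    """'Extended' encoding: copy neighbour pitch values in directly.

    prev / nxt are feature dicts, or None at a phrase boundary.
    """
    feats = {}
    if prev is not None:
        feats["prev_pitch_max"] = prev["pitch_max"]
        feats["prev_pitch_mean"] = prev["pitch_mean"]
        # last sampled point of the preceding syllable (assumed meaning
        # of "adjacent point")
        feats["prev_adjacent_point"] = prev["pitch_points"][-1]
    if nxt is not None:
        feats["next_pitch_max"] = nxt["pitch_max"]
        feats["next_pitch_mean"] = nxt["pitch_mean"]
        # first sampled point of the following syllable
        feats["next_adjacent_point"] = nxt["pitch_points"][0]
    return feats


def difference_features(cur, prev, nxt):
    """'Difference' encoding: current value minus neighbour value."""
    feats = {}
    for name, other in (("prev", prev), ("next", nxt)):
        if other is None:
            continue
        for key in DIFF_KEYS:
            feats[f"d_{name}_{key}"] = cur[key] - other[key]
    return feats
```

Passing `None` for one neighbour gives the preceding-only or following-only conditions, so the same two functions cover the context-position comparison.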
Classification Experiments • Classifier: Support Vector Machine • Linear kernel • Multiclass formulation • SVMlight (Joachims), LibSVM (Chang & Lin 01) • 4:1 training / test splits • Experiments: Effects of • Context position: preceding, following, none, both • Context encoding: Extended/Difference • Context type: local, phrasal
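One way to produce 4:1 training / test splits is sketched below. The slide does not say whether the splits were rotated; this sketch assumes five rotating folds so that every item is tested exactly once. The function name and the fixed seed are hypothetical.

```python
import random


def four_to_one_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) pairs in a 4:1 ratio.

    Items are shuffled once, dealt into five folds, and each fold
    serves as the test set in turn (an assumed rotation scheme).
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [j for f, fold in enumerate(folds) if f != k for j in fold]
        yield train, test
```

Each split would then be passed to the SVM trainer once per experimental condition (context position, encoding, and type), keeping the folds fixed so the conditions are compared on the same data.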
Discussion: Local Context • Any context information improves over none • Preceding context information consistently improves over none or following context information • English: Generally more context features are better • Mandarin: Following context can degrade • Little difference in encoding (Extend vs Diffs) • Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory
Results & Discussion: Phrasal Context • Phrase contour compensation enhances recognition • Simple strategy • Non-linear slope compensation may improve recognition further
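The simple linear compensation can be sketched as follows: estimate one average phrase slope over the whole collection, then remove that trend from each scalar pitch value according to its time offset within the phrase. The names are hypothetical, and the least-squares fit per phrase is an assumption about how the slope was computed.

```python
from statistics import mean


def phrase_slope(times, pitches):
    """Least-squares slope of pitch over time for one phrase."""
    t_bar = mean(times)
    p_bar = mean(pitches)
    num = sum((t - t_bar) * (p - p_bar) for t, p in zip(times, pitches))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den if den else 0.0


def average_phrase_slope(phrases):
    """Collection-average slope; phrases is a list of (times, pitches)."""
    return mean([phrase_slope(t, p) for t, p in phrases])


def compensate(pitch, t_from_phrase_start, avg_slope):
    """Remove the average phrase contour from one scalar pitch value."""
    return pitch - avg_slope * t_from_phrase_start
```

With a falling average contour (negative slope), values late in a phrase are raised, so a high target near the end of a phrase is no longer mistaken for a lower one.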
Conclusion • Employ common acoustic representation • Tone (Mandarin), pitch accent (English) • Cantonese, recent experiments • SVM classifiers - linear kernel: 76%, 81% • Local context effects: • Up to > 20% relative reduction in error • Preceding context greatest contribution • Carryover vs anticipatory • Phrasal context effects: • Compensation for phrasal contour improves recognition
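"Relative reduction in error" is computed on error rates, not on accuracies. A small helper makes the arithmetic explicit; the function name is hypothetical, and the numbers in the usage note are illustrative only, not results from the talk.

```python
def relative_error_reduction(acc_baseline, acc_context):
    """Fraction of the baseline error removed by adding context.

    Both arguments are accuracies in [0, 1]; baseline must be below 1.
    """
    err_base = 1.0 - acc_baseline
    err_ctx = 1.0 - acc_context
    return (err_base - err_ctx) / err_base
```

For instance, moving from a hypothetical 50% to 60% accuracy cuts the error rate from 50% to 40%, which is a 20% relative reduction even though accuracy rose by only 10 points.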
Current & Future Work • Application of model to different languages • Cantonese, Dschang (Bantu family) • Cantonese: ~65% acoustic only, 85% w/segmental • Integration of additional contextual influence • Topic, turn, discourse structure • HMSVM, GHMM models • http://people.cs.uchicago.edu/~levow/projects/tai • Supported by NSF Grant #: 0414919
Related Work • Tonal coarticulation: • Xu & Sun 02; Xu 97; Shih & Kochanski 00 • English pitch accent • X. Sun 02; Hasegawa-Johnson et al 04; Ross & Ostendorf 95 • Lexical tone recognition • SVM recognition of Thai tone: Thubthong 01 • Context-dependent tone models • Wang & Seneff 00, Zhou et al 04
Pitch Target Approximation Model • Pitch target: • Linear model: • Exponentially approximated: • In practice, assume target well-approximated by mid-point (Sun, 02)
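For reference, the pitch target approximation model is usually written in the form below. This is the common presentation of the model and is an assumption about what the slide's equations showed; the symbol names are not taken from the slide text.

```latex
\begin{align*}
  T(t) &= a\,t + b
       && \text{underlying linear pitch target (slope } a \text{, height } b\text{)} \\
  y(t) &= \beta\, e^{-\lambda t} + a\,t + b
       && \text{surface contour approaching the target exponentially}
\end{align*}
```

Here $\beta$ is the initial distance between the surface contour and the target, and $\lambda > 0$ controls how fast that distance decays. Since the exponential term shrinks over the syllable, later portions of the contour lie closer to $T(t)$.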