On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody

Zhao-yu Su, Phonetics Lab, Institute of Linguistics, Academia Sinica

Applying the Fujisaki model to Mandarin
• 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/), PI: Prof. Chiu-yu Tseng
  • Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)
• 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/), PI: Prof. Keikichi Hirose
  • Mandarin: manual extraction of Fujisaki parameters
  • Japanese: automatic extraction of Fujisaki parameters
• 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/), PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William
  • Mandarin: manual extraction of Fujisaki parameters

Outline
• Introduction: the Fujisaki model
• Auto-extraction comparison: methods used at two labs to generate the Fujisaki parameters
  • Phonetics Lab, Academia Sinica, Taiwan, on Mandarin (Tseng 2004, 2005, 2006)
  • Hirose Lab, Tokyo University, Japan, on Japanese (Hirose and Narusawa 2002, 2003)
• Manual extraction: method used at CUHK to extract Fujisaki parameters
  • DSP and Speech Technology Lab, on Mandarin (Wentao Gu 2004, 2005)

The Fujisaki Model (Fujisaki & Hirose 1984)
• Superposed model: log(F0) = base frequency + phrase components + accent components
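The superposition on this slide can be sketched in a few lines of Python. The sketch below follows the commonly cited formulation of the model, in which each phrase command is an impulse passed through a critically damped second-order filter and each accent command is a step function passed through a similar filter with a ceiling. The function names and the constants `ALPHA`, `BETA`, and `GAMMA` are assumptions chosen for illustration (typical textbook values), not values taken from the slides.

```python
import math

# Illustrative constants (assumed typical values, not from the slides).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # relative ceiling of the accent component

def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control: Gp(t) = alpha^2 * t * exp(-alpha*t), t >= 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma), t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds):
    """F0 (Hz) at time t (s).

    fb          -- base frequency in Hz
    phrase_cmds -- list of (onset_time, magnitude)
    accent_cmds -- list of (onset_time, offset_time, amplitude)
    """
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(log_f0)

if __name__ == "__main__":
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.6, 0.4)]
    for step in range(0, 11):
        t = step / 10
        print(f"{t:.1f} s  {fujisaki_f0(t, 100.0, phrases, accents):.1f} Hz")
```

For Mandarin, the accent commands are usually reinterpreted as tone commands and may take negative amplitudes; the sketch accepts negative `aa` values without modification.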

Auto-extraction based on Mixdorff's method (2000, 2003)
• Original F0 contour
• High-pass filter (stop frequency at 0.5 Hz)
• High-frequency contour (HFC)
• Low-frequency contour (LFC)
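The split into a high-frequency and a low-frequency contour can be illustrated with a small, dependency-free sketch. It assumes the log-F0 contour has already been interpolated across unvoiced gaps and is sampled at a fixed frame rate. The brick-wall DFT filter used here is a simplification chosen for clarity, not the filter design of the original method, and `split_contour` is a hypothetical name.

```python
import cmath
import math

def split_contour(log_f0, frame_rate_hz, stop_hz=0.5):
    """Split a sampled log-F0 contour into (HFC, LFC).

    LFC keeps every DFT bin whose frequency is below stop_hz;
    HFC is the remainder, so HFC + LFC reproduces the input.
    """
    n = len(log_f0)
    # Naive forward DFT: O(n^2), acceptable for a single utterance.
    spectrum = [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(log_f0))
        for k in range(n)
    ]
    # Bins below the stop frequency, including their mirrored negative frequencies.
    low_bins = [k for k in range(n) if min(k, n - k) * frame_rate_hz / n < stop_hz]
    # Inverse DFT restricted to the low bins gives the low-frequency contour.
    lfc = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in low_bins).real / n
        for t in range(n)
    ]
    hfc = [x - low for x, low in zip(log_f0, lfc)]
    return hfc, lfc
```

In this scheme the HFC carries the fast, local movements (accent or tone components) and the LFC carries the slow declination-like movement that the phrase components are fitted to.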

Decision of phrase commands (Phonetics Lab, Academia Sinica, Taiwan)
• Low-frequency contour (LFC) from Mixdorff's method
• Positions of local minima, followed by optimization
• Method based on perceptual labels: perceptual phrase boundary evaluation
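A minimal sketch of the candidate search: local minima of the LFC are taken as candidate phrase command positions, and candidates may then be checked against perceptually labeled phrase boundaries. The slide does not specify how the perceptual labels enter the procedure, so `keep_near_boundaries` and its `tolerance_s` parameter are hypothetical, as are both function names.

```python
def local_minima(lfc, frame_rate_hz):
    """Return the times (s) of local minima in a sampled low-frequency contour.

    On a flat valley only the first sample of the plateau is reported.
    """
    times = []
    for i in range(1, len(lfc) - 1):
        if lfc[i] < lfc[i - 1] and lfc[i] <= lfc[i + 1]:
            times.append(i / frame_rate_hz)
    return times

def keep_near_boundaries(candidates, labeled_boundaries, tolerance_s=0.2):
    """Keep candidates within tolerance_s of a perceptually labeled boundary.

    Hypothetical filter: one possible way to combine acoustic candidates
    with perceptual labels, not a procedure stated in the slides.
    """
    return [
        t for t in candidates
        if any(abs(t - b) <= tolerance_s for b in labeled_boundaries)
    ]
```

The surviving candidates would then serve as initial phrase command onsets for the optimization step.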

Phonetics Lab, Academia Sinica--Auto-extraction results for Mandarin (Mixdorff 2003)

Hirose Lab--Auto-extraction (Narusawa 2002, 2003)
• Original F0 contour
• Derivative: target for accent components
• Residual contour: target for phrase components

Decision of phrase commands
• Dynamic programming (DP) on the residual contour
• The optimum I is the one for which c(I) is maximum.
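The slide does not define the score c(I), so the following is only a generic sketch of how dynamic programming can select a set I of phrase command positions. It assumes, purely for illustration, that c(I) is the sum of a per-segment score over consecutive selected positions and that the first and last candidates are always selected. `select_commands` and `segment_score` are hypothetical names.

```python
def select_commands(candidates, segment_score):
    """Choose the subset I of candidate times that maximizes an additive score.

    candidates    -- sorted list of candidate times (s)
    segment_score -- function (start, end) -> float, scoring one segment
                     between two consecutive selected candidates
    Returns (selected_times, best_score).
    """
    if not candidates:
        return [], 0.0
    n = len(candidates)
    best = [float("-inf")] * n   # best[j]: best score of a selection ending at j
    back = [None] * n            # back[j]: previous selected index on that path
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            score = best[i] + segment_score(candidates[i], candidates[j])
            if score > best[j]:
                best[j] = score
                back[j] = i
    # Trace the best path back from the last candidate.
    selected = []
    j = n - 1
    while j is not None:
        selected.append(candidates[j])
        j = back[j]
    selected.reverse()
    return selected, best[-1]
```

In practice `segment_score` would measure how well a phrase component fits the residual contour over the segment; any such measure can be plugged in as long as the total score stays additive over segments.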

Hirose Lab--Compensation from text analysis to aid auto-extraction
• Parsed text is used to adjust the extracted Fujisaki parameters

Hirose Lab--Auto-extraction of Japanese (Narusawa 2002, 2003)
• Original method
  • An accent component should be located on a phrase component.
• New method
  • Pauses are considered.
  • Results are corrected using information from parsed text.

Auto-extraction of phrase components--Comparison of 2 labs
• Phrase components
  • Phonetics Lab, IL, AS (modified Mixdorff 2003): pre-extraction of phrase components is relatively close.
  • Hirose Lab: pre-extraction is not as close, but the final output can be compensated by text analysis.
    • Auto-extract the F0 contour from the acoustic signal
    • Compensate the phrase components with parsed text; unit used: bunsetsu (lexical definition)

Manual adjustment--Gu, CUHK
• Note:
  1. Insertion of phrase components is subjective.
  2. Boundary identification is NOT explicitly specified; it relies on perception (duration? or F0 reset?)

Manual adjustment--Gu, CUHK

Possible Future Considerations (1/2)
• 1. Is the distinguishing acoustic feature a single cue: pause, duration, or F0?
• 2. Or is it a combination of acoustic features (pause, duration, and/or F0)?
  • E.g., test whether duration can compensate for F0 reset

Possible Future Considerations (2/2): improving auto-extraction of tone components
• 3. The concept of tone nucleus
  • By retaining only the nucleus of the syllable while ignoring vertical F0 variation (from Hirose's tone nucleus and Gu's manual adjustment)
  • By ignoring horizontal F0 variation (from Gu's manual adjustment)

One major ambiguity among the 3 labs--phrase component unit selection
1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase)
2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu)
3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected
  • PPh: adjusted from visual display
  • PW: adjusted from perceptual decision

Why can prosodic unit selection be a problem unique to Mandarin?
• Japanese: bunsetsu, a compound word consisting of two or more content words
• Mandarin:
  • 1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to maintain the tendency of one application of the phrase component function.
  • 2. CUHK: manual adjustment can be accurate but is not systematic enough.
    • E.g., a phrase component sometimes corresponds to a prosodic phrase and is sometimes shorter.

Concluding Remarks
• 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.
• 2. What possible improvements can auto-extraction borrow from manual adjustment?
  • Focusing on the nucleus (syllable)
  • Understanding more of the acoustic properties (F0, duration, ...)
• 3. More linguistic and cognitive knowledge, in addition to acoustic information, could help improve the prosody model.
  • Linguistic information: parsing (text analysis and syntax), semantics, and pragmatics
  • Cognitive information: speech planning and processing
