On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody
Zhao-yu Su
Phonetics Lab, Institute of Linguistics, Academia Sinica
Applying the Fujisaki model to Mandarin
• 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/). PI: Prof. Chiu-yu Tseng
  • Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)
• 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/). PI: Prof. Keikichi Hirose
  • Mandarin: manual extraction of Fujisaki parameters
  • Japanese: automatic extraction of Fujisaki parameters
• 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/). PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan, William
  • Mandarin: manual extraction of Fujisaki parameters
Outline
• Introduction: the Fujisaki model
• Auto-extraction comparison: methods used at two labs to generate the Fujisaki parameters
  • Phonetics Lab, Academia Sinica, Taiwan: on Mandarin (Tseng 2004, 2005, 2006)
  • Hirose Lab, Tokyo University, Japan: on Japanese (Hirose and Narusawa 2002, 2003)
• Manual extraction: method used at CUHK to extract Fujisaki parameters
  • DSP and Speech Technology Lab: on Mandarin (Wentao Gu 2004, 2005)
The Fujisaki Model (Fujisaki & Hirose 1984)
• log(F0) = base frequency + phrase components + accent components
• [Figure: phrase components + accent components = superposed model]
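The superposition on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard formulation of the model, in which the phrase component is the response of a critically damped second-order system to an impulse and the accent component is its response to a step, capped at a ceiling. The constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are commonly quoted defaults, and the command values at the bottom are made up for illustration; none of them come from the slides.

```python
import math

def phrase_response(t, alpha=3.0):
    """Gp(t): impulse response of the phrase control mechanism (0 for t < 0)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrase_cmds: list of (onset time T0, magnitude Ap)
    accent_cmds: list of (onset T1, offset T2, amplitude Aa)
    """
    phrase = sum(ap * phrase_response(t - t0) for t0, ap in phrase_cmds)
    accent = sum(aa * (accent_response(t - t1) - accent_response(t - t2))
                 for t1, t2, aa in accent_cmds)
    return math.log(fb) + phrase + accent

# Illustrative values only (assumptions, not measurements from any corpus).
phrase_cmds = [(0.0, 0.5)]        # one phrase command at t = 0 s
accent_cmds = [(0.3, 0.6, 0.4)]   # one accent command from 0.3 s to 0.6 s
contour = [math.exp(log_f0(0.01 * i, 120.0, phrase_cmds, accent_cmds))
           for i in range(150)]   # F0 in Hz, sampled every 10 ms
```

Before any command has started, and long after every command has decayed, the contour sits at the base frequency, which is why the base frequency acts as the floor that the two component types are added to.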
Auto-extraction based on Mixdorff’s method (2000, 2003)
• The original F0 contour is passed through a high-pass filter (stop frequency at 0.5 Hz).
• Filter output: high-frequency contour (HFC)
• Remainder: low-frequency contour (LFC)
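The split into HFC and LFC can be illustrated with a small sketch. It assumes a log-F0 contour that is already interpolated across unvoiced gaps and sampled at 100 frames per second, and it swaps in a simple one-pole low-pass filter run forward and backward in place of the filter design used in Mixdorff's method, which the slides do not specify. The function names `lowpass_zero_phase` and `split_contour` are hypothetical.

```python
import math

def lowpass_zero_phase(x, cutoff_hz, frame_rate_hz):
    """One-pole low-pass run forward, then backward, so the result has no phase lag."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)

    def one_pass(seq):
        out, y = [], seq[0]          # start at the first sample to limit edge transients
        for v in seq:
            y += a * (v - y)
            out.append(y)
        return out

    forward = one_pass(x)
    backward = one_pass(forward[::-1])
    return backward[::-1]

def split_contour(log_f0, cutoff_hz=0.5, frame_rate_hz=100.0):
    """Return (HFC, LFC) so that HFC + LFC reproduces the input contour."""
    lfc = lowpass_zero_phase(log_f0, cutoff_hz, frame_rate_hz)
    hfc = [v - l for v, l in zip(log_f0, lfc)]
    return hfc, lfc

# Synthetic contour (assumption): a constant level plus a fast 5 Hz movement.
demo = [5.0 + 0.1 * math.sin(2.0 * math.pi * 5.0 * i / 100.0) for i in range(400)]
hfc, lfc = split_contour(demo)
```

Defining the HFC as the input minus the LFC keeps the two contours exactly complementary, so the fast accent-like movement ends up in the HFC and the slow phrase-like trend in the LFC.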
Decision of phrase commands
• Mixdorff’s method: low-frequency contour (LFC) → positions of local minima → optimization
• Phonetics Lab, Academia Sinica, Taiwan: method based on perceptual labels (perceptual phrase boundary evaluation)
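A rough sketch of the candidate-selection step follows. It finds local minima of an LFC and, as one possible reading of the label-based variant, keeps only the minima that fall near a perceptually labelled phrase boundary. The 0.3 s tolerance, the 100 frames-per-second rate, and the function names `local_minima` and `keep_near_boundaries` are assumptions made for illustration; the slides do not state the actual matching rule, and the later optimization step is not shown.

```python
import math

def local_minima(lfc, frame_rate_hz=100.0):
    """Times (in seconds) of frames that are lower than the previous frame
    and not higher than the next one."""
    times = []
    for i in range(1, len(lfc) - 1):
        if lfc[i] < lfc[i - 1] and lfc[i] <= lfc[i + 1]:
            times.append(i / frame_rate_hz)
    return times

def keep_near_boundaries(candidates, boundaries, tolerance_s=0.3):
    """Hypothetical filter: keep a candidate only if a labelled boundary lies
    within tolerance_s seconds of it."""
    return [t for t in candidates
            if any(abs(t - b) <= tolerance_s for b in boundaries)]

# Synthetic LFC (assumption): a slow cosine with a 2 s period over 5 s.
demo_lfc = [math.cos(math.pi * i / 100.0) for i in range(500)]
candidates = local_minima(demo_lfc)

# Hypothetical perceptual label: one phrase boundary at 1.1 s.
kept = keep_near_boundaries(candidates, boundaries=[1.1])
```

Each surviving time would then serve as an initial onset for a phrase command, to be refined by the optimization step named on the slide.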
Phonetics Lab, Academia Sinica: auto-extraction results for Mandarin (Mixdorff 2003)
Hirose Lab: auto-extraction (Narusawa 2002, 2003)
• Original f0 contour
• Derivative: target of phrase components
• Residual contour: target of phrase components
Decision of phrase commands
• Dynamic programming (DP)
• Residual contour
• The optimum I is the one for which c(I) is maximum.
Hirose Lab: compensation from text analysis to aid auto-extraction
• Parsed text is used to adjust the extracted Fujisaki parameters.
Hirose Lab: auto-extraction of Japanese (Narusawa 2002, 2003)
• Original method
  • An accent component should be located on a phrase component.
• New method
  • Pause is considered.
  • Correction after using information from parsed text.
Auto-extraction of phrase components: comparison of 2 labs
• Phrase components
  • Phonetics Lab, IL, AS (modified Mixdorff 2003): pre-extraction of phrase components is relatively close.
  • Hirose Lab: pre-extraction is not as close, but the final output can be compensated by text analysis.
    • Auto-extraction from the acoustic signal (f0 contour)
    • Compensation of the phrase component with parsed text; unit used: bunsetsu (lexical definition)
Manual adjustment: Gu, CUHK
• Note:
  1. Insertion of phrase components is subjective.
  2. Boundary identification is NOT explicitly specified; it relies on perception (duration? or f0 reset?).
Possible Future Considerations (1/2)
• 1. Is the distinguishing acoustic feature only pause? Duration? Or f0?
• 2. Or is it a combination of acoustic features: pause, duration, and/or f0?
  • E.g. test whether duration can compensate for F0 reset
Possible Future Considerations (2/2)
Improving auto-extraction of tone components
• 3. The concept of tone nucleus
  • By retaining only the nucleus of the syllable while ignoring vertical f0 variation (from Hirose’s tone nucleus and Gu’s manual adjustment)
  • By ignoring horizontal f0 variation (from Gu’s manual adjustment)
One major ambiguity among 3 labs: phrase component unit selection
1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase)
2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu)
3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected
  • PPh: adjusted from visual display
  • PW: adjusted from perceptual decision
Why can prosodic unit selection be a problem unique to Mandarin?
• Japanese: bunsetsu, a compound word consisting of two or more content words
• Mandarin:
  1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to be covered by a single application of the phrase component function.
  2. CUHK: manual adjustment can be accurate but is not systematic enough.
    • E.g. a phrase component sometimes corresponds to a prosodic phrase and is sometimes shorter.
Concluding Remarks
• 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.
• 2. What improvements can auto-extraction borrow from manual adjustment?
  • Focusing on the nucleus (syllable)
  • Understanding more about acoustic properties (F0, duration…)
• 3. In addition to acoustic information, more linguistic and cognitive knowledge could help improve the prosody model.
  • Linguistic information: parsing (text analysis and syntax), semantics and pragmatics
  • Cognitive information: speech planning and processing