
On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody


Presentation Transcript


  1. On Different Perspectives of Utilizing the Fujisaki Model to Mandarin Speech Prosody. Zhao-yu Su, Phonetics Lab, Institute of Linguistics, Academia Sinica

  2. Applying the Fujisaki model to Mandarin
  • 1. Phonetics Lab, Academia Sinica, Taiwan (http://phslab.ling.sinica.edu.tw/), PI: Prof. Chiu-yu Tseng
  – Mandarin: automatic extraction of Fujisaki parameters (Mixdorff, 2003)
  • 2. Hirose Lab, Tokyo University, Japan (http://www.gavo.t.u-tokyo.ac.jp/), PI: Prof. Keikichi Hirose
  – Mandarin: manual extraction of Fujisaki parameters
  – Japanese: automatic extraction of Fujisaki parameters
  • 3. DSP and Speech Technology Lab, CUHK, Hong Kong (http://dsp.ee.cuhk.edu.hk/), PIs: Prof. CHING Pak-Chung, Prof. LEE Tan, Prof. WANG Shi-Yuan (William)
  – Mandarin: manual extraction of Fujisaki parameters

  3. Outline
  • Introduction: the Fujisaki model
  • Auto-extraction comparison: methods used at two labs to generate the Fujisaki parameters
  – Phonetics Lab, Academia Sinica, Taiwan: on Mandarin (Tseng 2004, 2005, 2006)
  – Hirose Lab, Tokyo University, Japan: on Japanese (Hirose and Narusawa 2002, 2003)
  • Manual extraction: the method used at CUHK to extract Fujisaki parameters
  – DSP and Speech Technology Lab: on Mandarin (Wentao Gu 2004, 2005)

  4. The Fujisaki Model (Fujisaki & Hirose 1984)
  log(F0) = base frequency + phrase components + accent components
  [Figure: phrase components and accent components superposed on the base frequency to give the model F0 contour]
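  For reference, the decomposition on this slide can be written out in the standard form of the model. The symbols below follow the usual notation of the Fujisaki-model literature (command magnitudes A_p, A_a; control parameters alpha, beta, gamma), not anything defined in this presentation:

```latex
% Log-F0 as a baseline plus superposed phrase and accent components
\ln F_0(t) \;=\; \ln F_b
  \;+\; \sum_{i=1}^{I} A_{p,i}\, G_p(t - T_{0,i})
  \;+\; \sum_{j=1}^{J} A_{a,j}\,\bigl[\, G_a(t - T_{1,j}) - G_a(t - T_{2,j}) \,\bigr]

% Phrase control mechanism (impulse response) and accent control mechanism (step response)
G_p(t) = \alpha^{2} t\, e^{-\alpha t} \quad (t \ge 0), \qquad
G_a(t) = \min\!\bigl[\, 1 - (1 + \beta t)\, e^{-\beta t},\; \gamma \,\bigr] \quad (t \ge 0),
\qquad G_p(t) = G_a(t) = 0 \;\; (t < 0)
```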

  5. Auto-extraction based on Mixdorff's method (2000, 2003)
  The original F0 contour is passed through a high-pass filter (stop frequency at 0.5 Hz), separating it into a high-frequency contour (HFC) and a low-frequency contour (LFC).
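  As a rough illustration of this step, here is a minimal Python sketch that splits an interpolated log-F0 contour into LFC and HFC with a Butterworth filter. The 0.5 Hz cutoff comes from the slide; the sampling rate, filter order, and function names are assumptions of the sketch, not Mixdorff's implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def split_f0_contour(log_f0, frame_rate_hz=100.0, cutoff_hz=0.5, order=2):
    """Split a (voiced, interpolated) log-F0 contour into a low-frequency
    contour (LFC, phrase-level trend) and a high-frequency contour (HFC,
    accent/tone-level part), following the idea on the slide.

    Illustrative sketch only: assumes log_f0 is sampled at frame_rate_hz
    with unvoiced gaps already interpolated.
    """
    b, a = butter(order, cutoff_hz / (frame_rate_hz / 2.0), btype="low")
    lfc = filtfilt(b, a, log_f0)   # slow phrase-level component
    hfc = log_f0 - lfc             # fast accent/tone-level component
    return lfc, hfc
```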

  6. Decision of phrase commands
  • Mixdorff's method: phrase command positions are initialized at local minima of the low-frequency contour (LFC) and then optimized.
  • The method at the Phonetics Lab, Academia Sinica, Taiwan is based on perceptual labels: phrase boundaries are evaluated perceptually.
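  Continuing the sketch above, phrase-command onset candidates can be proposed at local minima of the LFC. The spacing heuristic and function below are illustrative assumptions, and the subsequent optimization of command times and amplitudes is not shown.

```python
from scipy.signal import argrelmin

def phrase_command_candidates(lfc, frame_rate_hz=100.0, min_spacing_s=1.0):
    """Toy illustration: propose phrase-command onset candidates at local
    minima of the LFC, keeping minima at least min_spacing_s apart.
    Only the candidate step is sketched; the later optimization is not."""
    order = int(min_spacing_s * frame_rate_hz)
    minima = argrelmin(lfc, order=order)[0]   # indices of local minima
    return minima / frame_rate_hz             # candidate times in seconds
```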

  7. Phonetics Lab, Academia Sinica: auto-extraction results for Mandarin (Mixdorff 2003)

  8. Hirose Lab: auto-extraction (Narusawa 2002, 2003)
  From the original F0 contour, the derivative is used to target the phrase components, and the residual contour to target the accent components.
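  To make the two intermediate signals concrete, a small sketch could look like the following. It assumes a frame-based log-F0 contour and an already-estimated phrase component; it is not Narusawa's implementation.

```python
import numpy as np

def derivative_and_residual(log_f0, phrase_component, frame_rate_hz=100.0):
    """Illustrative only: the derivative of the log-F0 contour (used to
    locate phrase commands) and the residual after removing the estimated
    phrase component (used to locate accent commands)."""
    derivative = np.gradient(log_f0) * frame_rate_hz   # d(log F0)/dt per second
    residual = log_f0 - phrase_component               # accent-level residue
    return derivative, residual
```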

  9. Decision of phrase commands: dynamic programming (DP) over the residual contour. The optimum number of commands I is selected where the criterion c(I) is maximal.
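  The criterion c(I) itself is not given on the slide. The snippet below only illustrates the general idea of choosing the number of commands that maximizes a score, using a placeholder criterion (negative fit error minus a per-command penalty) that stands in for, and should not be confused with, the actual c(I).

```python
def select_num_commands(candidate_fits, penalty=0.05):
    """Hypothetical illustration of picking the number of commands I by
    maximizing a criterion c(I). candidate_fits maps I -> residual fitting
    error after placing I commands (computed elsewhere, e.g. by DP).
    The criterion here is a placeholder, NOT Narusawa's actual c(I)."""
    scores = {I: -err - penalty * I for I, err in candidate_fits.items()}
    return max(scores, key=scores.get)

# Example: errors shrink as commands are added; the penalty stops over-fitting
print(select_num_commands({1: 0.40, 2: 0.22, 3: 0.15, 4: 0.14, 5: 0.135}))  # -> 3
```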

  10. Hirose Lab: compensation from text analysis to aid auto-extraction. Parsed text is used to adjust the extracted Fujisaki parameters.

  11. Hirose Lab: auto-extraction of Japanese (Narusawa 2002, 2003)
  • Original method: an accent component should be located on a phrase component.
  • New method: pauses are considered, and corrections are made using information from parsed text.

  12. Auto-extraction of phrase components: comparison of the two labs
  • Phonetics Lab, IL, AS (modified Mixdorff 2003): pre-extraction of phrase components is already relatively close.
  • Hirose Lab: pre-extraction is not as close, but the final output can be compensated by text analysis:
  – auto-extract the F0 contour from the acoustic signal;
  – compensate the phrase components with parsed text, the unit used being the bunsetsu (a lexically defined unit).

  13. Manual adjustment (Gu, CUHK)
  • Note: 1. Insertion of phrase components is subjective. 2. Boundary identification is not explicitly specified; it relies on perception (duration? or F0 reset?).

  14. Manual adjustment (Gu, CUHK)

  15. Possible Future Considerations (1/2)
  • 1. Is the distinguishing acoustic feature only pause, duration, or F0?
  • 2. Or is it a combination of acoustic features: pause, duration, and/or F0?
  – E.g., test whether duration can compensate for F0 reset.

  16. Possible Future Considerations (2/2): Improving auto-extraction of tone components
  • 3. The concept of the tone nucleus (see the sketch below):
  – by retaining only the nucleus of each syllable while ignoring vertical F0 variation (from Hirose's tone nucleus and Gu's manual adjustment);
  – by ignoring horizontal F0 variation (from Gu's manual adjustment).
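  One possible reading of the tone-nucleus idea, sketched below under the assumption that syllable boundaries are available from segmentation, is to keep only the central, most stable portion of each syllable's F0 before fitting tone/accent commands. The keep fraction is an arbitrary illustrative choice, not the procedure used by Hirose or Gu.

```python
def syllable_nuclei(f0, syllable_spans, keep_fraction=0.5):
    """Illustrative sketch: for each syllable, keep only the central portion
    of its F0 samples (where the tone target is most stable) and discard the
    onset/offset transitions. syllable_spans is a list of (start, end) frame
    indices from a segmentation assumed to exist."""
    nuclei = []
    for start, end in syllable_spans:
        length = end - start
        trim = int(length * (1.0 - keep_fraction) / 2.0)
        nuclei.append(f0[start + trim : end - trim])
    return nuclei
```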

  17. One major ambiguity among the three labs: phrase component unit selection
  1. Phonetics Lab, Academia Sinica, Taiwan: Mandarin prosodic phrase (intonation and phrase)
  2. Hirose Lab, Tokyo University, Japan: Japanese lexical word (bunsetsu)
  3. DSP and Speech Technology Lab, CUHK, Hong Kong: manually selected
  – PPh: adjusted from the visual display
  – PW: adjusted from perceptual decision

  18. Why can prosodic unit selection be a problem unique to Mandarin?
  • Japanese: the bunsetsu, a compound word consisting of two or more content words.
  • Mandarin:
  1. Phonetics Lab, IL, AS: the prosodic phrase is sometimes too long to be covered by a single application of the phrase component function.
  2. CUHK: manual adjustment can be accurate, but it is not systematic enough.
  – E.g., a phrase component sometimes corresponds to a prosodic phrase, sometimes to something shorter.

  19. Concluding Remarks
  • 1. Manual adjustment of Fujisaki parameters is more precise but too time-consuming.
  • 2. What possible improvements can auto-extraction borrow from manual adjustment?
  – Focusing on the (syllable) nucleus
  – Understanding more of the acoustic properties (F0, duration, ...)
  • 3. In addition to acoustic information, more linguistic and cognitive knowledge could help improve the prosody model.
  – Linguistic information: parsing (text analysis and syntax), semantics, and pragmatics
  – Cognitive information: speech planning and processing
