Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight

Subproject 5, "A Study of Prosodic Attributes and Speech Event Detection". Speaker: Jr-Feng Huang. PI: Chiu-yu Tseng. Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan.

Presentation Transcript


  1. Subproject 5, "A Study of Prosodic Attributes and Speech Event Detection": Discourse Prosodic Attributes, Boundary Information and Prosodic Highlight. Speaker: Jr-Feng Huang. PI: Chiu-yu Tseng. Phonetics Lab, Institute of Linguistics, Academia Sinica, Taipei, Taiwan. NGASR 2011 Summer Workshop.

  2. Outline
     • Research Direction
     • Introduction
     • Speech materials
     • Discourse Prosodic Attributes
     • Analysis of prosodic boundary
     • Analysis of prosodic highlight
     • Findings so far

  3. Research Direction
     • Argument: the prosody model
        • Discourse structure (DS): groups phrases and utterances into speech paragraphs and spoken discourse
        • Information structure (IS): realizes information weighting in continuous speech
     • In addition to prosody from the segmental, lexical, phonological and syntactic levels, discourse prosody is also an intrinsic part of naturally occurring speech to which the human ear is sensitive, and which can be pinned down neither from analysis of sentence prosody nor entirely from the corresponding text transcription. (Tseng, Interspeech 2010)
     • [Figure: prosody built up from the segmental, lexical, phonological, syntactic, discourse-structure and information-structure levels, with acoustic correlates F0, duration and amplitude ("abundant information")]

  4. Introduction
     • Cues of the prosody model
        • Discourse structure → prosodic boundaries
        • Information structure → prosodic highlight (perceived emphasis)
     • Goals
        • Acoustic attributes and discriminative analysis of prosodic boundaries across genres (Tseng et al., 2008, 2009)
        • Examining how perceived prosodic highlights can be explained by systematic patterns of genre, discourse structure, information weighting and acoustic manifestation (Tseng et al., 2011)

  5. Speech Materials: Taiwan Mandarin
     • Read speech
        • CNA: 26 discourse pieces of plain text read by M051 and F051 (about 45 and 46 minutes, respectively; 160 MB)
        • WB: 34 simulated weather broadcasts read by M054 and F054 (about 23 and 27 minutes, respectively; 95 MB)
     • Spontaneous speech
        • SpnL/LEC: NTU DSP lecture by LSL (one male speaker, about 30 minutes)

  6. Annotations
     • Preprocessing
        • Automatic segmental labeling with HTK; phone boundaries manually spot-checked
        • Manual labeling of perceived prosodic boundaries following the HPG protocols
        • Manual labeling of perceived focus and prominence (prosodic highlight)

  7. Annotation Rationale
     • Labeling perceived boundary breaks
     • Labeling perceived prosodic highlight (emphasis, accent)

  8. Annotation Examples: "以自有品牌建立起國際品牌形象" ("building an international brand image through its own brand")
     • [Figure: annotation tiers for this utterance: phone boundary layer, perceived prosodic boundary layer, perceived prosodic highlight layer]

  9. Acoustic Features and Methodology
     • Acoustic features
        • Vowel-based F0
        • Syllable-based duration
        • Vowel-based intensity
     • Methodology: multiple regression model (Tseng et al., 2005), separating intrinsic attributes from higher-layer information
     • [Figure: prosodic hierarchy BG, PPh, PW and SYL, with the residues at each layer]
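The layered regression idea can be illustrated with a simplified sketch. It assumes one categorical predictor per layer (for example tone at the SYL layer, then position within the PW); with a single one-hot categorical predictor, least-squares regression reduces to per-category means, so each layer's prediction is subtracted and the residue is passed up to the next layer. The predictor coding and function names are hypothetical; the model of Tseng et al. (2005) uses richer predictors.

```python
from collections import defaultdict
from statistics import mean


def fit_category_means(keys, values):
    """Least squares with one one-hot categorical predictor = per-category means."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


def layered_residues(values, layer_keys):
    """Remove each layer's predicted contribution in turn, lowest layer first.

    values: one acoustic feature per syllable (e.g. vowel-based F0).
    layer_keys: one list of category labels per layer (hypothetical coding),
                e.g. [tones, positions_in_pw, positions_in_pph].
    Returns the residues remaining after each layer.
    """
    residues = list(values)
    history = []
    for keys in layer_keys:
        predicted = fit_category_means(keys, residues)
        residues = [r - predicted[k] for k, r in zip(keys, residues)]
        history.append(residues)
    return history


# Toy example: four syllables, two tones, two PW positions.
f0 = [1.0, 3.0, 5.0, 7.0]
tones = ["T1", "T1", "T4", "T4"]
pw_position = ["initial", "final", "initial", "final"]
after_syl, after_pw = layered_residues(f0, [tones, pw_position])
print(after_syl)  # [-1.0, 1.0, -1.0, 1.0]
print(after_pw)   # [0.0, 0.0, 0.0, 0.0]
```

Whatever the lower layers cannot explain remains in the residue, which is what makes higher-layer (discourse) effects visible.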

  10. Discourse Prosodic Attributes
     • Example: a 3-PPh paragraph (Tseng et al., 2010)
     • [Figure: normalized F0, normalized duration and normalized intensity against syllable position at the PW, PPh and PG layers, for PG-initial, PG-medial and PG-final phrases]
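Plots of this kind rest on normalizing each feature before averaging it by position. A minimal sketch, assuming per-speaker z-score normalization (the slides do not state which normalization was used) and hypothetical position labels:

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore(values):
    """Normalize one speaker's feature values to zero mean and unit variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def position_profile(values, pg_positions, syllable_positions):
    """Average normalized values per (PG position, syllable position) cell."""
    cells = defaultdict(list)
    for z, pg, syl in zip(zscore(values), pg_positions, syllable_positions):
        cells[(pg, syl)].append(z)
    return {cell: mean(zs) for cell, zs in cells.items()}


durations = [100.0, 200.0, 100.0, 200.0]  # toy syllable durations (ms)
pg_pos = ["PG-initial", "PG-initial", "PG-final", "PG-final"]
syl_pos = [0, 1, 0, 1]
profile = position_profile(durations, pg_pos, syl_pos)
print(profile[("PG-final", 1)])  # 1.0
```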

  11. Prosodic Boundary
     • Phrasing is not limited to major and minor phrases
     • Acoustic realization of prosodic boundaries
        • Pre-boundary: F0 lowering, duration lengthening, intensity decay
        • Boundary pause
        • Post-boundary: F0 reset, duration shortening, intensity jump
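The pre-boundary and post-boundary cues above can be turned into contrast features across a boundary. A minimal sketch, assuming one syllable on each side (a real analysis might average over a wider window); the class, function and feature names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    f0: float         # mean vowel F0 (Hz)
    duration: float   # syllable duration (ms)
    intensity: float  # mean vowel intensity (dB)


def boundary_contrast(pre: Syllable, post: Syllable) -> dict:
    """Contrast between the last syllable before and the first syllable after a boundary."""
    return {
        "f0_reset": post.f0 - pre.f0,                      # lowering then reset -> positive
        "duration_ratio": pre.duration / post.duration,    # lengthening then shortening -> above 1
        "intensity_jump": post.intensity - pre.intensity,  # decay then jump -> positive
    }


def fits_boundary_pattern(contrast: dict) -> bool:
    """True if all three cues point in the direction listed on the slide."""
    return (contrast["f0_reset"] > 0
            and contrast["duration_ratio"] > 1
            and contrast["intensity_jump"] > 0)


c = boundary_contrast(Syllable(180.0, 300.0, 65.0), Syllable(220.0, 150.0, 72.0))
print(c)  # {'f0_reset': 40.0, 'duration_ratio': 2.0, 'intensity_jump': 7.0}
print(fits_boundary_pattern(c))  # True
```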

  12. How Reliable Is Pause Duration? (1/2)
     • Across genres, speakers and language: a systematic pattern in pause duration, i.e. B3 < B4 < B5
     • [Figure: pause duration (ms) by break (B3, B4 and B5) and genre: read speech (RS) CNA and weather broadcast (WB); spontaneous speech (SpnL)]

  13. How Reliable Is Pause Duration? (2/2)
     • [Figure: distribution of pause duration at discourse boundary breaks B2, B3 and B4 in read speech (RS) CNA, speakers F051P (left) and M051P (right)]
     • Pause duration at B3 (PPh) boundaries varies a great deal
     • Pause duration is therefore not a reliable cue
     • How, then, is the PPh boundary B3 perceived? (Tseng et al., 2009)
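The two observations above, that mean pause duration is ordered by break level while B3 pauses are too variable to rely on, can be checked with a mean and a spread measure per break level. A minimal sketch using the coefficient of variation as the spread measure (an assumption; the slides show distributions rather than a specific statistic). The numbers are toy values, not data from the study:

```python
from statistics import mean, stdev


def pause_stats(pauses_by_break):
    """Mean pause (ms) and coefficient of variation (stdev / mean) per break level."""
    return {b: (mean(p), stdev(p) / mean(p)) for b, p in pauses_by_break.items()}


def ordered_by_mean(stats, order=("B3", "B4", "B5")):
    """Check the B3 < B4 < B5 ordering of mean pause duration."""
    means = [stats[level][0] for level in order]
    return all(a < b for a, b in zip(means, means[1:]))


# Toy values for illustration only.
toy = {
    "B3": [20.0, 200.0, 60.0, 120.0],
    "B4": [300.0, 320.0, 280.0],
    "B5": [600.0, 640.0, 560.0],
}
stats = pause_stats(toy)
print(ordered_by_mean(stats))           # True
print(stats["B3"][1] > stats["B4"][1])  # True: B3 pauses vary more
```

A high coefficient of variation at B3 is consistent with a well-ordered mean still being a poor predictor for individual boundaries.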

  14. Comparison of Discourse Boundary Discrimination (Tseng et al., 2009)
     • Cross-feature comparison by corpus: LEC, CNA_F051, CNA_M051
     • [Figure: cross-feature comparison of mean values by corpus (LEC, CNA_F051 and CNA_M051, top to bottom); horizontal axis: feature type index; vertical axis: mean value of each feature; panel: discrimination for LEC]

  15. Analysis of Perceived Emphasis Annotations (1/3)
     • Distribution of perceived emphasis
     • [Figure: distribution of perceived emphasis; combined emphasis (E2+E3)]

  16. Analysis of Perceived Emphasis Annotations (2/3)
     • Perceived emphasis scale
     • Reflects not only perceived emphasis but also syntactic constraints

  17. Analysis of Perceived Emphasis Annotations (3/3)
     • Distribution of perceived emphasis relative to phrase boundaries
        • LEC: post-boundary = pre-boundary
        • CNA: post-boundary > pre-boundary
        • WB: post-boundary < pre-boundary
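The pre-boundary vs. post-boundary comparison can be computed as the share of emphasized syllables on each side of PPh boundaries. A minimal sketch, assuming 0/1 emphasis flags per syllable and treating each PPh's first syllables as post-boundary and its last syllables as pre-boundary; the window size `k` and the function names are hypothetical:

```python
def boundary_emphasis_rates(phrases, k=1):
    """Share of emphasized syllables just before and just after phrase boundaries.

    phrases: one list of 0/1 perceived-emphasis flags per PPh, in order.
    k: window size in syllables (windows overlap in phrases shorter than 2k).
    Returns (pre_boundary_rate, post_boundary_rate).
    """
    pre = [flag for flags in phrases for flag in flags[-k:]]   # PPh-final syllables
    post = [flag for flags in phrases for flag in flags[:k]]   # PPh-initial syllables
    return sum(pre) / len(pre), sum(post) / len(post)


def compare(pre, post):
    """Summarize as on the slide: '>' means post-boundary > pre-boundary."""
    return ">" if post > pre else "<" if post < pre else "="


toy = [[1, 0, 0], [1, 0, 1], [0, 0, 0], [1, 1, 0]]
pre, post = boundary_emphasis_rates(toy)
print(pre, post, compare(pre, post))  # 0.25 0.75 >
```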

  18. Emphasis Loading
     • Why? To estimate information weighting in continuous speech
     • Methodology
        • Normalize PPh length
        • Estimation
     • [Figure: a PPh of N syllables mapped onto a normalized length]
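The slide does not spell out the estimation formula, so the following is only one plausible reading: place each syllable at a relative position within its PPh so that phrases of different lengths share one axis, bin those positions, and take the share of emphasized syllables per bin. The bin count and function name are assumptions:

```python
def emphasis_loading(phrases, n_bins=3):
    """Emphasis rate per normalized position bin within the PPh.

    Syllable i of a PPh with n syllables sits at relative position (i + 0.5) / n,
    so phrases of different lengths map onto one 0-1 axis.
    """
    hits = [0] * n_bins
    totals = [0] * n_bins
    for flags in phrases:
        n = len(flags)
        for i, flag in enumerate(flags):
            b = (2 * i + 1) * n_bins // (2 * n)  # integer form of floor((i + 0.5) / n * n_bins)
            hits[b] += flag
            totals[b] += 1
    return [h / t if t else 0.0 for h, t in zip(hits, totals)]


toy = [[1, 0, 0], [1, 0, 0, 0, 0, 1]]  # 0/1 emphasis flags per syllable
print(emphasis_loading(toy))  # [0.6666666666666666, 0.0, 0.3333333333333333]
```

The integer expression avoids floating-point edge cases when a relative position falls exactly on a bin edge.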

  19. Results of Emphasis Loading
     • Within PPh, by relative syllable position
     • Within BG and PG, by relative PPh position
     • [Figure: emphasis loading for initial, medial and final PPh]

  20. Acoustic Characteristics of Prosodic Highlights (1/2)
     • Emphasis vs. no emphasis, without considering PPh position
     • Significant acoustic factors by genre
        • LEC: duration; average F0 (F-ratio = 846); F0 range; intensity (F-ratio = 873)
        • CNA: average F0 (F-ratio = 492); intensity (F-ratio = 364)
        • WB: intensity (F-ratio = 196); duration (F-ratio = 170)
     • [Figure: mean values of acoustic correlates by emphasis/no emphasis and genre]
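Assuming the F-ratios above come from a one-way ANOVA of emphasized vs. non-emphasized syllables (the slide does not state the test), each one is the between-group mean square divided by the within-group mean square. A minimal sketch; if SciPy is available, `scipy.stats.f_oneway` computes the same statistic together with a p-value:

```python
from statistics import mean


def f_ratio(*groups):
    """One-way ANOVA F-ratio: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


no_emphasis = [1.0, 2.0, 3.0]  # toy feature values
emphasis = [4.0, 5.0, 6.0]
print(f_ratio(no_emphasis, emphasis))  # 13.5
```

A larger F-ratio means the feature separates the two groups more strongly relative to its spread within each group, which is how the factors can be ranked per genre.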

  21. Acoustic Characteristics of Perceived Highlights (2/2)
     • Emphasis vs. no emphasis, considering PPh position
        • LEC: duration, average F0, F0 range, intensity
        • CNA: average F0, intensity; duration in PPh-medial position only
        • WB: intensity in all PPh positions; duration in PPh-medial position only
     • [Figure: results for PPh-initial, PPh-medial and PPh-final positions]

  22. Analysis of Perceived Emphasis by Decision Tree Toolkit
     • Why? To evaluate the most significant factors for classification
     • Methodology:
     • Results: [Figure: decision trees for LEC, WB and CNA]
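The slide does not name the toolkit or its settings. To illustrate how a tree ranks factors, the sketch below finds the single threshold split that minimizes weighted Gini impurity, the criterion used by CART-style trees; the root split picks the most discriminative factor. Feature names and data are hypothetical:

```python
def gini(labels):
    """Gini impurity of 0/1 labels (1 = emphasis)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, feature_names):
    """Single threshold split that minimizes weighted Gini impurity (CART-style)."""
    n = len(labels)
    best = None
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


features = ["intensity_db", "duration_ms"]
rows = [(60.0, 200.0), (62.0, 180.0), (70.0, 210.0), (72.0, 190.0)]  # toy syllables
labels = [0, 0, 1, 1]
print(best_split(rows, labels, features))  # ('intensity_db', 66.0, 0.0)
```

A full tree applies the same search recursively to each side of the chosen split.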

  23. Discourse Pattern of Emphasis vs. No Emphasis: CNA
     • [Figure: normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh, before and after removing the emphasis effect]

  24. Discourse Pattern of Emphasis vs. No Emphasis: WB
     • [Figure: normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh, before and after removing the emphasis effect]

  25. Discourse Pattern of Emphasis vs. No Emphasis: LEC
     • [Figure: normalized duration, normalized F0 and normalized intensity by syllable position across initial, medial and final PPh, before and after removing the emphasis effect]
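The slides do not say how the emphasis effect is removed. One simple possibility, sketched here under an additive assumption, is to subtract the mean emphasized-minus-unemphasized offset from emphasized syllables and then re-plot the positional profile. Function names and data are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def remove_emphasis_effect(values, emphasized):
    """Subtract the mean emphasis offset from emphasized syllables (additive assumption)."""
    emph = [v for v, e in zip(values, emphasized) if e]
    plain = [v for v, e in zip(values, emphasized) if not e]
    offset = mean(emph) - mean(plain)
    return [v - offset if e else v for v, e in zip(values, emphasized)]


def profile_by_pph_position(values, pph_positions):
    """Average value per PPh position (initial / medial / final)."""
    cells = defaultdict(list)
    for v, pos in zip(values, pph_positions):
        cells[pos].append(v)
    return {pos: mean(vs) for pos, vs in cells.items()}


norm_f0 = [2.0, 0.0, 3.0, 1.0]  # toy normalized F0
emphasized = [True, False, True, False]
positions = ["initial", "initial", "final", "final"]
adjusted = remove_emphasis_effect(norm_f0, emphasized)
print(adjusted)                                      # [0.0, 0.0, 1.0, 1.0]
print(profile_by_pph_position(adjusted, positions))  # {'initial': 0.0, 'final': 1.0}
```

A single global offset ignores that emphasis is unevenly distributed across PPh positions; estimating the offset per position would be a natural refinement.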

  26. Findings
     • Prosodic boundary
        • Pause duration may vary almost randomly
        • Contrast in the boundary neighborhood is more significant
     • Prosodic highlights
        • Related to speech mode (genre)
        • Independent of discourse structure
        • Underlying linguistic structures can be derived
     • Future directions
        • Speech technology development could benefit from a better understanding of information structure in relation to prosodic highlight
