Labeling Emphasis in Continuous Mandarin Speech: Preliminary Design and Results • Chiu-yu Tseng • Institute of Linguistics (Preparatory Office), Academia Sinica, Taipei, Taiwan
Mandarin Prosody Investigation through Speech Database: Two Phases • Phase I. Investigating breaks/pauses in connected speech • Phase II. Investigating prominence/focus in connected speech
Goals: • 1. To define (or re-define) prosody from a corpus perspective • 2. To build up Mandarin prosodic organization from the collected speech database • 3. To develop tools for Mandarin prosody • 4. To integrate speech investigation with other text- and/or speech-related investigations
Sub-project(I)--Computer Processing of Han Characters(漢字文字處理) • Sub-project(II)--Chinese Corpora(中文語料庫) • Sub-project(III)--Chinese Lexical Knowledge-base(中文詞彙知識庫) • Sub-project(IV)--Chinese Sentence Parsing(中文文句剖析) • Sub-project(V)--Mandarin Speech Database(國語語音資料庫) • Sub-project(VI)--Chinese Information Retrieval(中文資訊檢索) • Sub-project(VII)--Mandarin Speech Processing(國語語音處理)
Major Directions and Roles of the Subprojects (diagram) • Academic Prototype Systems Demonstrating the Accomplishments: Text/Spoken Dialogue Systems, Dictation/Text-to-speech Systems, Information Retrieval Systems • Subprojects shown: (I) Computer Processing of Chinese Characters, (II) Chinese Corpora, (III) Chinese Lexical Knowledge-base, (IV) Chinese Sentence Parsing, (V) Mandarin Speech Databases, (VI) Chinese Information Retrieval, (VII) Mandarin Speech Processing • Establishment of Necessary Research Environments • Basic Research in Chinese Information Processing
Feature • Combine Basic Research and Developing Prototype Systems • Develop Inter-disciplinary Collaborative Research
Objectives • Looking for features in Mandarin Chinese speech • Designing labeling tools in accordance with the ToBI system (layered labeling) • Long-term goal: establishing systematic analysis of prosody in Mandarin • Current goal: determining how connected speech is grouped by different prosodic properties
Design of Mandarin Speech Database • Phonetically balanced database • Phonetically rich database • Prosody oriented database • Read speech from text • Spontaneous conversation
Data (Types of Speech) • Microphone speech • Telephone speech
Collecting planned/read speech • Large amount of data from relatively small number of subjects • e.g. 6 S’s, 7 hrs of speech each, 42 hrs total • Large amount of data from large number of subjects • e.g. 100 S’s, 1 hr of speech each, 100 hrs total
Speech Data Used for the Present Study • 1600 read utterances (up to 140 syllables per utterance) by 3 males and 3 females • 3441 MB of digitized data • 3 sentence types: • Declarative • Exclamatory • Interrogative
A new approach to investigate Mandarin prosody: • Moving one step away from phrases and sentences • Taking a larger scope/coverage • Intonation vs. Prosodic group
感嘆句(exclamatory sentence) • 疑問副詞感嘆句(interrogative adverbs: 多麼、多、怎麼、那裡、哪裡 ) • 一般副詞感嘆句(exclamatory adverbs:真、真是) • 疑問語助詞感嘆句(interrogative particles:呢、吧、哪) • 一般語助詞感嘆句(exclamatory particles:了、啊、呀、喔、喲、哦、哈、嘛、囉、哩、啦、耶、哇、的)
疑問句(question) • 疑問詞問句(interrogative-word question:怎麼樣、怎麼、怎樣、怎、哪、為什麼、什麼、誰、幾、多少、如何、何) • 正反問句(A-not-A question:是否、是不是) • 推測疑問句(conjectural question:難道、莫非、真的) • 其他推測疑問句(other conjectural questions)
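As a rough illustration of how the marker words on the question slide could be used to pre-sort text, here is a minimal Python sketch. The marker lists are copied from the slide; the substring-matching rule, the English category names, and the function name `candidate_question_types` are assumptions made for this example, not the procedure used to build the database.

```python
# Marker words copied from the question slide; the matching rule is an
# assumption for illustration only.
QUESTION_MARKERS = {
    "interrogative-word question": [
        "怎麼樣", "怎麼", "怎樣", "怎", "哪", "為什麼",
        "什麼", "誰", "幾", "多少", "如何", "何",
    ],
    "A-not-A question": ["是否", "是不是"],
    "conjectural question": ["難道", "莫非", "真的"],
}


def candidate_question_types(text):
    """Return every question category whose marker words occur in `text`.

    This is plain substring matching, so the result is a list of
    candidates to be checked by hand, not a final classification.
    """
    return [
        category
        for category, markers in QUESTION_MARKERS.items()
        if any(marker in text for marker in markers)
    ]


if __name__ == "__main__":
    for sentence in ["這是什麼", "你是不是學生", "今天天氣很好"]:
        print(sentence, candidate_question_types(sentence))
```

Single-character markers such as 幾 and 何 also occur inside ordinary words, so a sketch like this over-matches and is only useful as a first pass before manual checking.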
Prosody may interact with • Sentence/phrase types? • Syntactic boundaries? • Physiological constraints
physical aspects (measurable) • speaking rate • volume • pitch (perceived F0) • speaking range
perceptual aspects: not measurable • breaks • emphases/focus/prominence
Tentative Designs of Labeling System for Prominence and Focus in Continuous Speech • Attempting to label prominence: • 1. As perceived emphasis only • 2. As a combination of emphasis and focus • 3. By separating prominence and focus
Experiment I • Labeling prominence as perceived emphases, using rationales for breaks • Samples: 100 utterances • Transcribers: 3 • Tentative labeling system: 5-step • E1: normal • E2: moderate • E3: strong • E4: very strong
Preliminary Results (2 transcribers) • Note that only the emphatic (stressed?) portion(s) within each utterance is (are) labeled. • Difficult to relate the labeled E’s (emphases) to acoustic properties such as duration, amplitude, etc. • Hard to achieve consistency between transcribers (always the problem!)
Experiment II • Attempting to separate prominence into 2 layers, i.e., perceived prominence (P) and emphasis (E), and labeling these 2 layers independently • P: perceptually identifiable prominent portion within an utterance • E: 4 degrees of emphasis as in Exp. I.
E1: reduced emphasis • E2: normal emphasis • E3: moderate emphasis • E4: strong emphasis • Samples: 100 utterances (different from Exp. I) • Note: these utterances were already labeled for perceived breaks/pauses • Transcribers: 3
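The two-layer scheme of Experiment II can be pictured as a small record per labeled span. The sketch below is one possible representation; the class name `Label`, its field names, and the use of 0-based syllable indices are hypothetical and are not taken from the labeling tools described in these slides. Only the layer names (P, E) and the four emphasis degrees come from the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Emphasis degrees as listed for Experiment II.
EMPHASIS_LEVELS = {1: "reduced", 2: "normal", 3: "moderate", 4: "strong"}


@dataclass(frozen=True)
class Label:
    """One labeled span on either the P layer or the E layer (hypothetical)."""

    start: int                   # first syllable of the span (0-based, inclusive)
    end: int                     # last syllable of the span (inclusive)
    layer: str                   # "P" (perceived prominence) or "E" (emphasis)
    level: Optional[int] = None  # 1..4 on the E layer, None on the P layer

    def __post_init__(self):
        if self.layer not in ("P", "E"):
            raise ValueError("layer must be 'P' or 'E'")
        if self.layer == "E" and self.level not in EMPHASIS_LEVELS:
            raise ValueError("E labels need a level from 1 to 4")
        if self.layer == "P" and self.level is not None:
            raise ValueError("P labels carry no level")
        if self.start < 0 or self.end < self.start:
            raise ValueError("invalid syllable span")

    def tag(self):
        """Return the slide-style tag, e.g. 'P' or 'E3'."""
        return "P" if self.layer == "P" else f"E{self.level}"


if __name__ == "__main__":
    # The two layers are labeled independently, so spans may overlap.
    labels = [Label(2, 4, "P"), Label(3, 3, "E", 4)]
    print([label.tag() for label in labels])
```

Keeping P and E as separate records mirrors the idea that the two layers are labeled independently and may cover different stretches of the same utterance.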
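4.該主管表示,中東危機導致國人擔憂國內油品 (4. The official said that the Middle East crisis has led the public to worry about domestic oil products)
5.而市政府竟然忘了,現在勢成騎虎確是咎由自取。 (5. Yet the city government simply forgot; now that it is riding the tiger, it has only itself to blame.)
將以普騰的品牌形象來替代建弘電子公司名稱, (the 普騰 brand image will be used in place of the 建弘電子 company name,)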
Example of labeling comparison between 2 transcribers: • Inter-transcriber consistency was difficult to achieve, but intra-transcriber consistency improved • Reason: the domain of the emphasized speech string was undefined
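7.此外,在計算公債利息負擔時,利率訂為百分之十二,忽視了目前利率水準走低的趨勢,也不符實際。 (7. In addition, when the interest burden on government bonds was calculated, the interest rate was set at twelve percent, which ignores the current downward trend in interest rates and is also unrealistic.)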
Table 1 shows the comparison of perceived prominence, emphasis and focus labeling between two transcribers. Each transcriber labeled the same 100 utterances.
Table 2 shows the comparison among three transcribers. Each transcriber labeled the same 50 utterances for perceived prominence and focus. Note that an upper ceiling of 4 syllables was imposed on labeling the domain of perceived prominence.
Table 3 shows the comparison among three transcribers. Each transcriber labeled an additional 100 utterances for perceived prominence and focus.
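The slides do not say which agreement measure underlies Tables 1 to 3. As one common way to quantify inter-transcriber consistency, the sketch below computes raw percent agreement and Cohen's kappa for every pair of transcribers. It assumes each transcriber's output has been flattened to one label per syllable; the function names and the toy labels are illustrative only and are not data from the study.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of positions where two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each transcriber's own label frequencies.
    expected = sum(
        (count_a[label] / n) * (count_b[label] / n)
        for label in set(a) | set(b)
    )
    if expected == 1.0:
        return 1.0  # both transcribers used a single identical label
    return (observed - expected) / (1.0 - expected)


def pairwise_kappa(labelings):
    """Kappa for every pair of transcribers in a {name: labels} mapping."""
    return {
        (p, q): cohen_kappa(labelings[p], labelings[q])
        for p, q in combinations(sorted(labelings), 2)
    }


if __name__ == "__main__":
    # Toy per-syllable labels ("P" = prominent, "-" = not), not real data.
    toy = {
        "T1": ["P", "-", "-", "P"],
        "T2": ["P", "-", "P", "P"],
        "T3": ["P", "-", "-", "P"],
    }
    print(pairwise_kappa(toy))
```

Kappa is used here because raw agreement can look high simply when most syllables are unlabeled; any other chance-corrected measure could be substituted.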
事前未經整體規劃,且太過偏重硬體工程建設。 (There was no overall planning beforehand, and too much weight was placed on hardware construction.)
Experiment III (Current state) • Speech data labeled: 1106 utterances by 3 transcribers.
Labeling system developed • Features: • 1. Developing a prominence labeling system in accordance with the already developed break labeling system • 2. Restricting the domain of perceived prominence to at most 4 syllables
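The 4-syllable ceiling on the domain of perceived prominence lends itself to a simple consistency check on labeled data. The following is a minimal sketch, assuming each prominence label is stored as a pair of inclusive syllable indices; that storage format and the name `find_overlong` are assumptions for illustration, not part of the labeling system itself.

```python
# Ceiling stated on the slide: a prominence domain spans at most 4 syllables.
MAX_PROMINENCE_SYLLABLES = 4


def find_overlong(spans, max_len=MAX_PROMINENCE_SYLLABLES):
    """Return the (start, end) spans that cover more than `max_len` syllables.

    Each span is a pair of inclusive syllable indices, so its length
    is end - start + 1.
    """
    return [
        (start, end)
        for start, end in spans
        if end - start + 1 > max_len
    ]


if __name__ == "__main__":
    spans = [(0, 3), (5, 9), (2, 2)]  # lengths 4, 5 and 1
    print(find_overlong(spans))
```

A check like this could be run over each transcriber's output to flag spans that need to be relabeled under the restricted domain.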