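Processing and Annotation of Speech Data
Shu-Chuan Tseng (曾淑娟), Preparatory Office of the Institute of Linguistics, Academia Sinica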
Types of Speech Data
• Broadcast news data (recorded from news broadcasts)
• Microphone speech data
• Telephone speech data
• Lab speech data
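Corpus Types: I
• Read speech: reading aloud; public figures delivering a speech exactly as scripted
• Prepared speech: public figures delivering a scripted speech from memory; journalists conducting interviews; talk-show hosts
• Spontaneous speech: everyday conversation; unprepared talk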
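Corpus Types: II
• Monologues: reading a script (reading); recounting facts or stories (narratives)
• Dialogues: journalist interviews (interview); exchanges on talk shows; conversations between two people
• Conversations: exchanges among more than two people; talk-show discussions with more than two participants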
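Digital Recording: Hardware and Software
• Recording devices:
  -- Digital Audio Tape (DAT): must be transferred to sound files --> Master Tape
  -- Mini Disc (MD): must be transferred to sound files
• Recording software (Speech Analyzer, PCQuirer, Cool Edit Pro): records directly to sound files
• Microphone: uni-directional (Audio-Technica, AKG, Sennheiser, Shure)
• Recording location: ordinary room, recording booth, outdoors
• Recording situation: conversation, interview, monologue/narrative, performing a pre-designed task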
Digital Recording: Format
• Sampling rate: 8 kHz, 16 kHz, 44.1 kHz, 48 kHz
• Sample size: 8 bits, 16 bits
• Channels: mono, stereo
• File formats: pcm, wav, ptk, sd
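These parameters fix the storage cost of uncompressed audio: bytes per second = sampling rate × bytes per sample × channels. The sketch below, using only Python's standard `wave` module, computes that rate for the listed settings and writes a short PCM WAV to check a chosen configuration. It covers WAV only (ptk and sd files are not handled), and the helper names are illustrative, not part of any corpus tool.

```python
import io
import wave


def bytes_per_second(sampling_rate_hz: int, sample_bits: int, channels: int) -> int:
    """Data rate of uncompressed PCM: samples/s x bytes per sample x channels."""
    return sampling_rate_hz * (sample_bits // 8) * channels


def write_silent_wav(buffer, sampling_rate_hz: int, sample_bits: int,
                     channels: int, seconds: float) -> None:
    """Write `seconds` of silence as an uncompressed PCM WAV into `buffer`."""
    sample_width = sample_bits // 8  # the wave module expects bytes, not bits
    # 8-bit WAV samples are unsigned (silence = 0x80); 16-bit ones are signed (silence = 0)
    silence = b"\x80" if sample_bits == 8 else b"\x00" * sample_width
    n_frames = int(sampling_rate_hz * seconds)
    with wave.open(buffer, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sampling_rate_hz)
        w.writeframes(silence * n_frames * channels)


if __name__ == "__main__":
    for rate, bits, ch in [(8000, 8, 1), (16000, 16, 1), (44100, 16, 2), (48000, 16, 2)]:
        bps = bytes_per_second(rate, bits, ch)
        print(f"{rate} Hz, {bits}-bit, {ch} ch: {bps} B/s, ~{bps * 60 / 1e6:.1f} MB/min")

    buf = io.BytesIO()
    write_silent_wav(buf, 16000, 16, 1, seconds=1.0)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        print(r.getframerate(), r.getsampwidth(), r.getnchannels(), r.getnframes())
```

For example, 48 kHz, 16-bit stereo needs 192,000 bytes per second, six times the 32,000 bytes per second of 16 kHz, 16-bit mono.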
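Speech Data: Metadata Content
• Header: recording location, recording date, speech type, language, sampling rate, recording format
• Transcripts: source sound file, speaker information (ID, age, gender), Chinese character transcription, pinyin transcription
• Comments: notes on individual entries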
Example: Structure
Header
 -- record place
 -- record date
 -- speech type1
 -- speech type2
 -- language
 -- sampling rate
 -- record type
Body
 -- voice segment (one per segment, repeated)
   -- wave filename [.wav]
   -- speaker info [MISC-n-age-gender]
   -- start time [msec]
   -- end time [msec]
   -- transcriber info [name]
   -- character transcription: Big5, foreign words, markers/particles, @, tags: <b name></b name>
   -- Pinyin transcription: Pinyin, foreign words, markers/particles, pronunciation: [ ], @, tags: <b name></b name>
   -- comment
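One way to hold this structure in a program is a pair of record types mirroring the Header and voice-segment branches. The sketch below uses Python dataclasses; the class and field names are hypothetical choices made for illustration, and values such as the sampling rate are kept as plain strings, as in the file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Header:
    """Recording-level metadata (the Header branch)."""
    record_place: str
    record_date: str
    speech_type1: str   # read / prepared / spontaneous
    speech_type2: str   # monologue / dialogue / conversation
    language: str
    sampling_rate: str
    record_type: str    # mono / stereo


@dataclass
class VoiceSegment:
    """One voice segment of the Body."""
    wave_filename: str            # [.wav]
    speaker_info: str             # [MISC-n-age-gender]
    start_ms: int                 # [msec]
    end_ms: int                   # [msec]
    transcriber: str              # [name]
    character_transcription: str  # Big5 text plus annotation tags
    pinyin_transcription: str     # Pinyin plus annotation tags
    comment: str = ""

    def duration_ms(self) -> int:
        if self.end_ms < self.start_ms:
            raise ValueError(f"{self.wave_filename}: end time precedes start time")
        return self.end_ms - self.start_ms


@dataclass
class SpeechRecord:
    header: Header
    body: List[VoiceSegment] = field(default_factory=list)

    def total_speech_ms(self) -> int:
        """Sum of segment durations (not the length of the recording)."""
        return sum(seg.duration_ms() for seg in self.body)


if __name__ == "__main__":
    record = SpeechRecord(
        Header("Taipei, Taiwan", "June 3, 2001", "spontaneous", "dialogue",
               "Mandarin", "48 kHz", "stereo"),
        [VoiceSegment("mcdc-01-01.wav", "MISC-08-male-25", 0, 9514, "Fen",
                      "你好我姓賴請問一下貴姓",
                      "ni3 hao3 wo3 xing4 lai4 qing3 wen4 yi2 xia4 gui4 xing4")],
    )
    print(record.total_speech_ms())
```

Checking that each end time follows its start time at load time catches mistyped millisecond values early.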
Example: Actual Format
<recordplace>Taipei, Taiwan
<recorddate>June 3, 2001
<speechtypei>spontaneous
<speechtypeii>dialogue
<language>Mandarin
<samplingrate>48 kHz
<recordtype>stereo
<segment>
<voicefile>d:\分割完成的檔\stereo_01\mcdc-01-01.wav
<speaker>MISC-08-male-25
<start>000000
<end>009514
<translator>Fen
<chinese> <b particle>EI </b particle><b clear throat>@</b clear throat>你好我姓賴請問一下貴姓<b hiccup>@</b hiccup> <b breathe>@</b breathe> </chinese>
<english> EI @ ni3 hao3 wo3 xing4 lai4 qing3 wen4 yi2 xia4 gui4 xing4 @ @ </english>
<comment> </comment>
</segment>
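In this format, header and single-value fields (`<recordplace>`, `<start>`, `<speaker>`, ...) have no closing tag, while `<segment>`, `<chinese>`, `<english>` and `<comment>` do. A minimal regex-based reader, assuming exactly the tag names of the sample, might look like this; `parse_file` and `parse_speaker` are hypothetical helpers, not a published MCDC tool. Since the structure lists speaker info as MISC-n-age-gender but the sample reads `MISC-08-male-25`, the speaker parser accepts either order.

```python
import re

HEADER_TAGS = ["recordplace", "recorddate", "speechtypei", "speechtypeii",
               "language", "samplingrate", "recordtype"]
SEGMENT_TAGS = ["voicefile", "speaker", "start", "end", "translator"]
BLOCK_TAGS = ["chinese", "english", "comment"]  # tags that do have a closing tag

SAMPLE = r"""<recordplace>Taipei, Taiwan <recorddate>June 3, 2001
<speechtypei>spontaneous <speechtypeii>dialogue <language>Mandarin
<samplingrate>48 kHz <recordtype>stereo
<segment> <voicefile>d:\分割完成的檔\stereo_01\mcdc-01-01.wav
<speaker>MISC-08-male-25 <start>000000 <end>009514 <translator>Fen
<chinese> <b particle>EI </b particle><b clear throat>@</b clear throat>你好我姓賴請問一下貴姓<b hiccup>@</b hiccup> <b breathe>@</b breathe> </chinese>
<english> EI @ ni3 hao3 wo3 xing4 lai4 qing3 wen4 yi2 xia4 gui4 xing4 @ @ </english>
<comment> </comment> </segment>"""


def first_value(tag, text):
    """Value of an unclosed <tag>: everything up to the next '<'."""
    m = re.search(rf"<{tag}>([^<]*)", text)
    return m.group(1).strip() if m else None


def parse_speaker(code):
    """Split e.g. MISC-08-male-25; age and gender may come in either order."""
    prefix, number, a, b = code.split("-")
    age, gender = (a, b) if a.isdigit() else (b, a)
    return {"prefix": prefix, "number": number, "age": int(age), "gender": gender}


def parse_file(text):
    head = text.split("<segment>")[0]
    header = {tag: first_value(tag, head) for tag in HEADER_TAGS}
    segments = []
    for body in re.findall(r"<segment>(.*?)</segment>", text, re.S):
        seg = {tag: first_value(tag, body) for tag in SEGMENT_TAGS}
        for tag in BLOCK_TAGS:
            m = re.search(rf"<{tag}>(.*?)</{tag}>", body, re.S)
            seg[tag] = m.group(1).strip() if m else ""
        seg["start"] = int(seg["start"])  # milliseconds
        seg["end"] = int(seg["end"])
        seg["speaker"] = parse_speaker(seg["speaker"])
        segments.append(seg)
    return header, segments


if __name__ == "__main__":
    header, segments = parse_file(SAMPLE)
    print(header["samplingrate"], header["speechtypei"])
    for seg in segments:
        print(seg["voicefile"], seg["end"] - seg["start"], "ms", seg["speaker"])
```

Note that the `<english>` field of the sample carries the pinyin tier, so a reader should not treat it as an English translation.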
Example: Transcription and Annotation of Speech Content
Character Transcription
蓋章認可<b inappropriate pronunciation>的</b inappropriate pronunciation><b short break>@</b short break>只有<b assimilation>三分</b assimilation>之一<b inhale>@</b inhale><b marker>NA </b marker>其它的<b clear throat>@</b clear throat><b exhale>@</b exhale><b assimilation>三分</b assimilation>之二是<b inhale>@</b inhale>警察局自己<b pause>@</b pause><b inappropriate pronunciation>就</b inappropriate pronunciation><b inappropriate pronunciation>是</b inappropriate pronunciation>
Pinyin Transcription
gai4 zhang1 ren4 ke3<b inappropriate pronunciation>de5</b inappropriate pronunciation><b short break>@</b short break>zhi3 you3<b assimilation>san1 fen1</b assimilation>zhi1 yi1<b inhale>@</b inhale><b marker>NA </b marker>qi2 ta1 de5<b clear throat>@</b clear throat><b exhale>@</b exhale><b assimilation>san1 fen1</b assimilation>zhi1 er4 shi4<b inhale>@</b inhale>jing3 cha2 ju2 zi4 ji3<b pause>@</b pause><b inappropriate pronunciation>jiu4</b inappropriate pronunciation><b inappropriate pronunciation>shi4</b inappropriate pronunciation>
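Each annotation wraps a stretch of text in `<b NAME>…</b NAME>`, with `@` standing for a non-lexical event such as a breath or pause. The sketch below, using only Python's `re` and `collections`, counts tags, recovers plain text, and checks that the two tiers carry the same tag sequence. That last check assumes the character and pinyin tiers are always annotated in parallel, as they are in the example; the helper names are hypothetical.

```python
import re
from collections import Counter

# <b NAME>content</b NAME>; the backreference \1 forces the closing name to match.
TAG = re.compile(r"<b ([^>]+)>(.*?)</b \1>")

# Excerpt of the example above.
CHAR_LINE = ("蓋章認可<b inappropriate pronunciation>的</b inappropriate pronunciation>"
             "<b short break>@</b short break>只有<b assimilation>三分</b assimilation>"
             "之一<b inhale>@</b inhale><b marker>NA </b marker>其它的")
PINYIN_LINE = ("gai4 zhang1 ren4 ke3<b inappropriate pronunciation>de5</b inappropriate pronunciation>"
               "<b short break>@</b short break>zhi3 you3<b assimilation>san1 fen1</b assimilation>"
               "zhi1 yi1<b inhale>@</b inhale><b marker>NA </b marker>qi2 ta1 de5")


def tag_counts(line: str) -> Counter:
    """How often each annotation tag occurs in one transcription line."""
    return Counter(name for name, _ in TAG.findall(line))


def plain_text(line: str, sep: str = "") -> str:
    """Drop tags and '@' placeholders, keeping the annotated words.

    Use sep="" for character tiers and sep=" " for pinyin tiers.
    """
    text = TAG.sub(lambda m: sep + m.group(2).strip() + sep, line)
    text = text.replace("@", sep)
    return " ".join(text.split()) if sep else text.strip()


def tiers_aligned(char_line: str, pinyin_line: str) -> bool:
    """True if both tiers carry the same tags in the same order."""
    return ([n for n, _ in TAG.findall(char_line)]
            == [n for n, _ in TAG.findall(pinyin_line)])


if __name__ == "__main__":
    print(tag_counts(PINYIN_LINE))
    print(plain_text(CHAR_LINE))
    print(plain_text(PINYIN_LINE, sep=" "))
    print(tiers_aligned(CHAR_LINE, PINYIN_LINE))
```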
Dialogue Act Annotation (MTCC)
• General opening: opening
• Negotiating a topic: suggest_topic, accept_topic, reject_topic, comment_topic
• Introducing a topic: introduce_topic
• Talking about the topic: begin_statement, agree, agree_part, oppose, oppose_part, feedback_understanding, feedback_non_understanding, backchannel, question, question_request_answer, answer, exclamation, rephrase, clarify, correct, repeat, completion_by_self, completion_by_other, comment_by_self, comment_by_other
• Ending the topic: end_topic
• General closing: closing
• Uninterpretable fragments: not_classified
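Because every act label belongs to exactly one phase of the dialogue, the tag set can double as a validator for annotated files. The sketch below encodes the MTCC labels listed above as a Python mapping and flags labels outside the set, such as typos. The function name and the example label sequence are hypothetical.

```python
from typing import Dict, List, Tuple

MTCC_ACTS: Dict[str, List[str]] = {
    "general opening": ["opening"],
    "negotiating a topic": ["suggest_topic", "accept_topic", "reject_topic", "comment_topic"],
    "introducing a topic": ["introduce_topic"],
    "talking about the topic": [
        "begin_statement", "agree", "agree_part", "oppose", "oppose_part",
        "feedback_understanding", "feedback_non_understanding", "backchannel",
        "question", "question_request_answer", "answer", "exclamation",
        "rephrase", "clarify", "correct", "repeat", "completion_by_self",
        "completion_by_other", "comment_by_self", "comment_by_other",
    ],
    "ending the topic": ["end_topic"],
    "general closing": ["closing"],
    "uninterpretable fragments": ["not_classified"],
}

# Reverse index: act label -> phase.
PHASE_OF = {act: phase for phase, acts in MTCC_ACTS.items() for act in acts}
# Every label must belong to exactly one phase.
assert len(PHASE_OF) == sum(len(acts) for acts in MTCC_ACTS.values())


def check_labels(labels: List[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (phase, act) pairs for known labels and a list of unknown labels."""
    known, unknown = [], []
    for label in labels:
        if label in PHASE_OF:
            known.append((PHASE_OF[label], label))
        else:
            unknown.append(label)
    return known, unknown


if __name__ == "__main__":
    # Hypothetical annotated dialogue containing one misspelled label.
    known, unknown = check_labels(
        ["opening", "introduce_topic", "question", "answer", "backchannel", "agreee", "closing"])
    for phase, act in known:
        print(f"{act:<16} {phase}")
    print("unknown:", unknown)
```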
Taxonomy of Spontaneous Speech Phenomena (MCDC)
• Disfluency: prosodic disfluency (silence, pause, short break, stutter), repair (restart, repetition, overt repair, editing term, error, word fragment), lexico-syntactic disfluency (inappropriate, interrupted, abridged utterances), discourse particles and discourse markers (both transcribed in capital letters)
• Socio-linguistic Phenomena: code switching, dialect-influenced pronunciation, new words
• Particular Vocalisation: lengthening, assimilation, syllable contraction, inappropriate pronunciation
• Unintelligible and Non-speech Sounds
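To summarize a transcript by phenomenon class, annotation tags can be mapped onto this taxonomy. The sketch below assumes that tag names equal the phenomenon names; only `particle`, `marker`, `pause`, `short break`, `assimilation`, `inappropriate pronunciation`, `inhale`, `exhale`, `breathe`, `clear throat` and `hiccup` actually appear in the examples. It also assumes, as an illustration, that breath and throat noises fall under non-speech sounds. Tags outside the table are reported as `unmapped` rather than guessed.

```python
import re
from collections import Counter

TAXONOMY = {
    "disfluency: prosodic": ["silence", "pause", "short break", "stutter"],
    "disfluency: repair": ["restart", "repetition", "overt repair",
                           "editing term", "error", "word fragment"],
    "disfluency: lexico-syntactic": ["inappropriate", "interrupted", "abridged"],
    "disfluency: discourse particles and markers": ["particle", "marker"],
    "socio-linguistic": ["code switching", "dialect-influenced pronunciation", "new words"],
    "particular vocalisation": ["lengthening", "assimilation", "syllable contraction",
                                "inappropriate pronunciation"],
    # Assumed members: the taxonomy names this class but does not list tags for it.
    "unintelligible and non-speech sounds": ["unintelligible", "inhale", "exhale",
                                             "breathe", "clear throat", "hiccup"],
}
CATEGORY_OF = {tag: cat for cat, tags in TAXONOMY.items() for tag in tags}

# Opening tags only: "<b NAME>" (closing tags start with "</b" and do not match).
OPEN_TAG = re.compile(r"<b ([^>]+)>")


def phenomenon_profile(line: str) -> Counter:
    """Count the annotated phenomena in one line, grouped by taxonomy class."""
    return Counter(CATEGORY_OF.get(name, "unmapped") for name in OPEN_TAG.findall(line))


if __name__ == "__main__":
    line = ("<b particle>EI </b particle><b clear throat>@</b clear throat>"
            "你好我姓賴請問一下貴姓<b hiccup>@</b hiccup> <b breathe>@</b breathe>")
    for category, n in phenomenon_profile(line).most_common():
        print(n, category)
```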