◎ Continuous speech recognition / understanding (CSR: continuous speech recognition)

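CSU / SUS : speech understanding system

Characteristics of continuous speech
1. Boundaries between words are unclear: segmentation is difficult
2. Coarticulation between words: mismatch with the written text
3. Durations shorten and pronunciation becomes unclear: more dropped phonemes, lower recognition rate

• Recognition vs. understanding
recognition : every word must be recognized correctly
understanding : it is enough to grasp the linguistic meaning and respond appropriately
difficulty : recognition >>> understanding
speech-enabled system (speech aware system)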

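※ Available (higher-level) information
- syntactic information (syntax) : information about sentence structure
- semantic information (semantics) : conceptual and attribute relations among the words used in the domain
- discourse information (pragmatics) : common sense within the chosen domain
- contextual information (context) : information obtained through dialogue with the user
- prosodic information (prosody) : intonation and stress of the speech (meaningless in a word recognition system)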

• Isolated word recognition (isolated word recognizer)
[Block diagram, small vocabulary] speech -> acoustic analysis -> word recognition (word ref. pattern) -> output
[Block diagram, medium / large vocabulary] speech -> acoustic analysis -> phoneme recognition (phoneme ref. pattern) -> word recognition (word lexicon) -> output
subword recognition units : phoneme category, CV / VC, VCV chain

• Continuous speech recognition / understanding
[Block diagram] speech -> acoustic analysis -> phoneme recognition (phoneme ref. pattern) -> word recognition (word lexicon) -> syntactic / semantic analysis -> output (understanding)
modeling layers : subword modeling, word modeling, language modeling

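Processing stages of a CSR system
1. acoustic level (acoustic analysis)
2. phonetic / phonemic level : convert the feature parameter sequence into a phonemic sequence
3. word level : match against the phoneme sequence or against words predicted at higher levels, using a word lexicon / pronunciation lexicon
4. prosodic level : use prosodic information to estimate syllable, word, and phrase boundaries and the sentence type
5. syntactic level : parse the word sequence; predict the words that can come before or after it
6. semantic & pragmatic level : interpret the meaning of the word sequence, predict words at the semantic level, estimate the topic from the system-user dialogue, predict likely constructions / words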

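7. control level
• evaluate intermediate results and select the candidates most likely to become the final result
• flow control is extremely difficult : which error recovery is possible at which level?
★ 1-pass? add top-down processing? (feedback)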
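As a rough illustration of the word level (step 3) and of picking the most likely candidate (step 7), the sketch below matches a recognized phoneme sequence against a pronunciation lexicon and keeps the closest word. The lexicon entries, phoneme symbols, and function names are hypothetical, and plain edit distance is only a stand-in for whatever matching score a real system would use.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


# Hypothetical pronunciation lexicon: word -> phoneme sequence.
LEXICON = {
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "peach": ["p", "iy", "ch"],
}


def match_word(phonemes, lexicon):
    """Return the lexicon word whose pronunciation is closest to the input."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))


# A phoneme recognizer that confused "p" with "b" still lands on "speech".
print(match_word(["s", "b", "iy", "ch"], LEXICON))
```

A fuller system would combine this acoustic match with scores from the syntactic and semantic levels before the control level commits to a candidate.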

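• Classification of speech recognition by content
1. linguistic information : ASR
2. language identification (LID)
3. dialect identification
4. speaker identification (speaker ID) - criminal investigation
5. emotion identification : psychological state - lie detector
6. condition identification : physiological / health / safety diagnosis

• Cost-effectiveness of ASR
1. lets the operator do other work in parallel : FA
2. mobility
3. remote entry : especially over the telephone and the internet
4. interactive confirmation : CNC, etc.
5. a wide range of applications : VUI, process control, access control by speaker verification, robots, toys, aids for the disabled, web search, call center, CTI, CAI •••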

◎ Dialogue management system
[Block diagram] speech -> speech recognition / speech understanding -> text string -> intention analysis -> dialogue processing -> sentence generation -> text string -> speech synthesis, TTS (text-to-speech) -> speech

automatic translation, cf. machine translation (MT)
automatic interpreting telephony (automatic telephony), cf. interpretation
multilingual speech recognition
• LVCSR (large vocabulary continuous speech recognition) : read speech
• spontaneous speech recognition : dialogue speech
dateless problem : diverse, large-scale speech DBs; national

• Representative CSR systems
CMU : JANUS, scheduling (ATIS)
MIT : PEGASUS, city guide
Germany : Verbmobil, scheduling (train)
ATR : ASURA, conference registration
(C-STAR project : USA, Germany, Japan + Italy, Korea)
American Airlines, United Air

Speaker recognition
Personal identification technologies ; security (biometrics)
• fingerprint
• signature
• voice ('voiceprint') : sonagraph
• seal
• iris
• DNA
• face
• PIN (personal identification number) / password
• veins

[Block diagram] speech input -> analysis -> speaker identification / verification (SID / SV) against a reference pattern -> decision (threshold) -> accept / reject; the ID / PIN (card, key) supplies the claimed identity

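Speaker recognition
speaker identification (SID)
speaker verification (SV)

SV
• the claimed identity is given
• the decision is Yes or No
• comparison with one reference pattern (1 comparison)
• error rate is independent of N

SID
• there are N candidate identities
• the decision picks 1 of the N
• comparison with all N (N comparisons)
• error rate grows in proportion to N

text-independent speaker recognition : uses features that reflect physiological differences (physiological feature parameters : long-time average spectrum, average pitch, low quefrency)
text-dependent speaker recognition : uses phonemic content; almost identical to ASR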
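The contrast between SV and SID can be sketched in a few lines. This is a minimal sketch, assuming each speaker is enrolled as a single feature vector and that Euclidean distance is the comparison; the speaker names, vectors, and threshold are made up for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(features, claimed_id, references, threshold):
    """SV: one comparison against the claimed speaker's reference; Yes or No."""
    return distance(features, references[claimed_id]) <= threshold


def identify(features, references):
    """SID: N comparisons; return the closest of the N enrolled speakers."""
    return min(references, key=lambda sid: distance(features, references[sid]))


# Hypothetical enrolled reference patterns (one vector per speaker).
references = {
    "alice": [1.0, 2.0],
    "bob": [4.0, 6.0],
    "carol": [7.0, 1.0],
}
sample = [1.5, 2.5]

print(verify(sample, "alice", references, threshold=1.0))  # claim accepted
print(verify(sample, "bob", references, threshold=1.0))    # claim rejected
print(identify(sample, references))                        # closest enrolled speaker
```

`verify` does the same amount of work however many speakers are enrolled, while `identify` must search all N references, which is why its error rate depends on N.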

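Sources of individuality
① Anatomical differences between individuals in the vocal organs
length and shape of the vocal tract : thickness and shape of the vocal cords : thickness of the lips •••
=> appear as differences in frequency structure = innate
② Individual differences in speaking habits
intonation ; accent ; speaking rate ; loudness •••
=> temporal variation of the frequency structure = acquired
Note : vocal mimicry

Classification of speaker recognition research
① Research by listening
Which features express individuality well? (type of sound source; which of vocal tract and source information is more effective?) -> listening experiments
② Research by vision
spectrogram (sonagraph) reading; used in criminal investigation; Michigan St. Police
③ Research by machine
feature parameter extraction, use of P.R. machines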

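Practical problems
1. reliability and economy : accuracy as a voice key
2. noise and transmission characteristics : important for telephone / communication speech
3. countermeasures against impostors
4. changes in voice quality caused by illness, etc.

Basic implementation problems
① separating phonemic content from individuality
② choice of features and text
③ variation over time : periodic updates are needed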

Examples of text
• Texas Instruments : Doddington
uses words built from CVC chains
false reject 0.3%, false alarm (false accept) 1.0%
cool birds stopped west
small bugs sing down
huge twing sang deep
strange toads stood wild
• Bell Lab : Atal
4% or less
- We are away a year ago
- U knew when my lawyer is due.
- May we all learn a yellow lion roar
• NTT : Furui
- namae bakuon baNgoh - kohgen
99%, 97~98% (telephone)
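The false reject and false alarm (false accept) figures quoted above are rates measured at a particular decision threshold. The sketch below shows how the two rates trade off as the threshold moves; the distance values are invented for illustration and are not data from any of the systems listed.

```python
def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (false_reject_rate, false_accept_rate) for a distance threshold.

    A trial is accepted when its distance to the reference is <= threshold.
    """
    false_reject = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    false_accept = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    return false_reject, false_accept


# Hypothetical distances from true-speaker trials and impostor trials.
genuine = [0.2, 0.4, 0.5, 0.9, 1.3]
impostor = [0.8, 1.5, 2.0, 2.4, 3.1]

for threshold in (0.5, 1.0, 1.5):
    fr, fa = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold}: false reject={fr:.0%}, false accept={fa:.0%}")
```

A tight threshold rejects more true speakers, and a loose one admits more impostors, so a system is usually tuned to whichever error is more costly for the application.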

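• BISS (Base & Installation Security & System)
• comparison of the personal identification ability of fingerprint, signature, and voice
• 1973~1975, US Air Force
• 1977 : false alarm 2% or less
• false reject 1% or less
• time 6 sec/person
• only voice met the conditions
Calspan, Veripen, TI

Applications
• file security in IR, etc. : access control
• ID / credit card protection
• entry control and management
• personal identification by telephone : phone banking, internet application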

(free) man-machine communication by (telephone) voice
'phonetic typewriter' : dictation software
※ ASR + TTS + SV : 3-mode system
+ NLP
on telephone line (+ CTI) (mobile phone)

Topics
• voice portal / corporate portal
• voice XML
• voice user interface (VUI)
• voice web browser
• multilingual speech processing
• call center application
• speech activated (aware) system
• ASR + TTS + SV
• dialogue management
• natural language processing
