Human – Network Voice Interface in A Wireless Era

Presentation Transcript


  1. Human – Network Voice Interface in A Wireless Era

  2. Information-related Activities, Applications and Services in Future Network Era
     Future Integrated Networks
     • Real-time Information: weather, traffic, flight schedule, stock price, sports scores
     • Private Services: personal notebook, business databases, home appliances, network entertainment
     • Intelligent Working Environment: e-mail processors, intelligent agents, teleconferencing, distance learning
     • Knowledge Archives: digital libraries, virtual museums
     • Electronic Commerce: virtual banking, on-line transactions, on-line investments
     • Multi-media, Multi-lingual, Multi-functionalities
     • Cross-cultures, Cross-domains, Cross-regions
     • Integrating All Knowledge Systems and Information-related Activities and Services Globally
     • Multiple User Terminals: telephone set, handset, PDA, vehicular electronics, home appliance, personal computer, etc.

  3. Wireless Access of Global Multi-media Information
     • At Any Time, from Anywhere
     • As Handset Size Shrinks While Required Functionalities Grow and the User Environment Changes, Voice Interface will be Useful for all User Terminals
     • Examples
       - voice retrieval, voice browser, voice portal, voice web
       - spoken dialogue based access to intelligent agents

  4. Scenario for Network Information Access
     [Diagram labels: speech; Spoken Dialogue; Information Retrieval; Internet; Public Services/Information/Knowledge; Private Services/Databases/Applications; text, image, video, speech, …; text information; Text-to-speech Synthesis; speech information]

  5. Convergence of PSTN and Internet
     • PSTN (for Voice) and Internet (for Data and Multi-media Contents) are Converging
       [Diagram labels: handsets, telephones, PSTN, Internet, PCs, servers]
     • Driving Force for the Convergence
       - “anywhere, any time” of wireless services
       - voice provides the most convenient and natural interaction interface
       - attractive contents over the Internet
       - contents (human information) are why the Internet is attractive, while voice directly carries human information
     • Speech-enabled Access of Web-based Applications

  6. Voice Interface for Human-network Interaction
     • huge volumes of data disseminated across the globe by optical fiber networks
     • any time, from anywhere by wireless terminals
     • vehicular electronics, PDA, handset, home appliance, etc.: new platforms accessing the global network information/services
     • traditional keyboard/mouse no longer adequate
       - size shrinkage, different user environment, etc.
       - desired functionalities/human-network interactions increasing
     • voice interface will be one of the few most important, natural, user-friendly, attractive interfaces
     • examples: voice retrieval, voice browser, voice portal, voice web
       - voice-based web-user interaction
       - voice-based web tools/Application Interfaces, etc.
     • voice interface is the only major “missing link” in the “semi-mature” technology chain

  7. Core Technologies / Functionalities for Voice Interface

  8. Speech Recognition as a pattern recognition problem
     [Block diagram: unknown speech signal x(t) → Feature Extraction → feature vector sequence X → Pattern Matching → Decision Making → output word W; training speech y(t) → Feature Extraction → Reference Patterns Y, which feed Pattern Matching]
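
The three stages on this slide can be illustrated with a toy template matcher. This is only a sketch under stated assumptions: the slide does not say which matching method is used, so dynamic time warping (DTW) is swapped in as one classic choice, and the two-dimensional feature vectors and the `reference_patterns` dictionary are invented for illustration. Here `unknown` plays the role of X, `reference_patterns` the role of Y, and `word` the role of W.

```python
import math


def dtw_distance(x, y):
    """Dynamic-time-warping distance between two feature-vector sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])  # frame-to-frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch the reference
                cost[i][j - 1],      # stretch the input
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


def recognize(features, reference_patterns):
    """Decision making: pick the word whose reference pattern is closest."""
    return min(
        reference_patterns,
        key=lambda w: dtw_distance(features, reference_patterns[w]),
    )


# Hypothetical reference patterns Y (one template per word).
reference_patterns = {
    "yes": [(0.0, 1.0), (0.5, 1.0), (1.0, 0.5)],
    "no": [(1.0, 0.0), (0.5, 0.0), (0.0, 0.5)],
}

# Hypothetical feature vector sequence X of the unknown utterance.
unknown = [(0.1, 0.9), (0.1, 1.0), (0.6, 0.9), (0.9, 0.6)]
word = recognize(unknown, reference_patterns)
```

DTW lets the input be longer or shorter than the template, which is why the four-frame input can still be compared with three-frame references.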

  9. Basic Approach for Large Vocabulary Speech Recognition
     • A Simplified Block Diagram
       [Block diagram: Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence; Speech Corpora → Acoustic Model Training → Acoustic Models; Text Corpora and Lexical Knowledge-base → Language Model Construction → Language Model; further labels: ICG, Grammar, Lexicon]
     • Example Input Sentence: this is speech
     • Acoustic Models: (th-ih-s-ih-z-s-p-iy-ch)
     • Lexicon: (th-ih-s) → this, (ih-z) → is, (s-p-iy-ch) → speech
     • Language Model: (this) – (is) – (speech)
       - P(this) P(is | this) P(speech | this is)
       - P(w_i | w_{i-1}): bi-gram language model
       - P(w_i | w_{i-1}, w_{i-2}): tri-gram language model, etc.
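
The bi-gram approximation on this slide can be sketched with plain maximum-likelihood counts. The three-sentence corpus and the `<s>` sentence-start symbol are assumptions added for illustration (with `<s>`, the slide's P(this) becomes P(this | <s>)); a real system would also smooth the estimates so that unseen word pairs do not get zero probability.

```python
from collections import Counter


def train_bigram(sentences):
    """Count histories and word pairs for P(w_i | w_{i-1})."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        history_counts.update(words[:-1])            # words that have a successor
        bigram_counts.update(zip(words, words[1:]))  # (previous, current) pairs
    return history_counts, bigram_counts


def sentence_probability(sentence, history_counts, bigram_counts):
    """Product of unsmoothed bi-gram probabilities over the sentence."""
    words = ["<s>"] + sentence.split()
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        if history_counts[prev] == 0:
            return 0.0  # history never seen in training
        probability *= bigram_counts[(prev, cur)] / history_counts[prev]
    return probability


# Hypothetical toy text corpus.
corpus = ["this is speech", "this is text", "that is speech"]
history_counts, bigram_counts = train_bigram(corpus)

p_good = sentence_probability("this is speech", history_counts, bigram_counts)
p_bad = sentence_probability("speech is this", history_counts, bigram_counts)
```

With this corpus, "this is speech" scores 2/3 × 2/2 × 2/3, while the scrambled word order scores zero because "speech" never starts a training sentence.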

  10. Speech Recognition Technologies, Applications and Problems
      • Word Recognition
        - voice command/instructions
      • Keyword Spotting
        - identifying the keywords out of a pre-defined keyword set from input voice utterances
      • Large Vocabulary Continuous Speech Recognition
        - entering longer texts
        - remote dictation
      • Speaker Dependent/Independent/Adaptive
      • Acoustic Reception/Background Noise/Channel Distortion
      • Read/Spontaneous/Conversational Speech

  11. Text-to-speech Synthesis
      [Block diagram: Input Text → Text Analysis and Letter-to-sound Conversion (Lexicon and Rules) → Prosody Generation (Prosodic Model) → Signal Processing and Concatenation (Voice Unit Database) → Output Speech Signal]
      • Transforming any input text into corresponding speech signals
      • E-mail/Web page reading
      • Prosodic modeling
      • Basic voice units/rule-based, non-uniform units/corpus-based

  12. Speaker Verification
      [Block diagram: input speech → Feature Extraction → Verification (against Speaker Models) → yes/no]
      • Verifying the speaker as claimed
      • Applications requiring verification
      • Text dependent/independent
      • Integrated with other verification schemes
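
The yes/no decision on this slide can be sketched as a score compared against a threshold. Everything specific here is an assumption for illustration: the slide does not say how speaker models are built or scored, so each model is taken to be a single average feature vector, the score is cosine similarity, and the names `speaker_models`, `verify` and the 0.9 threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Average the per-frame feature vectors of one utterance."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def verify(claimed_id, features, speaker_models, threshold=0.9):
    """Accept the identity claim only if the utterance matches the claimed model."""
    model = speaker_models.get(claimed_id)
    if model is None:
        return False  # no enrolled model for this identity
    return cosine_similarity(mean_vector(features), model) >= threshold


# Hypothetical enrolled speaker models and one test utterance.
speaker_models = {"alice": [1.0, 0.2, 0.1], "bob": [0.1, 0.9, 0.7]}
utterance = [[0.9, 0.3, 0.1], [1.1, 0.1, 0.2]]

accepted = verify("alice", utterance, speaker_models)
rejected = verify("bob", utterance, speaker_models)
```

Unlike recognition, verification never searches over all speakers: it only tests the one identity that was claimed, which is why the function takes `claimed_id` as input.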

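  13. Information Retrieval Including Voice
      [Diagram: a text instruction or speech instruction (example: "I would like to find news about the formation of the new government") is matched against text documents and speech documents d1, d2, d3 (example document: "President-elect Chen Shui-bian this morning…")]
      • Text Documents/Instructions
      • Speech Documents/Instructions
      • Voice Personal Notebook/Private Database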
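
Matching an instruction against a document collection can be sketched with TF-IDF weighting and cosine ranking. This is a minimal sketch, assuming that speech instructions and speech documents have already been transcribed to text; the slide does not name a retrieval model, and the three English documents below are invented stand-ins for d1, d2, d3.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for the documents."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs):
    """Return document indices ordered from most to least similar."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    query_vec = {t: tf * idf.get(t, 0.0) for t, tf in counts.items()}
    scores = [cosine(query_vec, vec) for vec in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical documents (text, or transcripts of speech documents).
docs = [
    "president elect announces new government cabinet",
    "stock prices rise on technology shares",
    "weather forecast rain expected this weekend",
]
ranking = retrieve("news about the new government", docs)
```

Query words that never occur in the collection get zero weight, so only "new" and "government" contribute to the ranking here.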

  14. Cross-language Network Information Processing
      • globalized network with multi-lingual content/users
      • cross-language network information processing with spoken Chinese language input as an example
      • Code-Switching Problem
        - English words/phrases inserted in Spoken Chinese sentences: 人人都用Computers,家家都上Internet ("Everyone uses computers, every household goes on the Internet")
        - the whole sentence switched to English: 準備好了嗎?Let’s go! ("Are you ready? Let's go!")
      • Chinese Dialects/Accents
        - Taiwanese, Cantonese, Shanghainese, etc.
        - hundreds of Chinese dialects
        - code-switching problem: dialects mixed with Mandarin (or plus English)
        - Mandarin with a variety of strong accents
      • Language Dependent/Independent Technologies
      • Multi-lingual Functionalities
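
Where the switches occur in the two example sentences can be made visible by tagging a written transcript by script. This is only an illustrative proxy and an assumption on my part: real code-switching recognition has to be handled in the acoustic and language models, and dialect mixing cannot be detected from characters at all. The function names and the punctuation set are hypothetical.

```python
# Punctuation and spaces trimmed from segment edges (assumed set).
PUNCT = " ,.!?,。!?"


def script_of(ch):
    """Tag a character as Chinese, English, or neither."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return None  # punctuation, digits, spaces, apostrophes


def segment_by_script(text):
    """Split mixed text into (language_tag, chunk) runs."""
    segments = []  # each entry is [language_tag, characters]
    for ch in text:
        tag = script_of(ch)
        if tag is None:
            if segments:
                segments[-1][1] += ch  # keep punctuation with the current run
        elif segments and segments[-1][0] == tag:
            segments[-1][1] += ch
        else:
            segments.append([tag, ch])  # language switch: start a new run
    return [(tag, chunk.strip(PUNCT)) for tag, chunk in segments]


inserted_words = segment_by_script("人人都用Computers,家家都上Internet")
whole_sentence = segment_by_script("準備好了嗎?Let's go!")
```

The first sentence yields four alternating runs (words inserted mid-sentence), the second only two (a switch at the sentence boundary), which mirrors the two cases on the slide.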

  15. Spoken Dialogue Systems
      [Block diagram labels: Users, Input Speech, Speech Recognition and Understanding, User’s Intention, Dialogue Manager, Discourse Context, Dialogue Server, Databases, Networks, Internet, Response to the user, Sentence Generation and Speech Synthesis, Output Speech]
      • Almost all human-network interactions can be made by spoken dialogue
      • Speech understanding
      • System/user/mixed initiatives
      • Reliability/efficiency, dialogue modeling/flow control
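
The roles of the dialogue manager and the discourse context can be sketched as a small slot-filling loop. This is a hedged illustration, not the system on the slide: the flight-schedule task, the slot names, and the `DialogueManager` class are all hypothetical, and speech understanding is assumed to have already turned each utterance into a dictionary of slots.

```python
class DialogueManager:
    """System-initiative slot filling for a hypothetical flight-schedule task."""

    REQUIRED_SLOTS = ("origin", "destination", "date")

    def __init__(self):
        self.context = {}  # discourse context carried across turns

    def step(self, understood_slots):
        """Merge the user's intention into the context and choose a response."""
        self.context.update(understood_slots)
        for slot in self.REQUIRED_SLOTS:
            if slot not in self.context:
                # Flow control: ask for the first piece of missing information.
                return f"Please tell me the {slot}."
        return "Looking up flights from {origin} to {destination} on {date}.".format(
            **self.context
        )


manager = DialogueManager()
first_reply = manager.step({"destination": "Taipei"})
second_reply = manager.step({"origin": "Tokyo", "date": "Friday"})
```

Because the context persists between turns, the user may volunteer slots in any order, which gives a limited form of mixed initiative; the returned strings would then go to sentence generation and speech synthesis.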
