Interactive Systems Technical Design

Seminar work: Audio / Speech
Ville-Mikko Rautio, Timo Salminen, Vesa Hyvönen
ISTD 2003, Audio / Speech

Introduction
• Hearing is one of the basic human senses for gathering information about the surrounding environment. Therefore, using audio and speech as alternative input and output methods can contribute a lot to the user experience in interactive systems and make it more natural.

Motivation
• When building interactive systems, the user interface should behave according to the expectations users have formed through their experiences of the real world.
• Generally, user interfaces today are mainly based on the keyboard and the screen, and feedback from the system is given almost only in visual form. In computer-based systems, a much better user experience can be achieved by also offering information through the other basic senses, such as hearing, taste, touch and smell.

Implementation
• Basically two components: audio playback and speech/audio recognition
• Design issues:
  • Audio can be speech or non-speech
  • Who are you designing for?
    • Different users have different abilities
    • Blind, elderly and disabled people
    • Human diversity: physical, perceptual, cultural and intellectual differences
  • Mobile computing
    • Limited input, limited output, slow processor, small memory, limited battery life, slow network connection
  • Communication protocol
  • Speech recognition causes major problems
    • Accuracy
    • Usage in critical systems?
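
Audio output does not always have to be speech: a short non-speech cue (an "earcon") can report status cheaply, which also suits the limited resources of mobile devices. The following is a minimal sketch using only the Python standard library; the function name make_earcon, the tone frequencies and the mapping of cues to meanings are illustrative assumptions, and sending the resulting WAV bytes to a loudspeaker is left to whatever audio API the platform provides.

```python
import io
import math
import struct
import wave

def make_earcon(freqs_hz, tone_ms=120, sample_rate=8000, volume=0.5):
    """Return mono 16-bit WAV bytes for a sequence of short sine tones."""
    samples_per_tone = sample_rate * tone_ms // 1000
    frames = bytearray()
    for freq in freqs_hz:
        for n in range(samples_per_tone):
            # Short linear fade-in/fade-out avoids audible clicks between tones.
            fade = min(1.0, n / 40, (samples_per_tone - 1 - n) / 40)
            value = volume * fade * math.sin(2 * math.pi * freq * n / sample_rate)
            frames += struct.pack("<h", int(value * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()

# Hypothetical mapping: a rising cue for "recognized", a falling one for "please repeat".
SUCCESS_CUE = make_earcon([660, 880])
RETRY_CUE = make_earcon([880, 440])
```

A low sample rate keeps the cues small; the trade-off is that frequencies must stay well below half the sample rate.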

Applications
• MIT Media Lab, Nomadic Radio: Wearable Audio Computing
  • A client-server based messaging infrastructure
  • Utilizes spatialized audio, speech synthesis and speech recognition
  • Delivers hourly news broadcasts, voice mail, email, calendar reminders, weather forecasts and stock reports
• HP Labs, SpeechBot
  • A search engine that uses speech recognition to index audio and video content hosted on and played from other websites
  • http://speechbot.research.compaq.com/
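
SpeechBot's internals are not described in these slides. As a hedged sketch of the general idea behind searching spoken content: a recognizer turns each recording into time-stamped words, an inverted index maps every word to where it was heard, and a search hit points back to an offset in the remote recording. The transcript data and document ids below are made up for illustration.

```python
from collections import defaultdict

def build_index(transcripts):
    """Map each recognized word to a list of (document id, start second) hits.

    `transcripts` maps a document id (e.g. a URL of remote audio) to the
    (word, start_seconds) pairs that a speech recognizer produced for it.
    """
    index = defaultdict(list)
    for doc_id, words in transcripts.items():
        for word, start in words:
            index[word.lower()].append((doc_id, start))
    return index

def search(index, query):
    """Collect the start times of all query words, grouped per document."""
    results = defaultdict(list)
    for term in query.lower().split():
        for doc_id, start in index.get(term, []):
            results[doc_id].append(start)
    return {doc: sorted(times) for doc, times in results.items()}

# Hypothetical recognizer output for two recordings.
transcripts = {
    "news-clip-1": [("weather", 12.4), ("forecast", 12.9), ("helsinki", 30.2)],
    "talk-clip-7": [("speech", 3.1), ("recognition", 3.6), ("weather", 88.0)],
}
index = build_index(transcripts)
print(search(index, "Weather forecast"))
# {'news-clip-1': [12.4, 12.9], 'talk-clip-7': [88.0]}
```

Because recognition is error-prone, a real system would also have to cope with misrecognized words in the index; this sketch trusts the transcript completely.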

Nomadic Radio Network Architecture

Strengths / Advantages
• Data input is possible without a keyboard
• Mobile devices
• Excellent for hands-busy and eyes-busy situations

Strengths / Advantages
• People with visual or other disabilities
• A natural way for humans to interface with the environment
• Increases the bandwidth of communication
• Devices with limited screens need an additional output method
• The technology is available now

Limitations / Weaknesses
• Input is error-prone, especially in noisy environments
• The recognition vocabulary is limited, which limits the objects and things that can be controlled
• A communication protocol is needed
  • "Computer! Shut down the lights!"
  • Can lead to an unnatural experience
• How to tell the user what the communication protocol is like:
  • Explicit: tell exactly what to say ("Welcome to the library. Say 'XXX' to ...")
  • Implicit: open-ended, with potential for errors ("Welcome to the library. What would you like to do...")
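
The limited vocabulary and the explicit protocol can be illustrated with a tiny command-and-control layer. This is a minimal sketch, assuming a recognizer has already produced a text string: utterances are accepted only when they start with a wake word ("computer", as in the slide's example) and match one of a few fixed phrases, and the explicit prompt simply lists those phrases. The vocabulary and all names are hypothetical.

```python
# Hypothetical fixed vocabulary: spoken phrase -> (device, action).
COMMANDS = {
    "shut down the lights": ("lights", "off"),
    "turn on the lights": ("lights", "on"),
    "open the door": ("door", "open"),
}
WAKE_WORD = "computer"

def explicit_prompt():
    """Explicit style: tell the user exactly which phrases are understood."""
    options = ", ".join(f'"{WAKE_WORD.capitalize()}! {p.capitalize()}!"' for p in COMMANDS)
    return f"Say one of: {options}"

def parse(utterance):
    """Return (device, action) for a recognized utterance, or None."""
    words = utterance.lower().replace("!", " ").replace(",", " ").split()
    if not words or words[0] != WAKE_WORD:
        return None  # not addressed to the system: ignore ordinary conversation
    return COMMANDS.get(" ".join(words[1:]))  # None for out-of-vocabulary phrases

print(parse("Computer! Shut down the lights!"))  # ('lights', 'off')
print(parse("Shut down the lights"))             # None (no wake word)
```

The wake word makes the protocol unambiguous but is exactly what can feel unnatural; an implicit, open-ended prompt would need a much larger vocabulary and more tolerant matching.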

Limitations / Weaknesses
• Speech output sounds unnatural
• Asymmetry:
  • speech input is faster than typing
  • speech output is slower than reading
• Feedback and latency
  • The user needs to know whether recognition was successful
  • Is the system processing data or waiting for input?
  • Time taken to recognise an utterance
  • Pauses
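
One common way to address the feedback questions above is to signal the system's state continuously and to pick the confirmation style from the recognizer's confidence score. The sketch below is illustrative only: the states, cue descriptions and thresholds (0.85 and 0.5) are hypothetical values, not taken from any particular system.

```python
from enum import Enum

class State(Enum):
    LISTENING = "listening"    # microphone open, waiting for input
    PROCESSING = "processing"  # utterance captured, recognizer running
    IDLE = "idle"              # not accepting input

def feedback_for_state(state):
    """Cue telling the user whether the system is waiting for input or working."""
    return {
        State.LISTENING: "play 'ready' tone",
        State.PROCESSING: "show busy indicator",
        State.IDLE: "no cue",
    }[state]

def confirmation(hypothesis, confidence, accept=0.85, reject=0.5):
    """Choose how to report a recognition result, based on its confidence."""
    if confidence >= accept:
        return f"OK, {hypothesis}."              # implicit confirmation: act at once
    if confidence >= reject:
        return f"Did you say '{hypothesis}'?"    # explicit confirmation: ask first
    return "Sorry, I didn't catch that. Please repeat."  # reject and reprompt
```

Explicit confirmation is safer but adds another slow speech-output turn, which is why it is reserved here for uncertain results.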

Selected Industrial Players
• IBM
  • Conversational Biometrics
    • Combines multiple verification sources, such as voice biometrics and spoken knowledge
  • Embedded ViaVoice
    • Brings IBM speech technology to mobile devices
    • Command and control (C&C)
    • Text-to-speech (TTS)
• Sony
  • SDR-4X
    • Prototype entertainment robot using multimodal human interaction technology
    • Identifies individual persons by the tone of their voice
    • Continuous speech recognition and acquisition of unknown vocabulary
    • Speech synthesis and singing voice production
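
How IBM actually combines its verification sources is not described here. As a generic, hedged illustration of combining voice biometrics with spoken knowledge, the sketch below uses simple weighted score-level fusion; the Evidence fields, the weight and the threshold are hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    voice_score: float      # speaker-verification similarity, assumed in [0, 1]
    knowledge_correct: int  # spoken challenge questions answered correctly
    knowledge_asked: int    # spoken challenge questions asked

def verify(e, voice_weight=0.6, threshold=0.7):
    """Accept the speaker if a weighted sum of both sources reaches the threshold."""
    if e.knowledge_asked == 0:
        raise ValueError("at least one knowledge question is required")
    knowledge_score = e.knowledge_correct / e.knowledge_asked
    combined = voice_weight * e.voice_score + (1 - voice_weight) * knowledge_score
    return combined >= threshold
```

Fusing the two sources means that a good voice match alone, for example a recording of the right person, is not enough without the right answers, and vice versa.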

SDR-4X

Selected International Research Groups and Projects
• The MBROLA Project
  • Develops a diphone-based speech synthesizer for many different languages; it converts a list of phonemes with prosody into speech, so written text must first be converted by a separate front end
  • Speech engine core freely available!
  • http://tcts.fpms.ac.be/synthesis/mbrola.html
• Stanford University, Interactive Workspaces
  • Goal is to create an interactive space where people can work collaboratively using natural gestures
  • http://iwork.stanford.edu/
• Speech Interface Group, MIT Media Laboratory
  • Major player with numerous projects
  • Example: Nomadic Radio: Wearable Audio Computing
  • http://web.media.mit.edu/~nitin/NomadicRadio/
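
To our understanding of the MBROLA documentation, the synthesizer reads a plain-text .pho file in which each line holds a phoneme, its duration in milliseconds and optional pitch points given as pairs of (position in % of the phoneme, pitch in Hz); lines starting with ';' are comments and '_' denotes silence. The sketch below only formats such text. The phoneme symbols are a hypothetical SAMPA-like example, since the accepted symbols depend on the voice database used.

```python
def to_pho(segments):
    """Format (phoneme, duration_ms, pitch_points) tuples as .pho text.

    pitch_points is a list of (position_percent, pitch_hz) pairs; it may be empty.
    """
    lines = ["; generated .pho file"]  # ';' starts a comment line
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position, pitch in pitch_points:
            fields += [str(position), str(pitch)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"

# Hypothetical phoneme sequence and prosody for the word "hello".
HELLO = [
    ("_", 100, []),
    ("h", 60, []),
    ("@", 70, [(50, 120)]),
    ("l", 80, []),
    ("@U", 200, [(0, 130), (100, 100)]),
    ("_", 100, []),
]
print(to_pho(HELLO))
```

The resulting file would then be passed to the MBROLA synthesizer together with a voice database; producing the phoneme sequence and prosody from written text is the job of the front end.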

Selected International Research Groups and Projects
• MIT, Project Oxygen
  • Pervasive, human-centered computing
  • Integrated software system that will reside in the public domain
  • Speech and vision provide the main modes of interaction in Oxygen
  • Multilingual systems support dialog among participants speaking different languages
  • The SpeechBuilder utility supports development of spoken interfaces
  • http://oxygen.lcs.mit.edu/Overview.html

Selected Finnish Research Groups and Projects
• VTT, Interactive Intelligent Electronics (IIE)
  • User interface technologies for future home environments, the Smart-Its project, Beyond the GUI, …
  • http://www.vtt.fi/ele/projects/iie/index.htm
• Helsinki University of Technology, Neural Network Research Centre
  • Adaptive natural language processing
  • http://www.cis.hut.fi/projects/natlang/
• University of Tampere, Speech-based and Pervasive Interaction Group
  • USIX-Interact, Dumas, Mobile User Interfaces, …
  • http://www.cs.uta.fi/research/hci/spi/

Companies and Research Groups in Oulu
• MediaTeam Oulu, Language and Audio Technology
  • CBIR: Content-Based Information Retrieval
  • Filling of the Semantic Gap in the Retrieval of Audio and Video Recordings
  • Multiparametric prosodic analysis of phonetic and phonological correlates of emotions
  • Vikings
  • http://www.mediateam.oulu.fi/research/lat/?lang=en
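
Prosodic analysis typically starts from low-level parameters such as pitch (fundamental frequency, F0), energy and duration. The project's own methods are not described in these slides; as a hedged illustration, the sketch below estimates the F0 of one speech frame by picking the strongest autocorrelation peak within an assumed 60 to 400 Hz pitch range, and computes the frame's RMS energy. Real pitch trackers add voicing decisions and safeguards against octave errors that are omitted here.

```python
import math

def rms_energy(frame):
    """Root-mean-square amplitude of one frame (a loudness-related cue)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate F0 in Hz from the autocorrelation peak, or None if no peak is found.

    Only lags corresponding to pitches between fmin and fmax are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None

# Synthetic test frame: 50 ms of a 200 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
print(estimate_f0(frame, 8000))  # 200.0
```

Tracking how such values change over an utterance, rather than their values in a single frame, is what carries prosodic information such as emotion.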

Future Developments
• Multimodality
• Multilingual, natural speech interaction
• Emotional state
• Biometrics
