Development of conversational interfaces at Nokia Research Center Boda Péter Pál peter.boda@nokia.com Language Technology & Applications, Voice Interfaces Group Speech and Audio Systems Laboratory Nokia Research Center 14 October, 2002
Contents • Background • personal • Language Technology and Applications group at NRC • A commercial implementation: Nokia One Voice Service • Overview of CATCH-2004: multilingual conversational interface • Demos • Summary
Personal background • Born in 1965, Miskolc, Hungary • M.Sc. in Telecommunications, 1991, Budapest, Tech. Univ. of Budapest • Post-graduate studies: TUB 1991-1994, HUT 1992-1994, Nijmegen 1995 • Lic. Tech. Speech Technology and Neural Networks, 1995, Helsinki, HUT • Working on • speech analysis 1990-1995 • speech recognition 1995-1997 • spoken dialogue systems, language technology 1996- • Interest: • Natural Language Understanding (semantic decoding) • Dialogue Management • Processing multimodal and contextual input
Language Technology and Applications • Mission: develop language technology for Nokia’s offering • Dialogue-based application development for telecommunication (mainly network-based implementations) • Seamless integration of Natural Language Understanding technology into user interfaces • Covering the entire development process: • conceptual design • data collection and analysis • grammar building and tuning, NLU training & testing • Wizard-of-Oz experiments • type-in and speech-enabled tests • objective and subjective evaluation • human factors considerations, usability studies • Personnel: a diverse team of linguists, software engineers and telecom engineers
What will the new generation of speech interfaces bring? • Enhanced usability: - naturalness in terms of linguistic expressions; - ease of use; - human-human like dialogues; - accelerated system-user interactions; • Well-defined framework for porting to other languages & tasks: - end-to-end solutions (design, data collection, Wizard-of-Oz studies, implementation, test, assessment); - shortened development cycle (development tools).
A commercial implementation: Nokia One Voice Service http://www.nokia.com/nokiaone
Speech interface for e-mail reading
Features:
• DTMF and speech access (the language of the user interface is English)
• dialogue-based implementation with a mid-complexity task grammar
• functionalities: browsing e-mails; selecting for reading; send in SMS; reply with voice clip
• accurate language identification
• text-to-speech (TTS) for several languages when reading back e-mails: English, Finnish, Italian, French, German, Spanish
• e-mail preprocessors prior to TTS
• usability studies show that the speech version is now more popular than the DTMF version
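The read-back chain listed above (e-mail preprocessing, language identification, then language-dependent TTS) can be sketched roughly as follows. This is only an illustration, not the Nokia One implementation: the function names, the tiny stopword lists and the clean-up rules are all assumptions made for the example, and real language identification would use a trained model.

```python
import re

# Hypothetical toy stopword lists for the six read-back languages on the slide.
STOPWORDS = {
    "English": {"the", "and", "is", "to", "of", "you"},
    "Finnish": {"ja", "ei", "että", "olen", "mutta", "tämä"},
    "Italian": {"il", "che", "di", "non", "per", "sono"},
    "French": {"le", "et", "est", "les", "des", "pas"},
    "German": {"der", "die", "und", "ist", "nicht", "das"},
    "Spanish": {"el", "que", "los", "una", "por", "pero"},
}


def preprocess(body):
    """Assumed clean-up: drop quoted reply lines and replace URLs before TTS."""
    lines = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    text = " ".join(" ".join(lines).split())  # collapse whitespace
    return re.sub(r"https?://\S+", "link", text)


def identify_language(text):
    """Pick the language whose stopwords occur most often; None if no hits."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def prepare_for_tts(body):
    """Return (language, cleaned text); the language would select the TTS voice."""
    text = preprocess(body)
    return identify_language(text), text
```

For example, `prepare_for_tts("> quoted\nThe report is ready and you can read it")` drops the quoted line and tags the remaining text as English.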
Some general comments • Before implementing any speech interface: • think about its role: replacement or addition? • if addition, how will it help or complete the current user interface? • is there any real added value it can bring (acceleration, security)? • think carefully about the effort needed to develop a solution • amount and ratio of research and implementation • never underestimate the results of user/usability tests: go for real users • TTS is important: users comment primarily on TTS, not on the recognition part. TTS can involve language technology as well.
An EU project: CATCH-2004 – Converse in AThens-2004, Cologne, Helsinki http://www.catch2004.org/
A multi-multi-multi project ...
• Jan 2000 - June 2002 (30 months)
• 7 partners, 5 countries
• 603 person-months
• 6.5 M€ (3.25 M€ from the EC)
• 2 demonstrators: Athens, Helsinki
• 1 tester: Cologne
• 16 deliverables, 11 milestones
Consortium Finland France, Germany, Greece, Czech Republic Germany Greece Gerhard-Mercator Universität Duisburg NTUA
Overview • The "flag-ship" of the 5th EU-IST programme • Objectives: • conversational interface to (city) information services: build various applications with high accuracy that satisfy the requirements set for well-functioning spoken dialogue systems • multilingual (Finnish, English, German, Greek) • multidevice (kiosk, phone, smart wireless) • multimodal (GUI, speech) • Internet infrastructure (WAP, VoiceXML, remote databases) • Nokia's role: • WAP access • Multimodal browsing • NLU development for Helsinki demonstrator • Helsinki demos: • 2000: Art-Goes-Kapakka - just to experiment with the NLU toolkit • 2001: Program Guide Information Service - has relevance to other projects
Inside the NLU module • [Diagram: Speech recognition, Natural Language Understanding (NLU) incl. Dialogue Manager, Speech synthesis, Database]
What does the NLU module do? • (1) Interprets the meaning of the user utterance and decides what to do with the utterance. • (2) Interacts with the backend database. • (3) Decides what kind of answer will be provided. • The NLU toolkit employed in CATCH-2004: • IBM ViaVoicePhone Telephony Natural Language Tools • Statistical approach • The speaker is not restricted to any particular vocabulary or commands but can freely express the request by using natural language expressions.
The components of the NLU module
• Input: the output of the recogniser, a sequence of words, as the LM allows.
• The NLU module contains four main components:
• Statistical Classer: extracts the key concepts of the utterance.
• Canonicalizer: transforms certain concepts to a form which is understood by the backend database.
• Statistical Parser: determines what to do with the key concepts from the classer.
• Dialog Manager: directs the interaction between the user and the system.
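To make the division of labour between the four components concrete, here is a minimal sketch of how a recognised utterance could flow through them. It is an assumption-laden toy: the real classer and parser are statistical models from the IBM toolkit, whereas the lexicon lookup, the function names, the action labels and the canonical values below are all invented for illustration.

```python
import re

# Hypothetical lexicon: surface phrase -> (concept, canonical database value).
LEXICON = {
    "movies": ("PROGRAM_TYPE", "movie"),
    "films": ("PROGRAM_TYPE", "movie"),
    "news": ("PROGRAM_TYPE", "news"),
    "tonight": ("TIME", "18:00-24:00"),
    "bbc world": ("CHANNEL", "BBC_WORLD"),
}


def classer(words):
    """Stand-in for the statistical classer: tag key concepts by lookup."""
    text = " ".join(words).lower()
    found = []
    for phrase, (concept, _) in LEXICON.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((concept, phrase))
    return found


def canonicalizer(concepts):
    """Map each tagged phrase to the form the backend database expects."""
    return {concept: LEXICON[phrase][1] for concept, phrase in concepts}


def parser(slots):
    """Stand-in for the statistical parser: decide what to do with the concepts."""
    return "LIST" if slots else "CLARIFY"


def dialog_manager(action, slots):
    """Choose the next system move for the decided action."""
    if action == "LIST":
        constraints = ", ".join(f"{k}={v}" for k, v in sorted(slots.items()))
        return f"Looking up programs with {constraints}."
    return "Sorry, what would you like to know?"


def understand(utterance):
    """Run recogniser output through classer, canonicalizer, parser and DM."""
    slots = canonicalizer(classer(utterance.split()))
    action = parser(slots)
    return action, slots, dialog_manager(action, slots)
```

Calling `understand("Could you please tell me about movies tonight?")` yields the action `LIST` with the slots `PROGRAM_TYPE=movie` and `TIME=18:00-24:00`; an utterance with no known concepts falls through to a clarification prompt.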
Multilingual Architecture
• Speech recognition: multilingual AM, multilingual LM/Voc
• NLU: multilingual classer, multilingual parser (Lang ID), canonicalizer, dialog manager
• Multilingual TASK
• Answer generation (language-dependent TTS)
• Legend: LM = language model; Voc = vocabulary; AM = acoustic models; TTS = text-to-speech; Lang ID = language identification
Historically speaking ... • Helsinki demos: • 2000: Art-Goes-Kapakka (AGK) - just to experiment with the NLU toolkit • 2001: Program Guide Information Service (PGIS) – more realistic • AGK • developed as the first NLU application at Nokia • good exercise to walk through (with sweat) the entire development process • close co-operation with IBM, regular consulting • results were comparable to others • ease: manageable size & complexity, (almost) available database • PGIS • we wanted a more real-life application • Electronic Program Guides are coming into use as digital TV spreads • on-going standardisation (MPEG-7 -> program types and sub-types)
Supported functionalities in PGIS
• A LIST based on the following parameters: DATE, PROGRAM NAME, PROGRAM TYPE, TIME, LANGUAGE, PERFORMER, CHANNEL, PRICE
• A QUERY about a particular program: NEW DATE, YEAR, PRICE, TIME, COUNTRY OF ORIGIN, RESTRICTIONS, DURATION, EPISODE TITLE, DESCRIPTION, CHANNEL, WEB ADDRESS, PERFORMERS, RE-RUN, LANGUAGE, PEOPLE BEHIND THE PROGRAM, PROGRAM TYPE, SUBTITLES
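The two request types can be pictured as two operations on a program database: a LIST narrows the database by parameter values, and a QUERY returns one attribute of a named program. The sketch below is only illustrative; the records are invented, the function names are hypothetical, and the actual PGIS backend is not described in these slides.

```python
# Invented toy records; field names follow the slide, values are made up.
PROGRAMS = [
    {"PROGRAM NAME": "Example Movie", "PROGRAM TYPE": "movie",
     "CHANNEL": "Channel A", "DURATION": "100 min", "SUBTITLES": "Finnish"},
    {"PROGRAM NAME": "Example News", "PROGRAM TYPE": "news",
     "CHANNEL": "Channel B", "DURATION": "5 min", "SUBTITLES": None},
]


def list_programs(db, constraints):
    """LIST: names of all programs whose fields match every constraint."""
    return [p["PROGRAM NAME"] for p in db
            if all(p.get(key) == value for key, value in constraints.items())]


def query_program(db, name, attribute):
    """QUERY: one attribute of a particular program, or None if unknown."""
    for p in db:
        if p["PROGRAM NAME"] == name:
            return p.get(attribute)
    return None
```

A LIST such as `list_programs(PROGRAMS, {"PROGRAM TYPE": "movie"})` returns the matching program names, and a follow-up QUERY such as `query_program(PROGRAMS, "Example Movie", "DURATION")` answers a question about one of them.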
Gaining better performance via iterative tests • [Chart: error rate of the English system; automatic speech recognition measured by word error rate]
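For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recogniser output and the reference transcript, divided by the number of reference words. The sketch below shows that standard calculation; the function name is our own, and it is not the scoring tool used in the project.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "who are the actors" and the hypothesis "who is the actor", two of the four reference words are substituted, so the function returns 0.5.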
A real (!!!!) dialogue with PGIS
S: Welcome to Program Guide. What would you like to know?
U: Could you please tell me about movies tonight?
S: I found the following movie programs on Monday between 6 pm and 12 pm: It's Always Fair Weather, The Murder Men, Boom Town, and The Hook.
U: What time is the first one?
S: It's Always Fair Weather starts at 5 past 5 pm. How else can I help you?
U: How long does it last?
S: It's Always Fair Weather lasts 1 hours 40 minutes. What information can I offer you next?
U: Who are the actors?
S: It's Always Fair Weather. The actors are Gene Kelly, Dan Dailey, Michael Kidd, Cyd Charisse. How else can I help you?
U: Can I have a description for it please?
S: I can offer you the following type of description about the program called It's Always Fair Weather. The unbeatable musical team of Kelly and Donen is behind this zappy musical about three serviceman buddies reunited after 10 years…... What else can I do for you?
U: That's all for now. Good-bye.
S: Thank you for using the Program Guide. Good-bye.
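The dialogue only works because the system keeps context: "the first one" refers to the list it has just read out, and "it" refers to the program currently in focus. A minimal sketch of such context tracking is given below, assuming a hypothetical `DialogueContext` class and a simple ordinal pattern; how the actual Dialog Manager resolves references is not described in the slides.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

ORDINALS = {"first": 0, "second": 1, "third": 2, "last": -1}
# Require "the <ordinal>" so that "how long does it last" is not misread.
ORDINAL_RE = re.compile(r"\bthe (first|second|third|last)\b")


@dataclass
class DialogueContext:
    """Remembers the last list read out and the program currently in focus."""
    listed: List[str] = field(default_factory=list)
    focus: Optional[str] = None

    def remember_list(self, programs):
        """A new LIST answer starts a new context and clears the focus."""
        self.listed = list(programs)
        self.focus = None

    def resolve(self, utterance):
        """Return the program the utterance refers to (ordinal or current focus)."""
        match = ORDINAL_RE.search(utterance.lower())
        if match and self.listed:
            index = ORDINALS[match.group(1)]
            if -len(self.listed) <= index < len(self.listed):
                self.focus = self.listed[index]
        return self.focus
```

After `remember_list` is called with the four movies from the dialogue, "What time is the first one?" resolves to It's Always Fair Weather, and the follow-up questions "How long does it last?" and "Who are the actors?" keep resolving to the same program until a new list resets the context.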
Welcome to Program Guide! How may I help you? Movies tonight? NEW CONTEXT! Michael Douglas is in Coma … Channels? I found the following programs … Movies with Michael Douglas? BBC World, CNN, Eurosport, TCM …. starting time of the 1st? NEW CONTEXT! NEW CONTEXT! What’s on BBC World tonight at 10pm? … sorry, no programs for youngsters (in Finnish) … it starts at 5.15pm World News at 10pm …. duration? (in Finnish) Programs for youngsters? (in Finnish) …. duration? (in Finnish) … it is 1h 25min long (in Finnish) NEW CONTEXT! … it takes 5 minutes (in Finnish) …. description? What kind of info I can offer next? (in Finnish) That’s all for now. I can offer the following description …. To text message? … no text message, thanks. (in Finnish) Good bye!
What lessons have we learnt? • In general: • A research project has its own difficulties: risks must be taken, but within limits; • Know your partners and their capabilities, and take the initiative in co-operation; • Strong dependency on one partner’s technology might be problematic; • About technology: • Good to have linguists around, although many of the development phases require engineering skills; • Everything should be planned as precisely as possible, even tests and evaluation methods; • The best results are gained with successive test-evaluation-improvement cycles; • This kind of technology is quite new and users often don’t know the possibilities of the system, therefore the instructions must be very guiding and clear: • difficult if only a demo system is available with a fake database and no comparable traditional system; • test users must be rewarded: this is crucial, otherwise there is no motivation • The real picture of system functionality and operability can be gained only from real users in real situations.
Finally …. Gábor Dénes (1969): "If enough people work hard enough on the problem of speech recognition, it will be solved by mid next century."
References
• http://www.nokia.com/nokiaone
• Oria, D. & Koskinen, E.: 2002, “E-Mail Goes Mobile: The design and implementation of a spoken language interface to e-mail”. ICSLP 2002.
• http://www.catch2004.org/
• Harrikari, H., M. Mast, T. Ross & H. Schulz: 2002, “Different Approaches to Build Multilingual Conversational Systems”. 5th International Conference on Text, Speech and Dialogue, TSD 2002, Brno, Czech Republic.
• Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002a, “CATCH-2004 Multi-Modal browser: Overview Description with Usability Analysis”. IEEE 4th International Conference on Multi-modal Interfaces, Pittsburgh, PA, U.S.A.
• Kleindienst J., L. Seredi, P. Kapanen & J. Bergman: 2002b, “Loosely-coupled approach towards multi-modal browsing”. Submitted to Universal Access in Information Society magazine’s special issue on Multi-modal User Interfaces.
• Boda, P. et al.: “Subjective Evaluation of a Personalised Conversational Interface to a Program Guide Information System”. Submitted to the User Modeling and User-Adapted Interaction journal (UMUAI), Special Issue on User Modeling and Personalization for Television.
Abbreviations
• AM: acoustic model
• ASR: automatic speech recognition
• CTI: computer-telephone integration
• DM: dialogue manager
• LM: language model
• NLU: natural language understanding
• SUI: speech user interface
• TTS: text-to-speech synthesis
• VVT: ViaVoice Telephony (IBM's speech resources)
• WOZ: wizard of Oz