Information Retrieval using Intelligent Speech Communication Interface
Institute of Informatics of the Slovak Academy of Sciences, Bratislava
trnka@savba.sk

Overview
• Introduction
• IRKR system
• Architecture
• Pilot applications
• Realization of service
WIKT 2006

What is a Speech Communication Interface (SCI)?
• An SCI, or Spoken Language Dialog System (SLDS), is a computer system that you can talk to in order to carry out some task
• Contemporary SLDSs are typically of two kinds:
  • Transaction-based systems, which let the user carry out a transaction, such as buying or selling stocks or reserving a seat on a plane
  • Information-provision systems, which provide information in response to a query, such as a request for timetable or weather information
• The circle of a typical speech dialog in an SCI also shows the main components of an SCI

The Speech Dialog Circle in SLDS
[Figure: the speech dialog circle. Speech → Automatic Speech Recognition (ASR) → words spoken → Spoken Language Understanding (SLU) → meaning → Dialog Management (DM; data, rules) → action → Response Generation (RG) → Text-to-Speech (TTS) → speech]
• Words spoken: “I need a flight from Košice to Bratislava roundtrip”
• Meaning: ORIGIN_CITY: KOŠICE, DESTINATION_CITY: BRATISLAVA, FLIGHT_TYPE: ROUNDTRIP
• Action: GET DEPARTURE DATE
• Generated response: “Which date do you want to fly from Košice to Bratislava?”

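One turn of the dialog circle can be sketched in a few lines of Python. This is only an illustration of the SLU → DM → RG steps using the flight example from the slide, not the IRKR implementation: the function names, the regular expression, the extra DEPARTURE_DATE slot and the QUERY DATABASE action are assumptions made for the sketch.

```python
import re

# Slots the toy dialog manager wants filled. DEPARTURE_DATE is an assumed
# slot name, chosen to match the "GET DEPARTURE DATE" action on the slide.
REQUIRED_SLOTS = ["ORIGIN_CITY", "DESTINATION_CITY", "FLIGHT_TYPE", "DEPARTURE_DATE"]


def understand(words):
    """Toy SLU: turn the recognized words into a slot/value frame."""
    frame = {}
    match = re.search(r"from (\w+) to (\w+)", words)
    if match:
        frame["ORIGIN_CITY"] = match.group(1).upper()
        frame["DESTINATION_CITY"] = match.group(2).upper()
    if "roundtrip" in words.lower():
        frame["FLIGHT_TYPE"] = "ROUNDTRIP"
    return frame


def next_action(frame):
    """Toy DM: ask for the first missing slot, otherwise run the query."""
    for slot in REQUIRED_SLOTS:
        if slot not in frame:
            return "GET " + slot.replace("_", " ")
    return "QUERY DATABASE"


def generate_response(action, frame):
    """Toy RG: map the chosen action to a prompt for the TTS module."""
    if action == "GET DEPARTURE DATE":
        return "Which date do you want to fly from {} to {}?".format(
            frame["ORIGIN_CITY"].title(), frame["DESTINATION_CITY"].title()
        )
    return "One moment, please."


frame = understand("I need a flight from Košice to Bratislava roundtrip")
action = next_action(frame)
print(action)
print(generate_response(action, frame))
```

In a real system the ASR output feeds `understand` and the returned prompt goes to TTS, which closes the circle.
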
IRKR
• the first SLDS able to interact in the Slovak language
• developed from July 2003 to June 2006
• supported by the National Program for R&D “Building of the Information Society”

IRKR - partners
• Technical University of Košice
• Institute of Informatics, the Slovak Academy of Sciences
• Slovak University of Technology in Bratislava
• University of Žilina

IRKR - specification
• natural interaction
• multi-user interaction
• Slovak language
• fixed and mobile telephone networks
• access to distributed information (on the internet)

IRKR - architecture
• DARPA Communicator architecture
• ‘hub-and-spoke’
• each module seeks services from and provides services to the other modules
• modules communicate with each other through the central software router, the Galaxy hub
• communicator.sourceforge.net

Galaxy - basic overview
• distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialogue systems;
• available under a liberal open source license;
• not an end-to-end dialogue system, but provides tools for constructing such a system out of a suite of servers;
• provides a sophisticated and general transport layer for connecting servers and Hubs, as well as a message syntax (does not provide specifications about semantics);
• the core Galaxy Communicator infrastructure is written in C;
• support for defining server and connection initialization functions in C, Python, Java and Allegro Common Lisp.

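The hub-and-spoke idea can be illustrated with a toy router. The sketch below does not use the Galaxy Communicator API; the `Hub` class, the operation names and the stand-in "servers" are all hypothetical and only show how modules request services from each other through a central hub rather than directly.

```python
class Hub:
    """Toy central router: servers register operations, all calls go via the hub."""

    def __init__(self):
        self._handlers = {}
        self.log = []  # order in which operations were routed

    def register(self, operation, handler):
        """A server announces that it provides `operation`."""
        self._handlers[operation] = handler

    def send(self, operation, frame):
        """Route a message frame (a dict) to whichever server provides the operation."""
        self.log.append(operation)
        return self._handlers[operation](frame)


def make_demo_hub():
    hub = Hub()
    # Stand-ins for real servers: "recognition" lowercases, "synthesis" uppercases.
    hub.register("recognize", lambda frame: {"text": frame["audio"].lower()})
    hub.register("synthesize", lambda frame: {"audio": frame["text"].upper()})

    def dialogue(frame):
        # The dialogue server never calls ASR or TTS directly, only the hub.
        text = hub.send("recognize", frame)["text"]
        return hub.send("synthesize", {"text": "You said: " + text})

    hub.register("dialogue", dialogue)
    return hub


hub = make_demo_hub()
print(hub.send("dialogue", {"audio": "HELLO"}))
print(hub.log)
```

Because every message passes through one place, the hub is also a natural point for logging and for swapping one server for another without touching the rest.
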
IRKR - architecture

Automatic speech recognition server
• conversion of incoming speech to the corresponding text
• two speech recognizers, both freely available for non-profit research:
  • ATK - htk.eng.cam.ac.uk/develop/atk.shtml
  • SPHINX - cmusphinx.sourceforge.net
• Phoneme acoustic models:
  • built following the REFREC 0.96 training procedure
  • acoustic features were conventional 39-dimensional MFCCs, including energy and first- and second-order deltas
  • 3-state left-to-right HMMs
  • context-dependent (triphone) acoustic models

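The 39 dimensions conventionally arise as 13 static coefficients (12 cepstra plus energy) extended with their first- and second-order deltas. The sketch below shows that extension with the usual regression formula for deltas; the window size of 2 and the edge handling (repeating the first and last frame) are assumptions, since the slides do not give the front-end settings.

```python
def deltas(frames, window=2):
    """Regression deltas: d[t] = sum_k k*(c[t+k] - c[t-k]) / (2 * sum_k k^2).

    `frames` is a list of equal-length feature vectors. Frames beyond the
    edges are replaced by the first/last frame (an assumed convention).
    """
    denom = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for i in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, last)][i]
                behind = frames[max(t - k, 0)][i]
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Append first- and second-order deltas to each static frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Ten dummy frames of 13 static coefficients (12 cepstra + energy).
static = [[float(t)] * 13 for t in range(10)]
features = add_dynamic_features(static)
print(len(features[0]))
```
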
Databases used for ASR training
• SpeechDat-E SK
  • 1000 speakers, PSTN (office, home, phone booth)
• MobilDat SK
  • 1100 speakers, GSM networks (office, home, street, vehicle, public building)
• Both are balanced for the age, regional accent, and sex of the speakers
• Every speaker recorded 50 files: numbers, names, dates, money amounts, embedded command words, geographical names, phonetically balanced words, phonetically balanced sentences, Yes/No answers and one longer optional spontaneous utterance

Text-to-speech synthesis
• TTS converts outgoing information in text form to speech
• intelligibility, naturalness
• we developed two TTS modules using two different approaches:
  • diphone
    • intelligible speech
    • flexible and totally domain-independent
    • computationally inexpensive
    • small memory footprint
    • sounds a bit robotic and tedious
  • unit-selection
    • better naturalness
    • some problems with intelligibility
    • limited domain

TTS architecture
[Figure: diphone synthesizer and unit-selection synthesizer]

Dialogue manager
• The dialogue manager controls the dialogue of the system with the user
• The heart of the dialogue manager is an interpreter of the VoiceXML markup language, which:
  • simplifies speech application development
  • enables distributed application design
  • accelerates the development of interactive voice response (IVR) environments

Dialogue manager architecture

Audio server
• provides the whole information system with a reliable multi-user connection to the telephone networks
• supports telephone hardware - Dialogic D120/41JCT-LSEuro card
• uses a direct (broker) connection between the audio server and the ASR server or TTS server

Dialogue manager architecture

Information server - IS
• the IS connects the system to information sources and retrieves the information required by the user
• a special IS for every pilot application, each with its own web wrapper
• a rule-based, ad hoc IS that searches only a few predefined web servers with a relatively well-known page structure will do a much better job
• returns the data in XML format
• caches results with a user-defined expiration

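Caching with a user-defined expiration can be sketched as a small time-to-live cache. The class name, the `fetch` callback and the injectable clock are assumptions made for this illustration; the slides do not describe how the IRKR cache is implemented.

```python
import time


class ExpiringCache:
    """Keeps fetched results for `ttl_seconds`, then fetches them again."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry can be tested
        self._store = {}     # key -> (time stored, value)

    def get(self, key, fetch):
        """Return the cached value for `key`, calling `fetch(key)` if it is stale or missing."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = fetch(key)
        self._store[key] = (now, value)
        return value


def fetch_forecast(town):
    # Placeholder for the real web wrapper call (hypothetical XML shape).
    return "<forecast town='%s'/>" % town


cache = ExpiringCache(ttl_seconds=600)
print(cache.get("Bratislava", fetch_forecast))
```

A short expiration suits fast-changing sources, while a long one reduces load on slow web servers; that trade-off is why the expiration is left to the operator.
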
IS architecture

Web wrapper
• navigation through the web server
• extraction from the web pages
• mapping onto a structured format (XML)
• data verification
• as robust as possible against changes in the structure of the web pages

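The extraction, mapping and verification steps can be sketched with the Python standard library. The page layout (one table row per town with three cells), the XML element names and the temperature range check are all hypothetical; the real wrappers depend on the actual structure of the source pages.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class CellCollector(HTMLParser):
    """Extraction: collect the text of every <td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def wrap_forecast(html):
    """Map extracted rows onto XML, skipping rows that fail verification."""
    parser = CellCollector()
    parser.feed(html)
    root = ET.Element("forecasts")
    for row in parser.rows:
        if len(row) != 3:
            continue  # verification: unexpected row layout
        town, sky, temp = row
        try:
            temperature = int(temp)
        except ValueError:
            continue  # verification: temperature is not a number
        if not -60 <= temperature <= 60:
            continue  # verification: implausible value
        item = ET.SubElement(root, "forecast", town=town)
        ET.SubElement(item, "sky").text = sky
        ET.SubElement(item, "temperature").text = str(temperature)
    return ET.tostring(root, encoding="unicode")


page = (
    "<table>"
    "<tr><td>Bratislava</td><td>sunny</td><td>32</td></tr>"
    "<tr><td>Košice</td><td>cloudy</td><td>n/a</td></tr>"
    "</table>"
)
print(wrap_forecast(page))
```

Skipping rows that fail verification, rather than raising an error, is one way to stay robust when part of a page changes.
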
Pilot applications
• “Weather forecast in Slovakia”
  • www.meteo.sk; www.shmu.sk
  • weather forecast for about 80 Slovak district towns
  • Place: district town or holiday locality
  • Date: relative date / absolute date
• “Timetable of Slovak Railways”
  • www.cp.sk
  • information about the Slovak railways timetable
  • Starting place: railway station in Slovakia
  • Destination place: railway station in Slovakia
  • Date: relative date (today, tomorrow etc.) / absolute date (“the twentieth of December” etc.)
  • Time: departure time (hour, minute)

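Both applications must turn a spoken date into a calendar date before querying the web source. A minimal sketch of that step follows. It uses the English phrases from the slide, whereas the real system works on Slovak input; the phrase tables and the rule that an absolute date without a year means its next occurrence are assumptions.

```python
from datetime import date, timedelta

# Assumed phrase tables for the illustration.
RELATIVE_DAYS = {"today": 0, "tomorrow": 1, "the day after tomorrow": 2}
ORDINALS = (
    "first second third fourth fifth sixth seventh eighth ninth tenth "
    "eleventh twelfth thirteenth fourteenth fifteenth sixteenth seventeenth "
    "eighteenth nineteenth twentieth twenty-first twenty-second twenty-third "
    "twenty-fourth twenty-fifth twenty-sixth twenty-seventh twenty-eighth "
    "twenty-ninth thirtieth thirty-first"
).split()
MONTHS = (
    "january february march april may june july august "
    "september october november december"
).split()


def resolve_date(phrase, today):
    """Resolve a relative or absolute date phrase; return None if not understood."""
    text = phrase.strip().lower()
    if text in RELATIVE_DAYS:
        return today + timedelta(days=RELATIVE_DAYS[text])
    if text.startswith("the "):
        text = text[4:]
    day_word, sep, month_word = text.partition(" of ")
    if not sep or day_word not in ORDINALS or month_word not in MONTHS:
        return None
    day = ORDINALS.index(day_word) + 1
    month = MONTHS.index(month_word) + 1
    # Assumption: a date without a year means its next occurrence.
    for year in (today.year, today.year + 1):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. "the thirtieth of February"
        if candidate >= today:
            return candidate
    return None


today = date(2006, 11, 20)
print(resolve_date("tomorrow", today))
print(resolve_date("the twentieth of December", today))
```
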
Realization of services
• available at: +421 55 602 2297, +421 2 5941 1118 (T-com), +421 911 650 038 (T-Mobile), +421 918 717 491 (Orange), irkr_pub (Skype)
• IRKR on the web - irkr.fei.tuke.sk
Here we show a typical dialogue between the user (U) and the system (S):
S: Welcome to the IRKR portal. Would you like to play the introduction?
U: No.
S: Choose one of the services: Weather forecast or Railway timetable.
U: Weather forecast
S: Please name a city and a day for which you want the weather forecast.
U: Bratislava, tomorrow.
S: Did you say Bratislava, tomorrow?
U: Yes
S: The weather forecast for Bratislava for tomorrow is: sunny, 32 degrees Celsius...

Thank you for your attention