
A Spoken Dialog System to Access a Newspaper Web Site

César González Ferreras (UVA), Rubén San-Segundo Hernández (UPM), Valentín Cardeñoso Payo (UVA). Dialog Systems based on XML Technologies, Berliner XML Tage 2004. Universidad Politécnica de Madrid.



Presentation Transcript

  1. A Spoken Dialog System to Access a Newspaper Web Site • César González Ferreras (UVA), Rubén San-Segundo Hernández (UPM), Valentín Cardeñoso Payo (UVA) • Dialog Systems based on XML Technologies, Berliner XML Tage 2004 • Universidad Politécnica de Madrid

  2. Contents • Introduction and Related Work • System Overview • System Architecture • Interaction Model • Information Model • Sample Interaction • Conclusions and Future Work

  3. Introduction • Provide vocal access to already existing Internet contents. • Advantages of vocal interaction over traditional visual-only web browsing: • Speech is more natural for most people. • It suits users with special needs (e.g. blind people). • It is ideal for hands-free, eyes-busy environments. • It is a solution for mobile devices, which allow web access anytime and anywhere but still have limited display capabilities.

  4. Introduction • Spoken dialog systems for accessing structured information stored in databases are mature [La99, Zu00]. • Textual information is massive, and a speech interface has limitations: it is sequential and not persistent. • An efficient and natural way of interaction is required.

  5. Related Work • Approaches to make web contents available using speech: • Add a vocal interface to an existing web browser [HT95, Ve03]. • Convert HTML contents into VoiceXML [Go00, FKL01]. • Restrict the solution to selected on-line resources [La97, PCS03]. • Extend a traditional Information Retrieval System with a speech interface [Cr99, Ch02].

  6. System Overview • Objective: develop a spoken dialog system to access a newspaper web site. • We use two strategies to access information: • Browse: review which information is available. • Query: satisfy a specific information need. • To describe each strategy, we use two models: • Interaction model: describes how the system conducts the dialog with the user. • Information model: describes how the web contents must be processed and structured in order to support that interaction.

  7. Browse [Figure: TREE] • Browse: the user does not have a specific information need and wants to know which information is available. • Interaction Model: The information must be presented gradually, at different levels of detail. • Information Model: The information must be organized in groups of items, and each item must be available at different levels of detail: first a headline, next a short description and finally all the information.

  8. Query [Figure: INDEX mapping terms (term1, term2, term3, term4, ...) to document lists (Doc1; Doc2, Doc3; Doc4; Doc3; ...)] • Query: the user has a specific information need that can be expressed as a query. • Interaction Model: The system searches and presents the results to the user. • Information Model: An inverted index is used. It contains, for each term in the lexicon, a list of the documents in which that term appears. We have used the vector space model [SWY75].
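The inverted index described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_inverted_index`, the whitespace tokenizer, and the three sample documents are assumptions made for the example, and the vector space ranking is left out.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: lower-casing and whitespace splitting stand in for
        # real tokenization.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents.
docs = {
    "doc1": "regional elections in France",
    "doc2": "Afghanistan elections delayed",
    "doc3": "summit of the Arab League cancelled",
}
index = build_inverted_index(docs)
```

Looking up `index["elections"]` returns the documents for that term in one step, which is what makes the structure suitable for a single-term spoken query.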

  9. System Architecture [Figure: architecture diagram with components Internet, Crawler, Local Repository, Information Manager, Information Model (TREE, INDEX), Dictionaries, Dialog Manager, VoiceXML Browser]

  10. System Architecture • Information Manager: • HTML pages are converted into XML using Tidy and XSLT. • The browsing tree is built (based on sections and news). • The inverted index is built. • Dialog Manager: • VoiceXML is used as the language to describe dialogs. • Java Servlet technology (Tomcat). • VoiceXML Browser: • The system works for the Spanish language. • Our own VoiceXML interpreter. • Speech recognition and synthesis from Universidad Politécnica de Cataluña. • Dialogic telephone card.
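As a rough illustration of a dialog manager that produces VoiceXML dynamically, the sketch below builds a small VoiceXML 2.0 `<menu>` document for the section choice. It is written in Python for brevity, whereas the slide states that the real system uses Java Servlets. The function name `section_menu`, the `section?name=...` URLs, and the use of `<menu>` rather than a form with a grammar are all assumptions.

```python
import xml.etree.ElementTree as ET

def section_menu(sections, base_url="section"):
    """Build a VoiceXML 2.0 document with one <choice> per section."""
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    menu = ET.SubElement(vxml, "menu", {"id": "sections"})
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "What section do you want?"
    for name in sections:
        # base_url is a hypothetical servlet path, not taken from the slides.
        choice = ET.SubElement(menu, "choice", {"next": f"{base_url}?name={name}"})
        choice.text = name
    return ET.tostring(vxml, encoding="unicode")

document = section_menu(["local", "international"])
```

Generating the document per request is one way the active vocabulary could follow the current contents of the newspaper.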

  11. Interaction Model • System-initiative strategy to control the dialog flow (Finite State Diagrams mapped into VoiceXML). • A large, dynamically generated vocabulary (2000 words) is divided into several smaller ones (50-100 words), each associated with one state of the dialog; this gives a higher speech recognition rate. • The system uses two different confirmation strategies, depending on the size of the vocabulary (implicit <25, explicit >25). • The user can interrupt the system anytime: barge-in.
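The choice between confirmation strategies can be sketched as a function of the active vocabulary size. The slide does not say what happens at exactly 25 words, so treating that case as explicit is an assumption, as are the function names and the prompt wording (modelled loosely on the sample interactions).

```python
def confirmation_strategy(vocabulary_size, threshold=25):
    """Return "implicit" for small vocabularies and "explicit" otherwise."""
    # Assumption: a vocabulary of exactly `threshold` words is confirmed
    # explicitly; the slide only gives "<25" and ">25".
    return "implicit" if vocabulary_size < threshold else "explicit"

def confirm(recognized, vocabulary):
    """Build a confirmation prompt for a recognized word."""
    if confirmation_strategy(len(vocabulary)) == "implicit":
        return f"{recognized}."  # echo the word and carry on
    return f"I understood {recognized}. Is it correct?"

sections = ["local", "Spain", "international", "life", "culture", "television"]
section_reply = confirm("international", sections)
```

With six section names the reply is an implicit echo, while a 50-100 word query vocabulary would trigger the explicit question.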

  12. Interaction Model (Browse) [Figure: state diagram with states SECTION, BLOCK, SUMMARY, NEWS and transitions labelled <section>, <news>, body, next, previous, back]
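One possible reading of the browse diagram is the transition table below. The diagram survives only as a list of labels, so the exact source and target of each arrow is an assumption, as are the names `BROWSE_TRANSITIONS`, `step`, and `run`.

```python
# (state, user input) -> next state; this is an assumed reading of the diagram.
BROWSE_TRANSITIONS = {
    ("SECTION", "<section>"): "BLOCK",
    ("BLOCK", "next"): "BLOCK",        # move to the next block of headlines
    ("BLOCK", "previous"): "BLOCK",
    ("BLOCK", "back"): "SECTION",
    ("BLOCK", "<news>"): "SUMMARY",
    ("SUMMARY", "next"): "SUMMARY",    # move to the next story's summary
    ("SUMMARY", "previous"): "SUMMARY",
    ("SUMMARY", "back"): "BLOCK",
    ("SUMMARY", "body"): "NEWS",
    ("NEWS", "back"): "SUMMARY",
}

def step(state, event):
    """Return the next state; unrecognized input keeps the current state."""
    return BROWSE_TRANSITIONS.get((state, event), state)

def run(events, start="SECTION"):
    """Apply a sequence of user inputs starting from `start`."""
    state = start
    for event in events:
        state = step(state, event)
    return state

final_state = run(["<section>", "<news>", "body"])
```

A table of this kind maps naturally onto one VoiceXML dialog per state, with the keys for a state defining its active vocabulary.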

  13. Interaction Model (Query) [Figure: state diagram with states SECTION, TERM, OPTIONS, SUMMARY, NEWS and transitions labelled <section>, <term> AND results>1, <term> AND results=1, <news>, body, next, previous, back]
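The guarded transitions out of the TERM state can be sketched as a function of the number of results. The diagram labels only cover one result and more than one result; returning to TERM when nothing matches is an assumption, as are the function name and the sample hit lists.

```python
def state_after_term(result_count):
    """Pick the next dialog state once a query term has been recognized."""
    if result_count > 1:
        return "OPTIONS"   # several matches: let the user choose a story
    if result_count == 1:
        return "SUMMARY"   # single match: go straight to its summary
    return "TERM"          # assumption: no matches, so ask for another term

# Hypothetical search results keyed by query term.
hits = {"elections": ["doc1", "doc2", "doc3"], "summit": ["doc4"]}
next_state = state_after_term(len(hits.get("elections", [])))
```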

  14. Information Model • Built extracting information from the web site of a local newspaper (El Norte de Castilla). • Decision tree: • The contents of the newspaper are divided into sections. • Each section contains several news stories. • Each news story is composed of several elements: a headline, a short summary and a body.

  15. Information Model (Browse) [Figure: browsing tree with levels SECTION (SECTION1, SECTION2, ...), BLOCK (BLOCK1, B2, B3), SUMMARY (SUMMARY1, S2, S3, S4, S5) and NEWS (NEWS1, N2, N3, N4, N5); navigation commands Select, Back, Next, Previous]
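The browsing tree can be sketched as sections that hold blocks of stories. The block size of five is inferred from the figure labels (S1 to S5) and the sample interaction and is an assumption, as are the function name and the dictionary layout of a story.

```python
def build_browse_tree(stories_by_section, block_size=5):
    """Group each section's stories into consecutive blocks."""
    tree = {}
    for section, stories in stories_by_section.items():
        tree[section] = [
            stories[i:i + block_size]
            for i in range(0, len(stories), block_size)
        ]
    return tree

# Hypothetical stories with the three elements named in the information model.
stories = [
    {"headline": f"Headline {n}", "summary": f"Summary {n}", "body": f"Body {n}"}
    for n in range(1, 11)
]
tree = build_browse_tree({"international": stories})
```

Reading only the headlines of one block at a time keeps each prompt short, which matters because speech output is sequential.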

  16. Information Model (Query) [Figure: one inverted index per section (INDEX FOR SECTION1, INDEX FOR SECTION2, INDEX FOR SECTION3), each mapping terms (term1, term2, term3, term4, ...) to document lists (Doc1; Doc2, Doc3; Doc4; Doc3; ...)] • Inverted index: • For each news story we extract all the terms. • A stemmer is used to remove affixes. • For each term we calculate the weight using tf-idf. • We use the 25 most relevant components of each news story.
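Two of the indexing steps, stemming and keeping the 25 highest-weighted terms, can be sketched as follows. The `toy_stem` function is a hypothetical English suffix stripper used only for illustration (the slides do not name the stemmer, and the system works in Spanish), and the weights are assumed to have been computed beforehand.

```python
import heapq

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few English suffixes."""
    for suffix in ("ions", "ion", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def top_components(weights, k=25):
    """Keep the k highest-weighted terms of one news story."""
    return dict(heapq.nlargest(k, weights.items(), key=lambda item: item[1]))

# Hypothetical weights for a story with 30 distinct terms.
weights = {f"t{i}": float(i) for i in range(30)}
kept = top_components(weights)
```

Truncating each story to a fixed number of components bounds the size of the index and of the recognition vocabulary derived from it.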

  17. Information Model (Query) • Term weight: term frequency-inverse document frequency (tf-idf). The following formula is used to compute the weight w of each term in the document: $w = (1 + \log(tf)) \cdot \log\frac{N}{df}$ • tf is the number of times the term occurs in the document. • df is the number of documents in which that term appears. • N is the number of documents in the collection. • Document collection: stories collected from the newspaper web site during more than a year (71,141 news).
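The weight formula can be written directly as a small function. The slide does not state the base of the logarithm, so the natural logarithm is an assumption here, as are the function name, the zero weight for absent terms, and the sample tf and df values.

```python
import math

def tfidf_weight(tf, df, n_docs):
    """w = (1 + log(tf)) * log(N / df); zero when the term is absent."""
    if tf <= 0 or df <= 0:
        return 0.0
    # Assumption: natural logarithm; the slide does not give the base.
    return (1.0 + math.log(tf)) * math.log(n_docs / df)

N = 71141  # collection size quoted on the slide
rare = tfidf_weight(tf=3, df=100, n_docs=N)       # term found in few stories
common = tfidf_weight(tf=3, df=50000, n_docs=N)   # term found in most stories
```

For equal term frequency, the rarer term receives the larger weight, and a term that appears in every document receives a weight of zero.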

  18. Sample Interaction (Browse) • System: Welcome to “El Norte de Castilla” newspaper. Remember that you can interrupt the system anytime. You can ask for help whenever you need it. Choose browse or query to access information. • User: Browse. • System: What section do you want? Local, Spain, international, life, culture or television. • User: International. • System: International section. There are ten news stories. Block one. First: Left parties victory in regional French elections. Second: Tunisia cancels the summit of the Arab League about political reforms. Third: New leader of Hamas in Gaza describes George Bush as “enemy of God and Islam”. Fourth: Karzai announces a delay until September of Afghanistan elections. Fifth: United States gives the keys of the first of the twenty five Iraqi ministries. Please choose the news story you want, next or back. • User: Fourth. • System: Headline: Karzai announces a delay until September of Afghanistan elections. International Section. Summary: He justifies the delay because of ...

  19. Sample Interaction (Query) • System: Welcome to “El Norte de Castilla” newspaper. Remember that you can interrupt the system anytime. You can ask for help whenever you need it. Choose browse or query to access information. • User: Query. • System: What section do you want? Local, Spain, international, life, culture or television. • User: International. • System: International section. Please tell me the query term. • User: Elections. • System: I understood elections. Is it correct? • User: Yes. • System: International Section. There are three news stories about elections. First: Left parties victory in regional French elections. Second: Karzai announces a delay until September of Afghanistan elections. Third: At least 46 dead in the elections. Please choose the news story you want. • User: Second. • System: Headline: Karzai announces a delay until September of Afghanistan elections. International Section. Summary: He justifies the delay because of ...

  20. Conclusions • We have presented a system which allows speech access to a newspaper web site. • The Interaction Model combines browse and query mechanisms to allow the user to access the information. • The Information Model supports that interaction using two data structures: a decision tree and an inverted index. • All the contents used by the system are automatically obtained from the web. • We used VoiceXML as the language to describe dialogs.

  21. Future Work • We are working on the evaluation of system performance and user satisfaction. • We will study how users respond to the system, which will allow us to validate the adequacy of the models proposed to access the information.


  23. References • [Ch02] Chang, E. et al.: A System for Spoken Query Information Retrieval on Mobile Devices. IEEE Transactions on Speech and Audio Processing. 10(8). November 2002. • [Cr99] Crestani, F.: Vocal access to a Newspaper Archive: Design Issues and Preliminary Investigations. In: ACM Digital Libraries. 1999. • [FKL01] Freire, J.; Kumar, B.; Lieuwen, D. F.: WebViews: Accessing Personalized Web Content and Services. In: International World Wide Web Conference. 2001. • [Go00] Goose, S. et al.: Enhancing Web Accessibility Via the Vox Portal and a Web Hosted Dynamic HTML & VoxML Converter. In: International World Wide Web Conference. May 2000. • [HT95] Hemphill, C. T.; Thrift, P. R.: Surfing the Web by Voice. In: ACM International Conference on Multimedia. 1995. • [La97] Lau, R. et al.: WebGalaxy - Integrating Spoken Language And Hypertext Navigation. In: European Conference on Speech Communication and Technology (Eurospeech). 1997.

  24. References • [La99] Lamel, L. et al.: The Limsi Arise System For Train Travel Information. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1999. • [PCS03] Polifroni, J.; Chung, G.; Seneff, S.: Towards the Automatic Generation of Mixed-Initiative Dialogue Systems from Web Content. In: European Conference on Speech Communication and Technology (Eurospeech). 2003. • [SWY75] Salton, G.; Wong, A.; Yang, C. S.: A vector space model for automatic indexing. Communications of the ACM. 18(11). November 1975. • [Ve03] Vesnicer, B. et al.: A Voice-driven Web Browser for Blind People. In: European Conference on Speech Communication and Technology (Eurospeech). 2003. • [Zu00] Zue, V. et al.: JUPITER: A Telephone-Based Conversational Interface for Weather Information. IEEE Transactions on Speech and Audio Processing. January 2000.
