LINGTOUR: a PDA for tourists
Alain Goyé, Eric Lecolinet, Mutsuko Tomokiyo, Gérard Chollet
GET-ENST, 46, rue Barrault, 75634 Paris Cedex 13
goye | elc | lin | chollet@enst.fr
Catherine Pelachaud
IUT de Montreuil - Université Paris 8, 140, rue de la Nouvelle France, 93100 Montreuil, France
c.pelachaud@iut.univ-paris8.fr
Ding Xiaoqing, Mao Yuhang
Dept. of Electronic Engineering, Tsinghua University, Beijing, 100084, China
dingxq@tsinghua.edu.cn
Ni Yang
Institut National des Télécommunications, Département Electronique et Physique, 9, rue Charles Fourier, 91011 Evry Cedex, France
yang.ni@int-evry.fr

LINGTOUR: a history
• Collaboration with Tsinghua University:
  • Memorandum of understanding (2000)
  • Vocal French-Chinese dictionary with Le Robert
  • Master's thesis of Dong Qingfu: « Realization of Intelligent Camera Capable of Character Recognition and Translation »
Interfaces multimodales pour un assistant au voyage (Multimodal interfaces for a travel assistant)

The LINGTOUR project
• Multilingual management of information
• Initially, a PDA for travellers:
  • Virtual guide: access to multilingual information for tourists (practical and cultural)
  • Communication assistant: translation help, navigation within a lexicon and access to typical conversations
  • Travel assistant: orientation and interpretation of the environment using local and positioning information
• A personal assistant (PDA or smartphone) with multimodal and ergonomic capabilities:
  • inputs (text, speech, stylus, images)
  • outputs (text, speech, images, video)

Interactions PDA - server
[Diagram labels: Images, sound; Supervision; Refinement / correction of the image; Character recognition, voice recognition; Multilingual translation, speech synthesis; Selection / extraction of text; Images, sound, text; Multimodal navigation in maps and lexicon; Sound capture. Logo: Tsinghua University]

Exploit the specificities of the PDA
• Make the best use of the PDA's capabilities for multimodality:
  • Use the touch screen, microphone and camera jointly as inputs, without any keyboard
  • Use the graphic and sound capabilities alternately or simultaneously, depending on the context, to present information
• The PDA is connected to the Internet whenever possible:
  • to download up-to-date information
  • to offload to a remote server the tasks that are:
    • too complex
    • or too costly in memory
  • to allow a human operator to intervene if necessary

3 types of multimodal interface
• Gesture and voice: combination of control menus + vocal input
  • Controlling zoomable interfaces for graphic or text input
• Intelligent camera: refinement of images
  • Based on the correlation of a series of images
  • to improve character recognition
• Cultural agents: animated conversational agents adapted to the culture
  • Adding non-verbal behaviour to speech: face, eyes, gestures, depending on the culture

Gesture and voice: ZUIs and 2D control menus
• Constraint of the PDA: screen size
• ZUIs: zoomable user interfaces
  • Concept of semantic zoom: progressive revelation of levels of detail
• Control menus [1]:
  • Selection + control of the action (movement, zoom) in a single gesture
  • No change of context, no handling of multiple interactions for a single operation
[1] Pook, S., Lecolinet, E., Vaysseix, G. and Barillot, E., Control Menus: Execution and Control in a Single Interactor. Proc. ACM Conf. on Human Factors in Computing Systems (CHI) 2000, 263-264. ACM Press.

Gesture and voice: characteristics of control menus
• Combine the selection and the control of an operation in a single gesture
• Can integrate up to 2 scroll bars (vertical and horizontal)
• The user concentrates on the content
• Can have sub-menus
• Like Pie menus [2] and Marking menus [3], they offer a beginner mode and an expert mode
  • The spatial layout of the menus helps memorization
  • Quick gestures => the menus do not appear on the screen
  • Implicit transition from one mode to the other
[2] Hopkins, D., The design and implementation of Pie menus. Dr Dobb's Journal of Software Tools, 1991, 16 (12), 16-26.
[3] Kurtenbach, G. et al., The Hotbox: efficient access to a large number of menu-items. Proc. ACM – CHI, 1993, 231-327.

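
The single-gesture principle described above can be illustrated with a minimal sketch. It is not the implementation from [1]: the item names, the activation radius, the menu delay and the four-sector layout are all assumptions chosen for illustration. The first part of the stroke selects an item by its direction, the rest of the stroke supplies a control value, and a gesture that leaves the activation radius before the delay expires never displays the menu (expert mode).

```python
import math

# Hypothetical layout: one item per 90-degree sector (east, north, west, south).
ITEMS = ["zoom", "pan", "rotate", "select"]
ACTIVATION_RADIUS = 20.0  # pixels; assumed value
MENU_DELAY = 0.3          # seconds before the menu is drawn; assumed value


def pick_item(dx, dy):
    """Return the item whose sector contains direction (dx, dy), with y pointing up."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    sector = int(((angle + 45.0) % 360.0) // 90.0)
    return ITEMS[sector]


def interpret_gesture(samples):
    """Interpret one press-drag-release stroke.

    samples: list of (time, x, y) tuples, the first one being the press.
    Returns (item, control_value, menu_was_shown).
    """
    t0, x0, y0 = samples[0]
    item, anchor, value, menu_shown = None, None, 0.0, False
    for t, x, y in samples[1:]:
        if item is None:
            # Selection phase: the menu appears only if the user hesitates.
            if t - t0 >= MENU_DELAY:
                menu_shown = True
            if math.hypot(x - x0, y - y0) >= ACTIVATION_RADIUS:
                item = pick_item(x - x0, y - y0)
                anchor = (x, y)
        else:
            # Control phase: the same stroke now drives the operation.
            value = math.hypot(x - anchor[0], y - anchor[1])
    return item, value, menu_shown


quick = [(0.0, 0, 0), (0.05, 25, 0), (0.10, 65, 0)]
print(interpret_gesture(quick))
```

In this sketch the transition between beginner and expert mode is implicit, as on the slide: the same function handles both, and only the timing of the stroke decides whether the menu would have been drawn.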
Gesture and voice: application of control menus
• navigation in a city map,
• navigation through a lexicon:
  • words and phrases useful to tourists,
  • organized hierarchically in categories such as: accommodation > hotel > reservation…

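
A hierarchical lexicon of this kind can be sketched as a nested dictionary. The category names accommodation > hotel > reservation come from the slide; every other category, all phrases and the function names are hypothetical and only serve the illustration.

```python
# Hypothetical lexicon: only "accommodation > hotel > reservation" is from the slide.
LEXICON = {
    "accommodation": {
        "hotel": {
            "reservation": ["I would like to book a room.", "Is breakfast included?"],
            "check-out": ["At what time must I leave the room?"],
        },
        "youth hostel": {},
    },
    "transport": {},
}


def node_at(lexicon, path):
    """Follow a list of category names down the hierarchy."""
    node = lexicon
    for name in path:
        node = node[name]
    return node


def entries(lexicon, path):
    """Return what is shown at `path`: sub-category names, or phrases at a leaf."""
    node = node_at(lexicon, path)
    return sorted(node) if isinstance(node, dict) else list(node)


def breadcrumb(path):
    """Format the current position in the slide's 'a > b > c' notation."""
    return " > ".join(path)


path = ["accommodation", "hotel", "reservation"]
print(breadcrumb(path))
print(entries(LEXICON, path))
```

Each selection in a control menu would then append one name to `path`, and a zoom-out gesture would remove the last one.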
Gesture and voice: multilingual voice recognition
• Voice recognition engine:
  • limited vocabulary, but
  • speaker-independent,
  • no training required.
• Recognition in different languages:
  • sharing common acoustic models, which facilitates future extension to new languages.
  • Models adaptable to users and to conditions of use.
[Diagram: common acoustic models shared by Chinese and French, plus language-specific models]

Gesture and voice: the voice is associated with gestures…
Vocal information is used differently depending on the context:
• Navigation in the map: « tap and talk »: a vocal menu gives access to various information about the object pointed at.
• Navigation through the lexicon:
  • as a shortcut to categories, then
  • to access the input words or phrases. The translation is displayed / synthesized in the target language.
• Possibly improved by keyword detection ("word spotting").
[Example vocal menu: Description? Opening hours? Prices? Access?]

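
The « tap and talk » interaction can be sketched as the fusion of two inputs: the object under the stylus and a keyword spotted in the utterance. In the sketch below the speech recognizer is replaced by plain text, and the database content, the keyword list and the function names are hypothetical.

```python
# Hypothetical vocal menu, mirroring the four example entries of the slide.
VOCAL_MENU = ("description", "hours", "prices", "access")

# Hypothetical point-of-interest database; all values are invented.
POI_INFO = {
    "museum": {
        "description": "Museum of fine arts",
        "hours": "10:00-18:00",
        "prices": "8 EUR",
        "access": "Metro line 1",
    },
}


def spot_keyword(utterance, keywords=VOCAL_MENU):
    """Very rough word spotting: return the first menu keyword found, else None."""
    for word in utterance.lower().split():
        word = word.strip(".,;:!?")
        if word in keywords:
            return word
    return None


def tap_and_talk(tapped_object, utterance, database=POI_INFO):
    """Combine the pointed object (gesture) with the spotted keyword (voice)."""
    keyword = spot_keyword(utterance)
    if keyword is None or tapped_object not in database:
        return None
    return database[tapped_object].get(keyword)


print(tap_and_talk("museum", "What are the opening hours?"))
```

The gesture resolves what the question is about and the voice resolves what is asked, so neither modality needs to carry the full request.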
Intelligent camera: the « intelligent » camera
• see, recognize and translate
Character recognition, for Chinese in particular, now achieves high performance.
• To limit the computing cost:
  • Recognition is performed on a sub-part of the image.
  • This sub-part can be chosen semi-automatically during a preliminary delimitation and segmentation phase.
• Once recognized, the text can be translated:
  • locally: to facilitate the translation, a vocal menu lets the user choose the context: bus stop signs, street names, monuments, etc.
  • or by a remote server via a radio communication service.
• It can also be read aloud by speech synthesis.

Intelligent camera: using the camera [4]
[Figure: capture, recognition, translation]
[4] Mao, Y., Dong, Q., Qi, Y. and Chollet, G., Realization of an Intelligent Camera capable of Character Recognition and Translation. Proc. of Sino-French Symp. on Speech and Language Processing, Beijing, October 2000. Available at: http://www.tsi.enst.fr/~chollet/Projets/Chine/Lingtour/IntelCamera.doc

Intelligent camera: improve the image resolution
• Difficulty:
  • images taken from a distance in the street
  • cheap camera
  • quality / resolution insufficient for recognition
• Solution: image refinement
  • correlation and reconstruction from a series of successive images.
  • exploitation of the small differences due to the natural movement of the hand holding the camera.
  • resulting image with a resolution higher than that of any single capture.

Intelligent camera: principle of image refinement
[Diagram: camera on the PDA, subject to the vibration of the hand; acquisition of an image sequence; estimation of the (sub-pixel) movements; recomposition into a single image; image of better resolution]

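
The recomposition step can be illustrated with a minimal "shift-and-add" sketch. It is a simplification and not the project's algorithm: the sub-pixel shifts are assumed to be already known (the slides obtain them by correlation between images), the camera is modelled as ideal point sampling, and all function names are hypothetical. Each low-resolution frame is placed back on a finer grid at its own offset, and overlapping samples are averaged.

```python
def downsample(image, factor, dy, dx):
    """Simulate one low-resolution capture: keep one pixel out of `factor`
    in each direction, starting at offset (dy, dx) on the fine grid."""
    return [row[dx::factor] for row in image[dy::factor]]


def shift_and_add(frames, shifts, factor):
    """Recompose one fine-grid image from several low-resolution frames.

    frames: list of images (lists of rows) of identical size.
    shifts: list of (dy, dx) offsets of each frame, in fine-grid pixels
            (assumed known here; in practice they must be estimated).
    Cells that no frame covers are returned as None.
    """
    height = len(frames[0]) * factor
    width = len(frames[0][0]) * factor
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for frame, (dy, dx) in zip(frames, shifts):
        for i, row in enumerate(frame):
            for j, value in enumerate(row):
                y, x = i * factor + dy, j * factor + dx
                if y < height and x < width:
                    total[y][x] += value
                    count[y][x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else None for x in range(width)]
        for y in range(height)
    ]


# Synthetic 4x4 scene observed through four captures displaced by the hand.
scene = [[4 * r + c for c in range(4)] for r in range(4)]
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [downsample(scene, 2, dy, dx) for dy, dx in shifts]
restored = shift_and_add(frames, shifts, 2)
print(restored == scene)
```

With fewer frames than offsets, some fine-grid cells stay empty (`None`) and would have to be interpolated, which is why a sequence of captures with varied hand movement is useful.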
Intelligent camera: refinement of images, results
Notable improvement:
• in visual quality
• in the character recognition rate

Cultural agents: interest of conversational agents
• They convey information [5] in a more attractive and more user-friendly manner than simple speech synthesis.
• Non-verbal expressions make it possible:
  • to disambiguate the meaning of an utterance,
  • to emphasize certain words or utterance fragments…
• They supply information at different levels:
  • syntactic
  • semantic
  • emotional
• In a multicultural context, a visual demonstration can also be a better vehicle for teaching certain customs.
[5] Pelachaud, C., Carofiglio, V., De Carolis, B. and de Rosis, F., Embodied Contextual Agent in Information Delivering Application, First Intl. Joint Conf. on Autonomous Agents & Multi-Agent Systems, Bologna, July 2002.

Cultural agents: « Greta », a facial animation engine
• Objective: an animated model able to simulate quickly and realistically the dynamic aspects of the human face.
• Realization: a facial animation engine whose 3D model has the appearance of a young woman.
• Greta is:
  • the core of an MPEG-4 decoder
  • compliant with the "Simple Facial Animation Object Profile" specification of the standard.
  • able:
    • to generate the structure of an original model,
    • to animate it,
    • to render it in real time.

Cultural agents: adapting the conversational agents
• Porting animated agents to the PDA.
  • The power and the screen size of the device are limited
  • The complexity and the level of detail of the animation have to be adapted.
• Adaptation of the behaviour to users: despite recent advances in realism, current agents know only one type of behaviour, which often reflects Western culture.
• Cultural and social adaptation to the context: the same information must be delivered differently, for example:
  • to a French person and to a Chinese person,
  • to a journalist and to a private individual.

Cultural agents: conversational and cultural agents, semantic representation
• Base: language-independent semantic representation, based on the XML-XSD standard.
  • description of the communicative function of gestures and of the signals composing the gestures.
• Overlay of culture-specific attributes, which influence:
  • the choice of a gesture (smile or shake/nod of the head),
  • the duration of a gaze…
More generally, these influences can concern:
• the definition of a signal (masking of one signal by another),
• sound intensity,
• sound duration, etc.

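
The idea of a culture-specific overlay on a language-independent base can be sketched as follows. The slides do not show the actual XML schema, so the element names, attribute names, culture labels and values below are all hypothetical; only the principle (a base description whose signal attributes are overridden per culture) comes from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical base description of one communicative act (not the project's schema).
BASE_ACT = """
<act function="greet">
  <signal modality="head" gesture="nod"/>
  <signal modality="face" gesture="smile"/>
  <signal modality="gaze" duration="1.0"/>
</act>
"""

# Hypothetical overlays: per culture, per modality, the attributes to override.
# The labels and values are placeholders, not claims about any real culture.
CULTURE_OVERLAYS = {
    "culture_a": {"gaze": {"duration": "2.0"}},
    "culture_b": {"gaze": {"duration": "0.5"}, "face": {"gesture": "none"}},
}


def apply_culture(xml_text, culture, overlays=CULTURE_OVERLAYS):
    """Parse the base act and override signal attributes for the given culture.
    An unknown culture leaves the base description unchanged."""
    root = ET.fromstring(xml_text.strip())
    for signal in root.iter("signal"):
        overrides = overlays.get(culture, {}).get(signal.get("modality"), {})
        for name, value in overrides.items():
            signal.set(name, value)
    return root


adapted = apply_culture(BASE_ACT, "culture_b")
for signal in adapted.iter("signal"):
    print(signal.attrib)
```

Keeping the overlay separate from the base means a new culture only requires a new overlay entry, while the communicative function of each act stays unchanged.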
Cultural agents: conversational and cultural agents
… in certain cultures, not looking at one's interlocutor can be interpreted as a lack of attention or interest…
In other cultures, looking someone straight in the eyes can be interpreted as a form of aggression…

Results and what follows…
At the end of the work that this project has made it possible to start, we hope to be in a position to demonstrate:
• 1) the possibility of integrating on a mobile terminal (PDA, smartphone…) the various interfaces presented here:
  • 2D control menus,
  • capture and recognition of text,
  • conversational agents.
• 2) the benefits of the improvements that we recommend for each of these functionalities:
  • integration of vocal commands in the menus,
  • refinement of images by spatio-temporal correlation,
  • enrichment of the agents with cultural attributes.

To evaluate this work within the EURO-CHINA programme…
• Collaboration under way with Peer2Phone (voice over IP via Wi-Fi)
• Presentation at the end of April in Beijing
• A proposal with our Chinese partners for the Olympics in Beijing