SmartKom: Modality Fusion for a Mobile Companion based on Semantic Web Technologies
Cyber Assist Consortium Second International Symposium - Information Environment for Mobile and Ubiquitous Computing Era - Tokyo, 25 March 2003
Wolfgang Wahlster, German Research Center for Artificial Intelligence (DFKI GmbH), Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162, fax: (+49 681) 302-5341, e-mail: wahlster@dfki.de, WWW: http://www.dfki.de/~wahlster
Intelligent Interaction with Mobile Internet Services
- Localization
- Personalization
- Access to edutainment and infotainment services
- Access to web content and web services anywhere and anytime
- Access to all messages (voice, email, multimedia, MMS) from any single device
- Access to corporate networks and virtual private networks from any device
Multimodal UMTS Systems
SmartKom: A Highly Portable Multimodal Dialogue System
A shared multimodal dialogue back-bone and application layer serve three scenarios:
- SmartKom-Public: cinema, phone, fax, mail, biometrics
- SmartKom-Mobile: car and pedestrian navigation
- SmartKom-Home/Office: consumer electronics, EPG
A Demonstration of SmartKom’s Multimodal Interface for the Federal President of Germany, Dr. Rau
SmartKom’s SDDP Interaction Metaphor
SDDP = Situated Delegation-oriented Dialogue Paradigm: the user specifies a goal and delegates the task to a personalized interaction agent; user and agent cooperate on problems, with the agent asking questions and presenting results, drawing on web services (Service 1, Service 2, Service 3).
Anthropomorphic Interface = Dialogue Partner
See: Wahlster et al. 2001, Eurospeech
SmartKom’s Use of Semantic Web Technology
Three layers of annotations for personalized presentation:
- M3L (high): content
- XML (medium): structure
- HTML (low): layout
SmartKom: Intuitive Multimodal Interaction
Project budget: € 25.5 million, funded by BMBF (Dr. Reuse) and industry
Project duration: 4 years (September 1999 - September 2003)
The SmartKom Consortium (main contractor: DFKI Saarbrücken; scientific director: W. Wahlster): MediaInterface Dresden, European Media Lab Heidelberg, Univ. of Munich, Univ. of Stuttgart, Univ. of Erlangen, and further partners in Berkeley, Munich, Stuttgart, Ulm, and Aachen
Outline of the Talk
1. The Markup Language Layer Model of SmartKom
2. Modality Fusion in SmartKom
3. The Role of the Semantic Web Language M3L
4. Providing Coherence in Multimodal Dialogs by Ontology-based Overlay
5. Conclusions
Mapping Web Content Onto a Variety of Structures and Layouts
Personalization maps a single M3L content representation onto multiple XML structure documents (XML1, XML2, ..., XMLn), each of which can be rendered in several HTML layouts (HTML11, ..., HTML1m; HTML21, ..., HTML2o; HTML31, ..., HTML3p).
From the “one-size-fits-all“ approach of static webpages to the “perfect personal fit“ approach of adaptive webpages.
The Markup Language Layer Model of SmartKom
- M3L: MultiModal Markup Language
- OIL: Ontology Inference Layer
- XMLS: XML Schema; RDFS: RDF Schema
- XML: Extensible Markup Language; RDF: Resource Description Framework
- HTML: Hypertext Markup Language
SmartKom: Merging Various User Interface Paradigms into Multimodal Interaction
- Graphical user interfaces
- Gestural interaction
- Spoken dialogue
- Facial expressions
- Biometrics
Multimodal Input and Output in SmartKom: Fusion and Fission of Multiple Modalities
Both the input by the user and the output by the presentation agent combine speech, gesture, and facial expressions.
Symbolic and Subsymbolic Fusion of Multiple Modes
Input recognizers: facial expression recognition, speech recognition, prosody recognition, gesture recognition, lip reading.
Subsymbolic fusion: Bayesian networks, neural networks, hidden Markov models.
Symbolic fusion: graph unification, reference resolution and disambiguation.
The result is a modality-free semantic representation.
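To make the symbolic-fusion step concrete, here is a minimal sketch of graph unification over feature structures, with plain Python dicts standing in for SmartKom's typed feature structures; the feature names and values are invented for illustration and are not taken from the actual M3L schemas:

```python
def unify(a, b):
    """Recursively unify two feature structures (nested dicts).

    Returns the merged structure, or None when an atomic value
    conflict makes the two structures incompatible.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # conflict below: unification fails
                result[key] = merged
            else:
                result[key] = value  # inherit feature from b
        return result
    return a if a == b else None  # atomic values must match exactly

# Speech contributes the action, gesture contributes the referent:
speech = {"act": "reserve", "object": {"type": "ticket"}}
gesture = {"object": {"type": "ticket", "id": "movie-42"}}
fused = unify(speech, gesture)
assert fused == {"act": "reserve",
                 "object": {"type": "ticket", "id": "movie-42"}}
```

Because unification fails on any conflict, incompatible hypothesis pairs are pruned outright, which is exactly what motivates the more tolerant overlay operation discussed later in the talk.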
Personalized Interaction with WebTVs via SmartKom (DFKI with Sony, Philips, Siemens)
Example: multimodal access to electronic program guides for TV
User: Switch on the TV.
Smartakus: Okay, the TV is on.
User: Which channels are presenting the latest news right now?
Smartakus: CNN and NTV are presenting news.
User: Please record this news channel on a videotape.
Smartakus: Okay, the VCR is now recording the selected program.
Using Facial Expression Recognition for Affective Personalization
(1) Smartakus: Here you see the CNN program for tonight.
(2) User: That’s great.
(3) Smartakus: Which of these features do you want to see?
Processing ironic or sarcastic comments:
(2’) User: That’s great.
(3’) Smartakus: I’ll show you the program of another channel for tonight.
The SmartKom Demonstrator System
- Multimodal control of TV set and VCR/DVD player
- Camera for gestural input
- Camera for facial analysis
- Microphone
Unification of Scored Hypothesis Graphs for Modality Fusion in SmartKom
Inputs to modality fusion:
- word hypothesis graph with acoustic scores
- clause and sentence boundaries with prosodic scores
- gesture hypothesis graph with scores of potential reference objects
- scored hypotheses about the user’s emotional state
Modality fusion performs mutual disambiguation and reduction of uncertainty, producing an intention hypothesis graph from which the intention recognizer selects the most likely interpretation.
SmartKom’s Computational Mechanisms for Modality Fusion and Fission
Modality fusion and modality fission draw on planning, unification, overlay operations, constraint propagation, and ontological inferences, all operating over M3L’s modality-free semantic representation.
The Role of the Semantic Web Language M3L
- M3L (Multimodal Markup Language) defines the data exchange formats used for communication between all modules of SmartKom
- M3L is partitioned into 40 XML schema definitions covering SmartKom’s discourse domains
- The XML schema event.xsd captures the semantic representation of concepts and processes in SmartKom’s multimodal dialogs
OIL2XSD: Using XSLT Stylesheets to Convert an OIL Ontology to an XML Schema
Using Ontologies to Extract Information from the Web
Metadata from different sites are mapped onto a shared ontology: a Film.de-Movie (:title, :description) and a Kinopolis.de-Movie (:title, :description, :director, :actors, :critics) both map onto MyOnto-Movie (:o-title, :main actor, ...), whose person-valued attributes (e.g. the main actor) are MyOnto-Person objects (:name, :birthday).
M3L as a Meaning Representation Language for the User’s Input
“I would like to send an email to Koiti.”

<domainObject>
  <sendTelecommunicationProcess>
    <sender>...</sender>
    <receiver>...</receiver>
    <document>...</document>
    <email>...</email>
  </sendTelecommunicationProcess>
</domainObject>
Exploiting Ontological Knowledge to Understand and Answer the User’s Queries
“Which movies with Schwarzenegger are shown on the Pro7 channel?”

<domainObject>
  <epg>
    <broadcastDefault>
      <avMedium>
        <actors>
          <actor><name>Schwarzenegger</name></actor>
        </actors>
      </avMedium>
      <channel><name>Pro7</name></channel>
    </broadcastDefault>
  </epg>
</domainObject>

A fragment of the time constraint in the answer:

<beginTime>
  <time>
    <function>
      <at>2002-05-10T10:25:46</at>
    </function>
  </time>
</beginTime>
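Since M3L is plain XML governed by XML schemas, any standard XML toolchain can consume such query representations. A minimal sketch using Python's standard library, with the element names taken from the query above (the surrounding processing pipeline is not shown on the slides):

```python
import xml.etree.ElementTree as ET

# The M3L representation of the user's EPG query, as on the slide.
m3l = """
<domainObject>
  <epg>
    <broadcastDefault>
      <avMedium>
        <actors><actor><name>Schwarzenegger</name></actor></actors>
      </avMedium>
      <channel><name>Pro7</name></channel>
    </broadcastDefault>
  </epg>
</domainObject>
"""

root = ET.fromstring(m3l)
# findtext with a limited-XPath pattern pulls out the query constraints.
actor = root.findtext(".//actor/name")
channel = root.findtext(".//channel/name")
print(actor, channel)  # Schwarzenegger Pro7
```

The extracted constraints (actor, channel) would then be matched against the broadcast instances of the EPG domain to produce the answer.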
SmartKom’s Multimodal Dialogue Back-Bone
Analyzers (speech, gestures, facial expressions) and generators (speech, graphics, gestures) communicate with the dialogue manager via communication blackboards (data flow and context dependencies). The dialogue manager comprises modality fusion, discourse modeling, action planning, and modality fission, and connects to external services.
SmartKom’s Three-Tiered Discourse Model
- Domain layer: domain objects (DomainObject1, DomainObject2, ...)
- Discourse layer: discourse objects (DO1, DO2, DO3, ..., DO12, ...)
- Modality layer: linguistic, gestural, and visual objects (LO1, ..., GO1, VO1, ...)
System: This [] is a list of films showing in Heidelberg.
User: Please reserve a ticket for the first one.
DO = Discourse Object, LO = Linguistic Object, GO = Gestural Object, VO = Visual Object
cf. M. Löckelt et al. 2002, N. Pfleger 2002
SmartKom’s Domain Model based on M3L
Used for communication in the back-bone. Frame-based ontology; representation as typed feature structures in M3L (XML).
- Application objects are composed of subobjects, e.g.:
  CinemaReservation: theater: MovieTheater, movie: Movie, reservationNumber: PositiveInteger
  MovieTheater: address: Address, seats: SeatStructure, ...
  Movie: name: String, director: Person, cast: PersonList, yearOfProduction: PositiveInteger, ...
  Person: firstName: String, lastName: String, ...
- Slots: feature paths meaningful for the dialogue (entities that can be talked about / referenced to); e.g. movie:director:lastName in a CinemaReservation object
- Slots can recursively contain other slots
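The slot notion above amounts to resolving a feature path in a nested structure. A small sketch, again using dicts in place of typed feature structures; the concrete theater and movie values are invented for illustration:

```python
def follow_path(obj, path):
    """Resolve a feature path such as 'movie:director:lastName'
    by descending one feature per path segment."""
    for feature in path.split(":"):
        obj = obj[feature]
    return obj

# A hypothetical CinemaReservation instance (illustrative values only).
reservation = {
    "theater": {"address": {"city": "Heidelberg"}, "seats": []},
    "movie": {
        "name": "Terminator",
        "director": {"firstName": "James", "lastName": "Cameron"},
        "yearOfProduction": 1984,
    },
    "reservationNumber": 4711,
}

assert follow_path(reservation, "movie:director:lastName") == "Cameron"
```

Because slots can recursively contain other slots, any prefix of a valid path (e.g. movie:director) is itself a referenceable entity in the dialogue.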
Overlay Operations Using the Discourse Model
Augmentation and validation: each hypothesis from the intention hypothesis lattice is compared with a number of previous discourse states, consistent information is filled in, and a score is computed for each hypothesis-background pair: Overlay(covering, background). The output is the selected augmented hypothesis sequence.
The Overlay Operation Versus the Unification Operation
Overlay is a nonmonotonic and noncommutative unification-like operation that inherits (non-conflicting) background information. There are two sources of conflicts:
- conflicting atomic values: overwrite background (old) with covering (new)
- type clash: assimilate background to the type of covering, then recurse
cf. J. Alexandersson, T. Becker 2001
Example for Overlay
User: “What films are on TV tonight?”
System: [presents list of films]
User: “That‘s a boring program, I‘d rather go to the movies.”
How do we inherit “tonight”?
Domain Model: A Type Hierarchy of TFS
- supertype: a named entertainment at some time
- subtype 1: a named TV program at some time on some channel
- subtype 2: a named movie at some time at some cinema
Unification Simulation
Unifying the background “films on TV tonight” with the new cinema request fails: type clash.
Overlay Simulation
Background: films on TV tonight. Covering: go to the movies. Assimilation adapts the background to the covering’s type, so the time constraint “tonight” can be inherited.
"Formal" Definition of Overlay
Let co be the covering and bg the background.
Step 1: assimilate(co, bg) — assimilate bg to the type of co.
Step 2: overlay(co, assimilate(co, bg)):
- if co and bg are frames: recurse over their features
- if co is empty: use bg
- if bg is empty: use co
- if there is a conflict: use co
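The recursion in Step 2 can be sketched directly in code. This is a simplified reading of the definition, not SmartKom's implementation: dicts stand in for typed feature structures, and the type-assimilation of Step 1 is collapsed into overwriting the type feature (a genuine assimilation would walk the type lattice):

```python
def overlay(covering, background):
    """Overlay new information (covering) on old information (background).

    Unlike unification, overlay never fails: on a conflict the newer
    covering value simply overwrites the background value.
    """
    if isinstance(covering, dict) and isinstance(background, dict):
        result = dict(background)      # inherit non-conflicting old info
        for key, value in covering.items():
            result[key] = overlay(value, background.get(key))
        return result
    if covering is None or covering == {}:
        return background              # covering empty: use bg
    return covering                    # bg empty or conflict: use co

# "What films are on TV tonight?"  then  "I'd rather go to the movies."
background = {"type": "tv-broadcast", "time": "tonight", "channel": "Pro7"}
covering = {"type": "cinema-show"}
assert overlay(covering, background) == {
    "type": "cinema-show", "time": "tonight", "channel": "Pro7"}
```

The example shows exactly how “tonight” is inherited: the background's time feature survives because the covering says nothing about it, while the clashing type is overwritten by the new cinema request.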
Domain Models with Multiple Inheritance
With multiple inheritance, assimilate(co, bg) generalizes to:
1. compute the set of minimal upper bounds (MUBs) of co and bg in the type lattice
2. specialize the MUBs
3. unify the specialized MUBs
The overlay operation itself remains untouched.
Overlay - Scoring
Four fundamental scoring parameters:
- number of features from covering (co)
- number of features from background (bg)
- number of type clashes (tc)
- number of conflicting atomic values (cv)
The codomain is [-1, 1]; a higher score indicates a better fit, and a score of 1 indicates that overlay(c, b) equals unify(c, b).
Example: Enrichment and Validation
Analysis of U4.
Discourse context:
U4: What’s on TV tonight?
S5: [Displays a list of films] Here you see a list of films running tonight.
U6: That seems not very interesting, show me the cinema program.
Example: Enrichment and Validation (cont.)
Overlay(U6, U4): the analysis of U6 is overlaid on the discourse state of U4. Result score: 0.8666.
Overlay(U6, U2): the analysis of U6 is overlaid on an earlier, incompatible discourse state U2. Result score: -1.
Animation of Scoring Parameters
Overlaying the covering on the background in the example yields: co = 2, bg = 12, tc = 1, cv = 0.
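The slides do not spell out the scoring formula, but a normalized difference of the four parameters is consistent with everything shown: it has codomain [-1, 1], yields 1 when there are no clashes (overlay equals unification), yields -1 when nothing survives, and reproduces the 0.8666 score of the Overlay(U6, U4) example. Treat the formula as an assumption reconstructed from those constraints:

```python
def overlay_score(co, bg, tc, cv):
    """Assumed overlay score: normalized difference of inherited
    features (co + bg) and conflicts (tc + cv), in [-1, 1]."""
    return (co + bg - tc - cv) / (co + bg + tc + cv)

# The worked example: co = 2, bg = 12, tc = 1, cv = 0.
# (2 + 12 - 1) / (2 + 12 + 1) = 13/15, i.e. the 0.8666 on the slide.
assert abs(overlay_score(2, 12, 1, 0) - 0.8666) < 0.001

# No clashes: overlay coincides with unification, score is 1.
assert overlay_score(3, 5, 0, 0) == 1.0
```

Under this reading, the -1 score of Overlay(U6, U2) corresponds to a pair where conflicts completely dominate the inherited features.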
M3L Specification of a Presentation Task

<presentationTask>
  <subTask>
    <presentationGoal>
      <inform> ... </inform>
      <abstractPresentationContent>
        ...
        <result>
          <broadcast id="bc1">
            <channel><name>EuroSport</name></channel>
            <beginTime><time><at>2000-12-05T14:00:00</at></time></beginTime>
            <endTime><time><at>2000-12-05T15:00:00</at></time></endTime>
            <avMedium>
              <title>Sport News</title>
              <avType>sport</avType>
              ...
            </avMedium>
          </broadcast>
        </result>
      </abstractPresentationContent>
      <interactionMode>leanForward</interactionMode>
      <goalID>APGOAL3000</goalID>
      <source>generatorAction</source>
      <realizationType>GraphicsAndSpeech</realizationType>
    </presentationGoal>
  </subTask>
</presentationTask>
SmartKom’s Presentation Planner
The Presentation Planner generates a presentation plan by applying a set of presentation strategies to the presentation goal, e.g. GlobalPresent → Present → TryToPresentTVOverview → ShowTVOverview, expanding into Smartakus actions (AddSmartakus, PersonaAction, EvaluatePersonaNode, Inform, Speak, SendScreenCommand), text generation (GenerateText), and layout generation (DoLayout, SetLayoutData).
cf. J. Müller, P. Poller, V. Tschernomas 2002
Salient Characteristics of SmartKom
• Seamless integration and mutual disambiguation of multimodal input and output on semantic and pragmatic levels
• Situated understanding of possibly imprecise, ambiguous, or incomplete multimodal input
• Context-sensitive interpretation of dialog interaction on the basis of dynamic discourse and context models
• Adaptive generation of coordinated, cohesive, and coherent multimodal presentations
• Semi- or fully automatic completion of user-delegated tasks through the integration of information services
• Intuitive personification of the system through a presentation agent
Conclusions
• Various types of unification, overlay, constraint processing, planning, and ontological inferences are the fundamental processes involved in SmartKom‘s modality fusion and fission components.
• The key function of modality fusion is the reduction of overall uncertainty and the mutual disambiguation of the various analysis results, based on a three-tiered representation of multimodal discourse.
• We have shown that a multimodal dialogue system must not only understand and represent the user‘s input, but also its own multimodal output.
First International Conference on Perceptive & Multimodal User Interfaces (PMUI’03)
November 5-7, 2003, Delta Pinnacle Hotel, Vancouver, B.C., Canada
Conference Chair: Sharon Oviatt, Oregon Health & Science Univ., USA
Program Chairs: Wolfgang Wahlster, DFKI, Germany; Mark Maybury, MITRE, USA
PMUI’03 is sponsored by ACM and will be co-located in Vancouver with ACM’s UIST’03. This meeting follows three successful Perceptive User Interface Workshops (with PUI’01 held in Florida) and three International Multimodal Interface Conferences initiated in Asia (with ICMI’02 held in Pittsburgh).
Edited by Dieter Fensel, James A. Hendler, Henry Lieberman, and Wolfgang Wahlster; foreword by Tim Berners-Lee.
March 2003, ISBN 0-262-06232-1, 8 x 9, 392 pp., 98 illus., $40.00/£26.95 (cloth).