The Value of Shared Visual Information for Task-Oriented Collaboration
Darren Gergle, Human-Computer Interaction Institute, Carnegie Mellon University
Doctoral Thesis Proposal, August 19th, 2005
Research Domain: Task-Oriented Collaboration
Multiple people perform joint actions on concrete objects
• Often share a common goal
• Actions must be tightly coordinated
• Several coordination mechanisms are available face-to-face (FTF)
  • Conversation or language
  • Nonverbal cues (e.g., facial expressions)
  • Gestures and pointing
  • Observation of others' actions
• Media features affect the availability and value of these mechanisms
  • Field of view, display resolution, update speed, etc.
Research Goal
• To better understand how visual information and language interact to support successful task-oriented collaboration
• Empirical Studies: An experimental paradigm and series of controlled studies to examine the particular features of shared visual information
• Sequential Analyses: Explore the details of where shared visual information transforms the collaboration
• Computational Modeling: Develop a model to enable predictions about how shared visual information supports successful communication and collaborative performance
Areas of Contribution
• Theoretical
  • Extend current theories of dialogue and communication
• Methodological
  • Methods for studying multimodal/multistream behavioral data
• Practical
  • Findings that can be applied to the future development of systems designed to support mediated task-oriented collaboration
Outline
• Theoretical Background
• Stage 1: Experimental Paradigm & Empirical Studies
• Stage 2: Sequential Discourse Analyses
• Stage 3: A Rule-Based Computational Model of Referring Behavior
• Benefits and Contributions
• Timeline for Dissertation
Theory of Common Ground
• Interaction and dialogue are more efficient when people share greater common ground [Clark & Marshall, 1981]
• The process by which communicators accrue common ground is referred to as grounding
• The media affect resources for grounding and coordination [Clark & Brennan, 1991; Kraut, Fussell & Siegel, 2003]
Example of Grounding in a Repair Task
Mechanic: Next you have to put the piston ring on.
Student: The piston ring?
Mechanic: Yeah, that piece over there.
Mechanic: [Points to the piston ring]
Student: Ok, I got it.
Example Video
• Video example of remote repair in the presence of shared visual information
What Features of the Shared Visual Information Matter?
• Shared visual information has many features
• Shared visual information may be more or less useful depending upon task features
• Need a new experimental paradigm to systematically explore these variations
Stage 1: Experimental Paradigm and Empirical Studies
A paradigm for exploring how features of shared visual information impact performance and communication
Experimental Task
[Figure: Worker's Display (staging area, work area) and Helper's Display (target puzzle, view of worker's work area)]
Task used in 8 studies to date [GFK, CSCW'02] [GMKF, CHI'04] [GKF, JLSP'04] [KGF, CSCW'04]
Collaborative Puzzle Study
Immediacy of the shared visual information
• Immediate
• Delayed (3 seconds)
• None
Color drift (stability of task objects)
• Stable vs. Drift
Puzzle difficulty (visual complexity)
• Easy vs. Difficult
[details in Kraut et al., CSCW 2002; Gergle et al., JLSP 2004]
Participants and Experimental Design
• 12 Helper/Worker pairs
• Mixed-model design
  • Color drift (between-subjects)
  • Immediacy of visual information (within-subjects)
  • Puzzle difficulty (within-subjects)
• Total of 6 experimental conditions
  • 3 Immediacy × 2 Puzzle difficulty, counterbalanced
• 4 trials in each experimental condition
  • 24 total puzzles
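A minimal sketch of one way such a within-subjects counterbalancing could be generated, assuming a simple cyclic Latin square over the six immediacy × difficulty conditions with one order assigned per pair; the slide does not specify the actual counterbalancing procedure, so the scheme and names below are illustrative only.

```python
# Illustrative counterbalancing sketch (not the study's actual procedure):
# enumerate the 6 within-subject conditions and assign cyclic Latin-square
# orders to the 12 helper/worker pairs.
from itertools import product

immediacy = ["immediate", "delayed", "none"]
difficulty = ["easy", "difficult"]
conditions = [f"{i}/{d}" for i, d in product(immediacy, difficulty)]  # 6 conditions

def latin_square_orders(items):
    """Cyclic rotations: each condition appears once in every serial position."""
    n = len(items)
    return [[items[(start + k) % n] for k in range(n)] for start in range(n)]

orders = latin_square_orders(conditions)
for pair_id in range(12):                   # 12 pairs
    order = orders[pair_id % len(orders)]   # round-robin assignment of orders
    print(pair_id, order)
```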
Task Performance
• Drift: Immed < Delay < None
• Stable: Immed < {Delay, None} [all p's < .05]
• Overall, shared visual information improves performance
• Shared visual information is more important when objects are changing, and less important when words can easily describe the environment
Communication Efficiency
• Helper: No differences
• Worker: Immed < Delay < None [all p's < .01]
• Pairs use visual information for efficiency
• Worker compensates for Helper's lack of visual feedback
• Results are fairly general; we need to understand more about the communication content…
Use of Deictic Pronouns
• Immed < Delay < None [all p's < .01]
• Pairs use differing rates of deictic pronouns
• Pairs take advantage of the efficiencies provided by visual information
Puzzle Study Summary
• Pairs are faster, demonstrate efficient language use, and adapt who contributes depending on the availability of shared visual information
• However, these data tend to be at a very high level of aggregation
• For theory development and practical use, we need a better understanding of where people use shared visual information
Stage 2: Sequential Discourse Analyses
A methodology for examining where in the course of a collaborative task shared visual information is most useful
Overview of Discourse Coding Scheme
• Helper Utterances: e.g., Referents, Positional Information
  • "Take the red one" [Helper Referent]
• Worker Utterances: e.g., Referents, Positional Information
  • "OK, did it." [Worker Acknowledgement]
• Worker Actions: Move, Remove, Position
  • Worker brings piece into workspace [Worker Move Xi]
• Joint Worker Utterances + Actions: Acknowledge + Move, Acknowledge + Position
  • "OK, got it" + Worker moves piece [Worker Ack + Move Xi]
• 14 total codes, high inter-rater reliability (≈ .86)
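As a rough illustration of how agreement on such a coding scheme can be checked, here is a minimal sketch assuming a Cohen's-kappa-style statistic over two coders' labels for the same events; the slide does not say which reliability statistic was actually used, and the code labels below are shorthand invented for the example.

```python
# Illustrative sketch: chance-corrected agreement (Cohen's kappa) between two
# coders who labeled the same sequence of utterance/action events.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    chance = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - chance) / (1 - chance)

a = ["H-Referent", "W-Ack", "W-Move", "W-Ack+Move", "H-Position"]
b = ["H-Referent", "W-Ack", "W-Move", "W-Ack",      "H-Position"]
print(round(cohens_kappa(a, b), 2))   # 0.75 for this toy example
```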
Example Participant Clips
• Sample clips of a pair in the easy, stable conditions
• SVS (shared visual space)
  • Worker says very little
  • Helper uses visual info for confirmation
  • Errors detected with less ambiguous visual info
• No SVS
  • Worker becomes much more explicit
  • Errors detected by more ambiguous language
Example Data Streams
[Figure: coded data streams for the SVS and No SVS conditions]
Investigating Sequential Structure
• [statistical details in Gergle et al., CSCW 2004]
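To give a flavor of what a sequential analysis over the coded streams involves, the following is a minimal sketch that estimates first-order transition probabilities between event codes; it is only a generic illustration of the technique, not the statistical analyses reported in Gergle et al., CSCW 2004, and the session data are invented.

```python
# Illustrative sketch: first-order transition probabilities P(next | current)
# over an ordered sequence of coded utterance/action events from one session.
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {code: {n: k / sum(nxts.values()) for n, k in nxts.items()}
            for code, nxts in counts.items()}

session = ["H-Referent", "W-Ack+Move", "H-Position", "W-Ack+Position",
           "H-Referent", "W-Ack", "W-Move", "H-Position"]
for code, nxts in transition_probabilities(session).items():
    print(code, nxts)
```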
Action as Language with SVS
[Diagram: Refer to piece → Verify referent → Describe position → Verify position → Finish or repeat]
General Findings
• Shared visual information facilitates performance, leads to communication efficiencies, and impacts communication processes
• Shared visual information and language are tightly integrated and need to be studied together
• So far, the results have been primarily descriptive. We need a method for generalizing these findings to a wider number of situations
Stage 3: A Rule-Based Computational Model of Referring Behavior
• Develop a predictive model to describe how shared visual context interacts with existing linguistic context to impact efficient referring behaviors during task-oriented collaboration
Pairs Use Visual Information to Support the Use of Efficient Referring Expressions (REs)
• Pairs used compact expressions to refer to objects when shared visual information was available
• …language-based discourse models provide more detail, but do not capture the impact of the visual context
Language-Based Discourse Models Cannot Account for Many Observed Patterns of REs
• Pairs use "this," "that," "it" when they shouldn't
• Pairs use full NPs when pronouns are licensed
• Pairs exhibit strategies that appear inappropriate, e.g., pronoun + NP: "is it this […] orange one"
Both linguistic context and visual context need to be accounted for…
Both Linguistic and Visual Context Provide License for REs
…
Helper: alright, take the dark orange block
Worker: OK
Worker: [moved incorrect piece]
Helper: Oh, that's not it
…
Integrated Model of Referring Behavior
• Modeling language alone is insufficient; we need a model of referring behavior that accounts for shared visual information
• I propose to develop an integrated model of reference resolution that accounts for both linguistic and visual information
Linguistic Information Only
H: Alright, take the [dark orange block]
W: [Moves incorrect piece]
H: Oh, [that]'s not [it]
Transient Knowledge Base (TKB): maintains a ranked list of accessible entities
{ [dark orange], … }
{ [that]:[dark orange], … }
{ … }
Integrated Model
H: Alright, take the [dark orange block]
W: [Moves incorrect piece]
H: Oh, [that]'s not [it]
Transient Knowledge Base (TKB): maintains a ranked list of accessible entities
{ [dark orange], … }
{ [red orange], [dark orange], … }
{ [that]:[red orange], [it]:[dark orange], … }
{ … }
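A minimal sketch of one way such a TKB could be represented follows: a ranked entity list whose salience is bumped by both linguistic mentions and visual events, with pronouns resolved to the most salient entity. The update rules, weights, and class names are illustrative assumptions for this example, not the model the thesis proposes.

```python
# Illustrative TKB sketch: salience is increased by linguistic mentions and by
# visual events (e.g., the worker moving a piece), and deictic pronouns resolve
# to the currently top-ranked entity. Weights here are arbitrary placeholders.
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    salience: float = 0.0

@dataclass
class TKB:
    entities: dict = field(default_factory=dict)

    def _bump(self, name, amount):
        self.entities.setdefault(name, Entity(name)).salience += amount

    def linguistic_mention(self, name, weight=1.0):
        self._bump(name, weight)          # e.g., "take the dark orange block"

    def visual_event(self, name, weight=1.5):
        self._bump(name, weight)          # e.g., worker moves a piece in view

    def ranked(self):
        return sorted(self.entities.values(), key=lambda e: -e.salience)

    def resolve_pronoun(self):
        """Resolve 'that'/'it' to the most salient accessible entity."""
        ranking = self.ranked()
        return ranking[0].name if ranking else None

tkb = TKB()
tkb.linguistic_mention("dark orange block")   # H: "take the dark orange block"
tkb.visual_event("red orange block")          # W moves the wrong (red orange) piece
print(tkb.resolve_pronoun())                  # -> "red orange block" as referent of "that"
```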
A Central Challenge
• How do linguistic and visual salience combine in a model of collaborative referring behavior?
• How do various forms of available shared visual information moderate the ranking (e.g., temporal delays)?
• Three hypothesized solutions…
Proposed Work Plan
• Prepare corpus
• Develop architecture
• Generate model predictions
• Develop methodology for model assessment
Features for Annotation / Ranking Functions
• Language features [Tetreault, 2005; GJW, 1986]
  • Grammatical role hierarchy (S > DO > IDO …)
  • Information status (e.g., given/new)
  • Recency of mention
• Visual features [Byron et al., 2005; Chai, 2004]
  • Object in view for Helper / Worker
  • Time in view, time since last in view
  • Object currently active (e.g., selected)
  • Time since last active (i.e., recency)
  • Uniqueness (foreground/background, pop-out)
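As a sketch of how these features might feed a combined salience score, the following toy ranking function weights a few of the linguistic and visual features listed above. The feature names, weights, and combination rule are placeholders for illustration; the thesis proposes to develop and validate the actual ranking functions on the tagged corpus.

```python
# Illustrative combined salience sketch: score each candidate referent from a
# handful of linguistic (grammatical role, recency of mention) and visual
# (in view, recently active) features, then rank candidates by score.
GRAMMATICAL_ROLE_WEIGHT = {"subject": 3.0, "direct_object": 2.0,
                           "indirect_object": 1.0, "other": 0.5}

def salience(entity, weights=None):
    """entity: dict of feature values for one candidate referent."""
    w = weights or {"role": 1.0, "recency": 1.0, "in_view": 1.0, "active": 1.0}
    score = 0.0
    score += w["role"] * GRAMMATICAL_ROLE_WEIGHT.get(entity.get("last_role", "other"), 0.5)
    score += w["recency"] * 1.0 / (1 + entity.get("utterances_since_mention", 99))
    score += w["in_view"] * (1.0 if entity.get("in_worker_view") else 0.0)
    score += w["active"] * 1.0 / (1 + entity.get("seconds_since_active", 999))
    return score

candidates = [
    {"name": "dark orange", "last_role": "direct_object", "utterances_since_mention": 1,
     "in_worker_view": True, "seconds_since_active": 30},
    {"name": "red orange", "last_role": "other", "utterances_since_mention": 99,
     "in_worker_view": True, "seconds_since_active": 1},
]
print(max(candidates, key=salience)["name"])   # top-ranked candidate referent
```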
Model Development & Validation Plan
• Develop ranking functions
  • Use existing linguistic ranking functions for the language model
  • Modify and augment existing visual salience functions
  • Develop a method for integrating the two
• Use a subset of the tagged puzzle corpus for development
  • 5–6 development iterations
  • Hold out a tagged portion for validation
• Once the model is developed, evaluate on the large remaining portion of the tagged corpus
Model Evaluation & Testing Plan
Main measure: Proportion of referring expressions correctly resolved
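A minimal sketch of this measure, assuming the model's resolutions and the gold annotations are keyed by referring-expression id; the data format is invented for illustration.

```python
# Illustrative sketch of the main evaluation measure: the proportion of
# referring expressions whose resolved referent matches the annotated referent.
def resolution_accuracy(predictions, gold):
    """predictions, gold: dicts mapping a referring-expression id to a referent id."""
    assert predictions.keys() == gold.keys()
    correct = sum(predictions[re_id] == gold[re_id] for re_id in gold)
    return correct / len(gold)

gold = {"re1": "dark-orange", "re2": "red-orange", "re3": "dark-orange"}
pred = {"re1": "dark-orange", "re2": "dark-orange", "re3": "dark-orange"}
print(resolution_accuracy(pred, gold))   # 2 of 3 correct -> 0.67
```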
Theoretical Contributions
• Empirical studies
  • Extend our understanding of how grounding works
  • Detail how particular features of the technologies interact with task features to impact collaborative performance
• Sequential discourse analyses
  • Provide detailed insight into communication processes and where shared visual information is most useful
• Computational modeling
  • Allows us to extend these findings and generalize to a wider number of situations
  • The model provides a method for predicting the magnitude of the impact of shared visual information
  • Allows us to compute the degree to which visual information may help resolve ambiguity of reference in new situations
Methodological Contributions
• A multidisciplinary research approach
  • Demonstrates a unique combination of techniques from behavioral research, discourse analysis, and computational linguistics
  • Produces complementary findings and demonstrates understanding at many levels of granularity for a given design space
• A rigorous experimental paradigm
  • Decomposes elements of shared visual information and examines their impact on collaborative performance
  • Used by several researchers outside the lab
• Methods for multimodal discourse segmentation
  • For example, how to integrate discrete linguistic information with more continuous visual information
Practical Contributions
• Rationale for design decisions
  • Oftentimes spoken language is sufficient; other times visual information is needed
  • Provides insight into design tradeoffs (e.g., rationale for screen allocation and layout decisions)
  • Insight into when particular pieces of visual information need to be provided to remote collaborators
• A reference-resolution module that can be integrated with existing systems
  • Provides a predictive environment for examining the value of shared visual information in novel situations
  • Supports more natural human-computer/agent/robot interactions
Timeline for Dissertation