
The Value of Shared Visual Information for Task-Oriented Collaboration



Presentation Transcript


  1. The Value of Shared Visual Information for Task-Oriented Collaboration Darren Gergle, Human Computer Interaction Institute, Carnegie Mellon University. Doctoral Thesis Proposal, August 19th, 2005

  2. Research Domain: Task-Oriented Collaboration Multiple people perform joint actions on concrete objects • Often share a common goal • Actions must be tightly coordinated • Several coordination mechanisms available in face-to-face (FTF) interaction • Conversation or language • Nonverbal cues (e.g., facial expressions) • Gestures and pointing • Observation of others’ actions • Media features affect the availability and value of the mechanisms • Field of view, display resolution, update speed, etc. 2

  3. Research Goal • To better understand how visual information and language interact to support successful task-oriented collaboration • Empirical Studies: An experimental paradigm and series of controlled studies to examine the particular features of shared visual information • Sequential Analyses: Explore the details of where shared visual information transforms the collaboration • Computational Modeling: Develop a model to enable predictions about how shared visual information supports successful communication and collaborative performance 3

  4. Areas of Contribution • Theoretical • Extend current theories of dialogue and communication • Methodological • Methods for studying multimodal/multistream behavioral data • Practical • Findings that can be applied to the future development of systems designed to support mediated task-oriented collaboration 4

  5. Outline • Theoretical Background • Stage 1: Experimental Paradigm & Empirical Studies • Stage 2: Sequential Discourse Analyses • Stage 3: A Rule-Based Computational Model of Referring Behavior • Benefits and Contributions • Timeline for Dissertation 5

  6. Theory of Common Ground • Interaction and dialogue are more efficient when people share greater common ground [Clark & Marshall, 1981] • The process by which communicators accrue common ground is referred to as grounding • The media affect resources for grounding and coordination [Clark & Brennan, 1991; Kraut, Fussell & Siegel, 2003] Example of Grounding in a Repair Task Mechanic: Next you have to put the piston ring on. Student: The piston ring? Mechanic: Yeah, that piece over there. Mechanic: [Points to the piston ring] Student: Ok I got it. 6

  7. Example Video • Video example of remote repair in the presence of shared visual information 7

  8. What Features of the Shared Visual Information Matter? • Shared visual information has many features • Shared visual information may be more or less useful depending upon task features • Need a new experimental paradigm to systematically explore these variations 8

  9. Stage 1: Experimental Paradigm and Empirical Studies A paradigm for exploring how features of shared visual information impact performance and communication 9

  10. Experimental Task [Figure: the Helper’s Display shows the target puzzle and a view of the worker’s work area; the Worker’s Display shows the staging area and work area.] Task used in 8 studies to date [GFK, CSCW’02] [GMKF, CHI’04] [GKF, JLSP’04] [KGF, CSCW’04] 10

  11. Collaborative Puzzle Study Immediacy of the shared visual information • Immediate • Delayed (3 seconds) • None Color drift (stability of task objects) • Stable vs. Drift Puzzle difficulty (visual complexity) • Easy vs. Difficult • [details in Kraut et al., CSCW 2002; Gergle et al., JLSP 2004] 11

  12. Participants and Experimental Design • 12 Helper / Worker pairs • Mixed model design • Color drift (between-subjects) • Immediacy of visual information (within-subjects) • Puzzle difficulty (within-subjects) • Total of 6 experimental conditions • 3 Immediacy * 2 Puzzle difficulty, counter-balanced • 4 trials in each experimental condition • 24 total puzzles 12
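To make the design above concrete, here is a minimal sketch of how the trial structure could be generated: 3 levels of immediacy × 2 levels of puzzle difficulty within pairs, color drift between pairs, and 4 trials per condition (24 puzzles per pair). The per-pair random ordering is an illustrative assumption; the thesis does not specify the exact counterbalancing scheme, and this is not the authors' actual script.

```python
from itertools import product
from random import Random

IMMEDIACY = ["immediate", "delayed_3s", "none"]    # within-subjects
DIFFICULTY = ["easy", "difficult"]                 # within-subjects
COLOR_DRIFT = ["stable", "drift"]                  # between-subjects
TRIALS_PER_CONDITION = 4

def pair_schedule(pair_id: int) -> list[tuple[str, str, str, int]]:
    """Build one Helper/Worker pair's trial list with a per-pair ordering
    of the six within-subject conditions (ordering scheme assumed)."""
    drift = COLOR_DRIFT[pair_id % 2]               # alternate the between-subjects factor
    conditions = list(product(IMMEDIACY, DIFFICULTY))
    Random(pair_id).shuffle(conditions)            # seeded for reproducibility
    return [(drift, imm, diff, trial)
            for imm, diff in conditions
            for trial in range(1, TRIALS_PER_CONDITION + 1)]

schedules = {pair: pair_schedule(pair) for pair in range(1, 13)}  # 12 pairs
assert all(len(trials) == 24 for trials in schedules.values())
```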

  13. Task Performance • Drift: Immed < Delay < None • Stable: Immed < {Delay, None} [all p’s < .05] • Overall, shared visual information improves performance • Shared visual information is more important when objects are changing, or less important when words can easily describe the environment 13

  14. Communication Efficiency • Helper: No differences • Worker: Immed < Delay < None [all p’s < .01] • Pairs use visual information for efficiency • Worker compensates for Helper’s lack of visual feedback • Results are fairly general; we need to understand more about the communication content… 14

  15. Use of Deictic Pronouns • Immed < Delay < None [all p’s < .01] • Pairs use differing rates of deictic pronouns • Pairs take advantage of the efficiencies provided by visual information 15
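As a minimal sketch of the kind of measure behind this slide, the snippet below computes a deictic-pronoun rate over transcribed utterances. The word list and tokenizer are illustrative assumptions, not the coding scheme used in the studies.

```python
import re

DEICTIC_PRONOUNS = {"this", "that", "these", "those", "here", "there", "it"}

def deictic_rate(utterances: list[str]) -> float:
    """Proportion of word tokens that are deictic pronouns."""
    tokens = [w for u in utterances for w in re.findall(r"[a-z']+", u.lower())]
    return sum(t in DEICTIC_PRONOUNS for t in tokens) / len(tokens) if tokens else 0.0

print(deictic_rate(["Take that one", "OK, put it right there"]))  # 3 of 8 tokens
```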

  16. Puzzle Study Summary • Pairs are faster, demonstrate efficient language use, and adapt who contributes depending on the availability of shared visual information • However, these data tend to be at a very high level of aggregation • For theory development and practical use, we need a better understanding of where people use shared visual information 16

  17. Stage 2: Sequential Discourse Analyses A methodology for examining where in the course of a collaborative task shared visual information is most useful 17

  18. Where is Shared Visual Information Valuable? 18

  19. Overview of Discourse Coding Scheme • Helper Utterances: e.g., Referents, Positional Information, etc. • “Take the red one” [Helper Referent] • Worker Utterances: e.g., Referents, Positional Information, etc. • “OK, did it.” [Worker Acknowledgement] • Worker Actions: Move, Remove, Position • Worker brings piece into workspace [Worker Move Xi] • Joint Worker Utterances + Actions: Acknowledge + Move, Acknowledge + Position • “OK, got it” + Worker moves piece [Worker Ack + Move Xi] • 14 total codes, high inter-rater reliability [≈ .86] 19
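The slide reports reliability of roughly .86 without naming the statistic; as a minimal sketch, Cohen's kappa is assumed here for illustration. Two coders label the same sequence of utterances, and chance agreement is corrected for using each coder's marginal label distribution.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement from each coder's marginal label distribution
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["H-Referent", "W-Ack", "W-Move", "H-Referent"]
b = ["H-Referent", "W-Ack", "W-Ack",  "H-Referent"]
print(round(cohens_kappa(a, b), 2))  # 0.6 for this toy example
```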

  20. Example Participant Clips • Sample clips of a pair in the easy, stable conditions • SVS (shared visual space) • Worker says very little • Helper uses visual info for confirmation • Errors detected with less ambiguous visual info • No SVS • Worker becomes much more explicit • Errors detected by more ambiguous language 20

  21. Example Data Streams [Figure: coded event streams for the SVS and No SVS conditions] 21

  22. Investigating Sequential Structure • [statistical details in Gergle et al., CSCW 2004] 22
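The analyses in Gergle et al., CSCW 2004 use lag-sequential statistics over the coded event streams; as a minimal sketch of the underlying bookkeeping, the snippet below computes first-order transition probabilities between coded events (e.g., a Helper referent followed by a Worker acknowledgement-plus-move). The event labels are illustrative.

```python
from collections import Counter, defaultdict

def transition_probabilities(events: list[str]) -> dict[str, dict[str, float]]:
    """First-order transition probabilities between coded events."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, curr in zip(events, events[1:]):
        counts[prev][curr] += 1
    return {prev: {curr: c / sum(nxt.values()) for curr, c in nxt.items()}
            for prev, nxt in counts.items()}

stream = ["H-Referent", "W-Ack+Move", "H-Position", "W-Ack+Position", "H-Referent"]
print(transition_probabilities(stream)["H-Referent"])  # {'W-Ack+Move': 1.0}
```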

  23. Object Reference Task 23

  24. Object Placement Task 24

  25. Action as Language with SVS Refer to piece → Verify referent → Describe position → Verify position → Finish or repeat 25

  26. General Findings • Shared visual information facilitates performance, leads to communication efficiencies, and impacts communication processes • Shared visual information and language are tightly integrated and need to be studied together • So far, the results have been primarily descriptive. We need a method for generalizing these findings to a wider number of situations 26

  27. Stage 3: A Rule-Based Computational Model of Referring Behavior • Develop a predictive model to describe how shared visual context interacts with existing linguistic context to impact efficient referring behaviors during task-oriented collaboration 27

  28. Pairs Use Visual Information to Support the use of Efficient Referring Expressions (REs) • Pairs used compact expressions to refer to objects when shared visual information was available • …language-based discourse models provide more detail, but do not capture the impact of the visual context 28

  29. Language-Based Discourse Models Cannot Account for Many Patterns of REs Observed • Pairs use “this,” “that,” “it,” when they shouldn’t • Pairs use full NPs when pronouns are licensed • Pairs exhibit strategies that appear inappropriate, e.g., pronoun+NP, “is it this […] orange one” Both linguistic context and visual context need to be accounted for… 29

  30. Both Linguistic and Visual Context Provide License for REs • … • Helper: alright, take the dark orange block • Worker: OK • Worker: [moved incorrect piece] • Helper: Oh, that’s not it • … 30

  31. Integrated Model of Referring Behavior • Modeling of language alone is insufficient; we need a model of referring behavior that accounts for shared visual information • I propose to develop an integrated model of reference resolution that accounts for both linguistic and visual information 31

  32. Proposed Modeling Framework 32

  33. Linguistic Information Only H: Alright, take the [dark orange block] W: [Moves incorrect piece] H: Oh, [that]’s not [it] Transient Knowledge Base (TKB) Maintains ranked list of accessible entities { [dark orange], … } { [that]:[dark orange], … } { … } 33

  34. Integrated Model H: Alright, take the [dark orange block] W: [Moves incorrect piece] H: Oh, [that]’s not [it] Transient Knowledge Base (TKB) Maintains ranked list of accessible entities { [dark orange], … } { [that]:[red orange], [it]:[dark orange], … } { [red orange], [dark orange], … } { … } 34
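A minimal sketch of a Transient Knowledge Base that keeps a ranked list of accessible entities and is updated by both linguistic mentions and visible actions, mirroring the integrated example above. The class and method names, weights, and decay are illustrative assumptions, not the proposed model itself.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    salience: float = 0.0

@dataclass
class TransientKnowledgeBase:
    entities: dict[str, Entity] = field(default_factory=dict)

    def mention(self, name: str, weight: float = 1.0) -> None:
        """Linguistic event: a referent is mentioned in an utterance."""
        self.entities.setdefault(name, Entity(name)).salience += weight

    def observe_action(self, name: str, weight: float = 1.5) -> None:
        """Visual event: a piece is moved or selected in the shared view."""
        self.entities.setdefault(name, Entity(name)).salience += weight

    def decay(self, rate: float = 0.5) -> None:
        """Accessibility fades as the dialogue and task move on."""
        for entity in self.entities.values():
            entity.salience *= rate

    def ranked(self) -> list[str]:
        return [e.name for e in sorted(self.entities.values(),
                                       key=lambda e: e.salience, reverse=True)]

tkb = TransientKnowledgeBase()
tkb.mention("dark orange block")        # H: "take the dark orange block"
tkb.observe_action("red orange block")  # W moves the (incorrect) red-orange piece
print(tkb.ranked())  # ['red orange block', 'dark orange block'] -> "that" resolves to the moved piece
```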

  35. A Central Challenge • How do linguistic and visual salience combine in a model of collaborative referring behavior? • How do various forms of available shared visual information moderate the ranking (e.g., temporal delays)? • Three hypothesized solutions… 35

  36. Purely Linguistic Context 36

  37. Purely Visual Context 37

  38. Integrated Approach 38

  39. Three Hypothesized Ranking Strategies 39
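As a minimal sketch of the three hypothesized ranking strategies named in slides 36-39 (purely linguistic context, purely visual context, and an integrated approach), the snippet below ranks candidate referents under each strategy. The linear weighting and salience values are assumptions made only to illustrate how the integrated ranking could differ from either source alone.

```python
def rank(entities: list[dict], strategy: str, w_visual: float = 0.5) -> list[str]:
    """Each candidate entity carries separate linguistic and visual salience scores."""
    def score(e: dict) -> float:
        if strategy == "linguistic":
            return e["ling"]
        if strategy == "visual":
            return e["vis"]
        return (1 - w_visual) * e["ling"] + w_visual * e["vis"]  # integrated
    return [e["name"] for e in sorted(entities, key=score, reverse=True)]

candidates = [
    {"name": "dark orange block", "ling": 0.8, "vis": 0.2},  # just mentioned
    {"name": "red orange block",  "ling": 0.2, "vis": 0.9},  # just moved on screen
]
print(rank(candidates, "linguistic"))  # ['dark orange block', 'red orange block']
print(rank(candidates, "visual"))      # ['red orange block', 'dark orange block']
print(rank(candidates, "integrated"))  # ['red orange block', 'dark orange block']
```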

  40. Proposed Work Plan • Prepare corpus • Develop architecture • Generate model predictions • Develop methodology for model assessment 40

  41. The Puzzle Corpus 41

  42. Sample Distributions of RE Forms 42

  43. Features For Annotation / Ranking Functions • Language Features [Tetreault, 2005; GJW, 1986] • Grammatical role hierarchy (S > DO > IDO …) • Information status (e.g., given/new) • Recency of mention • Visual Features [Byron et al., 2005; Chai, 2004] • Object in view for Helper / Worker • Time in view, time since last in view • Object currently active (e.g., selected) • Time since last active (i.e., recency) • Uniqueness (foreground/background, pop-out) 43
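A minimal sketch of extracting the visual features listed above for one candidate referent at a given point in the interaction; the field and feature names are illustrative assumptions rather than the final annotation scheme.

```python
from dataclasses import dataclass

@dataclass
class ObjectState:
    in_helper_view: bool
    in_worker_view: bool
    seconds_in_view: float
    seconds_since_seen: float    # float('inf') if never in view
    seconds_since_active: float  # time since last moved/selected
    pops_out: bool               # visually unique relative to neighbors

def visual_features(obj: ObjectState) -> dict[str, float]:
    """Flatten an object's visual state into numeric features for a ranking function."""
    return {
        "in_view_helper": float(obj.in_helper_view),
        "in_view_worker": float(obj.in_worker_view),
        "time_in_view": obj.seconds_in_view,
        "recency_seen": obj.seconds_since_seen,
        "recency_active": obj.seconds_since_active,
        "pop_out": float(obj.pops_out),
    }

piece = ObjectState(True, True, 4.2, 0.0, 1.5, pops_out=True)
print(visual_features(piece))
```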

  44. Model Development & Validation Plan • Develop ranking functions • Use existing linguistic ranking functions for language model • Modify and augment existing visual salience functions • Develop method for integrating the two • Use subset of tagged puzzle corpus for development • 5-6 development iterations • Hold out a tagged portion for validation • Once the model is developed, evaluate on the large remaining portion of the tagged corpus 44

  45. Model Evaluation & Testing Plan Main Measure: Proportion of referring expressions correctly resolved 45
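A minimal sketch of the main evaluation measure: the proportion of referring expressions the model resolves to the same referent a human annotator chose. The referent labels are illustrative.

```python
def resolution_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Proportion of referring expressions resolved to the annotated referent."""
    assert len(predicted) == len(gold)
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold) if gold else 0.0

print(resolution_accuracy(
    ["dark orange block", "red orange block", "dark orange block"],
    ["dark orange block", "red orange block", "blue block"],
))  # 2 of 3 correctly resolved
```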

  46. Benefits and Contributions 46

  47. Theoretical Contributions • Empirical studies • Extend our understanding of how grounding works • Detail how particular features of the technologies interact with task features to impact collaborative performance • Sequential discourse analyses • Provide detailed insight into communication processes and where shared visual information is most useful • Computational modeling • Allows us to extend these findings and generalize to a wider number of situations • Model provides a method for predicting the magnitude of the impact of shared visual information • Allows us to compute the degree to which visual information may help resolve ambiguity of reference in new situations 47

  48. Methodological Contributions • A multidisciplinary research approach • Demonstrates a unique combination of techniques from behavioral research, discourse analysis, and computational linguistics • Produces complementary findings and demonstrates understanding at many levels of granularity for a given design space • A rigorous experimental paradigm • Decompose elements of shared visual information and examine impact on collaborative performance • Used by several researchers outside the lab • Methods for multimodal discourse segmentation • For example, how to integrate discrete linguistic information with more continuous visual information 48

  49. Practical Contributions • Rationale for design decisions • Oftentimes spoken language is sufficient • Other times visual information is needed • Provide insight into design tradeoffs (e.g., rationale for screen allocation and layout decisions) • Insight into when particular pieces of visual information need to be provided to remote collaborators • A reference-resolution module that can be integrated with existing systems • Provide a predictive environment for examining the value of shared visual information in novel situations • Provide more natural human-computer/agent/robot interactions 49

  50. Timeline 50
