Journal of Cognitive Engineering and Decision Making, 1(2), 2007. Including a Model of Visual Processing With a Cognitive Architecture to Model a Simple Teleoperation Task. Frank E. Ritter, Urmila Kukreja, Robert St. Amant.
Contents
• Introduction
• More Direct Visual Processing for a Cognitive Architecture
• Exploring This Approach in an Example HRI Task
• Results
• Conclusion
Introduction (1/5)
• Although HCI evaluation techniques are robust and may have clear application to HRI, HRI poses more difficult problems for evaluation.
  • HRI covers a wider range of tasks and environments.
• Yanco et al. (2004) identified three classes of evaluation methods common in HCI that might be used in HRI:
  • Inspection methods carried out by interface design experts
  • Empirical methods based on observation and testing with users
  • Formal methods based on analytical models of interaction
• There are other differences between HCI and HRI that make some of the evaluation techniques less applicable.
  • Scholtz (2003) reviewed these differences along six dimensions.
• The focus of this article is on a technique for HRI evaluation based on detailed cognitive modeling.
Introduction (2/5)
• Proponents of cognitive modeling can point to a number of potential results that make pursuit of the approach worthwhile.
• Cognitive models (CMs) provide accounts of user behavior that generalize across a variety of environments and tasks.
  • User and model can perform the tasks in those environments.
• CMs can offer such predictions and explanations at a more detailed level than is often possible with other techniques.
  • They show what cognitive, perceptual, and motor mechanisms are being used.
• CMs offer new opportunities for experimental evaluation of interactive systems.
  • They are less expensive than real users.
• CMs are dynamic.
  • Interactions may be difficult to specify and to analyze with static descriptions.
• An important first challenge is determining whether it is feasible to build low-level cognitive models that can carry out HRI tasks as surrogate users working directly with interfaces.
Introduction (3/5)
• Cognitive Models for Evaluating Interactive Systems
• Typical user interface evaluation process using a cognitive model:
  • Pilot studies and literature review
  • Developing a cognitive model for one or more tasks
  • Designing an experiment
  • Collecting data for comparison between human and model performance
  • Validating and improving the cognitive model
Introduction (4/5)
• Implications for User Modeling in HRI
• Some of the requirements for interaction in a robot control task. The user must:
  • Carry out a repetitive task in a changing environment.
  • Interpret scenes captured by one or more cameras on a robot.
  • Deal with unfamiliar classes of information from novel sensors.
  • Integrate information from a physical and a virtual environment.
  • Often provide high-level guidance to a robot.
  • Control multiple robots at the same time.
• Scholtz et al. (2004) described six dimensions in which HRI differs from HCI:
  • A wider variety of roles that users and operators play in controlling a robot.
  • The less predictable ways in which a robot platform interacts with the physical environment.
  • The dynamic behavior of the hardware (leading to autonomous changes).
  • The nature of the user’s interaction environment.
  • The number of robots that the user controls.
  • The autonomous behavior of the robot.
Introduction (5/5)
• Implications for User Modeling in HRI
• The user in an HRI environment must potentially deal with several coupled problem-solving environments:
  • Decision making in the environment (partially observable, dynamic, continuous).
  • Monitoring of robot hardware and robot behaviors in the environment.
  • Interacting with an HRI control interface.
• In the environmental features noted previously, there are two recurring issues:
  • The need to interpret a complex information environment (most of the data is provided by visual channels).
  • The need to manage complexity in controlling the behavior of robots.
More Direct Visual Processing for a Cognitive Architecture (1/5)
• When evaluating a computer system, either in an HCI or an HRI context, a cognitive model needs access to the information that a user sees.
• An open issue is how environmental information can best be translated into visual objects.
• ACT-R models typically receive information about a visual environment through specialized functions that interact with specially instrumented interface windows.
  • These windows are built into a tool that is included with ACT-R to create simulated task environments.
• We worked toward extending the visual-processing capabilities of a cognitive modeling architecture to use the environment without modifications.
More Direct Visual Processing for a Cognitive Architecture (2/5)
• SegMan (Segmentation/Manipulation) Overview
  • Image-processing substrate (cognitive modeling component)
  • Extends the visual processing of ACT-R (and other cognitive architectures) to work with interfaces by parsing their bitmaps.
SegMan serves as an intermediary between an environment (visual scene) and the cognitive modeling system (ACT-R). (Source: St. Amant et al., 2005)
More Direct Visual Processing for a Cognitive Architecture (3/5)
• SegMan’s three types of visual properties (region/group properties):
  • Properties of pixel regions
  • Relationships between pixel regions
  • Composite properties of pixel groups
• Figure 1 illustrates four pixel regions.
Figure 1. A schematic of four pixel regions, with annotations.
• SegMan summarizes the neighbor value of a pixel p with the function $v(p) = 2^2 + 2^4 + 2^6 = 84$.
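The arithmetic in the neighbor value above can be illustrated with a minimal sketch. It assumes, since the slide does not spell this out, that each of a pixel's eight neighbors has an index i from 0 to 7 and contributes 2^i when it has the same color as the pixel. The function name `neighbor_value` and the ordering in `NEIGHBOR_OFFSETS` are hypothetical and are not taken from SegMan.

```python
# Hypothetical neighbor numbering (an assumption, not taken from the slides):
# index i in 0..7 identifies one of the eight pixels surrounding p.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, 1), (1, 1), (1, 0),
    (1, -1), (0, -1),
]


def neighbor_value(grid, row, col):
    """Sum 2**i over every neighbor i whose color equals that of (row, col)."""
    color = grid[row][col]
    value = 0
    for i, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        r, c = row + dr, col + dc
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        if inside and grid[r][c] == color:
            value += 2 ** i
    return value


# Center pixel whose neighbors 2, 4, and 6 share its color.
example = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(neighbor_value(example, 1, 1))  # 4 + 16 + 64 = 84
```

Under this encoding every combination of matching neighbors maps to a distinct value between 0 and 255, so a single integer records the local shape around a pixel.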
More Direct Visual Processing for a Cognitive Architecture (4/5)
• SegMan Operation
• SegMan generates and maintains its representation in three stages:
  • Segmentation: pixels are sampled at different resolutions.
  • Feature computation: properties are computed for each pixel region, regions are combined into pixel groups, and group properties are computed.
  • Interpretation: a top-down matching process using library templates.
• SegMan’s current design is strongly influenced by Roelfsema’s (2005) theory of visual processing.
• Roelfsema (2005) identified two main classes of elemental operators:
  • Binding operators: cuing/searching/tracing/region filling/association
  • Maintenance/matching operators: record intermediate results for further processing.
• Cuing and Searching
  • Cuing: identifying the properties of the visual pattern at a specified location.
  • Search: determining the location of a predefined visual pattern.
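The difference between cuing and search can be sketched on a small pixel grid: cuing starts from a location and returns properties, while search starts from a pattern and returns a location. This is only an illustration under simplifying assumptions (a grid of integer colors, exact pattern matching). The names `cue` and `search` are hypothetical and do not refer to SegMan or ACT-R functions.

```python
def cue(grid, row, col):
    """Cuing: report properties of whatever is at a given location."""
    return {"location": (row, col), "color": grid[row][col]}


def search(grid, pattern):
    """Search: return the top-left location of the first exact match, or None."""
    rows, cols = len(grid), len(grid[0])
    p_rows, p_cols = len(pattern), len(pattern[0])
    for r in range(rows - p_rows + 1):
        for c in range(cols - p_cols + 1):
            if all(
                grid[r + i][c + j] == pattern[i][j]
                for i in range(p_rows)
                for j in range(p_cols)
            ):
                return (r, c)
    return None


screen = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(cue(screen, 1, 2))
print(search(screen, [[1, 1], [1, 1]]))
```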
More Direct Visual Processing for a Cognitive Architecture (5/5)
• Tracing and Region Filling
  • Tracing: iteratively identifying connectivity properties between simple visual elements.
  • Region filling: can be viewed as a generalization of curve tracing to two dimensions, identifying areas to be characterized as single visual objects.
• Association
  • The linkage of features that co-occur repeatedly.
• Matching
  • The process of identifying similarities (or differences) between stimuli.
• Tracking moving objects
  • SegMan reprocesses information iteratively, and it maintains a representation of its immediately preceding results.
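Region filling, as described above, can be illustrated with a standard flood fill. This is a generic sketch and not SegMan's implementation: it assumes a grid of integer colors and 4-connectivity (up, down, left, right), and the name `fill_region` is hypothetical.

```python
from collections import deque


def fill_region(grid, start):
    """Return the set of same-colored pixels 4-connected to the start pixel."""
    color = grid[start[0]][start[1]]
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < len(grid)
                and 0 <= nc < len(grid[0])
                and (nr, nc) not in seen
                and grid[nr][nc] == color
            ):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


image = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
]
# The pixel at (2, 0) has the same color but is not 4-connected,
# so it ends up in a separate region.
print(sorted(fill_region(image, (0, 0))))
```

Each call groups one connected area into a single candidate visual object, which is the sense in which filling extends curve tracing from one dimension to two.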
Exploring This Approach in an Example HRI Task (1/2)
• SegMan can be evaluated in two ways.
  • First, can it perform the task of interest using psychologically plausible mechanisms?
    • We have used SegMan in combination with ACT-R to model a variety of tasks.
  • Second, is the system’s performance adequate for dynamic tasks?
    • We carried out both an evaluative and a functional test of the combined SegMan and ACT-R architecture as part of a larger HRI study.
• HRI User Model
  • The model performs the same driving task as humans do.
  • The model uses a pseudo-fovea that is focused on the camera view window.
  • A pixel region template was defined to identify the path as a visual object in the camera view.
Exploring This Approach in an Example HRI Task (2/2)
• HRI Study
  • Participants: 30 students
  • Users were divided into three groups: visible / previously seen / unseen conditions
  • Task: navigate the robot to pick up the cup and return, and program the robot to do two tasks with its visual programming language.
Figure 2. The ER1 Robot System
Figure 3. The HRI interface that users saw when driving the ER1 robot remotely.
Figure 4. The physical task environment for the human users.
Results (1/2)
Figure 5. Comparison of the model’s predicted task time with human performance.
Figure 6. Comparison of learning on the predicted and actual task time for the model and three conditions.
Figure 7. The number of mouse clicks by the human drivers in each group and by the model to navigate the course and pick up the cup.
Figure 8. Average mouse click duration for human drivers in each group and by the model.
Results (2/2)
• John and Newell (1989) recommend using models as reasonable approximations of human performance if outputs do not deviate by more than 20% from observed data.
• This model does not account for the differences between the conditions in the human navigation experiment.
  • We used the different conditions to explore performance in this task. The model most closely approximates users who are in the seen condition.
• Limitations of the model as a representation of human performance
  • Model development and veridicality
    • The model was not tailored to the differences between the previously seen, unseen, and visible conditions for users.
  • Environmental constraints
    • The navigation path, without obstacles, was basically static and deterministic.
  • Task constraints
    • The navigation task was very simple in form.
  • Model constraints
    • The model resorted to random search when the path was no longer visible.
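The 20% criterion cited above amounts to a relative-deviation check, sketched below. The function name `within_tolerance` is hypothetical, and the numbers in the usage lines are made-up placeholders, not values from the study.

```python
def within_tolerance(predicted, observed, tolerance=0.20):
    """True if the prediction deviates from the observation by at most `tolerance`."""
    if observed == 0:
        raise ValueError("observed value must be non-zero")
    return abs(predicted - observed) / abs(observed) <= tolerance


# Placeholder values for illustration only (not data from the study).
print(within_tolerance(predicted=110.0, observed=100.0))  # 10% deviation
print(within_tolerance(predicted=130.0, observed=100.0))  # 30% deviation
```

The deviation is taken relative to the observed human data, so the same check applies to task times, click counts, or click durations.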
Conclusion
• Recommendations for SegMan and ACT-R
  • The use of SegMan should be seen as an extension of ACT-R to support direct interaction with existing interfaces.
  • One modification to SegMan was necessary to support interacting with this more complex task.
• Recommendations for HRI Interface Design
  • This study has not yet progressed to the point that it is possible to identify interaction problems that cannot be identified by more conventional evaluation methods, such as inspection or simple user testing.
• Limitations, Implications, and Future Work
  • More tasks will have to be modeled before evaluation can be done on a routine basis.
  • Because SegMan interacts with hardware, time predictions are more difficult to record accurately: the model’s timing has to be synchronized with the robot platform, and the model has to run in real time.
Roelfsema’s elemental operators in vision
• Binding operators
  • Binding operators establish groupings among visual objects that are not computed in early visual processing.
• Maintenance operators
  • It is not enough to select an object; the observer must also be able to maintain the object in memory for future cognitive manipulations.