Toward a Reliable Evaluation of Mixed-Initiative Systems
Gabriella Cortellessa and Amedeo Cesta
National Research Council of Italy, Institute for Cognitive Science and Technology, Rome, Italy
Outline • Motivations • Aims of the study • Users’ attitude towards the mixed-initiative paradigm • Role of explanation during problem solving • Evaluation Method • Results • Conclusions and future work
Motivations • Lack of studies investigating users' attitude towards the mixed-initiative solving paradigm • Lack of methodologies for evaluating the different aspects of mixed-initiative problem solving. This work applies an experimental approach (drawn from HCI and Psychology) to understanding users' attitude towards the mixed-initiative approach and to investigating the importance of explanation as a means to foster users' involvement in problem solving.
[Figure: an artificial problem solver coupled with an interaction module and the user, illustrating the two alternative problem solving approaches: automated and mixed-initiative]
Evaluating Mixed-Initiative Systems • Measuring the overall problem solving performance: the human-artificial system pair is expected to exhibit better performance (metrics). • Evaluating aspects related to users' requirements and judgment of the system: usability, level of trust, clarity of presentation, user satisfaction, etc.
Aims of the study • Users' attitude towards solving strategy selection: automated vs. mixed-initiative • The recourse to explanation during problem solving: explanations for the solver's choices and failures • Differences between experts and non-experts
Solving strategy selection. No empirical studies in the mixed-initiative area explore the context of strategy selection (who chooses a solving strategy, and why). However: Decision Support Systems • Empirical evidence of low trust in automated advice during decision making (Jones & Brown, 2002). Human-Computer Interaction • The artificial solver is perceived as a competitor rather than a collaborator (Langer, 1992; Nass & Moon, 2000).
Solving strategy selection: Hypotheses. Two variables are expected to influence the selection of the solving strategy (automated vs. mixed-initiative): user's expertise and problem difficulty. Hypothesis 1: Expert users are expected to exploit the automated procedure more than non-experts; conversely, non-expert users are expected to exploit the mixed-initiative approach more than experts. Hypothesis 1a: Inexperienced users are expected to prefer the mixed-initiative approach when solving easy problems and the automated strategy when solving difficult problems, while expert users are expected to show the opposite behavior.
Explanation Recourse. No empirical studies in the mixed-initiative research field investigate the role of explanations in cooperative problem solving. However: Knowledge-Based Systems • Explanation recourse is more frequent in case of system failures (Gilbert, 1989; Schank, 1986; Chandrasekaran & Mittal, 1999) • Explanation recourse is more frequent in collaborative problem solving (Gregor, 2001) • There are individual differences in the motivations for explanation recourse (Mao & Benbasat, 1996; Ye, 1995)
Explanation Recourse: Hypotheses. The following variables are expected to influence the recourse to explanation: user's expertise, problem difficulty, strategy selection, and failure. Hypothesis 2: Access to explanation is more frequent in case of failure than in case of success. Hypothesis 3: Access to explanation is related to the solving strategy selection. • In particular, participants who choose the automated solving strategy access explanations more frequently than those who use the mixed-initiative approach.
Explanation Recourse: Hypotheses. Hypothesis 4: During problem solving, non-experts access explanations more frequently than experts. Hypothesis 5: Access to explanation is more frequent in case of difficult problems.
Evaluation Method • Participants: 96 participants balanced with respect to gender, education, age and profession, subdivided into two groups based on level of expertise (40 experts and 56 non-experts). • Experimental apparatus: the COMIREM problem solver; planning and scheduling problems. • Procedure: web-based apparatus. • Stimuli: problem solutions; questionnaires.
A mixed-initiative problem solver: COMIREM. COMIREM (Continuous Mixed-Initiative Resource Management) was developed at Carnegie Mellon University (Smith et al., 2003). [Figure: automated solver coupled with an interaction module and the user]
Procedure
• Web-based apparatus (http://pst2.istc.cnr.it/experiment) backed by a database
• Training session
• Two experimental sessions, presented in random order:
  • Session 1: easy problems, followed by Questionnaire 1
  • Session 2: difficult problems, followed by Questionnaire 2
• For each session, participants were asked to choose between the mixed-initiative and the automated strategy
Tasks and Stimuli • 4 scheduling problems defined in the domain of broadcast TV station resource management: 2 solvable, 2 unsolvable. Questionnaires aimed at: • Assessing the difficulty of the task on a 5-point Likert scale (manipulation check for the difficulty variable) • Evaluating the clarity of textual and graphical representations (5-point Likert scale) • Investigating the reasons for choosing the selected strategy (multiple choice) • Studying the reasons for accessing the explanation (2nd questionnaire only)
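To make the manipulation check concrete, the sketch below (an assumption, not the authors' code; the ratings are synthetic placeholders, and only the 5-point scale and N = 96 come from the slides) compares each participant's difficulty ratings for the easy and difficult sessions with a Wilcoxon signed-rank test, a common choice for ordinal Likert data:

```python
# Sketch of the difficulty manipulation check (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Synthetic 5-point Likert ratings for the 96 participants; the real
# per-participant ratings were not published.
rating_easy = rng.integers(1, 4, size=96)       # ratings 1-3
rating_difficult = rng.integers(3, 6, size=96)  # ratings 3-5

# Paired, non-parametric comparison of the two sessions' ratings.
w_stat, p_val = stats.wilcoxon(rating_easy, rating_difficult)
print(f"Wilcoxon W = {w_stat:.1f}, p = {p_val:.4g}")
```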
Results: Solving Strategy Selection
Influence of expertise on solving strategy selection (statistics)

Dependent variables (number of times each strategy was chosen per participant):

Choice_auto                N     Mean     Std. Deviation
  Non Expert               56    .6786    .7653
  Expert                   40    1.3750   .7048
  Total                    96    .9688    .8137

Choice_mixed               N     Mean     Std. Deviation
  Non Expert               56    1.3214   .7653
  Expert                   40    .6250    .7048
  Total                    96    1.0313   .8137

Effect of expertise: F(1,94) = 20.62, p < .001
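As an illustration of this analysis, here is a minimal sketch (assuming a one-way ANOVA on per-participant counts, consistent with the reported F(1,94); the data arrays are synthetic placeholders, since only the group sizes and summary statistics were published):

```python
# Sketch of the expertise effect on automated-strategy choices (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-participant counts of automated choices (0-2 across the
# two sessions); only the group sizes (56 and 40) come from the study.
non_experts = rng.binomial(2, 0.34, size=56)  # fewer automated choices
experts = rng.binomial(2, 0.69, size=40)      # more automated choices

f_stat, p_val = stats.f_oneway(non_experts, experts)
df_within = len(non_experts) + len(experts) - 2
print(f"F(1, {df_within}) = {f_stat:.2f}, p = {p_val:.4f}")
```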
Influence of expertise on strategy. Hypothesis 1: Solving strategy selection (automated vs. mixed-initiative) depends upon users' expertise. VERIFIED (p < .001): experts prefer the automated strategy, non-experts the mixed-initiative one.
Influence of difficulty on strategy

Easy problems              Automated   Mixed
  Non expert               24          32
  Expert                   30          10
  Total                    54          42
Chi-square = 9.80, df = 1, p < .01

Difficult problems         Automated   Mixed
  Non expert               24          32
  Expert                   25          15
  Total                    49          47
Chi-square = 3.6, df = 1, n.s.
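The easy-problem statistic can be reproduced directly from the contingency table above; the sketch below uses the published counts and a Pearson chi-square without continuity correction, which matches the reported 9.80 (the use of scipy here is my choice, not the authors'):

```python
# Chi-square test on the easy-problem strategy choices (counts from the slide).
from scipy.stats import chi2_contingency

#                Automated  Mixed
easy = [[24, 32],   # non-experts
        [30, 10]]   # experts

chi2, p, dof, expected = chi2_contingency(easy, correction=False)
print(f"Chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")  # ~9.80, df = 1
```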
Influence of difficulty on strategy • Hypothesis 1a: Solving strategy selection (automated vs. mixed-initiative) is related to problem difficulty. PARTIALLY VERIFIED: for easy problems, experts chose the automated strategy and non-experts the mixed-initiative one (p < .01); for difficult problems the difference was not significant.
Reasons for strategy selection (answer distributions; charts omitted)
• Automated, easy problems: Chi-square = .92, df = 2, n.s.
• Automated, difficult problems: Chi-square = 3.9, df = 2, p < .05
• Mixed, easy problems: Chi-square = 1.32, df = 2, n.s.
• Mixed, difficult problems: Chi-square = 1.15, df = 2, n.s.
Results: Explanation Recourse
Influence of failures on explanation (statistics)

Dependent variables (explanation access rate):

                   N     Mean     Std. Deviation
Access_failure     90    .8111    .3716
Access_correct     90    .3702    .3354

F(1,89) = 85.37, p < .001

Correlation analysis: r = .035, n.s. (in case of failure); r = .86, p < .001 (in case of success)
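Since each of the 90 participants contributes both an after-failure and an after-success access rate, this is a paired design; the sketch below (synthetic placeholder data matching only the published N, means and SDs) uses a paired t-test, whose squared statistic equals the one-way repeated-measures F reported above:

```python
# Sketch of the failure-vs-success comparison as a paired design (hypothetical data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic access rates for 90 participants; real per-participant
# values were not published.
access_failure = np.clip(rng.normal(0.81, 0.37, size=90), 0, 1)
access_success = np.clip(rng.normal(0.37, 0.34, size=90), 0, 1)

t_stat, p_val = stats.ttest_rel(access_failure, access_success)
print(f"F(1, 89) = {t_stat**2:.2f}, p = {p_val:.4g}")  # F = t**2 for a paired design
```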
Influence of failures on explanation. Hypothesis 2: Access to explanation is more frequent in case of failure than in case of success. VERIFIED (p < .001).
Influence of strategy on explanation (statistics)

Explanation access rate by chosen strategy (the N per strategy matches the choice counts above):

Easy problems              N     Mean     Std. Deviation
  Automated                54    .8769    .3373
  Mixed                    42    .2802    .3202
  Total                    96    .6158    .4430
F(1,94) = 77.26, p < .001

Difficult problems         N     Mean     Std. Deviation
  Automated                49    .6297    .2959
  Mixed                    47    .2790    .2709
  Total                    96    .4580    .3329
F(1,94) = 36.60, p < .05
Influence of strategy on explanation. Hypothesis 3: Access to explanation is related to the solving strategy selection; it is more frequent when the automated strategy is chosen. VERIFIED: easy problems p < .001; difficult problems p < .05.
Influence of expertise and difficulty on explanation (statistics)

Explanation access rate by expertise:

Access_easy                N     Mean     Std. Deviation
  Non Expert               56    .5423    .4760
  Expert                   40    .7187    .3740
  Total                    96    .6158    .4430

Access_difficult           N     Mean     Std. Deviation
  Non Expert               56    .3829    .3177
  Expert                   40    .5632    .3289
  Total                    96    .4580    .3329

Expertise: F(1,94) = 7.34, p < .01
Difficulty: F(1,94) = 12.54, p < .01
Interaction: F(1,94) = .002, n.s.
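This analysis crosses a between-subjects factor (expertise) with a within-subjects one (difficulty), i.e. a 2x2 mixed ANOVA. A minimal sketch follows (synthetic placeholder data built from the published group sizes and means; pingouin's mixed_anova is one way to run it, an assumption rather than the authors' tooling):

```python
# Sketch of the 2x2 mixed ANOVA: expertise (between) x difficulty (within).
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(2)
rows = []
for subj in range(96):
    expertise = "expert" if subj < 40 else "non_expert"
    base = 0.64 if expertise == "expert" else 0.46  # per-group grand means
    for difficulty, shift in (("easy", 0.09), ("difficult", -0.09)):
        rows.append({"subject": subj, "expertise": expertise,
                     "difficulty": difficulty,
                     "access": float(np.clip(base + shift + rng.normal(0, 0.3), 0, 1))})
df = pd.DataFrame(rows)  # synthetic; real per-participant data unpublished

print(pg.mixed_anova(data=df, dv="access", within="difficulty",
                     subject="subject", between="expertise"))
```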
Influence of expertise and difficulty on explanation • Hypotheses 4 and 5: • During problem solving, non-experts rely on explanation more frequently than experts • Access to explanation is more frequent in case of difficult problems. FALSIFIED: both effects are significant (p < .01) but in the opposite direction; experts accessed explanations more frequently than non-experts, and access was more frequent for easy problems.
Reasons for accessing explanation (chart omitted) • Reasons reported by non-experts and experts: understanding the problem vs. understanding the automated solver's choices • No significant difference between the two groups: Chi-square = 2.28, df = 1, n.s.
Conclusions • Solving strategy selection depends upon users' expertise: experts prefer the automated strategy, non-experts the mixed-initiative one • The mixed-initiative approach is chosen to maintain control over the problem solving • Explanation during problem solving is frequently accessed (73 out of 96 respondents), the access being more frequent in case of failures during problem solving and when using the automated strategy • Explanation is accessed to understand the solver's choices
Contributions • Empirical evidence that the mixed-initiative approach responds to a specific need of end users to keep control over automated systems • The study confirms the need for developing problem solving systems in which humans play an active role • Need for designing different interaction styles to support existing individual differences (e.g., experts vs. non-experts) • Empirical evidence of the usefulness of explanation during problem solving; failures were identified as a main prompt increasing the frequency of access to explanation
Remarks • Need for designing evaluation studies that take into consideration the human component of mixed-initiative systems (importing methodologies from other fields) • At present we have inherited methodologies from disciplines such as HCI and Psychology and adapted them to our specific case • The same approach can be followed to broaden the testing of other mixed-initiative features
Future work • Investigating the impact of strategy (automated vs. mixed-initiative) and explanation recourse on problem solving performance • Applying the evaluation methodology to measure other features of mixed-initiative systems • Synthesizing "user-oriented" explanations