Evaluation of Mixed Initiative Systems

Michael J. Pazzani
University of California, Irvine
National Science Foundation

Overview
• Evaluation
• Micro-level: Modules
• Macro-level: Behavior of System Users
• Caution: Don’t lose sight of the goal in evaluation
• National Science Foundation
• CISE (Re)organization
• Funding for Mixed Initiative Systems
• Tip on writing better proposals: Evaluate

Evaluation
• Micro level
• Does the module (machine learning, user modeling, information retrieval and visualization, etc.) work properly?
• Has been responsible for measurable progress in most specialized domains of intelligent systems
• Relatively easy to do using well-known metrics: error rate, precision, recall, time and space complexity, goodness of fit, ROC curves
• Builds upon a long history in the “hard” sciences and engineering
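As a rough illustration of the micro-level metrics named above, the sketch below computes error rate, precision, and recall for a module that labels items as relevant (1) or not relevant (0). The function name `micro_metrics` and the label lists are hypothetical and are not taken from the presentation.

```python
def micro_metrics(predicted, actual):
    """Return error rate, precision, and recall for binary relevance labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")

    # Count the confusion-matrix cells needed by the three metrics.
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    return {
        # Fraction of items the module labeled incorrectly.
        "error_rate": (fp + fn) / len(actual),
        # Of the items predicted relevant, the fraction that really are.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the truly relevant items, the fraction the module found.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for eight news stories (1 = relevant to the user).
predicted = [1, 1, 0, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0, 1, 0]

metrics = micro_metrics(predicted, actual)
print(metrics)
```

These numbers describe the module in isolation; they say nothing yet about whether users of the full system behave differently.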

Evaluation
• Macro level
• Does the complex system, involving a user and a machine, work as desired?
• Builds upon a history in human (and animal) experimentation, not always taught in (or respected by) engineering schools
• Allows controlled experiments comparing two systems (or one system with two variations)
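One common way to analyze a controlled experiment comparing two system variations is a two-proportion z-test on a user-behavior rate. The sketch below is one possible analysis, not a method the presentation prescribes; the function name and the user counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two independent proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b

    # Pooled proportion under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("standard error is zero; the test is undefined")

    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: users who read a story under variation A vs. variation B.
z, p_value = two_proportion_z_test(successes_a=80, n_a=200, successes_b=110, n_b=200)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Randomly assigning users to the two variations is what makes the comparison controlled; the test only quantifies how surprising the observed difference would be by chance.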

Adaptive Personalization

Micro: Evaluating the Hybrid User Model

Micro: Speed to Effectiveness
Initially, AIS is as effective as a static system in finding relevant content. After only one usage, the benefits of AdaptiveInfo's Intelligent Wireless Specific Personalization are clear; after three sessions, even more so; and after 10 sessions, the full benefits of Adaptive Personalization are realized.

Macro: Probability a Story is Read
A user has a 40% probability of reading one of the top 4 stories selected by an editor, but a 64% probability of reading one of the top 4 personalized stories. The AIS user is therefore 60% more likely to select a story than a non-AIS user.
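The 60% figure is a relative increase, not a difference in percentage points. A minimal sketch of the arithmetic, using the 40% and 64% rates from the slide (the function name `relative_lift` is an assumption for illustration):

```python
def relative_lift(baseline_rate, treatment_rate):
    """Relative increase of the treatment rate over the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (treatment_rate - baseline_rate) / baseline_rate


editor_rate = 0.40        # top 4 stories selected by an editor
personalized_rate = 0.64  # top 4 personalized stories

lift = relative_lift(editor_rate, personalized_rate)
point_gain = (personalized_rate - editor_rate) * 100

print(f"relative lift: {lift:.0%}")                    # prints "relative lift: 60%"
print(f"absolute gain: {point_gain:.0f} percentage points")  # prints "absolute gain: 24 percentage points"
```

Reporting both numbers avoids confusing the 24-point absolute gain with the 60% relative lift.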

Macro: Increased Page Views
After looking at 3 or more screens of headlines, users read 43% more of the personally selected news stories, showing that AIS can dramatically increase the stickiness of a wireless web application.

Macro: Readership and Stickiness
20% more LA Times users who receive personalized news return to the wireless site 6 weeks after the first usage.

Cautions
• Optimizing a micro-level evaluation may have little impact on the macro level. It may even have a counter-intuitive effect:
• If personalization causes a noticeable delay, it may decrease readership.
• Don’t lose sight of the goal.
• The metrics are just approximations of the goal.
• Optimizing the metric may not optimize the goal.

R&D within the NSF Organization
• Office of the Director
• Directorate for Biological Sciences
• Directorate for Geosciences
• Directorate for Computer and Information Sciences and Engineering
• Directorate for Mathematical and Physical Sciences
• Directorate for Education and Human Resources
• Directorate for Social, Behavioral and Economic Sciences
• Directorate for Engineering

CISE Directorate: 2004
• Computing & Communications Foundations
• Computer Networks & Systems
• Information and Intelligent Systems (IIS)
• Deployed Infrastructure

Information and Intelligent Systems Programs
• Information and Data Management
• Artificial Intelligence and Cognitive Science
• Human Language and Communication
• Robotics and Computer Vision
• Digital Society and Technologies
• Human Computer Interaction
• Universal Access
• Digital Libraries
• Science and Engineering Informatics

Types of proposals/awards
• IIS regular proposals (250-600K, 3 years): deadline 12/12
• CAREER Program (400-500K, 5 years): late July
• REU & RET supplements (10-30K, 1 year): 3/1
• Information Technology Research (ITR): probably February

NSF Merit Review Criteria
Looking for important, innovative, achievable projects
• Criterion 1: What is the intellectual merit and quality of the proposed activity?
• Criterion 2: What are the broader impacts of the proposed activity?
NSF will return a proposal without review if the single-page proposal summary does not address each criterion in a separate statement.
An evaluation plan covering both the micro and macro levels is essential, using metrics that you propose (and that your peers believe are appropriate).
