Learn about evaluating mixed initiative systems at micro and macro levels using metrics like error rate, precision, recall, and more. Understand the impact on user behavior and system performance for adaptive personalization. Discover insights on writing better proposals for funding mixed initiative systems.
Evaluation of Mixed Initiative Systems Michael J. Pazzani University of California, Irvine National Science Foundation
Overview • Evaluation • Micro-level: Modules • Macro-level: Behavior of System Users • Caution: Don’t lose sight of the goal in evaluation • National Science Foundation • CISE (Re)organization • Funding for Mixed Initiative Systems • Tip on writing better proposals: Evaluate
Evaluation • Micro level • Does the module (machine learning, user modeling, information retrieval and visualization, etc.) work properly? • Has been responsible for measurable progress in most specialized domains of intelligent systems • Relatively easy to do using well-known metrics: error rate, precision, recall, time and space complexity, goodness of fit, ROC curves • Builds upon a long history in the “hard” sciences and engineering
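As an illustration of the micro-level metrics named on this slide, the following minimal sketch computes error rate, precision, and recall for a module that makes binary relevant / not-relevant predictions. The function name `micro_metrics` and the label lists are hypothetical and are not taken from the talk.

```python
def micro_metrics(y_true, y_pred):
    """Return error rate, precision, and recall for binary labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one labeled example")
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # relevant, predicted relevant
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not relevant, predicted relevant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # relevant, missed
    return {
        "error_rate": (fp + fn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and module predictions (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(micro_metrics(y_true, y_pred))
# {'error_rate': 0.25, 'precision': 0.75, 'recall': 0.75}
```

Each of these numbers describes the module in isolation; none of them says whether users of the complete system behave differently.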
Evaluation • Macro level • Does the complex system, involving a user and a machine, work as desired? • Builds upon a history of human (and animal) experimentation, not always taught in (or respected by) engineering schools • Allows controlled experiments comparing two systems (or one system with two variations)
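One common way to analyze a controlled experiment that compares two systems is a two-proportion z-test on a user-level outcome, such as whether a user read at least one story. The sketch below is an assumption about how such an analysis might look, not the method used in the talk; the function name and the user counts are hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two independent proportions.

    Returns (z, p_value). A positive z means variant B has the higher rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both systems perform the same.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical counts: 100 users per system, 40 vs. 64 of them read a story.
z, p = two_proportion_z_test(40, 100, 64, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation assumes reasonably large samples and independent users; with small groups, an exact test would be the safer choice.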
Micro: Speed to Effectiveness • Initially, AIS is as effective as a static system in finding relevant content. After only one usage, the benefits of AdaptiveInfo's Intelligent Wireless Specific Personalization are clear; after three sessions, even more so; and after 10 sessions, the full benefits of Adaptive Personalization are realized.
Macro: Probability a Story is Read • A user has a 40% probability of reading one of the top 4 stories selected by an editor, but a 64% chance of reading one of the top 4 personalized stories: the AIS user is 60% more likely to select a story than a non-AIS user.
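The 60% figure is a relative increase, not a difference in percentage points: going from 40% to 64% is a gain of 24 points, and 24 / 40 = 0.6. A minimal sketch of the arithmetic, using a hypothetical helper named `relative_lift`:

```python
def relative_lift(baseline_rate, treatment_rate):
    """Relative increase of treatment_rate over baseline_rate (0.6 means 60% more)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (treatment_rate - baseline_rate) / baseline_rate


editor_rate = 0.40        # top 4 stories selected by an editor
personalized_rate = 0.64  # top 4 personalized stories

lift = relative_lift(editor_rate, personalized_rate)
points = (personalized_rate - editor_rate) * 100
print(f"absolute gain: {points:.0f} points, relative lift: {lift:.0%}")
# absolute gain: 24 points, relative lift: 60%
```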
Macro: Increased Page Views • After looking at 3 or more screens of headlines, users read 43% more of the personally selected news stories, clearly showing AIS's ability to dramatically increase the stickiness of a wireless web application.
Macro: Readership and Stickiness • 20% more LA Times users who receive personalized news return to the wireless site 6 weeks after the first usage.
Cautions • Optimizing a micro-level evaluation may have little impact at the macro level. It may even have a counterintuitive effect: • If personalization causes a noticeable delay, it may decrease readership • Don’t lose sight of the goal. • The metrics are just approximations of the goal. • Optimizing the metric may not optimize the goal.
R&D within the NSF Organization • Office of the Director • Directorate for Biological Sciences • Directorate for Geosciences • Directorate for Computer and Information Science and Engineering • Directorate for Mathematical and Physical Sciences • Directorate for Education and Human Resources • Directorate for Social, Behavioral and Economic Sciences • Directorate for Engineering
CISE Directorate: 2004 • Computing & Communications Foundations • Computer Networks & Systems • Information and Intelligent Systems (IIS) • Deployed Infrastructure
Information and Intelligent Systems Programs • Information and Data Management • Artificial Intelligence and Cognitive Science • Human Language and Communication • Robotics and Computer Vision • Digital Society and Technologies • Human Computer Interaction • Universal Access • Digital Libraries • Science and Engineering Informatics
Types of proposals/awards • IIS regular proposals (250-600K, 3 years): deadline 12/12 • CAREER Program (400-500K, 5 years): late July • REU & RET supplements (10-30K, 1 year): 3/1 • Information Technology Research (ITR): probably Feb
NSF Merit Review Criteria • Looking for important, innovative, achievable projects • Criterion 1: What is the intellectual merit and quality of the proposed activity? • Criterion 2: What are the broader impacts of the proposed activity? • NSF will return a proposal without review if the single-page proposal summary does not address each criterion in a separate statement • An evaluation plan covering both the micro and macro levels is essential, using metrics that you propose (and your peers believe are appropriate)