
Evaluation of Mixed Initiative Systems

This presentation covers evaluating mixed initiative systems at the micro level (modules) and macro level (user behavior) using metrics such as error rate, precision, and recall; results from adaptive personalization of wireless news; and advice on writing better NSF proposals for funding mixed initiative systems.


Presentation Transcript


  1. Evaluation of Mixed Initiative Systems Michael J. Pazzani, University of California, Irvine / National Science Foundation

  2. Overview • Evaluation • Micro-level: Modules • Macro-level: Behavior of System Users • Caution: Don’t lose sight of the goal in evaluation • National Science Foundation • CISE (Re)organization • Funding for Mixed Initiative Systems • Tip on writing better proposals: Evaluate

  3. Evaluation • Micro level • Does the module (machine learning, user modeling, information retrieval and visualization, etc.) work properly? • Has been responsible for measurable progress in most specialized domains of intelligent systems • Relatively easy to do using well-known metrics: error rate, precision, recall, time and space complexity, goodness of fit, ROC curves • Builds upon a long history in the “hard” sciences and engineering
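The slide names standard metrics without naming tools; as an illustration only, here is a minimal Python sketch of a micro-level module evaluation, assuming a binary classifier and scikit-learn (an assumption; the talk prescribes no library), with placeholder labels and scores:

```python
# Micro-level evaluation sketch: error rate, precision, recall, ROC AUC.
# Labels and scores below are illustrative placeholders, not real data.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # gold labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # ranking scores

print("error rate:", 1 - accuracy_score(y_true, y_pred))
print("precision: ", precision_score(y_true, y_pred))
print("recall:    ", recall_score(y_true, y_pred))
print("ROC AUC:   ", roc_auc_score(y_true, y_score))
```

A few calls like these cover most of the metrics the slide lists; time and space complexity and goodness of fit are analyzed separately.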

  4. Evaluation • Macro level • Does the complex system, involving a user and a machine, work as desired? • Builds upon a history in human (and animal) experimentation, not always taught in (or respected by) engineering schools • Allows controlled experiments comparing two systems (or one system with two variations)
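A controlled experiment comparing two systems often reduces to comparing two proportions, such as the fraction of users who read a story under each variant. Below is a minimal sketch, assuming SciPy and wholly hypothetical counts (the user totals are invented; they loosely echo the 40% vs. 64% readership figures reported on slide 8):

```python
# Macro-level evaluation sketch: two-proportion z-test comparing variant A
# (editor-selected stories) with variant B (personalized stories).
# All counts are hypothetical illustrations, not data from the talk.
from math import sqrt
from scipy.stats import norm

clicks_a, users_a = 160, 400   # variant A: 40% read a story
clicks_b, users_b = 256, 400   # variant B: 64% read a story

p_a, p_b = clicks_a / users_a, clicks_b / users_b
p_pool = (clicks_a + clicks_b) / (users_a + users_b)
se = sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))   # two-sided test

print(f"z = {z:.2f}, p = {p_value:.2g}")
```

With counts this lopsided the difference is overwhelmingly significant; in practice the harder macro-level problems are recruiting comparable user groups and controlling for novelty effects.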

  5. Adaptive Personalization

  6. Micro: Evaluating the Hybrid User Model

  7. Micro: Speed to Effectiveness Initially, AIS is as effective as a static system at finding relevant content. After only one use, the benefits of AdaptiveInfo's Intelligent Wireless-Specific Personalization are clear; after three sessions, even more so; and after 10 sessions, the full benefits of Adaptive Personalization are realized.

  8. Macro: Probability a Story is Read There is a 40% probability that a user will read one of the top 4 stories selected by an editor, but a 64% chance they will read one of the top 4 personalized stories; an AIS user is thus 60% more likely to select a story than a non-AIS user.
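Note that the "60% more likely" figure is a relative lift, not a difference in percentage points: 0.64 / 0.40 - 1 = 0.60. A one-line check:

```python
p_editor, p_personalized = 0.40, 0.64   # figures from the slide
lift = p_personalized / p_editor - 1    # relative lift, not a point difference
print(f"relative lift: {lift:.0%}")     # -> relative lift: 60%
```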

  9. Macro: Increased Page Views After looking at 3 or more screens of headlines, users read 43% more of the personally selected news stories, clearly showing AIS's ability to dramatically increase the stickiness of a wireless web application.

  10. Macro: Readership and Stickiness 20% more LA Times users who receive personalized news return to the wireless site 6 weeks after their first use.

  11. Cautions • Optimizing a micro-level evaluation may have little impact at the macro level. It may even have a counterintuitive effect: • If personalization causes a noticeable delay, it may decrease readership • Don’t lose sight of the goal. • The metrics are only approximations of the goal. • Optimizing the metric may not optimize the goal.

  12. R&D within the NSF Organization • Office of the Director • Directorate for Biological Sciences • Directorate for Geosciences • Directorate for Computer and Information Sciences and Engineering • Directorate for Mathematical and Physical Sciences • Directorate for Education and Human Resources • Directorate for Social, Behavioral and Economic Sciences • Directorate for Engineering

  13. CISE Directorate: 2004 • Computing & Communications Foundations • Computer Networks & Systems • Information and Intelligent Systems (IIS) • Deployed Infrastructure

  14. Information and Intelligent Systems Programs • Information and Data Management • Artificial Intelligence and Cognitive Science • Human Language and Communication • Robotics and Computer Vision • Digital Society and Technologies • Human Computer Interaction • Universal Access • Digital Libraries • Science and Engineering Informatics

  15. Types of proposals/awards • IIS Regular Proposals ($250-600K, 3 years), deadline 12/12 • CAREER Program ($400-500K, 5 years), deadline late July • REU & RET supplements ($10-30K, 1 year), deadline 3/1 • Information Technology Research (ITR), deadline probably February

  16. NSF Merit Review Criteria NSF is looking for important, innovative, achievable projects. • Criterion 1: What is the intellectual merit and quality of the proposed activity? • Criterion 2: What are the broader impacts of the proposed activity? NSF will return a proposal without review if the one-page proposal summary does not address each criterion in a separate statement. An evaluation plan at both the micro and macro levels is essential, using metrics that you propose (and that your peers believe are appropriate).
