Improving intelligent assistants for desktop activities

Presentation Transcript


  1. Improving intelligent assistants for desktop activities Simone Stumpf, Margaret Burnett, Thomas Dietterich Oregon State University School of Electrical Engineering and Computer Science

  2. Overview • Background • Activity switching problems • How to improve activity prediction • Reducing interruptions • Improving accuracy • Conclusion

  3. Background: TaskTracer System • Intelligent PIM system • The user organizes everyday life into different activities, each with a set of resources • e.g., “teach cs534”, “iui-07 paper”, etc. • How it works • The user indicates the current activity • TaskTracer tracks events (File open, etc.) • TaskTracer automatically associates resources with the current activity • TaskTracer provides useful information-finding services through intelligent assistants
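
To make the association step concrete, here is a minimal sketch (not the actual TaskTracer code) of how tracked desktop events could be attached to the user's currently declared activity; the class, method, and event names are hypothetical.

```python
# Minimal sketch (not the TaskTracer implementation) of associating desktop
# events with the user's currently declared activity.
from collections import defaultdict

class ActivityTracker:
    def __init__(self):
        self.current_activity = None           # e.g. "teach cs534"
        self.resources = defaultdict(set)      # activity -> set of resources

    def switch_activity(self, activity):
        """The user explicitly indicates the current activity."""
        self.current_activity = activity

    def on_event(self, event_type, resource):
        """A desktop event (e.g. 'file_open') observed by instrumentation."""
        if self.current_activity is not None:
            self.resources[self.current_activity].add(resource)

tracker = ActivityTracker()
tracker.switch_activity("iui-07 paper")
tracker.on_event("file_open", r"C:\papers\iui07\draft.doc")
print(tracker.resources["iui-07 paper"])
```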

  4. Activity switching problems • Switching activities has a physical cost (mouse clicks, keypresses) and a cognitive cost (deciding to switch) • To provide its services, TaskTracer assumes that users switch activities, so the data is not too noisy • TaskPredictor assists by predicting the activity, based on resource use [Figure: resources such as the AAAI web page, IL local and network folders, an IL DOC, and an AAAI PPT grouped by activity]

  5. TaskPredictor • Window-document segment (WDS) = unbroken time period in which the window in focus is showing a single document • Assumptions • A prediction is only necessary when the WDS changes • A prediction is only made if the predictor is confident enough • Shen et al. IUI 2006 • Source of features: words in window titles, file pathnames, website URLs, (document content) • Hybrid approach: Naïve Bayes and SVM • Accuracy: 80% at 10% coverage
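
A rough sketch of confidence-gated activity prediction in this spirit, assuming scikit-learn and a toy agreement-based gating rule rather than Shen et al.'s actual hybrid; the training texts, threshold value, and function name are illustrative only.

```python
# Illustrative sketch (not Shen et al.'s hybrid): predict an activity from WDS
# text features, but stay silent unless the classifier is confident enough.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical training data: words from window titles / pathnames / URLs per WDS.
wds_texts = ["iui07 paper draft.doc", "cs534 lecture slides.ppt", "aaai submission site"]
activities = ["iui-07 paper", "teach cs534", "iui-07 paper"]

vec = CountVectorizer()
X = vec.fit_transform(wds_texts)
nb = MultinomialNB().fit(X, activities)
svm = LinearSVC().fit(X, activities)

def predict_activity(wds_text, threshold=0.9):
    """Return a predicted activity, or None if not confident enough."""
    x = vec.transform([wds_text])
    probs = nb.predict_proba(x)[0]
    nb_label = nb.classes_[probs.argmax()]
    svm_label = svm.predict(x)[0]
    # One simple gating rule: require agreement and a high NB posterior.
    if nb_label == svm_label and probs.max() >= threshold:
        return nb_label
    return None
```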

  6. Reducing interruptions…

  7. Problems in activity prediction • The number of potential notifications is still high • Each notification imposes a physical cost to interact (mouse clicks, keypresses) and a cognitive cost to interact (deciding to switch) • Wait to see if the user stays on the WDS to reduce the number of notifications
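
One simple way to realize the "wait and see" idea is a dwell-time gate; the sketch below is an assumed behavior, not the TaskPredictor implementation, and the 20-second threshold is invented for illustration.

```python
# Sketch (assumed behavior): defer a switch notification until the user has
# stayed on the new window-document segment for a minimum dwell time.
import time

MIN_DWELL_SECONDS = 20   # hypothetical threshold

class NotificationGate:
    def __init__(self):
        self.current_wds = None
        self.wds_start = None

    def on_wds_change(self, wds_id):
        """Restart the dwell clock whenever the WDS changes."""
        self.current_wds = wds_id
        self.wds_start = time.monotonic()

    def should_notify(self):
        """Only interrupt once the user has settled on this WDS."""
        if self.current_wds is None:
            return False
        return time.monotonic() - self.wds_start >= MIN_DWELL_SECONDS
```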

  8. Activity boundaries [Figure: the unit “Prepare IL paper” decomposed into download latest version → open document → edit document → save document → upload latest version] • Iqbal et al. CHI 2005, 2006 • Interruption costs are lower on boundaries between units • Costs are high within a unit • So what happens if the user does stay on the WDS?

  9. Reducing interruptions • Move from single-window prediction to multiple-window prediction (Shen et al, IJCAI 2007) • Identify user costs to make prediction • Determine opportunities intelligently • Trade-off of user cost/benefit • Make predictions at boundaries, then commit changes on user feedback

  10. Improving accuracy…

  11. Why improve accuracy? • 100% accuracy rare • TaskPredictor and other predictors may make wrong predictions • Limited feedback – only labels • Users know more – can we harness it? • How can learning systems explain their reasoning to the user? • What is the users’ feedback to the learning system? (Stumpf et al. IUI 2007)

  12. Pre-study: explanation generation • Data: Enron farmer-d, 122 emails, 4 folders (Bankrupt, Enron News, Personal, Resume) • Three explanation styles: Rule-based (Ripper), Keyword-based (Naïve Bayes), Similarity-based • Explanations were concrete, and simplified but faithful

  13. Rule-based

  14. Keyword-based • The 5 words in the email with the highest positive weight • The 5 words in the email with the most negative weight
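
A hedged sketch of how such a keyword-based explanation could be read off a trained Naïve Bayes model; the scoring rule (log-probability of a word under the predicted folder minus the best alternative folder) is an assumption standing in for the study's exact weights, and `nb`/`vec` are presumed to be an already fitted MultinomialNB and CountVectorizer.

```python
# Sketch of a keyword-based explanation (assumed scoring, not the study's exact
# one): for the predicted folder, score each word appearing in the email by how
# much it favors that folder over the best alternative folder.
import numpy as np

def keyword_explanation(nb, vec, email_text, top_k=5):
    x = vec.transform([email_text])
    pred = nb.predict(x)[0]
    pred_idx = list(nb.classes_).index(pred)
    log_probs = nb.feature_log_prob_                 # shape: (n_folders, n_words)
    others = np.delete(log_probs, pred_idx, axis=0).max(axis=0)
    weight = log_probs[pred_idx] - others            # > 0 favors the predicted folder
    words_in_email = x.nonzero()[1]
    ranked = sorted(words_in_email, key=lambda j: weight[j])
    vocab = vec.get_feature_names_out()
    negative = [vocab[j] for j in ranked[:top_k]]            # weigh most against the folder
    positive = [vocab[j] for j in ranked[-top_k:]][::-1]     # weigh most in favor
    return pred, positive, negative
```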

  15. Similarity-based • Shows the training email whose removal from the training set would most decrease the prediction • Up to 5 words in both emails having the highest weights
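
Computing which training email's removal most lowers the prediction requires retraining once per email; as a cheap stand-in, the sketch below (an assumption, not the study's method) uses the most similar training email by cosine similarity and lists up to 5 shared words, assuming `vec` and `X_train` come from a fitted CountVectorizer.

```python
# Proxy sketch: pick the most similar training email (cosine similarity) instead
# of the leave-one-out influence computation, and report up to 5 shared words.
from sklearn.metrics.pairwise import cosine_similarity

def similarity_explanation(vec, X_train, train_texts, email_text, top_k=5):
    x = vec.transform([email_text])
    sims = cosine_similarity(x, X_train)[0]
    best = sims.argmax()                              # index of most similar training email
    vocab = vec.get_feature_names_out()
    shared = set(x.nonzero()[1]) & set(X_train[best].nonzero()[1])
    shared_words = sorted(shared, key=lambda j: x[0, j], reverse=True)[:top_k]
    return train_texts[best], [vocab[j] for j in shared_words]
```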

  16. [Figure: within-subject study design: emails 1–27 presented in blocks, a 15-minute interval, a post-block questionnaire after each block, and a post-session questionnaire at the end]

  17. Giving feedback • Participants were asked to provide feedback to improve the predictions • No restrictions on form of feedback

  18. Responses to explanations • Correcting or suggesting changes (32%): Different words could have been found in common, like “Agreement”, “Ken Lay.” • Negative comments (20%): …those are arbitrary words. • Positive comments (19%): The Resume rules are good. • Understanding (17%): I see why it used “Houston” as negative. • Confusion (8%): I don’t understand why there is a second email.

  19. Understanding explanations • Rule-based best, then Keyword-based • Serious problems with Similarity-based • Factors: • General idea of the algorithm I guess it went in here because it was similar to another email I had already put in that folder. • Keyword-based explanations’ negative keyword list I guess I really don’t understand what it’s doing here. If those words weren’t in the message? • Word choices’ topical appropriateness “Day”, “soon”, and “listed” are incredibly arbitrary keywords.

  20. Preferring explanations • Preference trend follows understanding • Factors: • Perceived reasoning soundness and accuracy I think this is a really good filter… • Clear communication of reasoning I like this because it shows relationships between other messages in the same folder rather than just spitting out a bunch of rules with no reason behind it. • Informal wording This is funny... (laughs) ... This seems more personable. Seems like a narration rather than just straight rules. It’s almost like a conversation.

  21. The user explains back • Select different features (53%) It should put email in ‘Enron News’ if it has the keywords “changes” and “policy”. • Adjust weights (12%) The second set of words should be given more importance. • Parse/extract in different way (10%) I think that it should look for typos in the punctuation for indicators toward ‘Personal’. • Employ feature combinations (5%) I think it would be better if it recognized a last and a first name together. • Use relational features (4%) This message should be in ‘EnronNews’ since it is from the chairman of the company.

  22. Underlying knowledge sources • Commonsense (36%) “Qualifications” would seem like a really good Resume word, I wonder why that’s not down here. • English (30%) Does the computer know the difference between “resumé” and “resume”? • Domain (15%) Different words could have been found in common like … “Ken Lay”.

  23. Current work • More than 50% of suggestions could be easily incorporated • New algorithms to handle changes to weights and keywords • User feedback as constraints on MLE of the parameters • Co-training • Investigate effects on accuracy using study data • Constraints: not hurting, but not much improvement either • The co-training approach performs better
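
As one hedged illustration of "feedback as constraints" (not the authors' algorithm), user-supplied keywords could be folded into the Naïve Bayes folder-word counts as pseudo-counts before re-estimating the parameters; the boost value, function name, and feedback format below are invented for the sketch.

```python
# Hedged sketch: treat user keyword feedback as a soft constraint by adding
# pseudo-counts for the named folder before re-estimating P(word | folder).
import numpy as np

def apply_keyword_feedback(word_counts, vocab_index, folder_index,
                           feedback, boost=10.0):
    """word_counts: (n_folders, n_words) matrix of training word counts."""
    counts = word_counts.astype(float).copy()
    for folder, keywords in feedback.items():
        f = folder_index[folder]
        for w in keywords:
            if w in vocab_index:
                counts[f, vocab_index[w]] += boost   # user-imposed soft constraint
    # Re-estimate smoothed per-folder word probabilities (Laplace-smoothed MLE).
    smoothed = counts + 1.0
    return smoothed / smoothed.sum(axis=1, keepdims=True)

# Example feedback, echoing a participant suggestion from the study:
feedback = {"Enron News": ["changes", "policy"]}
```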

  24. Conclusion • User costs important • Higher accuracy • Timing of prediction notifications • Usefulness of predictions • Explanations of why a prediction was made
