Leveraging Data About Users in General in the Learning of Individual User Models* • Anthony Jameson, PhD (Psychology), Adjunct Professor of HCI • Frank Wittig, CS Researcher • Saarland University, Saarbrücken, Germany • *i.e. pooling knowledge to improve learning accuracy
Their Goal • To answer the question: • How can systems that employ Bayesian networks to model users most effectively exploit data about users in general and data about the individual user? • Most previous approaches looked only at: • Learning general user models • Applying the model to users in general • Learning individual user models • Applying each model to its particular user
Collaborative Filtering and Bayesian Networks • Collaborative filtering systems can make individualised predictions based on a subset of users determined to be similar to the individual user U • But sometimes we want a more interpretable model, in which • Causal relationships are represented explicitly • The behaviour of U can be predicted from contextual factors • Inferences can be made about unobserved contextual factors • Bayesian networks are more straightforwardly applied to this type of task
Collaborative Filtering Example – Recommending Products • Each user rates a subset of products • The ratings reflect the user's tastes as well as product quality • To recommend a CD for user U • First look for users especially similar to U • i.e. users who have rated similar items in a similar way • Compute the average rating, within this subset of users, for each product U has not yet rated • Recommend the products with high average ratings • Used by Amazon.com, CDNow.com and MovieFinder.com [Herlocker et al. 1999]
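The recipe on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names (`pearson`, `recommend`), the choice of Pearson correlation as the similarity measure, and the tiny rating dictionaries are all assumptions for the sake of the example.

```python
import math

def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items both users have rated."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings_u[i] for i in common) / len(common)
    mu_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den = math.sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common)
                    * sum((ratings_v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

def recommend(target, all_ratings, k=2):
    """Rank items the target has not rated by the mean rating
    of the k most similar users (the 'neighbourhood')."""
    sims = sorted(((pearson(all_ratings[target], r), u)
                   for u, r in all_ratings.items() if u != target),
                  reverse=True)[:k]
    neighbours = [u for _, u in sims]
    seen = set(all_ratings[target])
    scores = {}
    for u in neighbours:
        for item, rating in all_ratings[u].items():
            if item not in seen:
                scores.setdefault(item, []).append(rating)
    return sorted(((sum(v) / len(v), item) for item, v in scores.items()),
                  reverse=True)
```

For example, if user A has rated the same CDs as U in exactly the same way, A's other high ratings become U's top recommendations.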
Their Experiment – Inferring Psychological States of the User • Simulated on a computer workstation: navigating through a crowded airport while asking a mobile assistant questions via speech • Pictures appeared to prompt questions • Some subjects were instructed to work under time pressure • Finish each utterance as quickly as possible • Some were instructed to perform a secondary task • "Navigate" through the terminal (using arrow keys) • Speech input was later coded semi-automatically to extract features
Learning Models Used • Model #1 - General Model • Learned from experimental data via maximum-likelihood method (not adapted to individual users) • Model #2 - Parameterised Model • Like general model, but baselines for each user and for each speech metric are included • Model #3 - Adaptive (Differential) Model • Uses AHUGIN method (next slide) • Model #4 - Individual Model • Learned entirely on individual data
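For the general model (#1), maximum-likelihood estimation of a discrete Bayesian network's parameters reduces to normalised frequency counts per parent configuration. A minimal sketch of one CPT row, with illustrative state names of my own choosing (the paper's actual variables and coding are not reproduced here):

```python
from collections import Counter

def ml_cpt_row(observations, states):
    """Maximum-likelihood estimate of P(X = s | parent config) for each
    state s: simply the relative frequency of s among the observations
    that share this parent configuration."""
    counts = Counter(observations)
    total = len(observations)
    return [counts[s] / total for s in states]
```

For instance, if under one parent configuration an utterance was "fast" in 3 of 4 pooled observations, the fitted row is [0.75, 0.25].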
A Tangent – AHUGIN [Olesen et al. 1992] • Adaptive HUGIN • No explicit dimensional representation of how users differ • The conditional probability tables (CPTs) of the Bayesian network are adapted with each observation • Thus a variety of individual differences can be adapted to, without the designer of the BN anticipating their nature
Equivalent Sample Size (ESS) • However, you also need to address the speed at which the CPTs adapt • The ESS represents the extent of the system's reliance on the initial general model, relative to each user's new data • This paper contributes a principled method of estimating the optimal ESS, which is generally not obvious a priori, nor uniform across the different parts of the BN • Differential adaptation
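The role of the ESS can be illustrated with a Dirichlet-style pseudo-count update of a single CPT row. This is a simplification, not the paper's or AHUGIN's exact update rule (AHUGIN works with the network's full adaptation machinery); it only shows how the ESS trades off the general model against the individual user's data:

```python
def adapt_cpt_row(probs, obs_index, ess):
    """One fractional update of a CPT row after observing state obs_index.

    probs:     current conditional distribution (sums to 1)
    obs_index: index of the state actually observed
    ess:       equivalent sample size -- the weight of the prior
               (general-model) distribution relative to one new
               observation; small ESS -> fast adaptation to the
               individual, large ESS -> stay close to the general model
    """
    counts = [ess * p for p in probs]   # pseudo-counts implied by the ESS
    counts[obs_index] += 1.0            # add the new observation
    total = sum(counts)                 # = ess + 1
    return [c / total for c in counts]
```

With `probs = [0.5, 0.5]` and one observation of state 0, an ESS of 4 moves the row to [0.6, 0.4], while an ESS of 99 barely shifts it to [0.505, 0.495]. Differential adaptation amounts to learning a suitable ESS separately for different parts of the network.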
Speech Metrics: Results • Articulation Rate • Syllables articulated per second of speaking • General performs worst; the other three are on par • Individual takes a while to catch up, as with all metrics • Number of Syllables • The number of syllables in the utterance • Again, General is poor, Parameterised OK, Individual and Adaptive best • Disfluencies and Silent Pauses • Any of four types of disfluency, e.g. failing to complete a sentence • Duration of silent pauses relative to the number of words • All about equal (perhaps because these events are infrequent)
The Plots • [Four plots, one per speech metric: articulation rate, number of syllables, disfluencies, silent pauses]
Experimental Conditions: Results • Rather than predicting an aspect of U's behaviour, try to infer whether U was working under time pressure or with a secondary task • The Individual model performed very poorly • Worse, in fact, than flipping a coin • No clear advantage of any model over the General model • Performance is, at best, only marginally better than chance for all models • They say that this classification task is inherently difficult (although not why…)
Findings • Their adaptive model generally gives good performance, especially for prediction tasks with complex individual differences • It allows a smooth adaptation from a general model to an individual model • The General model is usually outperformed by all the others • The Parameterised model sits in between • The Individual model is very poor early on (and sometimes for an especially long time) • but gives good performance where behaviour is highly idiosyncratic (individually peculiar)
Differential Adaptation Revisited • It leverages data about previous users not only to learn an initial general user model but also to learn how fast various aspects of this model should adapt to each new user • It doesn't require an explicit representation of the dimensions along which users may differ (unlike the parameterised model) • However, learning may be unnecessarily slow because of the large number of degrees of freedom
Summary • No solid conclusions; but • They systematically compared, with regard to several criteria (?), four representative (?) ways of exploiting data about users in general and/or individual users • Introduced a variant of the AHUGIN adaptation method called differential adaptation, which is a principled way of adapting a general model non-uniformly into an individual user model
References • Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers and John Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 1999 Conference on Research and Development in Information Retrieval, 1999. • Kristian G. Olesen, Steffen L. Lauritzen and Finn V. Jensen. aHUGIN: A system creating adaptive causal probabilistic networks. In Uncertainty in Artificial Intelligence: Proceedings of the Eighth Conference, pages 223–229, 1992.
Questions? • Then Dave can rip into it