Modelling of Interaction Dynamics in Internet-based Multi-user Scenarios Ata Kabán http://www.cs.bham.ac.uk/~axk a.kaban@cs.bham.ac.uk School of Computer Science The University of Birmingham 15th November 2005
Overview • Introduction • A dynamic model for online community identification • Convex linear and nonlinear models of heterogeneous sequence collections • Prediction & data exploration • Experiments & Applications • Conclusions • “The most important goal for theoretical computer science in 1950-2000 was to understand the von Neumann computer. The most important goal for theoretical computer science from 2000 onwards is to understand the Internet.” (Christos H. Papadimitriou)
Introduction • Scenarios • Direct interaction [online discussion stream] • Indirect interactions [browsing] • Challenges • Heterogeneous behaviour • Apparently highly entropic • Need parsimonious & efficient profiles • Both predictive & explanatory • Need to provide scalable algorithms
Why bother? • Trying to understand a complex system is scientifically interesting • The ability to analyse & predict individual activity is practically useful • To infer and predict user / customer preferences • To understand user behaviour • To provide a basis for personalised environments based on history of activity • Examples: profiling consumer brand preferences; occupational mobility; web browsing; phone usage; command usage; etc. • Evidence (data) is cheap to acquire • traces of user activity logs, most often symbolic sequences
Nothing works… • Community identification • Long-standing problem • Clustering approaches • But: what if temporal events are subject to random delays? • What if there are no distinct homogeneous groups? • Prediction • Existing methods are global: they assume that all users are the same • One cannot observe each user long enough to build a fully personalised predictor
A Dynamic Bibliometric Model… Reference: X Wang & A Kaban, SIAM Data Mining ’06, submitted • Data: one day's worth of discussions, T=25,355 contributions from W=844 chat participants. <bigboy> … <xxx> … <yy> … <xxx> … <bigboy> … <yy> … <bigboy> … <xxx> … <tinigirl> … <uuu> … <tinigirl> … [Figure: the 1st-order connectivity graph]
A Dynamic Bibliometric Model… The Aggregate Markov model (Saul & Pereira ’97) • The ‘group’ is a latent variable [Diagram: Aggregate Markov]
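The aggregate Markov model factorises the first-order transition matrix through the latent group: P(s_t = j | s_{t-1} = i) = Σ_c P(c | i) P(j | c), so the W² transition probabilities are replaced by a W×C and a C×W table. A minimal pure-Python sketch of that factorisation follows; the function name and the toy numbers are illustrative assumptions, not taken from the talk.

```python
def aggregate_markov_transition(p_group_given_prev, p_next_given_group):
    """Return T with T[i][j] = sum_c P(c | i) * P(j | c).

    p_group_given_prev: one row per state i, a distribution over the C groups.
    p_next_given_group: one row per group c, a distribution over the W states.
    """
    n_states = len(p_group_given_prev)
    n_groups = len(p_next_given_group)
    return [
        [sum(p_group_given_prev[i][c] * p_next_given_group[c][j] for c in range(n_groups))
         for j in range(n_states)]
        for i in range(n_states)
    ]


# Toy example (hypothetical numbers): 3 chat participants, 2 latent groups.
A = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # P(group | previous speaker)
B = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]     # P(next speaker | group)
T = aggregate_markov_transition(A, B)
print([round(sum(row), 6) for row in T])   # every row is a distribution → [1.0, 1.0, 1.0]
```

Each row of T sums to one automatically, because it is a convex combination of the group-conditional distributions.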
A Dynamic Bibliometric Model… But: • Over 800 people are typing concurrently at different terminals around the world • Contributions arrive in a sequential order • What are the ‘true’ interactions? What is the ‘real’ connectivity graph?
Mixed Memory Markov model (Raftery ’85, ’94; Saul & Jordan ’99): a parsimonious approximation to the higher-order Markov model • 2 variants of the model • Limit distribution studied • Parsimonious use of parameters • Numerical optimisation algorithms • No clustering ability
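In the mixed memory Markov model, the higher-order transition probability is approximated by a convex combination of lag-specific first-order transitions, P(s_t | s_{t-1}, …, s_{t-L}) = Σ_l λ_l P_l(s_t | s_{t-l}). A minimal sketch, assuming the variant with a separate transition matrix per lag (another common variant shares a single matrix across lags); the names and numbers are hypothetical.

```python
def mixed_memory_prob(history, next_state, lag_weights, lag_matrices):
    """P(next_state | last L states) = sum_l lambda_l * M_l[s_{t-l}][next_state].

    history: past states with the most recent last (needs at least L entries).
    lag_weights: lambda_1..lambda_L, non-negative and summing to 1.
    lag_matrices: one W x W row-stochastic matrix per lag.
    """
    return sum(
        w * M[history[-1 - l]][next_state]
        for l, (w, M) in enumerate(zip(lag_weights, lag_matrices))
    )


M1 = [[0.9, 0.1], [0.2, 0.8]]   # lag-1 transitions (hypothetical)
M2 = [[0.5, 0.5], [0.3, 0.7]]   # lag-2 transitions (hypothetical)
weights = [0.7, 0.3]
history = [0, 1]                 # s_{t-2} = 0, s_{t-1} = 1
p1 = mixed_memory_prob(history, 1, weights, [M1, M2])
p0 = mixed_memory_prob(history, 0, weights, [M1, M2])
print(round(p1, 4), round(p0, 4))   # → 0.71 0.29
```

With W states and L lags this uses L·W(W−1) + (L−1) free parameters, against W^L(W−1) for a full L-th order chain.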
A Dynamic Bibliometric Model… The Aggregate Mixed Memory Markov Model, or the Mixed Memory Aggregate Markov Model • The ‘true temporal connection’ is a latent variable [Diagrams: Aggregate Mixed Memory Markov; Mixed Memory Aggregate Markov]
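Combining the two ideas, one natural form is P(s_t | past) = Σ_l λ_l Σ_c P(c | s_{t-l}) P(s_t | c). This form is an assumption here; the exact parameterisation in Wang & Kabán may differ. Treating the responsible lag as latent, Bayes' rule gives a posterior over which earlier contribution the current one is responding to, which is one way to read "inferred connections". A minimal sketch with hypothetical names and numbers:

```python
def lag_posterior(history, current, lag_weights, p_group_given_prev, p_next_given_group):
    """Return (P(current | history), posterior over the responsible lag).

    Assumed model: P(s_t | past) = sum_l lambda_l sum_c P(c | s_{t-l}) P(s_t | c).
    history: past speakers with the most recent last.
    """
    n_groups = len(p_next_given_group)
    scores = []
    for l, lam in enumerate(lag_weights):
        prev = history[-1 - l]
        # aggregate-Markov transition from the speaker l+1 steps back
        t = sum(p_group_given_prev[prev][c] * p_next_given_group[c][current]
                for c in range(n_groups))
        scores.append(lam * t)
    total = sum(scores)
    return total, [s / total for s in scores]


A = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]   # P(group | earlier speaker), hypothetical
B = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]     # P(current speaker | group), hypothetical
prob, post = lag_posterior([2, 0, 1], 2, [0.5, 0.3, 0.2], A, B)
print(round(prob, 3), [round(p, 3) for p in post])
```

The lag with the largest posterior is the most plausible "true" predecessor, even when other contributions arrived in between.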
ML estimation algorithm • Iterate until convergence (equivalent to EM, space-efficient) • Each iteration scales linearly with the number of observed L-grams
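For the aggregate Markov model, an EM iteration needs only the sparse table of observed transition counts. The E-step computes P(c | i, j) ∝ P(c | i) P(j | c) for each observed pair, and the M-step renormalises the expected counts. The sketch below is this plain EM for the first-order case (L = 1), not the space-efficient variant on the slide. The function names, random initialisation and toy counts are assumptions. Each iteration costs O(number of distinct observed bigrams × C).

```python
import math
import random


def log_likelihood(counts, A, B):
    """Sum over observed bigrams of n_ij * log sum_c P(c | i) P(j | c)."""
    return sum(n * math.log(sum(a * b[j] for a, b in zip(A[i], B)))
               for (i, j), n in counts.items())


def fit_aggregate_markov(counts, n_states, n_groups, n_iter=100, seed=0):
    """EM for the aggregate Markov model from sparse counts {(prev, next): n}."""
    rng = random.Random(seed)

    def random_dist(k):
        v = [rng.random() + 1e-3 for _ in range(k)]
        s = sum(v)
        return [x / s for x in v]

    A = [random_dist(n_groups) for _ in range(n_states)]   # P(c | prev)
    B = [random_dist(n_states) for _ in range(n_groups)]   # P(next | c)
    for _ in range(n_iter):
        A_acc = [[0.0] * n_groups for _ in range(n_states)]
        B_acc = [[0.0] * n_states for _ in range(n_groups)]
        for (i, j), n in counts.items():
            # E-step: posterior over the latent group for this observed bigram
            r = [A[i][c] * B[c][j] for c in range(n_groups)]
            z = sum(r)
            for c in range(n_groups):
                w = n * r[c] / z
                A_acc[i][c] += w
                B_acc[c][j] += w
        # M-step: renormalise expected counts (rows with no data keep old values)
        for i in range(n_states):
            s = sum(A_acc[i])
            if s > 0:
                A[i] = [x / s for x in A_acc[i]]
        for c in range(n_groups):
            s = sum(B_acc[c])
            if s > 0:
                B[c] = [x / s for x in B_acc[c]]
    return A, B


# Hypothetical counts: speakers {0, 1} and {2, 3} mostly answer each other.
counts = {(0, 1): 20, (1, 0): 20, (2, 3): 20, (3, 2): 20, (0, 2): 1, (3, 1): 1}
A, B = fit_aggregate_markov(counts, n_states=4, n_groups=2)
print(round(log_likelihood(counts, A, B), 2))
```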
Model (order) selection using AIC [Plots: AIC vs. number of groups] • Aggregate Markov: best model for this data • (C=9, L=9) Aggregate Mixed Memory Markov model wins • Too many parameters…
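AIC trades fit against size: AIC = −2 log L + 2k, where k is the number of free parameters, and the lowest value wins. A minimal sketch of the parameter count is given below. It assumes one P(c | prev) table and one P(next | c) table shared by all lags, plus L−1 free lag weights; this count is an assumption, and the log-likelihoods behind the plots are not reproduced here.

```python
def n_params_aggregate(n_states, n_groups, n_lags=1):
    """Free parameters of an aggregate (mixed memory) Markov model.

    Assumes one P(c | prev) table and one P(next | c) table shared by all lags,
    plus L - 1 free lag weights.
    """
    return n_states * (n_groups - 1) + n_groups * (n_states - 1) + (n_lags - 1)


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return -2.0 * log_likelihood + 2 * n_params


W = 844   # chat participants in the one-day data set
print(n_params_aggregate(W, n_groups=9, n_lags=9))   # → 14347
print(W * (W - 1))                                   # full 1st-order chain → 711492
```

Under these assumptions, even the largest model on the slide has about fifty times fewer parameters than a full first-order chain over the 844 participants.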
A Dynamic Bibliometric Model for the Identification of Online Communities [Figure panels: reordered according to the inferred state clusters; inferred connections]
The distribution of influential time lags [Figure panels: direct interactions (online chat); indirect interactions (browsing)]
Heterogeneous collections of sequences, e.g. browsing traces • Individuals all interact with the same media independently • No clusters of sites are found from the browsing paths taken globally • Tools are needed to capture the heterogeneity of individual activity traits together with global activity traits
Probabilistic Models for Multiple Sequences: State of the Art • Global models • cannot capture heterogeneity: they treat all individuals the same • Individual models • an individual's activity history is typically too short to obtain a reliable estimate • for N individuals we would need to store N times the number of parameters of the sequence model employed • Mixtures of sequence models (MMC; Ref: Cadez et al. ’04) • assume homogeneous prototypical behaviour within each group • cannot capture multiple relationships
The need for distributed sequence models • Necessary trade-off between the definition of global and individual-specific representations • Common behavioural patterns are the basis of multiple relationships between individuals • May yield a more realistic model of the behaviour exhibited by the population as a whole • Parsimonious representation
Simplicial Mixtures of Markov Chains (SMMC) References: Girolami & Kaban, NIPS ’03; longer version in Data Mining and Knowledge Discovery, 10:3, 2005 • Now x is a continuous latent variable • Exact estimation becomes intractable • Approximate estimation techniques employed: MAP, variational Bayes • In both cases a simple algorithm with linear scaling is obtained
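The difference between a mixture of Markov chains (MMC) and a simplicial mixture (SMMC) shows up in the sequence likelihood. In an MMC, one component generates the whole sequence: p(seq) = Σ_k π_k Π_t T_k(s_t | s_{t-1}). In an SMMC, each sequence n has its own point x_n on the simplex, and every transition mixes the components: p(seq | x_n) = Π_t Σ_k x_nk T_k(s_t | s_{t-1}). The minimal sketch below ignores initial-state probabilities and treats x as known, whereas the SMMC infers it by MAP or VB. The toy chains are hypothetical.

```python
import math


def transitions(seq):
    """Consecutive (prev, next) pairs of a symbol sequence."""
    return list(zip(seq, seq[1:]))


def mmc_log_lik(seq, weights, chains):
    """Mixture of Markov chains: a single chain explains the whole sequence."""
    per_chain = []
    for w, T in zip(weights, chains):
        per_chain.append(math.log(w) + sum(math.log(T[a][b]) for a, b in transitions(seq)))
    m = max(per_chain)   # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in per_chain))


def smmc_log_lik(seq, x, chains):
    """Simplicial mixture: each transition mixes the chains in proportions x."""
    return sum(math.log(sum(xk * T[a][b] for xk, T in zip(x, chains)))
               for a, b in transitions(seq))


stay = [[0.9, 0.1], [0.9, 0.1]]   # component that tends to go to state 0
move = [[0.1, 0.9], [0.1, 0.9]]   # component that tends to go to state 1
seq = [0, 0, 0, 1, 1, 1]          # a user who mixes both behaviours
mmc = mmc_log_lik(seq, [0.5, 0.5], [stay, move])
smmc = smmc_log_lik(seq, [0.4, 0.6], [stay, move])
print(mmc, smmc)
```

A user whose trace mixes both behaviours is explained poorly by any single prototype, but well by a point inside the simplex.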
Single cause prior: • Multiple cause prior:
VB estimation • Solving for T, x, α and Q, then substituting Q into the updates that contain it, yields simple multiplicative updates similar to NMF.
Algorithm • Iterate until convergence: • Linear in the number of observed transitions
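A minimal sketch of this style of algorithm follows. It uses plain maximum-likelihood EM with flat priors, not the MAP or VB updates of Girolami & Kabán. Each observed transition's credit is split across components in proportion to x_nk T_k(j | i), and both x_n and T_k are then renormalised. This gives the NMF/PLSA-like multiplicative form, with cost linear in the number of observed transitions. All names and data are hypothetical.

```python
import math
import random
from collections import Counter


def transition_counts(seq):
    """Sparse first-order transition counts {(prev, next): count} for one sequence."""
    return Counter(zip(seq, seq[1:]))


def smmc_train_log_lik(seq_counts, X, T):
    return sum(c * math.log(sum(X[n][k] * T[k][i][j] for k in range(len(T))))
               for n, counts in enumerate(seq_counts) for (i, j), c in counts.items())


def fit_smmc_ml(seq_counts, n_states, n_comp, n_iter=50, seed=0):
    rng = random.Random(seed)

    def random_dist(k):
        v = [rng.random() + 1e-3 for _ in range(k)]
        s = sum(v)
        return [a / s for a in v]

    X = [random_dist(n_comp) for _ in seq_counts]   # x_n on the simplex, one per sequence
    T = [[random_dist(n_states) for _ in range(n_states)] for _ in range(n_comp)]
    for _ in range(n_iter):
        X_acc = [[0.0] * n_comp for _ in seq_counts]
        T_acc = [[[0.0] * n_states for _ in range(n_states)] for _ in range(n_comp)]
        for n, counts in enumerate(seq_counts):
            for (i, j), c in counts.items():
                # responsibility of each component for this observed transition
                r = [X[n][k] * T[k][i][j] for k in range(n_comp)]
                z = sum(r)
                for k in range(n_comp):
                    w = c * r[k] / z
                    X_acc[n][k] += w
                    T_acc[k][i][j] += w
        # renormalise expected counts; rows without data keep their old values
        for n, row in enumerate(X_acc):
            s = sum(row)
            if s > 0:
                X[n] = [a / s for a in row]
        for k in range(n_comp):
            for i in range(n_states):
                s = sum(T_acc[k][i])
                if s > 0:
                    T[k][i] = [a / s for a in T_acc[k][i]]
    return X, T


seqs = [[0, 0, 0, 0, 1], [1, 1, 1, 1, 0], [0, 1, 0, 1, 0, 1]]
data = [transition_counts(s) for s in seqs]
X, T = fit_smmc_ml(data, n_states=2, n_comp=2)
print([[round(v, 2) for v in row] for row in X])
```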
Application: Telephone Usage Modelling • 1,172,578 calls in week 1 • 1,753,304 calls in week 2 • Destination numbers mapped to 87 geographic regions & mobile operators • Week 1 activity employed for estimation • Week 2 activity used for testing • Performance measures considered: • Predictive perplexity on unseen sequences • Percentage of symbols correctly predicted on unseen sequences • Out-of-sample log likelihood • Parameter interpretability assessed
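Predictive perplexity is exp(−(1/N) Σ_t log P(s_t | past)), computed on held-out symbols. The sum inside is the out-of-sample log likelihood, and a symbol counts as correctly predicted when it is the model's top-ranked choice. A minimal sketch follows. It assumes natural logarithms and predictive distributions that have already been computed, and it breaks ties in the arg max towards the lowest index.

```python
import math


def evaluate_predictions(pred_dists, actual):
    """Return (predictive perplexity, % of symbols correctly predicted).

    pred_dists[t]: predicted distribution over the next symbol at step t.
    actual[t]: the symbol that was actually observed at step t.
    """
    n = len(actual)
    log_lik = sum(math.log(p[a]) for p, a in zip(pred_dists, actual))   # out-of-sample log likelihood
    perplexity = math.exp(-log_lik / n)
    hits = sum(max(range(len(p)), key=p.__getitem__) == a for p, a in zip(pred_dists, actual))
    return perplexity, 100.0 * hits / n


perp, acc = evaluate_predictions([[0.9, 0.1], [0.2, 0.8]], [0, 1])
print(round(perp, 4), acc)   # → 1.1785 100.0
```

A uniform predictor over K symbols has perplexity exactly K, which gives a useful baseline for the 87-symbol call data.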
Prediction error on transactions in Week 2 • Solid straight line: global 1st-order MC • Solid line: SMMC (estimated with VB) • Dashed line: SMMC (estimated with MAP) • Dash-dot line: MMC
Explanatory user profiles • Example activity profile of one of the customers under a K=20 component SMMC (one point on a 19-D latent simplex), shown as the posterior expectation $\mathbb{E}_{P(x \mid \mathrm{Seq}_n)}[x]$ • Each of these components was a 1st-order Markov chain
Application: Web browsing behaviour prediction • Dataset previously used in Cadez et al. • 17 page categories from MSN website form the common state space • Users who visited at least 9 out of 17 page categories selected for this experiment • Total 119,667 page requests over 1,480 web browsing sessions (small data set)
10-fold cross-validated predictive perplexity • Solid straight line: global 1st-order MC • Solid line: SMMC (estimated with VB) • Dashed line: SMMC (estimated with MAP) • Dash-dot line: MMC
Complexity of the component MCs, measured as the distribution of entropy rates • Low complexity favours predictability • Low complexity favours interpretability
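The entropy rate of a Markov chain with transition matrix T and stationary distribution π is H = −Σ_i π_i Σ_j T_ij log₂ T_ij bits per symbol. A near-deterministic component has a low rate, and a uniform one over K states has log₂ K. A minimal sketch follows. It assumes the stationary distribution can be found by simple power iteration, which holds for ergodic, aperiodic chains; the example matrices are hypothetical.

```python
import math


def stationary_distribution(T, n_iter=1000):
    """Power iteration pi <- pi T, starting from the uniform distribution."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_rate(T):
    """H = -sum_i pi_i sum_j T_ij log2 T_ij, in bits per symbol."""
    pi = stationary_distribution(T)
    n = len(T)
    return -sum(pi[i] * T[i][j] * math.log2(T[i][j])
                for i in range(n) for j in range(n) if T[i][j] > 0)


sticky = [[0.95, 0.05], [0.05, 0.95]]     # low-complexity component
uniform = [[0.25] * 4 for _ in range(4)]  # maximally unpredictable over 4 states
print(round(entropy_rate(sticky), 3), entropy_rate(uniform))
```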
5 selected basis transitions • SMMC component MCs [separate the common behaviour into one component] • MMC cluster prototypes [common behaviour superimposed on all prototypes] • Grey scale: black=0, white=1
So far so good… • Community finding from direct online interactions by using discrete latent variables to infer the ‘true’ connections and the cluster membership • Distributed modelling of heterogeneous activity traces by using a continuous latent variable to capture the spread
Sample size issues • The estimation of mixtures needs a large number of sequences • The estimation of simplicial mixtures needs long (rich) sequences
Topographic Mixtures of Sequence Models Reference: A Kaban, Proc. ITCC’05 • Exact estimation is intractable • A sampling-based approximation is employed
The estimation algorithm • Iterate until convergence: • Each iteration scales linearly with the number of non-zero elements in the data! - Scalable Generative Topographic Mapping (SGTM)
Prediction with distributed sequence models • Combines basis-wise predictions in proportions specified by the posterior expectation • User-specific deeper past (w.r.t. the global trait) is embodied in the posterior expectation • In consequence neither a simplicial mixture of 1st order MCs nor a topographic mixture of 1st order MCs is a 1st order model
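Prediction with a distributed model mixes the basis chains with the user's posterior expectation: P(s_{t+1} | s_t, history_n) = Σ_k E[x_k | history_n] T_k(s_{t+1} | s_t). A minimal sketch for the simplicial case follows, assuming E[x | history] has already been computed. For a topographic mixture, the same sum would run over latent grid points weighted by their posterior responsibilities. The names are hypothetical.

```python
def predict_next(prev_state, x_mean, chains):
    """P(next | prev, user) = sum_k E[x_k | user history] * T_k[prev][next]."""
    n_states = len(chains[0])
    return [sum(xk * T[prev_state][j] for xk, T in zip(x_mean, chains))
            for j in range(n_states)]


stay = [[0.9, 0.1], [0.9, 0.1]]   # hypothetical basis chain
move = [[0.1, 0.9], [0.1, 0.9]]   # hypothetical basis chain
p = predict_next(0, [0.25, 0.75], [stay, move])
print([round(v, 3) for v in p])   # → [0.3, 0.7]
```

Because E[x | history] is recomputed from the user's whole past, two users in the same current state can receive different predictions. This is why the combined predictor is not first order.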
Illustration of the representation: prototype vs. aspects view • It can be shown that the model estimation algorithm minimises a weighted sum of entropies of the parameters.
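One way to see the two views together is the construction sketched below. K points sit on a regular 1-D latent grid, and each point gets a transition matrix (a prototype) that is a convex combination of M basis matrices (aspects). The mixing weights come from normalised Gaussian radial basis functions, so nearby grid points get similar prototypes, which is what makes the map topographic. This is a minimal sketch under those assumptions (1-D grid, Gaussian RBFs, convex combinations). The exact SGTM parameterisation in Kabán (ITCC’05) may differ, and all names are hypothetical.

```python
import math


def rbf_weights(latent_points, centres, width):
    """Normalised Gaussian basis activations phi_m(u_k), one row per latent point."""
    rows = []
    for u in latent_points:
        act = [math.exp(-((u - c) ** 2) / (2 * width ** 2)) for c in centres]
        s = sum(act)
        rows.append([a / s for a in act])
    return rows


def prototypes(phi, aspects):
    """Prototype k: T_k[i][j] = sum_m phi[k][m] * aspects[m][i][j] (a convex combination)."""
    n_states = len(aspects[0])
    return [
        [[sum(p * A[i][j] for p, A in zip(phi_k, aspects)) for j in range(n_states)]
         for i in range(n_states)]
        for phi_k in phi
    ]


grid = [-1.0, -0.5, 0.0, 0.5, 1.0]            # latent grid points
phi = rbf_weights(grid, centres=[-1.0, 1.0], width=1.0)
aspect_a = [[0.9, 0.1], [0.9, 0.1]]           # hypothetical aspect: tends to state 0
aspect_b = [[0.1, 0.9], [0.1, 0.9]]           # hypothetical aspect: tends to state 1
P = prototypes(phi, [aspect_a, aspect_b])
print([round(Pk[0][1], 3) for Pk in P])       # P(0 -> 1) drifts smoothly along the grid
```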
Visualisation of large document collections 10-Newsgroups text collection
Aspect-level map of the estimated topical components at equidistant locations of the latent space
CPU time • Computational demand is drastically reduced in comparison with existing probabilistic topographic models for discrete data
Application: Predictive modelling and exploratory analysis of dynamic user behaviour from a large web log collection • Using the large msnbc.com web log sequence collection previously used in Cadez et al. • Training on 100,000 randomly chosen user traces, totalling 801,745 page requests • Testing on a further 88,181 previously unseen user traces, totalling 714,280 page requests • Evaluation criteria used: • Generalisation (out-of-sample log likelihood) • Prediction (out-of-sample predictive perplexity), with the effect of varying sample size studied • Visualisation and exploratory analysis
A summary of 100,000 browsing traces: lists of the most probable sequences at equidistant locations of the latent space
Model space view Map of state transition components estimated from the browsing sequence data set white=0 black=1
Explanatory user profiles extracted from the same model Prototype view Aspect view
Prototype view Aspect view User Profile 2 User Profile 3
Common behaviour component Different topologies… Grouping-specific behaviour components
Conclusions • Consistent generative probabilistic framework • Discrete latent variables used for inferring state groupings and for inferring influential past states • Continuous latent variables used for representing heterogeneous sequence sets in terms of common patterns • Linear time algorithms obtained • Tested in real applications • Simple structures found behind complex observations • Improved prediction for previously unseen individuals • Efficient compression / low entropy parameters • Interpretable parameters
References • X Wang & A Kabán: A Dynamic Bibliometric Model for the Identification of Online Communities. Submitted to SIAM DM’06. • A Kabán: A Scalable Generative Topographic Mapping for Sparse Data Sequences. Proc. International Conference on Information Technology: Coding and Computing (ITCC’05). • M Girolami & A Kabán: Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles. Advances in Neural Information Processing Systems (NIPS’03). Extended version in Data Mining and Knowledge Discovery, 10:3, 2005. • A Kabán & X Wang: Context-based Identification of Communities from Internet Chat. Proc. IJCNN’04.