
Streaming Predictions of User Behavior in Real-Time

Streaming Predictions of User Behavior in Real-Time. Ethan Dereszynski (Webtrends), Eric Butler (Cedexis). OSCON 2014. How come you never see a headline like "Psychic Wins Lottery"? Jay Leno. Enabling Interesting Predictions: Leverage Streaming Data. Streams Data. websockets.


Presentation Transcript


  1. Streaming Predictions of User Behavior in Real-Time Ethan Dereszynski (Webtrends) Eric Butler (Cedexis) OSCON 2014

  2. How come you never see a headline like "Psychic Wins Lottery"? Jay Leno

  3. Enabling Interesting Predictions: Leverage Streaming Data

  4. Streams Data websockets

  5. Streams Data 1 second websockets

  6. Streams

  7. The best way to predict the future is to invent it. Alan Kay

  8. Session Data • Each user “click” triggers an event • Event information is captured by an embedded tag

  9. Session Data • A session is a string of events that all correspond to a single “visit” to a web site. [Diagram: Event 1, Event 2]

  10. Session Data • A session ends when a visitor leaves the site, closes the browser, or goes idle for 30 minutes [Diagram: Event 1, Event 2, Event 3]
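The 30-minute idle rule above can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the `(visitor_id, timestamp_seconds, payload)` tuple layout and the `sessionize` name are assumptions, and only the idle-timeout rule is modeled (leaving the site or closing the browser is not observable from timestamps alone).

```python
from collections import defaultdict

IDLE_TIMEOUT_SECONDS = 30 * 60  # idle cutoff stated on the slide


def sessionize(events, idle_timeout=IDLE_TIMEOUT_SECONDS):
    """Group (visitor_id, timestamp_seconds, payload) tuples into sessions.

    A new session starts when the gap since the visitor's previous event
    exceeds idle_timeout. Returns {visitor_id: [[event, ...], ...]}.
    """
    by_visitor = defaultdict(list)
    for event in sorted(events, key=lambda e: (e[0], e[1])):
        by_visitor[event[0]].append(event)

    sessions = {}
    for visitor, visitor_events in by_visitor.items():
        current = [visitor_events[0]]
        visitor_sessions = [current]
        for prev, event in zip(visitor_events, visitor_events[1:]):
            if event[1] - prev[1] > idle_timeout:
                current = [event]          # gap too long: open a new session
                visitor_sessions.append(current)
            else:
                current.append(event)      # same visit continues
        sessions[visitor] = visitor_sessions
    return sessions


# Hypothetical events, invented for illustration.
events = [
    ("v1", 0, "home"),
    ("v1", 60, "product"),
    ("v1", 60 + 31 * 60, "home"),  # 31 minutes idle, so a new session
    ("v2", 10, "home"),
]
result = sessionize(events)
```

In a streaming setting the same rule would be applied incrementally, closing a session once no event has arrived for the timeout period.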

  11. Learning from Streaming Data • Sessions provide examples of visit behavior • Not all sessions are equally likely • Many paths are rarely, if ever, taken • Frequent paths suggest common ways visitors behave on a given site • Learning Models of Visitor Behavior • Predict future actions • Provides a rich, new feature to identify/segment users • Identify users who have a common trajectory, or subtrajectory, through the web site • More than just a label • Behavior tells us something about how users achieve a goal on a web site

  12. Event Data • JSON containing parameter/value pairs • Describes content of page (triggered by event) • Contains geo, device, referrer, etc. • 50-100 parameters per page (event)

  13. Challenges of Real Data • How do we describe each event? • Number of parameters per event can be large • Space of possible “events” is massive • Not all parameters are relevant to the user’s actions [Figure: charts for Client 1 and Client 2; axis label: Number of events]

  14. About Topic Models • Each topic is a distribution over all words in the dictionary • Each document is generated by a mixture of topics. D. Blei. Probabilistic topic models. Communications of the ACM, 55(4):77–84, 2012.

  15. Abstraction Layer: Global/Local Topic – Latent Dirichlet Allocation (GLT-LDA) • Topic modeling technique for document clustering • Documents assigned to a single topic (instead of a mixture) • Global “Noise” topic explains redundant parameters • Clusters parameters into topics [Diagram legend: distribution over parameters for topic k; topic distribution; distribution over noise parameters; noise rate for document i; jth parameter in event i; topic label for document i; noise indicator for jth parameter in event i]
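One way to read the quantities named on this slide is as a generative process in which each event (document) gets a single topic and each parameter is drawn either from that topic or from the global noise topic. The sketch below is an interpretation only: the symbols θ, φ, λ, z, x, and w are assumed names, priors are omitted, and the presenters' exact model may differ.

```latex
\begin{align*}
z_i &\sim \mathrm{Categorical}(\theta)
  && \text{topic label for event } i \\
x_{ij} &\sim \mathrm{Bernoulli}(\lambda_i)
  && \text{noise indicator for parameter } j \text{ of event } i \\
w_{ij} \mid x_{ij}, z_i &\sim
  \begin{cases}
    \mathrm{Categorical}(\phi_{0})     & \text{if } x_{ij} = 1 \quad \text{(global noise topic)} \\
    \mathrm{Categorical}(\phi_{z_i})   & \text{if } x_{ij} = 0 \quad \text{(topic of event } i\text{)}
  \end{cases}
\end{align*}
```

Here θ is the topic distribution, φ_k the distribution over parameters for topic k, φ_0 the distribution over noise parameters, and λ_i the noise rate for event i.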

  16. The Dataset • Collection of visitor traces, varying length [Diagram: Visitor 1, Visitor 2, …, Visitor n, each with a trace Event 1, Event 2, …, Event t]

  17. Representing Behavior: Two Approaches • Enumerate the space of all possible paths and count • This would require a very large table • Most of the entries would be 0 • Not clear how to handle variable-length visits • Hidden Markov Model (HMM) • Encodes visitor behavior in a probabilistic model • Calculates likelihood (or probability) of specific trajectories • Enables prediction of future actions a visitor may take on the site

  18. The Hidden Markov Model • Site visit (emission) probabilities • Stochastic state transitions [Diagram: a chain of hidden states, each emitting an observed event]
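The slide's equations did not survive in the transcript. For orientation, the standard HMM factorization is given below; the notation (s_t for the hidden state, a_t for the observed action at step t) is an assumption and may not match the symbols on the original slide.

```latex
\begin{align*}
P(s_{1:T}, a_{1:T})
  &= P(s_1)\, P(a_1 \mid s_1) \prod_{t=2}^{T} P(s_t \mid s_{t-1})\, P(a_t \mid s_t)
\end{align*}
```

The factors P(a_t | s_t) are the emission probabilities and P(s_t | s_{t-1}) are the stochastic state transitions.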

  19. The Hidden Markov Model • Visitors arrive at a site with an intention • The current intention specifies the probability they will take some action (trigger an event) • After the page is selected, the intention transitions to a new value (could be the same as the previous intention) [Diagram: states Viewing Products, Product Comparison, and Make Purchase; probability labels .4, .6, .7, .3]

  20. The Hidden Markov Model • Visitors arrive at a site with an intention • The current intention specifies the probability they will take some action (trigger an event) • After the page is selected, the intention transitions to a new value (could be the same as the previous intention) [Diagram: states Viewing Products, Product Comparison, and Make Purchase; probability labels .7, .3, .15, .85, .7, .3]
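To make the intention model concrete, here is a small sketch of how an HMM assigns a likelihood to a trajectory using the forward algorithm. The state names come from the slides, but every probability, the action names (`view`, `compare`, `purchase`), and the function name are invented for illustration; the diagram's actual edge weights cannot be recovered from the transcript.

```python
def sequence_likelihood(initial, transition, emission, actions):
    """Forward algorithm: P(actions), summed over all hidden intention paths."""
    if not actions:
        return 1.0
    states = list(initial)
    # alpha[s] = P(actions so far, current intention = s)
    alpha = {s: initial[s] * emission[s][actions[0]] for s in states}
    for action in actions[1:]:
        alpha = {
            s: emission[s][action]
            * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# All numbers below are made up for illustration.
initial = {"Viewing Products": 0.7, "Product Comparison": 0.2, "Make Purchase": 0.1}
transition = {
    "Viewing Products": {"Viewing Products": 0.6, "Product Comparison": 0.3, "Make Purchase": 0.1},
    "Product Comparison": {"Viewing Products": 0.3, "Product Comparison": 0.4, "Make Purchase": 0.3},
    "Make Purchase": {"Viewing Products": 0.15, "Product Comparison": 0.0, "Make Purchase": 0.85},
}
emission = {
    "Viewing Products": {"view": 0.8, "compare": 0.15, "purchase": 0.05},
    "Product Comparison": {"view": 0.3, "compare": 0.6, "purchase": 0.1},
    "Make Purchase": {"view": 0.1, "compare": 0.1, "purchase": 0.8},
}

likelihood = sequence_likelihood(
    initial, transition, emission, ["view", "compare", "purchase"]
)
```

Because the hidden intentions are summed out, the cost grows linearly with trajectory length rather than with the number of possible paths, which is what makes the HMM practical compared with enumerating a path table.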

  21. Predictive Model: Learning and Runtime • Offline: • Session data is recorded into a batch file for training • Trained with the expectation maximization (EM) algorithm • Online: • The model is used to predict specific visitor actions • CartAdd (add an item to the shopping cart) • Purchase (complete the purchase funnel) • Conditions predictions on observed actions the visitor has taken so far • Updates predictions each time a new action is taken by the visitor • Can be generalized to other predictive queries
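The online step, conditioning on the actions seen so far and updating on each new one, corresponds to forward filtering in an HMM. The following is a minimal sketch under assumed names: the two states (`browsing`, `buying`), the actions (`view`, `cart_add`), and all probabilities are hypothetical, standing in for parameters that EM training would produce offline.

```python
def start_belief(initial, emission, action):
    """Belief over hidden states after the first observed action."""
    unnormalized = {s: initial[s] * emission[s][action] for s in initial}
    return _normalize(unnormalized)


def update_belief(belief, transition, emission, action):
    """One filtering step: advance the state, then weight by the new action."""
    states = list(belief)
    unnormalized = {
        s: emission[s][action]
        * sum(belief[r] * transition[r][s] for r in states)
        for s in states
    }
    return _normalize(unnormalized)


def _normalize(weights):
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("observed action has zero probability under the model")
    return {s: w / total for s, w in weights.items()}


# Hypothetical two-state model; numbers invented for illustration.
initial = {"browsing": 0.9, "buying": 0.1}
transition = {
    "browsing": {"browsing": 0.8, "buying": 0.2},
    "buying": {"browsing": 0.3, "buying": 0.7},
}
emission = {
    "browsing": {"view": 0.9, "cart_add": 0.1},
    "buying": {"view": 0.3, "cart_add": 0.7},
}

belief = start_belief(initial, emission, "view")
belief = update_belief(belief, transition, emission, "cart_add")
```

Each update touches only the current belief vector, so the per-event cost is fixed regardless of how long the session has been running.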

  22. Online Inference • Goal: Compute the probability that actions t+1 to t+5 contain at least a single purchase / cartAdd. [Diagram: chain of states and actions at times t through t+5]

  23. Online Inference • Goal: Compute the probability that actions t+1 to t+5 contain at least a single purchase / cartAdd. [Diagram: chain of states and actions at times t through t+5, with t+1 to t+5 marked as the prediction window]
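One way to answer this query is to compute the complement: the probability that none of the next five actions is the target, then subtract from one. The sketch below assumes a belief over the hidden state at time t is already available; the state names, action names, probabilities, and the function name `prob_target_in_window` are hypothetical.

```python
def prob_target_in_window(belief, transition, emission, target, horizon=5):
    """P(at least one `target` action among the next `horizon` actions)."""
    states = list(belief)
    # weight[s] = P(in state s, and no target action seen so far in the window)
    weight = dict(belief)
    for _ in range(horizon):
        weight = {
            s: (1.0 - emission[s][target])
            * sum(weight[r] * transition[r][s] for r in states)
            for s in states
        }
    return 1.0 - sum(weight.values())


# Hypothetical two-state model; numbers invented for illustration.
transition = {
    "browsing": {"browsing": 0.8, "buying": 0.2},
    "buying": {"browsing": 0.3, "buying": 0.7},
}
emission = {
    "browsing": {"view": 0.9, "cart_add": 0.1},
    "buying": {"view": 0.3, "cart_add": 0.7},
}
belief_at_t = {"browsing": 1.0, "buying": 0.0}

p_one_step = prob_target_in_window(belief_at_t, transition, emission, "cart_add", horizon=1)
p_window = prob_target_in_window(belief_at_t, transition, emission, "cart_add", horizon=5)
```

Tracking the "no target yet" mass avoids enumerating the individual action sequences in the window; the same routine handles other window lengths by changing `horizon`.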

  24. Streams Data websockets

  25. Prediction Architecture • Validation Bolt: Validates raw events from Kafka • Prediction Bolt: Augments events with prediction values and confidence labels

  26. Prediction Architecture • Validation Bolt: Validates raw events from Kafka • Prediction Bolt: Augments events with prediction values and confidence labels • Event Stream Bolt: Dispatches individual events to Streams • Session Stream Bolt: Dispatches full sessions to Streams • websockets

  27. Prediction Architecture • Validation Bolt: Validates raw events from Kafka • Prediction Bolt: Augments events with prediction values and confidence labels • Event Stream Bolt: Dispatches individual events to Streams • Session Stream Bolt: Dispatches full sessions to Streams • ROC Bolt: Completed sessions are used to score the predictive model’s accuracy; the model receives new thresholds for confidence labels • websockets
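The slides do not say how the ROC Bolt derives new thresholds. As a hedged illustration of the general idea, the sketch below scores completed sessions against their predicted probabilities and picks the cutoff that maximizes the true-positive rate minus the false-positive rate (Youden's J), which is an assumed criterion, not necessarily the presenters'. The data and function names are invented.

```python
def roc_points(scored):
    """scored: list of (predicted_probability, did_convert) pairs.

    Returns [(threshold, false_positive_rate, true_positive_rate), ...],
    one point per distinct predicted probability, highest threshold first.
    """
    positives = sum(1 for _, converted in scored if converted)
    negatives = len(scored) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative session")
    points = []
    for threshold in sorted({p for p, _ in scored}, reverse=True):
        tp = sum(1 for p, y in scored if p >= threshold and y)
        fp = sum(1 for p, y in scored if p >= threshold and not y)
        points.append((threshold, fp / negatives, tp / positives))
    return points


def best_threshold(scored):
    """Threshold maximizing TPR - FPR (Youden's J); an assumed criterion."""
    return max(roc_points(scored), key=lambda pt: pt[2] - pt[1])[0]


# Hypothetical completed sessions: (predicted probability, purchase happened).
scored_sessions = [
    (0.9, True), (0.8, True), (0.7, True), (0.6, False),
    (0.3, True), (0.2, False), (0.1, False),
]
threshold = best_threshold(scored_sessions)
```

A threshold chosen this way could then be used to map a raw prediction value to a confidence label such as "likely" or "unlikely".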

  28. Streams Demo

  29. Results

  30. Next Steps • Integrating visitor information across multiple visits • Automated re-training of predictive model • Adjust to seasonal and trend effects • Generative models for Anomaly Detection • What does a Likely/Unlikely session look like? • Richer models of visitor behavior • Hierarchical models for behavior

  31. Questions? Thank you! Ethan.Dereszynski@webtrends.com elbpdx@gmail.com
