AI on the Web, Part I. CS592 Class Spring 2000. Part I: Personalized Prediction.

  2. Part I: Personalized Prediction

  3. NewsDude @ UCI [*] • Intelligent agent that compiles a daily news program for an individual user (information retrieval) • Architecture: how does it work? • Short-term vs. long-term models for user modeling • Time-coded feedback to increase prediction accuracy • [*] D. Billsus, M. Pazzani. A Hybrid User Model for News Story Classification. In Proc. UM99, Banff, Canada, June 1999.

  4. NewsDude Architecture

  5. Learning Models • Short-term model (nearest neighbor, NN) • news threads for ongoing recent events • Long-term model (Naïve Bayes classifier) • general news preferences • Hybrid: the models are tried in order • Use the short-term model • Otherwise use the long-term model • Otherwise assign a default score

  6. Time-coded feedback • Use the amount of time a user has listened to a story as implicit feedback • User’s direct binary feedback + time-coded feedback = fine-grained scale without extra burden on the user (similar to Lieberman’s Letizia) • pl = proportion of a story the user has heard • If the story was rated as uninteresting: score = 0.3 * pl • If the story was rated as interesting: score = 0.7 + 0.3 * pl • If the user asked for more information: score = 1.0
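
The three scoring rules above can be sketched as one small function. This is a minimal illustration, not NewsDude's code: the function name, the string labels for the rating, and the `asked_for_more` flag are assumptions made for the example.

```python
def time_coded_score(rating, pl, asked_for_more=False):
    """Combine binary feedback with listening proportion pl (0..1).

    rating is "interesting" or "uninteresting" (hypothetical labels).
    """
    if not 0.0 <= pl <= 1.0:
        raise ValueError("pl must be a proportion between 0 and 1")
    if asked_for_more:
        # Asking for more information is the strongest positive signal.
        return 1.0
    if rating == "interesting":
        return 0.7 + 0.3 * pl
    if rating == "uninteresting":
        return 0.3 * pl
    raise ValueError("rating must be 'interesting' or 'uninteresting'")
```

Under these rules, uninteresting stories land in [0, 0.3] and interesting ones in [0.7, 1.0], so the binary rating sets the band and the listening time positions the score within it.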

  7. NewsDude evaluation

  8. NewsDude: Strengths and Limitations • Tracks user’s changing interests in real time without sacrificing general interests • Simple feedback but accurate prediction • The user must rate enough stories before personalization takes effect • Not flexible: the classifier must be recalculated when new keywords are added • Similar systems: GroupLens (U. Minnesota)

  9. Adaptive Web Sites (Mike Perkowitz and Oren Etzioni, IJCAI97) • Web sites that improve their organization and presentation by learning user access patterns.

  10. Issues in Adaptive Web Sites Design • Observation -- Discover User’s Interests • Special user’s interests • Group users’ interests • All users’ interests • Adjustment -- Adjust the original web-page design according to the observation • Customization (by link, by keyword reordering) • Optimization (by refining search results)

  11. Adaptive Web Sites: Two Examples • WebWatcher • Modify the original design by promotion, demotion and highlighting of links, and by linking web pages • Constructed by Carnegie Mellon University • http://www.cs.cmu.edu/~webwatcher/ • PageGather • Generate new web pages: Index Page Synthesis • Constructed by University of Washington, Seattle • http://www.cs.washington.edu/research/adaptive/

  12. WebWatcher (T. Joachims, D. Freitag, T. Mitchell, IJCAI97) • A software agent that acts as a tour guide for web visitors • Makes suggestions on where to go next • Learns from information provided when users enter and exit a web page, and also learns users’ access patterns • Reorganizes web pages for the user

  13. WebWatcher: architecture • [Figure: architecture diagram with nodes User, WebWatcher, and World Wide Web; arrows labeled Requests, WebWatcher commands, Highlight advice, and Replaced URLs] • WebWatcher: a proxy agent between users and the WWW

  14. Learning from Previous Tours • Users provide keywords of interest before the tour starts • Those keywords are added to the description of every hyperlink this user follows • Interests and hyperlink descriptions are represented by high-dimensional feature vectors, whose elements are calculated using the TF-IDF heuristic • LinkQuality = estimated probability that a user will follow this hyperlink • estimated as the average similarity of the k highest-ranked keywords associated with the hyperlink.

  15. WebWatcher: TFIDF • WebWatcher uses TFIDF with the cosine similarity measure to calculate the similarity between the current user’s interests and a hyperlink description • TFIDF calculates the feature vector V as follows: • V_i = Freq(Word_i) * [ log2(n) - log2(DocFreq(Word_i)) ] • Freq(Word_i): the number of occurrences of Word_i in this page • DocFreq(Word_i): the number of pages in which Word_i appears • n: the total number of pages
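
A minimal sketch of the TFIDF weighting and the cosine measure described above. The function names and the dictionary-based sparse vectors are choices made for this example and are not taken from WebWatcher itself.

```python
import math
from collections import Counter


def tfidf_vector(words, doc_freq, n_pages):
    """Build a sparse TFIDF vector {word: weight} for one page.

    words    -- list of word tokens on the page
    doc_freq -- {word: number of pages containing the word}
    n_pages  -- total number of pages (n)
    """
    counts = Counter(words)
    return {
        w: freq * (math.log2(n_pages) - math.log2(doc_freq[w]))
        for w, freq in counts.items()
        if doc_freq.get(w, 0) > 0  # skip words with no document frequency
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is zero)."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

A word that appears on every page gets weight log2(n) - log2(n) = 0, so it contributes nothing to the similarity, which is the point of the inverse document frequency term.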

  16. WebWatcher: Reinforcement Learning • Reinforcement learning: learn control strategies that select optimal actions • R(s): reward function at state s • Q(s, a): the goodness of action a in state s • Q(s, a) = R(s) + γ * max { Q(s’, a’) | a’ } • s is the current state, s’ is the next state reached through a • γ (0 ≤ γ < 1): a discount factor that determines how severely to discount the value of rewards received further into the future.

  17. Reinforcement Learning in WebWatcher • States correspond to Web pages • Actions correspond to hyperlinks • R_keyword(s): the TFIDF value of the keyword for page s • Q_keyword(s, a) is learned as the sum of discounted TFIDF values of the keyword over the optimal tour beginning with a • For every word w, WebWatcher uses a separate reward function R_w(s) and learns a distinct Q_w(s, a).

  18. WebWatcher: An Example of state space • [Figure: state-space graph with start state S and a destination page with R = 1; arcs labeled with Q values 1, 0.9, 0.81, and 0.73] • Initially, R = 0 except at the destination • R = 1 at the destination web page • γ = 0.9
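
The example values can be reproduced with a few sweeps of Q-value updates over a small page graph. This is a sketch under a stated assumption: the reward is collected on arriving at the next page s’, rather than at the current page s, because that reading yields the 1, 0.9, 0.81, 0.73 values shown in the example. The graph, page names, and function name are hypothetical.

```python
def q_values(links, reward, gamma=0.9, sweeps=50):
    """Iterate Q(s, a) = R(s') + gamma * max_a' Q(s', a') to convergence.

    links  -- {page: {hyperlink: target_page}}; pages with no links are terminal
    reward -- {page: R(page)}; missing pages have reward 0
    ASSUMPTION: reward is taken at the page the link leads to (s').
    """
    q = {(s, a): 0.0 for s, actions in links.items() for a in actions}
    for _ in range(sweeps):
        for s, actions in links.items():
            for a, s_next in actions.items():
                best_next = max(
                    (q[(s_next, a2)] for a2 in links.get(s_next, {})),
                    default=0.0,  # terminal page: no further reward
                )
                q[(s, a)] = reward.get(s_next, 0.0) + gamma * best_next
    return q


# A hypothetical four-step chain from start page S to the destination.
links = {
    "S": {"to_A": "A"},
    "A": {"to_B": "B"},
    "B": {"to_C": "C"},
    "C": {"to_goal": "goal"},
}
q = q_values(links, reward={"goal": 1.0}, gamma=0.9)
```

With γ = 0.9, the link into the destination gets a Q value of 1, and each step further back multiplies by 0.9, giving roughly 0.9, 0.81, and 0.729 toward S.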

  19. PageGather (Mike Perkowitz and Oren Etzioni, IJCAI97, 99) • Index Page Synthesis • Instead of modifying the original web page design, PageGather creates new index pages that contain collections of links to related but currently unlinked pages. • Based on cluster mining to find collections of related pages.

  20. Cluster Mining: co-occurrence frequencies • For each pair of pages P1 and P2, compute: • Pr(P1|P2), the probability of visiting P1 if P2 is visited • The co-occurrence frequency between P1 and P2 is the minimum of Pr(P1|P2) and Pr(P2|P1) • The co-occurrence frequency is zero if the two pages are already linked. • Compute a similarity matrix • Apply a threshold and set low similarities to zero
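
The co-occurrence computation above can be sketched as follows. The representation is an assumption made for the example: each visit is a set of page names, the similarity matrix is stored sparsely as a dictionary keyed by page pair, and entries that would be zero are simply omitted.

```python
from itertools import combinations


def cooccurrence(visits, linked=frozenset(), threshold=0.0):
    """Return {(p1, p2): frequency} for page pairs, with p1 < p2.

    visits    -- list of sets of pages, one set per visit
    linked    -- pairs of pages that are already linked (treated as zero)
    threshold -- similarities below this value are treated as zero
    """
    page_count = {}
    pair_count = {}
    for visit in visits:
        for page in visit:
            page_count[page] = page_count.get(page, 0) + 1
        for pair in combinations(sorted(visit), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    similarity = {}
    for (p1, p2), both in pair_count.items():
        if (p1, p2) in linked or (p2, p1) in linked:
            continue  # already linked: co-occurrence frequency is zero
        # Pr(P1|P2) = both / count(P2); take the minimum of both directions.
        freq = min(both / page_count[p2], both / page_count[p1])
        if freq >= threshold:
            similarity[(p1, p2)] = freq
    return similarity
```

Taking the minimum of the two conditional probabilities keeps a very popular page from looking similar to everything merely because most visits include it.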

  21. PageGather Algorithm • Process the access log into visit data. • Compute the co-occurrence frequencies between pages and create a similarity matrix. • Create the graph corresponding to the matrix, and find cliques (or connected components) in the graph. • For each cluster found, create a web page consisting of links to the documents in the cluster.
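
The last two steps of the algorithm can be sketched with the connected-components variant, which is simpler than clique finding. The edge list stands for the nonzero entries of the similarity matrix; the function names and the HTML layout of the generated index page are assumptions made for illustration.

```python
def connected_components(edges):
    """Group pages into clusters: connected components of the graph.

    edges -- iterable of (page1, page2) pairs with nonzero similarity
    """
    neighbors = {}
    for p1, p2 in edges:
        neighbors.setdefault(p1, set()).add(p2)
        neighbors.setdefault(p2, set()).add(p1)

    seen = set()
    clusters = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            page = stack.pop()
            component.append(page)
            for other in neighbors[page]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters


def index_page(cluster, title="Related pages"):
    """Render one cluster as a simple HTML index page (hypothetical layout)."""
    items = "\n".join(f'  <li><a href="{url}">{url}</a></li>' for url in cluster)
    return f"<h1>{title}</h1>\n<ul>\n{items}\n</ul>"
```

Connected components run in time linear in the number of edges but can produce large, loosely related clusters; cliques are stricter because every page in the cluster must be similar to every other.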

  22. Next Web Document Prediction • Papers by Albrecht, Zukerman and Nicholson • “Predicting Users’ Requests on the WWW”, UM99 • “Pre-sending Documents on the WWW”, IJCAI99 • Theme: use Markov models to predict the next document requested, and pre-send it

  23. Prediction Models • Prediction models are of the form • P(DR1, TR1 | previous requests), where DR1 is the next document requested and TR1 is the time of that request • Assumptions • the distribution of the time for requesting a document is independent of the actual document • the next document depends only on the previous document • the time of the next request depends only on the time of the last request

  24. Prediction Models (Cont...) • From these assumptions, we can derive • P(DR1, TR1 | previous requests) = P(DR1 | previous documents) × P(TR1 | TR) • Need to estimate the value of each of the two terms in the above equation

  25. Request Time Prediction

  26. Document Prediction • Four models are used for prediction • Time Markov Model • Space Markov Model • Second-order Time Markov Model • Linked Space-Time Markov Model • Graphical representations are used to represent each document prediction model

  27. Document Prediction (Cont…) • If a document Di is requested after an event Ei-1, then there is an arc between them • For the Time Markov Model, Ei-1 is the last document request (Di-1) • For the Space Markov Model, Ei-1 is the referring document of Di

  28. Document Prediction (Cont…) • For the Second-order Time Markov Model, Ei-1 is a tuple which contains the last two documents requested • For the Linked Space-Time Model, Ei-1 is a tuple that contains the last document requested and its referer

  29. Document Prediction (Cont…) • Each arc from event Ei-1 to Di has an associated weight w(Ei-1,Di) which is the frequency of an event-document pair across all training sessions • The probability of the request is then

  30. Document Prediction (Cont…)
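
The formula for this step did not survive in the text. A common way to turn arc weights into probabilities, assumed here and not confirmed by the source, is to divide w(Ei-1, Di) by the total weight of all arcs leaving Ei-1. The sketch below does this for the Time Markov Model, where the event is simply the last document requested; the class and method names are hypothetical.

```python
from collections import defaultdict


class TimeMarkovModel:
    """First-order model: the event E is the last document requested."""

    def __init__(self):
        # weights[event][document] = frequency of the pair in training data
        self.weights = defaultdict(lambda: defaultdict(int))

    def train(self, sessions):
        """sessions -- list of document sequences, one per training session."""
        for session in sessions:
            for event, document in zip(session, session[1:]):
                self.weights[event][document] += 1

    def probabilities(self, event):
        """ASSUMED normalization: w(E, D) / sum of w(E, D') over all D'."""
        arcs = self.weights.get(event)
        if not arcs:
            return {}  # event never seen: the model cannot predict
        total = sum(arcs.values())
        return {doc: w / total for doc, w in arcs.items()}
```

The other three models would differ only in how the event is built: the referring document for the Space model, a tuple of the last two documents for the second-order model, and a (last document, referer) tuple for the Linked model.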

  31. Hybrid Prediction Models • MaxHybrid Model • Consults all the Markov prediction models and selects the one with the highest probability in its most likely prediction • OrderedHybrid Model • Orders the Markov models according to their performance: Linked, Second-order, Time, and Space. Selects the first one that can make a prediction

  32. Hybrid Prediction Models(Cont…) • SpaceLinkedHybrid Model • If the maximum prediction made by the Space Markov Model is > 0.77, then use its prediction. Otherwise, use those of the Linked Markov Model
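
The three hybrid strategies can be sketched as functions over the individual models' outputs. The input format is an assumption for this example: a dictionary that maps a model name ("linked", "second_order", "time", "space") to that model's {document: probability} prediction, with an empty dictionary when the model cannot predict.

```python
def best(prediction):
    """Most likely (document, probability) pair, or None if no prediction."""
    if not prediction:
        return None
    return max(prediction.items(), key=lambda item: item[1])


def max_hybrid(predictions):
    """Pick the model whose top prediction has the highest probability."""
    candidates = [best(p) for p in predictions.values()]
    candidates = [c for c in candidates if c is not None]
    return max(candidates, key=lambda c: c[1]) if candidates else None


def ordered_hybrid(predictions,
                   order=("linked", "second_order", "time", "space")):
    """Use the first model, in performance order, that can predict."""
    for name in order:
        choice = best(predictions.get(name, {}))
        if choice is not None:
            return choice
    return None


def space_linked_hybrid(predictions, threshold=0.77):
    """Trust the Space model above the threshold, else fall back to Linked."""
    space = best(predictions.get("space", {}))
    if space is not None and space[1] > threshold:
        return space
    return best(predictions.get("linked", {}))
```

For example, if the Linked model has no prediction, the second-order model predicts d1 with 0.6, and the Space model predicts d4 with 0.8, then `max_hybrid` and `space_linked_hybrid` return d4 while `ordered_hybrid` returns d1.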

  33. Results • The experimental data is 50 days of server logs in the form of {client,referer,requestedDoc,time,size} • Prediction models were assessed in terms of the probability with which they predict the actual next request

  34. Results (Cont…)

  35. Pre-sending documents • IJCAI 99 paper (same data set) • Includes two costs • cost of waiting for a document (cost-per-second) • cost of transmitting a document (cost-per-byte) • Calculate the expected benefit using document probabilities: • Expected-Benefit = Expected-Wait-Reduction - Expected-Total-Cost • Result: pre-sending with an 8-hour cache performs best
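
The slide does not define the two terms of the benefit equation, so the following is only one plausible reading, stated as an assumption: the wait saved is the document's transfer time, it is saved only if the document is actually requested (probability `prob`), and the transmission cost is paid whenever the document is pre-sent. All parameter names are hypothetical.

```python
def expected_benefit(prob, size_bytes, bytes_per_second,
                     cost_per_second, cost_per_byte):
    """Expected-Benefit = Expected-Wait-Reduction - Expected-Total-Cost.

    ASSUMPTIONS (not from the source): wait saved equals transfer time,
    and the transmission cost is incurred for every pre-sent document.
    """
    transfer_seconds = size_bytes / bytes_per_second
    expected_wait_reduction = prob * transfer_seconds * cost_per_second
    expected_total_cost = size_bytes * cost_per_byte
    return expected_wait_reduction - expected_total_cost


def should_presend(prob, size_bytes, bytes_per_second,
                   cost_per_second, cost_per_byte):
    """Pre-send only when the expected benefit is positive."""
    return expected_benefit(prob, size_bytes, bytes_per_second,
                            cost_per_second, cost_per_byte) > 0
```

Under this reading, a document is worth pre-sending only when its predicted probability is high enough for the expected waiting cost saved to outweigh the bytes transmitted.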

