Mining Longest Repeating Subsequences to Predict World Wide Web Surfing
Jatin Patel
Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202
jatin@cs.wayne.edu
Introduction
• Predicting user surfing paths involves tradeoffs between model complexity and predictive accuracy.
• The aim of this paper is to reduce model complexity while retaining predictive accuracy, using two techniques:
• (1) Longest Repeating Subsequences (LRS): focus on repeating surfing patterns to reduce the complexity of the model.
• (2) Weighted specificity: focus on the longer patterns of past surfing, since longer surfing paths are more predictive.
• Both techniques dramatically reduce model complexity while retaining a high degree of predictive accuracy.
Surfing Paths
• The figure shows the diffusion of surfers through a web site:
• (1) Users begin surfing a web site starting from different entry pages.
• (2) As they surf the web site, users arrive at specific web pages having traveled different surfing paths.
• (3) Users choose to traverse possible paths leading from the pages they are currently visiting.
• (4) After surfing through some number of pages, users stop or go to another web site.
Application of Predictive Models
• The ability to accurately predict user surfing patterns could lead to a number of improvements in user WWW interaction.
• Search: the Google search engine assumes that a model of surfing can lead to improvements in the precision of text-based search engine results.
• The distribution of visits over all WWW pages obtained from this model is used to re-weight and re-rank the results of a text-based search engine.
Latency Reduction
• Predictive models have significant potential to reduce user-perceived WWW latencies.
• Delay is the number one problem in using the WWW.
• One solution is to improve prefetching and caching methods.
• A number of methods, including Markov models, show that if a system could predict the content a surfer is going to visit, those pages can be prefetched to low-latency local storage.
• Kroeger, Long and Mogul also showed the improvements in WWW interaction latencies that might be gained by predicting surfer paths.
• Latencies are divided into two parts:
• (1) Internal latencies, caused by the computers and networks utilized by the clients and proxies.
• (2) External latencies, caused by the computers and networks between the proxies and external WWW servers.
Predictive Surfing Models
• Path Profiles: Schechter, Krishnan and Smith utilized path and point profiles.
• The profiles are constructed from user sessions collected over a certain period of time.
• This data is used to predict the next web page a user will visit.
• Example: if the surfer's path is <A, B, C>, then <A, B, C, D> is a better match than <B, C, E>; a sketch of this matching follows below.
• It is also important to reduce model complexity. To reduce model size, Schechter et al. used a maximal prefix trie with a minimal threshold requirement for repeating prefixes.
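The following is a minimal sketch of longest-match path-profile prediction; it is an illustrative stand-in (the function name and data layout are assumptions), not Schechter et al.'s actual prefix-trie implementation. The stored profile whose prefix matches the longest suffix of the surfer's recent path supplies the predicted next page.

```python
# A minimal sketch of longest-match path-profile prediction; this is an
# illustrative stand-in, not Schechter et al.'s prefix-trie implementation.
def predict_from_profiles(path, profiles):
    """Predict the page that follows the longest profile match."""
    best_page, best_len = None, 0
    for profile in profiles:
        # Try progressively shorter suffixes of the surfer's recent path.
        for start in range(len(path)):
            suffix = tuple(path[start:])
            if len(suffix) <= best_len:
                break  # a shorter suffix cannot beat the current best match
            if tuple(profile[:len(suffix)]) == suffix and len(profile) > len(suffix):
                best_page, best_len = profile[len(suffix)], len(suffix)
                break
    return best_page

# The slide's example: <A, B, C, D> matches the full path <A, B, C>,
# beating <B, C, E> (which matches only <B, C>), so the prediction is D.
profiles = [["A", "B", "C", "D"], ["B", "C", "E"]]
print(predict_from_profiles(["A", "B", "C"], profiles))  # -> D
```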
First Order Markov Models
• In this method a dependency graph contains nodes for all files accessed at a particular WWW server.
• Dependency arcs between nodes indicate that one file was accessed within some number of accesses w of another file.
• The arcs are weighted to reflect access rates.
• Latency reduction increases as w is increased from w = 2 to w = 4.
• Prefetching methods essentially record surfing path transitions and use these data to predict future transitions.
• Transitions can be recorded anywhere (e.g., at a proxy or server).
• The important point is that all the methods seem to improve predictions when they store longer path dependencies.
Kth-Order Markov Models
• Here the authors evaluate the predictive capabilities of Kth-order Markov models using ten days of log files collected at the xerox.com web site.
• The results of this analysis suggest that storing longer path dependencies leads to better prediction accuracy.
• Surfing paths can be represented as n-grams of the form <X1, X2, ..., Xn>, indicating sequences of page clicks by a population of users visiting a web site.
• As we saw earlier, a first-order Markov model is concerned with page-to-page transition probabilities, which can be estimated from the n-grams:
• p(x2 | x1) = Pr(X2 = x2 | X1 = x1)
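As a concrete illustration, here is a minimal sketch (assuming sessions are lists of page identifiers extracted from server logs) of estimating p(x2 | x1) by counting page-to-page transitions:

```python
# A minimal sketch: estimate p(x2 | x1) by counting page-to-page
# transitions in sessions (assumed to be lists of page identifiers).
from collections import Counter, defaultdict

def first_order_model(sessions):
    counts = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            counts[a][b] += 1
    # Normalize the counts into conditional probabilities.
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"]]
model = first_order_model(sessions)
print(model["B"])  # {'C': 0.667, 'D': 0.333} -> predict C after B
```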
Kth-Order Markov Models (cont.)
• To capture longer surfing paths, we may consider the conditional probability that a surfer transitions to an nth page given their previous k = n - 1 pages:
• p(xn | xn-1, ..., xn-k) = Pr(Xn = xn | Xn-1 = xn-1, ..., Xn-k = xn-k)
• Summary: using the data collected from xerox.com, the authors systematically tested the properties of Kth-order Markov models.
• The models were estimated from surfing transitions extracted from a training set of WWW server log file data, and tested against test sets of data that occurred after the training set.
• The prediction scenario assumed a surfer was just observed making k page visits.
Kth-Order Markov Models (cont.)
• In order to make a prediction of the next page visit, the model must have:
• (1) an estimate of p(xn | xn-1, ..., xn-k) from the training data, which requires that
• (2) a path of k visits <xn-1, ..., xn-k> has been observed in the training data.
• When a path matches between training and test data, the model examines all the conditional probabilities p(xn | xn-1, ..., xn-k) available and predicts the page having the highest probability of occurring.
• The important thing to note is that the model does not make a prediction when no matching path exists in the model, as in the sketch below.
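A minimal sketch of this prediction rule: the model is a table keyed by k-page penultimate paths, and it declines to predict when the observed path was never seen in training.

```python
# A minimal sketch of kth-order prediction: the model is a table keyed by
# k-page penultimate paths, and no prediction is made for unseen paths.
from collections import Counter, defaultdict

def kth_order_model(sessions, k):
    counts = defaultdict(Counter)
    for s in sessions:
        for i in range(len(s) - k):
            counts[tuple(s[i:i + k])][s[i + k]] += 1
    return counts

def predict(model, recent_k_pages):
    nexts = model.get(tuple(recent_k_pages))
    if not nexts:
        return None  # no matching path in training: no prediction
    return nexts.most_common(1)[0][0]  # page with the highest probability

model = kth_order_model([["A", "B", "C", "D"], ["A", "B", "C", "E"],
                         ["A", "B", "C", "D"]], k=2)
print(predict(model, ["B", "C"]))  # -> D (seen twice, versus once for E)
print(predict(model, ["C", "A"]))  # -> None (path never observed)
```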
Kth-Order Markov Models (cont.)
• Table 1 presents the effect of the order of the Markov model on prefetching.
• Pr(Match): the probability that a penultimate path <xn-1, ..., xn-k> observed in the test data was matched by the same penultimate path in the training data.
• Pr(Hit|Match): the conditional probability that page xn is visited, given that <xn-1, ..., xn-k> is the penultimate path and the highest probability conditional on that path is p(xn | xn-1, ..., xn-k).
• Pr(Hit) = Pr(Hit|Match) · Pr(Match): the probability that the page visited in the test set is the one estimated from the training set as the most likely to occur.
• Pr(Miss|Match): the conditional probability that page xn is not visited, given that <xn-1, ..., xn-k> is the penultimate path and the highest probability conditional on that path is p(xn | xn-1, ..., xn-k).
Kth-Order Markov Models (cont.)
• Pr(Miss) = Pr(Miss|Match) · Pr(Match): the probability that the page visited in the test set is not the one estimated from the training set as the most likely to occur.
• The last metric provides a coarse measure of the benefit-cost ratio:
• Benefit:Cost = (B · Pr(Hit)) / (C · Pr(Miss))
• where B and C vary between 0 and 1 and represent the relative weights associated with the benefits and costs.
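These metrics can be computed as sketched below, reusing `predict` from the previous sketch; `trials` is assumed to be a list of (penultimate path, actual next page) pairs drawn from the test set.

```python
# A sketch of the evaluation metrics defined above, reusing predict()
# from the previous sketch.
def evaluate(model, trials, B=1.0, C=1.0):
    matches = hits = 0
    for path, actual in trials:
        guess = predict(model, path)
        if guess is None:
            continue  # no matching penultimate path in the training data
        matches += 1
        hits += (guess == actual)
    n = len(trials)
    pr_match, pr_hit = matches / n, hits / n
    pr_miss = (matches - hits) / n  # matched, but predicted the wrong page
    return {"Pr(Match)": pr_match,
            "Pr(Hit|Match)": hits / matches if matches else 0.0,
            "Pr(Hit)": pr_hit,
            "Pr(Miss)": pr_miss,
            "Benefit:Cost": (B * pr_hit) / (C * pr_miss) if pr_miss else float("inf")}
```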
Model and Prediction Methods
• Producing an accurate predictive model using the least amount of space has many computational benefits as well as practical benefits.
• The solution is to remove low-information elements from the model.
• The LRS technique treats the problem as a data mining task: the storage requirement is reduced by saving only information-rich paths.
• As seen from the Kth-order Markov models, higher orders result in higher prediction rates.
• This principle of specificity encourages the use of higher-order path matches whenever possible to maximize hit rates.
• The drawback of this approach is that the likelihood of a higher-order path match is quite small, resulting in lower overall hit rates.
Longest Repeating Subsequences
• A longest repeating subsequence is a sequence of items where:
• (1) "subsequence" means a set of consecutive items,
• (2) "repeated" means the subsequence occurs more than some threshold T, where T typically equals one, and
• (3) "longest" means that although a subsequence may be part of another repeated subsequence, there is at least one occurrence of this subsequence where it is the longest repeating subsequence.
• Example: suppose a web site contains the pages A, B, C and D.
• As shown in the figure, if users repeatedly visit A to B, but only one user clicks through to C and one user clicks through to D (case 1), then the longest repeating subsequence is AB. The sketch below illustrates the extraction.
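The following is a simplified sketch of LRS extraction under these three conditions (the authors' actual implementation is not given). It enumerates every contiguous run of pages, keeps the runs occurring more than T times, and retains a run only if at least one of its occurrences cannot be extended into a longer repeating run; it is written for clarity, not efficiency.

```python
# A simplified sketch of LRS extraction; written for clarity, not speed.
from collections import defaultdict

def extract_lrs(sessions, T=1):
    occurrences = defaultdict(list)  # run of pages -> [(session id, start)]
    for s_id, s in enumerate(sessions):
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                occurrences[tuple(s[i:j])].append((s_id, i))
    # "Repeated": a run must occur more than T times.
    repeating = {run for run, occ in occurrences.items() if len(occ) > T}
    lrs = set()
    for run in repeating:
        for s_id, i in occurrences[run]:
            s, j = sessions[s_id], i + len(run)
            left = tuple(s[i - 1:j]) if i > 0 else None
            right = tuple(s[i:j + 1]) if j < len(s) else None
            # "Longest": keep the run if this occurrence extends into no
            # longer repeating run on either side.
            if left not in repeating and right not in repeating:
                lrs.add(run)
                break
    return lrs

# Figure 2, case 1: surfers repeatedly go A -> B, but C and D are each
# reached only once, so the only LRS is AB.
sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B"]]
print(extract_lrs(sessions))  # {('A', 'B')}
```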
Longest Repeating Subsequences (cont.)
• The complexity of the resulting n-grams is reduced, as the low-probability transitions are automatically excluded from further analysis.
• This reduction removes transitions that occur only T times, which in some cases results in a prediction not being made.
• In Figure 2, case 1, with T = 1 the LRS is AB, which means no prediction will be made after pages A and B have been requested.
• This results in a slight loss of pattern matching.
• Hybrid LRS-Markov Models:
• The first hybrid LRS model decomposes each LRS pattern into a series of corresponding one-hop n-grams; e.g., the LRS ABCD would result in the one-hops AB, BC, CD.
• The second hybrid LRS model decomposes the extracted LRS subsequences into all possible n-grams, as in the sketch below.
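A sketch of the two decompositions, with `lrs` assumed to be the set of extracted subsequences, each a tuple of pages:

```python
def one_hop_ngrams(lrs):
    # First hybrid: keep only adjacent pairs, e.g. ABCD -> AB, BC, CD.
    return {seq[i:i + 2] for seq in lrs for i in range(len(seq) - 1)}

def all_kth_order_ngrams(lrs):
    # Second hybrid: keep every contiguous n-gram of length >= 2.
    return {seq[i:j] for seq in lrs
            for i in range(len(seq))
            for j in range(i + 2, len(seq) + 1)}

lrs = {("A", "B", "C", "D")}
print(sorted(one_hop_ngrams(lrs)))
# [('A','B'), ('B','C'), ('C','D')]
print(sorted(all_kth_order_ngrams(lrs)))
# adds ('A','B','C'), ('A','B','C','D'), ('B','C','D') to the pairs above
```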
Longest Repeating Subsequences (cont.)
• The resulting model is a reduced set of n-grams of various lengths, called the All-Kth-Order LRS model.
• The main advantage of this model is that it incorporates the specificity principle of pattern matching by utilizing the increased predictive power contained in longer paths.
• Model complexity reduction:
• LRS stores only those paths that are likely to be needed, so LRS reduces complexity and space requirements.
• The amount of space required for any model, LRS or Markov, depends on the path combinations present, which vary from site to site and change over time.
One-hop Markov and LRS Comparison
• To test whether the hybrid LRS models achieve the goal of reducing complexity while maintaining predictive power, the test is done on the same test data as before.
• For this experiment, the one-hop Markov and one-hop LRS models were built using three training days.
• Each model was tested to see if there was a matching prefix (Match) for each path and, if so, whether the correct transition was predicted (Hit).
• From this, the probability of a match Pr(Match), the probability of a hit given a match Pr(Hit|Match), the hit rate across all transitions Pr(Hit), and the benefit-cost ratio Pr(Hit)/Pr(Miss) were computed.
• Table 2 displays the results for the one-hop Markov model and the one-hop LRS model.
One-hop Markov and LRS Comparison (cont.)
• The one-hop LRS model produces a reduction in the total size required to store the model, reducing the complexity as expected.
• One might expect that a sharp reduction in the model's complexity would result in a reduction of predictive ability, but the LRS model retains good predictive ability.
• The important numbers here are the reductions in the number of one-hops and the model size in bytes:
• 13,189 one-hops for the one-hop Markov model versus 4,953 one-hops for the one-hop LRS model.
• 372,218 bytes for the one-hop Markov model versus 136,177 bytes for the one-hop LRS model.
• The LRS model has almost the same hit ratio as the one-hop Markov model.
All-Kth-Order Markov Approximation and All-Kth-Order LRS Comparison
• Longer paths should be used to get better predictions, so here we compare the results of the All-Kth-order Markov approximation and the All-Kth-order LRS model.
• Since the All-Kth-order LRS model is a subset of the All-Kth-order Markov model, we do not expect better results; instead we see tradeoffs between complexity reduction and the model's predictive power.
• For this experiment the same test data was used.
• As seen from the results in Table 3, the All-Kth-order Markov model consumes 8,800 Kbytes, while the All-Kth-order LRS model consumes 616 Kbytes, about 14 times less than the All-Kth-order Markov model.
• In terms of hit ratio, the table shows that the All-Kth-order Markov model has a 30% hit ratio while the All-Kth-order LRS model has a 27% hit ratio.
All-Kth-Order Markov Approximation and All-Kth-Order LRS Comparison (cont.)
• Figure 3 summarizes the results of the two experiments with respect to hit ratio.
• The All-Kth-order Markov model has the highest hit ratio, but its model size is too big.
• Interestingly, the one-hop Markov model provides 83% of the predictive power while consuming only 4.2% of the space compared to the All-Kth-order Markov model.
• Similarly, the one-hop LRS model provides 80% of the predictive power while using only 1.5% of the space.
Parameterization of Prediction Set
• Restricting the prediction to only one guess imposes a rather stringent constraint: one could instead predict a larger set of pages that could be surfed to on the next click.
• Here we have each model's performance when returning sets of between one and ten pages.
• The figure shows that the All-Kth-order models perform better, with the All-Kth-order Markov model performing best.
• The important thing to note is that increasing the prediction set size has a dramatic impact on predictive power: the predictive power of each method nearly doubles when the set size is increased to four elements. A sketch of set-valued prediction follows below.
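A sketch of set-valued prediction, reusing the kth-order `model` built in the earlier sketch; a hit is scored when the actual next page appears anywhere in the returned top-n set.

```python
def predict_set(model, recent_k_pages, n):
    nexts = model.get(tuple(recent_k_pages))
    if not nexts:
        return []
    # Return up to n candidate pages, most probable first.
    return [page for page, _ in nexts.most_common(n)]

# With n = 4, pages that are only the second- or third-most likely
# successors still count as hits, which is why the hit rate nearly
# doubles at a set size of four.
print(predict_set(model, ["B", "C"], n=4))  # e.g. ['D', 'E']
```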
Future Work
• This paper focused on various Markov models; the concept of LRS can be successfully applied to Markov models in other domains as well as to other suffix-based methods.
• In this theory, "repeating" can be defined with any occurrence threshold T; determining the ideal threshold will depend on the specific data and the intended application.
• Another important consideration is the confidence level of each prediction: a modified pattern-matching algorithm could be restricted to only make predictions when a given probability of making a successful prediction is achieved (see the sketch below).
• Another application of LRS models could be in an HTTP server, with server threads issuing hint lists to clients while maintaining the model in memory.
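A sketch of that confidence-restricted variant, again reusing the kth-order `model`; the 0.5 threshold is an illustrative assumption, not a value from the paper.

```python
def predict_confident(model, recent_k_pages, min_prob=0.5):
    nexts = model.get(tuple(recent_k_pages))
    if not nexts:
        return None
    page, count = nexts.most_common(1)[0]
    if count / sum(nexts.values()) < min_prob:
        return None  # not confident enough to issue a prediction
    return page
```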
Conclusion
• There are always tradeoffs between model complexity and predictive power.
• The one-hop LRS model was able to match the prediction accuracy of the one-hop Markov model while reducing the complexity to roughly a third.
• To improve hit rates further, the All-Kth-order LRS model performs well, almost equaling the performance of the All-Kth-order Markov model while reducing the complexity significantly.