A DATA MINING APPROACH FOR LOCATION PREDICTION IN MOBILE ENVIRONMENTS*

by Gökhan Yavaş, Feb 22, 2005. *: To appear in Data and Knowledge Engineering, Elsevier

Outline • Introduction • Background Work • Mobility Prediction Based On Mobility Rules • Experimental Results • Conclusion • Future Work

Introduction • Personal Communication Systems (PCS) are becoming more popular • The dynamic relocation of users gives rise to the problem of Mobility Management: methods for storing and updating the location information of users • Mobility Prediction: predicting a user's next inter-cell movement

Motivation • Predicted movements can be used to allocate resources effectively instead of blindly allocating excessive resources • Broadcast program generation benefits [1]: data items can be broadcast to the predicted cell • Location prediction is crucial in processing location-dependent queries [2], since the answer depends on the location of the user • Queries that depend on future positions can be answered through effective location prediction [1] Y. Saygin and O. Ulusoy. Exploiting Data Mining Techniques for Broadcasting Data in Mobile Computing Environments. IEEE Transactions on Knowledge and Data Engineering, 14(6): 1387-1399, 2002. [2] G. Gok and O. Ulusoy. Transmission of Continuous Query Results in Mobile Computing Systems. Information Sciences, 125(1-4): 37-63, 2000.

Network Model • A PCS network is partitioned into smaller areas called cells • Each cell has a Base Station (BS), used for broadcasting and receiving information • Home Location Register (HLR): a database that keeps the inter-cell movement history of each user • Visitor Location Register (VLR): each BS has a database that keeps the profiles of the users located in its cell

Problem Definition • The movement history of a mobile user can be obtained from the user's HLR • Movement trajectories have the form T = <(id1, t1), ..., (idk, tk)>, where each pair gives a cell ID and a time • Trajectories are partitioned into subsequences named user actual paths (UAPs) • UAPs have the form U = <c1, c2, ..., cn> • We mine UAPs to find user mobility patterns (UMPs)
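The slide does not say how a trajectory is cut into UAPs. As a minimal sketch, assuming a hypothetical rule that starts a new UAP whenever the time between two consecutive cell entries exceeds a threshold `max_gap`, the partitioning step could look like this (`split_into_uaps` and `max_gap` are illustrative names, not taken from the paper):

```python
from typing import List, Optional, Tuple

# A trajectory T = <(id1, t1), ..., (idk, tk)> as (cell id, time) pairs.
Trajectory = List[Tuple[str, float]]

def split_into_uaps(trajectory: Trajectory, max_gap: float) -> List[List[str]]:
    """Split a trajectory into UAPs (lists of cell ids).

    Hypothetical rule: a new UAP starts whenever the time between two
    consecutive cell entries exceeds max_gap.
    """
    uaps: List[List[str]] = []
    current: List[str] = []
    last_time: Optional[float] = None
    for cell, t in trajectory:
        if last_time is not None and t - last_time > max_gap:
            uaps.append(current)
            current = []
        current.append(cell)
        last_time = t
    if current:
        uaps.append(current)
    return uaps

T = [("c1", 0), ("c2", 5), ("c3", 9), ("c7", 100), ("c8", 104)]
print(split_into_uaps(T, 30))  # [['c1', 'c2', 'c3'], ['c7', 'c8']]
```

Any other segmentation criterion (for example, a fixed maximum path length) could replace the time-gap test without changing the rest of the pipeline.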

Related Work • The roots of our method go back to the Apriori algorithm [3] for association rule mining • Sequential pattern mining problem [4]: the order of the itemsets in a sequence must be taken into consideration • Not appropriate for our domain, because it does not take the network topology into account [3] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings of the International Conference on Very Large Data Bases (VLDB'94), pages 487-499, 1994. [4] R. Agrawal and R. Srikant. Mining Sequential Patterns. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'95), pages 3-14, 1995.

Mobility Prediction Based On Mobility Rules • Mining UMPs from Graph Traversals: movement data are mined to discover regularities (UMPs) in inter-cell movements • Generation of Mobility Rules: mobility rules are extracted from the UMPs • Mobility Prediction: the next inter-cell movement is predicted using the mobility rules

Mining UMPs from Graph Traversals • Figure: an example coverage region and the corresponding graph G • Vertices of G: the cells in the coverage region • Edges of G: if two cells, A and B, are neighbors in the coverage region, then G has two directed edges, A → B and B → A
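A sketch of how G could be built from a list of neighboring cell pairs; the pair list and the function name `build_cell_graph` are hypothetical. The returned dictionary maps each cell to its out-neighbors, i.e. the cells reachable along a directed edge.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def build_cell_graph(neighbor_pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Directed graph G: for every pair of neighboring cells A, B add A -> B and B -> A."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in neighbor_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

g = build_cell_graph([("c0", "c1"), ("c0", "c2"), ("c1", "c2")])
print(sorted(g["c0"]))  # ['c1', 'c2']
```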

Mining UMPs from Graph Traversals • Subsequence definition: Assume we have two UAPs, A = <a1, a2, ..., an> and B = <b1, b2, ..., bm>. B is a subsequence of A iff all cells of B also appear in A, in the same order as in B • Example: if A = <c3, c4, c0, c1, c6, c5>, then B = <c4, c5> is a length-2 subsequence of A. In other words, B is contained by A
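The subsequence test in this definition is a single left-to-right scan over A. A minimal Python sketch (the function name is illustrative):

```python
from typing import Hashable, Sequence

def is_subsequence(b: Sequence[Hashable], a: Sequence[Hashable]) -> bool:
    """True iff every cell of b occurs in a, in the same order (gaps allowed)."""
    it = iter(a)
    # `cell in it` advances the iterator, so each match must come after the previous one.
    return all(cell in it for cell in b)

A = ["c3", "c4", "c0", "c1", "c6", "c5"]
print(is_subsequence(["c4", "c5"], A))  # True
print(is_subsequence(["c5", "c4"], A))  # False
```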

Mining UMPs from Graph Traversals • Every candidate has a count value that keeps the support given to this candidate by the UAPs • This is where our work extends the algorithm in [5, 6] • The method in [5, 6] increments the count value of a candidate by 1 if the candidate is contained by a UAP • This is unfair: it treats a highly corrupted candidate pattern and a slightly corrupted (or even uncorrupted) candidate pattern in the same way [5] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. A Data Mining Algorithm for Generalized Web Prefetching. IEEE Transactions on Knowledge and Data Engineering, 15(5): 1155-1169, 2003. [6] A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. Effective Prediction of Web-User Accesses: A Data Mining Approach. In Proceedings of the WebKDD Workshop (WebKDD'01), 2001.

Mining UMPs from Graph Traversals • The degree of corruption should be considered in the mobile motion prediction context • Support assigned to a candidate pattern B by a UAP A (i.e., suppInc)

Mining UMPs from Graph Traversals • We define the totDist value by means of the notion of string alignment • Definition 2.1: If x and y are each a single character or a space, a scoring function assigns the pair (x, y) the score of aligning x and y. In our case, the scoring function is defined as follows:

Mining UMPs from Graph Traversals • Definition 2.3: Let A be a UAP and B be a pattern. A containment alignment X' maps A and B into strings A' and B' where: • |A'| = |B'| • B is contained by A, and • removing all spaces from A' and B' leaves A and B • Total score of the alignment X':

Mining UMPs from Graph Traversals • For any two patterns, there may be more than one alignment • Ex: Consider A = <c3, c4, c0, c1, c6, c5, c8, c5> and B = <c4, c5>: the c5 of B can be aligned with either the first or the second occurrence of c5 in A

Mining UMPs from Graph Traversals • Definition 2.4: An optimal containment alignment of UAP A and pattern B is one that has the minimum possible total score for these two patterns • Total score of an alignment: the sum of penalties • An optimal alignment has the minimum number of mismatches, that is, the minimum alignment score • totDist(A, B) = score of the optimal containment alignment of UAP A and pattern B

Mining UMPs from Graph Traversals • Example: Given UAP A = <c3, c4, c0, c1, c6, c5, c8> and pattern B = <c4, c5, c8>, the optimal containment alignment for these aligns the cells c0, c1 and c6 of A with spaces in B' • Score of the alignment = totDist(A, B) = 3 • Support assigned to the candidate pattern B by the UAP A:
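The slides omit the scoring function, so the following is only a sketch under an assumption that reproduces both alignment examples: a matched cell costs 0, each space inserted into B' between the first and last matched cells costs 1, and cells of A outside that span (such as the leading c3) cost nothing. Under that assumption, totDist(A, B) is the smallest number of unmatched cells over all ways of embedding B in A. For a fixed starting cell, matching the rest of B as early as possible gives the shortest span, so it is enough to try each start. The name `tot_dist` is illustrative.

```python
from typing import Hashable, Optional, Sequence

def tot_dist(a: Sequence[Hashable], b: Sequence[Hashable]) -> Optional[int]:
    """Score of an optimal containment alignment of UAP a and pattern b.

    Assumed scoring: one penalty per cell of a left unmatched between the
    first and last matched cells of b. Returns None if b is not contained in a.
    """
    best: Optional[int] = None
    for start in range(len(a)):
        if a[start] != b[0]:
            continue
        pos, k = start, 1
        # Greedy leftmost matching gives the shortest span for this start.
        while k < len(b) and pos + 1 < len(a):
            pos += 1
            if a[pos] == b[k]:
                k += 1
        if k == len(b):
            gaps = pos - start + 1 - len(b)
            best = gaps if best is None else min(best, gaps)
    return best

A = ["c3", "c4", "c0", "c1", "c6", "c5", "c8"]
print(tot_dist(A, ["c4", "c5", "c8"]))  # 3
A2 = ["c3", "c4", "c0", "c1", "c6", "c5", "c8", "c5"]
print(tot_dist(A2, ["c4", "c5"]))  # 3 (the alignment to the second c5 would score 5)
```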

Mining UMPs from Graph Traversals • The quality of the patterns will improve, since this method counts support more accurately • The degree of corruption is taken into account • This gives rise to more accurate mobility rules • As a result, prediction accuracy improves compared to rules generated with the former way of support counting • Using different methods for computing totDist will affect the quality of the rules

Mining UMPs from Graph Traversals • Candidate Generation: • Example: C = <c1, c2, ..., ck> • N+(ck): the set of all nodes in G that have an incoming edge from the cell ck • A cell v from N+(ck) is appended to the end of C to generate C' = <c1, ..., ck, v> • C' is added to the set of candidates
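A sketch of one candidate generation pass, assuming that the patterns being extended are the large patterns of the previous level and that every cell of N+(ck) yields one candidate. `generate_candidates` and the dictionary-of-sets graph representation are illustrative choices.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]

def generate_candidates(large: List[Pattern], graph: Dict[str, Set[str]]) -> List[Pattern]:
    """Extend each pattern C = <c1, ..., ck> by every cell in N+(ck)."""
    candidates: List[Pattern] = []
    for c in large:
        # N+(ck): out-neighbors of the last cell; sorted for a deterministic order.
        for nxt in sorted(graph.get(c[-1], set())):
            candidates.append(c + (nxt,))
    return candidates

graph = {"c1": {"c2", "c3"}, "c2": {"c1"}}
print(generate_candidates([("c1",), ("c2",)], graph))
# [('c1', 'c2'), ('c1', 'c3'), ('c2', 'c1')]
```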

Mining UMPs from Graph Traversals • Can Apriori pruning be used? • No, due to the nature of our new support counting method • Support no longer decreases monotonically as the size of the pattern increases • A length-(k-1) subpattern S of a length-k pattern P need not be large even if P is large • Ex: UAP <1, 6, 0, 3, 2>, P1 = <1, 0, 2> and its subpattern P2 = <1, 2> • The UAP assigns a support to P1 and to P2: totDist(UAP, P1) = 2 (cells 6 and 3 are unmatched) while totDist(UAP, P2) = 3 (cells 6, 0 and 3 are unmatched), so the longer pattern P1 is less corrupted and can receive more support than its subpattern P2
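The following sketch checks this example numerically. The suppInc formula is not given on the slides, so the code uses a hypothetical stand-in, suppInc(B, A) = |B| / (|B| + totDist(A, B)), which equals 1 for an uncorrupted pattern and shrinks as totDist grows. totDist uses an assumed scoring of one penalty per unmatched cell between the first and last matched cells.

```python
from typing import Optional, Sequence

def tot_dist(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Unmatched cells in the tightest embedding of b in a (None if absent)."""
    best: Optional[int] = None
    for start in range(len(a)):
        if a[start] != b[0]:
            continue
        pos, k = start, 1
        # Match the remaining cells of b as early as possible.
        while k < len(b) and pos + 1 < len(a):
            pos += 1
            if a[pos] == b[k]:
                k += 1
        if k == len(b):
            gaps = pos - start + 1 - len(b)
            best = gaps if best is None else min(best, gaps)
    return best

def supp_inc(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical suppInc: |b| / (|b| + totDist), 0 if b is not contained."""
    d = tot_dist(a, b)
    return 0.0 if d is None else len(b) / (len(b) + d)

uap = [1, 6, 0, 3, 2]
p1, p2 = [1, 0, 2], [1, 2]
print(tot_dist(uap, p1), tot_dist(uap, p2))  # 2 3
print(supp_inc(uap, p1), supp_inc(uap, p2))  # 0.6 0.4
```

Any suppInc that decreases with totDist gives the same ordering here, so pruning P1 because its subpattern P2 is small would be unsafe.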

Mining UMPs from Graph Traversals • Figure: the UMP Mining Algorithm, which takes the database of UAPs as input and outputs the set of all large patterns (UMPs) • Example: use suppmin = 1.33
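A compact sketch of a level-wise UMP mining loop. It rests on several assumptions not stated on the slides: a hypothetical totDist scoring (one penalty per unmatched cell between the first and last matched cells), a hypothetical suppInc = |B| / (|B| + totDist(A, B)), an absolute support threshold `supp_min`, and length-(k+1) candidates generated from the large length-k patterns with no Apriori pruning. `max_len` is an added safety bound, not a parameter from the slides.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pattern = Tuple[str, ...]

def tot_dist(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """Unmatched cells in the tightest embedding of b in a (None if absent)."""
    best: Optional[int] = None
    for start in range(len(a)):
        if a[start] != b[0]:
            continue
        pos, k = start, 1
        while k < len(b) and pos + 1 < len(a):
            pos += 1
            if a[pos] == b[k]:
                k += 1
        if k == len(b):
            gaps = pos - start + 1 - len(b)
            best = gaps if best is None else min(best, gaps)
    return best

def supp_inc(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical suppInc: |b| / (|b| + totDist), 0 if b is not contained."""
    d = tot_dist(a, b)
    return 0.0 if d is None else len(b) / (len(b) + d)

def mine_umps(uaps: List[List[str]], graph: Dict[str, Set[str]],
              supp_min: float, max_len: int = 10) -> Dict[Pattern, float]:
    """Return every large pattern (UMP) with its accumulated support."""
    candidates: List[Pattern] = [(cell,) for cell in sorted(graph)]
    large: Dict[Pattern, float] = {}
    length = 1
    while candidates and length <= max_len:
        # Support counting: each UAP contributes suppInc instead of 0 or 1.
        support = {c: sum(supp_inc(u, c) for u in uaps) for c in candidates}
        level = {c: s for c, s in support.items() if s >= supp_min}
        large.update(level)
        # Candidate generation: append each out-neighbor of the last cell.
        candidates = [c + (n,) for c in sorted(level)
                      for n in sorted(graph.get(c[-1], set()))]
        length += 1
    return large

graph = {"c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2"}}
uaps = [["c1", "c2", "c3"], ["c1", "c2", "c3"], ["c1", "c3", "c2"]]
for pattern, supp in mine_umps(uaps, graph, 1.5).items():
    print(pattern, round(supp, 2))
```

In this toy run the corrupted third UAP still contributes 2/3 to <c1, c2>, which the 0/1 counting of [5, 6] would have rounded up to a full 1.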

Generation of Mobility Rules • Extract rules from the UMPs • For a rule R: <c1, c2, ..., ci-1> → <ci, ci+1, ..., ck>, the head is <c1, ..., ci-1> and the tail is <ci, ..., ck> • A confidence value is calculated:

Generation of Mobility Rules • The rules whose confidence is higher than confmin are selected • All possible mobility rules for the UMPs given in the previous example are:
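The confidence formula is not shown on the slide. The sketch below assumes the usual association-rule definition, conf(R) = 100 × support(<c1, ..., ck>) / support(<c1, ..., ci-1>), and considers every split of each UMP into a non-empty head and tail. Rules whose head has no recorded support are skipped. Because support here is not monotone, this ratio can exceed 100. The function name and the tuple layout of a rule are illustrative.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
Rule = Tuple[Pattern, Pattern, float, float]  # (head, tail, confidence, support)

def generate_rules(umps: Dict[Pattern, float], conf_min: float) -> List[Rule]:
    """Split every UMP into head -> tail and keep rules with confidence > conf_min."""
    rules: List[Rule] = []
    for ump, supp in umps.items():
        for i in range(1, len(ump)):
            head, tail = ump[:i], ump[i:]
            head_supp = umps.get(head)
            if not head_supp:
                continue  # head is not a large pattern: confidence undefined here
            conf = 100.0 * supp / head_supp  # assumed confidence, in percent
            if conf > conf_min:
                rules.append((head, tail, conf, supp))
    return rules

umps = {("c1",): 3.0, ("c2",): 3.0, ("c3",): 3.0,
        ("c1", "c2"): 8 / 3, ("c2", "c3"): 2.0, ("c1", "c2", "c3"): 2.0}
for head, tail, conf, supp in generate_rules(umps, 70):
    print(head, "->", tail, round(conf, 2))
```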

Mobility Prediction • The user has followed a path P = <c1, c2, ..., ci-1> up to now • Find the rules whose head is contained in P and ends with the cell ci-1 • For each such rule, store the first cell of its tail along with the rule's (confidence + support) as a tuple • Sort these tuples by their (confidence + support) values in descending order • Select the first m tuples

Mobility Prediction • Example: Assume that the current trajectory of the user is P = <2, 3, 0, 4> • Matching rules: • <4> → <0> • <4> → <5> • <3, 4> → <0> • <3, 4> → <5> • Sorted tuple array: TupleArray = [(5, 85.83), (0, 76.5)] • If m = 1, then Predicted Cells Set = {5} • If m = 2, then Predicted Cells Set = {5, 0}
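A sketch of the prediction step. Rules are assumed to be (head, tail, confidence, support) tuples, and the rule values in the usage example are made up: they do not reproduce the slide's 85.83 and 76.5. The slide's example lists four matching rules but only two tuples, so the code assumes that when several rules predict the same cell, only the highest (confidence + support) score for that cell is kept.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Rule = Tuple[Tuple[Hashable, ...], Tuple[Hashable, ...], float, float]

def is_subsequence(b: Sequence[Hashable], a: Sequence[Hashable]) -> bool:
    """True iff the cells of b appear in a in the same order."""
    it = iter(a)
    return all(cell in it for cell in b)

def predict(path: Sequence[Hashable], rules: List[Rule], m: int) -> List[Hashable]:
    """Return up to m predicted next cells for the trajectory `path`."""
    best: Dict[Hashable, float] = {}
    for head, tail, conf, supp in rules:
        # The head must end at the current cell and be contained in the path.
        if head[-1] != path[-1] or not is_subsequence(head, path):
            continue
        score = conf + supp
        if score > best.get(tail[0], float("-inf")):
            best[tail[0]] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [cell for cell, _ in ranked[:m]]

# Hypothetical rule values for illustration only.
rules = [((4,), (0,), 70.0, 2.0), ((4,), (5,), 80.0, 1.5),
         ((3, 4), (0,), 72.0, 1.0), ((3, 4), (5,), 84.0, 1.2),
         ((2,), (3,), 90.0, 3.0), ((6, 4), (1,), 95.0, 2.0)]
print(predict([2, 3, 0, 4], rules, 1))  # [5]
print(predict([2, 3, 0, 4], rules, 2))  # [5, 0]
```

The last two rules are ignored for P = <2, 3, 0, 4>: the first does not end at cell 4, and the head of the second is not contained in P.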

Simulation Design • Mobile users travel on a 15 × 15 network of hexagonal cells • To generate UAPs, UMPs are generated first • Each UMP is a random walk over the network • Two types of UAPs: • Outliers: random walks over the network • Non-outliers: UAPs that follow a UMP • o (outlier percentage): ratio of the number of outliers to the number of non-outliers
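A sketch of the simulation's input generation, under several assumptions not stated on the slide: cells are indexed (row, column) in an "odd-r" offset hexagonal layout (odd rows shifted half a cell to the right), a walk starts at a uniformly random cell and moves to a uniformly random neighbor, and o is interpreted as a percentage. The helper names and the parameters `n_umps`, `uaps_per_ump` and `length` are hypothetical. Corrupting the non-outliers is a separate step.

```python
import random
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

def hex_grid(rows: int = 15, cols: int = 15) -> Dict[Cell, Set[Cell]]:
    """Neighbors in an assumed 'odd-r' offset hexagonal layout."""
    graph: Dict[Cell, Set[Cell]] = {}
    for r in range(rows):
        for c in range(cols):
            shift = r % 2  # odd rows are shifted half a cell to the right
            around = [(r, c - 1), (r, c + 1),
                      (r - 1, c - 1 + shift), (r - 1, c + shift),
                      (r + 1, c - 1 + shift), (r + 1, c + shift)]
            graph[(r, c)] = {(nr, nc) for nr, nc in around
                             if 0 <= nr < rows and 0 <= nc < cols}
    return graph

def random_walk(graph: Dict[Cell, Set[Cell]], length: int, rng: random.Random) -> List[Cell]:
    """Uniform random walk of `length` cells over the network."""
    cell = rng.choice(sorted(graph))
    walk = [cell]
    while len(walk) < length:
        cell = rng.choice(sorted(graph[cell]))
        walk.append(cell)
    return walk

def make_dataset(graph: Dict[Cell, Set[Cell]], n_umps: int, uaps_per_ump: int,
                 o: float, length: int, rng: random.Random):
    """UMPs, uncorrupted non-outlier UAPs, and o% as many outlier walks."""
    umps = [random_walk(graph, length, rng) for _ in range(n_umps)]
    non_outliers = [list(u) for u in umps for _ in range(uaps_per_ump)]
    n_outliers = round(len(non_outliers) * o / 100)
    outliers = [random_walk(graph, length, rng) for _ in range(n_outliers)]
    return umps, non_outliers, outliers

g = hex_grid()
umps, non_outliers, outliers = make_dataset(g, 5, 4, 20, 6, random.Random(0))
print(len(g), len(non_outliers), len(outliers))  # 225 20 4
```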

Simulation Design • Corruption mechanism: random cells are inserted between the consecutive cells of a UMP • c (corruption ratio): the ratio of the number of such random cells to the number of cells in the corresponding UMP • Three possible outcomes of a prediction: • Correct prediction • Incorrect prediction • No prediction • Two performance measures: • Precision: number of correct predictions / number of predictions made • Recall: number of correct predictions / number of prediction requests
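A sketch of the corruption mechanism and the two measures. It assumes that the round(c × |UMP|) random cells are drawn uniformly from all cells and that each is placed in a uniformly chosen gap between two consecutive UMP cells. `corrupt` and `precision_recall` are illustrative names. Precision is taken as correct predictions over predictions made, and recall as correct predictions over prediction requests, so a request that gets no prediction lowers recall but not precision.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def corrupt(ump: Sequence[T], cells: Sequence[T], c: float,
            rng: random.Random) -> List[T]:
    """Insert round(c * len(ump)) random cells between consecutive UMP cells."""
    slots = len(ump) - 1  # number of gaps between consecutive cells
    if slots <= 0:
        return list(ump)
    per_gap = [0] * slots
    for _ in range(round(c * len(ump))):
        per_gap[rng.randrange(slots)] += 1
    out: List[T] = []
    for i, cell in enumerate(ump):
        out.append(cell)
        if i < slots:
            out.extend(rng.choice(cells) for _ in range(per_gap[i]))
    return out

def precision_recall(correct: int, predictions: int, requests: int) -> Tuple[float, float]:
    """Precision over predictions made; recall over all prediction requests."""
    precision = correct / predictions if predictions else 0.0
    recall = correct / requests if requests else 0.0
    return precision, recall

noisy = corrupt(["A", "B", "C", "D", "E"], list("ABCDEFGH"), 0.4, random.Random(1))
print(len(noisy))                    # 7
print(precision_recall(6, 8, 10))    # (0.75, 0.6)
```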

Algorithms Used for Comparison • Mobility Prediction Based on Transition Matrix (TM) • A cell-to-cell transition matrix is formed • The m most probable cells are selected from the transition matrix • Ignorant Prediction • m neighboring cells of the current cell are selected at random
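A sketch of the two baselines, with hypothetical function names. For TM, raw transition counts are kept instead of a normalized matrix; dividing each row by its total would give the transition probabilities without changing the ranking. Ignorant prediction samples m distinct neighbors uniformly at random.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Set

def transition_counts(uaps: Sequence[Sequence[Hashable]]) -> Dict[Hashable, Counter]:
    """Row src of the (unnormalized) cell-to-cell transition matrix."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for uap in uaps:
        for src, dst in zip(uap, uap[1:]):
            counts[src][dst] += 1
    return dict(counts)

def tm_predict(counts: Dict[Hashable, Counter], current: Hashable, m: int) -> List[Hashable]:
    """TM: the m most frequent (hence most probable) next cells."""
    row = counts.get(current)
    return [cell for cell, _ in row.most_common(m)] if row else []

def ignorant_predict(graph: Dict[Hashable, Set[Hashable]], current: Hashable,
                     m: int, rng: random.Random) -> List[Hashable]:
    """Ignorant: m distinct neighbors of the current cell, chosen at random."""
    neighbors = sorted(graph[current])
    return rng.sample(neighbors, min(m, len(neighbors)))

counts = transition_counts([["a", "b", "c"], ["a", "b", "d"], ["a", "c"]])
print(tm_predict(counts, "a", 2))  # ['b', 'c']
```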

Impact of m on Precision and Recall • Precision decreases for both our algorithm and TM • The probability of making some incorrect predictions increases as m increases • Recall increases for all algorithms, but the increase is more significant for TM and Ignorant prediction

Impact of m on Precision and Recall • Setting m as small as possible suits our method • For TM, recall rises most sharply when m goes from 1 to 2 • m ≥ 3 would cause excessive waste of network resources • Thus we choose m = 2

Impact of Suppmin • Recall and precision are reduced as suppmin increases • A higher suppmin decreases the number of mined mobility rules • Hence the number of correct predictions is reduced • Choose suppmin = 0.1

Impact of Confmin • Precision increases • Rules become higher quality as confmin increases • As a result, the number of predictions decreases faster than the number of correct predictions • Recall decreases • Fewer rules are mined, which decreases the number of correct predictions • Choose confmin = 80

Impact of Corruption Factor • Precision and recall decrease for both our method and TM • For all c, our method has better precision but worse recall than TM • For our method, as c increases: • The number of mined mobility rules decreases • In some cases no prediction is made, because the corrupted UAPs leave no matching rules

Impact of Outlier Percentage • Neither performance measure is significantly affected for any method • Rules extracted from outlier UAPs are rarely used, so they do not significantly reduce recall and precision

Conclusion • A data mining algorithm for predicting user movements in a mobile computing system • The algorithm is based on: • Mining the mobility patterns of users • Then forming mobility rules from these patterns • Finally predicting a mobile user's next movements using the mobility rules • Good performance compared to the Ignorant method

Conclusion • Performance compared to TM: • Better precision: • More accurate predictions • Most of the predictions made at each request are correct • Worse recall: • Our method may not make a prediction in response to some prediction requests • Because there may be no matching rule for the user's current trajectory when a prediction request is made

Future Work • For calculating the totDist value, our method: • Decreases the support given to a pattern by a UAP as the number of corrupted cells in the pattern increases • Other methods may be employed for calculating the totDist value • The time domain of the mobility patterns and mobility rules is not considered • In real life, mobility patterns might be related to time • Some rules may be valid only for a specific time interval • Extend our algorithm to include the time domain of mobility rules • A candidate pruning criterion suitable for our support counting method may be employed

Questions & Comments
