
Web Access Pattern Mining

Presented by: Ferdousi Khanam Chowdhury, MSc 2nd Semester, Roll no: 887, Session: 2008-09. An association rule is an implication of the form X -> Y, where X and Y are sets of items and X ∩ Y = ∅.



Presentation Transcript


  1. Web Access Pattern Mining

  2. Presented By: Ferdousi Khanam Chowdhury, MSc 2nd Semester, Roll no: 887, Session: 2008-09

  3. Association Rule An association rule is an implication of the form X -> Y, where X and Y are sets of items and X ∩ Y = ∅. Support and Confidence: the support of this rule is the percentage of transactions that contain the set X ∪ Y, while its confidence is the percentage of the transactions containing X that also contain the items in Y.
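The two measures can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `confidence` and the toy transactions are assumptions, not material from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Among the transactions containing X, the fraction that also contain Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / support_x


# Hypothetical toy transactions, not taken from the slides.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))       # support of bread -> milk
print(confidence(transactions, {"bread"}, {"milk"}))  # confidence of bread -> milk
```

Here the rule bread -> milk is supported by two of the four transactions, and two of the three transactions containing bread also contain milk.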

  4. Frequent Itemset In association rule mining, an item whose support is at least the specified minimum support is called a frequent item. More generally, a set of items whose joint support meets that minimum is called a frequent itemset.

  5. Sequential Pattern Mining Sequential pattern mining, which discovers frequent patterns in a sequence database, was first introduced by Agrawal and Srikant as follows: “Given a sequence database where each sequence is a list of transactions ordered by transaction time and each transaction consists of a set of items, find all sequential patterns with a user-specified minimum support, where the support is the number of data sequences that contain the pattern.”
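A minimal sketch of this notion of support, assuming each data sequence is a Python list of item sets and a pattern is matched by taking its item sets, in order, as subsets of distinct transactions. The names `contains` and `support_count` and the toy database are hypothetical.

```python
def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered sub-sequence.

    Each pattern element must be a subset of a distinct transaction, and the
    matched transactions must appear in the same order as the pattern.
    """
    pos = 0
    for wanted in pattern:
        # Greedily advance to the earliest transaction that covers `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support_count(database, pattern):
    """Number of data sequences that contain the pattern."""
    return sum(1 for sequence in database if contains(sequence, pattern))


# Hypothetical sequence database: three customers, each a list of transactions.
database = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}, {"c", "d"}],
]

print(support_count(database, [{"a"}, {"c"}]))
print(support_count(database, [{"b"}, {"d"}]))
```

Greedy earliest matching is enough here because taking the earliest possible transaction for each pattern element never rules out a later match.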

  6. Sequential Pattern Mining • They also developed a generalized sequential pattern mining algorithm, “GSP”, which outperforms their AprioriAll algorithm. • GSP mines sequential patterns by scanning the sequence database multiple times.

  7. Sequential Pattern Mining The measures of support and confidence, used in association rule mining to decide which itemsets are frequent, are also used in sequential pattern mining to determine the frequent sequences and the strong rules that can be generated from them.

  8. Web Access Pattern Mining • Web access patterns mined from Web logs are interesting and useful knowledge in practice. • Examples of applications of such knowledge include improving the design of websites, analyzing system performance as well as network communications, understanding user reaction and motivation, and building adaptive websites. • Essentially, a web access pattern is a sequential pattern in a large set of pieces of web logs that is pursued frequently by users.

  9. WAP-tree Mining WAP-tree mining is a non-Apriori method that stores the web access patterns in a compact prefix tree called the “WAP-tree”.

  10. WAP-tree • The WAP-tree stores the web log data in a prefix tree format similar to the frequent pattern tree (FP-tree) used for non-sequential data. • It registers all and only the information needed by the rest of the mining. • Building it requires scanning the access sequence database only twice. • The WAP-tree is devised to register access sequences and their corresponding counts compactly, so that tedious support counting can be avoided. • The size of a WAP-tree is usually much smaller than that of the original access sequence database. • It avoids generating large candidate sets.

  11. WAP-tree mine algorithm The philosophy of this mining algorithm is “conditional search”. Conditional search narrows the search space by looking for patterns with the same suffix and counting the frequent events in the set of prefixes, with that suffix as the condition.
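The idea of conditional search can be sketched without any tree at all. The following is a simplified illustration, not the WAP-mine algorithm itself: it works on plain strings, and the function name `conditional_search` is an assumption. For a suffix event e, the prefixes are taken up to the last occurrence of e in each sequence, because a pattern P followed by e occurs in a sequence exactly when P occurs before that last e. The four sequences are the example database from these slides, with a minimum count of 3.

```python
from collections import Counter


def conditional_search(sequences, min_count, suffix=""):
    """Return {pattern: support} for all frequent patterns ending in `suffix`.

    `sequences` are the prefixes that precede `suffix`; at the top level they
    are the full access sequences and `suffix` is empty.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(set(seq))  # count each event once per sequence

    patterns = {}
    for event, n in counts.items():
        if n < min_count:
            continue
        pattern = event + suffix
        patterns[pattern] = n
        # Condition on `event`: keep what precedes its last occurrence.
        prefixes = [seq[:seq.rindex(event)] for seq in sequences if event in seq]
        patterns.update(conditional_search(prefixes, min_count, pattern))
    return patterns


sequences = ["abdac", "eaebcac", "babfaec", "afbacfc"]
patterns = conditional_search(sequences, min_count=3)
print(sorted(patterns, key=lambda p: (len(p), p)))
```

The WAP-tree version of this search keeps the same recursion but stores the prefixes in a compressed tree instead of a list of strings.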

  12. Example Database
      User ID   Web Access Sequence
      100       abdac
      200       eaebcac
      300       babfaec
      400       afbacfc

  13. Example Database Frequency of events while scanning sequence 100 (abdac): a : 1 ; b : 1 ; d : 1

  14. Example Database Frequency of events after scanning sequence 100 (abdac): a : 1 ; b : 1 ; d : 1 ; c : 1

  15. Example Database Frequency of events after scanning sequence 200 (eaebcac): a : 2 ; b : 2 ; d : 1 ; c : 2 ; e : 1

  16. Example Database The final frequency counts of the events are: a : 4 ; b : 4 ; d : 1 ; c : 4 ; e : 2 ; f : 2. The minimum support threshold is 75% and the database holds 4 sequences, so an event must occur in at least 3 sequences to be frequent. The frequent events are therefore: a, b, c
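The first scan can be sketched as follows, using the example database from these slides. The function name `frequent_events` is an assumption, and each event is counted once per sequence regardless of how often it repeats inside that sequence.

```python
import math
from collections import Counter

# Example database: user ID -> web access sequence.
was = {100: "abdac", 200: "eaebcac", 300: "babfaec", 400: "afbacfc"}


def frequent_events(database, min_support):
    """Count in how many sequences each event occurs and keep the frequent ones."""
    counts = Counter()
    for sequence in database.values():
        counts.update(set(sequence))  # once per sequence, not per occurrence
    min_count = math.ceil(min_support * len(database))
    frequent = {event for event, n in counts.items() if n >= min_count}
    return counts, frequent


counts, frequent = frequent_events(was, min_support=0.75)
print(dict(counts))
print(sorted(frequent))
```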

  17. Example Database
      User ID   Frequent Subsequence
      100       abac
      200       abcac
      300       babac
      400       abacc
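Extracting the frequent subsequences amounts to dropping every infrequent event while keeping the order of the rest. A small sketch, with `frequent_subsequence` as a hypothetical helper name and the frequent events a, b, c taken from the count above:

```python
def frequent_subsequence(sequence, frequent):
    """Remove all events that are not frequent, preserving the original order."""
    return "".join(event for event in sequence if event in frequent)


was = {100: "abdac", 200: "eaebcac", 300: "babfaec", 400: "afbacfc"}
frequent = {"a", "b", "c"}

filtered = {user: frequent_subsequence(seq, frequent) for user, seq in was.items()}
for user, seq in filtered.items():
    print(user, seq)
```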

  18. Construction of the WAP Tree Frequent Subsequence : abac
      Header table: a, b, c
      Root
        a:1
          b:1
            a:1
              c:1

  19. Construction of the WAP Tree Frequent Subsequence : abcac
      Header table: a, b, c
      Root
        a:2
          b:2
            a:1
              c:1
            c:1
              a:1
                c:1

  20. Construction of the WAP Tree Frequent Subsequence : babac
      Header table: a, b, c
      Root
        a:2
          b:2
            a:1
              c:1
            c:1
              a:1
                c:1
        b:1
          a:1
            b:1
              a:1
                c:1

  21. Construction of the WAP Tree Frequent Subsequence : abacc
      Header table: a, b, c
      Root
        a:3
          b:3
            a:2
              c:2
                c:1
            c:1
              a:1
                c:1
        b:1
          a:1
            b:1
              a:1
                c:1

  22. Complete WAP Tree
      Header table: a, b, c
      Root
        a:3
          b:3
            a:2
              c:2
                c:1
            c:1
              a:1
                c:1
        b:1
          a:1
            b:1
              a:1
                c:1

  23. WAP Tree Mining Once the sequential data is stored in the complete WAP tree, the tree is mined for frequent patterns, starting with the lowest frequent event in the header list. Here, mining starts from the frequent event “c”. First, the prefix sequences of the base “c” are computed.

  24. Conditional Sequences on PS|c Conditional sequence base of “c”: aba : 2, ab : 1, abca : 1, ab : -1, baba : 1, abac : 1, aba : -1. After discarding “c”, the conditional sequences are: aba : 2, ab : 1, aba : 1, ab : -1, baba : 1, aba : 1, aba : -1. Count frequencies: a : 2+1+1-1+1+1-1 = 4; b : 2+1+1-1+1+1-1 = 4; c : 1+1 = 2. Here, the frequent events are “a” and “b”.
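The signed counting can be checked with a short sketch. Each conditional sequence contributes its count, positive or negative, once to every distinct event it contains; the negative entries cancel prefixes that would otherwise be counted twice. The function name `conditional_counts` is an assumption.

```python
from collections import Counter


def conditional_counts(base):
    """Sum the signed counts of a conditional sequence base, once per event per sequence."""
    counts = Counter()
    for sequence, n in base:
        for event in set(sequence):
            counts[event] += n
    return counts


# Conditional sequence base of "c" from the slide above.
base_c = [("aba", 2), ("ab", 1), ("abca", 1), ("ab", -1),
          ("baba", 1), ("abac", 1), ("aba", -1)]

counts_c = conditional_counts(base_c)
print(dict(counts_c))
print(sorted(event for event, n in counts_c.items() if n >= 3))
```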

  25. Conditional WAP-tree|c
      Header table: a : 4, b : 4
      Root
        a:3
          b:3
            a:3
        b:1
          a:1
            b:1
              a:1
      Frequent Sequential Patterns : ac : 4, bc : 4

  26. Conditional Sequences on PS|ac Conditional sequence base of “ac”: ab : 3, b : 1, bab : 1, b : -1. No event is discarded, so the conditional sequences are unchanged: ab : 3, b : 1, bab : 1, b : -1. Count frequencies: a : 3+1 = 4; b : 3+1+1-1 = 4. Here, the frequent events are “a” and “b”.

  27. Conditional WAP-tree|ac
      Header table: a : 4, b : 4
      Root
        a:3
          b:3
        b:1
          a:1
            b:1
      Frequent Sequential Patterns : aac : 4, bac : 4

  28. Conditional Sequences on PS|aac Conditional sequence base of “aac”: b : 1. Count frequencies: b : 1. After discarding the infrequent “b”, the conditional sequences are “null”. There are no frequent events here, so no conditional tree is built.

  29. Conditional Sequences on PS|bac Conditional sequence base of “bac”: a : 3, ba : 1. After discarding “b”, the conditional sequences are: a : 3, a : 1. Count frequencies: a : 3+1 = 4; b : 1. Here, the frequent event is “a”.

  30. Conditional WAP-tree|bac
      Header table: a : 4
      Root
        a:4
      Frequent Sequential Patterns : abac : 4

  31. Conditional WAP-tree|abac Root only (no events remain, so the recursion stops here).

  32. Conditional Sequences on PS|bc Conditional sequence base of “bc”: a : 3, ba : 1. After discarding “b”, the conditional sequences are: a : 3, a : 1. Count frequencies: a : 3+1 = 4; b : 1. Here, the frequent event is “a”.

  33. Conditional WAP-tree|bc
      Header table: a : 4
      Root
        a:4
      Frequent Sequential Patterns : abc : 4

  34. Conditional WAP-tree|abc Root only (no events remain, so the recursion stops here).

  35. Conditional Sequences on PS|b Conditional sequence base of “b”, read from the complete WAP tree: a : 3, ba : 1. After discarding “b”, the conditional sequences are: a : 3, a : 1. Count frequencies: a : 3+1 = 4; b : 1. Here, the frequent event is “a”.

  36. Conditional WAP-tree|b
      Header table: a : 4
      Root
        a:4
      Frequent Sequential Patterns : ab : 4

  37. Conditional WAP-tree|ab Root only (no events remain, so the recursion stops here).

  38. Conditional Sequences on PS|a Conditional sequence base of “a”, read from the complete WAP tree: ab : 2, abc : 1, b : 1, bab : 1, b : -1. After discarding “c”, the conditional sequences are: ab : 2, ab : 1, b : 1, bab : 1, b : -1. Count frequencies: a : 2+1+1 = 4; b : 2+1+1+1-1 = 4; c : 1. Here, the frequent events are “a” and “b”.

  39. Conditional WAP-tree|a
      Header table: a : 4, b : 4
      Root
        a:3
          b:3
        b:1
          a:1
            b:1
      Frequent Sequential Patterns : aa : 4, ba : 4

  40. Conditional Sequences on PS|aa Conditional sequence base of “aa”: b : 1. Count frequencies: b : 1. After discarding the infrequent “b”, the conditional sequences are “null”. There are no frequent events here, so no conditional tree is built.

  41. Conditional Sequences on PS|ba Conditional sequence base of “ba”: a : 3, ba : 1. After discarding “b”, the conditional sequences are: a : 3, a : 1. Count frequencies: a : 3+1 = 4; b : 1. Here, the frequent event is “a”.

  42. Conditional WAP-tree|ba
      Header table: a : 4
      Root
        a:4
      Frequent Sequential Patterns : aba : 4

  43. Conditional WAP-tree|aba Root only (no events remain, so the recursion stops here).

  44. WAP-tree Mining Discovered frequent pattern set : {a, b, c, ac, bc, ab, aa, ba, aac, bac, abc, aba, abac}
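The discovered set can be cross-checked by brute force, which is practical only because the example sequences are very short. The sketch below enumerates every subsequence of every access sequence and keeps those contained in at least 3 of the 4 sequences; the helper names are assumptions, and this is a verification aid rather than a mining algorithm.

```python
from collections import Counter
from itertools import combinations


def all_subsequences(sequence):
    """Every distinct non-empty subsequence of `sequence` (exponential in its length)."""
    found = set()
    for size in range(1, len(sequence) + 1):
        for positions in combinations(range(len(sequence)), size):
            found.add("".join(sequence[i] for i in positions))
    return found


def brute_force_patterns(database, min_count):
    """Patterns contained in at least `min_count` sequences, with their supports."""
    counts = Counter()
    for sequence in database:
        counts.update(all_subsequences(sequence))  # one vote per sequence
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


database = ["abdac", "eaebcac", "babfaec", "afbacfc"]
found = brute_force_patterns(database, min_count=3)
print(sorted(found, key=lambda p: (len(p), p)))
```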

  45. Algorithm 1 : WAP-mine
      • Input: access sequence database WAS and support threshold “f” (0 < f <= 1)
      • Output: the complete set of f-patterns in WAS
      • Method:
        1. Scan WAS once and find all frequent events.
        2. Scan WAS again and construct a WAP-tree over the set of frequent events using Algorithm 2.
        3. Recursively mine the WAP-tree using conditional search.

  46. Algorithm 2 : WAP-tree construction
      • Input: a web access sequence database WAS and the set of frequent events FE (obtained by scanning WAS once).
      • Output: a WAP-tree “T”.
      • Method:
        1. Create a root node for “T”.
        2. For each access sequence S in the access sequence database WAS do:
           (a) Extract the frequent subsequence S’ from S by removing all events that appear in S but not in FE. Let S’ = s1s2 . . . sn, where si (1 <= i <= n) are the events in S’. Let current_node point to the root of “T”.
           (b) For i = 1 to n do: if current_node has a child labeled si, increase the count of that child by 1 and make current_node point to it; otherwise create a new child node (si:1), make current_node point to the new node, and insert it into the si-queue.
        3. Return(T).
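The construction steps above can be sketched as follows. The class and function names (`Node`, `build_wap_tree`, `render`) are assumptions made for illustration, and the header queues are kept as a plain dictionary from event to the list of nodes carrying that label.

```python
from collections import defaultdict


class Node:
    """One WAP-tree node: an event label, a count, and links to parent and children."""

    def __init__(self, event, parent=None):
        self.event = event
        self.count = 0
        self.parent = parent
        self.children = {}


def build_wap_tree(sequences, frequent):
    """Insert the frequent subsequence of every access sequence into a prefix tree."""
    root = Node(None)
    queues = defaultdict(list)  # header table: event -> nodes labeled with it
    for sequence in sequences:
        current = root
        for event in sequence:
            if event not in frequent:
                continue  # step 2(a): skip infrequent events
            child = current.children.get(event)
            if child is None:  # step 2(b): create the node and enqueue it
                child = Node(event, current)
                current.children[event] = child
                queues[event].append(child)
            child.count += 1
            current = child
    return root, queues


def render(node, depth=0):
    """Return the tree as indented 'event:count' lines, children in insertion order."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.event}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


root, queues = build_wap_tree(["abdac", "eaebcac", "babfaec", "afbacfc"], {"a", "b", "c"})
print("\n".join(render(root)))
```

Run on the example database, this yields the two branches of the complete WAP tree shown earlier, with five nodes in the c-queue.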

  47. Algorithm 3 : Mining WAP-tree
      • Input: a WAP-tree “T” and support threshold “f”.
      • Output: the complete set of frequent patterns.
      • Method:
        1. If the WAP-tree “T” has only one branch, return all the unique combinations of nodes in that branch.
        2. Initialize the web access pattern set WAP = “null”. Every event in the WAP-tree “T” is itself a web access pattern; insert them into WAP.
        3. For each event ei in the WAP-tree “T”:
           (a) Construct the conditional sequence base of ei, i.e. PS|ei, by following the ei-queue, counting the conditional frequent events at the same time.

  48. Algorithm 3 : Mining WAP-tree (cont.)
           (b) If the set of conditional frequent events is not empty, build a conditional WAP-tree for ei over PS|ei using Algorithm 2. Recursively mine the conditional WAP-tree.
           (c) For each web access pattern returned from mining the conditional WAP-tree, concatenate ei to it and insert it into WAP.
        4. Return WAP.
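A compact sketch of the recursive mining, under stated assumptions: all names are hypothetical, sequences are plain strings, the single-branch shortcut of step 1 is omitted, and the double counting of repeated events is cancelled by adding the prefix of the nearest same-label ancestor with a negative count, as in the worked example.

```python
from collections import Counter, defaultdict


class Node:
    def __init__(self, event, parent=None):
        self.event, self.parent = event, parent
        self.count = 0
        self.children = {}


def build_tree(weighted):
    """Insert (sequence, signed count) pairs; return the header queues."""
    root, queues = Node(None), defaultdict(list)
    for sequence, n in weighted:
        current = root
        for event in sequence:
            if event not in current.children:
                current.children[event] = Node(event, current)
                queues[event].append(current.children[event])
            current = current.children[event]
            current.count += n
    return queues


def prefix_of(node):
    """Events on the path from the root down to the parent of `node`."""
    path, node = [], node.parent
    while node.event is not None:
        path.append(node.event)
        node = node.parent
    return "".join(reversed(path))


def conditional_base(nodes):
    """Prefix sequences of the nodes in one event queue, with signed counts."""
    base = []
    for node in nodes:
        base.append((prefix_of(node), node.count))
        ancestor = node.parent
        while ancestor.event is not None and ancestor.event != node.event:
            ancestor = ancestor.parent
        if ancestor.event is not None:  # nearest ancestor with the same label
            base.append((prefix_of(ancestor), -node.count))
    return base


def wap_mine(weighted, min_count, suffix="", out=None):
    """Collect {pattern: support} for all frequent patterns ending in `suffix`."""
    out = {} if out is None else out
    counts = Counter()
    for sequence, n in weighted:
        for event in set(sequence):
            counts[event] += n
    frequent = {event for event, n in counts.items() if n >= min_count}
    for event in frequent:
        out[event + suffix] = counts[event]
    kept = [("".join(e for e in seq if e in frequent), n) for seq, n in weighted]
    for event, nodes in build_tree(kept).items():
        wap_mine(conditional_base(nodes), min_count, event + suffix, out)
    return out


database = ["abdac", "eaebcac", "babfaec", "afbacfc"]
mined = wap_mine([(seq, 1) for seq in database], min_count=3)
print(sorted(mined, key=lambda p: (len(p), p)))
```

Passing the raw sequences with count 1 at the top level lets the same function handle both the initial scan and every conditional sequence base.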

  49. WAP-tree Theorem WAP-mine returns the complete set of access patterns without redundancy.

  50. Drawback of WAP-tree The WAP-tree mining algorithm recursively constructs a large number of intermediate WAP-trees during mining, and this entails storing intermediate patterns, which is still time-consuming.
