IT444: Web Intelligence

IT444: Web Intelligence Revision Apriori and HITS algorithm

Association Rules Apriori Algorithm

Pass 1
• Generate the candidate itemsets in C1
• Save the frequent itemsets in L1
Pass k
• Generate the candidate itemsets in Ck from the frequent itemsets in Lk-1
• Join Lk-1 p with Lk-1 q, as follows:
  insert into Ck
  select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
  from Lk-1 p, Lk-1 q
  where p.item1 = q.item1, ..., p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
• Generate all (k-1)-subsets from the candidate itemsets in Ck
• Prune all candidate itemsets from Ck where some (k-1)-subset of the candidate itemset is not in the frequent itemset Lk-1
• Scan the transaction database to determine the support for each candidate itemset in Ck
• Save the frequent itemsets in Lk
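The join and prune steps of pass k can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `apriori_gen`, the representation of itemsets as sorted tuples, and the sample `L2` are assumptions made for the example and are not the data from the slides.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Build the candidate set C_k from L_{k-1} (itemsets are sorted tuples)."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join step: the first k-2 items match and p's last item < q's last item.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidate = p + (q[-1],)
                # Prune step: every (k-1)-subset of the candidate must be in L_{k-1}.
                if all(s in prev_set for s in combinations(candidate, len(p))):
                    candidates.append(candidate)
    return candidates

# Hypothetical L2 (not the slide data), chosen so that some candidates are pruned.
L2 = [("A", "B"), ("A", "C"), ("A", "E"), ("B", "C")]
C3 = apriori_gen(L2)
print(C3)  # ABE and ACE are pruned because BE and CE are not in L2
```

With this hypothetical L2, only ABC survives: ABE is joined from AB and AE but dropped because BE is not frequent, which mirrors the pruning rule stated above.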

Example • Assume the user-specified minimum support is 40%, then generate all frequent itemsets. • Given: The transaction database shown below:

Pass-1 C1 L1

Pass-2 C2 • Before computing support, check for pruning. • Nothing is pruned, since all subsets of these itemsets are frequent.

C2 → L2 • L2 is obtained from C2 by saving only the frequent itemsets.

Pass-3 C3 • To create C3, only join itemsets from L2 that have the same first item (in pass k, the first k - 2 items must match).

Pruning • A candidate itemset is pruned when some (k-1)-subset of it is not in the frequent itemset Lk-1. • In pass-3: find all subsets of 2 items of each candidate in C3, and check whether they are in the frequent itemset L2.

C3 after pruning • Pruning eliminates ABE, since BE is not frequent.

Scan the transactions in the database and compute the support of each candidate in C3. The candidates that meet the minimum support are saved in L3.
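The support scan can be sketched as below. The transaction database from the slides is not available in this transcript, so the `transactions` list and the candidate list `C3` here are hypothetical stand-ins, and the helper names `support` and `frequent` are assumptions of this sketch. Only the 40% threshold comes from the example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def frequent(candidates, transactions, min_support):
    """Keep only the candidates whose support reaches the minimum support."""
    return [c for c in candidates if support(c, transactions) >= min_support]

# Hypothetical transaction database (NOT the one from the slides).
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "C", "E"},
]
C3 = [("A", "B", "C"), ("A", "B", "D")]  # hypothetical candidates
L3 = frequent(C3, transactions, min_support=0.40)
print(L3)
```

In this made-up database ABC appears in 3 of 5 transactions (60%) and is kept, while ABD appears in 1 of 5 (20%) and is dropped.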

Pass-4 • First k - 2 = 2 items must match in pass k = 4

Pruning
• For ABCD we check whether ABC, ABD, ACD, BCD are frequent. They are in all cases, so we do not prune ABCD.
• For ACDE we check whether ACD, ACE, ADE, CDE are frequent. They are in all cases, so we do not prune ACDE.
• After scanning the database, both candidates are frequent, so both are saved in L4.

Pass-5 • For pass 5 we can't form any candidates because there aren't two frequent 4-itemsets beginning with the same 3 items.

Association Rules
• Take the frequent itemset {A, B, C}
• Its non-empty proper subsets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}
• Assume min confidence 70%
• Compute confidence for each rule

Rules • R1: A, BC • Confidence= support {A B C}/support {A B} = 0.6/ 0.6= 1 => 100% Compute confidence in R2 R2: A, CB

HITS algorithm

Example-1 • Apply the HITS algorithm on the following web graph with three pages (1, 2, 3), in which page 1 links to page 2 and to page 3.

Initialize HUB and AUTH values • Node (1): HUB=1, AUTH=1 • Node (2): HUB=1, AUTH=1 • Node (3): HUB=1, AUTH=1

Normalization
• Normalized HUB (1) = HUB(1) / SQRT [HUB(1)^2 + HUB(2)^2 + HUB(3)^2]
• Normalized AUTH (1) = AUTH(1) / SQRT [AUTH(1)^2 + AUTH(2)^2 + AUTH(3)^2]
• We do this for all pages in the graph.
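The normalization formula can be checked with a short Python sketch. The function name `normalize` and the use of a dictionary keyed by page number are assumptions of this example.

```python
import math

def normalize(scores):
    """Divide each score by the square root of the sum of squared scores."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)  # all scores are zero, nothing to scale
    return {node: v / norm for node, v in scores.items()}

# Initial values: every page starts with HUB = 1 (the same holds for AUTH).
hub = normalize({1: 1.0, 2: 1.0, 3: 1.0})
print({node: round(v, 2) for node, v in hub.items()})
```

Each value becomes 1 / SQRT(3), which rounds to 0.58.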

Normalized values • HUB (1)=0.58, AUTH (1)=0.58 • HUB (2)=0.58, AUTH (2)=0.58 • HUB (3)=0.58, AUTH (3)=0.58

Compute new HUB and AUTH values
Node (1)
• HUB (1) = sum of the authority values of the nodes pointed to by node (1) = AUTH(2) + AUTH(3) = 0.58 + 0.58 = 1.16
• AUTH (1) = sum of the hub values of the nodes pointing to node (1) = 0

Node (2)
• HUB (2) = sum of the authority values of the nodes pointed to by node (2) = 0
• AUTH (2) = sum of the hub values of the nodes pointing to node (2) = HUB (1) = 0.58

Node (3)
• HUB (3) = sum of the authority values of the nodes pointed to by node (3) = 0
• AUTH (3) = sum of the hub values of the nodes pointing to node (3) = HUB (1) = 0.58

After Normalization
• HUB (1) = 1.16 / SQRT [(1.16)^2 + 0^2 + 0^2] = 1.16 / SQRT (1.3456) = 1.16 / 1.16 = 1
• AUTH (1) = 0
• HUB (2) = 0, AUTH (2) = 0.58 / SQRT [0^2 + (0.58)^2 + (0.58)^2] = 0.58 / SQRT (0.6728) = 0.71
• HUB (3) = 0, AUTH (3) = 0.71

Recalculating HUB and AUTH
• HUB (1) = AUTH(2) + AUTH(3) = 0.71 + 0.71 = 1.42
• AUTH (1) = 0
• Normalizing: HUB (1) = 1.42 / SQRT [(1.42)^2 + 0^2 + 0^2] = 1.42 / SQRT (2.0164) = 1.42 / 1.42 = 1

Recalculations
• HUB (2) = 0
• AUTH (2) = 0.71
• HUB (3) = 0
• AUTH (3) = 0.71
• Because the values are unchanged, we stop here.
• Page 1 is clearly the hub, and pages 2 and 3 share the honor of being authorities.
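The whole example can be replayed with a small Python sketch of the iteration. It follows the order used above (normalize the initial values, recompute HUB and AUTH from the previous values, normalize, and stop when nothing changes). The function names, the edge-list representation, and the `tol` and `max_iter` parameters are assumptions of this sketch; the edges 1 → 2 and 1 → 3 are read off the calculations above.

```python
import math

def normalize(scores):
    """Scale scores so that the sum of their squares is 1."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    return dict(scores) if norm == 0 else {n: v / norm for n, v in scores.items()}

def hits(nodes, edges, max_iter=100, tol=1e-9):
    """Iterate HUB/AUTH updates on a list of (source, target) links."""
    hub = normalize({n: 1.0 for n in nodes})
    auth = normalize({n: 1.0 for n in nodes})
    for _ in range(max_iter):
        # HUB(n): sum of the authority values of the pages n points to.
        new_hub = {n: sum(auth[t] for s, t in edges if s == n) for n in nodes}
        # AUTH(n): sum of the hub values of the pages pointing to n.
        new_auth = {n: sum(hub[s] for s, t in edges if t == n) for n in nodes}
        new_hub, new_auth = normalize(new_hub), normalize(new_auth)
        change = max(abs(new_hub[n] - hub[n]) + abs(new_auth[n] - auth[n])
                     for n in nodes)
        hub, auth = new_hub, new_auth
        if change < tol:  # values unchanged, so stop
            break
    return hub, auth

# Graph of Example-1: page 1 links to pages 2 and 3.
hub, auth = hits([1, 2, 3], [(1, 2), (1, 3)])
print({n: round(v, 2) for n, v in hub.items()})
print({n: round(v, 2) for n, v in auth.items()})
```

Rounded to two decimals, the result agrees with the hand calculation: page 1 has hub value 1, and pages 2 and 3 each have authority value 0.71.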
