Association Rule

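Association Rule
• Example: computer => buys game software [support = 2%, confidence = 60%]
• General form: A => B
• Support(A => B) = P[A and B], the fraction of transactions that contain both A and B
• Confidence(A => B) = P[B | A], the fraction of transactions containing A that also contain B
• Both are expressed as percentages
• Itemset: a set of items
• 2-itemset: e.g. {computer, game software}
• Minimum support and minimum confidence are user-specified thresholds
• Frequent itemset: an itemset that satisfies minimum support
• Association rule mining in a large database:
  • Find all frequent itemsets.
  • Among these, find the strong association rules (those that also satisfy minimum confidence).
Data Mining: Undergraduate Lecture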
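The support and confidence definitions can be sketched in a few lines of Python. The transaction list below is hypothetical (it does not reproduce the 2% / 60% figures of the slide), and the function names `support` and `confidence` are chosen here only for illustration.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"computer", "game software"},
    {"computer", "printer"},
    {"computer", "game software", "printer"},
    {"printer"},
    {"computer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset: P[A and B]."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P[B | A], computed as support(A and B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

s = support({"computer", "game software"}, transactions)
c = confidence({"computer"}, {"game software"}, transactions)
print(f"computer => game software: support = {s:.0%}, confidence = {c:.0%}")
```

With minimum support and minimum confidence given as fractions, the rule is strong when both `s` and `c` reach their thresholds.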

Mining Single-Dimensional Boolean Association Rules: Transactional Databases
• Min. support 50%, min. confidence 50%
• Rule A => C:
  • support = support({A, C}) = 50%
  • confidence = support({A, C}) / support({A}) = 66.6%
• Apriori principle: any subset of a frequent itemset must be frequent
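The arithmetic for the rule A => C can be checked with a small script. The four transactions below are an assumption: the slide's transaction table is not available in the text, so this database was chosen only because it yields the same 50% support and 66.6% confidence.

```python
# Hypothetical database, picked so that support({A, C}) = 50%
# and support({A}) = 75%, which matches the figures on the slide.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

MIN_SUPPORT = 0.5
MIN_CONFIDENCE = 0.5

def support(itemset):
    """Fraction of transactions in db that contain every item of itemset."""
    return sum(set(itemset) <= t for t in db) / len(db)

sup_ac = support({"A", "C"})
conf_a_c = sup_ac / support({"A"})   # confidence of A => C
conf_c_a = sup_ac / support({"C"})   # confidence of C => A

is_strong = sup_ac >= MIN_SUPPORT and conf_a_c >= MIN_CONFIDENCE
print(f"A => C: support = {sup_ac:.1%}, confidence = {conf_a_c:.1%}, strong = {is_strong}")

# Apriori principle: a subset is at least as frequent as its superset.
assert support({"A"}) >= sup_ac and support({"C"}) >= sup_ac
```

On this assumed database the reverse rule C => A has the same support but a higher confidence, since every transaction containing C also contains A.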

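Steps in Mining Frequent Itemsets
• Find the frequent itemsets.
  • Every subset of a frequent itemset is itself a frequent itemset.
  • Example: if {A, B} is a frequent itemset, then {A} and {B} are frequent itemsets.
  • By contraposition (A => B implies !B => !A), an itemset with an infrequent subset cannot be frequent.
  • Find the frequent itemsets iteratively, for cardinality (the number of items in the itemset) from 1 to k.
• Generate association rules from the frequent itemsets.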
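The last step, generating rules from frequent itemsets, can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function name `generate_rules`, the dictionary layout, and the support values are all assumptions made for the example.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, support, confidence) for every strong rule.

    supports maps each frequent itemset (a frozenset) to its support. Because
    every subset of a frequent itemset is frequent, all antecedents are present.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for items in combinations(sorted(itemset), size):
                antecedent = frozenset(items)
                conf = sup / supports[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules

# Hypothetical supports for three frequent itemsets.
supports = {
    frozenset({"A"}): 0.75,
    frozenset({"C"}): 0.50,
    frozenset({"A", "C"}): 0.50,
}

rules = generate_rules(supports, min_confidence=0.5)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), f"support={sup:.0%} confidence={conf:.1%}")
```

Raising `min_confidence` to 0.7 on the same data would keep only the rule C => A.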

The Apriori Algorithm
• Join step: join Lk-1 (the frequent (k-1)-itemsets) with itself to generate Ck (the candidate k-itemsets).
• Prune step: a (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
• Pseudo-code:
    Ck: candidate k-itemsets
    Lk: frequent k-itemsets

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of every candidate in Ck+1 that is contained in t;
        Lk+1 = candidates in Ck+1 with min_support;
    end
    return ∪k Lk;
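The pseudo-code can be turned into a short runnable sketch. Two assumptions are worth stating: the join is done by set union (any two frequent k-itemsets whose union has k+1 items) rather than by an ordered prefix join, which yields the same candidates after pruning, and the four-transaction database is hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    min_count = min_support * len(transactions)

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_count}
    frequent = dict(current)

    k = 1
    while current:
        prev = set(current)
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Scan the database and count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical database with integer item ids.
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(transactions, min_support=0.5)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

Each pass of the `while` loop corresponds to one database scan in the pseudo-code, and the loop stops as soon as a level produces no frequent itemsets.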

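Example: Apriori Algorithm
[Figure: database D is scanned to count the candidates C1, giving L1; C2 is generated from L1 and counted in a second scan of D, giving L2; C3 is generated from L2 and counted in a third scan of D, giving L3.]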

Generating Candidates
• Assume the items within each itemset of Lk-1 are listed in (lexicographic) order.
• Step 1: self-joining Lk-1
    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
  (Example: in L2, p.item1 = q.item1 = 2, p.item2 = 3, q.item2 = 5 => {2, 3, 5})
• Step 2: pruning
    forall itemsets c in Ck do
        forall (k-1)-subsets s of c do
            if (s is not in Lk-1) then delete c from Ck
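The join and prune steps can be written as one Python function over sorted tuples. This is a sketch under stated assumptions: the name `apriori_gen` and the tuple representation are chosen here, and the example L2 extends the pair {2, 3}, {2, 5} from the slide with two hypothetical itemsets, {1, 3} and {3, 5}.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets.

    Each itemset is a tuple whose items are in sorted order.
    """
    prev = sorted(set(prev_frequent))
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Join: p and q agree on the first k-2 items. Because prev is
            # sorted and q comes after p, p's last item is smaller than q's.
            if p[:-1] != q[:-1]:
                continue
            c = p + (q[-1],)
            # Prune: every (k-1)-subset of c must itself be frequent.
            if all(s in prev_set for s in combinations(c, len(c) - 1)):
                candidates.append(c)
    return candidates

# {2, 3} and {2, 5} come from the slide's example; the rest is hypothetical.
L2 = [(1, 3), (2, 3), (2, 5), (3, 5)]
C3 = apriori_gen(L2)
print(C3)
```

Keeping the itemsets sorted means each candidate is produced by exactly one pair (p, q), so no duplicate check is needed.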

Example: Candidate Generation
• L3 = {abc, abd, acd, ace, bcd}
• If abxy is frequent, then abx and aby must be in L3.
• Self-joining: L3 * L3
  • abc and abd => abcd
  • acd and ace => acde
• Pruning:
  • ade is not in L3 => remove acde
• C4 = {abcd}
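This example can be traced step by step in Python, keeping the join result and the pruned candidates separate. Itemsets are written as strings of single-letter items, as on the slide; the variable names are chosen here for illustration.

```python
from itertools import combinations

L3 = ["abc", "abd", "acd", "ace", "bcd"]

# Self-join: pair itemsets that share their first two items,
# with the last item of p smaller than the last item of q.
joined = [p + q[-1] for p in L3 for q in L3
          if p[:-1] == q[:-1] and p[-1] < q[-1]]

def all_subsets_frequent(candidate):
    """True if every 3-item subset of the candidate is in L3."""
    size = len(candidate) - 1
    return all("".join(s) in L3 for s in combinations(candidate, size))

# Prune: keep only candidates whose 3-subsets are all frequent.
C4 = [c for c in joined if all_subsets_frequent(c)]
pruned = [c for c in joined if c not in C4]

print("joined:", joined)
print("pruned:", pruned)
print("C4:", C4)
```

The candidate acde is dropped because ade (and also cde) is missing from L3, which leaves abcd as the only member of C4.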
