Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering
OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici
Definitions (two tables, TA and TB)
• A = {a1, a2, a3, …, an}, |A| = n
• |TA| = number of records in TA
• r(a) = a record from TA
• B = {b1, b2, b3, …, bm}, |B| = m
• |TB| = number of records in TB
• r(b) = a record from TB
Definitions
TA × TB: each record r of the product is a pair r = (r(a), r(b)), with r(a) taken from TA and r(b) taken from TB.
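To make the pair definition concrete, the sketch below enumerates TA × TB for two tiny tables. The table contents and field values are invented for illustration and do not come from the slides.

```python
from itertools import product

# Hypothetical toy tables; the values are illustrative only.
TA = [
    {"a1": "Berlin", "a2": "Monday"},
    {"a1": "Bonn", "a2": "Friday"},
]
TB = [
    {"b1": "Berlin", "b2": "private"},
    {"b1": "Hamburg", "b2": "business"},
    {"b1": "Bonn", "b2": "private"},
]

# Every record r of TA x TB is a pair (r(a), r(b)).
pairs = list(product(TA, TB))

# The product contains |TA| * |TB| records.
print(len(pairs) == len(TA) * len(TB))
```

The product grows multiplicatively with the table sizes, so a real implementation would likely avoid materializing it as a list.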
Definitions
TA × TB: TAB [table figure not recovered]
Definitions
Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d.
Examples: Ad4 = {a1, a2}, Ad2 = {a1}
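One way to read the definition of Ad is as a walk from node d back to the root, collecting the attribute used at each split. The sketch below assumes a hypothetical tree in which the root splits on a1 to produce d2, and d2 splits on a2 to produce d4. The `Node` class and this tree shape are illustrative assumptions, not the authors' data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    # Attribute of TA that the parent split on to create this node
    # (None for the root). Hypothetical representation.
    split_attr: Optional[str] = None


def selected_attributes(d: Node) -> set:
    """Return A_d: attributes used as splits on the path from the root to d."""
    attrs = set()
    while d.parent is not None:
        attrs.add(d.split_attr)
        d = d.parent
    return attrs


# Assumed tree shape: root --a1--> d2 --a2--> d4
root = Node("d1")
d2 = Node("d2", parent=root, split_attr="a1")
d4 = Node("d4", parent=d2, split_attr="a2")

print(selected_attributes(d2))  # the set containing a1
print(selected_attributes(d4))  # the set containing a1 and a2
```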
Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
• Request location
• Request day of week
• Request part of day
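The criterion's name refers to the Jaccard coefficient, |X ∩ Y| / |X ∪ Y|. The sketch below computes that coefficient for two sets. Treating each branch of a candidate split as a set of record identifiers is an assumption made for illustration, since the slides do not show how the sets are built.

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard coefficient |x ∩ y| / |x ∪ y|.

    Returning 0.0 for two empty sets is a convention chosen for this sketch.
    """
    union = x | y
    if not union:
        return 0.0
    return len(x & y) / len(union)


# Hypothetical record identifiers for two branches of a candidate split.
branch_1 = {"r1", "r2", "r3"}
branch_2 = {"r3", "r4"}

# One shared record out of four distinct records.
print(jaccard(branch_1, branch_2))  # 0.25
```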
CGJ – Splitting the root of the tree
Candidate split: reqLocation
Branches at node d: reqLocation = Bonn, reqLocation = Berlin, reqLocation = Hamburg (shown with the conditions reqLocation != Hamburg, reqLocation != Berlin, reqLocation != Bonn)
• W1 = 16/31, Score1 = 1/23
• W2 = 9/31, Score2 = 2/23
• W3 = 6/31, Score3 = 1/23
Score(Split_reqLocation) = W1 * Score1 + W2 * Score2 + W3 * Score3 = 0.0561
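The reqLocation score can be checked as a weighted sum of the per-branch scores. The sketch below uses exact fractions with the weights and scores from the slide. The function name `split_score` is a hypothetical helper, not part of any OCCT implementation.

```python
from fractions import Fraction


def split_score(branches):
    """Sum of weight * score over the branches of a candidate split."""
    return sum(weight * score for weight, score in branches)


# (weight, score) pairs as they appear on the slide for reqLocation.
req_location = [
    (Fraction(16, 31), Fraction(1, 23)),
    (Fraction(9, 31), Fraction(2, 23)),
    (Fraction(6, 31), Fraction(1, 23)),
]

score = split_score(req_location)
print(score)                    # 40/713
print(round(float(score), 4))   # 0.0561
```

The weights have a common denominator of 31, which is consistent with each weight being a branch's share of 31 records, although the slide does not state this.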
CGJ – Splitting the root of the tree
Candidate split: dayOfWeek
Branches at node d, as labeled on the slide: dayOfWeek = Wednesday, dayOfWeek = Friday, dayOfWeek = Thursday, dayOfWeek = Friday, dayOfWeek = Monday (shown with the conditions dayOfWeek != Wednesday, dayOfWeek != Thursday, dayOfWeek != Friday, dayOfWeek != Friday, dayOfWeek != Monday)
• W1 = 7/31, Score1 = 3/15
• W2 = 5/31, Score2 = 5/15
• W3 = 3/31, Score3 = 3/15
• W4 = 9/31, Score4 = 5/15
• W5 = 7/31, Score5 = 3/15
Score(Split_dayOfWeek) = W1 * Score1 + W2 * Score2 + W3 * Score3 + W4 * Score4 + W5 * Score5 = 0.260
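Written out term by term, the dayOfWeek score is the weighted sum of the five per-branch scores on the slide:

```latex
\begin{aligned}
\mathrm{Score}(\mathrm{Split}_{\mathit{dayOfWeek}})
  &= \tfrac{7}{31}\cdot\tfrac{3}{15}
   + \tfrac{5}{31}\cdot\tfrac{5}{15}
   + \tfrac{3}{31}\cdot\tfrac{3}{15}
   + \tfrac{9}{31}\cdot\tfrac{5}{15}
   + \tfrac{7}{31}\cdot\tfrac{3}{15} \\
  &= \frac{21 + 25 + 9 + 45 + 21}{465}
   = \frac{121}{465}
   \approx 0.260
\end{aligned}
```

The five weights sum to 31/31 = 1.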
CGJ – Splitting the root of the tree
Candidate split: partOfDay
Branches at node d: partOfDay = Morning, partOfDay = Afternoon
• Score1 = 4/23
Score(Split_partOfDay) = 0.173
Coarse Grained Jaccard – Splitting the root of the tree
Three candidates for split:
• Request location: 0.0561
• Request day of week: 0.260
• Request part of day: 0.173
The split in the root is selected by comparing these scores.
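The comparison of the three candidates can be sketched as a simple ranking. The slide text does not say whether the lowest or the highest score wins, so the sketch only orders the candidates and leaves the selection rule open.

```python
# Candidate scores as listed on the slide.
scores = {
    "Request location": 0.0561,
    "Request day of week": 0.260,
    "Request part of day": 0.173,
}

# Rank candidates from lowest to highest score.
ranked = sorted(scores.items(), key=lambda item: item[1])

for name, value in ranked:
    print(f"{name}: {value}")

lowest = ranked[0][0]    # candidate with the smallest score
highest = ranked[-1][0]  # candidate with the largest score
```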
Fine Grained Jaccard – Splitting the root of the tree
Candidate binary split at node d: Req. Location = Berlin versus Req. Location != Berlin
LPI – Splitting the root of the tree
Candidate binary split at node d: Req. Location = Berlin versus Req. Location != Berlin
MLE – Splitting the root of the tree
Attributes: Cust. City, Cust. Type
Probabilities: p(Cust. City | Cust. Type), p(Cust. Type | Cust. City)
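Assuming MLE here stands for maximum likelihood estimation, the two conditional probabilities can be estimated from relative frequencies. The sketch below does this by counting. The customer records are invented for illustration, and the slides do not show how OCCT combines these probabilities into a split score.

```python
from collections import Counter

# Hypothetical (Cust. City, Cust. Type) observations; illustrative only.
records = [
    ("Berlin", "private"),
    ("Berlin", "business"),
    ("Bonn", "private"),
    ("Berlin", "private"),
]

pair_counts = Counter(records)
city_counts = Counter(city for city, _ in records)
type_counts = Counter(ctype for _, ctype in records)


def p_city_given_type(city: str, ctype: str) -> float:
    """Frequency estimate of p(Cust. City | Cust. Type)."""
    return pair_counts[(city, ctype)] / type_counts[ctype]


def p_type_given_city(ctype: str, city: str) -> float:
    """Frequency estimate of p(Cust. Type | Cust. City)."""
    return pair_counts[(city, ctype)] / city_counts[city]


# Two of the three "private" records are in Berlin.
print(p_city_given_type("Berlin", "private"))
# Two of the three Berlin records are "private".
print(p_type_given_city("private", "Berlin"))
```

Both functions divide by a count, so they raise `ZeroDivisionError` for a city or type that never appears in the data. A fuller version would need a rule for unseen values.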