
OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage

Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering. Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici.


Presentation Transcript


  1. Ben-Gurion University of the Negev, Faculty of Engineering Sciences, Department of Information Systems Engineering. OCCT: A One-Class Clustering Tree for Implementing One-to-Many Data Linkage. Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici

  2. Definitions

  3. Definitions

  4. Definitions For table TA: A = {a1, a2, a3, …, an}, |A| = n, |TA| = number of records in TA, r(a) = a record from TA. For table TB: B = {b1, b2, b3, …, bm}, |B| = m, |TB| = number of records in TB, r(b) = a record from TB.

  5. Definitions TA × TB: r = (r(a), r(b))

  6. Definitions TA × TB: TAB

  7. Definitions TA × TB: TAB
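The pairing r = (r(a), r(b)) over TA × TB can be illustrated with a minimal sketch. The two toy tables below, including their attribute names and values, are hypothetical and are not taken from the presentation.

```python
from itertools import product

# Hypothetical toy tables; attribute names and values are illustrative only.
TA = [
    {"a1": "x", "a2": 1},
    {"a1": "y", "a2": 2},
]
TB = [
    {"b1": "p"},
    {"b1": "q"},
    {"b1": "s"},
]

# Every candidate pair r = (r(a), r(b)) in the Cartesian product TA x TB.
pairs = list(product(TA, TB))

print(len(pairs))  # equals |TA| * |TB|
print(pairs[0])    # first pair: (first record of TA, first record of TB)
```

The product grows as |TA| * |TB|, so even this toy case with 2 and 3 records yields 6 candidate pairs.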

  8. Definitions

  9. Definitions Ad ⊆ A: the subset of attributes of TA that were already selected as splitting attributes on the path from the root of the tree to node d. Examples: Ad4 = {a1, a2}, Ad2 = {a1}
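As a minimal sketch of how Ad could be collected, the snippet below walks from a node back to the root and gathers the splitting attributes on the way. The tree shape, the node names d1 to d5, and the `PARENT` mapping are assumptions chosen so that the result matches the two examples on the slide; the original tree figure is not available in this transcript.

```python
# Hypothetical tree: node -> (parent, attribute used to split the parent).
# The root d1 has no parent. Shape and names are assumptions for illustration.
PARENT = {
    "d1": (None, None),
    "d2": ("d1", "a1"),
    "d3": ("d1", "a1"),
    "d4": ("d2", "a2"),
    "d5": ("d2", "a2"),
}

def used_attributes(node):
    """Return the set of attributes used as splits on the path from the root to node."""
    used = set()
    parent, attr = PARENT[node]
    while parent is not None:
        used.add(attr)
        parent, attr = PARENT[parent]
    return used

print(used_attributes("d4"))
print(used_attributes("d2"))
```

Under this assumed tree, node d2 is reached through a split on a1, and node d4 through splits on a1 and then a2.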

  10. Running Examples

  11. The data set

  12. The data set – cont.

  13. Coarse Grained Jaccard

  14. Coarse Grained Jaccard – Splitting the root of the tree Three candidates for split: • Request location • Request day of week • Request part of day

  15. CGJ – Splitting the root of the tree Node d is split on reqLocation. Children: reqLocation = Bonn, reqLocation = Berlin, reqLocation = Hamburg. Complements: reqLocation != Hamburg, reqLocation != Berlin, reqLocation != Bonn. Score1 = 1/23, W1 = 16/31; Score2 = 2/23, W2 = 9/31; Score3 = 1/23, W3 = 6/31. Score(Split_reqLocation) = W1 * Score1 + W2 * Score2 + W3 * Score3 = 0.0561

  16. CGJ – Splitting the root of the tree Node d is split on dayOfWeek. Children: dayOfWeek = Wednesday, dayOfWeek = Friday, dayOfWeek = Thursday, dayOfWeek = Friday, dayOfWeek = Monday. Complements: dayOfWeek != Wednesday, dayOfWeek != Thursday, dayOfWeek != Friday, dayOfWeek != Friday, dayOfWeek != Monday. Score1 = 3/15, W1 = 7/31; Score2 = 5/15, W2 = 5/31; Score3 = 3/15, W3 = 3/31; Score4 = 5/15, W4 = 9/31; Score5 = 3/15, W5 = 7/31. Score(Split_dayOfWeek) = W1 * Score1 + W2 * Score2 + W3 * Score3 + W4 * Score4 + W5 * Score5 = 0.260

  17. CGJ – Splitting the root of the tree Node d is split on partOfDay. Children: partOfDay = Morning, partOfDay = Afternoon. Score1 = 4/23. Score(Split_partOfDay) = 0.173

  18. Coarse Grained Jaccard – Splitting the root of the tree Three candidates for split: • Request location 0.0561 • Request day of week 0.260 • Request part of day 0.173 The split in the root
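The split scores quoted on slides 15 and 16 can be reproduced with a short sketch, assuming each split score is the weighted sum of the per-child scores (W_i * Score_i). That reading is inferred from the `*` and `+` symbols on the slides and agrees with the quoted totals. The partOfDay candidate is left out because its slide gives no weights. The names `candidates` and `split_score` are illustrative.

```python
from fractions import Fraction as F

# (weight, score) per child, copied from slides 15 and 16.
candidates = {
    "reqLocation": [
        (F(16, 31), F(1, 23)),
        (F(9, 31), F(2, 23)),
        (F(6, 31), F(1, 23)),
    ],
    "dayOfWeek": [
        (F(7, 31), F(3, 15)),
        (F(5, 31), F(5, 15)),
        (F(3, 31), F(3, 15)),
        (F(9, 31), F(5, 15)),
        (F(7, 31), F(3, 15)),
    ],
}

def split_score(children):
    """Weighted sum of child scores (assumed aggregation rule)."""
    return sum(w * s for w, s in children)

for name, children in candidates.items():
    score = split_score(children)
    print(name, score, round(float(score), 4))
```

Exact fractions are used so that the totals, 40/713 and 121/465, can be checked against the rounded values 0.0561 and 0.260 on the slides without floating-point noise.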

  19. Fine Grained Jaccard

  20. Fine Grained Jaccard – Splitting the root of the tree Node d is split into Req. Location = Berlin and Req. Location != Berlin

  21. Least Probable Intersection

  22. LPI – Splitting the root of the tree Node d is split into Req. Location = Berlin and Req. Location != Berlin

  23. Req. Location = Berlin; Req. Location != Berlin

  24. LPI – Splitting the root of the tree Node d is split into Req. Location = Berlin and Req. Location != Berlin

  25. Maximum Likelihood Estimation

  26. MLE – Splitting the root of the tree Attributes: Cust. City, Cust. Type. Conditional probabilities: p(Cust. City | Cust. Type), p(Cust. Type | Cust. City)
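The two conditional probabilities named on slide 26 can be estimated from data by relative frequency, which is the maximum likelihood estimate for a categorical distribution. The sketch below is only an illustration of that estimate: the customer records are hypothetical, and the transcript does not show how the presentation itself computes or uses these probabilities.

```python
from collections import Counter

# Hypothetical customer records as (city, type); values are illustrative only.
records = [
    ("Berlin", "private"),
    ("Berlin", "private"),
    ("Berlin", "business"),
    ("Bonn", "business"),
]

pair_counts = Counter(records)
city_counts = Counter(city for city, _ in records)
type_counts = Counter(ctype for _, ctype in records)

def p_city_given_type(city, ctype):
    """Relative-frequency estimate of p(Cust. City | Cust. Type)."""
    return pair_counts[(city, ctype)] / type_counts[ctype]

def p_type_given_city(ctype, city):
    """Relative-frequency estimate of p(Cust. Type | Cust. City)."""
    return pair_counts[(city, ctype)] / city_counts[city]

print(p_city_given_type("Bonn", "business"))
print(p_type_given_city("private", "Berlin"))
```

In this toy data, half of the business customers are in Bonn, and two of the three Berlin customers are private.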
