Data Warehouse Mining (DWM): for any data warehouse with Fact file, F(d1..dn, m1..mk) (the mi's are measurements), and Dimension files, Di(di, ai1...airi), i=1..n.
Data Warehouse Mining (DWM)

For any data warehouse with a Fact file, F(d1..dn, m1..mk) (the mi's are measurements), and Dimension files, Di(di, ai1...airi), i=1..n:

Method-1 (to simplify): Convert to a Boolean DW by applying a predicate to the measurements {m1…mk}, replacing each measurement vector with a 1-bit if the predicate is true and a 0-bit if it is false (e.g., predicates can be simple thresholds; they may also involve dimensions). This gives:
  Predicated Fact File, PF(d1...dn, m0) (m0 = Boolean predicate result)
  Dimension files, Di(di, ai1...airi)
Next, theta-join the Dimension files (doing selections and projections first?) using PF as the theta condition, ending up with one large relation, the Universal Predicated Fact file (UF):
  Universal Predicated Fact File, UF(d1...dn, a11...a1r1 … an1...anrn)
Next, (possibly) structure UF vertically (e.g., using basic Ptrees, or?).
Approach? Avoid actually creating the large UF relation at all (it is very large!). Create the UF basic Ptrees directly from the Fact and Dimension basic Ptrees?

Method-2: Create the full equi-join of F and all Di (no predication); the result is also denoted UF. UF can be fully vertically partitioned and data mined (e.g., by Nearest Neighbor Classification, NNC, or any other data mining method).
  Universal Fact File, UF(d1...dn, a11...a1r1 … an1...anrn, m1..mk)
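Method-1 can be sketched in a few lines. The toy warehouse below (two dimensions, one measurement), the threshold predicate and all names are hypothetical and only illustrate the two steps: predicate the fact file, then theta-join the dimension files using PF as the condition.

```python
from itertools import product

# Hypothetical toy warehouse (not from the slides): two dimensions, one measurement.
D1 = {0: ("x",), 1: ("y",)}                         # d1 -> (a11,)
D2 = {0: (10,), 1: (20,)}                           # d2 -> (a21,)
F = {(0, 0): 5, (0, 1): 1, (1, 0): 7, (1, 1): 2}    # (d1, d2) -> m1


def predicate(m):
    """Assumed threshold predicate on the measurement."""
    return m >= 4


# Predicated fact file PF(d1, d2, m0): same keys, measurement replaced by one bit.
PF = {key: int(predicate(m)) for key, m in F.items()}

# Theta-join of the dimension files with PF as the theta condition:
# a (d1, d2) combination survives only where its PF bit is 1.
UF = [
    (d1, d2) + D1[d1] + D2[d2]
    for d1, d2 in product(sorted(D1), sorted(D2))
    if PF.get((d1, d2), 0) == 1
]
```

This sketch materializes UF as a list, which is exactly what the slide suggests avoiding for a real warehouse; it is only meant to make the definition concrete.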
A UF example

Schemas (one-letter column names in parentheses):
  date:             date_key (d), Day (a), day_of_wk (w), Month (m), Quarter (q), Year (200y)
  product:          prod_key (p), prod_name (n), Brand (b), Supplier (s)
  country:          countrykey (c), Legalname (l), Continent (o)
  Sales Fact Table: date_key (d), product_key (p), country_key (c), Total-$-sold meas. (t)

Date
  d  a  w  m  q  y
  0  4  m  2  1  2
  1  9  f  7  3  1
  2  2  t  6  3  3

Prod
  p  n  b  s
  0  j  r  0
  1  i  r  0
  2  k  u  2

Country
  c  l   o
  0  us  0
  1  gb  1

F (one line per column, 18 tuples across)
  d   0  0  0  0  0  0  1  1  1  1  1  1  2  2  2  2  2  2
  p   0  0  1  1  2  2  0  0  1  1  2  2  0  0  1  1  2  2
  c   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
  t   4  7  0  1  2  1  3  3  6  5  0  4  6  4  2  5  1  7

UF (one line per column, 18 tuples across)
  l   us gb us gb us gb us gb us gb us gb us gb us gb us gb
  d   0  0  0  0  0  0  1  1  1  1  1  1  2  2  2  2  2  2
  p   0  0  1  1  2  2  0  0  1  1  2  2  0  0  1  1  2  2
  c   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
  a   4  4  4  4  4  4  9  9  9  9  9  9  2  2  2  2  2  2
  w   m  m  m  m  m  m  f  f  f  f  f  f  t  t  t  t  t  t
  m   2  2  2  2  2  2  7  7  7  7  7  7  6  6  6  6  6  6
  q   1  1  1  1  1  1  3  3  3  3  3  3  3  3  3  3  3  3
  y   2  2  2  2  2  2  1  1  1  1  1  1  3  3  3  3  3  3
  n   j  j  i  i  k  k  j  j  i  i  k  k  j  j  i  i  k  k
  b   r  r  r  r  u  u  r  r  r  r  u  u  r  r  r  r  u  u
  s   0  0  0  0  2  2  0  0  0  0  2  2  0  0  0  0  2  2
  t   4  7  0  1  2  1  3  3  6  5  0  4  6  4  2  5  1  7
  o   0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1  0  1
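The UF of this example is the full equi-join of Method-2. The following sketch rebuilds it from the Date, Prod, Country and F data shown on the slide; the dictionary layout and the names `universal_fact` and `UF_COLUMNS` are illustrative choices, not part of the original.

```python
from itertools import product

# Dimension files from the slide, keyed by their dimension key.
DATE = {0: (4, "m", 2, 1, 2), 1: (9, "f", 7, 3, 1), 2: (2, "t", 6, 3, 3)}  # d -> (a, w, m, q, y)
PROD = {0: ("j", "r", 0), 1: ("i", "r", 0), 2: ("k", "u", 2)}              # p -> (n, b, s)
COUNTRY = {0: ("us", 0), 1: ("gb", 1)}                                      # c -> (l, o)

# Fact file F(d, p, c, t): keys in the slide's row order, then the measurement t.
T = [4, 7, 0, 1, 2, 1, 3, 3, 6, 5, 0, 4, 6, 4, 2, 5, 1, 7]
F = [(d, p, c, t)
     for (d, p, c), t in zip(product(range(3), range(3), range(2)), T)]


def universal_fact(fact):
    """Equi-join every fact tuple with its three dimension tuples."""
    rows = []
    for d, p, c, t in fact:
        a, w, m, q, y = DATE[d]
        n, b, s = PROD[p]
        l, o = COUNTRY[c]
        rows.append(dict(d=d, p=p, c=c, a=a, w=w, m=m, q=q, y=y,
                         n=n, b=b, s=s, l=l, o=o, t=t))
    return rows


UF = universal_fact(F)

# Vertical partitioning: one list per attribute, as in the column-wise listing.
UF_COLUMNS = {name: [row[name] for row in UF] for name in UF[0]}
```

`UF_COLUMNS["w"]`, for instance, is the day_of_wk column of UF, which is the form a vertical (Ptree-style) representation would start from.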
Nearest Neighbor Classification (NNC)

Many UF mining research topics can be pursued. E.g., for any DW data area: Association Rule Mining (ARM), Clustering, Classification (e.g., NNC), other NN methods, Iceberg Querying, Case-Based and Rough-Set Classification, NN search, Outlier/Noise Analysis, OLAP operator implementation, Query Processing, Vertical DW maintenance (e.g., upon inserting next-day data...).
The research may be quite different depending on the data area. E.g., Dr. Slator is interested in classification of Virtual Cell data with respect to which students do well.

Classification: Choose a feature attribute as the "class label" (may be composite?), i.e., the column(s) you want to classify tuples with respect to.
A Classifier is a program with input = unclassified_tuple (no class label yet) and output = predicted_class_label for that input.
How is that prediction made? It is based on already classified tuples (the Training Set) of historical data.

NNC: Given a Training Set, a similarity measure and an unclassified tuple, find a set of nearest neighbors from the Training Set. Those neighbors predict the class through a plurality vote (or a similarity-weighted vote).
How many neighbors? E.g., kNNC: find the k nearest neighbors; dNNC: find all neighbors within similarity d.
Note: NNC requires a similarity measure on pairs of tuples for "nearest" to make sense.
NNC example from Precision Agriculture

[Figure: TIFF image and yield map, combined into the Training Set T(R, G, B, Y), ~100,000 tuples]

The Training Set, T, consists of an aerial photograph (a TIFF image taken during a growing season) and a synchronized yield map (crop yield measured that same year at harvest).
Producers want to classify Y = yield (e.g., Hi, Med, Low) based on color intensity (R, G, B). Y = Yield is the class label attribute.
Using last year's data set as Training Data, producers want a classifier that takes an (R, G, B) triple as input (from an image taken during the current growing season) and outputs a predicted Yield for that pixel of their field.
Then they can apply additional nitrogen (N) on just those parts of the field that need it to increase yield, without wasting N on the parts that will likely have high enough yield anyway (excess N there would just run off into rivers and contaminate ground water).
This classifier would help save N costs, maximize yield and protect the environment!
UF (predicated) example

Fact file is F(d1, d2, d3, m1, m2, m3). A predicate on the mi's results in PF(d1, d2, d3, m0).
Dimensions: D1(d1, a10, a11, a12, a13), D2(d2, a20, a21, a22) and D3(d3, a30, a31).

[Figure: the predicated fact cube PF over d1, d2, d3, with the dimension files D1, D2 and D3 laid out along its axes]

D1( d1   a10  a11  a121 a122 a123 a13 )
    d10  0    1    1    1    1    c
    d11  1    1    1    0    0    s
    d12  0    1    1    1    1    c
    d13  0    0    0    1    0    s

D2( d2   a201 a202 a21  a22 )
    d10  0    1    1    c
    d11  0    1    0    a
    d12  1    1    1    b
    d13  0    1    0    a

D3( d3   a30  a31 )
    d10  c    1
    d11  a    0
    d12  b    1
    d13  a    0

NNC example: Choose D2.a22 as the Class Attribute, C.
The ordering used on the previous slide is shown here: Generalized Peano order, sorting on d11, then d21, then d31, then d12, then d22, then d32, … (the origin is in the top back left corner). [Figure: cube with axes d1, d2, d3]
Spread out, so you can see what's going on. [Figure: the same cube, spread out, with axes d1, d2, d3]
Using the standard orientation (origin in the bottom back left corner) and Generalized Peano order (x1,y1,z1,x2,y2,z2,x3,y3,z3), with X=d1, Y=d2, Z=d3. [Figure]
Enlarged: standard orientation and Generalized Peano order (x1,y1,z1,x2,y2,z2,x3,y3,z3), with X=d1, Y=d2, Z=d3. [Figure]
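The sort order described above (first bit of d1, then first bit of d2, then first bit of d3, then the second bits, and so on) is a bit-interleaving order. A minimal sketch follows, assuming that dij denotes the j-th bit of di counted from the high-order end; that reading, and the function name `peano_key`, are assumptions made for illustration.

```python
from itertools import product


def peano_key(coords, bits):
    """Interleave the bits of the coordinates, highest-order bit first.

    coords: tuple (d1, d2, ..., dn) of non-negative integers
    bits:   number of bits per coordinate
    """
    key = 0
    for b in range(bits - 1, -1, -1):      # bit position, high to low
        for c in coords:                   # d1 first, then d2, then d3, ...
            key = (key << 1) | ((c >> b) & 1)
    return key


# All cells of a 4 x 4 x 4 cube (2 bits per dimension) in generalized Peano order.
cells = sorted(product(range(4), repeat=3), key=lambda c: peano_key(c, 2))
```

With this key, the first eight cells are exactly those whose high-order bits are all 0, i.e. one 2 x 2 x 2 octant of the cube, which is what makes the order suitable for quadrant-style (here octant-style) tree compression.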
Example UF with a 2-D Reflexive Fact File (a graph), i.e., a 2-D reflexive relationship on a single dimension file
(e.g., a Protein-Protein interaction graph. Note: the two dimension files, for Tid1 and Tid2, are identical copies of the gene table.)

Graph G (as a reflexive 2-D relationship; rows = Tid1, columns = Tid2)
      t1 t2 t3 t4 t5 t6 t7
  t1   0  1  1  0  1  1  0
  t2   1  0  0  0  0  0  1
  t3   1  1  1  0  1  0  0
  t4   0  0  0  0  0  0  0
  t5   1  0  1  0  1  0  1
  t6   1  0  0  0  0  0  0
  t7   0  1  0  0  1  0  0

Single Dimension File, R
  Tid  a1 a2 a3 a4 a5 a6 a7 a8 a9 C
  t1    1  0  1  0  0  0  1  1  0 1
  t2    0  1  1  0  1  1  0  0  0 1
  t3    0  1  0  0  1  0  0  0  1 1
  t4    1  0  1  1  0  0  1  0  1 1
  t5    0  1  0  1  0  0  1  1  0 0
  t6    1  0  1  0  1  0  0  0  1 0
  t7    0  0  1  1  0  0  1  1  0 0

Graph G (as Edge Table), G(Tid1 Tid2)
  t1 t2   t1 t3   t1 t5   t1 t6
  t2 t1   t2 t7
  t3 t1   t3 t2   t3 t3   t3 t5
  t5 t1   t5 t3   t5 t5   t5 t7
  t6 t1
  t7 t2   t7 t5

Note: Given any 2-D Reflexive Fact File (Graph), the standard Universal Fact File will be denoted UF1. UF2 will denote the UF coming from the "2-hop Graph" Fact File (the join of G with itself), G2 = (G JOIN[Tid2 = Tid1'] G')[Tid1, Tid2']. UF3 will come from the "3-hop Graph" Fact File, G3 = G1 Tid1JOINTid2' G'[ …
For this example: UF = UF1 = R THETAJOIN R' (theta-join using THETA = G)

UF1
  d1 d2   a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9' C'
  t1 t2    1  0  1  0  0  0  1  1  0 1    0   1   1   0   1   1   0   0   0  1
  t1 t3    1  0  1  0  0  0  1  1  0 1    0   1   0   0   1   0   0   0   1  1
  t1 t5    1  0  1  0  0  0  1  1  0 1    0   1   0   1   0   0   1   1   0  0
  t1 t6    1  0  1  0  0  0  1  1  0 1    1   0   1   0   1   0   0   0   1  0
  t2 t1    0  1  1  0  1  1  0  0  0 1    1   0   1   0   0   0   1   1   0  1
  t2 t7    0  1  1  0  1  1  0  0  0 1    0   0   1   1   0   0   1   1   0  0
  t3 t1    0  1  0  0  1  0  0  0  1 1    1   0   1   0   0   0   1   1   0  1
  t3 t2    0  1  0  0  1  0  0  0  1 1    0   1   1   0   1   1   0   0   0  1
  t3 t3    0  1  0  0  1  0  0  0  1 1    0   1   0   0   1   0   0   0   1  1
  t3 t5    0  1  0  0  1  0  0  0  1 1    0   1   0   1   0   0   1   1   0  0
  t5 t1    0  1  0  1  0  0  1  1  0 0    1   0   1   0   0   0   1   1   0  1
  t5 t3    0  1  0  1  0  0  1  1  0 0    0   1   0   0   1   0   0   0   1  1
  t5 t5    0  1  0  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0
  t5 t7    0  1  0  1  0  0  1  1  0 0    0   0   1   1   0   0   1   1   0  0
  t6 t1    1  0  1  0  1  0  0  0  1 0    1   0   1   0   0   0   1   1   0  1
  t7 t2    0  0  1  1  0  0  1  1  0 0    0   1   1   0   1   1   0   0   0  1
  t7 t5    0  0  1  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0

Recursively, for k > 1 (letting G1 = G):
  Gk  = (Gk-1 JOIN[gk = g1'] G')(g1, …, gk+1), where gk+1 = g2'
  UFk = R Gk-join R', where Gk-join is the theta-join using Gk[g1, gk+1]
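The recursion can be tried out directly on the example. The sketch below keeps only the projection Gk[g1, gk+1] at each step (the intermediate columns g2..gk are dropped), which is all the theta-join needs; tuple ids are written as integers and the helper names `next_hop` and `universal_fact` are illustrative, not from the slides.

```python
# Edge table G(Tid1, Tid2) and dimension file R (a1..a9 followed by C) from the example.
EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 1), (2, 7), (3, 1), (3, 2), (3, 3),
         (3, 5), (5, 1), (5, 3), (5, 5), (5, 7), (6, 1), (7, 2), (7, 5)]
R = {1: "1010001101", 2: "0110110001", 3: "0100100011", 4: "1011001011",
     5: "0101001100", 6: "1010100010", 7: "0011001100"}


def next_hop(gk_pairs, g):
    """One step of the recursion: join Gk[g1, gk+1] with G on gk+1 = g1'
    and project to the pair (g1, g2')."""
    return {(a, c) for (a, b) in gk_pairs for (b2, c) in g if b == b2}


def universal_fact(r, pairs):
    """Theta-join R with its copy R' using the given pair set as theta."""
    return {(i, j): r[i] + r[j] for (i, j) in sorted(pairs)}


G1 = set(EDGES)
G2 = next_hop(G1, G1)
G3 = next_hop(G2, G1)

UF1 = universal_fact(R, G1)
UF3 = universal_fact(R, G3)
```

`UF1[(1, 2)]` is the 20-bit row for (t1, t2): the ten bits of t1 followed by the ten bits of t2. Working with pair sets avoids materializing the k-column intermediate relations, at the cost of losing the paths themselves.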
A UF1 template: 1-bit wherever there are values, 0-bit where there are blanks. Note: tij means (ti, tj).

The full matrix (8x8 raster order, rows t00, t01, …, t77; columns a1..a9, C, a1'..a9', C') holds values only in the 17 rows that correspond to edges of G; the other 47 rows are blank:
  rows with values: t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75
  (row tij holds the R tuple of ti in a1..a9, C followed by the R tuple of tj in a1'..a9', C')

The template over the same 64 rows and 20 columns:
  rows t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75:  all twenty bits are 1
  all other rows (t00..t11, t14, t17, t20, t22..t26, t30, t34, t36..t50, t52, t54, t56, t60, t62..t71, t73, t74, t76, t77):  all twenty bits are 0
The full relation, UF1 (in raster order, with padded zeros)

Rows run t00, t01, …, t77 over the columns a1..a9, C, a1'..a9', C'. The 17 rows t12 t13 t15 t16 t21 t27 t31 t32 t33 t35 t51 t53 t55 t57 t61 t72 t75 hold the UF1 tuples (the R tuple of ti followed by the R tuple of tj); every other row is all zeros. For example, rows t10..t17:

  UF1  a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9' C'
  t10   0  0  0  0  0  0  0  0  0 0    0   0   0   0   0   0   0   0   0  0
  t11   0  0  0  0  0  0  0  0  0 0    0   0   0   0   0   0   0   0   0  0
  t12   1  0  1  0  0  0  1  1  0 1    0   1   1   0   1   1   0   0   0  1
  t13   1  0  1  0  0  0  1  1  0 1    0   1   0   0   1   0   0   0   1  1
  t14   0  0  0  0  0  0  0  0  0 0    0   0   0   0   0   0   0   0   0  0
  t15   1  0  1  0  0  0  1  1  0 1    0   1   0   1   0   0   1   1   0  0
  t16   1  0  1  0  0  0  1  1  0 1    1   0   1   0   1   0   0   0   1  0
  t17   0  0  0  0  0  0  0  0  0 0    0   0   0   0   0   0   0   0   0  0

Each column is a 0-dim basic Ptree (just a sequence: a fanout=0 tree, no compression). Later in these notes, there is a discussion of techniques for building the 1-D basic Ptree set and the 2-D basic Ptree set for this Universal Fact File.
UF2

G2 = (G JOIN[g2 = g1'] G')(g1, g2, g2')

Triples (g1, g2, g2') as listed on the slide, grouped by (g1, g2):
  t1 t2:  t1 t7
  t1 t3:  t1 t2 t3 t5
  t1 t5:  t1 t3 t5 t7
  t1 t6:  t1
  t2 t1:  t2 t3 t5 t6
  t2 t7:  t2 t5
  t3 t1:  t2 t3 t5 t6
  t3 t2:  t1 t7
  t3 t3:  t3
  t3 t5:  t1 t3 t5 t7
  t5 t1:  t2 t3 t5 t6
  t5 t3:  t1 t2 t5
  t5 t5:  t1 t3 t5 t7
  t5 t7:  t2 t5
  t6 t1:  t2 t3 t5 t6
  t7 t1:  t2 t3 t5 t6
  t7 t5:  t1 t3 t5 t1

G2[g1,g3], grouped by g1:
  t1:  t1 t2 t3 t5 t7
  t2:  t2 t3 t5 t6
  t3:  t1 t2 t3 t5 t6 t7
  t5:  t1 t2 t3 t5 t6 t7
  t6:  t2 t3 t5 t6
  t7:  t1 t2 t3 t5 t6

  UF2  a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9' C'
  t11   1  0  1  0  0  0  1  1  0 1    1   0   1   0   0   0   1   1   0  1
  t12   1  0  1  0  0  0  1  1  0 1    0   1   1   0   1   1   0   0   0  1
  t13   1  0  1  0  0  0  1  1  0 1    0   1   0   0   1   0   0   0   1  1
  t15   1  0  1  0  0  0  1  1  0 1    0   1   0   1   0   0   1   1   0  0
  t17   1  0  1  0  0  0  1  1  0 1    0   0   1   1   0   0   1   1   0  0
  t22   0  1  1  0  1  1  0  0  0 1    0   1   1   0   1   1   0   0   0  1
  t23   0  1  1  0  1  1  0  0  0 1    0   1   0   0   1   0   0   0   1  1
  t25   0  1  1  0  1  1  0  0  0 1    0   1   0   1   0   0   1   1   0  0
  t26   0  1  1  0  1  1  0  0  0 1    1   0   1   0   1   0   0   0   1  0
  t31   0  1  0  0  1  0  0  0  1 1    1   0   1   0   0   0   1   1   0  1
  t32   0  1  0  0  1  0  0  0  1 1    0   1   1   0   1   1   0   0   0  1
  t33   0  1  0  0  1  0  0  0  1 1    0   1   0   0   1   0   0   0   1  1
  t35   0  1  0  0  1  0  0  0  1 1    0   1   0   1   0   0   1   1   0  0
  t36   0  1  0  0  1  0  0  0  1 1    1   0   1   0   1   0   0   0   1  0
  t37   0  1  0  0  1  0  0  0  1 1    0   0   1   1   0   0   1   1   0  0
  t51   0  1  0  1  0  0  1  1  0 0    1   0   1   0   0   0   1   1   0  1
  t52   0  1  0  1  0  0  1  1  0 0    0   1   1   0   1   1   0   0   0  1
  t53   0  1  0  1  0  0  1  1  0 0    0   1   0   0   1   0   0   0   1  1
  t55   0  1  0  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0
  t56   0  1  0  1  0  0  1  1  0 0    1   0   1   0   1   0   0   0   1  0
  t57   0  1  0  1  0  0  1  1  0 0    0   0   1   1   0   0   1   1   0  0
  t62   1  0  1  0  1  0  0  0  1 0    0   1   1   0   1   1   0   0   0  1
  t63   1  0  1  0  1  0  0  0  1 0    0   1   0   0   1   0   0   0   1  1
  t65   1  0  1  0  1  0  0  0  1 0    0   1   0   1   0   0   1   1   0  0
  t66   1  0  1  0  1  0  0  0  1 0    1   0   1   0   1   0   0   0   1  0
  t71   0  0  1  1  0  0  1  1  0 0    1   0   1   0   0   0   1   1   0  1
  t72   0  0  1  1  0  0  1  1  0 0    0   1   1   0   1   1   0   0   0  1
  t73   0  0  1  1  0  0  1  1  0 0    0   1   0   0   1   0   0   0   1  1
  t75   0  0  1  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0
  t76   0  0  1  1  0  0  1  1  0 0    1   0   1   0   1   0   0   0   1  0
UF3

G3 = G2 JOIN[g3 = g1'] G', i.e., the join of G2[g1,g3] with the edge table G(g1 g2).

G2[g1,g3], grouped by g1:
  t1:  t1 t2 t3 t5 t7
  t2:  t2 t3 t5 t6
  t3:  t1 t2 t3 t5 t6 t7
  t5:  t1 t2 t3 t5 t6 t7
  t6:  t2 t3 t5 t6
  t7:  t1 t2 t3 t5 t6

G3[g1,g4], grouped by g1:
  t1:  t1 t2 t3 t5 t6 t7
  t2:  t1 t2 t3 t5 t7        (t2 t6 absent)
  t3:  t1 t2 t3 t5 t6 t7
  t5:  t1 t2 t3 t5 t6 t7
  t6:  t1 t2 t3 t5 t7        (t6 t6 absent)
  t7:  t1 t2 t3 t5 t6 t7
t4 is absent (no interaction). All other possibilities appear except the 2 marked above.

  UF3  a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9' C'
  t11   1  0  1  0  0  0  1  1  0 1    1   0   1   0   0   0   1   1   0  1
  t12   1  0  1  0  0  0  1  1  0 1    0   1   1   0   1   1   0   0   0  1
  t13   1  0  1  0  0  0  1  1  0 1    0   1   0   0   1   0   0   0   1  1
  t15   1  0  1  0  0  0  1  1  0 1    0   1   0   1   0   0   1   1   0  0
  t16   1  0  1  0  0  0  1  1  0 1    1   0   1   0   1   0   0   0   1  0
  t17   1  0  1  0  0  0  1  1  0 1    0   0   1   1   0   0   1   1   0  0
  t21   0  1  1  0  1  1  0  0  0 1    1   0   1   0   0   0   1   1   0  1
  t22   0  1  1  0  1  1  0  0  0 1    0   1   1   0   1   1   0   0   0  1
  t23   0  1  1  0  1  1  0  0  0 1    0   1   0   0   1   0   0   0   1  1
  t25   0  1  1  0  1  1  0  0  0 1    0   1   0   1   0   0   1   1   0  0
  t27   0  1  1  0  1  1  0  0  0 1    0   0   1   1   0   0   1   1   0  0
  t31   0  1  0  0  1  0  0  0  1 1    1   0   1   0   0   0   1   1   0  1
  t32   0  1  0  0  1  0  0  0  1 1    0   1   1   0   1   1   0   0   0  1
  t33   0  1  0  0  1  0  0  0  1 1    0   1   0   0   1   0   0   0   1  1
  t35   0  1  0  0  1  0  0  0  1 1    0   1   0   1   0   0   1   1   0  0
  t36   0  1  0  0  1  0  0  0  1 1    1   0   1   0   1   0   0   0   1  0
  t37   0  1  0  0  1  0  0  0  1 1    0   0   1   1   0   0   1   1   0  0
  t51   0  1  0  1  0  0  1  1  0 0    1   0   1   0   0   0   1   1   0  1
  t52   0  1  0  1  0  0  1  1  0 0    0   1   1   0   1   1   0   0   0  1
  t53   0  1  0  1  0  0  1  1  0 0    0   1   0   0   1   0   0   0   1  1
  t55   0  1  0  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0
  t56   0  1  0  1  0  0  1  1  0 0    1   0   1   0   1   0   0   0   1  0
  t57   0  1  0  1  0  0  1  1  0 0    0   0   1   1   0   0   1   1   0  0
  t61   1  0  1  0  1  0  0  0  1 0    1   0   1   0   0   0   1   1   0  1
  t62   1  0  1  0  1  0  0  0  1 0    0   1   1   0   1   1   0   0   0  1
  t63   1  0  1  0  1  0  0  0  1 0    0   1   0   0   1   0   0   0   1  1
  t65   1  0  1  0  1  0  0  0  1 0    0   1   0   1   0   0   1   1   0  0
  t67   1  0  1  0  1  0  0  0  1 0    0   0   1   1   0   0   1   1   0  0
  t71   0  0  1  1  0  0  1  1  0 0    1   0   1   0   0   0   1   1   0  1
  t72   0  0  1  1  0  0  1  1  0 0    0   1   1   0   1   1   0   0   0  1
  t73   0  0  1  1  0  0  1  1  0 0    0   1   0   0   1   0   0   0   1  1
  t75   0  0  1  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0
  t76   0  0  1  1  0  0  1  1  0 0    1   0   1   0   1   0   0   0   1  0
  t77   0  0  1  1  0  0  1  1  0 0    0   0   1   1   0   0   1   1   0  0
UF4

G4 is formed from G3[g1,g4] and the edge table G(g1 g2) in the same way.

G4[g1,g5], grouped by g1 (as listed on the slide):
  t1:  t1 t2 t3 t5 t6 t7
  t2:  t1 t2 t3 t5 t7        (t2 t6 not there)
  t3:  t1 t2 t3 t5 t6 t7
  t5:  t1 t2 t3 t5 t6 t7
  t6:  t1 t2 t3 t5 t7        (t6 t6 not there)
  t7:  t1 t2 t3 t5 t6 t7
t4 doesn't appear (no interaction). Every other possibility appears except the 2 marked above.

Note: UF3 = UF4 = UF5 = … = UFi for all i > 2, since Gi = G3.
From R and F Ptrees, create Ptrees for UF?

Inputs: the Dimension File, R(Tid, a1..a9, C), for t1..t7, and the Edge Table F(t1 t2), which contains the pairs
  1 2   1 3   1 5   1 6   2 1   2 7   3 1   3 2   3 3   3 5   5 1   5 3   5 5   5 7   6 1   7 2   7 5

PF is F as an 8x8 bit matrix (rows Tid1 = 0..7, columns Tid2 = 0..7; index 0 is padding):
       0 1 2 3 4 5 6 7
  0    0 0 0 0 0 0 0 0
  1    0 0 1 1 0 1 1 0
  2    0 1 0 0 0 0 0 1
  3    0 1 1 1 0 1 0 0
  4    0 0 0 0 0 0 0 0
  5    0 1 0 1 0 1 0 1
  6    0 1 0 0 0 0 0 0
  7    0 0 1 0 0 1 0 0

R[a1], padded to Tid 0..7:  0 1 0 0 1 0 1 0

For UF1[a1]: replicate R[a1] as the columns of an 8x8 matrix (entry (i,j) = a1 of ti), then AND with PF.
  1-bits of UF1[a1]:  t12 t13 t15 t16 t61
For UF1[a1']: replicate R'[a1] = R[a1]tr as the rows of an 8x8 matrix (entry (i,j) = a1 of tj), then AND with PF.
  1-bits of UF1[a1']:  t16 t21 t31 t51 t61
[Figure: Ptrees for R[a1], for R[a1]-replicated (P R[a1]-replicated) and for the PG pattern, and the AND that yields the Ptree of UF1[a1]]

Class research project? Develop the algorithm and code for creating the basic P-pattern PR[ai]-replicated Ptrees and (therefore) the PUF[ai] Ptrees from the PF and R Ptrees.
The same construction for a2 (pattern = PF, the 8x8 bit matrix of the graph G1):

R[a2], padded to Tid 0..7:  0 0 1 1 0 1 0 0

For UF1[a2]: replicate R[a2] as the columns of the matrix, then AND with the pattern.
  1-bits of UF1[a2]:  t21 t27 t31 t32 t33 t35 t51 t53 t55 t57
For UF1[a2']: replicate R[a2]tr as the rows of the matrix, then AND with the pattern.
  1-bits of UF1[a2']:  t12 t13 t15 t32 t33 t35 t53 t55 t72 t75
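The replicate-and-AND construction can be checked on uncompressed bit matrices before worrying about Ptree compression. This is only a sketch of the bit-level operation, not the Ptree algorithm the research project asks for; the function names are illustrative.

```python
N = 8  # Tid 0..7, where index 0 is padding

EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 1), (2, 7), (3, 1), (3, 2), (3, 3),
         (3, 5), (5, 1), (5, 3), (5, 5), (5, 7), (6, 1), (7, 2), (7, 5)]

# Padded attribute columns of R, indexed by Tid.
R_BITS = {"a1": [0, 1, 0, 0, 1, 0, 1, 0],
          "a2": [0, 0, 1, 1, 0, 1, 0, 0]}

# PF: the graph as an 8x8 bit matrix (the pattern to AND with).
PF = [[0] * N for _ in range(N)]
for i, j in EDGES:
    PF[i][j] = 1


def uf_column(attr_bits, primed=False):
    """UF1[a] (attribute replicated down the columns, entry (i, j) = a of ti)
    or UF1[a'] (replicated along the rows, entry (i, j) = a of tj), ANDed with PF."""
    return [[PF[i][j] & (attr_bits[j] if primed else attr_bits[i])
             for j in range(N)] for i in range(N)]


def one_positions(matrix):
    """Labels tij of the 1-bits, in raster order."""
    return [f"t{i}{j}" for i in range(N) for j in range(N) if matrix[i][j]]
```

For example, `one_positions(uf_column(R_BITS["a2"], primed=True))` lists the raster positions where UF1[a2'] is 1.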
Note that the cardinality of the UFk file may fill up quickly (with respect to k). E.g., in the previous example, for k > 2 the cardinality is maximal (34) and almost full (full = 49). Even for k = 1, the cardinality is already 17, more than double that of k = 0 (7) and 35% of full. If there are 100,000 genes involved, e.g., the full size is 10,000,000,000 (10 billion).

Instead of joining, one can simply apply quantifiers across the graph. E.g., quantifying universally across the graph yields:

  UFU  a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9'
  t1    1  0  1  0  0  0  1  1  0 1    0   0   0   0   0   0   0   0   0
  t2    0  1  1  0  1  1  0  0  0 1    0   0   1   0   0   0   1   1   0
  t3    0  1  0  0  1  0  0  0  1 1    0   0   0   0   0   0   0   0   0
  t5    0  1  0  1  0  0  1  1  0 0    0   0   0   0   0   0   0   0   0
  t6    1  0  1  0  1  0  0  0  1 0    1   0   1   0   0   0   1   1   0
  t7    0  0  1  1  0  0  1  1  0 0    0   1   0   0   0   0   0   0   0

The existential quantifier across the graph yields:

  UFE  a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9'
  t1    1  0  1  0  0  0  1  1  0 1    1   1   1   1   1   1   1   1   1
  t2    0  1  1  0  1  1  0  0  0 1    1   0   1   1   0   0   1   1   0
  t3    0  1  0  0  1  0  0  0  1 1    1   1   1   1   1   1   1   1   1
  t5    0  1  0  1  0  0  1  1  0 0    1   1   1   1   1   0   1   1   1
  t6    1  0  1  0  1  0  0  0  1 0    1   0   1   0   0   0   1   1   0
  t7    0  0  1  1  0  0  1  1  0 0    0   1   1   1   1   1   1   1   0
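The two quantified files are a bitwise AND (universal) and a bitwise OR (existential) over the a1..a9 bits of each tuple's graph neighbors. A short sketch on the example data, with illustrative names (`quantify`, `UFU`, `UFE`) and integer tuple ids:

```python
EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 1), (2, 7), (3, 1), (3, 2), (3, 3),
         (3, 5), (5, 1), (5, 3), (5, 5), (5, 7), (6, 1), (7, 2), (7, 5)]
R = {1: "1010001101", 2: "0110110001", 3: "0100100011", 4: "1011001011",
     5: "0101001100", 6: "1010100010", 7: "0011001100"}   # a1..a9 followed by C


def quantify(op):
    """Apply a quantifier (all = universal, any = existential) to a1'..a9'
    across the neighbors of every tuple that has at least one edge."""
    result = {}
    for i in sorted({src for src, _ in EDGES}):
        neighbors = [R[dst][:9] for src, dst in EDGES if src == i]
        result[i] = [int(op(int(n[k]) for n in neighbors)) for k in range(9)]
    return result


UFU = quantify(all)   # a bit is 1 only if every neighbor has it set
UFE = quantify(any)   # a bit is 1 if some neighbor has it set
```

Either file has at most one row per tuple, so its size grows with the number of tuples rather than with the number of tuple pairs.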
UF NNC scan example

Choose class label = C in UF1 (the Training Set) below, and find the 3 nearest neighbors in UF1. Current practice is to find the 3NN set by scanning. E.g., use Hamming distance, d(x,y) = number of mismatches, to C-classify the sample (a1..a9) = 0 0 1 1 0 0 1 0 0.

  d1 d2   a1 a2 a3 a4 a5 a6 a7 a8 a9 C   a1' a2' a3' a4' a5' a6' a7' a8' a9' C'   d   scan action
  t1 t2    1  0  1  0  0  0  1  1  0 1    0   1   1   0   1   1   0   0   0  1    3   first 3 tuples start the 3NN set
  t1 t3    1  0  1  0  0  0  1  1  0 1    0   1   0   0   1   0   0   0   1  1    3   first 3 tuples start the 3NN set
  t1 t5    1  0  1  0  0  0  1  1  0 1    0   1   0   1   0   0   1   1   0  0    3   first 3 tuples start the 3NN set
  t1 t6    1  0  1  0  0  0  1  1  0 1    1   0   1   0   1   0   0   0   1  0    3   3 mismatches, don't replace
  t2 t1    0  1  1  0  1  1  0  0  0 1    1   0   1   0   0   0   1   1   0  1    5   5 mismatches, don't replace
  t2 t7    0  1  1  0  1  1  0  0  0 1    0   0   1   1   0   0   1   1   0  0    5   5 mismatches, don't replace
  t3 t1    0  1  0  0  1  0  0  0  1 1    1   0   1   0   0   0   1   1   0  1    6   6 mismatches, don't replace
  t3 t2    0  1  0  0  1  0  0  0  1 1    0   1   1   0   1   1   0   0   0  1    6   6 mismatches, don't replace
  t3 t3    0  1  0  0  1  0  0  0  1 1    0   1   0   0   1   0   0   0   1  1    6   6 mismatches, don't replace
  t3 t5    0  1  0  0  1  0  0  0  1 1    0   1   0   1   0   0   1   1   0  0    6   6 mismatches, don't replace
  t5 t1    0  1  0  1  0  0  1  1  0 0    1   0   1   0   0   0   1   1   0  1    3   3 mismatches, don't replace
  t5 t3    0  1  0  1  0  0  1  1  0 0    0   1   0   0   1   0   0   0   1  1    3   3 mismatches, don't replace
  t5 t5    0  1  0  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0    3   3 mismatches, don't replace
  t5 t7    0  1  0  1  0  0  1  1  0 0    0   0   1   1   0   0   1   1   0  0    3   3 mismatches, don't replace
  t6 t1    1  0  1  0  1  0  0  0  1 0    1   0   1   0   0   0   1   1   0  1    5   5 mismatches, don't replace
  t7 t2    0  0  1  1  0  0  1  1  0 0    0   1   1   0   1   1   0   0   0  1    1   1 mismatch, replace
  t7 t5    0  0  1  1  0  0  1  1  0 0    0   1   0   1   0   0   1   1   0  0    1   1 mismatch, replace

The final 3NN set holds one of the t1 tuples (C=1, d=3) together with t7 t2 and t7 t5 (C=0, d=1).
Final plurality vote winner: C=0.
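The scan can be written down as a short loop. The sketch below rebuilds the training set from R and the edge table, uses Hamming distance, and replaces a current neighbor only when a strictly closer tuple is found (ties do not replace, as in the trace). Which of the equally distant initial neighbors gets replaced is an arbitrary choice of this sketch.

```python
from collections import Counter

EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 1), (2, 7), (3, 1), (3, 2), (3, 3),
         (3, 5), (5, 1), (5, 3), (5, 5), (5, 7), (6, 1), (7, 2), (7, 5)]
R = {1: "1010001101", 2: "0110110001", 3: "0100100011", 4: "1011001011",
     5: "0101001100", 6: "1010100010", 7: "0011001100"}   # a1..a9 followed by C

# One training tuple per UF1 row: features a1..a9 and class label C of d1.
TRAIN = [(R[i][:9], R[i][9]) for i, _ in EDGES]


def hamming(x, y):
    """Number of mismatching positions."""
    return sum(a != b for a, b in zip(x, y))


def knn_scan(sample, train, k=3):
    """Single-scan kNN: keep the k closest tuples seen so far, then vote."""
    nn = []                                   # list of (distance, label)
    for features, label in train:
        d = hamming(sample, features)
        if len(nn) < k:
            nn.append((d, label))
            continue
        worst = max(range(k), key=lambda idx: nn[idx][0])
        if d < nn[worst][0]:                  # strictly closer: replace
            nn[worst] = (d, label)
    votes = Counter(label for _, label in nn)
    return votes.most_common(1)[0][0], nn


winner, neighbors = knn_scan("001100100", TRAIN, k=3)
```

`winner` is the predicted class label for the sample and `neighbors` the final 3NN set as (distance, label) pairs.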
UF NNC scan example-2

Class label = C, using Hamming distance, d(x,y) = number of mismatches, on the features (a5 a6 a1' a2' a3' a4'); the sample is (a5 a6 a1' a2' a3' a4') = 0 0 0 0 0 0. The training set is UF1, of which only the relevant columns are shown:

  d1 d2   a5 a6   C   a1' a2' a3' a4'   d   scan action
  t1 t2    0  0   1    0   1   1   0    2   first 3 tuples start the 3NN set
  t1 t3    0  0   1    0   1   0   0    1   first 3 tuples start the 3NN set
  t1 t5    0  0   1    0   1   0   1    2   first 3 tuples start the 3NN set
  t1 t6    0  0   1    1   0   1   0    2   d=2, don't replace
  t2 t1    1  1   1    1   0   1   0    4   d=4, don't replace
  t2 t7    1  1   1    0   0   1   1    4   d=4, don't replace
  t3 t1    1  0   1    1   0   1   0    3   d=3, don't replace
  t3 t2    1  0   1    0   1   1   0    3   d=3, don't replace
  t3 t3    1  0   1    0   1   0   0    2   d=2, don't replace
  t3 t5    1  0   1    0   1   0   1    3   d=3, don't replace
  t5 t1    0  0   0    1   0   1   0    2   d=2, don't replace
  t5 t3    0  0   0    0   1   0   0    1   d=1, replace (t1 t5 is dropped)
  t5 t5    0  0   0    0   1   0   1    2   d=2, don't replace
  t5 t7    0  0   0    0   0   1   1    2   d=2, don't replace
  t6 t1    1  0   0    1   0   1   0    3   d=3, don't replace
  t7 t2    0  0   0    0   1   1   0    2   d=2, don't replace
  t7 t5    0  0   0    0   1   0   1    2   d=2, don't replace

3NN set at the end of the scan: t1 t2 (C=1, d=2), t1 t3 (C=1, d=1), t5 t3 (C=0, d=1).
Final winner: C=1.
UF NNC scan example-2 (cont.)

To find all training points within distance 2 of the sample (the distance of the farthest member of the 3NN set) takes another scan, using scan methods. The sample is (a5 a6 a1' a2' a3' a4') = 0 0 0 0 0 0 and the 3NN set so far is t1 t2 (C=1, d=2), t1 t3 (C=1, d=1), t5 t3 (C=0, d=1).

  d1 d2   a5 a6   C   a1' a2' a3' a4'   d   second-scan action
  t1 t2    0  0   1    0   1   1   0    2   d=2, already have
  t1 t3    0  0   1    0   1   0   0    1   d=1, already have
  t1 t5    0  0   1    0   1   0   1    2   d=2, include it also
  t1 t6    0  0   1    1   0   1   0    2   d=2, include it also
  t2 t1    1  1   1    1   0   1   0    4   d=4, don't include
  t2 t7    1  1   1    0   0   1   1    4   d=4, don't include
  t3 t1    1  0   1    1   0   1   0    3   d=3, don't include
  t3 t2    1  0   1    0   1   1   0    3   d=3, don't include
  t3 t3    1  0   1    0   1   0   0    2   d=2, include it also
  t3 t5    1  0   1    0   1   0   1    3   d=3, don't include
  t5 t1    0  0   0    1   0   1   0    2   d=2, include it also
  t5 t3    0  0   0    0   1   0   0    1   d=1, already have
  t5 t5    0  0   0    0   1   0   1    2   d=2, include it also
  t5 t7    0  0   0    0   0   1   1    2   d=2, include it also
  t6 t1    1  0   0    1   0   1   0    3   d=3, don't include
  t7 t2    0  0   0    0   1   1   0    2   d=2, include it also
  t7 t5    0  0   0    0   1   0   1    2   d=2, include it also

[Figure: vote histogram over C=0 and C=1]
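The two scans amount to what is sometimes called a closed kNN set: take the distance of the k-th nearest neighbor as a radius and let every training tuple within that radius vote. A sketch under that reading, with illustrative names (`closed_knn_votes`) and the example's features a5, a6 of d1 and a1'..a4' of d2:

```python
from collections import Counter

EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 1), (2, 7), (3, 1), (3, 2), (3, 3),
         (3, 5), (5, 1), (5, 3), (5, 5), (5, 7), (6, 1), (7, 2), (7, 5)]
R = {1: "1010001101", 2: "0110110001", 3: "0100100011", 4: "1011001011",
     5: "0101001100", 6: "1010100010", 7: "0011001100"}   # a1..a9 followed by C

# Features (a5, a6 of d1; a1'..a4' of d2) and class label C of d1, per UF1 row.
TRAIN = [(R[i][4:6] + R[j][0:4], R[i][9]) for i, j in EDGES]


def hamming(x, y):
    """Number of mismatching positions."""
    return sum(a != b for a, b in zip(x, y))


def closed_knn_votes(sample, train, k=3):
    """First pass: find the distance of the k-th nearest neighbor.
    Second pass: let every tuple within that distance vote."""
    radius = sorted(hamming(sample, f) for f, _ in train)[k - 1]
    votes = Counter(label for f, label in train if hamming(sample, f) <= radius)
    return votes, radius


votes, radius = closed_knn_votes("000000", TRAIN, k=3)
```

`votes` is the vote histogram over the class labels and `radius` the distance used for the second pass. Sorting all distances is a simplification of the first scan; it is not meant to be efficient.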
UF NNC Ptree Ex. 1 using 0-D Ptrees (sequences)

a = (a5 a6 a1' a2' a3' a4') = (000000)
Identify all training tuples in the distance=0 ring, or 0ring, centered at a (the exact matches) as the 1-bits of the Ptree P = _a5 ^ _a6 ^ _a1' ^ _a2' ^ _a3' ^ _a4' (we use _ for complement).

Each line below is one column of UF1 (a 0-D Ptree); the 17 positions are the rows
  d1 d2:  t1t2 t1t3 t1t5 t1t6 t2t1 t2t7 t3t1 t3t2 t3t3 t3t5 t5t1 t5t3 t5t5 t5t7 t6t1 t7t2 t7t5

  a1    1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
  a2    0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0
  a3    1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1
  a4    0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1 1
  a5    0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 0 0
  a6    0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
  a7    1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1
  a8    1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1
  a9    0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 0
  C     1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
  a1'   0 0 0 1 1 0 1 0 0 0 1 0 0 0 1 0 0
  a2'   1 1 1 0 0 0 0 1 1 1 0 1 1 0 0 1 1
  a3'   1 0 0 1 1 1 1 1 0 0 1 0 0 1 1 1 0
  a4'   0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 1
  a5'   1 1 0 1 0 0 0 1 1 0 0 1 0 0 0 1 0
  a6'   1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0
  a7'   0 0 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1
  a8'   0 0 1 0 1 1 1 0 0 1 1 0 1 1 1 0 1
  a9'   0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0
  C'    1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 1 0

The complemented columns that are ANDed:
  _a5   1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1
  _a6   1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1
  _a1'  1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1
  _a2'  0 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0
  _a3'  0 1 1 0 0 0 0 0 1 1 0 1 1 0 0 0 1
  _a4'  1 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 0
  P     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

There are no training points in a's 0ring! We must look further out, i.e., at a's 1ring.

[Figure: vote histogram (so far) over C=0 and C=1]
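Ring membership can be computed with nothing but AND, OR and complement on the column bit vectors. The sketch below packs each 17-row column into a Python integer (bit r = row r) and builds the ring of radius r as the OR of all conjunctions in which exactly r of the six attributes differ from the sample. The packing and the names `ring` and `root_count` are choices made for illustration; real Ptrees would be compressed trees rather than plain integers.

```python
from itertools import combinations

EDGES = [(1, 2), (1, 3), (1, 5), (1, 6), (2, 1), (2, 7), (3, 1), (3, 2), (3, 3),
         (3, 5), (5, 1), (5, 3), (5, 5), (5, 7), (6, 1), (7, 2), (7, 5)]
R = {1: "1010001101", 2: "0110110001", 3: "0100100011", 4: "1011001011",
     5: "0101001100", 6: "1010100010", 7: "0011001100"}   # a1..a9 followed by C

NROWS = len(EDGES)
FULL = (1 << NROWS) - 1          # mask with one bit per UF1 row


def column(index, primed=False):
    """Pack one UF1 column into an int; bit r corresponds to row r."""
    return sum(int(R[j if primed else i][index]) << r
               for r, (i, j) in enumerate(EDGES))


def ring(cols, sample, radius):
    """Rows whose Hamming distance to the sample is exactly `radius`."""
    mask = 0
    for flipped in combinations(range(len(cols)), radius):
        term = FULL
        for k, col in enumerate(cols):
            want_one = sample[k] ^ (k in flipped)
            term &= col if want_one else (~col & FULL)
        mask |= term
    return mask


def root_count(mask):
    """Number of 1-bits (the Ptree root count)."""
    return bin(mask).count("1")


# Feature columns a5, a6, a1', a2', a3', a4' and the class column C.
COLS = [column(4), column(5)] + [column(k, primed=True) for k in range(4)]
C = column(9)
sample = [0, 0, 0, 0, 0, 0]

P0 = ring(COLS, sample, 0)       # exact matches
P1 = ring(COLS, sample, 1)       # distance exactly 1
votes_c1 = root_count(P1 & C)
votes_c0 = root_count(P1 & ~C & FULL)
rows_in_1ring = [EDGES[r] for r in range(NROWS) if (P1 >> r) & 1]
```

The vote counts come straight from root counts of `P1 & C` and `P1 & ~C`, so the voting tuples never have to be enumerated; `rows_in_1ring` is computed here only to make the result easy to inspect.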
UF NNC Ptree ex-1 (cont.): a's 1-ring?
a = (a5 a6 a1' a2' a3' a4') = (000000). Vote histogram bins: C=0 and C=1.
The 1-ring patterns are (100000) (010000) (001000) (000100) (000010) (000001).
Training pts in the 1-ring centered at a are given by the 1-bits in the Ptree P, constructed as the OR of six ANDs, each leaving exactly one attribute uncomplemented (~ is complement, ^ is AND):
a5 ^ ~a6 ^ ~a1' ^ ~a2' ^ ~a3' ^ ~a4'
OR ~a5 ^ a6 ^ ~a1' ^ ~a2' ^ ~a3' ^ ~a4'
OR ~a5 ^ ~a6 ^ a1' ^ ~a2' ^ ~a3' ^ ~a4'
OR ~a5 ^ ~a6 ^ ~a1' ^ a2' ^ ~a3' ^ ~a4'
OR ~a5 ^ ~a6 ^ ~a1' ^ ~a2' ^ a3' ^ ~a4'
OR ~a5 ^ ~a6 ^ ~a1' ^ ~a2' ^ ~a3' ^ a4'

Result and class Ptrees (tuple order d1 d2 = t1t2 t1t3 t1t5 t1t6 t2t1 t2t7 t3t1 t3t2 t3t3 t3t5 t5t1 t5t3 t5t5 t5t7 t6t1 t7t2 t7t5):
P 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
C 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
~C 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1

The C=1 vote count = root count of P ^ C.
The C=0 vote count = root count of P ^ ~C.
(We never need to know which tuples voted.)
The operand and basic Ptrees are those listed in Ex. 1.
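The 1-ring construction and the two root counts can be sketched with Python integers as bit sets (bitwise `&`, `|`, `~`). This is only an illustration under that assumption; `match`, `one_ring` and `root_count` are hypothetical helper names, and the bit strings are the basic Ptrees from the slides:

```python
N = 17                      # number of training tuples
MASK = (1 << N) - 1         # keeps complements to N bits
ATTRS = ["a5", "a6", "a1'", "a2'", "a3'", "a4'"]
PT = {                      # leftmost character = first tuple (t1 t2)
    "a5":  int("00001111110000100", 2),
    "a6":  int("00001100000000000", 2),
    "a1'": int("00011010001000100", 2),
    "a2'": int("11100001110110011", 2),
    "a3'": int("10011111001001110", 2),
    "a4'": int("00100100010011001", 2),
}
C = int("11111111110000000", 2)

def match(pattern):
    """AND of the basic Ptree (pattern bit 1) or its complement (bit 0)."""
    p = MASK
    for name, bit in zip(ATTRS, pattern):
        p &= PT[name] if bit else ~PT[name] & MASK
    return p

def one_ring(sample):
    """OR of the match Ptrees for every pattern one bit away from `sample`."""
    p = 0
    for i in range(len(sample)):
        flipped = list(sample)
        flipped[i] ^= 1
        p |= match(flipped)
    return p

def root_count(p):
    """Number of 1-bits in the Ptree."""
    return bin(p).count("1")

P = one_ring([0, 0, 0, 0, 0, 0])
votes_c1 = root_count(P & C)
votes_c0 = root_count(P & ~C & MASK)
print(format(P, "017b"), votes_c1, votes_c0)
```

Only the two counts are needed for the histogram, so the identities of the voting tuples never have to be materialized.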
a's 2-ring?
a = (a5 a6 a1' a2' a3' a4') = (000000). Vote histogram bins: C=0 and C=1.
For each of the following 15 Ptrees, a 1-bit corresponds to a training point in a's 2-ring. Each Ptree is the AND of the six operands with exactly two attributes left uncomplemented; on the slide they are arranged in five lines:
1st line (a5 and one other): (110000) (101000) (100100) (100010) (100001)
2nd line (a6 and one of a1'..a4'): (011000) (010100) (010010) (010001)
3rd line (a1' and one of a2'..a4'): (001100) (001010) (001001)
4th line (a2' and one of a3', a4'): (000110) (000101)
5th line (a3' and a4'): (000011)

1st line first: evaluate the five Ptrees with a5 uncomplemented.
Stop here? But the other 10 Ptrees should also be considered. The fact that the 2-ring includes so many new training points is "the curse of dimensionality".
The operand and basic Ptrees are those listed in Ex. 1.
Enfranchising the rest of a's 2-ring?
a = (a5 a6 a1' a2' a3' a4') = (000000). Vote histogram bins: C=0 and C=1.
2nd line of the 15 two-difference Ptrees: the four with a6 and one of a1', a2', a3', a4' uncomplemented and everything else complemented:
(011000) (010100) (010010) (010001)
The operand and basic Ptrees are those listed in Ex. 1.
Enfranchising the rest of a's 2-ring (cont.)
a = (a5 a6 a1' a2' a3' a4') = (000000). Vote histogram bins: C=0 and C=1.
3rd line of the 15 two-difference Ptrees: the three with a1' and one of a2', a3', a4' uncomplemented and everything else complemented:
(001100) (001010) (001001)
The operand and basic Ptrees are those listed in Ex. 1.
Enfranchising the rest of a's 2-ring (cont.)
a = (a5 a6 a1' a2' a3' a4') = (000000). Vote histogram bins: C=0 and C=1.
4th line of the 15 two-difference Ptrees: the two with a2' and one of a3', a4' uncomplemented and everything else complemented:
(000110) (000101)
Result (tuple order d1 d2 = t1t2 t1t3 t1t5 t1t6 t2t1 t2t7 t3t1 t3t2 t3t3 t3t5 t5t1 t5t3 t5t5 t5t7 t6t1 t7t2 t7t5):
P2 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1
The operand and basic Ptrees are those listed in Ex. 1.
Enfranchising the rest of a's 2-ring (cont.)
a = (a5 a6 a1' a2' a3' a4') = (000000). Vote histogram bins: C=0 and C=1.
5th line of the 15 two-difference Ptrees: the one with a3' and a4' uncomplemented and everything else complemented:
(000011)
Result (tuple order d1 d2 = t1t2 t1t3 t1t5 t1t6 t2t1 t2t7 t3t1 t3t2 t3t3 t3t5 t5t1 t5t3 t5t5 t5t7 t6t1 t7t2 t7t5):
P3 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
The operand and basic Ptrees are those listed in Ex. 1.
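The five "lines" of the 2-ring are all instances of one rule: flip exactly r attributes of the sample, AND the matching operands, and OR the results. A sketch of that rule for any ring radius r, assuming unweighted (Hamming) distance and Python integers as bit sets; `match` and `ring` are illustrative names, not from the slides:

```python
from itertools import combinations

N = 17
MASK = (1 << N) - 1
ATTRS = ["a5", "a6", "a1'", "a2'", "a3'", "a4'"]
PT = {  # basic Ptrees from the slides, leftmost character = first tuple
    "a5":  int("00001111110000100", 2),
    "a6":  int("00001100000000000", 2),
    "a1'": int("00011010001000100", 2),
    "a2'": int("11100001110110011", 2),
    "a3'": int("10011111001001110", 2),
    "a4'": int("00100100010011001", 2),
}

def match(pattern):
    """AND of the basic Ptree (pattern bit 1) or its complement (bit 0)."""
    p = MASK
    for name, bit in zip(ATTRS, pattern):
        p &= PT[name] if bit else ~PT[name] & MASK
    return p

def ring(sample, r):
    """OR of match Ptrees over every way of flipping exactly r attributes."""
    p = 0
    for idxs in combinations(range(len(sample)), r):
        pattern = list(sample)
        for i in idxs:
            pattern[i] ^= 1
        p |= match(pattern)
    return p

a = [0, 0, 0, 0, 0, 0]
fourth_line = match([0, 0, 0, 1, 1, 0]) | match([0, 0, 0, 1, 0, 1])
fifth_line = match([0, 0, 0, 0, 1, 1])
print(format(fourth_line, "017b"))   # compare with P2 on the slide
print(format(fifth_line, "017b"))    # compare with P3 on the slide
print(format(ring(a, 2), "017b"), bin(ring(a, 2)).count("1"))
```

The number of patterns grows as C(n, r), which is the same growth the slides call the curse of dimensionality: 6 patterns for the 1-ring, 15 for the 2-ring.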
Horizontal relation R(A1 A2 A3 A4), 3-bit attribute values:
A1 A2 A3 A4
010 111 110 001
011 111 110 000
010 110 101 001
010 111 101 111
101 010 001 100
010 010 001 101
111 000 001 100
111 000 001 100

The same relation partitioned vertically into bit slices Rij (bit j of attribute Ai), one column per slice:
R11 R12 R13 R21 R22 R23 R31 R32 R33 R41 R42 R43
0 1 0 1 1 1 1 1 0 0 0 1
0 1 1 1 1 1 1 1 0 0 0 0
0 1 0 1 1 0 1 0 1 0 0 1
0 1 0 1 1 1 1 0 1 1 1 1
1 0 1 0 1 0 0 0 1 1 0 0
0 1 0 0 1 0 0 0 1 1 0 1
1 1 1 0 0 0 0 0 1 1 0 0
1 1 1 0 0 0 0 0 1 1 0 0

Justification for using vertical structures (once again)?
• For record-based workloads (where the result is a set of records), changing the horizontal record structure and then having to reconstruct it may introduce too much post-processing.
• For data mining workloads, the result is often a bit (Yes/No, True/False) or another unstructured result (e.g., a histogram), so there is no reconstructive post-processing and the actual data records need never be involved.
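As an illustration of the vertical partitioning above, the following Python sketch splits the eight rows of R into the twelve bit slices R11..R43 and rebuilds the rows from them. The function names `bit_slices` and `rebuild` are assumptions made for this example:

```python
# The relation R from the slide: 8 rows, 4 attributes, 3 bits each.
R = [
    ("010", "111", "110", "001"),
    ("011", "111", "110", "000"),
    ("010", "110", "101", "001"),
    ("010", "111", "101", "111"),
    ("101", "010", "001", "100"),
    ("010", "010", "001", "101"),
    ("111", "000", "001", "100"),
    ("111", "000", "001", "100"),
]

def bit_slices(rows, width=3):
    """Return {'Rij': column of bit j of attribute i}, indices starting at 1."""
    slices = {}
    for a in range(len(rows[0])):
        for j in range(width):
            slices[f"R{a + 1}{j + 1}"] = [int(row[a][j]) for row in rows]
    return slices

def rebuild(slices, n_attrs=4, width=3):
    """Reassemble the horizontal rows from the vertical slices."""
    n_rows = len(slices["R11"])
    return [
        tuple(
            "".join(str(slices[f"R{a}{j}"][r]) for j in range(1, width + 1))
            for a in range(1, n_attrs + 1)
        )
        for r in range(n_rows)
    ]

S = bit_slices(R)
print("R11 =", S["R11"], "root count =", sum(S["R11"]))
print("lossless:", rebuild(S) == R)
```

A count such as "how many rows have the high-order bit of A1 set" is answered from R11 alone, without touching any record; `rebuild` is the post-processing step that record-based workloads would have to pay for.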
Paper Topics in the area of NNC on UF?
• If you decide to do a research project in this area, you might pick a particular DW area (VirtualCell data, Bioinformatics data, Market Basket data, Text data, Sales data, Scientific data, Astronomical data, …).
• Then discover an interpretation of the results of NNC that gives new, useful info.
• E.g., in the last example NNC problem, if the data is gene expression data and C=1 means the gene is associated with a particular cancer, the previous results might be interpreted as "if none of the treatments a5, a6, a1', a2', a3', a4' express at a threshold level, then the dissolved tissue is predicted to be cancerous" (2/3 probability in the scan-based NNC algorithm and 6/11 probability in the Ptree-based NNC algorithm).
• Other research projects in this setting could involve:
  • Looking at one of the other data mining techniques (clustering, ARM, …) and applying it to a new data area.
  • Developing efficient algorithms (implement them and prove that they are efficient) for the various steps in this data mining methodology (or any other).
  • E.g., an efficient algorithm for "producing the basic Ptrees for a UF from the basic Ptrees for F and the Di's without having to actually construct the massive UF in the process" is suggested in these notes, but the details (or a better method?) and performance work would make a good topic.
Paper Topics in the area of NNC on UF (continued)
Stopping conditions in NNC: Note that we have assumed the user picks a k ahead of time (in our example, k=3) and then finds the k nearest training neighbors to vote on the class assignment (or takes the 1st ring in which at least k voters appear: the closed kNNC method). In kNNC the prior choice of k determines when to stop accumulating voters.
Other methods (to address the curse of dimensionality)?
• Weight the votes by similarity distance from the unclassified sample? By weighting attributes beforehand, by weighting votes depending upon distance out, or both (or something else)?
• Use all training points within a predefined similarity level (rather than count level)?
• Build out in rings until the histogram shows a clear enough winner? Note that the histogram doesn't necessarily get good and stay good, so build out past the 1st good histogram to see if the 2nd "good" histogram is even better?
Another example of Ptree NNC using weights (about the only way to address the curse of dimensionality)
a = (a5 a6 a1' a2' a3' a4') = (010010), attribute weights (1, 1, 3, 3, 3, 3)
d(p,q) = SUM{weight_i : p and q differ at i}
vote weight = 1/(1+distance)
Identify all training tuples in the 0-ring centered at a (exact matches) as the 1-bits of the Ptree
P = ~a5 ^ a6 ^ ~a1' ^ ~a2' ^ a3' ^ ~a4'
(~ is complement, ^ is AND).

Operand Ptrees (tuple order d1 d2 = t1t2 t1t3 t1t5 t1t6 t2t1 t2t7 t3t1 t3t2 t3t3 t3t5 t5t1 t5t3 t5t5 t5t7 t6t1 t7t2 t7t5):
~a4' 1 1 0 1 1 0 1 1 1 0 1 1 0 0 1 1 0
a3' 1 0 0 1 1 1 1 1 0 0 1 0 0 1 1 1 0
~a2' 0 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0
~a1' 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1
a6 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
~a5 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 1 1
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The basic Ptrees are those listed in Ex. 1.
a's 1-ring?
a = (a5 a6 a1' a2' a3' a4') = (010010), attribute weights (1, 1, 3, 3, 3, 3)
d(p,q) = SUM{weight_i : p and q differ at i}; vote weight = 1/(1+distance)
At weighted distance 1 only a5 or a6 can differ, so check the two patterns (110010) and (000010):
a5 ^ a6 ^ ~a1' ^ ~a2' ^ a3' ^ ~a4'
OR ~a5 ^ ~a6 ^ ~a1' ^ ~a2' ^ a3' ^ ~a4'
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The operand and basic Ptrees are those listed in Ex. 1.
a's 2-ring?
a = (a5 a6 a1' a2' a3' a4') = (010010), attribute weights (1, 1, 3, 3, 3, 3)
d(p,q) = SUM{weight_i : p and q differ at i}; vote weight = 1/(1+distance)
At weighted distance 2 both a5 and a6 differ, so check the single pattern (100010):
a5 ^ ~a6 ^ ~a1' ^ ~a2' ^ a3' ^ ~a4'
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The operand and basic Ptrees are those listed in Ex. 1.
a's 3-ring?
a = (a5 a6 a1' a2' a3' a4') = (010010)
Identify all training pts in the 3-ring centered at a. Check each of a1', a2', a3', a4' as the single difference (weight 3), i.e. the patterns (011010) (010110) (010000) (010011).
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The operand and basic Ptrees are those listed in Ex. 1.
a's 4-ring?
a = (a5 a6 a1' a2' a3' a4') = (010010), attribute weights (1, 1, 3, 3, 3, 3)
d(p,q) = SUM{weight_i : p and q differ at i}; vote weight = 1/(1+distance)
Identify all training pts in the 4-ring centered at a. Check the four Ptrees in which a5 differs and, in turn, one of a1', a2', a3', a4' also differs (weight 1 + 3 = 4), i.e. the patterns (111010) (110110) (110000) (110011).
Vote tally: C=0: 0. C=1: 1/5, then 2/5 (each voter at distance 4 has vote weight 1/(1+4) = 1/5).
The operand and basic Ptrees are those listed in Ex. 1.
a's 5-ring?
a = (a5 a6 a1' a2' a3' a4') = (010010), attribute weights (1, 1, 3, 3, 3, 3)
d(p,q) = SUM{weight_i : p and q differ at i}; vote weight = 1/(1+distance)
Identify all training pts in the 5-ring centered at a. Check the four Ptrees in which both a5 and a6 differ and, in turn, one of a1', a2', a3', a4' also differs (weight 1 + 1 + 3 = 5), i.e. the patterns (101010) (100110) (100000) (100011).
Vote tally entering this ring: C=0: 0, C=1: 2/5. Each 5-ring voter has vote weight 1/(1+5) = 1/6 = 5/30.
C=0: 5/30
C=1: 2/5 + 1/6 = 17/30, then 17/30 + 5/30 = 22/30
Stop here? (C=1 winner)

Interactive Stop on Command (ISoC) system?
Note: An ISoC system seems easy with vertical Ptree methods, but hard with horizontal scan methods?
Projects?
• Implement such an ISoC NNC? Allow users to decide attribute weights interactively also?
• Ptree NNC which stops only after all classes have at least 1 vote (or after certain thresholds are achieved)? How does this perform with respect to standard stopping methods?
• I think users would really like a system in which they could interactively control the vote and also do a "recall" vote if they don't like the outcome (a California NNC, or CNNC?).
The operand and basic Ptrees are those listed in Ex. 1.
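The weighted rings can be sketched by enumerating every set of differing attributes, summing their weights to get the ring distance, and adding 1/(1+distance) per 1-bit to the class tallies. This is a hedged illustration, not the deck's implementation: `match`, `root_count` and `weighted_tally` are hypothetical names, Python integers stand in for Ptrees, and exact fractions are used so the tallies can be compared with the slides.

```python
from fractions import Fraction
from itertools import combinations

N = 17
MASK = (1 << N) - 1
ATTRS = ["a5", "a6", "a1'", "a2'", "a3'", "a4'"]
PT = {  # basic Ptrees from the slides, leftmost character = first tuple
    "a5":  int("00001111110000100", 2),
    "a6":  int("00001100000000000", 2),
    "a1'": int("00011010001000100", 2),
    "a2'": int("11100001110110011", 2),
    "a3'": int("10011111001001110", 2),
    "a4'": int("00100100010011001", 2),
}
C = int("11111111110000000", 2)

def match(pattern):
    """AND of the basic Ptree (pattern bit 1) or its complement (bit 0)."""
    p = MASK
    for name, bit in zip(ATTRS, pattern):
        p &= PT[name] if bit else ~PT[name] & MASK
    return p

def root_count(p):
    return bin(p).count("1")

def weighted_tally(sample, weights, max_dist):
    """Class tallies from all rings with weighted distance <= max_dist."""
    tally = {0: Fraction(0), 1: Fraction(0)}
    n = len(sample)
    for r in range(n + 1):
        for idxs in combinations(range(n), r):
            d = sum(weights[i] for i in idxs)     # ring this pattern lies in
            if d > max_dist:
                continue
            pattern = list(sample)
            for i in idxs:
                pattern[i] ^= 1
            p = match(pattern)
            w = Fraction(1, 1 + d)                # vote weight 1/(1+distance)
            tally[1] += w * root_count(p & C)
            tally[0] += w * root_count(p & ~C & MASK)
    return tally

a = [0, 1, 0, 0, 1, 0]
weights = [1, 1, 3, 3, 3, 3]
print(weighted_tally(a, weights, 3))
print(weighted_tally(a, weights, 5))
```

Run through distance 5 on the slide data, this enumeration gives C=1: 17/15 and C=0: 23/30. That is more than the slide tally (C=1: 22/30, C=0: 5/30) because it also reaches the distance-4 tuples that differ from a in a6 plus one primed attribute, which the slides do not draw; C=1 is the winner either way. An interactive stop-on-command loop could call the same function with a growing `max_dist` and show the tally after each ring.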
Iceberg Queries
• On any relation (not just the UF of a DW), R(a1,…,an,b), find all groups of tuples for which an aggregate (e.g., sum) over a set of attribute(s) exceeds a threshold. (Why iceberg? Because the result set is small and is therefore the tip of the iceberg.)
• SELECT ai1,…,aik, aggr(b) FROM R GROUP BY ai1,…,aik HAVING aggr(b) >= threshold;
• E.g., SALES(CUST, ITEM, TIME, CTRY, $SOLD)
• e.g., the typical "who?, what?, when?, where?" data cube (wwww data cube) with measurement "how much?":
SELECT CUST, ITEM, SUM($SOLD) FROM SALES GROUP BY CUST, ITEM HAVING SUM($SOLD) >= $10M
(i.e., "Which are our big customer-item match-ups over all time and locations?")

Ptrees: for a = (a1,…,an), output a if
SUM(i=1..8) RootCount(Pa ^ Pbi) * 2^(8-i) >= threshold (b = b1…b8 in bits; assume numbers from 0 to 255).
Still must sequence through all a values? Assuming very few meet the threshold (the iceberg assumption), devise a pruning mechanism for the search by considering each bit in turn (from high order on down).
First, all combos of the high-order bits:
SUM(i=1..8) RootCount(Pa11 ^ … ^ Pan1 ^ Pbi) * 2^(8-i)
where the underscore on each operand on the slide indicates a choice of complement or not.
Then the two high-order bits:
SUM(i=1..8) RootCount(Pa11 ^ Pa12 ^ … ^ Pan1 ^ Pan2 ^ Pbi) * 2^(8-i)
etc.
Whenever an attribute makes the threshold for only one choice (complemented or not), eliminate the other. Whenever an attribute makes the threshold for no choices, done (no iceberg).
In general:
SUM(i=1..8) RootCount(Pa1..aj1..ajk..an ^ Pbi) * 2^(8-i)
(k = 1..8 enumerates the bits of aj; j enumerates the attributes). If this falls below the threshold for some k, prune aj (that is, prune once the sum falls to a level from which the remaining bits of aj can't possibly make it to the threshold). From the surviving aj's …
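The root-count form of SUM(b) can be illustrated on a tiny made-up table. Everything in this sketch is an assumption for illustration: the six rows, the group labels, the threshold, and the helper names `slices_of`, `group_sum` and `passes`. The early-exit test in `passes` bounds the remaining low-order bits of b for one fixed group, which is a simplification of the attribute-bit pruning outlined above rather than that method itself.

```python
def slices_of(values, width=8):
    """Bit slices Pb1..Pb8 of the measurement, high-order bit first."""
    return [[(v >> (width - i)) & 1 for v in values] for i in range(1, width + 1)]

def group_sum(pa, pb, width=8):
    """SUM(b) over the group: sum_i RootCount(Pa ^ Pbi) * 2^(width-i)."""
    return sum(
        sum(x & y for x, y in zip(pa, pbi)) * 2 ** (width - i)
        for i, pbi in enumerate(pb, start=1)
    )

def passes(pa, pb, threshold, width=8):
    """Decide SUM(b) >= threshold, scanning bits from high order down."""
    members = sum(pa)                      # root count of the group mask
    partial = 0
    for i, pbi in enumerate(pb, start=1):
        partial += sum(x & y for x, y in zip(pa, pbi)) * 2 ** (width - i)
        if partial >= threshold:
            return True                    # already over: accept early
        remaining_max = members * (2 ** (width - i) - 1)
        if partial + remaining_max < threshold:
            return False                   # cannot reach threshold: prune
    return partial >= threshold

# Hypothetical data: a group label and an 8-bit measurement per row.
groups = ["x", "y", "x", "z", "y", "x"]
b = [200, 15, 100, 255, 30, 7]
pb = slices_of(b)
masks = {g: [int(label == g) for label in groups] for g in sorted(set(groups))}

sums = {g: group_sum(pa, pb) for g, pa in masks.items()}
iceberg = [g for g, pa in masks.items() if passes(pa, pb, threshold=250)]
print(sums, iceberg)
```

For group y the loop stops after two bit slices, because even if every remaining bit were 1 the sum could not reach 250; this is the kind of saving the iceberg assumption is meant to exploit.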
Appendix: scratch slides
(Working copies of the operand Ptrees a5, a6, a1', a2', a3', a4' and their complements, used to assemble the ring Ptrees, together with the basic Ptrees and the d1 d2 tuple order. All values repeat those listed in UF NNC Ptree Ex. 1.)