This article explores the theory and application of dot product gap clustering, a technique that projects each data point onto a unit vector d and separates the points at gaps in the projected values. It discusses how to use this theory for clustering and classification tasks, and presents a specific algorithm, the FAUST Classifier MVDI (Maximized Variance Definite Indefinite), which builds decision trees with this approach. The article also discusses the use of gaps in the range of a functional to separate data in unsupervised and supervised scenarios, and asks which approach suits which type of table. The variance computations on class means are O(C) in the number of classes C, which keeps them practical for real-world applications.
How do we use this theory?

For Dot Product Gap based Clustering, we can hill-climb from d = e_k, with k chosen from the diagonal elements a_kk defined below, to a d that gives us the global maximum variance. Heuristically, higher variance means more prominent gaps.

Maximizing the Variance

Given any table, X(X_1, ..., X_n), and any unit vector, d = (d_1, ..., d_n), in n-space, let

$$X\circ d = F_d(X) = DPP_d(X) = (x_1\circ d,\; x_2\circ d,\; \dots,\; x_N\circ d)$$

be the column of dot products of the rows x_1, ..., x_N of X with d. Writing an overbar for the mean over the N rows:

$$
\begin{aligned}
V(d) \equiv \mathrm{Variance}(X\circ d) &= \overline{(X\circ d)^2} - \big(\overline{X\circ d}\big)^2 \\
&= \frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} x_{i,j}d_j\Big)^2 - \Big(\sum_{j=1}^{n}\overline{X_j}\,d_j\Big)^2 \\
&= \frac{1}{N}\sum_{i}\Big(\sum_{j} x_{i,j}d_j\Big)\Big(\sum_{k} x_{i,k}d_k\Big) - \Big(\sum_{j}\overline{X_j}\,d_j\Big)\Big(\sum_{k}\overline{X_k}\,d_k\Big) \\
&= \frac{1}{N}\sum_{i}\Big(\sum_{j} x_{i,j}^2 d_j^2 + 2\sum_{j<k} x_{i,j}x_{i,k}d_jd_k\Big) - \Big(\sum_{j}\overline{X_j}^2 d_j^2 + 2\sum_{j<k}\overline{X_j}\,\overline{X_k}\,d_jd_k\Big) \\
&= \sum_{j=1}^{n}\big(\overline{X_j^2} - \overline{X_j}^2\big)d_j^2 + 2\sum_{j<k}\big(\overline{X_jX_k} - \overline{X_j}\,\overline{X_k}\big)d_jd_k
\end{aligned}
$$

With $a_{ij} \equiv \overline{X_iX_j} - \overline{X_i}\,\overline{X_j}$ and $A = (a_{ij})$, we can separate out the diagonal or not:

$$V(d) = \sum_{j} a_{jj}d_j^2 + \sum_{j\neq k} a_{jk}d_jd_k = \sum_{i,j} a_{ij}d_id_j = d^{T}\circ A\circ d \quad\text{subject to}\quad \sum_{i=1}^{n} d_i^2 = 1$$

$$\nabla V(d) \equiv \mathrm{Gradient}(V) = 2A\circ d, \qquad \big(\nabla V(d)\big)_i = 2a_{ii}d_i + 2\sum_{j\neq i} a_{ij}d_j$$

Given a starting d_0, one can hill-climb it to locally maximize the variance, V, as follows: d_1 ≡ ∇V(d_0), d_2 ≡ ∇V(d_1), ..., with each iterate rescaled to unit length so that the constraint Σ d_i² = 1 holds.

Ubhaya Theorem 1: There exists k ∈ {1, ..., n} such that d = e_k will hill-climb V to its global maximum.
Theorem 2 (working on it): Let d = e_k such that a_kk is a maximal diagonal element of A. Then d = e_k will hill-climb V to its global maximum.

For Dot Product Gap based Classification, we can start with X = the table of the C Training Set Class Means, where M_k ≡ MeanVectorOfClass_k. Then $\overline{X_i} = \mathrm{Mean}(X)_i$ and $\overline{X_iX_j} = \mathrm{Mean}(M_{i1}M_{j1}, \dots, M_{iC}M_{jC})$, where $M_{ik}$ is the i-th component of M_k. These computations are O(C) (C = number of classes) and are instantaneous. Once we have the matrix A, we can hill-climb to obtain a d that maximizes the variance of the dot product projections of the class means.

FAUST Classifier MVDI (Maximized Variance Definite Indefinite): build a decision tree.
1. Each round, find the d that maximizes the variance of the dot product projections of the class means.
2. Apply DI each round.

FAUST technology relies on:
1. A distance dominating functional, F.
2. Use of gaps in range(F) to separate.

Open questions:
For Unsupervised (Clustering): Hierarchical Divisive? Piecewise Linear? Other? Performance analysis (which approach is best for which type of table?)
For Supervised (Classification): Decision Tree? Nearest Neighbor? Piecewise Linear? Performance analysis (which is best for the training set?)

White papers: Terabyte Head Wall. The Only Good Data is Data in Motion. Multilevel pTrees: k=0,1 suffices!

A PTreeSet is defined by specifying a table, an array of stride_lengths (usually equi-length, so just that one length is specified) and a stride_predicate (a T/F condition on a stride, where a stride is a bag [or array?] of bits). So the metadata of PTreeSet(T,sl,sp) specifies T, sl and sp. A "raw" PTreeSet has sl=1 and the identity predicate (sl and sp are not used). A "cooked" PTreeSet (AKA Level-1 PTreeSet) is one for a table with sl≠1; its main purpose is to provide compact summary information on the table. Let PTS(T) be a raw PTreeSet. Then it, plus PTS(T,64,p), ..., PTS(T,64^k,p), form a tree of vertical summarizations of T. Note that P(T, 64*64, p) is different from P(P(T,64,p), 64, p), but both make sense, since P(T, 64, p) is a table and P(P(T, 64, p), 64, p) is just a cooked pTree on it.
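The difference between P(T, 64*64, p) and P(P(T,64,p), 64, p) can be seen on a tiny bit column. The following is a minimal sketch, not the pTree implementation: it assumes a stride length of 4 instead of 64 for readability, and the function name `cook` and the predicate `at_least_half_ones` are hypothetical choices made for illustration.

```python
from typing import Callable, List

Bits = List[int]

def cook(bits: Bits, stride_length: int, predicate: Callable[[Bits], bool]) -> Bits:
    """Summarize a bit column: one output bit per stride of `stride_length` input bits."""
    return [
        int(predicate(bits[start:start + stride_length]))
        for start in range(0, len(bits), stride_length)
    ]

def at_least_half_ones(stride: Bits) -> bool:
    """Hypothetical stride predicate: true when at least half of the bits are 1."""
    return 2 * sum(stride) >= len(stride)

# A raw (stride length 1) bit column of 16 bits.
raw = [1, 1, 0, 0,  1, 1, 0, 0,  1, 1, 0, 0,  0, 0, 0, 0]

level1 = cook(raw, 4, at_least_half_ones)      # P(T, 4, p)
nested = cook(level1, 4, at_least_half_ones)   # P(P(T, 4, p), 4, p)
direct = cook(raw, 4 * 4, at_least_half_ones)  # P(T, 4*4, p)

print(level1, nested, direct)
```

With this predicate the nested summary and the direct summary disagree on the same raw column, which is the distinction the note above draws.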
FAUST MVDI on IRIS

15 records from each class were held out for testing (Virg39 was removed as an outlier). Classes: s = setosa, e = versicolor, i = virginica.

Round 1, d0 = (.33, -.1, .86, .38):

          Mean vector                      Definite        Indefinite
s-Mean    50.49  34.74  14.74   2.43      s   -1   10
e-Mean    63.50  30.00  44.00  13.50      e   23   48     s_ei  23  10  (empty)
i-Mean    61.00  31.50  55.50  21.50      i   38   70     se_i  38  48

Resulting intervals:
(-1, 16.5 = avg{23,10})   s      sCt=50
(16.5, 38)                e      eCt=24
(48, 128)                 i      iCt=39
indef [38, 48]            se_i   seCt=26  iCt=13

Round 2 on the indefinite set, d1 = (-.55, -.33, .51, .57):

          Mean vector                      Definite        Indefinite
i-Mean    62.8   29.2   46.1   14.5       i   -1    8
e-Mean    59     26.9   49.6   18.4       e   10   17     i_e   8  10  (empty)

Resulting intervals:
(-1, 8)                   e      Ct=21
(10, 128)                 i      Ct=9
indef [8, 10]             e_i    eCt=5  iCt=4

In this case, since the indefinite interval is so narrow, we absorb it into the two definite intervals, resulting in the decision tree:

d0 = (.33, -.1, .86, .38)
  x∘d0 < 16.5          Setosa
  16.5 ≤ x∘d0 < 38     Versicolor
  48 < x∘d0            Virginica
  38 ≤ x∘d0 ≤ 48       d1 = (-.55, -.33, .51, .57)
      x∘d1 < 9         Versicolor
      x∘d1 ≥ 9         Virginica
FAUST MVDI on SatLog: 413 train, 4 attributes, 6 classes, 127 test

Gradient Hill Climb of Variance(d):

d1     d2     d3     d4     V(d)
0.00   0.00   1.00   0.00   282
0.13   0.38   0.64   0.65   700
0.20   0.51   0.62   0.57   742
0.26   0.62   0.57   0.47   781
0.30   0.70   0.53   0.38   810
0.34   0.76   0.48   0.30   830
0.36   0.79   0.44   0.23   841
0.37   0.81   0.40   0.18   847
0.38   0.83   0.38   0.15   850
0.39   0.84   0.36   0.12   852
0.39   0.84   0.35   0.10   853

Root node, d = (0.39, 0.84, 0.35, 0.10):

       class mean            F∘mn   Ct    min   max   max+1
mn2    49   40   115  119    106    108   91    155   156
mn5    58   58   76   64     108    61    92    145   146
mn7    69   77   81   64     131    154   104   160   161
mn4    78   91   96   74     152    60    127   178   179
mn1    67   103  114  94     167    27    118   189   190
mn3    89   107  112  88     178    155   157   206   207

Interval endpoints F[a,b): 0, 92, 104, 118, 127, 146, 156, 157, 161, 179, 190. The rows beneath the endpoints marked, for classes 2, 5, 7, 1, 4 and 3 in that order, the intervals each class occupies.

Sub-node directions and interval endpoints as shown:
d = (-.11, -.22, .54, .81)    F[a,b): 89, 102            Class: 5, 2
d = (-.15, -.29, .56, .76)    F[a,b): 47, 65, 81, 101    Class: 7, 5, 5, 2, 2
d = (-.81, .17, .45, .33)     F[a,b): 21, 35, 41, 59     Class: 3, 1
d = (-.01, -.19, .7, .69)     F[a,b): 57, 61, 69, 87     Class: 5, 7
d = (-.66, .19, .47, .56)     F[a,b): 52, 56, 66, 73, 75 Class: 3, 3, 3, 3, 4, 1, 1 (leaf labels cl=4, cl=7, cl=7)

Sub-node on classes 4, 3, 1, using class means:

       class mean            F∘mn   Ct   min   max   max+1
mn4    83   101  104  82     113    8    110   121   122
mn3    85   103  108  85     117    79   105   128   129
mn1    69   106  115  94     133    12   123   148   149

Using full data (much better!):

mn4    83   101  104  82     59     8    56    65    66
mn3    85   103  108  85     62     79   52    74    75
mn1    69   106  115  94     81     12   73    95    96

Gradient Hill Climb of Var(d) on sub-nodes (d1 d2 d3 d4 V(d)):

t25:    0.00  0.00  0.00  1.00  1137;  -0.11 -0.22  0.54  0.81  1747
t257:   0.00  0.00  1.00  0.00  496;   -0.15 -0.29  0.56  0.76  1595  (same using class means or training subset)
t75:    0.00  0.00  1.00  0.00  12;     0.04 -0.09  0.83  0.55  20;   -0.01 -0.19  0.70  0.69  21
t13:    0.00  0.00  1.00  0.00  29;    -0.83  0.17  0.42  0.34  166
        0.00  0.00  1.00  0.00  25;    -0.66  0.14  0.65  0.36  81;   -0.81  0.17  0.45  0.33  88
t143:   0.00  0.00  1.00  0.00  19;    -0.66  0.19  0.47  0.56  95
        0.00  0.00  1.00  0.00  27;    -0.17  0.35  0.75  0.53  54;   -0.32  0.36  0.65  0.58  57;  -0.41  0.34  0.62  0.58  58
t156161: 0.00  0.00  1.00  0.00  5;    -0.23 -0.28  0.89  0.28  19;   -0.02 -0.06  0.12  0.99  157;  0.02 -0.02  0.02  1.00  159
        0.00  0.00  1.00  0.00  1;     -0.46 -0.53  0.57  0.43  2
        Inconclusive both ways, so predict plurality = 4(17) (3ct=3, tct=6)
t146156: 0.00  0.00  1.00  0.00  0;     0.03 -0.08  0.81 -0.58  1
        0.00  0.00  1.00  0.00  13;     0.02  0.20  0.92  0.34  16;    0.02  0.25  0.86  0.45  17
        Inconclusive both ways, so predict plurality = 4(17) (7ct=15, 2ct=2)
t127:   0.00  0.00  1.00  0.00  41;    -0.01 -0.01  0.70  0.71  90;   -0.04 -0.04  0.65  0.75  91
        0.00  0.00  1.00  0.00  35;    -0.32 -0.14  0.59  0.73  105
        Inconclusive, predict plurality = 7(62), 4(15), 1(5), 2(8), 5(7)

Sub-node t25 class projections:

       class mean            MN∘d   Ct   ClMn  ClMx  ClMx+1
mn2    45   33   115  124    150    54   102   177   178
mn5    55   52   72   59     69     33   45    88    89

On the 127 sample SatLog TestSet: 4 errors, or 96.8% accuracy.

Speed? With horizontal data, DTI is applied one unclassified sample at a time (per execution thread). With this pTree Decision Tree, we take the entire TestSet (a PTreeSet), create the various dot product SPTS (one for each inode), and create the cut SPTS masks. These masks mask the results for the entire TestSet.

For WINE:

class mean                        F       min    max+1
8.40   10.33   27.00   9.63      28.65   9.9    53.4
7.56   11.19   32.61   10.38     34.32   7.7    111.8
8.57   12.84   30.55   11.65     32.72   8.7    108.4
8.91   13.64   34.93   11.97     37.16   13.1   92.2

Awful results!
FAUST MVDI on Concrete

d0 = (-0.34, -0.16, 0.81, -0.45)
d1 = (.85, -.03, .52, -.02)
d2 = (.85, -.00, .53, .05)
d3 = (.81, .04, .58, .01)
d4 = (.79, .14, .60, .03)

Node cuts shown in the tree: x∘d0 < 320, x∘d0 ≥ 634, x∘d2 < 28, x∘d2 ≥ 92, x∘d2 ≥ 662, x∘d3 < 544, x∘d3 ≥ 868, x∘d3 < 969, x∘d4 < 640, x∘d4 ≥ 681. Each leaf carries a class (l, m or h) and its test result, for example Cl=h (test: 11/12), Cl=l (test: 6/9), Class=l (test: 1/1), Class=m (test: 2/2), Cl=l (test: 0/3).

Seeds

d0 = (.97, .17, -.02, .15), d1 = (.97, .19, .08, .16), then (0.97, 0.19, 0.06, 0.15); starting direction (.00, .00, 1.00, .00). Node cuts shown: x∘d < 13.2, x∘d ≥ 18.6, x∘d ≥ 19.3. Leaf results shown: Class=l errs 0/4; Class=h errs 0/5 (twice); Class=h errs 0/1; Class=m errs 0/1, 8/12, 0/4, 0/0.

Test results reported on the slide: 7 test errors / 30 = 77%; 8 test errors / 32 = 75%.

[Tables: per-node min and max+1 of x∘d for classes l, m and h, on the training and test sets.]
FAUST Classifier (two classes, R and V)

0. Cut in the middle of the means: D ≡ mR→mV, d = D/|D|, a = (mR + (mV-mR)/2)∘d = ((mR+mV)/2)∘d. P_R = P(x∘d < a), P_V = P(x∘d ≥ a).
1. Cut in the middle of the VectorOfMedians (VOM), not the means. Use the stdev ratio, not the middle, for even better cut placement?
2. Cut in the middle of {Max{R∘d}, Min{V∘d}} (assuming mR∘d ≤ mV∘d). If there is no gap, move the cut to minimize Rerrors + Verrors.
3. Hill-climb d to maximize the gap, or to minimize training set errors, or (simplest) to minimize dis(max{r∘d}, min{v∘d}).
4. Replace mR, mV with the average of the margin points?
5. P_R = P(x∘d < CutR), P_V = P(x∘d > CutV). If Max{R∘d} ≤ Min{V∘d}, then CutR = CutV = avg{Min{V∘d}, Max{R∘d}}; else CutR ≡ Min{V∘d}, CutV ≡ Max{R∘d}.

y ∈ P_R or y ∈ P_V are Definite classifications; else re-do on the Indefinite region, P(CutR ≤ x∘d ≤ CutV), until there is an actual gap (AND with a certain stop condition? E.g., "On the nth round, use definite only (cut at midpt(mR,mV))").

Another way to view FAUST DI is that it is a Decision Tree Method. With each non-empty indefinite set, descend down the tree to a new level. For each definite set, terminate the descent and make the classification.

Each round, it may be advisable to go through an outlier removal process on each class before setting Min{V∘d} and Max{R∘d} (e.g., iteratively check whether F⁻¹(Min{V∘d}) consists of V-outliers).

[Figure: scatter of R points (r) and V points (v) over dim 1 and dim 2, marking mR, mV, vomR, vomV, the d-line and d2-line, the cut point a, Max{R∘d} and Min{V∘d}.]
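Steps 0 and 5 of the two-class FAUST Classifier can be sketched in a few lines. This is an illustrative sketch with NumPy on horizontal arrays, not the pTree implementation; the function names `faust_two_class_cut` and `classify` are hypothetical, and a point that lands exactly on a cut is reported as indefinite here by assumption.

```python
import numpy as np

def faust_two_class_cut(R, V):
    """Return d, the midpoint cut a (step 0), and CutR, CutV (step 5)."""
    mR, mV = R.mean(axis=0), V.mean(axis=0)
    D = mV - mR                       # D = mR -> mV
    d = D / np.linalg.norm(D)
    a = ((mR + mV) / 2) @ d           # step 0: cut in the middle of the means
    max_R = (R @ d).max()
    min_V = (V @ d).min()
    if max_R <= min_V:                # actual gap: one shared cut in its middle
        cut_R = cut_V = (max_R + min_V) / 2
    else:                             # overlap: [cut_R, cut_V] is indefinite
        cut_R, cut_V = min_V, max_R
    return d, a, cut_R, cut_V

def classify(x, d, cut_R, cut_V):
    f = x @ d
    if f < cut_R:
        return "R"
    if f > cut_V:
        return "V"
    return "indefinite"

# Separable toy classes, then overlapping ones.
R = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
V = np.array([[4.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
d, a, cut_R, cut_V = faust_two_class_cut(R, V)

R2 = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
V2 = np.array([[4.0, 0.0], [6.0, 0.0], [8.0, 0.0]])
d2, a2, cut_R2, cut_V2 = faust_two_class_cut(R2, V2)
```

In the overlapping case, the points falling in [CutR, CutV] are exactly the ones a further DI round would be run on.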
FAUST DI

Given a K-class training set, TK, and a given d (e.g., from D ≡ Mean(TK)→Median(TK)):
Let m_i ≡ mean(C_i), ordered so that d∘m_1 ≤ d∘m_2 ≤ ... ≤ d∘m_K.
Mn_i ≡ Min{d∘C_i}, Mx_i ≡ Max{d∘C_i}, Mn_{>i} ≡ Min_{j>i}{Mn_j}, Mx_{<i} ≡ Max_{j<i}{Mx_j}.
Definite_i = ( Mx_{<i}, Mn_{>i} )
Indefinite_{i,i+1} = [ Mn_{>i}, Mx_{<i+1} ]
Then recurse on each Indefinite.

For IRIS, 15 records were extracted from each class for testing. The rest are the Training Set, TK. D = MEAN_s→MEAN_e.

          Mean vector                      Definite        Indefinite
s-Mean    50.49  34.74  14.74   2.43      s   -1   25
e-Mean    63.50  30.00  44.00  13.50      e   10   37     se  25  10  (empty)
i-Mean    61.00  31.50  55.50  21.50      i   48  128     ei  37  48

Training:
1st round, D = Mean_s→Mean_e
  F < 18          setosa (35 seto)
  18 < F < 37     versicolor (15 vers)
  37 ≤ F ≤ 48     IndefiniteSet2 (20 vers, 10 virg)
  48 < F          virginica (25 virg)
IndefSet2 round, D = Mean_e→Mean_i
  F < 7           versicolor (17 vers, 0 virg)
  7 ≤ F ≤ 10      IndefSet3 (3 vers, 5 virg)
  10 < F          virginica (0 vers, 5 virg)
IndefSet3 round, D = Mean_e→Mean_i
  F < 3           versicolor (2 vers, 0 virg)
  3 ≤ F ≤ 7       IndefSet4 (2 vers, 1 virg). Here we will assign 0 ≤ F ≤ 7 versicolor and 7 < F virginica.
  7 < F           virginica (0 vers, 3 virg)

Test:
1st round, D = Mean_s→Mean_e
  F < 15          setosa (15 seto)
  15 < F < 15     versicolor (0 vers, 0 virg)
  15 ≤ F ≤ 41     IndefiniteSet2 (15 vers, 1 virg)
  41 < F          virginica (14 virg)
IndefSet2 round, D = Mean_e→Mean_i
  F < 20          versicolor (15 vers, 0 virg)
  20 < F          virginica (0 vers, 1 virg)
100% accuracy.

Options for the sequence of D's:
Option-1: Mean(Class_k)→Mean(Class_{k+1}), k=1... (and Mean could be replaced by VOM or?)
Option-2: Mean(Class_k)→Mean(∪_{h=k+1..n} Class_h), k=1... (and Mean could be replaced by VOM or?)
Option-2 (max-count variant): Mean(Class_k)→Mean(∪_{h=k+1..n} Class_h) (VOM?), where k is the class with max count in the subcluster.
Option-3: Mean(Class_k)→Mean(∪_{h not used yet} Class_h), where k is the class with max count in the subcluster (VOM instead?)
Option-4: Always pick the pair of means which are furthest separated from each other.
Option-5: Start with Median-to-Mean of the IndefiniteSet, then the pair of means corresponding to the max separation of F(mean_i), F(mean_j).
Option-6: Always use Median-to-Mean of the IndefiniteSet, IS (initially, IS=X).
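The Definite and Indefinite intervals of FAUST DI follow directly from the per-class minima and maxima of the projections. Below is a minimal sketch with NumPy; the function name `definite_indefinite` and the use of ±infinity for the outer bounds (where the slides use -1 and 128) are assumptions made for illustration.

```python
import numpy as np

def definite_indefinite(classes, d):
    """classes: dict label -> (rows x n) array. Returns class order and intervals."""
    proj = {c: X @ d for c, X in classes.items()}
    # Order classes so that d∘m_1 <= d∘m_2 <= ... <= d∘m_K.
    order = sorted(proj, key=lambda c: classes[c].mean(axis=0) @ d)
    mn = [proj[c].min() for c in order]   # Mn_i
    mx = [proj[c].max() for c in order]   # Mx_i
    definite, indefinite = {}, {}
    for i, c in enumerate(order):
        lo = max(mx[:i], default=-np.inf)      # Mx_{<i}
        hi = min(mn[i + 1:], default=np.inf)   # Mn_{>i}
        definite[c] = (lo, hi)
        if i + 1 < len(order):
            # Indefinite_{i,i+1} = [Mn_{>i}, Mx_{<i+1}]; empty when lower > upper.
            indefinite[(c, order[i + 1])] = (hi, max(mx[:i + 1]))
    return order, definite, indefinite

# One-dimensional toy data whose minima and maxima mimic the IRIS table above.
classes = {
    "s": np.array([[0.0], [5.0], [10.0]]),
    "e": np.array([[25.0], [30.0], [48.0]]),
    "i": np.array([[37.0], [60.0], [70.0]]),
}
order, definite, indefinite = definite_indefinite(classes, np.array([1.0]))
print(order, definite, indefinite)
```

Each non-empty indefinite interval would then be projected again with a new d, which is the recursion the decision-tree view describes.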
FAUST DI sequential

For SEEDS, 15 records were extracted from each class for testing.

Option-4, means pair most separated in X:

      Mean vector              pairwise distance
m1    14.4  5.6  2.7  5.1      d(m1,m2) = 4.4
m2    18.6  6.2  3.7  6.0      d(m1,m3) = 3.4
m3    11.8  5.0  4.7  5.0      d(m2,m3) = 7.0

DEFINITE: class 2 (-inf, 0); class 1 (106, 0); class 3 (106, inf). INDEFINITE: 12 [0, 106]; 23 [0, 106]. That is, 0 ≤ F ≤ 106 is indefinite, so totally non-productive!

Option-6, D = Median-to-Mean of IndefSet (initially IS=X), on the whole training set:

      Mean vector              meanF       DEFINITE            Cl=1  2   3     INDEFINITE
m1    14.4  5.6  2.7  5.1      37.3        def3[ -inf, 21 )    0     0   32
m2    18.6  6.2  3.7  6.0      71.2        def1[ 28, 49 )      22    0   0     ind1[ 21, 28 )
m3    11.8  5.0  4.7  5.0      [?]2.0      def2[ 58, inf )     0     30  0     ind2[ 49, 58 )

Later rounds (on Indef-1, Indef-11, Indef-111, Indef-1111), each using the means of classes 1 and 3:

m1 13.2 5.2 4.0 5.0 (avF1 = 9),  m3 13.0 5.0 4.0 5.0 (avF3 = 6):   def3[ -inf, 0 ), def1[ 13, inf ), in11[ 0, 13 )
m1 13.0 5.1 3.7 5.0 (avF1 = 30), m3 13.0 5.0 4.0 5.0 (avF3 = 27):  def3[ -inf, 0 ), def1[ 37, inf ), in11[ 0, 37 )
m1 13.0 5.2 3.6 5.0 (avF1 = 13), m3 13.0 5.0 4.0 5.0 (avF3 = 9):   def3[ -inf, 9 ), def1[ 19, inf ), in111[ 9, 19 )
m1 13.0 5.2 3.6 5.0 (avF1 = 13), m3 13.0 5.0 4.0 5.0 (avF3 = 9):   def3[ -inf, 9 ), def1[ 19, inf ), in1111[ 9, 19 )

Notes on the per-round definite class counts (Cl=1 2 3): a Cls3 outlier at F=0, Cls1 outliers at F=29 and F=54, and a final round with counts 0 0 0 / 0 0 0 (done! declare Class=1). The indefinite sets carry counts such as 5 0 2, 5 0 3 and 6 0 3.
FAUST DI sequential (continued)

For SEEDS, 15 records were extracted from each class for testing. Here D = Mean(loF)-to-Mean(hiF) of each indefinite set.

IndefSet12:
m1 16.2 6.0 1.8 5.2 (avF1 = 5.8), m2 16.6 6.0 4.6 6.0 (avF2 = 6.2):  def1[ -inf, 2 ), def2[ 15, inf ), in1212[ 2, 15 )
IndefSet31:
m1 13.0 5.1 3.7 5.0 (avF1 = 30), m3 13.0 5.0 4.0 5.0 (avF3 = 27):    def1[ -inf, 18 ), def3[ 55, inf ), in1313[ 18, 55 )
IndefSet1313:
m1 12.8 5.2 3.2 5.0 (avF1 = 18), m3 13.0 5.0 4.0 5.0 (avF3 = 10):    def3[ -inf, 10 ), def1[ 20, inf ), in313131[ 10, 20 )
IndefSet313131 (d repeats after this, so the remainder is declared C1):
m1 13.0 5.2 3.6 5.0 (avF1 = 4), m3 13.0 5.0 3.5 5.0 (avF3 = 2):      def1[ -inf, 0 ), def3[ 5, inf ), C1 = [ 0, 5 ). The rest, Class=1.

Option-6, D = Median-to-Mean of X, on the whole training set:

      Mean vector              meanF       DEFINITE            Cl=1  2   3     INDEFINITE
m1    14.4  5.6  2.7  5.1      37.3        def3[ -inf, 21 )    0     0   32
m2    18.6  6.2  3.7  6.0      71.2        def1[ 28, 49 )      22    0   0     ind31[ 21, 28 )
m3    11.8  5.0  4.7  5.0      [?]2.0      def2[ 58, inf )     0     30  0     ind12[ 49, 58 )

So [-inf, 21) is class 3, [28, 49) is class 1 and [58, inf) is class 2, per the class counts in the table. The indefinite sets are then handled with:
d = (.9, -.1, -.2, -.2) on [21, 28) ind31
d = (-.9, -.1, .14, -.1) on [49, 58) ind12
d = (0, .31, -.9, 0): [-inf, 18) def, [49, 58) ind23
FAUST CLUSTERING

Finding a good unit vector, d, for the Dot Product functional, DPP, to maximize gaps. Use DPP_d(x), but which unit vector, d*, provides the best gap(s)?
1. Exhaustively search a grid of d's for the best gap provider.
2. Use some heuristic to choose a good d?
   GV: Gradient-optimized Variance.
   MM: Use the d that maximizes |Median(F(X)) - Mean(F(X))|. We have Avg as a function of d. Median? (Can you do it?)
   HMM: Use a heuristic for Median(F(X)): F(VectorOfMedians) = VOM∘d.
   MVM: Use D = MEAN(X)→VOM(X), d = D/|D|.

GV. With $X\circ d = F_d(X) = DPP_d(X) = (x_1\circ d, \dots, x_N\circ d)$ and an overbar for the mean over the N rows:

$$
\begin{aligned}
V(d) \equiv \mathrm{Var}\,DPP_d(X) &= \overline{(X\circ d)^2} - \big(\overline{X\circ d}\big)^2 \\
&= \frac{1}{N}\sum_{i=1}^{N}\Big(\sum_{j=1}^{n} x_{i,j}d_j\Big)^2 - \Big(\sum_{j=1}^{n}\overline{X_j}\,d_j\Big)^2 \\
&= \sum_{j=1}^{n}\big(\overline{X_j^2} - \overline{X_j}^2\big)d_j^2 + 2\sum_{j<k}\big(\overline{X_jX_k} - \overline{X_j}\,\overline{X_k}\big)d_jd_k \\
&= \sum_{j} a_{jj}d_j^2 + \sum_{j\neq k} a_{jk}d_jd_k = d^{T}\circ VX\circ d \quad\text{subject to}\quad \sum_{i=1}^{n} d_i^2 = 1
\end{aligned}
$$

where $VX = A = (a_{ij})$ with $a_{ij} = \overline{X_iX_j} - \overline{X_i}\,\overline{X_j}$, and

$$\mathrm{GRADIENT}(V) = \nabla V(d) = 2A\circ d, \qquad \big(\nabla V(d)\big)_i = 2a_{ii}d_i + 2\sum_{j\neq i} a_{ij}d_j$$

Hill-climb: d_0 = e_k such that a_kk is max (or d_0k = a_kk), then d_1 ≡ ∇V(d_0), d_2 ≡ ∇V(d_1), ... til F(d_k)

MM. Maximize with respect to d: |Mean(DPP_d(X)) - Median(DPP_d(X))|.

$$\mathrm{Mean}(DPP_d X) = \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} x_{i,j}d_j = \sum_{j=1}^{n}\Big(\frac{1}{N}\sum_{i} x_{i,j}\Big)d_j = \sum_{j=1}^{n}\overline{X_j}\,d_j$$

Compute Median(DPP_d(X))? We want to use only pTree processing. We want a formula in d and numbers only, like the one above for the mean, which involves only the vector d and the numbers $\overline{X_1}, \dots, \overline{X_n}$.

Maximize variance: is it wise? In the example table of eight sequences, MEDIAN picks out the last 2 sequences, which have the best gaps (discounting outlier gaps at the extremes), and it discards 1, 3 and 4, which are not so good.
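The GV hill-climb can be sketched as follows. This is a minimal illustration with NumPy under stated assumptions: each gradient step is rescaled to unit length, the start is e_k for the largest diagonal element a_kk, and the stopping tolerance is a hypothetical choice. Repeating d ← ∇V(d)/|∇V(d)| with ∇V(d) = 2A∘d is the same update as power iteration on A.

```python
import numpy as np

def variance_matrix(X):
    """A with a_ij = mean(X_i X_j) - mean(X_i) mean(X_j)."""
    Xbar = X.mean(axis=0)
    return (X.T @ X) / len(X) - np.outer(Xbar, Xbar)

def hill_climb(A, max_rounds=1000, tol=1e-12):
    """Climb V(d) = d^T A d over unit vectors, starting at e_k with maximal a_kk."""
    k = int(np.argmax(np.diag(A)))
    d = np.zeros(A.shape[0])
    d[k] = 1.0
    v = float(d @ A @ d)
    for _ in range(max_rounds):
        g = 2.0 * (A @ d)              # Gradient(V) = 2 A d
        norm = np.linalg.norm(g)
        if norm == 0.0:
            break
        d_next = g / norm              # rescale to respect sum d_i^2 = 1
        v_next = float(d_next @ A @ d_next)
        improved = v_next - v
        d, v = d_next, v_next
        if improved <= tol:            # variance stopped increasing
            break
    return d, v

# Toy table whose rows lie on a line in direction (2, 1).
X = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 2.0], [6.0, 3.0]])
A = variance_matrix(X)
d, v = hill_climb(A)
print(d, v)
```

On this toy table the climb ends at the direction of the line, where the variance of the projections equals the largest eigenvalue of A.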
Eight example sequences of 11 values each (one sequence per column):

                seq1    seq2    seq3    seq4    seq5    seq6    seq7    seq8
                0       0       0       0       0       0       0       0
                1       0       5       0       0       0       0       0
                2       0       5       2       0       0       0       0
                3       0       5       2       3       0       0       0
                4       0       5       4       3       6       0       0
median          5       0       5       4       3       6       9       0
                6       0       5       6       6       6       9       10
                7       0       5       6       6       6       9       10
                8       0       5       8       6       9       9       10
                9       0       5       8       9       9       9       10
                10      10      10      10      10      10      10      10

std             3.16    2.87    2.13    3.20    3.35    3.82    4.57    4.98
variance        10.0    8.3     4.5     10.2    11.2    14.6    20.9    24.8
Avg             5.00    0.91    5.00    4.55    4.18    4.73    5.00    4.55

consecutive     1       0       5       0       0       0       0       0
differences     1       0       0       2       0       0       0       0
                1       0       0       0       3       0       0       0
                1       0       0       2       0       6       0       0
                1       0       0       0       0       0       9       0
                1       0       0       2       3       0       0       10
                1       0       0       0       0       0       0       0
                1       0       0       2       0       3       0       0
                1       0       0       0       3       0       0       0
                1       10      5       2       1       1       1       0

avgCD           1.00    1.00    1.00    1.00    1.00    1.00    1.00    1.00
maxCD           1.00    10.00   5.00    2.00    3.00    6.00    9.00    10.00
|mean-VOM|      0.00    0.91    0.00    0.55    1.18    1.27    4.00    4.55
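The summary rows of this table can be recomputed per sequence with the standard library. The sketch below assumes population variance (which is what the tabulated values match); the function name `gap_stats` and the dictionary keys are hypothetical.

```python
import statistics

def gap_stats(column):
    """Variance, average, consecutive-difference and |mean - median| statistics."""
    values = sorted(column)
    diffs = [b - a for a, b in zip(values, values[1:])]
    mean = statistics.fmean(values)
    return {
        "variance": statistics.pvariance(values),   # population variance
        "avg": mean,
        "avgCD": statistics.fmean(diffs),
        "maxCD": max(diffs),
        "mean_minus_median": abs(mean - statistics.median(values)),
    }

seq1 = list(range(11))
seq7 = [0, 0, 0, 0, 0, 9, 9, 9, 9, 9, 10]
stats1 = gap_stats(seq1)
stats7 = gap_stats(seq7)
print(stats1)
print(stats7)
```

Sequence 1 has a moderate variance but no gap wider than 1, while sequence 7 pairs a large gap with a large |mean - median|, which is the contrast the table is built to show.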
FAUST Clustering, simple example: G_d(x) = x∘d, F_d(x) = G_d(x) - MinG, on a dataset of 15 image points z1, ..., zf.

[Figure: the 15 points plotted on the (x1, x2) grid with p = MN and q = z1 marked, the contours F=0, F=1, F=2 for F_{p=MN, q=z1}, and the Level-0 (stride = z1) PointSet shown as pTree masks.]

The 15 Value_Arrays and Count_Arrays (one for each q = z1, z2, z3, ...):

q     Value_Array                            Count_Array
z1    0 1 2 5 6 10 11 12 14                  2 2 4 1 1 1 1 2 1
z2    0 1 2 5 6 10 11 12 14                  2 2 4 1 1 1 1 2 1
z3    0 1 2 5 6 10 11 12 14                  1 5 2 1 1 1 1 2 1
z4    0 1 3 6 10 11 12 14                    2 4 2 2 1 1 2 1
z5    0 1 2 3 5 6 10 11 12 14                2 2 3 1 1 1 1 1 2 1
z6    0 1 2 3 7 8 9 10                       2 1 1 1 1 3 3 3
z7    0 1 2 3 4 6 9 11 12                    1 4 1 3 1 1 1 2 1
z8    0 1 2 3 4 6 9 11 12                    1 2 3 1 3 1 1 2 1
z9    0 1 2 3 4 6 7 10 12 13                 2 1 1 2 1 3 1 1 2 1
za    0 1 2 3 4 5 7 11 12 13                 2 1 1 1 1 1 4 1 1 2
zb    0 1 2 3 4 6 8 10 11 12                 1 2 1 1 3 2 1 1 1 2
zc    0 1 2 3 5 6 7 8 9 11 12 13             1 1 1 2 2 1 1 1 1 1 1 2
zd    0 1 2 3 7 8 9 10                       3 3 3 1 1 1 1 2
ze    0 1 2 3 5 7 9 11 12 13                 1 1 2 1 3 2 1 1 2 1
zf    0 1 3 5 6 7 8 9 10 11                  1 2 1 1 2 1 2 2 2 1

For q = z1 there are two gaps: [F=2, F=5] and [F=6, F=10].

pTree masks of the 3 z1_clusters (obtained by ORing), over z1 ... zf:
z11   0 0 0 0 0 0 1 1 1 1 1 1 1 1 0
z12   0 0 0 0 0 1 0 0 0 0 0 0 0 0 1
z13   1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
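Splitting at gaps in a Value_Array can be sketched as follows, using the z1 arrays from this example. The function name `split_at_gaps` and the gap threshold of 3 are assumptions for illustration; the slides obtain the cluster masks by ORing pTree masks rather than by scanning a list.

```python
def split_at_gaps(values, counts, min_gap):
    """Group sorted F values into clusters, cutting wherever the step is >= min_gap."""
    clusters = [[(values[0], counts[0])]]
    for prev, cur, ct in zip(values, values[1:], counts[1:]):
        if cur - prev >= min_gap:
            clusters.append([])        # start a new cluster after the gap
        clusters[-1].append((cur, ct))
    return clusters

# Value_Array and Count_Array for q = z1.
z1_values = [0, 1, 2, 5, 6, 10, 11, 12, 14]
z1_counts = [2, 2, 4, 1, 1, 1, 1, 2, 1]

clusters = split_at_gaps(z1_values, z1_counts, min_gap=3)
sizes = [sum(ct for _, ct in cluster) for cluster in clusters]
print(clusters)
print(sizes)
```

The three groups have 8, 2 and 5 points, matching the number of 1-bits in the three z1_cluster masks.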
What have we learned? What is the DPP_d FAUST CLUSTER algorithm?

[Figure: X split into SubCluster1 and X2 = SubCluster2.]

D = Median→Mean, d1 ≡ D/|D| is a good start. But first, Variance-Gradient hill-climb it. (Median means Vector of Medians.)

For X2 = SubCluster2, use a d2 which is perpendicular to d1? In high dimensions, there are many perpendicular directions. One option is to GV hill-climb d2 = D2/|D2| (D2 = Median(X2) - Mean(X2)), constrained to be ⊥ to d1, i.e., constrained to d2∘d1 = 0 (in addition to d2∘d2 = 1). However, we may not want to constrain this second hill-climb to unit vectors perpendicular to d1. It might be the case that the gap gets wider using a d2 which is not perpendicular to d1.

GMP: Gradient hill-climb (with respect to d) Variance(DPP_d), starting at d2 = D2/|D2|, where d2 ≡ Unitized( Vom{x - x∘d1 | x∈X2} - Mean{x - x∘d1 | x∈X2} ), Variance-Gradient hill-climbed subject only to d∘d = 1. We shouldn't constrain the 2nd hill-climb to d1∘d2 = 0, or subsequent hill-climbs to d_k∘d_h = 0, h = 2...k-1, because the gap could be larger without the constraint.

GCCP: Gradient hill-climb (with respect to d) Variance(DPP_d), starting at d2 = D2/|D2|, where D2 = CC_i(X2) - CC_j(X2), subject to d∘d = 1. The CCs are two of the Circumscribing rectangle's Corners (the CCs may be faster to calculate than Mean and Vom).

Taking all edges and diagonals of CCR(X) (the Coordinate-wise Circumscribing Rectangle of X) provides a grid of unit vectors. It is an equi-spaced grid iff we use a CCC(X) (Coordinate-wise Circumscribing Cube of X). Note that there may be many CCC(X)s. A canonical one is the one that is furthest from the origin: take the longest side first, then extend each other side the same distance from the origin side of that edge.

A good choice may be to always take the longest side of CR(X) as D, D ≡ LSCR(X). Should outliers on the (n-1)-dim faces at the ends of LSCR(X) be removed first? If so, remove all LSCR(X)-endface outliers until, after removal, the same side is still the LSCR(X). Then use that LSCR(X) as D.
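The starting directions discussed here can be sketched with NumPy. This is an illustration under stated assumptions, not the pTree computation: the function names are hypothetical, the expression x - x∘d1 is read as removing from x its component along d1, and no outlier removal is done before taking the longest side.

```python
import numpy as np

def unitize(D):
    """Scale D to unit length (a zero vector is returned unchanged)."""
    norm = np.linalg.norm(D)
    return D / norm if norm > 0 else D

def vom_minus_mean_direction(X):
    """Unitized( Vom(X) - Mean(X) ), with Vom the coordinate-wise vector of medians."""
    return unitize(np.median(X, axis=0) - X.mean(axis=0))

def second_round_direction(X2, d1):
    """Start for round 2: Vom - Mean of the points with their d1 component removed."""
    residual = X2 - np.outer(X2 @ d1, d1)
    return vom_minus_mean_direction(residual)

def lscr_direction(X):
    """Unit vector along the longest side of the coordinate-wise circumscribing rectangle."""
    side_lengths = X.max(axis=0) - X.min(axis=0)
    D = np.zeros(X.shape[1])
    D[int(np.argmax(side_lengths))] = 1.0
    return D

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 1.0]])
d1 = vom_minus_mean_direction(X)
d2_start = second_round_direction(X, d1)
longest = lscr_direction(X)
print(d1, d2_start, longest)
```

Any of these vectors would only serve as a starting point; the text above proposes hill-climbing the variance from there subject to d∘d = 1.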
WINE: GV, MVM and GM

ACCURACY    WINE
GV          62.7
MVM         66.7
GM          81.3

[Spreadsheet columns: for each method and each subcluster (X, C1, C11, C12, C121, C2, C3, C4, C5, C6, C7, C71, C76, C763, C766), the listing of F-MN value, count and gap, the hill-climb rows (d1 d2 d3 d4 V(d)), and the L and H class counts of each resulting interval.]
SEEDS: GV, MVM and GM

ACCURACY    SEEDS    WINE
GV          94       62.7
MVM         93.3     66.7
GM          96       81.3

First-round intervals of 10(F-MN), gap 6 (class counts k, r, c):

[0,9)        0k   0r   18c   C1
[9,18)       1k   0r   24c   C2
[18,29)      10k  0r   8c    C3
[29,38)      18k  0r   0c    C4
[38,49)      13k  2r   0c    C5
[49,60)      7k   6r   0c    C6
[60,71)      1k   7r   0c    C7
[71,80)      0k   8r   0c    C8
[80,92)      0k   21r  0c    C9
[92,102)     0k   2r   0c    Ca
[102,105)    0k   4r   0c    Cb

First-round intervals of 10(F-MN), gap 3:

[0,22)       0k   0r   42c   C1
[22,33)      10k  0r   8c    C2
[33,57)      33k  2r   0c    C3
[57,69)      6k   9r   0c    C4
[69,76)      1k   4r   0c    C6
[76,103)     0k   26r  0c    C7
[103,109)    0k   6r   0c    C8

Second round on C2, 10(F-MN), gap 10:

[0,31)       9k   0r   0c    C21
[31,41)      1k   0r   4c    C22
[41,64)      0k   0r   4c    C23

[Spreadsheet columns: the F-MN value, count and gap listings behind these intervals, the hill-climb rows (d1 d2 d3 d4 V(d)) with their a_kk values, and the further splits of C3, C4 and C6.]
MVM C2 3762 808 2260 266 d1 d2 d3 d4 .84 .18 .51 .06 64 .57 .22 .71 .34 82 .51 .22 .74 .38 83 (F-MN)*3 Ct gp3 0 1 2 2 1 1 3 1 2 5 1 15 20 2 3 23 1 3 26 2 2 28 1 1 29 1 2 31 1 2 33 2 2 35 2 2 37 1 1 38 3 1 39 1 1 40 1 1 41 1 1 42 1 4 46 1 1 47 2 2 49 1 1 50 1 1 51 1 2 53 1 1 54 2 2 56 1 2 58 1 1 59 2 2 61 2 1 62 2 1 63 3 1 64 1 1 65 2 2 67 3 1 68 2 1 69 1 1 70 2 1 71 2 1 72 2 2 74 1 1 75 1 2 77 1 1 78 1 1 79 1 2 81 1 2 83 1 1 84 1 3 87 2 1 88 1 1 89 1 1 90 2 1 91 1 1 92 2 3 95 1 1 96 2 1 97 1 2 99 1 2 101 2 2 103 1 3 106 3 3 109 1 1 110 1 1 111 1 .81 .28 -.28 .42 13... .53 .23 .73 .37 39 C12 4*F-M g3 0 2 4 4 1 4 8 2 2 10 1 2 12 1 2 14 1 3 17 1 1 18 1 2 20 1 1 21 1 1 22 1 2 24 3 1 25 1 2 27 1 1 28 1 2 30 1 4 34 1 2 36 2 2 38 1 3 41 2 3 44 1 2 46 2 2 48 1 2 50 1 2 52 2 2 54 1 1 55 1 1 56 1 1 57 1 1 58 3 1 59 1 1 60 2 2 62 1 1 63 4 2 65 1 1 66 1 1 67 1 1 68 1 1 69 3 3 72 1 2 74 1 2 76 1 2 78 1 1 79 1 3 82 1 2 84 1 1 85 1 4 89 1 1 90 1 2 92 1 1 93 1 IRIS GM GV ACCURACY IRIS SEEDS WINE GV 82.7 94 62.7 MVM 94 93.3 66.7 GM 94.7 96 81.3 C23 F-M*3 g3 3847 818 2284 257 .96 .22 .06 -.14 15 0 1 6 6 1 2 8 1 4 12 1 3 15 1 1 16 1 2 18 2 8 26 1 2 28 1 1 29 1 1 30 1 2 32 1 1 33 1 3 36 1 3 39 2 1 40 1 1 41 2 1 42 2 2 44 2 2 46 1 1 47 2 5 52 1 1 53 1 3 56 1 1 57 1 3 60 1 1 61 1 1 62 1 2 64 1 6 70 2 5 75 1 2 77 2 3 80 1 9 89 1 8 97 1 F-MN gp8 0 2 3 3 5 1 4 5 1 5 14 1 6 11 1 7 6 1 8 1 1 9 5 1 10 1 5 15 1 8 23 1 2 25 2 2 27 1 2 29 1 1.. 
68 1 .88 .09 -.98 -.18 168 -.29 .13 -.88 -.36 417 -.36 .09 -.86 -.36 420 F-MN Ct gp5 0 1 3 3 2 1 4 1 2 6 1 1 7 1 2 9 2 1 10 1 2 12 3 1 13 1 1 14 3 1 15 4 1 16 2 1 17 3 1 18 1 1 19 6 1 20 3 1 21 1 1 22 2 1 23 2 1 24 6 1 25 7 1 26 2 1 27 3 1 28 2 1 29 6 1 30 3 1 31 2 1 32 3 1 33 3 1 34 3 1 35 3 1 36 5 1 37 1 1 38 2 1 39 1 1 40 2 1 41 1 2 43 1 2 45 1 1 46 1 1 47 1 5 52 1 8 60 2 1 61 3 1 62 4 1 63 3 1 64 13 1 65 12 1 66 4 1 67 5 1 68 2 2 70 2 .90 .24 .37 .04 180 .41 -.04 .84 .35 418 .36 -.08 .86 .36 420 F-MN Ct gp3 0 2 2 2 2 1 3 2 1 4 5 1 5 7 1 6 16 1 7 6 1 8 4 1 9 4 1 10 2 8 18 1 5 23 1 2 25 2 2 27 1 2 29 1 1 30 1 1 31 1 1 32 2 1 33 1 1 34 3 1 35 5 1 36 4 1 37 3 1 38 1 1 39 4 1 40 3 1 41 3 1 42 4 1 43 4 1 44 2 1 45 5 1 46 7 1 47 3 1 48 2 1 49 1 1 50 3 1 51 4 1 52 3 1 53 2 1 54 3 1 55 3 1 56 3 1 57 1 1 58 4 3 61 2 1 62 1 1 63 1 2 65 1 1 66 1 1 67 2 3 70 1 ___ 1e 0i -.36 .09 -.86 -.36 105 -.54 -0.17 -.76 -.33 118 C1 2*(F-M g3 0 2 4 4 1 1 5 1 1 6 1 5 11 1 2 13 1 3 16 1 2 18 1 3 21 1 1 22 1 1 23 1 2 25 2 1 26 2 2 28 2 1 29 1 2 31 3 1 32 1 2 34 1 1 35 2 1 36 2 1 37 4 3 40 2 1 41 1 2 43 3 2 45 1 2 47 4 1 48 1 1 49 2 1 50 4 1 51 3 2 53 5 1 54 2 1 55 2 1 56 1 1 57 3 2 59 3 2 61 2 2 63 1 1 64 1 1 65 2 2 67 1 1 68 1 1 69 2 1 70 2 1 71 3 1 72 1 1 73 2 2 75 1 1 76 1 1 77 1 1 78 1 1 79 1 1 80 1 2 82 2 10 92 1 2 94 2 2 96 1 __2e 5i 50s1i C1 C2 ___ 4e 1i C21 ___ 19e1i C22 4(F-) g4 0 1 6 6 1 4 10 1 2 12 1 4 ... 33 2 1 34 1 4 38 1 1 39 1 3 ... 
79 1 2 81 1 5 86 1 2 88 2 2 90 1 1 91 1 1 92 2 2 94 1 1 95 1 2 97 1 1 98 1 3 101 2 1 102 2 4 106 1 1 107 1 2 109 1 1 110 2 1 111 2 6 117 1 1 118 1 1 119 1 1 120 1 ___50s1i C1 ___ 6e 0i ___ 18e C221 29e 14i ___ ___ ___ 19e1i C22 ___28i C11 ___ 16e11i 18e 11i C123 ___ 6e ___ 2e ___ 3e2i C221 8F- g5 0 1 7 7 1 4 11 1 5 16 1 1 17 1 3 20 1 1 21 1 2 23 1 1 24 1 5 29 1 3 32 2 2 34 1 1 35 1 4 39 3 5 44 1 3 47 2 3 50 1 3 53 1 4 57 1 3 60 1 3 63 1 1 64 2 5 69 2 1 70 1 3 73 1 1 74 1 1 75 1 4 79 1 1 80 2 2 82 2 1 83 1 1 84 1 2 86 1 4 90 1 5 95 1 ___1e ___ 0e 3i ___ 2e ___ 26i ___ 0e 4i C221 8F-)g5 0 1 7 7 1 4 11 1 5 16 1 1 17 1 3 20 1 1 21 1 2 23 1 1 24 1 5 29 1 3 32 2 2 34 1 1 35 1 4 39 3 5 44 1 3 47 2 3 50 1 3 53 1 4 57 1 3 60 1 3 63 1 1 64 2 5 69 2 1 70 1 3 73 1 1 74 1 1 75 1 4 79 1 1 80 2 2 82 2 1 83 1 1 84 1 2 86 1 4 90 1 5 95 1 ___50e 49i C1 __ 4e1i ___ 3e . -.034 .37 -.31 .87 4 C123 12*F-M g4 0 1 6 6 1 10 16 1 2 18 1 3 21 1 1 22 1 1 23 1 6 29 1 3 32 1 3 35 1 5 40 2 5 45 1 4 49 1 1 50 2 4 54 1 2 56 1 5 61 2 1 62 1 2 64 1 1 65 1 2 67 1 3 70 1 1 71 1 12 83 1 1 84 1 1 85 1 __ 1i . ___ 50e 40i C2 9i C3 ___ 1e . _46e 21i C12 ___9e ___ 5e 1i ___ 4e C13 ___ 27e 16i C23 ___9e . ___ 50s1i C2 ___ 9e1i . __9e2i MVM C2 2(F-)g4 0 1 4 4 1 1 5 1 4 9 1 3 ... 69 1 4 73 1 1 74 1 2 76 2 4 80 1 4 84 1 2 86 2 5 91 1 ___ 9i C24 _ 4e . __9e2i ___ 3e __ 0e 2i . 47e 40i C22 ___ 8i ___ 3i ___ 2e6i . ___ 5e10i ___ _3i ___ 1i ___ 0e 11i ___ 2e1i ___ 5e11i
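The tables on the slides above list, for each tree node, the distinct rounded values of a scaled functional (for example (F-MN)*3), the count Ct at each value, and the gap gp to the next value. A minimal sketch of how such a table could be produced follows. The function names are hypothetical, and two points are assumptions rather than facts from the slides: that "gp3" means a cut wherever the gap is at least 3, and that the cut is placed at the midpoint of the gap. The input reproduces the first values of the "(F-MN)*3 Ct gp3" table, truncated at 29, so the last gap is left undefined.

```python
from collections import Counter

def scaled_functional(f_values, scale):
    """Hypothetical helper: round scale*(F - min F) to integers, as in '(F-MN)*3'."""
    lowest = min(f_values)
    return [round(scale * (f - lowest)) for f in f_values]

def value_count_gap_table(values):
    """Rows (value, count, gap to next distinct value); the last gap is None."""
    counts = Counter(values)
    distinct = sorted(counts)
    rows = []
    for i, v in enumerate(distinct):
        gap = distinct[i + 1] - v if i + 1 < len(distinct) else None
        rows.append((v, counts[v], gap))
    return rows

def cut_points(rows, min_gap):
    """Assumed rule: cut at the midpoint of every gap that is at least min_gap wide."""
    return [v + gap / 2 for v, _, gap in rows if gap is not None and gap >= min_gap]

# First values of the "(F-MN)*3 Ct gp3" table, expanded by their counts.
scaled = [0, 2, 3, 5, 20, 20, 23, 26, 26, 28, 29]
rows = value_count_gap_table(scaled)
cuts = cut_points(rows, min_gap=3)

for value, count, gap in rows:
    print(value, count, gap)
print("cuts:", cuts)
```

Each cut splits the node into the sub-clusters (C1, C11, ...) that the slides then process in the next round.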
CONCRETE GM MVM C11 F-/4 g4 0 4 2 2 1 2 4 4 2 6 25 2 8 2 1 9 7 1 10 4 1 11 9 2 13 3 1 14 6 1 15 4 1 16 1 3 19 5 4 23 2 3 26 5 1 27 4 1 28 9 1 29 5 2 31 6 1 32 5 3 35 6 5 40 2 C232 g2 F-M/8 0 1 1 1 1 1 2 2 1 3 1 2 5 2 1 6 1 1 7 2 1 8 2 1 9 1 7 16 1 1 17 3 1 18 2 2 20 7 1 21 8 1 22 7 1 23 1 2 25 2 1 26 3 1 27 2 1 28 3 1 29 1 1 30 1 1 31 2 2 33 1 1 34 4 1 35 3 3 38 3 1 39 8 11 50 2 1 51 1 MVM (F-)/4 gp4 C23 g3 F-M/8 0 2 2 2 1 1 3 3 1 4 3 1 5 1 1 6 1 1 7 6 1 8 1 1 9 8 1 10 2 1 11 6 1 12 2 1 13 5 1 14 2 1 15 2 3 18 1 1 19 7 1 20 1 1 21 3 1 22 1 1 23 2 1 24 4 1 25 1 2 27 8 1 28 9 1 29 4 2 31 2 1 32 1 1 33 3 1 34 3 2 36 7 1 37 12 2 39 1 1 40 1 1 41 1 1 42 6 6 48 1 2 50 2 0L 32M 13H 11L 13M 54H ACCURACY CONCRETE IRIS SEEDS WINE GV 76 82.7 94 62.7 MVM 78.8 94 93.3 66.7 GM 83 94.7 96 81.3 C2-.6 .2 -.07 .771 6882.. -.72 .19 -.40 .54 9251 .38 .14 -.79 .46 11781 F-m/8 g4 C2 0 1 2 2 1 1 3 1 2 5 2 3 8 1 2 10 1 1 11 1 5 16 1 2 18 1 5 23 1 1 24 1 1 25 2 1 26 2 1 27 1 2 29 4 1 30 2 1 31 2 1 32 1 1 33 3 2 ... 1s 65 1 X g4 (F-MN)/8 0 2 2 2 1 2 4 2 1 5 1 3 8 2 3 11 1 1 12 3 2 14 4 1 15 3 1 16 3 1 17 2 1 18 3 1 19 6 1 20 3 1 21 3 1 22 2 1 23 5 1 24 4 1 25 3 1 26 6 1 27 3 1 28 1 1 29 6 1 30 3 1 31 2 1 32 3 1 33 3 1 34 1 2 36 3 1 37 1 1 38 2 1 39 3 1 40 5 1 41 1 1 42 6 1 43 1 1 44 3 2 46 5 1 47 1 1 48 3 1 49 1 1 50 2 1 51 1 1 52 1 1 53 1 1 54 1 1 55 1 1 56 3 1 57 3 2 59 1 2 61 1 1 62 3 3 65 2 9 74 1 4 78 1 3 81 1 2 83 1 3 86 1 2 88 1 2 90 1 1 91 1 4 95 1 2 97 1 1 98 1 2 100 1 4 104 1 3 107 1 0 1 1 1 1 4 5 1 1 ... 
1s 46 4 3 49 1 7 56 1 2 58 1 3 61 1 4 65 1 1 66 1 3 69 1 2 71 1 6 77 1 3 80 1 3 83 1 3 86 1 14 100 1 3 103 1 2 105 1 3 108 2 4 112 1 ___ 2M C2 gp8 (F-MN)/5 0 2 2 2 1 2 4 2 1 5 1 3 8 2 3 11 1 1 12 2 2 14 4 1 15 3 1 16 3 1 17 2 1 18 3 1 19 6 1 20 3 1 21 3 1 22 1 1 23 5 1 24 3 1 25 3 1 26 6 1 27 3 1 28 1 1 29 6 1 30 3 1 31 2 1 32 1 1 33 3 1 34 1 2 36 3 2 38 2 1 39 2 1 40 5 1 41 1 1 42 6 1 43 1 1 44 3 2 46 5 1 47 1 1 48 1 1 49 1 1 50 2 1 51 1 1 52 1 1 53 1 1 54 1 1 55 1 1 56 3 1 57 2 2 59 1 2 61 1 1 62 3 3 65 2 9 74 1 4 78 1 8 86 1 2 88 1 2 90 1 5 95 1 2 97 1 1 98 1 2 100 1 4 104 1 C21 0L 8M 0H C1 43L 33M 55H C22 2M 0H C23 C211 g5 F-M)/4 0 1 6 6 2 1 7 2 5 12 1 1 13 4 1 14 1 1 15 4 2 17 1 1 18 2 1 19 2 2 21 2 1 22 3 1 23 1 1 24 3 4 28 1 14 42 1 2 44 1 1 45 1 3 48 2 2 50 1 5 55 1 2 57 1 1 58 1 5 63 1 1 64 1 7 71 1 11 82 1 16 98 2 g4 F-MN/8 0 1 2 2 1 2 4 1 2 6 1 1 7 1 1 8 1 2 10 1 1 11 1 1 12 1 1 13 1 3 16 2 3 19 1 2 21 1 5 26 1 1 27 1 1 28 2 1 29 2 1 30 1 2 32 5 1 33 2 1 34 2 1 35 1 1 36 3 1 37 3 1 38 3 1 39 5 1 40 3 1 41 7 1 42 6 1 43 3 1 44 5 1 45 1 1 46 3 1 47 3 1 48 4 1 49 7 1 50 4 1 51 6 1 52 10 1 53 3 1 54 4 1 55 8 1 56 5 1 57 3 1 58 7 1 59 2 1 60 2 1 61 1 1 62 2 2 64 1 1 65 2 1 66 1 1 67 2 C21 g4 F-M/4 0 1 1 1 1 3 4 1 3 7 2 1 8 2 1 9 1 2 11 1 2 13 4 1 14 2 1 15 4 1 16 1 2 18 2 1 19 3 1 20 1 1 21 2 1 22 6 2 24 2 1 25 3 1 26 1 2 28 2 2 30 1 1 31 1 2 33 1 4 37 1 1 38 2 1 39 2 1 40 1 1 41 1 1 42 1 1 43 2 1 44 1 1 45 2 1 46 1 1 47 1 1 48 1 1 49 2 2 51 2 4 55 1 1 56 8 1 57 4 1 58 4 1 59 2 1 60 1 1 61 1 2 63 5 2 65 1 2 67 2 1 68 1 3 71 1 1 72 4 1 73 8 1 74 5 1 75 1 8 83 3 1 84 3 1 85 2 1 86 1 99 3 GV ___5L . C111 3L 23M 49H ___ 7M C2 ___ 4M C3 ___6M C4 ___ 30L 1M 4H C231 g4 F-M/8 0 1 7 ... 1s 12 1 2 14 6 1 15 7 4 19 1 1 20 3 1 21 3 1 22 2 1 23 1 2 25 1 2 27 1 2 29 1 1 30 1 1 31 1 2 33 1 6 39 1 3 42 1 4 46 1 10 56 2 __20L5M . C1F-/4 g4 ___14M 0H C1 C2 0 1 1 1 1 7 8 1 4 12 1 4 16 1 2 18 1 2 20 2 1 21 2 2 ... 
1s+2s 71 2 2 73 1 1 74 1 2 76 2 2 78 2 4 82 2 2 84 1 6 90 2 8 98 1 9 107 1 16 123 1 ___ 5L1M . ___ 4M . ___ 2L1M . C211 32L 13M 0H ___5L1M C11 43L 23M 53H _30L8H_ . 3L2M C212 g5 F-M/3 0 1 20 20 1 8 28 1 1 29 2 9 38 1 11 49 1 5 54 1 11 65 1 10 75 2 3 78 1 11 89 1 7 96 1 2 98 1 2 100 1 11 111 2 1 112 1 ___6M2H C212 7L 3M 10H 2L2M1H __6L3M . __1L2H C111 F-/4 g4 0 1 16 16 3 1 17 2 1 18 9 1 19 3 2 21 5 6 27 3 1 28 5 1 29 14 1 30 1 8 38 2 2 40 15 1 41 3 4 45 3 2 47 2 19 66 3 21 87 1 ___1L4M3H ___ __1L ___1L 1M4H ___ 8H 43L 38M 55H C2 0L 14M 0H C1 ___ 3L 2M18H 1L 21M 43L 28M 55H C21 __ 1L 2M 20H C213 4L 7M 38H ___4L 2M8H ___ 8H C214 0L 5M 7H ___ 2M9H ___ ___ . 1H 2M 0L 10M 0H C22 ___ __ 31H ___1L2H
ABALONE GV 0.11 0.09 0.03 0.14 2 0.27 0.86 0.33 0.27 73 1.00 0.00 0.00 0.00 5 0.29 0.84 0.36 0.29 72 0.26 0.87 0.32 0.26 73 0.00 1.00 0.00 0.00 56 0.25 0.88 0.31 0.25 73 0.00 0.00 1.00 0.00 8 0.29 0.84 0.36 0.29 72 0.26 0.87 0.32 0.26 73 0.00 0.00 0.00 1.00 5 0.29 0.84 0.36 0.29 72 0.26 0.87 0.32 0.26 73 1.00 1.00 0.00 0.00 93 0.26 0.87 0.32 0.26 73 1.00 0.00 1.00 0.00 27 0.29 0.84 0.36 0.29 72 0.26 0.87 0.32 0.26 73 1.00 0.00 0.00 1.00 22 0.29 0.84 0.36 0.29 72 0.26 0.87 0.32 0.26 73 1.00 1.00 1.00 0.00 154 0.27 0.87 0.33 0.27 73 1.00 1.00 0.00 1.00 141 0.26 0.87 0.33 0.26 73 1.00 0.00 1.00 1.00 57 0.29 0.84 0.36 0.29 72 0.26 0.87 0.32 0.26 73 0.00 1.00 1.00 1.00 154 0.27 0.87 0.33 0.27 73 1.00 1.00 1.00 1.00 216 0.27 0.86 0.33 0.27 73 GM MVM 1.00 0.00 0.00 0.00 23 0.71 0.23 0.66 0.01 47 C1 g3 400*F-M 0 1 1 1 1 6 7 1 3 10 2 2 12 3 2 14 3 1 15 1 3 18 1 2 20 1 2 22 3 4 26 1 3 29 1 3 32 1 1 33 1 2 35 1 2 37 2 2 39 1 1 40 1 5 45 2 2 47 1 1 48 2 1 49 1 2 51 1 1 52 2 1 53 2 1 54 2 2 56 1 2 58 3 1 59 1 1 60 1 2 62 2 1 63 1 1 64 2 3 67 4 1 68 1 1 69 2 1 70 1 3 73 1 2 75 2 1 76 2 2 78 1 1 79 2 2 81 1 1 82 1 1 83 1 1 ... 97 1 ACR CONC IRIS SEEDS WINE ABAL GV 76 83 94 63 73 MVM 79 94 93 67 79 GM 83 95 96 81 81 0.39 0.57 0.10 -0.72 0.21 0.57 0.44 0.09 -0.69 0.24 0.77 0.61 0.17 0.01 2.19 0.58 0.48 0.17 0.64 3.8 0.55 0.46 0.16 0.68 3.81 g3 200*F-M 0 1 11 11 1 14 25 1 17 42 1 1 43 1 5 48 1 3 51 1 2 ... 67 2 1 68 2 1 69 3 2 ... 
1s 92 1 1H 1M _ 1H X g2 100(F-M) 3 2 3 6 1 2 8 1 1 9 2 3 12 1 3 15 2 1 16 1 2 18 2 1 19 1 1 20 2 1 21 3 1 22 2 1 23 1 1 24 6 1 25 1 1 26 1 2 28 3 1 29 2 1 30 2 2 32 3 1 33 2 1 34 3 1 35 5 1 36 4 1 37 4 1 38 3 1 39 5 1 40 3 1 41 2 1 42 1 1 43 2 1 44 3 1 45 4 1 46 2 1 47 3 1 48 3 1 49 1 1 50 3 1 51 1 1 52 1 1 53 7 1 54 4 1 55 3 1 56 3 1 57 4 1 58 2 1 59 1 1 60 3 1 61 4 1 62 2 2 64 2 1 65 1 1 66 1 2 68 3 1 69 2 1 70 1 4 74 1 2 76 1 3 79 2 1 80 2 3 83 2 2 85 1 4 89 1 13 102 1 0.25 0.30 -0.20 -0.90 0.18 -0.44 -0.37 -0.19 -0.79 0.81 -0.52 -0.42 -0.19 -0.72 0.83 C1 g3 300(F-M) 0 1 1 1 1 2 3 2 1 4 1 1 5 1 1 6 2 1 7 1 3 10 1 1 11 1 3 14 3 2 16 2 1 17 1 1 18 2 2 20 1 2 22 1 1 23 2 1 24 1 1 25 2 1 26 3 1 27 1 1 28 2 1 29 1 2 31 1 1 32 1 3 35 1 1 36 1 2 38 1 3 41 1 3 44 3 1 45 1 1 46 2 2 48 1 1 49 1 1 50 2 2 52 2 1 53 1 1 54 1 1 55 1 4 59 2 1 60 1 4 64 1 1 65 1 1 66 1 1 67 2 2 69 2 1 70 2 1 71 2 2 73 1 1 74 1 1 75 2 1 76 2 1 77 1 1 78 3 2 80 1 1 81 3 2 83 2 1 84 1 1 85 1 1 86 1 2 88 1 1 89 1 1 90 1 2 92 1 2M 1H _ 5M12H _ 6L . 1M _ 3L . 30L 85M 12H C1 C1 g3 100*F-M 0 1 6 6 1 1 ... 1s 54 1 2 56 2 3... 71 2 7M 4H . 1H 20L 84M 11H C11 10L1M 0H 12L 7M _ 3L4M _ C11 g3 400*F-M 0 1 1 1 1 4 5 1 3 8 4 1 9 1 3 12 2 2 .. 81 2 3 84 2 1 85 1 2M 1H _ 4M 1H _ 2L 0M 0H _ 1L19M1H _ 16M 8H C11 17L 78M 9H C111 3L 1.0 .00 .00 .00 10 .62 .41 .13 .65 46 .33 .29 .13 .89 56 C2 g3 300*F-M 0 1 8 8 1 1 9 1 2 11 1 1 12 1 1 13 3 1 14 1 2 16 2 1 17 1 1 18 3 2 20 2 1 21 1 3 24 1 1 25 1 2 27 2 1 28 1 1 29 2 1 30 1 1 31 1 2 33 2 1 34 1 1 35 1 2 37 1 1 38 3 1 39 1 1 40 1 5 45 1 1 46 1 2 48 1 6 54 1 4 58 1 1 59 1 3 62 1 1 63 1 1 64 1 4 68 1 1 69 1 14 83 1 3 86 1 23 109 1 7L 3M 0H _ C111 g3 1500*F-M 0 1 15 15 1 5 20 1 4 24 1 1 25 1 1 26 1 3 29 1 1 30 1 1 31 2 1 32 1 1 33 2 3 36 1 2 38 3 1 39 2 2 41 2 1 42 1 1 43 2 2 45 1 2 47 3 1 48 1 2 50 1 1 51 1 4 55 2 1 56 3 2 58 1 2 60 3 1 61 2 1 62 2 2 64 1 1 65 2 3 68 2 1 ... 
112 1 4 116 2 .55 .43 .14 .27 .38 C11 g3 1000(F-M) 0 1 10 10 1 7 17 1 2 19 1 8 27 1 9 36 1 11 47 2 2 49 1 3 52 2 4 56 1 4 60 1 2 62 1 2 64 1 7 71 3 1 72 1 5 77 2 4 81 1 3 84 1 6 90 1 3L _ 3M_ 6L8M 0H _ 17M 2H . 13M 5H _ 1M 2H _ 4L 3M _ 0M 6H _ 1M 2H _ 4L 72M 15H C1 10L1M 0H 3M 1H _ 2L21M1H _ 12M 7H _ 3L13M2H 15H _ 1L7M _ 5M 10H _ 1M _ 4L 8M4H 1H 6M 5H _ 3L 30M1H 1M 1H _ 1H 3L 51M3H
KOSblogs d=UnitSTDVec g>6*avg GV on 22 highest STD KOS wds d=(.46 .16 .03 .32 .71 .07 .06 .03 .09 .03 .10 .10 .19 .04 .16 .14 .01 .02 .04 .02 .00 .02) d=e841 (highest STD). gp=1 Ct=8 C16 . outliers. Some of them are substantial MVM gaps>6*avg DOC W=841 1716 0 ... ... 1379 C0 2427 0 Doc F=DPPd Gap 24=MxGp 2682 0 2749 7.574 0.038 0 3029 2983 8.436 0.079 0 42 3402 8.629 0.052 0 2 864 9.184 0.053 0 10 2293 9.462 0.106 1 4 2994 13.45 0.055 0 316 1445 13.66 0.029 0 4 3399 14.05 0.099 0 6 185 14.21 0.156 1 1 2731 14.35 0.143 1 1 2948 14.65 0.066 0 5 1495 14.99 0.014 0 2 804 15.20 0.205 1 1 3177 15.42 0.034 0 6 1316 15.61 0.024 0 2 1335 16.01 0.028 0 3 1637 16.35 0.330 1 1 880 16.86 0.039 0 3 1509 17.03 0.176 1 1 2885 17.21 0.177 1 1 446 18.07 0.863 1 1 1197 18.65 0.005 0 4 3189 19.30 0.644 1 1 1252 20.65 1.352 1 1 2750 13 54 13 2293 13 183 13 2870 13 1222 13 3217 13 1519 13 8 C13 1027 1 ... ... 3427 1 743 C1 2164 14 otlrs 1656 14 3244 14 1709 14 185 15 otlrs 401 15 414 15 893 15 2731 16 otlrs 1396 16 3220 16 3190 16 1832 17 otlr 2852 18 otlrs 3201 18 1234 18 3189 19 otlr 1524 22 otlr 1529 24 otlr 1197 25 otlr 201 27 otlr 1150 29 otlr 1335 34 otlr 1 2 ... ... 2519 2 470 C2 868 3 ... ... 3224 3 274 C3 1882 4 ... ... 3257 4 175 C4 1434 5 ... ... 910 5 127 C5 Cluster size: d=USTDMVM 10 7 11 8 15 8 16 9 17 11 21 11 27 12 42 30 48 45 68 87 422 502 2667 2613 GV 3 3 4 4 4 5 6 6 10 42 316 3029 2753 6 ... ... 549 6 75 C6 1186 7 ... ... 1015 7 79 C7 503 8 ... ... 3156 8 43 C8 2971 9 ... ... 2182 9 39 C9 2868 10 ... ... 1316 10 32 C10 2648 11 ... ... 336 11 18 C11 2983 12 ... ... 
3177 12 14 C12 3364 1804 185.38 0.56 0 3365 3399 186.38 1.00 1 3366 980 186.68 0.30 0 3367 1518 187.84 1.15 1 3368 2090 188.45 0.61 1 3369 890 189.10 0.65 1 3370 24 189.74 0.65 1 3371 2435 189.77 0.03 0 3372 804 190.14 0.36 0 3373 930 190.24 0.11 0 3374 1096 191.30 1.06 1 3375 1441 191.39 0.09 0 3376 2885 191.86 0.47 0 3377 2315 191.91 0.05 0 3378 699 192.04 0.13 0 3379 2108 194.34 2.30 1 3380 1316 195.58 1.24 1 3381 991 195.85 0.27 0 3382 1564 196.05 0.20 0 3383 2800 196.37 0.32 0 3384 880 196.62 0.25 0 3385 2038 196.75 0.13 0 3386 481 197.09 0.34 0 3387 480 197.85 0.76 1 3388 295 198.38 0.53 0 3389 1234 200.42 2.04 1 3390 2140 201.46 1.04 1 3391 3353 202.36 0.90 1 3392 3402 202.64 0.28 0 3393 45 202.86 0.21 0 3394 3017 204.63 1.77 1 3395 3365 207.54 2.91 1 3396 2436 207.77 0.24 0 3397 553 209.73 1.96 1 3398 2545 210.52 0.79 1 3399 54 213.63 3.11 1 3400 1933 214.58 0.95 1 3401 3201 216.16 1.57 1 3402 2895 217.18 1.02 1 3403 446 217.83 0.65 1 3404 2302 218.43 0.61 1 3405 2873 219.47 1.04 1 3406 3388 223.00 3.52 1 3407 1509 225.98 2.99 1 3408 32 229.46 3.48 1 3409 3189 231.30 1.84 1 3410 3228 231.43 0.13 0 3411 2107 232.39 0.96 1 3412 1150 232.79 0.40 0 3413 2279 236.69 3.90 1 3414 2289 237.43 0.74 1 3415 2385 238.03 0.60 0 3416 1037 245.93 7.90 1 3417 201 246.72 0.79 1 3418 1252 249.23 2.51 1 3419 1739 250.34 1.11 1 3420 2446 257.59 7.26 1 3421 1637 258.64 1.05 1 3422 3220 260.55 1.91 1 3423 1304 262.67 2.12 1 3424 2355 271.20 8.53 1 3425 232 293.86 22.66 1 3426 3411 299.23 5.37 1 3427 1955 303.42 4.19 1 3428 1832 328.03 24.61 1 3429 1197 335.83 7.81 1 3430 2852 364.01 28.18 1 AvgGp.0085 gp>6*avg ROW KOS F GAP CT 1 1791 0.2270 --- -- 2 1317 0.2920 0.065 1 2668 1602 6.6576 0.007 2667 3090 1390 9.8504 0.004 422 3132 1546 10.278 0.012 42 3148 2662 10.507 0.021 16 3216 505 11.289 0.019 68 3264 2219 11.994 0.027 48 3291 231 12.445 0.039 27 3302 710 12.631 0.038 11 3317 220 12.934 0.023 15 3338 405 13.315 0.028 21 3355 194 13.693 0.009 17 3368 12 14.151 0.078 8 3378 2731 
14.590 0.011 10 3392 1096 15.459 0.022 5 0.1=AvgGp 64=#gaps Row#Doc#F 28.2=MxGp .6=GapThreshold 1 1791 5.67 Gap 0 ... ... ... ... ... 8 3389 7.00 0.19 0 9 2397 7.65 0.65 1 10 2841 7.82 0.17 0 ... ... ... ... ... 2621 2334 89.40 0.06 0 2622 1122 90.00 0.60 1 2623 245 90.06 0.06 0 ... ... ... ... ... 3123 3169 132.06 0.00 0 3124 321 132.81 0.75 1 3125 2047 133.05 0.24 0 ... ... ... ... ... 3210 343 145.29 0.37 0 3211 2475 145.89 0.60 1 3212 458 146.10 0.21 0 ... ... ... ... ... 3240 542 151.15 0.09 0 3241 2569 151.76 0.61 1 3242 1143 151.92 0.15 0 ... ... ... ... ... 3285 1803 157.97 0.00 0 3286 2257 158.70 0.73 1 3287 2723 158.77 0.07 0 ... ... ... ... ... 3293 129 159.56 0.32 0 3294 2541 160.45 0.89 1 3295 2870 160.48 0.03 0 ... ... ... ... ... 3301 401 161.38 0.04 0 3302 2918 162.03 0.65 1 3303 100 162.07 0.04 0 ... ... ... ... ... 3312 1157 164.54 0.08 0 3313 185 165.26 0.72 1 3314 685 165.91 0.65 1 3315 2948 166.25 0.34 0 ... ... ... ... ... 3325 190 168.59 0.37 0 3326 2498 169.20 0.61 1 3327 264 169.31 0.11 0 3328 1611 169.64 0.33 0 3329 3052 169.96 0.32 0 3330 1002 170.43 0.47 0 3331 1628 170.64 0.20 0 3332 1241 171.80 1.16 1 3333 3155 172.00 0.20 0 ... ... ... ... ... 3342 861 173.84 0.15 0 3343 2509 174.98 1.13 1 3344 2293 175.65 0.67 1 3345 1257 175.67 0.02 0 3346 2776 176.04 0.37 0 3347 1422 177.15 1.11 1 3348 12 177.24 0.09 0 3349 183 177.26 0.02 0 3350 620 177.29 0.03 0 3351 679 179.08 1.79 1 3352 462 179.15 0.07 0 3353 3404 180.02 0.88 1 3354 1850 180.79 0.76 1 3355 3342 181.21 0.43 0 3356 1396 183.04 1.82 1 3357 2982 183.26 0.22 0 ___ ___ gap=.65 Ct=9 C1 ___ ___ gap=.6 Ct=2613 C2 ___ ___ gap=.75 Ct= 502 C3 ___ ___ gap=.6 Ct= 87 C4 ___ ___ gap=.61 Ct=30 C5 ___ ___ gap=.73 Ct=45 C6 ___ ___ gap=.89 Ct=8 C7 ___ ___ gap=.65 Ct=8 C8 ___ ___ gp=.72 Ct= 11 C9 ___ ___ gp=.65 Ct=1 outlr ___ ___ gp=.61 Ct=12 C11 ___ ___ gp=1.2 Ct=6 C12 ___ ___ gp=1.1 Ct=11 C13 ___ ___ gap=.67 Ct=1 utlr ___ ___ gp=1.1 Ct=3 C15 ___ ___ gp=1.8 Ct=4 C16 ___ ___ gp=1.8 Ct=5 otl;r
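The KOSblogs tables above cut the sorted functional values wherever a gap exceeds a multiple of the average gap (gp>6*avg). A minimal sketch of that rule follows, assuming plain Python lists in place of pTrees. The function name and the toy values are hypothetical, and the strict ">" comparison is taken from the "gp>6*avg" label.

```python
def gap_clusters(f_values, multiple=6.0):
    """Split row indices into clusters at every gap larger than multiple * average gap.

    Returns (clusters, threshold); each cluster is a list of row indices
    ordered by increasing functional value.
    """
    order = sorted(range(len(f_values)), key=lambda i: f_values[i])
    sorted_f = [f_values[i] for i in order]
    gaps = [b - a for a, b in zip(sorted_f, sorted_f[1:])]
    if not gaps:
        return [order], 0.0
    threshold = multiple * sum(gaps) / len(gaps)
    clusters = [[order[0]]]
    for index, gap in zip(order[1:], gaps):
        if gap > threshold:
            clusters.append([])  # a wide gap starts a new cluster
        clusters[-1].append(index)
    return clusters, threshold

# Toy functional values (not from the slides): two groups and one far point.
f = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 5.3, 20.0]
clusters, threshold = gap_clusters(f, multiple=1.5)
sizes = [len(c) for c in clusters]
print(sizes, round(threshold, 3))
```

Clusters of size 1 correspond to the rows the slides mark as outliers.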
GV using a grid (Unitized Corners of Unit Cube + Diagonal of the Variance Matrix + Mean-to-Vector_of_Medians) UCUC(1101) UCUC(1011) UCUC(0111) 0.58 0.58 0.00 0.58 6756 0.69 0.10 -0.43 0.57 11945 0.65 0.10 -0.58 0.48 12599 0.60 0.11 -0.66 0.45 12784 0.55 0.12 -0.70 0.45 12864 0.51 0.13 -0.72 0.45 12908 0.49 0.13 -0.73 0.46 12933 0.46 0.14 -0.74 0.46 12947 0.45 0.14 -0.75 0.47 12956 0.43 0.14 -0.76 0.47 12960 0.42 0.14 -0.76 0.47 12963 0.42 0.14 -0.76 0.48 12965 0.41 0.15 -0.76 0.48 12966 0.58 0.00 0.58 0.58 6414 0.82 -0.10 0.46 0.33 8390 0.93 -0.12 0.32 0.12 9506 0.97 -0.11 0.20 0.02 9889 0.99 -0.10 0.11 -0.00 10069 1.00 -0.08 0.02 0.01 10254 0.99 -0.06 -0.08 0.05 10508 0.98 -0.04 -0.18 0.11 10851 0.94 -0.01 -0.29 0.18 11263 0.89 0.02 -0.40 0.24 11695 0.82 0.05 -0.49 0.30 12084 0.75 0.07 -0.56 0.35 12391 0.68 0.09 -0.62 0.38 12609 0.62 0.10 -0.66 0.41 12751 0.57 0.12 -0.69 0.43 12839 0.53 0.12 -0.71 0.44 12892 0.50 0.13 -0.73 0.45 12924 0.47 0.13 -0.74 0.46 12942 0.45 0.14 -0.75 0.47 12953 0.44 0.14 -0.75 0.47 12959 0.43 0.14 -0.76 0.47 12962 0.42 0.14 -0.76 0.47 12964 0.41 0.14 -0.76 0.48 12965 0.41 0.15 -0.76 0.48 12966 0.00 0.58 0.58 0.58 3102 -0.15 0.02 0.71 0.68 5237 -0.34 -0.08 0.86 0.37 7997 -0.46 -0.12 0.88 -0.09 11648 -0.47 -0.13 0.81 -0.33 12756 -0.45 -0.14 0.77 -0.42 12928 -0.44 -0.14 0.76 -0.45 12955 -0.43 -0.14 0.76 -0.47 12962 -0.42 -0.14 0.76 -0.47 12964 -0.41 -0.14 0.76 -0.48 12965 -0.41 -0.15 0.76 -0.48 12966 CONC d1 d2 d3 d4 VAR UCUC(1000) UCUC(0100) UCUC(0010) UCUC(0001) UCUC(1100) UCUC(1001) UCUC(0110) UCUC(0101) UCUC(0011) -0.06 -0.19 0.83 -0.52 12619 -0.14 -0.18 0.82 -0.52 12758 -0.20 -0.17 0.82 -0.51 12843 -0.25 -0.17 0.81 -0.51 12895 -0.28 -0.16 0.80 -0.50 12925 -0.31 -0.16 0.79 -0.50 12943 -0.33 -0.16 0.79 -0.49 12953 -0.35 -0.16 0.79 -0.49 12959 -0.36 -0.15 0.78 -0.49 12962 -0.37 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.48 12965 -0.38 -0.15 0.78 -0.48 12966 -0.38 -0.15 0.77 -0.48 12967 0.71 0.00 0.00 0.71 9105 0.78 0.05 -0.32 0.53 11499 
0.74 0.07 -0.50 0.44 12306 0.68 0.09 -0.60 0.42 12601 0.62 0.10 -0.65 0.42 12753 0.57 0.12 -0.69 0.43 12841 0.53 0.12 -0.71 0.45 12894 0.50 0.13 -0.73 0.45 12924 0.47 0.13 -0.74 0.46 12942 0.45 0.14 -0.75 0.47 12953 0.44 0.14 -0.75 0.47 12959 0.43 0.14 -0.76 0.47 12962 0.42 0.14 -0.76 0.47 12964 0.41 0.14 -0.76 0.48 12965 0.41 0.15 -0.77 0.48 12966 0.40 0.15 -0.77 0.48 12967 0.00 0.71 0.71 0.00 3491 -0.19 -0.13 0.94 -0.25 12162 -0.25 -0.17 0.86 -0.41 12806 -0.28 -0.16 0.82 -0.47 12915 -0.31 -0.16 0.80 -0.49 12942 -0.33 -0.16 0.79 -0.49 12953 -0.35 -0.16 0.79 -0.49 12959 -0.36 -0.15 0.78 -0.49 12963 -0.37 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.48 12966 0.00 0.71 0.00 0.71 4926 0.01 0.20 -0.54 0.81 11209 0.09 0.18 -0.73 0.65 12473 0.16 0.18 -0.79 0.56 12765 0.22 0.17 -0.80 0.53 12861 0.26 0.17 -0.80 0.51 12907 0.29 0.16 -0.80 0.50 12932 0.32 0.16 -0.79 0.50 12947 0.34 0.16 -0.79 0.49 12955 0.35 0.15 -0.78 0.49 12960 0.36 0.15 -0.78 0.49 12963 0.37 0.15 -0.78 0.49 12965 0.37 0.15 -0.78 0.48 12966 0.00 0.00 0.71 0.71 4951 -0.06 -0.09 0.89 0.45 6835 -0.16 -0.15 0.97 -0.02 10755 -0.23 -0.17 0.90 -0.33 12547 -0.28 -0.16 0.84 -0.44 12876 -0.31 -0.16 0.81 -0.48 12934 -0.33 -0.16 0.80 -0.49 12951 -0.34 -0.16 0.79 -0.49 12958 -0.35 -0.15 0.78 -0.49 12962 -0.36 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.49 12965 -0.38 -0.15 0.78 -0.48 12966 UCUC(1111) akk MVM 0.50 0.50 0.50 0.50 4385 0.83 -0.04 0.32 0.46 8393 0.95 -0.06 0.09 0.28 9943 0.97 -0.04 -0.09 0.20 10663 0.95 -0.01 -0.24 0.21 11151 0.90 0.01 -0.36 0.25 11601 0.83 0.04 -0.47 0.30 12007 0.76 0.07 -0.55 0.34 12334 0.69 0.09 -0.61 0.38 12569 0.63 0.10 -0.65 0.41 12726 0.58 0.11 -0.69 0.43 12824 0.54 0.12 -0.71 0.44 12883 0.50 0.13 -0.73 0.45 12918 0.48 0.13 -0.74 0.46 12939 0.46 0.14 -0.75 0.46 12951 0.44 0.14 -0.75 0.47 12958 0.43 0.14 -0.76 0.47 12962 0.42 0.14 -0.76 0.47 12964 0.41 0.14 -0.76 0.48 12965 0.41 0.15 -0.76 0.48 12966 0.17 0.05 0.98 0.01 9327 0.06 -0.19 0.93 -0.30 11888 -0.04 -0.19 0.88 -0.44 12502 -0.12 
-0.18 0.84 -0.49 12715 -0.19 -0.18 0.83 -0.50 12822 -0.24 -0.17 0.81 -0.50 12882 -0.27 -0.17 0.80 -0.50 12918 -0.30 -0.16 0.80 -0.50 12939 -0.32 -0.16 0.79 -0.49 12951 -0.34 -0.16 0.79 -0.49 12958 -0.35 -0.15 0.78 -0.49 12962 -0.36 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.49 12965 -0.38 -0.15 0.78 -0.48 12966 0.00 -0.00 0.00 -0.01 1 0.28 -0.19 0.49 -0.80 10378 0.18 -0.20 0.71 -0.65 11773 0.06 -0.20 0.79 -0.58 12296 -0.04 -0.19 0.82 -0.54 12563 -0.12 -0.18 0.82 -0.53 12724 -0.19 -0.18 0.82 -0.52 12823 -0.24 -0.17 0.81 -0.51 12883 -0.27 -0.17 0.80 -0.50 12918 -0.30 -0.16 0.80 -0.50 12939 -0.33 -0.16 0.79 -0.49 12951 -0.34 -0.16 0.79 -0.49 12958 -0.35 -0.15 0.78 -0.49 12962 -0.36 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.49 12965 -0.38 -0.15 0.78 -0.48 12966 1.00 0.00 0.00 0.00 10249 0.99 -0.05 -0.11 0.06 10585 0.97 -0.03 -0.21 0.13 10947 0.93 -0.00 -0.32 0.19 11370 0.87 0.03 -0.42 0.26 11796 0.80 0.05 -0.51 0.31 12168 0.73 0.08 -0.58 0.36 12453 0.66 0.09 -0.63 0.39 12649 0.61 0.11 -0.67 0.42 12776 0.56 0.12 -0.70 0.43 12855 0.52 0.13 -0.72 0.45 12902 0.49 0.13 -0.73 0.46 12929 0.47 0.14 -0.74 0.46 12945 0.45 0.14 -0.75 0.47 12954 0.44 0.14 -0.75 0.47 12960 0.43 0.14 -0.76 0.47 12963 0.42 0.14 -0.76 0.47 12965 0.41 0.14 -0.76 0.48 12966 0.00 1.00 0.00 0.00 795 -0.23 0.33 -0.78 0.49 11645 -0.12 0.21 -0.82 0.52 12191 -0.01 0.19 -0.83 0.52 12469 0.09 0.19 -0.83 0.52 12660 0.16 0.18 -0.82 0.52 12783 0.22 0.17 -0.81 0.51 12859 0.26 0.17 -0.81 0.50 12904 0.29 0.16 -0.80 0.50 12931 0.32 0.16 -0.79 0.50 12946 0.33 0.16 -0.79 0.49 12955 0.35 0.15 -0.78 0.49 12960 0.36 0.15 -0.78 0.49 12963 0.37 0.15 -0.78 0.49 12965 0.37 0.15 -0.78 0.48 12966 0.00 0.00 1.00 0.00 9950 -0.10 -0.18 0.93 -0.31 12279 -0.17 -0.18 0.86 -0.44 12749 -0.23 -0.17 0.83 -0.48 12865 -0.27 -0.17 0.81 -0.49 12911 -0.30 -0.16 0.80 -0.50 12935 -0.32 -0.16 0.79 -0.49 12949 -0.34 -0.16 0.79 -0.49 12956 -0.35 -0.15 0.78 -0.49 12961 -0.36 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.49 12965 -0.37 -0.15 0.78 -0.48 
12966 0.00 0.00 0.00 1.00 6686 0.08 0.16 -0.44 0.88 10572 0.16 0.17 -0.69 0.68 12435 0.22 0.17 -0.77 0.57 12816 0.26 0.17 -0.79 0.53 12901 0.29 0.16 -0.79 0.51 12932 0.32 0.16 -0.79 0.50 12947 0.34 0.16 -0.79 0.49 12955 0.35 0.15 -0.78 0.49 12960 0.36 0.15 -0.78 0.49 12963 0.37 0.15 -0.78 0.49 12965 0.37 0.15 -0.78 0.48 12966 0.71 0.71 0.00 0.00 4968 0.94 0.02 -0.29 0.18 11266 0.88 0.02 -0.40 0.24 11709 0.82 0.05 -0.49 0.30 12096 0.74 0.07 -0.57 0.35 12400 0.68 0.09 -0.62 0.38 12614 0.62 0.10 -0.66 0.41 12754 0.57 0.12 -0.69 0.43 12841 0.53 0.12 -0.71 0.44 12894 0.50 0.13 -0.73 0.45 12924 0.47 0.13 -0.74 0.46 12942 0.45 0.14 -0.75 0.47 12953 0.44 0.14 -0.75 0.47 12959 0.43 0.14 -0.76 0.47 12962 0.42 0.14 -0.76 0.47 12964 0.41 0.14 -0.76 0.48 12965 0.41 0.15 -0.77 0.48 12966 0.40 0.15 -0.77 0.48 12967 UCUC(1110) 0.58 0.58 0.58 0.00 4647 0.76 -0.15 0.62 -0.14 9784 0.72 -0.19 0.61 -0.27 10149 0.65 -0.20 0.64 -0.36 10422 0.56 -0.20 0.69 -0.41 10750 0.44 -0.21 0.74 -0.46 11149 0.32 -0.21 0.78 -0.49 11582 0.19 -0.21 0.81 -0.51 11988 0.07 -0.20 0.83 -0.52 12319 -0.04 -0.19 0.83 -0.52 12559 -0.12 -0.18 0.83 -0.52 12719 -0.18 -0.18 0.82 -0.51 12820 -0.23 -0.17 0.81 -0.51 12881 -0.27 -0.17 0.80 -0.50 12917 -0.30 -0.16 0.80 -0.50 12938 -0.32 -0.16 0.79 -0.49 12950 -0.34 -0.16 0.79 -0.49 12957 -0.35 -0.15 0.78 -0.49 12961 -0.36 -0.15 0.78 -0.49 12964 -0.37 -0.15 0.78 -0.49 12965 -0.38 -0.15 0.78 -0.48 12966 UCUC(1010) 0.71 0.00 0.71 0.00 9007 0.69 -0.18 0.67 -0.21 10074 0.62 -0.20 0.68 -0.33 10486 0.52 -0.21 0.72 -0.41 10867 0.40 -0.21 0.76 -0.46 11289 0.27 -0.21 0.80 -0.50 11721 0.15 -0.20 0.82 -0.51 12106 0.03 -0.20 0.83 -0.52 12408 On these pages we display the variance hill-climb for each of the four datasets (Concrete, IRIS, Seeds, Wine) for a grid of starting unit vectors, d. I took the circumscribing unit non-negative cube and used all the Unitized diagonals. In low dimension (all dimension=4 here) this grid is very nearly a uniform grid. 
Note that this grid becomes a poorer approximation of a uniform grid as the dimension grows. In all cases shown, the hill-climb reaches the same local maximum and nearly the same unit vector (d or its negative, which give the same variance).
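The grid experiment can be sketched as follows. This is a minimal sketch under stated assumptions: the slides do not give the exact hill-climb step, so the update d <- A d / |A d| (a step along Gradient(V) = 2Aod followed by re-unitizing) is an assumption. The table X is made-up illustration data, and all function names are hypothetical.

```python
import itertools
import math

def covariance_matrix(rows):
    """Matrix A with a_jk = mean(Xj*Xk) - mean(Xj)*mean(Xk)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(m)] for i in range(m)]

def mat_vec(A, d):
    return [sum(a * x for a, x in zip(row, d)) for row in A]

def unitize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def variance(A, d):
    """V(d) = d^T A d, the variance of the projections Xod for a unit vector d."""
    return sum(x * y for x, y in zip(d, mat_vec(A, d)))

def hill_climb(A, d0, tol=1e-12, max_iter=10000):
    """Assumed step: replace d by the unitized gradient direction A d until V stops rising."""
    d = unitize(d0)
    v = variance(A, d)
    for _ in range(max_iter):
        g = mat_vec(A, d)  # half of Gradient(V) = 2 A d
        if not any(g):
            break  # zero gradient, nothing to climb
        d_next = unitize(g)
        v_next = variance(A, d_next)
        if v_next - v <= tol:
            break
        d, v = d_next, v_next
    return d, v

def unitized_cube_corners(n):
    """All nonzero 0/1 corners of the unit cube, scaled to unit length."""
    return [unitize(c) for c in itertools.product((0, 1), repeat=n) if any(c)]

# Made-up 4-column table, for illustration only.
X = [[1.0, 2.0, 0.5, 1.0], [2.0, 4.1, 0.4, 1.2], [3.0, 5.9, 0.6, 0.9],
     [4.0, 8.2, 0.5, 1.1], [5.0, 9.8, 0.3, 1.0], [6.0, 12.1, 0.7, 0.8]]
A = covariance_matrix(X)
results = [hill_climb(A, d0) for d0 in unitized_cube_corners(4)]
reached = [v for _, v in results]
print(min(reached), max(reached))
```

With this assumed step every starting corner ends at the same variance for the toy table, which mirrors the behavior reported on the slides. A start that is exactly orthogonal to the maximizing direction would not move toward it under this step.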
GV using a grid (Unitized Corners of Unit Cube + Diagonal of the Variance Matrix + Mean-to-Vector_of_Medians) 2 SEEDS d1 d2 d3 d4 VAR UCUC(1000) UCUC(0100) UCUC(0010) UCUC(0001) UCUC(1100) UCUC(1010) UCUC(1001) UCUC(0110) UCUC(0101) UCUC(0011) UCUC(1110) UCUC(1101) UCUC(1011) UCUC(0111) UCUC(1111) akk MVM WINE d1 d2 d3 d4 VAR UCUC(1000) UCUC(0100) UCUC(0010) UCUC(0001) UCUC(1100) UCUC(1010) UCUC(1001) UCUC(0110) UCUC(0101) UCUC(0011) UCUC(1110) UCUC(1101) UCUC(1011) UCUC(0111) UCUC(1111) akk MVM IRIS d1 d2 d3 d4 VAR UCUC(1000) UCUC(0100) UCUC(0010) UCUC(0001) UCUC(1100) UCUC(1010) UCUC(1001) UCUC(0110) UCUC(0101) UCUC(0011) UCUC(1110) UCUC(1101) UCUC(1011) UCUC(0111) UCUC(1111) akk MVM 1.00 0.00 0.00 0.00 8 0.97 0.16 -0.11 0.14 9 0.00 1.00 0.00 0.00 0 0.96 0.23 -0.14 0.13 9 0.00 0.00 1.00 0.00 2 -0.36 -0.07 0.93 -0.00 4 -0.82 -0.15 0.55 -0.09 8 -0.94 -0.16 0.27 -0.12 9 0.00 0.00 0.00 1.00 0 0.97 0.15 -0.00 0.19 9 0.71 0.71 0.00 0.00 6 0.97 0.17 -0.12 0.13 9 0.71 0.00 0.71 0.00 4 0.96 0.16 0.20 0.15 8 0.97 0.16 -0.05 0.14 9 0.71 0.00 0.00 0.71 5 0.97 0.16 -0.10 0.14 9 0.00 0.71 0.71 0.00 1 0.19 0.06 0.98 0.08 2 0.33 0.04 0.94 0.10 3 0.70 0.11 0.69 0.14 5 0.96 0.16 0.18 0.15 8 0.97 0.16 -0.06 0.14 9 0.00 0.71 0.00 0.71 0 0.97 0.20 -0.08 0.15 9 0.00 0.00 0.71 0.71 1 0.08 -0.01 0.99 0.09 2 -0.07 -0.03 1.00 0.05 3 -0.51 -0.10 0.86 -0.03 5 -0.88 -0.15 0.44 -0.10 8 -0.95 -0.16 0.23 -0.13 9 0.58 0.58 0.58 0.00 4 0.96 0.17 0.15 0.15 8 0.97 0.16 -0.07 0.14 9 0.58 0.58 0.00 0.58 5 0.97 0.17 -0.10 0.14 9 0.58 0.00 0.58 0.58 4 0.96 0.16 0.17 0.16 8 0.97 0.16 -0.06 0.14 9 0.00 0.58 0.58 0.58 1 0.56 0.11 0.80 0.14 4 0.92 0.15 0.31 0.15 8 0.98 0.16 -0.02 0.14 9 0.50 0.50 0.50 0.50 4 0.97 0.17 0.13 0.15 8 0.97 0.16 -0.07 0.14 9 0.98 0.14 0.06 0.13 9 -0.62 0.36 0.27 -0.30 4 -0.95 -0.15 0.22 -0.13 9 1.00 0.00 0.00 0.00 4 0.40 -0.06 -0.91 -0.07 497 0.02 -0.25 -0.97 -0.01 608 0.00 1.00 0.00 0.00 82 -0.00 0.49 0.87 0.00 577 -0.01 0.28 0.96 0.00 608 0.00 0.00 1.00 0.00 567 -0.01 0.25 
0.97 0.00 608 0.00 0.00 0.00 1.00 1 -0.20 0.17 0.84 0.47 455 -0.02 0.26 0.96 0.01 608 0.71 0.71 0.00 0.00 42 0.02 0.51 0.86 -0.00 570 -0.01 0.29 0.96 0.00 608 0.71 0.00 0.71 0.00 277 -0.01 0.25 0.97 0.00 608 0.71 0.00 0.00 0.71 2 0.46 0.00 -0.88 0.12 447 0.02 -0.25 -0.97 -0.00 608 0.00 0.71 0.71 0.00 472 -0.01 0.31 0.95 0.00 608 0.00 0.71 0.00 0.71 42 -0.01 0.48 0.88 0.01 578 -0.01 0.28 0.96 0.00 608 0.00 0.00 0.71 0.71 287 -0.02 0.25 0.97 0.01 608 0.58 0.58 0.58 0.00 310 -0.01 0.31 0.95 0.00 607 -0.01 0.27 0.96 0.00 608 0.58 0.58 0.00 0.58 29 0.02 0.50 0.86 0.01 572 -0.01 0.29 0.96 0.00 608 0.58 0.00 0.58 0.58 186 -0.01 0.25 0.97 0.01 608 0.00 0.58 0.58 0.58 317 -0.01 0.30 0.95 0.01 608 0.50 0.50 0.50 0.50 234 -0.01 0.31 0.95 0.01 607 -0.01 0.27 0.96 0.00 608 0.07 0.15 0.98 0.12 588 -0.01 0.26 0.97 0.00 608 -0.13 -1.00 -3.07 -0.03 6314 0.01 -0.27 -0.96 -0.00 608 1.00 0.00 0.00 0.00 68 0.45 -0.03 0.83 0.34 415 0.36 -0.08 0.86 0.36 420 0.00 1.00 0.00 0.00 19 -0.10 0.48 -0.82 -0.30 334 -0.34 0.10 -0.86 -0.36 420 0.00 0.00 1.00 0.00 311 0.35 -0.09 0.86 0.35 420 0.00 0.00 0.00 1.00 58 0.34 -0.08 0.85 0.39 420 0.71 0.71 0.00 0.00 39 0.53 0.12 0.78 0.33 390 0.37 -0.07 0.86 0.36 420 0.71 0.00 0.71 0.00 316 0.38 -0.07 0.85 0.35 420 0.71 0.00 0.00 0.71 114 0.40 -0.05 0.84 0.36 419 0.36 -0.08 0.86 0.36 420 0.00 0.71 0.71 0.00 133 0.37 -0.04 0.86 0.36 419 0.36 -0.08 0.86 0.36 420 0.00 0.71 0.00 0.71 27 0.41 0.06 0.82 0.40 410 0.37 -0.08 0.86 0.36 420 0.00 0.00 0.71 0.71 312 0.35 -0.09 0.86 0.36 420 0.58 0.58 0.58 0.00 193 0.40 -0.04 0.85 0.35 419 0.36 -0.08 0.86 0.36 420 0.58 0.58 0.00 0.58 72 0.43 0.01 0.83 0.36 414 0.37 -0.08 0.86 0.36 420 0.58 0.00 0.58 0.58 349 0.37 -0.07 0.85 0.36 420 0.00 0.58 0.58 0.58 185 0.36 -0.05 0.85 0.37 420 0.50 0.50 0.50 0.50 243 0.90 0.24 0.37 0.04 180 0.41 -0.04 0.84 0.35 418 0.36 -0.08 0.86 0.36 420 0.90 0.24 0.37 0.04 180 0.41 -0.04 0.84 0.35 418 0.36 -0.08 0.86 0.36 420 -0.00 -0.04 0.05 0.01 1 0.35 -0.09 0.86 0.36 420 As we all know, Dr. 
Ubhaya is the best mathematician on campus, and he is attempting to prove three things:
1. That a GV hill-climb that fails to reach the global maximum variance is rare.
2. That at least one of the coordinate unit vectors is guaranteed to hill-climb to the global maximum (so a 90-degree grid always suffices).
3. That the akk start (d=ek, where akk is a maximal diagonal element of A) always reaches the global maximum.
Finding round clusters that aren't DPPd separable (no linear gap)
Find the golf ball? Suppose we have a white mask pTree and no linear gap exists to reveal it. Search a grid of d-tubes until a DPPd gap is found in the interior of a tube: form the mask pTree for the interior of the d-tube, then apply DPPd under that mask to reveal interior gaps. Also look for conical gaps: fix the cone point at the middle of the tube and search over all cone angles for an interval of angles that contains no points. Note that this method includes DPPd, since a gap at a cone angle of 90 degrees is a linear gap.
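The conical-gap search can be sketched as follows, assuming plain coordinate tuples instead of pTrees. The function names, the sample points, and the choice of apex and axis are hypothetical. The code computes each point's angle to the cone axis and reports the widest interval of angles that contains no point.

```python
import math

def cone_angles(points, apex, axis):
    """Angle in degrees between (y - apex) and the cone axis, for every point y != apex."""
    axis_len = math.sqrt(sum(a * a for a in axis))
    angles = []
    for y in points:
        v = [yi - ci for yi, ci in zip(y, apex)]
        v_len = math.sqrt(sum(x * x for x in v))
        if v_len == 0:
            continue  # the apex itself has no direction
        cosine = sum(x * a for x, a in zip(v, axis)) / (v_len * axis_len)
        cosine = max(-1.0, min(1.0, cosine))  # guard against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(cosine)))
    return angles

def widest_empty_angle_interval(angles):
    """Return (width, low, high) of the widest interval of cone angles holding no point."""
    ordered = sorted(angles)
    best = (0.0, None, None)
    for low, high in zip(ordered, ordered[1:]):
        if high - low > best[0]:
            best = (high - low, low, high)
    return best

# Made-up points: three near the positive axis side, two near the negative side.
points = [(1.0, 0.0), (2.0, 0.2), (1.0, 1.0), (-1.0, 0.1), (-2.0, 0.0)]
apex, axis = (0.0, 0.0), (1.0, 0.0)
width, low, high = widest_empty_angle_interval(cone_angles(points, apex, axis))
print(round(width, 2), round(low, 2), round(high, 2))
```

Here the empty interval contains 90 degrees, so the plane through the apex perpendicular to the axis separates the two groups, which is the linear-gap special case mentioned above.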
FAUST Gap Revealer
Gap width $2^4$ = 16, so compute all pTree combinations down to p4 and p4'. Here d=M-p and F=zod.

Z (point coordinates) and F=zod:
      X1  X2    F
z1     1   1   11
z2     3   1   27
z3     2   2   23
z4     3   3   34
z5     6   2   53
z6     9   3   80
z7    15   1  118
z8    14   2  114
z9    15   3  125
za    13   4  114
zb    10   9  110
zc    11  10  121
zd     9  11  109
ze    11  11  125
zf     7   8   83
(The scatter plot of z1..zf and the mean M is omitted; the coordinates are those in the Z table.)

Bit slices of F and their complements (columns in the order z1 ... zf):
p6   0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
p5   0 0 0 1 1 0 1 1 1 1 1 1 1 1 0
p4   0 1 1 0 1 1 1 1 1 1 0 1 0 1 1
p3   1 1 0 0 0 0 0 0 1 0 1 1 1 1 0
p2   0 0 1 0 1 0 1 0 1 0 1 0 1 1 0
p1   1 1 1 1 0 0 1 1 0 1 1 0 0 0 1
p0   1 1 1 0 1 0 0 0 1 0 0 1 1 1 1
p6'  1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
p5'  1 1 1 0 0 1 0 0 0 0 0 0 0 0 1
p4'  1 0 0 1 0 0 0 0 0 0 1 0 1 0 0
p3'  0 0 1 1 1 1 1 1 0 1 0 0 0 0 1
p2'  1 1 0 1 0 1 0 1 0 1 0 1 0 0 1
p1'  0 0 0 0 1 1 0 0 1 0 0 1 1 1 0
p0'  0 0 0 1 0 1 1 1 0 1 1 0 0 0 0

Counts (C) of the ANDed pTrees:
p6'              C=5
  p6'&p5'        C=3
    p6'&p5'&p4'  C=1   [0,16)
    p6'&p5'&p4   C=2   [16,32)
  p6'&p5         C=2
    p6'&p5&p4'   C=1   [32,48)
    p6'&p5&p4    C=1   [48,64)
p6               C=10
  p6&p5'         C=2
    p6&p5'&p4'   C=0   [64,80)
    p6&p5'&p4    C=2   [80,96)
  p6&p5          C=8
    p6&p5&p4'    C=2   [96,112)
    p6&p5&p4     C=6   [112,128)

Walking the $2^4$-wide intervals:
[000 0000, 000 1111] = [0,16) has 1 point, z1. This is a $2^4$ thinning. z1od=11 is only 5 units from the right edge, so z1 is not yet declared an outlier. Next, we check the minimum distance from the left edge of the next interval to see whether z1's right-side gap is actually $2^4$ (the calculation of the min is a pTree process, no x looping required!).
[001 0000, 001 1111] = [16,32). The minimum, z3od=23, is 7 units from the left edge, 16, so z1 has only a 5+7=12 unit gap on its right (not a $2^4$ gap). So z1 is not declared a $2^4$ outlier (it is declared a $2^4$ inlier).
[010 0000, 010 1111] = [32,48). z4od=34 is within 2 of 32, so z4 is not declared an anomaly.
[011 0000, 011 1111] = [48,64). z5od=53 is 19 from z4od=34 (> $2^4$) but only 11 from 64. However, the next interval, [64,80), is empty, so z5 is 27 from its right neighbor. z5 is declared an outlier and we put a subcluster cut through z5.
[100 0000, 100 1111] = [64,80). Empty: this is clearly a $2^4$ gap.
[101 0000, 101 1111] = [80,96). z6od=80, zfod=83.
[110 0000, 110 1111] = [96,112). zbod=110, zdod=109. So both of {z6,zf} are declared outliers (a gap of at least 16 on both sides).
[111 0000, 111 1111] = [112,128). z7od=118, z8od=114, z9od=125, zaod=114, zcod=121, zeod=125. No $2^4$ gaps. But we can consult SpS(d2(x,y)) for the actual distances, which reveals that there are no $2^4$ gaps in this subcluster.
And, incidentally, it reveals a 5.8 gap between {z7,z8,z9,za} and {zb,zc,zd,ze}, but that analysis is messy and the gap would be revealed by the next xofM round on this subcluster anyway.

Pairwise distances dX1X2 within the subcluster (z10..z14 denote za..ze):
        z8    z9    z10   z11   z12   z13   z14
z7      1.4   2.0   3.6   9.4   9.8   11.7  10.8
z8            1.4   2.2   8.1   8.5   10.3  9.5
z9                  2.2   7.8   8.1   10.0  8.9
z10                       5.8   6.3   8.1   7.3
z11                             1.4   2.2   2.2
z12                                   2.2   1.0
z13                                         2.0
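The interval counts used by the Gap Revealer can be reproduced from the bit slices of F alone. The sketch below uses the F=zod values from the slide; plain Python lists of 0/1 stand in for pTrees (an assumption, since real pTrees are compressed structures), and the function names are hypothetical.

```python
F = [11, 27, 23, 34, 53, 80, 118, 114, 125, 114, 110, 121, 109, 125, 83]
NAMES = ["z1", "z2", "z3", "z4", "z5", "z6", "z7", "z8", "z9",
         "za", "zb", "zc", "zd", "ze", "zf"]

def bit_slice(values, k):
    """Bit k of every value: the uncompressed analogue of the pTree p_k."""
    return [(v >> k) & 1 for v in values]

def interval_masks(values, high_bit, low_bit):
    """Masks of the intervals of width 2**low_bit, built only from ANDs of slices and complements."""
    masks = {0: [1] * len(values)}
    for k in range(high_bit, low_bit - 1, -1):
        p = bit_slice(values, k)
        p_not = [1 - b for b in p]
        split = {}
        for prefix, mask in masks.items():
            split[2 * prefix] = [m & b for m, b in zip(mask, p_not)]  # ... & p_k'
            split[2 * prefix + 1] = [m & b for m, b in zip(mask, p)]  # ... & p_k
        masks = split
    width = 2 ** low_bit
    return {prefix * width: mask for prefix, mask in masks.items()}

masks = interval_masks(F, high_bit=6, low_bit=4)
counts = {left: sum(mask) for left, mask in sorted(masks.items())}
empty = [left for left, c in counts.items() if c == 0]
members = {left: [n for n, b in zip(NAMES, mask) if b] for left, mask in masks.items()}

for left, c in counts.items():
    print(f"[{left},{left + 16})", c, members[left])
print("empty intervals start at:", empty)
```

The only empty interval is [64,80), which is the $2^4$ gap that isolates z5 on its right and {z6,zf} on their left in the walk-through.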
FAUST Tube Clustering (this method attempts to build tubular-shaped gaps around clusters). It allows for a better fit around convex clusters that are elongated in one direction (not round).

[Figure: a tube around the line through p and q (or p and M), with labels: tube cap gap width, tube radius gap width, dot product projection length, dot product projection distance.]

Gaps in dot product lengths (projections) on the line: the dot product projection length of y on f is $(y\circ f)/|f|$.

Squared y on f projection distance:

$$\left|\,y-\frac{y\circ f}{f\circ f}\,f\,\right|^2 = y\circ y-2\,\frac{(y\circ f)^2}{f\circ f}+\frac{(y\circ f)^2}{(f\circ f)^2}\,f\circ f = y\circ y-\frac{(y\circ f)^2}{f\circ f}$$

Squared y-p on q-p projection distance:

$$(y-p)\circ(y-p)-\frac{\big((y-p)\circ(q-p)\big)^2}{(q-p)\circ(q-p)} = y\circ y-2\,y\circ p+p\circ p-\left(\frac{y\circ(q-p)}{|q-p|}-\frac{p\circ(q-p)}{|q-p|}\right)^2$$

For the dot product length projections (caps) we already needed:

$$(y-p)\circ\frac{M-p}{|M-p|} = \frac{y\circ(M-p)}{|M-p|}-\frac{p\circ(M-p)}{|M-p|}$$

Exhaustive search for all tubular gaps: it takes two parameters for a pseudo-exhaustive search (exhaustive modulo a grid width).
1. A StartPoint, p (an n-vector, so n dimensional).
2. A UnitVector, d (an n-direction, so n-1 dimensional: a grid on the surface of the sphere in R^n).

Then for every choice of (p,d) (e.g., in a grid of points in R^(2n-1)) two functionals are used to enclose subclusters in tubular gaps.
a. SquareTubeRadius functional, STR(y) = (y-p)o(y-p) - ((y-p)od)^2
b. TubeLength functional, TL(y) = (y-p)od

Given a p, do we need a full grid of ds (directions)? No! d and -d give the same TL-gaps. Given d, do we need a full grid of p starting points? No! All p' s.t. p'=p+cd give the same gaps. Hill-climb the gap width from a good starting point and direction.

MATH: we need the dot product projection length and the dot product projection distance (in red on the slide). That is, we need to compute the green constants and the blue and red dot product functionals in an optimal way (and then do the PTreeSet additions/subtractions/multiplications). What is optimal? Minimizing PTreeSet functional creations and PTreeSet operations.
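As a rough sketch of the two tube functionals, the following NumPy code evaluates TL and STR for a small set of points. It assumes dense arrays instead of PTreeSets; the function name `tube_functionals` and the sample points are hypothetical.

```python
import numpy as np

def tube_functionals(Y, p, d):
    """Return (TL, STR) for each row y of Y, given start point p and direction d."""
    Y = np.asarray(Y, dtype=float)
    p = np.asarray(p, dtype=float)
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)                        # normalise d to a unit vector
    V = Y - p
    tl = V @ d                                       # TL(y)  = (y-p) o d
    strad = np.einsum("ij,ij->i", V, V) - tl ** 2    # STR(y) = (y-p)o(y-p) - ((y-p)od)^2
    return tl, strad

Y = [[3.0, 4.0], [1.0, 0.0], [-2.0, 1.0]]
tl, strad = tube_functionals(Y, p=[0, 0], d=[2, 0])
# Sliding the start point along d (p' = p + 5d) shifts TL by a constant and leaves STR alone.
tl2, strad2 = tube_functionals(Y, p=[5, 0], d=[2, 0])
print(tl, strad, tl2, strad2)
```

Because shifting p along d only translates TL and leaves STR unchanged, the gaps in both functionals are the same, which is why a full grid of starting points is unnecessary.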
Cone Clustering (finding cone-shaped clusters)

F=(y-M)o(x-M)/|x-M|-mn restricted to a cosine cone on IRIS. Cosine cone gap (over some angle): gap in dot product projections onto the cornerpoints line.

Each listing below gives "F-value count" pairs, followed (where the slide shows them) by the ids of individual points, and ends with the total count or the slide's summary.

Corner points:

x=s1 cone=1/√2: 60 3, 61 4, 62 3, 63 10, 64 15, 65 9, 66 3, 67 1, 69 2 (total 50)

x=s2 cone=1/√2: 47 1, 59 2, 60 4, 61 3, 62 6, 63 10, 64 10, 65 5, 66 4, 67 4, 69 1, 70 1 (total 51)

x=s2 cone=.9: 59 2, 60 3, 61 3, 62 5, 63 9, 64 10, 65 5, 66 4, 67 4, 69 1, 70 1 (total 47)

x=s2 cone=.1: 39 2, 40 1, 41 1, 44 1, 45 1, 46 1, 47 1, 52 1 i39, 59 2, 60 4, 61 3, 62 6, 63 10, 64 10, 65 5, 66 4, 67 4, 69 1, 70 1 (total 59)

x=i1 cone=.707: 34 1, 35 1, 36 2, 37 2, 38 3, 39 5, 40 4, 42 6, 43 2, 44 7, 45 5, 47 2, 48 3, 49 3, 50 3, 51 4, 52 3, 53 2, 54 2, 55 4, 56 2, 57 1, 58 1, 59 1, 60 1, 61 1, 62 1, 63 1, 64 1, 66 1 (total 75)

x=e1 cone=.707: 33 1, 36 2, 37 2, 38 3, 39 1, 40 5, 41 4, 42 2, 43 1, 44 1, 45 6, 46 4, 47 5, 48 1, 49 2, 50 5, 51 1, 52 2, 54 2, 55 1, 57 2, 58 1, 60 1, 62 1, 63 1, 64 1, 65 2 (total 60)

w maxs cone=.707: 0 2, 8 1, 10 3, 12 2, 13 1, 14 3, 15 1, 16 3, 17 5, 18 3, 19 5, 20 6, 21 2, 22 4, 23 3, 24 3, 25 9, 26 3, 27 3, 28 3, 29 5, 30 3, 31 4, 32 3, 33 2, 34 2, 35 2, 36 4, 37 1, 38 1, 40 1, 41 4, 42 5, 43 5, 44 7, 45 3, 46 1, 47 6, 48 6, 49 2, 51 1, 52 2, 53 1, 55 1 (total 137)

w maxs cone=.925: 8 1 i10, 13 1, 14 3, 16 3, 17 2, 18 2, 19 3, 20 4, 21 1, 24 1, 25 5, 26 1 e21 e34, 27 2, 28 1, 29 2, 31 1 e35, 37 1 i7. 31/34 are i's.

w maxs cone=.93: 8 1 i10, 13 1, 14 3, 16 2, 17 2, 18 1, 19 3, 20 4, 21 1, 24 1, 25 4, 26 1 e21 e34, 27 2, 29 2, 37 1 i7. 27/29 are i's.

w maxs-to-mins cone=.939: 14 1 i25, 16 1 i40, 18 2 i16 i42, 19 2 i17 i38, 20 2 i11 i48, 22 2, 23 1, 24 4 i34 i50, 25 3 i24 i28, 26 3 i27, 27 5, 28 3, 29 2, 30 2, 31 3, 32 4, 34 3, 35 4, 36 2, 37 2, 38 2, 39 3, 40 1, 41 2, 46 1, 47 2, 48 1, 49 1 i39, 53 1, 54 2, 55 1, 56 1, 57 8, 58 5, 59 4, 60 7, 61 4, 62 5, 63 5, 64 1, 65 3, 66 1, 67 1, 68 1 (total 114). 14 i and 100 s/e. So picks i as 0.

w naaa-xaaa cone=.95: 12 1, 13 2, 14 1, 15 2, 16 1, 17 1, 18 4, 19 3, 20 2, 21 3, 22 5, 23 6 i21, 24 5, 25 1, 27 1, 28 1, 29 2, 30 2 i7. 41/43 e, so picks e.

w aaan-aaax cone=.54: 7 3 i27 i28, 8 1, 9 3, 10 12 i20 i34, 11 7, 12 13, 13 5, 14 3, 15 7, 19 1, 20 1, 21 7, 22 7, 23 28, 24 6. 100/104 s or e, so 0 picks i.

w xnnn-nxxx cone=.95: 8 2 i22 i50, 10 2, 11 2 i28, 12 4 i24 i27 i34, 13 2, 14 4, 15 3, 16 8, 17 4, 18 7, 19 3, 20 5, 21 1, 22 1, 23 1, 34 1 i39. 43/50 e, so picks out e.

Cosine conical gapping seems quick and easy (cosine = dot product divided by both lengths). The length of the fixed vector, x-M, is a one-time calculation. The length of y-M changes with y, so build the PTreeSet.
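A minimal sketch of the cosine cone restriction, assuming plain NumPy arrays rather than PTreeSets: it computes the projection (y-M)o(x-M)/|x-M| for every point and keeps only the points whose cosine with the cone axis meets the threshold. The function name `cone_projection` and the toy points are hypothetical, and the slide's subtraction of the minimum ("-mn") is omitted.

```python
import numpy as np

def cone_projection(Y, M, x, cone):
    """Projection of y-M on the axis x-M, plus a mask of points inside the cosine cone."""
    Y = np.asarray(Y, dtype=float)
    M = np.asarray(M, dtype=float)
    axis = np.asarray(x, dtype=float) - M
    axis = axis / np.linalg.norm(axis)        # |x-M| is a one-time calculation
    V = Y - M
    proj = V @ axis                           # (y-M) o (x-M) / |x-M|
    lengths = np.linalg.norm(V, axis=1)       # |y-M| changes with y
    cosine = np.divide(proj, lengths, out=np.zeros_like(proj), where=lengths > 0)
    return proj, cosine >= cone

Y = [[2, 0], [1, 1], [0, 3], [-1, 0]]
proj, wide = cone_projection(Y, M=[0, 0], x=[1, 0], cone=0.707)
_, narrow = cone_projection(Y, M=[0, 0], x=[1, 0], cone=0.9)
print(proj, wide, narrow)
```

Gaps would then be sought only among the projection values of the points inside the cone; narrowing the cone (a larger cosine threshold) admits fewer points.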
"Gap Hill Climbing": mathematical analysis

Rotate d toward a higher F-STD, or grow one gap using support pairs.

[Figure: two scatter plots on a hexadecimal grid (axes 0-f) with points labeled 0-9 and a-s, marking p and q and comparing the d1-gap with the d2-gap.]

F-slices are hyperplanes (assuming F=dot d), so it makes sense to try to "re-orient" d so that the gap grows. Instead of taking the "improved" p and q to be the means of the entire n-dimensional half-spaces cut by the gap (or thinning), take p and q to be the means of the F-slice (n-1)-dimensional hyperplanes defining the gap or thinning. This is easy, since our method produces the pTree mask, the sequence of F-values and the sequence of counts of points that give us those values, all of which we use to find large gaps in the first place.

In the figure the d2-gap is much larger than the d1-gap (still not optimal). Weight the mean by the distance from the gap (d-barrel radius)? In this example it seems to make for a larger gap, but what weightings should be used (e.g., 1/radius^2)? (Zero weighting after the first gap is identical to the previous.) Also, we really want to identify the support vector pair of the gap (the pair, one from each side, which are closest together) as p and q (in this case, 9 and a, but we were just lucky to draw our vector through them). We could check the d-barrel radius of just these gap slice pairs and select the closest pair as p and q?

Dot F, p=aaan q=aaax (F-value count):
0 6, 1 28, 2 7, 3 7, 4 1, 5 1, 9 7, 10 3, 11 5, 12 13, 13 8, 14 12, 15 4, 16 2, 17 12, 18 5, 19 6, 20 6, 21 3, 22 8, 23 3, 24 3
C1<7 (50 Set); 7<C2<16 (4i, 48e); C3>16 (46i, 2e)

Next we attempt to hill-climb the gap at 16 using the means of the half-spaces on either side of the boundary (i.e., p is avg=14; q is avg=17).

C123, p avg=14, q avg=17 (F-value count):
0 1, 2 3, 3 2, 4 4, 5 7, 6 4, 7 8, 8 2, 9 11, 10 4, 12 3, 13 1, 20 1, 21 1, 22 2, 23 1, 27 2, 28 1, 29 1, 30 2, 31 4, 32 2, 33 3, 34 4, 35 1, 36 3, 37 4, 38 2, 39 2, 40 5, 41 3, 42 3, 43 6, 44 8, 45 1, 46 2, 47 1, 48 3, 49 3, 51 7, 52 2, 53 2, 54 3, 55 1, 56 3, 57 3, 58 1, 61 2, 63 2, 64 1, 66 1, 67 1
Here, the gap between C1 and C2 is more pronounced. Why? Is the thinning between C2 and C3 more obscure? It did not grow the gap we wanted to grow (between C2 and C3).

Hill-climb the gap at 16 with half-space averages: C2uC3, p=avg<16, q=avg>16 (F-value count):
0 1, 1 1, 2 2, 3 1, 7 2, 9 2, 10 2, 11 3, 12 3, 13 2, 14 5, 15 1, 16 3, 17 3, 18 2, 19 2, 20 4, 21 5, 22 2, 23 5, 24 9, 25 1, 26 1, 27 3, 28 2, 29 1, 30 3, 31 5, 32 2, 33 3, 34 3, 35 1, 36 2, 37 4, 38 1, 39 1, 42 2, 44 1, 45 2, 47 2
No conclusive gaps. There is a thinning at 22 and it is the same one, but it is not more prominent.

Sparse Lo end: check [0,9] distances

F     0    1    2    2    3    7    7    9    9
      i39  e49  e8   e44  e11  e32  e30  e15  e31
i39   0    17   21   21   24   22   19   19   23
e49   17   0    4    4    7    8    8    9    9
e8    21   4    0    1    5    7    8    10   8
e44   21   4    1    0    4    6    8    9    7
e11   24   7    5    4    0    7    9    11   7
e32   22   8    7    6    7    0    3    6    1
e30   19   8    8    8    9    3    0    4    4
e15   19   9    10   9    11   6    4    0    6
e31   23   9    8    7    7    1    4    6    0

i39, e49, e11 are singleton outliers. {e8,e44} is a doubleton outlier set.

Sparse Hi end: check [38,47] distances

F     38   39   42   42   44   45   45   47   47
      i31  i8   i36  i10  i6   i23  i32  i18  i19
i31   0    3    5    10   6    7    12   12   10
i8    3    0    7    10   5    6    11   11   9
i36   5    7    0    8    5    7    9    10   9
i10   10   10   8    0    10   12   9    9    14
i6    6    5    5    10   0    3    9    8    5
i23   7    6    7    12   3    0    11   10   4
i32   12   11   9    9    9    11   0    4    13
i18   12   11   10   9    8    10   4    0    12
i19   10   9    9    14   5    4    13   12   0

i10, i18, i19, i32, i36 are singleton outliers. {i6,i23} is a doubleton outlier set.
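One hill-climbing step of the half-space-average variant can be sketched as follows. This is an illustration under assumptions, not the pTree implementation: it finds the largest gap in F = X o d, takes p and q as the means of the two half-spaces, and re-orients d along q-p. The names `largest_gap` and `hill_climb_step` and the toy data are hypothetical.

```python
import numpy as np

def largest_gap(F):
    """Width of the largest gap in the sorted F values and the midpoint of that gap."""
    s = np.sort(F)
    gaps = np.diff(s)
    i = int(np.argmax(gaps))
    return float(gaps[i]), float((s[i] + s[i + 1]) / 2)

def hill_climb_step(X, d):
    """Re-orient d along q-p, where p and q are the means on either side of the gap."""
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    d = d / np.linalg.norm(d)
    F = X @ d
    gap, cut = largest_gap(F)
    p = X[F < cut].mean(axis=0)       # mean of the low half-space
    q = X[F > cut].mean(axis=0)       # mean of the high half-space
    new_d = (q - p) / np.linalg.norm(q - p)
    new_gap, _ = largest_gap(X @ new_d)
    return new_d, gap, new_gap

# Two small clusters; the initial direction is the first coordinate axis.
X = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
new_d, old_gap, new_gap = hill_climb_step(X, d=[1, 0])
print(new_d, old_gap, new_gap)
```

On this toy data the gap widens after one step. As the slide notes, the half-space means do not always grow the gap one wants, which motivates using F-slice means or the support pair instead.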
Hierarchical Clustering

Any maximal anti-chain (a maximal set of nodes such that no two are directly connected) is a clustering (a dendrogram offers many). But horizontal anti-chains are the clusterings produced by top-down (or bottom-up) methods.

[Figure: dendrogram with leaves A, B, C, D, E, F, G and internal nodes BC, DE, FG, ABC, DEFG.]
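To make the "dendrogram offers many" remark concrete, the sketch below enumerates every maximal anti-chain of a small dendrogram. The tree shape is an assumption reconstructed from the node labels on the slide (ABC, DEFG, BC, DE, FG over leaves A-G), and the names `TREE` and `clusterings` are hypothetical.

```python
from itertools import product

# Assumed dendrogram: each internal node maps to its two children.
TREE = {"ABCDEFG": ["ABC", "DEFG"], "ABC": ["A", "BC"], "BC": ["B", "C"],
        "DEFG": ["DE", "FG"], "DE": ["D", "E"], "FG": ["F", "G"]}

def clusterings(node):
    """All maximal anti-chains (clusterings) of the subtree rooted at node."""
    result = [[node]]                      # the node itself is one clustering
    children = TREE.get(node, [])
    if children:
        # Otherwise, combine one clustering from each child subtree.
        for combo in product(*(clusterings(c) for c in children)):
            result.append([n for part in combo for n in part])
    return result

all_cuts = clusterings("ABCDEFG")
three = sorted(sorted(c) for c in all_cuts if len(c) == 3)
print(len(all_cuts), three)
```

Even this seven-leaf tree offers sixteen clusterings, two of which have exactly three clusters, so choosing among anti-chains needs some extra criterion such as cluster size or known class counts.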
GV: Concrete(C, W, FA, A), F=(DPP-MN)/4

[Figure: dendrogram over clusters C1, C2, C3, C4; node labels: med=71, med=40, med=18, med=61, med=14, med=56, med=10, med=62, med=86, med=57, med=34, med=9, med=21, med=23, med=71, med=33, med=17.]

First round (F-value count):
0 1, 1 1, 5 1, 6 1, 7 1, 8 4, 9 1, 10 1, 11 2, 12 1, 13 5, 14 1, 15 3, 16 3, 17 4, 18 1, 19 3, 20 9, 21 4, 22 3, 23 7, 24 2, 25 4, 26 8, 27 7, 28 7, 29 10, 30 3, 31 1, 32 3, 33 6, 34 4, 35 5, 37 2, 38 2, 40 1, 42 3, 43 1, 44 1, 45 1, 46 4, 49 1, 56 1, 58 1, 61 1, 65 1, 66 1, 69 1, 71 1, 77 1, 80 1, 83 1, 86 1, 100 1, 103 1, 105 1, 108 2, 112 1

First-round clusters:
CLUS 4
gap=7
[52,74)    0L 7M 0H      CLUS_3
gap=6
[74,90)    0L 4M 0H      CLUS_2
[0,90)     43L 46M 55H
gap=14
[90,113)   0L 6M 0H      CLUS_1
At this level, FinalClus1={17M}, 0 errors.

CLUS 4 (F=(DPP-MN)/2, F-gap ≥ 2), F-value count:
0 3, 7 4, 9 1, 10 12, 11 8, 12 7, 15 4, 18 10, 21 3, 22 7, 23 2, 25 2, 26 3, 27 1, 28 2, 29 1, 31 3, 32 1, 34 2, 40 4, 47 3, 52 1, 53 3, 54 3, 55 4, 56 2, 57 3, 58 1, 60 2, 61 2, 62 4, 64 4, 67 2, 68 1, 71 7, 72 3, 79 5, 85 1, 87 2

F range   L   M   H   Cluster      gap  Median  Avg    Note
=0        0   0   3   CLUS 4.4.1   7    0       0
=7        0   0   4   CLUS 4.4.2   2    7       7
[8,14]    1   5   22  CLUS 4.4.3   3    11      10.7   1L+5M errs in H
=15       0   0   4   CLUS 4.3.1   3    15      15
=18       0   0   10  CLUS 4.3.2   3    18      18
[20,24)   0   10  2   CLUS 4.7.2   2    22      22     2H errs in L
[24,30)   10  0   0   CLUS 4.7.1   2    26      26
[30,33]   0   4   0   CLUS 4.2.1   2    31      32.3
=34       0   2   0   CLUS 4.2.2   6    34      34
=40       0   4   0   CLUS 4.2.3   7    40      40
=47       0   3   0   CLUS 4.2.4   5    47      47
[50,59)   12  1   4   CLUS 4.8.1   2    55      55     1M+4H errs in L
[59,63)   8   0   0   CLUS 4.8.2   2    61.5    61.3
=64       2   0   2   CLUS 4.6.1   3    64      64     2H errs in L
[66,70)   10  0   0   CLUS 4.6.2   3    67      67.3
[70,79)   10  0   0   CLUS 4.5     7    71      71.7
=79       5   0   0   CLUS 4.1.1   6    79      79
[74,90)   2   0   1   CLUS 4.1          87      86.3   1 M err in L

Accuracy=90%

Suppose we know (or want) 3 strength clusters: Low, Medium and High. We can use an anti-chain that gives us exactly 3 subclusters in two ways, one shown in brown and the other in purple. Which would we choose? The brown seems to give slightly more uniform subcluster sizes. Brown error count: Low (bottom) 11, Medium (middle) 0, High (top) 26, so 96/133=72% accurate. Purple error count: Low 2, Medium 22, High 35, so 74/133=56% accurate.

What about agglomerating using single link agglomeration (minimum pairwise distance)? Agglomerate (build the dendrogram) by iteratively gluing together the clusters with minimum Median separation.

Should I have normalized the rounds? Should I have used the same F divisor and made sure the range of values was the same in the 2nd round as it was in the 1st round (on CLUS 4)? Can I normalize after the fact, by multiplying 1st round values by 88/50=1.76? Agglomerate the 1st round clusters and then independently agglomerate the 2nd round clusters?

CONCRETE
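The first-round split can be reproduced from the F-value histogram alone. The sketch below assumes the histogram as transcribed from the slide, and cuts wherever consecutive F values differ by at least a chosen gap; the names `HIST` and `gap_clusters` are hypothetical.

```python
# First-round histogram for Concrete, F=(DPP-MN)/4: F-value -> count (from the slide).
HIST = {0: 1, 1: 1, 5: 1, 6: 1, 7: 1, 8: 4, 9: 1, 10: 1, 11: 2, 12: 1, 13: 5,
        14: 1, 15: 3, 16: 3, 17: 4, 18: 1, 19: 3, 20: 9, 21: 4, 22: 3, 23: 7,
        24: 2, 25: 4, 26: 8, 27: 7, 28: 7, 29: 10, 30: 3, 31: 1, 32: 3, 33: 6,
        34: 4, 35: 5, 37: 2, 38: 2, 40: 1, 42: 3, 43: 1, 44: 1, 45: 1, 46: 4,
        49: 1, 56: 1, 58: 1, 61: 1, 65: 1, 66: 1, 69: 1, 71: 1, 77: 1, 80: 1,
        83: 1, 86: 1, 100: 1, 103: 1, 105: 1, 108: 2, 112: 1}

def gap_clusters(hist, min_gap):
    """Split the F values wherever consecutive values are at least min_gap apart.

    Returns one (lowest F, highest F, point count) tuple per cluster.
    """
    values = sorted(hist)
    clusters = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if cur - prev >= min_gap:
            clusters.append([cur])        # gap found: start a new cluster
        else:
            clusters[-1].append(cur)
    return [(c[0], c[-1], sum(hist[v] for v in c)) for c in clusters]

result = gap_clusters(HIST, min_gap=6)
print(result)
```

With a minimum gap of 6 this yields four clusters of 133, 7, 4 and 6 points, matching CLUS 4, CLUS_3, CLUS_2 and CLUS_1 and the 133-point denominator used in the accuracy figures.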
Agglomerating using single link (min pairwise distance = min gap size!): glue the min-gap adjacent clusters first, working from the CLUS 4 (F=(DPP-MN)/2, F-gap ≥ 2) histogram and sub-cluster table given above.

The first thing we notice is that outliers mess up agglomerations which are supervised by knowledge of the number of subclusters expected. Therefore we might remove outliers by backing away from all gap ≥ 5 agglomerations, then looking for 3-subcluster maximal anti-chains. What we have done is to declare F<7 and F>84 as extreme tripleton outlier sets, and F=79, F=40 and F=47 as singleton outlier sets, because they are F-gapped by at least 5 (which is actually 10 in DPP units, since F=(DPP-MN)/2) on either side.

The brown gives more uniform sizes. Brown errors: Low (bottom) 8, Medium (middle) 12 and High (top) 6, so 107/133=80% accurate.

The one decision to agglomerate C4.7.1 to C4.7.2 (gap=3) instead of C4.3.2 to C4.7.2 (gap=3) causes lots of error. C4.7.1 and C4.7.2 are problematic since they do separate out, but in increasing F order the pattern is H M L M L, so if we suspected this pattern we would look for 5 subclusters. The 5 orange errors in increasing F-order are: 6, 2, 0, 0, 8 so 127/133=95% accurate.

If you have ever studied concrete, you know it is a very complex material. The fact that it clusters out with an F-order pattern of HMLML is just bizarre! So we should expect errors.
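On a single F axis, single-link agglomeration reduces to repeatedly merging the two adjacent clusters separated by the smallest gap. The following sketch shows that on made-up values (not the Concrete data); the function name `single_link_1d` is hypothetical, and ties are broken by taking the leftmost smallest gap.

```python
def single_link_1d(values, k):
    """Merge adjacent clusters with the smallest gap until only k clusters remain."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # Single-link distance between adjacent clusters = gap between their edges.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

values = [0, 1, 2, 10, 11, 30]          # illustrative values; 30 is an outlier
three = single_link_1d(values, 3)
two = single_link_1d(values, 2)
print(three, two)
```

The lone value 30 uses up one of the requested clusters, which is the effect noted above: when the number of subclusters is fixed in advance, outliers should be set aside before agglomerating.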