
FAUST Analytics: Classification, Clustering, and Outlier Detection

This tool allows for the analysis of classified training sets through various functionals, including count change clustering, density thresholds, and linear and spherical radial classifiers.

randolphc




Presentation Transcript


  1. FAUST Analytics

     Notation. $X(X_1, \dots, X_n) \subseteq \mathbb{R}^n$, $|X| = N$. If X is a classified training set with classes $C = \{C_1, \dots, C_K\}$, write $X(X_1, \dots, X_n, C)$. $d = (d_1, \dots, d_n)$ with $|d| = 1$, and $p = (p_1, \dots, p_n) \in \mathbb{R}^n$.

     We have functionals $F: \mathbb{R}^n \to \mathbb{R}$, $F = L, S, R$ (as well as others, but these are the focus here):

     $L_{d,p} \equiv (X-p) \circ d = X \circ d - p \circ d = L_d - p \circ d$, where $L_D \equiv X \circ D$ for any vector $D$.

     $S_p \equiv (X-p) \circ (X-p) = X \circ X + X \circ (-2p) + p \circ p = L_{-2p} + X \circ X + p \circ p$

     $R_{d,p} \equiv S_p - L_{d,p}^2 = X \circ X + L_{-2p} + p \circ p - (L_d)^2 + 2(p \circ d)(X \circ d) - (p \circ d)^2 = L_{-2p + 2(p \circ d)d} - (L_d)^2 + p \circ p - (p \circ d)^2 + X \circ X$

     Assuming $X \circ X$ is pre-calculated, for all three calculate $L_d$ and $L_{-2p}$ and do pTree arithmetic (if just L and R are needed, calculate $L_d$ and $L_{-2p + 2(p \circ d)d}$).

     $FPCC_{d,p,k,j}$ = the j-th precipitous count change (from left to right) of $F_{d,p,k}$. The same notation is used for PCIs and PCDs (precipitous count increases and decreases). $Fmin_{d,p,k} = \min(F_{d,p} \,\&\, C_k)$, $Fmax_{d,p,k} = \max(F_{d,p} \,\&\, C_k)$.

     GAP: Gap Clusterer. If the DensityThreshold, DT, isn't reached, cut C mid-gap of $L_{d,p} \,\&\, C$ using the next (d,p) from dpSet.

     PCC: Precipitous Count Change Clusterer. If DT isn't reached, cut C at the PCCs of $L_{d,p} \,\&\, C$ using the next (d,p) from dpSet. A fusion step may be required? Use density, proximity, or Pillar pkMeans (next slide).

     TKO: Top K Outlier Detector. Use $D2NN = rank_2 S_x$ for the TopKOutlier-slider, or use $Rk_iPtr(x, PtrRank_i S_x)$. $Rk_iSD(x, Rank_i S_x)$ is ordered descending on $rank_i S_x$ as it is constructed.

     LIN: Linear Classifier. $y \in C_k$ iff $y \in LH_k \equiv \{z \mid \min L_{d,p,k} \le L_{d,p}(z) \le \max L_{d,p,k}\ \ \forall (d,p) \in dpSet\}$. $LH_k$ is a linear hull around $C_k$. dpSet is a set of (d,p) pairs, e.g., (Diag, DiagStartPt).

     LSR: Linear Spherical Radial Classifier. $y \in C_k$ iff $y \in LSRH_k \equiv \{z \mid \min F_{d,p,k} \le F_{d,p}(z) \le \max F_{d,p,k}\ \ \forall (d,p) \in dpSet,\ F = L, S, R\}$. (Examine and remove outliers first, then use the first PCI instead of min and the last PCD instead of max?)

     Express the hulls as decision trees, one for every d. Then y isa k iff y isa k in every d-tree. Build each d-tree using $L_d$ at the root; then, from any multi-class inode, use F = L, R, S with $d = AvC_i \to AvC_j$ and $p = AvC_i$ for every distinct pair $C_i, C_j$ that has nonempty restrictions at that node, using every F = L, S, R except the parent's. This assumes convex classes. If it's known or suspected that there are non-convex classes, judicious use of PCCs may provide tighter hulls.

     What should we pre-compute besides $X \circ X$? stats (min/avg/max/std); $X \circ p$ with p = class_Avg/Med; $X \circ d$; $X \circ x$; $d^2(X,x)$; $Rk_i\, d^2(X,x)$; $L_{d,p}$; $R_{d,p}$.

     We need a "Basic pTree Operations Timing Manual" to show users the cost of various pTree computations.
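The three functionals can be checked numerically on a toy data set. The following is a minimal sketch in plain Python lists, not pTree arithmetic; the helper names (`dot`, `L`, `S`, `R`, `R_expanded`) and the sample points are assumptions made for illustration. `R_expanded` uses the form obtained by expanding $-(L_d - p \circ d)^2$, whose cross term is $+2(p \circ d)L_d$, and is compared against the direct definition $S_p - L_{d,p}^2$.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def L(X, d, p):
    """L_{d,p}(x) = (x - p) o d = x o d - p o d, for every row x of X."""
    pd = dot(p, d)
    return [dot(x, d) - pd for x in X]

def S(X, p):
    """S_p(x) = (x - p) o (x - p) = x o x - 2 x o p + p o p."""
    pp = dot(p, p)
    return [dot(x, x) - 2 * dot(x, p) + pp for x in X]

def R(X, d, p):
    """R_{d,p}(x) = S_p(x) - L_{d,p}(x)^2 (squared distance to the d-line through p)."""
    return [s - l * l for s, l in zip(S(X, p), L(X, d, p))]

def R_expanded(X, d, p):
    """Same value via L_w - (L_d)^2 + p o p - (p o d)^2 + x o x, with w = -2p + 2(p o d)d."""
    pd = dot(p, d)
    w = [-2 * pi + 2 * pd * di for pi, di in zip(p, d)]
    pp = dot(p, p)
    return [dot(x, w) - dot(x, d) ** 2 + pp - pd ** 2 + dot(x, x) for x in X]

# Hypothetical toy data: three points in the plane, d a unit vector.
X = [[3.0, 4.0], [1.0, 1.0], [0.0, 2.0]]
d = [1.0, 0.0]
p = [1.0, 1.0]

l_vals = L(X, d, p)
s_vals = S(X, p)
r_vals = R(X, d, p)
r_alt = R_expanded(X, d, p)
```

Only `dot(x, d)`, `dot(x, w)` and `dot(x, x)` depend on the data rows, which matches the remark that one L computation per vector plus the pre-calculated $X \circ X$ is enough.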

  2. Next we consider ways of clustering these MOTHER GOOSE stories (used to construct MG44docs60words):

     1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?
     2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
     3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
     4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
     5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again.
     6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day.
     7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there the cupboard was bare and so the poor dog had none. She went to the baker to buy him some bread. When she came back the dog was dead.
     8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean.
     9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
     10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper.
     11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again
     12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
     13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again.
     14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!
     15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho!
     16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring.
     17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin!
     18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know.
     21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town.
     22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose.
     23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light.
     25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.
     26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep.
     27. Cry baby cry. Put your finger in your eye and tell your mother it was not I.
     28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane.
     29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs.
     30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
     32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had
     33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair!
     35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
     36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches.
     37. Here we go round the mulberry bush, the mulberry bush, the mulberry bush. Here we go round the mulberry bush, on a cold and frosty morning. This is the way we wash our hands, wash our hands, wash our hands. This is the way we wash our hands, on a cold and frosty morning. This is the way we wash our clothes, wash our clothes, wash our clothes. This is the way we wash our clothes, on a cold and frosty morning. This is the way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning.
     38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
     39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
     41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
     42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken.
     43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns.
     44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will.
     45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in.
     46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.
     47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise.
     48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.
     49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid.
     50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!
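A documents-by-words table such as MG44docs60words can be built from the stories with a few lines of code. The sketch below is an illustration only: the slides do not list the 60 words or say whether cells hold counts or 0/1 flags, so the five-word `vocab`, the count-valued cells, and the function names are all assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def doc_term_matrix(docs, vocab):
    """Return {doc_id: [count of vocab[0], count of vocab[1], ...]}."""
    index = {word: j for j, word in enumerate(vocab)}
    rows = {}
    for doc_id, text in docs.items():
        row = [0] * len(vocab)
        for tok in tokenize(text):
            j = index.get(tok)
            if j is not None:  # ignore words outside the chosen vocabulary
                row[j] += 1
        rows[doc_id] = row
    return rows

# Three of the stories, keyed by their document numbers.
docs = {
    9: "Hush baby. Daddy is near. Mamma is a lady and that is very clear.",
    27: "Cry baby cry. Put your finger in your eye and tell your mother it was not I.",
    41: "Old King Cole was a merry old soul. And a merry old soul was he.",
}
# Hypothetical vocabulary; the real data set uses 60 words not listed here.
vocab = ["baby", "cry", "king", "old", "merry"]

matrix = doc_term_matrix(docs, vocab)
```

Each row of `matrix` is one point of X in the notation of the first slide, with one coordinate per vocabulary word.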

  3. Here I have acted as an "expert" and declared classes for MG44d60w, with a theme for each. I wanted to see if we can hull these classes.

     CLASS1: SOOTHING BABIES (count=4)
     9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
     26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep.
     27. Cry baby cry. Put your finger in your eye and tell your mother it was not I.
     45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in.

     CLASS2: POWER OF THE MONARCH (count=3)
     5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again.
     35. Sing a song of sixpence, a pocket full of rye. 4 and 20 blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
     41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.

     CLASS3: EATING HABITS AND VARIATIONS (count=8)
     7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there the cupboard was bare and so the poor dog had none. She went to the baker to buy him some bread. When she came back the dog was dead.
     8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean.
     13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again.
     21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town.
     25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.
     39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
     42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken.
     50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!

     CLASS4: CRUEL AND SCARY THINGS (count=4)
     1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?
     4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
     10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper.
     46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.

     CLASS5: WISDOMS AND ADVISORIES (count=4)
     47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise.
     29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs.
     6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day.
     14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!

     CLASS6: TEACHINGS (TO COUNT/ADD/DANCE/ETC.) (count=5)
     23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light.
     12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
     28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane.
     48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.
     37. Here we go round the mulberry bush, the mulberry bush, the mulberry bush. Here we go round the mulberry bush, on a cold and frosty morning. This is the way we wash our hands, wash our hands, wash our hands. This is the way we wash our hands, on a cold and frosty morning. This is the way we wash our clothes, wash our clothes, wash our clothes. This is the way we wash our clothes, on a cold and frosty morning. This is the way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning.

     CLASS7: COMMERCE (count=3)
     33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair!
     38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
     2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.

     TEST SET (count=13)
     3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
     11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again
     15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho!
     16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring.
     17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin!
     18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know.
     22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose.
     30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
     32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had
     36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches.
     43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns.
     44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will.
     49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid.
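To "hull" declared classes like these, each class can be boxed in by the minimum and maximum of a linear functional along every chosen direction, and a test point is then accepted by a class only if it falls inside all of that class's intervals. The sketch below follows the LIN definition from the first slide on plain Python lists; the function names, the two-dimensional toy points, and the axis-aligned `dp_set` are assumptions for illustration and are not taken from the MG44d60w data.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def l_value(x, d, p):
    """L_{d,p}(x) = x o d - p o d."""
    return dot(x, d) - dot(p, d)

def build_hulls(train, dp_set):
    """For each class k, store one (min, max) interval of L_{d,p} per (d, p) pair."""
    hulls = {}
    for k, points in train.items():
        intervals = []
        for d, p in dp_set:
            values = [l_value(x, d, p) for x in points]
            intervals.append((min(values), max(values)))
        hulls[k] = intervals
    return hulls

def classify(y, hulls, dp_set):
    """Return every class whose linear hull contains y (possibly none, possibly several)."""
    hits = []
    for k, intervals in hulls.items():
        inside = all(
            lo <= l_value(y, d, p) <= hi
            for (lo, hi), (d, p) in zip(intervals, dp_set)
        )
        if inside:
            hits.append(k)
    return hits

# Hypothetical toy training set with two well-separated classes.
train = {
    "C1": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "C2": [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0]],
}
# Hypothetical dpSet: the two coordinate directions, each started at the origin.
dp_set = [([1.0, 0.0], [0.0, 0.0]), ([0.0, 1.0], [0.0, 0.0])]

hulls = build_hulls(train, dp_set)
in_c1 = classify([0.5, 0.5], hulls, dp_set)
in_c2 = classify([4.5, 4.2], hulls, dp_set)
in_none = classify([2.0, 2.0], hulls, dp_set)
```

A point that lands in no hull, like the last one, is left unclassified, which is one way an "other" outcome can arise for test documents.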

  4. FAUST L Hull classifier, recursive on MG44d60w (d=q-p, q=0, p=avCk, recursive)

     Class sizes (C1..C7): 4 3 8 3 4 5 3

     d=q-p, q=0, p=avCk. [min, max] of Ld,avC1 on each class:
     C1 [-.2, .08], C2 [1.45, 1.45], C3 [0.77, 1.45], C4 [1.28, 1.45], C5 [0.77, 1.45], C6 [1.11, 1.45], C7 [1.11, 1.45]

     Intervals of Ld,avC1, with class counts for C1..C7:
     I1=[-.2, .08]   4000000
     I2=[.77, 1.12)  0010111
     I3=[1.12, 1.28) 0030011
     I4=[1.29, 1.46] 0343331

     Ld,avC2 on I4: C2 [0.9, 2.9], C3 [0, .57], C4 [0, .38], C5 [0, .38], C6 [0.19, .38], C7 [0, .19]
     I41=[0, .19)   001201
     I42=[.19, .38] 042130
     I43=(.38, .57]
     I44=[.9, 3]    300000

     Ld,avC4 on I42: C3 [0.78, 1.19], C4 [-0.2, 0.78], C5 [0.78, 1.19], C6 [0.78, 1.19]
     I421=[-.2, .78)  0200
     I422=[.78, 1.19] 4013

     Ld,avC3 on I422: C3 [-0.5, 0.74], C5 [0.74, 0.99], C6 [0.61, 0.99]
     I4221=[-.5, .61) 3000
     I4222=[.61, .74) 1001
     I4223=[.72, .99] 0012

     Since all points are outlierish, we should be able to carve off one class at a time by connecting averages? (Note: here I'm not even bothering to chop into intervals, other than to carve off the Ck interval when it is non-overlapping with the others.) Note: avCk' is estimated as av(avCh | h≠k).

     FAUST L Recursing On Averages. [min, max] of Ld,avCk on each class:
     Ld,avC2: C1 [1.73, 1.73], C2 [-1.1, 0.76], C3 [1.15, 1.73], C4 [1.34, 1.73], C5 [1.34, 1.73], C6 [1.34, 1.53], C7 [1.53, 1.73]. C2: 0 overlaps.
     Ld,avC3: C1 [0.61, 0.86], C2 [-0.0, 0.99], C3 [-0.5, 0.74], C4 [0.48, 0.86], C5 [0.74, 0.99], C6 [0.61, 0.99], C7 [0.74, 0.99]. C3 overlaps with 1,2,4,5,6,7.
     Ld,avC5: C1 [0.75, 0.75], C2 [0.5, 1], C3 [0.25, 1], C4 [0.75, 0.75], C5 [-0.2, 0.5], C6 [0.5, 1], C7 [0.75, 1]. C5 overlaps with 2,3,6.
     Ld,avC6: C1 [0.57, 0.97], C2 [0.16, 0.97], C3 [0.57, 0.97], C4 [0.36, 0.97], C5 [0.57, 0.97], C6 [-0.6, 0.36], C7 [0.77, 0.77]. C6 overlaps with 2,4.
     Ld,avC7: C1 [0.58, 0.94], C2 [0.58, 0.94], C3 [0.58, 0.94], C4 [0.23, 0.94], C5 [0.58, 0.94], C6 [0.23, 0.94], C7 [-0.1, 0.23]. C7 overlaps with 4,6.
     Ld,avC1: C1 [-0.2, 0.08], C2 [1.45, 1.45], C3 [0.77, 1.45], C4 [1.28, 1.45], C5 [0.77, 1.45], C6 [1.11, 1.45], C7 [1.11, 1.45]. C1: 0 overlaps.
     Ld,avC4: C1 [0.99, 1.19], C2 [0.78, 0.99], C3 [0.78, 1.19], C4 [-0.2, 0.78], C5 [0.78, 1.19], C6 [0.78, 1.19], C7 [0.99, 1.19]. C4 overlaps with 2,3,5,6.

     d=q-p, q=0, p=maxCk. [min, max] on each class, followed by the overlap count:
     C2:L34:  C1 [1, 1], C2 [0, 0], C3 [1, 1], C4 [1, 1], C5 [1, 1], C6 [1, 1], C7 [1, 1]. C2 overlaps: 0.
     C3:L47:  C1 [1, 1], C2 [1, 1], C3 [0, 1], C4 [1, 1], C5 [1, 1], C6 [1, 1], C7 [1, 1]. C3 overlaps: 5.
     C5:L8:   C1 [1, 1], C2 [1, 1], C3 [1, 1], C4 [1, 1], C5 [0, 1], C6 [1, 1], C7 [1, 1]. C5 overlaps: 2.
     C6:L52:  C1 [1, 1], C2 [1, 1], C3 [1, 1], C4 [1, 1], C5 [1, 1], C6 [0, 1], C7 [1, 1]. C6 overlaps: 2.
     Ld,avC7: C1 [1, 1], C2 [1, 1], C3 [1, 1], C4 [1, 1], C5 [1, 1], C6 [1, 1], C7 [0, 1]. C7 overlaps: 2.
     C1:L3:   C1 [0, 0], C2 [1, 1], C3 [1, 1], C4 [1, 1], C5 [1, 1], C6 [1, 1], C7 [1, 1]. C1 overlaps: 0.
     C4:L49:  C1 [1, 1], C2 [1, 1], C3 [1, 1], C4 [0, 1], C5 [1, 1], C6 [1, 1], C7 [1, 1]. C4 overlaps: 2.

     Recursing with p=avCk, q=avCk' (the interval of Ck first, then the intervals of the classes still remaining):
     Lp=avC1,q=avC1': C1 [-0.2, 0.10]; others [1.52, 2.04], [1.14, 1.67], [1.42, 1.64], [0.85, 1.54], [1.35, 1.54], [1.13, 1.44]. Carve off C1.
     Lp=avC2,q=avC2': C2 [-1.1, 0.70]; others [1.35, 1.82], [1.48, 2.00], [1.41, 1.88], [1.37, 1.66], [1.59, 1.80]. Carve off C2.
     Lp=avC3,q=avC3': C3 [-0.4, 0.60]; others [0.71, 1.25], [0.91, 1.16], [0.63, 1.28], [0.84, 1.15]. Carve off C3.
     Lp=avC4,q=avC4': C4 [-0.1, 0.67]; others [0.88, 1.58], [0.98, 1.58], [1.13, 1.36]. Carve off C4.
     Lp=avC5,q=avC5': C5 [-0.2, 0.34]; others [0.97, 1.59], [0.92, 1.28]. Carve off C5.
     Lp=avC6,q=avC6': C6 [-0.4, 0.52]; other [1.01, 1.29]. Carve off C6.

     Consider the following interesting facts about the unit sphere and unit cube in high dimensions. pTrees lie in the nonnegative polytant of the unit sphere, npus. So they all lie on the sphere centered at (½,...,½) ≡ ½ with radius = √n/2. So it seems likely that ½ is a better centroid for pTree analysis than the origin.

     Other interesting observations: npus has hypervolume=1 no matter the dimension, so lim_{n→∞} HypVol(npuc) = 1. But we know that the volume of the n-sphere goes to zero as n goes to infinity (albeit the descent to zero kicks in later the larger the radius). This means that as n goes to infinity, the npus radius increases just fast enough to keep the hypervolume stable at 1.

     I put down these remarks before developing FAUST L Recursing On Averages, which seems to be great for classification of text!! It may save some time to use ½ instead of av(avCk'), which is a lazy computation anyway. We should keep sumCk and countCk; then we can calculate avCk' precisely at almost no additional expense. But the main use of the ½ centroid may be in clustering.
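The carve-off recursion on class averages can be sketched in a few lines. This is a hedged illustration, not the author's pTree implementation: it takes p = avCk, estimates q = avCk' as the average of the other remaining class averages, projects every remaining point on the unit vector from p to q, and carves Ck off whenever its [min, max] interval is disjoint from all the others. The function names, the tuple layout of a rule, the "first separable class wins" order, and the toy data are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean(points):
    """Coordinate-wise average of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def project(x, p, d):
    """L_{d,p}(x) = (x - p) o d."""
    return dot([a - b for a, b in zip(x, p)], d)

def carve_on_averages(train):
    """Return (rules, leftover); each rule is (class, p, d, lo, hi)."""
    remaining = dict(train)
    rules = []
    while len(remaining) > 1:
        avgs = {k: mean(pts) for k, pts in remaining.items()}
        carved = None
        for k in remaining:
            p = avgs[k]
            q = mean([avgs[h] for h in remaining if h != k])  # avCk' estimate
            d = [b - a for a, b in zip(p, q)]
            norm = math.sqrt(dot(d, d))
            if norm == 0:
                continue
            d = [c / norm for c in d]
            ivals = {}
            for h, pts in remaining.items():
                vals = [project(x, p, d) for x in pts]
                ivals[h] = (min(vals), max(vals))
            lo, hi = ivals[k]
            if all(hi < ivals[h][0] or ivals[h][1] < lo
                   for h in remaining if h != k):
                carved = (k, p, d, lo, hi)
                break
        if carved is None:  # no class is separable this way; stop recursing
            break
        rules.append(carved)
        del remaining[carved[0]]
    return rules, list(remaining)

def classify(y, rules, default):
    """Apply the carve-off rules in order; fall through to the default class."""
    for k, p, d, lo, hi in rules:
        if lo <= project(y, p, d) <= hi:
            return k
    return default

# Hypothetical toy classes in the plane.
train = {
    "A": [[0.0, 0.0], [1.0, 0.0]],
    "B": [[10.0, 0.0], [11.0, 0.0]],
    "C": [[5.0, 8.0], [5.0, 9.0]],
}
rules, leftover = carve_on_averages(train)
order = [r[0] for r in rules]
label_a = classify([0.6, 0.1], rules, leftover[0])
label_b = classify([10.2, 0.1], rules, leftover[0])
label_c = classify([5.0, 8.7], rules, leftover[0])
```

Keeping a running sum and count per class, as suggested above, would let `q` be updated after each carve-off instead of being recomputed from scratch.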

  5. FAUST Thanksgiving q=½ clusterer(always use q=1/2, p=longest left in the table) L Gap R Gap 44HLH 0.13 0.51 -0.02 3.6 23MTB 0.65 0 3.58 3 32JGF 0.65 0.12 6.58 -1.1 22HLH 0.77 0.12 5.40 0.78 07OMH 0.90 0 6.18 1 46TTP 0.90 0 7.18 -2 47CCM 0.90 0 5.18 1 29LFW 0.90 0.12 6.18 0.75 26SBS 1.03 0 6.93 -1 35SSS 1.03 0 5.93 1 43HHD 1.03 0 6.93 0 02TLP 1.03 0 6.93 0 45BBB 1.03 0 6.93 0 42BBC 1.03 0 6.93 0 25WOW 1.03 0 6.93 0 05HDS 1.03 0 6.93 0 14ASO 1.03 0 6.93 0 11OMM 1.03 0 6.93 0 33BFP 1.03 0 6.93 0 39LCS 1.03 0.12 6.93 0.71 28BBB 1.16 0 7.65 0 10JAJ 1.16 0 7.65 0 01TBM 1.16 0 7.65 0 37MBB 1.16 0 7.65 0 38YLS 1.16 0 7.65 0 15PCD 1.16 0 7.65 0 04LMM 1.16 0.12 7.65 0.68 21LAU 1.29 0 8.33 -1 13RRS 1.29 0 7.33 1 41OKC 1.29 0 8.33 0 48OTB 1.29 0 8.33 0 03DDD 1.29 0 8.33 0 08JSC 1.29 0.12 8.33 0.65 18HTP 1.42 0 8.98 0 50LJH 1.42 0.12 8.98 0.61 30HDD 1.55 0.12 9.60 0.58 49WLG 1.68 0 10.18 0 36LTT 1.68 0.51 10.18 2 27CBC 2.19 12.18 carve 44HLH 23MTB L Gap R Gap 17FEC 0.00 0.77 0.00 5.4 22HLH 0.77 0 5.40 0 47CCM 0.77 0 5.40 1 23MTB 0.77 0.12 6.40 0.78 06SPP 0.90 0 7.18 0 45BBB 0.90 0 7.18 -1 36LTT 0.90 0.12 6.18 0.75 35SSS 1.03 0 6.93 0 37MBB 1.03 0 6.93 0 18HTP 1.03 0 6.93 0 28BBS 1.03 0 6.93 0 13RRS 1.03 0 6.93 1 48OTB 1.03 0 7.93 -1 38YLS 1.03 0 6.93 0 25WOW 1.03 0 6.93 0 46TTP 1.03 0 6.93 0 11OMM 1.03 0.12 6.93 0.71 30HDD 1.16 0 7.65 0 02TLP 1.16 0 7.65 0 42BBC 1.16 0 7.65 0 26SBS 1.16 0 7.65 0 49WLG 1.16 0 7.65 0 29LFW 1.16 0 7.65 0 03DDD 1.16 0 7.65 0 14ASO 1.16 0 7.65 0 32JGF 1.16 0 7.65 0 10JAJ 1.16 0.12 7.65 0.68 01TBM 1.29 0 8.33 0 27CBC 1.29 0 8.33 0 39LCS 1.29 0 8.33 0 08JSC 1.29 0 8.33 0 07OMH 1.29 0.12 8.33 -0.3 50LJH 1.42 0 7.98 2 43HHD 1.42 0 9.98 -2 33BFP 1.42 0 7.98 1 41OKC 1.42 0 8.98 0 15PCD 1.42 0.12 8.98 0.61 44HLH 1.55 0.12 9.60 0.58 21LAU 1.68 0 10.18 -2 04LMM 1.68 0.51 8.18 4 05HDS 2.19 12.18 carve 17FEC 05HDS outliers L Gap R Gap 39LCS 0.00 0.77 0.00 5.4 01TBM 0.77 0.12 5.40 0.78 38YLS 0.90 0 6.18 0 16PPG 0.90 0 6.18 0 27CBC 0.90 0 6.18 1 35SSS 0.90 
0 7.18 -1 29LFW 0.90 0.12 6.18 0.75 17FEC 1.03 0 6.93 0 04LMM 1.03 0 6.93 1 28BBB 1.03 0 7.93 -1 06SPP 1.03 0 6.93 0 25WOW 1.03 0 6.93 1 44HLH 1.03 0.12 7.93 -0.2 50LJH 1.16 0 7.65 0 07OMH 1.16 0 7.65 0 42BBC 1.16 0 7.65 0 05HDS 1.16 0 7.65 0 10JAJ 1.16 0 7.65 0 08JSC 1.16 0 7.65 0 23MTB 1.16 0 7.65 1 37MBB 1.16 0 8.65 -1 22HLH 1.16 0.12 7.65 0.68 48OTB 1.29 0 8.33 0 03DDD 1.29 0 8.33 0 26SBS 1.29 0 8.33 0 21LAU 1.29 0 8.33 0 11OMM 1.29 0 8.33 0 49WLG 1.29 0 8.33 0 41OKC 1.29 0.12 8.33 0.65 47CCM 1.42 0 8.98 0 30HDD 1.42 0 8.98 0 33BFP 1.42 0 8.98 0 13RRS 1.42 0 8.98 1 15PCD 1.42 0 9.98 -1 46TTP 1.42 0.12 8.98 -1.3 36LTT 1.55 0 7.60 1 12OWF 1.55 0 8.60 0 14ASO 1.55 0 8.60 2 43HHD 1.55 0.12 10.60 -0.4 45BBB 1.68 0.12 10.18 0.55 32JGF 1.81 0 10.73 0 02TLP 1.81 0.25 10.73 1 18HTP 2.07 11.73 carve off 39LCS outlier L Gap R Gap 50LJH 0.00 0.90 0.00 7.18 38YLS 0.90 0.12 7.18 -0.2 06SPP 1.03 0 6.93 0 45BBB 1.03 0 6.93 1 14ASO 1.03 0 7.93 -1 46TTP 1.03 0 6.93 0 15PCD 1.03 0 6.93 1 48OTB 1.03 0.12 7.93 -0.2 26SBS 1.16 0 7.65 0 36LTT 1.16 0 7.65 0 29LFW 1.16 0 7.65 0 23MTB 1.16 0 7.65 0 39LCS 1.16 0 7.65 0 32JGF 1.16 0 7.65 0 28BBB 1.16 0 7.65 0 33BFP 1.16 0 7.65 0 11OMM 1.16 0 7.65 0 17FEC 1.16 0 7.65 0 37MBB 1.16 0 7.65 0 18HTP 1.16 0.12 7.65 0.68 35SSS 1.29 0 8.33 0 21LAU 1.29 0 8.33 0 03DDD 1.29 0 8.33 0 44HLH 1.29 0 8.33 0 01TBM 1.29 0 8.33 0 04LMM 1.29 0.12 8.33 0.65 10JAJ 1.42 0 8.98 0 30HDD 1.42 0 8.98 1 05HDS 1.42 0 9.98 -1 13RRS 1.42 0 8.98 0 27CBC 1.42 0 8.98 0 02TLP 1.42 0 8.98 1 41OKC 1.42 0.12 9.98 -1.3 22HLH 1.55 0 8.60 0 47CCM 1.55 0 8.60 1 08JSC 1.55 0 9.60 0 42BBC 1.55 0 9.60 1 43HHD 1.55 0.12 10.60 -0.4 07OMH 1.68 0.12 10.18 -1.4 49WLG 1.81 0 8.73 2 16PPG 1.81 0.77 10.73 2.6 25WOW 2.58 13.33 carve 50LJH as outlier L Gap R Gap 12OWF 0.00 1.67 0.00 10.1 05HDS 1.68 0 10.18 0 28BBB 1.68 0 10.18 0 44HLH 1.68 0.12 10.18 0.55 49WLG 1.81 0 10.73 1 08JSC 1.81 0 11.73 -1 39LCS 1.81 0 10.73 0 37MBB 1.81 0 10.73 0 04LMM 1.81 0 10.73 -1 48OTB 1.81 0 9.73 1 06SPP 1.81 
0.12 10.73 0.51 09HBD 1.94 0 11.25 -1 35SSS 1.94 0 10.25 1 47CCM 1.94 0 11.25 0 03DDD 1.94 0 11.25 -1 22HLH 1.94 0 10.25 1 14ASO 1.94 0 11.25 0 23MTB 1.94 0 11.25 0 46TTP 1.94 0 11.25 0 07OMH 1.94 0 11.25 0 29LFW 1.94 0 11.25 0 38YLS 1.94 0.12 11.25 0.48 32JGF 2.07 0 11.73 0 17FEC 2.07 0 11.73 0 45BBB 2.07 0 11.73 0 50LJH 2.07 0 11.73 0 36LTT 2.07 0 11.73 0 11OMM 2.07 0 11.73 0 25WOW 2.07 0 11.73 0 18HTP 2.07 0 11.73 0 10JAJ 2.07 0.12 11.73 0.45 33BFP 2.19 0 12.18 0 41OKC 2.19 0 12.18 0 02TLP 2.19 0 12.18 0 26SBS 2.19 0 12.18 1 16PPG 2.19 0 13.18 -1 27CBC 2.19 0 12.18 0 30HDD 2.19 0.12 12.18 -0.5 01TBM 2.32 0 11.60 0 43HHD 2.32 0 11.60 1 13RRS 2.32 0 12.60 -1 21LAU 2.32 0 11.60 1 42BBC 2.32 0.25 12.60 0.73 15PCD 2.58 -2.5 13.33 carve off 12OWF outlier. 26SBS 0.00 0.64 0.00 4.58 33BFP 0.65 0.12 4.58 0.81 35SSS 0.77 0 5.40 1 22HLH 0.77 0 6.40 0 32JGF 0.77 0.12 6.40 -0.2 14ASO 0.90 0 6.18 0 07OMH 0.90 0 6.18 0 08JSC 0.90 0 6.18 0 42BBC 0.90 0 6.18 0 23MTB 0.90 0 6.18 0 25WOW 0.90 0 6.18 0 27CBC 0.90 0 6.18 0 05HDS 0.90 0 6.18 1 47CCM 0.90 0 7.18 -1 02TLP 0.90 0 6.18 0 21LAU 0.90 0 6.18 0 03DDD 0.90 0.12 6.18 0.75 28BBB 1.03 0 6.93 0 15PCD 1.03 0 6.93 0 36LTT 1.03 0 6.93 0 29LFW 1.03 0 6.93 0 50LJH 1.03 0 6.93 0 48OTB 1.03 0 6.93 0 38YLS 1.03 0 6.93 0 41OKC 1.03 0 6.93 0 45BBB 1.03 0.12 6.93 0.71 01TBM 1.16 0 7.65 0 13RRS 1.16 0 7.65 1 43HHD 1.16 0 8.65 -1 49WLG 1.16 0.12 7.65 -0.3 44HLH 1.29 0 7.33 2 30HDD 1.29 0 9.33 -1 17FEC 1.29 0.12 8.33 0.65 10JAJ 1.42 0.12 8.98 0.61 39LCS 1.55 0 9.60 -2 37MBB 1.55 0.77 7.60 5 04LMM 2.32 12.60 carve 26SBS 04LMM 25WOW 0.13 0.51 -0.02 4.6 30HDD 0.65 0.12 4.58 0.81 17FEC 0.77 0 5.40 0 41OKC 0.77 0 5.40 -1 35SSS 0.77 0.12 4.40 1.78 44HLH 0.90 0 6.18 0 33BFP 0.90 0 6.18 0 47CCM 0.90 0 6.18 0 04LMM 0.90 0 6.18 0 02TLP 0.90 0 6.18 0 32JGF 0.90 0 6.18 0 21LAU 0.90 0 6.18 1 07OMH 0.90 0 7.18 -1 01TBM 0.90 0 6.18 0 28BBB 0.90 0 6.18 0 03DDD 0.90 0.12 6.18 0.75 36LTT 1.03 0 6.93 0 43HHD 1.03 0 6.93 1 26SBS 1.03 0 7.93 -1 37MBB 1.03 0 6.93 0 
23MTB 1.03 0 6.93 1 10JAJ 1.03 0 7.93 -1 38YLS 1.03 0 6.93 0 14ASO 1.03 0 6.93 0 05HDS 1.03 0 6.93 0 08JSC 1.03 0.12 6.93 0.71 13RRS 1.16 0 7.65 0 50LJH 1.16 0 7.65 0 39LCS 1.16 0 7.65 0 27CBC 1.16 0.12 7.65 0.68 49WLG 1.29 0 8.33 0 42BBC 1.29 0 8.33 0 48OTB 1.29 0.12 8.33 0.65 29LFW 1.42 0.90 8.98 3.61 45BBB 2.32 12.60 carve 45BBB 25WOW royalty? mistakes mistakes!!! Anyway it just appears to be carving outliers and everything is eventually an outlier!
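The L and Gap columns in the tables above can be reproduced mechanically: sort the documents by functional value, take the difference to the next value as the gap, and carve off an end point when the gap next to it is large. The sketch below is an assumption-laden illustration; the function names, the threshold of 0.5, and the rule of only testing the two ends are not stated on the slide, and the sample values are a small subset of the L column of the first table, so the interior gaps differ from the full table.

```python
def gap_table(values):
    """values: {doc_id: functional value}. Rows (doc_id, value, gap_to_next), sorted by value."""
    items = sorted(values.items(), key=lambda kv: kv[1])
    rows = []
    for i, (doc, v) in enumerate(items):
        gap = items[i + 1][1] - v if i + 1 < len(items) else None
        rows.append((doc, v, gap))
    return rows

def carve_ends(values, min_gap):
    """Return the lowest and/or highest document when it is separated by at least min_gap."""
    rows = gap_table(values)
    outliers = []
    if len(rows) >= 3:
        if rows[0][2] >= min_gap:   # gap just above the lowest value
            outliers.append(rows[0][0])
        if rows[-2][2] >= min_gap:  # gap just below the highest value
            outliers.append(rows[-1][0])
    return outliers

# Subset of the L values from the first table (illustration only).
l_values = {
    "44HLH": 0.13,
    "23MTB": 0.65,
    "22HLH": 0.77,
    "07OMH": 0.90,
    "49WLG": 1.68,
    "27CBC": 2.19,
}

rows = gap_table(l_values)
carved = carve_ends(l_values, min_gap=0.5)
```

Repeating this after each carve-off, with a fresh p each round, tends to peel off one or two end points at a time, which is consistent with the observation that everything eventually looks like an outlier.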

  6. FAUST Thanksgiving clusterer(ffa to ffffa (outliers in red)) p=ffa=21LAU q=ffffa=39LCS F Gap R Gap 0.00 1.1094 0.00 5.76 04LMM 1.11 0 5.77 0 27CBC 1.11 0.2773 5.77 3.30 48OTB 1.39 0 9.08 -5 09HBD 1.39 0.2773 4.08 3.15 26SBS 1.66 0 7.23 -2 12OWF 1.66 0 5.23 1 01TBM 1.66 0 6.23 0 23MTB 1.66 0 6.23 0 37MBB 1.66 0 6.23 -1 46TTP 1.66 0 5.23 1 33BFP 1.66 0 6.23 0 28BBB 1.66 0 6.23 0 02TLP 1.66 0 6.23 0 38YLS 1.66 0 6.23 0 10JAJ 1.66 0 6.23 0 21LAU 1.66 0 6.23 -1 41OKC 1.66 0.2773 5.23 3 14ASO 1.94 0 8.23 -4 50LJH 1.94 0 4.23 2 08JSC 1.94 0 6.23 -1 45BBB 1.94 0 5.23 1 25WOW 1.94 0 6.23 0 29LFW 1.94 0 6.23 2 13RRS 1.94 0 8.23 -1 35SSS 1.94 1.6641 7.23 -7.2 47CCM 3.61 -0.00 *****42BBC p=ffa=39LCS q=ffffa=07OMH F Gap R Gap 0.00 1.3363 0.00 6.21 23MTB 1.34 0.2672 6.21 -1.7 28BBB 1.60 0 4.43 4 02TLP 1.60 0 8.43 -2 10JAJ 1.60 0 6.43 2 47CCM 1.60 0 8.43 -2 09HBD 1.60 0.2672 6.43 0.07 38YLS 1.87 0 6.50 -1 48OTB 1.87 0 5.50 2 04LMM 1.87 0 7.50 -1 33BFP 1.87 0 6.50 0 42BBC 1.87 0 6.50 -1 26SBS 1.87 0 5.50 1 25WOW 1.87 0 6.50 1 12OWF 1.87 0 7.50 -2 08JSC 1.87 0 5.50 1 27CBC 1.87 0.2672 6.50 1.92 50LJH 2.14 0 8.43 -4 14ASO 2.14 0 4.43 1 29LFW 2.14 0 5.43 2 45BBB 2.14 0 7.43 -2 13RRS 2.14 0 5.43 0 21LAU 2.14 0 5.43 0 39LCS 2.14 0 5.43 -1 35SSS 2.14 0 4.43 0 46TTP 2.14 0.2672 4.43 1.78 06SPP 2.41 0 6.21 -2 01TBM 2.41 1.3363 4.21 -4.2 41OKC 3.74 -0.00 37MBB p=ffa=35SSS q=ffffa=26SBS F Gap R Gap 0.00 2.4596 0.00 9.95 35SSS 2.46 0.2236 9.95 -0.1 07OMH 2.68 0 9.80 -1 21LAU 2.68 0 8.80 -2 41OKC 2.68 0 6.80 0 48OTB 2.68 0 6.80 0 05HDS 2.68 0 6.80 1 50LJH 2.68 0 7.80 -2 37MBB 2.68 0.2236 5.80 2.75 42BBC 2.91 0 8.55 0 01TBM 2.91 0 8.55 0 14ASO 2.91 0 8.55 -2 46TTP 2.91 0 6.55 0 23MTB 2.91 0 6.55 2 06SPP 2.91 0 8.55 -1 28BBB 2.91 0 7.55 -1 33BFP 2.91 0 6.55 3 02TLP 2.91 0 9.55 0 13RRS 2.91 0 9.55 -5 10JAJ 2.91 0 4.55 4 04LMM 2.91 0 8.55 -2 47CCM 2.91 0 6.55 -1 25WOW 2.91 0 5.55 4 38YLS 2.91 0.2236 9.55 -2.3 39LCS 3.13 0 7.20 -1 29LFW 3.13 0 6.20 0 09HBD 3.13 0 6.20 -2 12OWF 3.13 0 4.20 2 
08JSC 3.13 0 6.20 0 27CBC 3.13 1.3416 6.20 -6.2 45BBB 4.47 -4.472 -0.00 0 26SBS There came an old woman from France who taught grown children to Three blind mice! See how they run! They all ran after the farme How many miles is it to Babylon? Three score miles and ten. Can I Here we go round the mulberry bush, the mulberry bush, the mulber Tom Tom the pipers son, stole a pig and away he run. The pig was Buttons, a farthing a pair! Come, who will buy them of me? They a Baa baa black sheep, have you any wool? Yes sir yes sir, three ba This little pig went to market. This little pig stayed at home. If I had as much money as I could tell, I never would cry young l Jack and Jill went up the hill to fetch a pail of water. Jack fe The Lion and the Unicorn were fighting for the crown. The Lion be Old King Cole was a merry old soul. And a merry old soul was he. If all the seas were one sea, what a great sea that would be! And Little Jack Horner sat in the corner, eating of Christmas pie. He Jack Sprat could eat no fat. His wife could eat no lean. And so b Bye baby bunting. Father has gone hunting. Mother has gone milkin There was an old woman, and what do you think? She lived upon not When little Fred went to bed, he always said his prayers. He kiss A robin and a robins son once went to town to buy a bun. They cou Sing a song of sixpence a pocket full of rye. Four and twenty bla

  7. C1 C11 01TBM Three blind mice 02TLP This little pig went to market 03DDD Diddle diddle dumpling my son John 04LMM Little Miss Muffet 06SPP See a pin and pick it up 10JAJ Jack and Jill went up the hill 13RRS A robin and a robins son 14ASO If all the seas were one sea 16PPG Flour of England 17FEC Here sits the Lord Mayor 18HTP I had two pigeons bright and gay 23MTB How many miles is it to Babylon 25WOW There was an old woman 32JGF Jack, come give me your fiddle 33BFP Buttons, a farthing a pair 43HHD Hark hark, the dogs do bark 44HLH The hart he loves the high wood 46TTP Tom Tom the pipers son 47CCM Cocks crow in the morn 49WLG There was a little girl SO11 SO10 SO9 SO8 50LJH Little Jack Horner 39LCS A little cock sparrow 21LAU The Lion and the Unicorn 42BBC Bat bat, come undert 3 C12 C111 09HBD Hush baby. Daddy is near 12OWF There came old woman France 26SBS Sleep baby sleep 27CBC Cry baby cry 29LFW When little Fred went to bed 45BBB Bye baby bunting 03DDD Diddle diddle dumpling my son John 06SPP See a pin and pick it up 10JAJ Jack and Jill went up the hill 13RRS A robin and a robins son 14ASO If all the seas were one sea 16PPG Flour of England 18HTP I had two pigeons bright and gay 23MTB How many miles is it to Babylon 25WOW There was an old woman 32JGF Jack, come give me your fiddle 33BFP Buttons, a farthing a pair 43HHD Hark hark, the dogs do bark 44HLH The hart he loves the high wood 47CCM Cocks crow in the morn 49WLG There was a little girl SO6 SO7 DO3 SO3 SO15 SO14 SO4 SO5 TO1 TO2 01TBM Three blind mice 17FEC Here sits the Lord Mayor 46TTP Tom Tom pipers 12OWF The came ol woman France 13RRS A robin and 14ASO If all the seas 25WOW There was an old woman 44HLH The hart he loves 47CCM Cocks crow in the morn 29LFW When little Fred went bed 23MTB How many miles to Babylon 33BFP Buttons, a farthing a pair 43HHD Hark hark, the dogs do bark 10JAJ Jack and Jill went up the hill 03DDD Diddle diddle dumpling C121 ffa=29 09HBD Hush baby. 
Daddy is near 26SBS Sleep baby sleep 27CBC Cry baby cry 45BBB Bye baby bunting SO12 11OMM One misty moisty SO13 37MBB Here we go rnd mulberry SO1 MG44d60w A-FFA dendogram 01TBM Three blind mice 02TLP This little pig went to market 03DDD Diddle diddle dumpling my son John 04LMM Little Miss Muffet 06SPP See a pin and pick it up 08JSC Jack Sprat could eat no fat 09HBD Hush baby. Daddy is near 10JAJ Jack and Jill went up the hill 12OWF There came an old woman from France 13RRS A robin and a robins son 14ASO If all the seas were one sea 16PPG Flour of England 17FEC Here sits the Lord Mayor 18HTP I had two pigeons bright and gay 23MTB How many miles is it to Babylon 25WOW There was an old woman 26SBS Sleep baby sleep 27CBC Cry baby cry 29LFW When little Fred went to bed 32JGF Jack, come give me your fiddle 33BFP Buttons, a farthing a pair 43HHD Hark hark, the dogs do bark 44HLH The hart he loves the high wood 45BBB Bye baby bunting 46TTP Tom Tom the pipers son 47CCM Cocks crow in the morn 49WLG There was a little girl 35SSS Sing a song of sixpence .18 .14 DO1 07OMH Old Mother Hubbard 30HDD Hey diddle diddle C2 ffa=39LCS 05HDS Humpty Dumpty 11OMM One misty moisty morning 15PCD Great A. little a 21LAU The Lion and the Unicorn 22HLH I had a little husband 28BBB Baa baa black sheep 36LTT Little Tommy Tittlemouse 37MBB Here we go round mulberry bush 38YLS If I had as much money as I could tell 39LCS A little cock sparrow 41OKC Old King Cole 42BBC Bat bat, come under my hat 48OTB One two, buckle my shoe 50LJH Little Jack Horner C21 ffa=21LAU 05HDS Humpty Dumpty 11OMM One misty moisty morning 15PCD Great A. 
little a 21LAU The Lion and the Unicorn 22HLH I had a little husband 36LTT Little Tommy Tittlemouse 37MBB Here we go round mulberry 38YLS If I had as much money 42BBC Bat bat, come under my hat 48OTB One two, buckle my shoe .26 DO4 .46 .19 28BBB Baa baa black sheep 41OKC Old King Cole 2.2 .28 SO2 08JSC Jack Sprat .42 C211 ffa=37 .41 05HDS Humpty Dumpty 11OMM One misty moisty morning 15PCD Great A. little a 22HLH I had a little husband 36LTT Little Tommy Tittlemouse 37MBB Here we go round mulberry 38YLS If I had as much money 48OTB One two, buckle my shoe 2 .31 1.3 1.53 .42 DO2 .38 02TLP This little pig 04LMM Little Miss Muffet C2111 ffa=15 .1.6 .36 05HDS Humpty Dumpty 15PCD Great A. little a 22HLH I had a little husband 36LTT Little Tommy Tittlemouse 38YLS If I had as much money 48OTB One two, buckle my shoe C1111 1.8 03DDD Diddle diddle dumpling my son John 06SPP See a pin and pick it up 13RRS A robin and a robins son 16PPG Flour of England 18HTP I had two pigeons bright and gay 23MTB How many miles is it to Babylon 32JGF Jack, come give me your fiddle 33BFP Buttons, a farthing a pair 43HHD Hark hark, the dogs do bark 47CCM Cocks crow in the morn 49WLG There was a little girl C2111 seems to be lullabys? no gaps C2111 seems to focus on extremes? (big and small) .3 1.3 C11111 03DDD Diddle diddle dumpling my son John 06SPP See a pin and pick it up 16PPG Flour of England 18HTP I had two pigeons bright and gay 32JGF Jack, come give me your fiddle 47CCM Cocks crow in the morn 49WLG There was a little girl Notes: In text mining, just about any document is eventually going to be an outlier due to the fact that we are projecting high dimension (44 here) onto dimension=1. Thus the ffa will almost always be an outlier in LAvgffa. C111111 06SPP See a pin and pick it up 16PPG Flour of England 18HTP I had two pigeons bright and gay 32JGF Jack, come give me your fiddle 49WLG There was a little girl

  8. Thanksgiving Clustering MG44d60w using word similarity (q is similar to p if q shares many of p's words) Select p, then count the number of p-words the others have in common with p (I'm selecting randomly from max wc). If that number (call it "x=p") >= Thr*wc(p), include that other word in the p-cluster. Carve off each p-cluster as it is created and repeat. Thr= ¼ 35SSS Sing a song of sixpence 26SBS Sleep baby sleep 28BBB Baa baa black sheep 41OKC Old King Cole 08JSC Jack Sprat 22HLH I had a little husband 07OMH Old Mother Hubbard went to cupboard to give her poor dog a bone. When she got there cupboard was bare and so poor dog had none. She went to baker to buy him some bread. When she came back dog was dead 13RRS A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again 45BBB Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. Cluster Theme=Someone went out to buy something 39LCS little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew 50LJH Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I Cluster Theme: boy and Little??? 21LAU Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town10JAJ Jack and Jill went up hill to fetch pail of water. Jack fell down, broke his crown Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. ???? 
(the words "crown" and "brown" but it's used in a different context) 46TTP Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street 04LMM Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away 30HDD Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon (all share "away". 2 shre "eat" "run") 14ASO If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be 01TBM Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice05HDS Humpty Dumpty 11OMM One misty moisty morning when cloudy was weather, I chanced to meet a old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 17FEC Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin 32JGF Jack come give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I 'e had 36LTT Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches weak cuting off theme??? 29LFW When little Fred went to bed, he always said his prayers. 
He kissed his mamma and then his papa, and straight away went upstairs 03DDD Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John 09HBD Hush baby. Daddy is near. Mamma is a lady and that is very clear 27CBC Cry baby cry. Put your finger in your eye and tell your mother it was not I 47CCM Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise Sleep theme? 37MBB Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning 02TLP This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home 16PPG Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring 33BFP Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair ????? 44LIH hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will 15PCD Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. 
Those that come after may sing heigh ho 18HTP I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know high/low theme??? 12OWF There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France 25WOW There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet (old woman theme) 48OTB One two, buckle my shoe. Three four, knock at the door. Five six, ick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty 23MTB How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light numbers??? 06SPP See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day 49WLG There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid ???? 
38YLS If I had as much money 42BBC Bat bat, come under my hat 43HHD Hark hark, the dogs do bark
Seeds used (p), how many other documents share enough words with p, and the threshold Thr*wc(p) expressed in p-words (pwds):
p=21LAU Lion and Unicorn: 1 shares ~ ¼*6, 2 pwds
p=26SBS Sleep Baby Sleep: 0 share ~ ¼*7, 2 pwds
p=35SSS Sing a Song of Sixpence: 0 share ≥ ¼*wc(p) = ¼*13, 4 pwds
p=39LCS Little cock sparrow: 1 shares ~ ¼*7, 2 pwds
p=29LFW Lil Fred went to bed: 4 share ~ ¼*4, 1 pwds
p=41OKC Old King Cole: 0 share ~ ¼*5, 2 pwds
p=07OMH Old Mother Hubbard: 2 share ~ ¼*7, 2 pwds
p=28BBB Baa baa black sheep: 0 share ~ ¼*6, 2 pwds
p=46TTP Tom Tom piper's son: 2 share ~ ¼*6, 2 pwds
p=14ASO All seas were one sea: 6 share ~ ¼*4, 1 pwds
p=37MBB Go round mulberry bush: 3 share ~ ¼*4, 1 pwds
p=44HLH Hart loves high wood: 2 share ~ ¼*4, 1 pwds
p=12OWF Old woman from France: 1 shares ~ ¼*3, 1 pwds
p=48OTB One two, buckle my shoe: 1 shares ~ ¼*3, 1 pwds
p=06SPP See a pin, pick it up: 1 shares ~ ¼*2, 1 pwds
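The carve-and-repeat procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, words are extracted with a simple regular expression (no stop-word removal, unlike the 60-word MG44d60w vocabulary), and ties for the largest word count are broken by document id rather than randomly.

```python
import re

def words(text):
    """Set of distinct lower-cased words in a document (assumed tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))

def carve_word_clusters(docs, thr=0.25):
    """Carve p-clusters off `docs` (dict: id -> text) until nothing remains.

    Seed p is a remaining document with the largest word count; a document q
    joins the p-cluster when it shares at least thr * wc(p) of p's words.
    """
    remaining = {doc_id: words(text) for doc_id, text in docs.items()}
    clusters = []
    while remaining:
        # max() keeps the first maximum, so ties go to the smallest id.
        p = max(sorted(remaining), key=lambda k: len(remaining[k]))
        p_words = remaining[p]
        need = thr * len(p_words)
        members = [q for q in sorted(remaining)
                   if q == p or len(remaining[q] & p_words) >= need]
        clusters.append((p, members))
        for q in members:          # carve the cluster off before repeating
            del remaining[q]
    return clusters

if __name__ == "__main__":
    demo = {"a": "went to buy bread",
            "b": "went to buy a bun",
            "c": "old king cole"}
    print(carve_word_clusters(demo))
```

Because each cluster is removed before the next seed is chosen, the result depends on the seed order, which is consistent with the slide's remark that seeds are picked among the documents with maximal word count.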

  9. FAUST LSR classifier, recursive LR on IRIS X=X(X1..Xn,C); oultiers are O=O(O1...On,OC)X init ; OT=Outlier_Thres; Carve off outliers from X into O (O:=O{x|Rankn-1DNN(x,X)OT; O:=O{x|D2NN(X,x)>OT ).... 3.13 6.57 1.77 10.3 0.8 7.8 19 33 2.51 9.10 1.67 2.96 2.53 8.03 2.13 19.8 1 2 66 154 31 11 15 12 6 14 then SAvI then on [3.13,6.57], p=avI, 30.7 45.7 13.7 13.7 3.2 9.6 1.5 6.0 3 2 Ldd=e1p=orig S43 58 E 49 70 I 56 79 R on L-1[49,56) w p=avS R on L-1[56,58] w p=avS R on L-1(58,70] w p=avE R on R-1(4.38,9.97] p=avI R on R-1(1.6,2.2] p=avI .92 9.97 4.38 18.2 2.2 15 1.6 6.6 2.27 2.27 1.66 2.18 2.9 3.5 26 34 40 43 1 2 3 13 5 15 31 11 3 13 5 1010 1010 26 29 10 d=e4 d=e3 10 19 30 51 48 69 1 6 10 18 15 25 R on L-1[22,29) p=avE Ldd=e2 p=orig S 29 44 E20 34 I 22 38 R R-1[9.69,12.8) p=avI now SAvE does it R Le2-1[29,34] p=avS R on R-1(39.82,42.56] p=avE R Le2-1(34,38] p=avI .87 12.8 9.69 35.7 2.38 22.9 2.45 4.51 0.57 2.70 8.12 10.2 55.83 66.82 2.061 2.061 0.948 6.647 25.07 42.56 39.82 61.22 15 2 3 2 69 28 23 26 15 2 6 1 26 16 19 32 We set OT=5 and carve off outliers: i7, i9, i10, i35, i36, i39, s42 There is TP purity except in C1=Le4-1[15,18]. We will apply R on each interval, then restrict to X = C1 and use R: R on [15,18], p=avE then on [2.5,8.1], p=avI, then SAvI Note: using Le4=L(0001)=X4 only, the method gives 100% True Positive accuracy. That would seem to be good news for text classification, for example, where there are thousands of columns and a need to reduce that number significantly. Of course, eliminating False Positives is also important but we think R will be effective in doing that (more effective than L) even in high dimensions. These are conjectures only, of course, and require careful further study. Next, we check to see if Le3 is as efficient wrt TP accuracy. We also note that the R numbers here are Sqrt(R) (i.e., actual radial reach, not radial reach squared). 
In practice, using pTree algebra, one would probably use radial reach squared, since square root is a difficult pTree computation (it would have to be estimated, perhaps with a truncated Taylor series, which is very ugly).
R on [48,51], p=avE: 0 0 14 17
Using Le3=L(0010)=X3 only, the method also gives 100% True Positive accuracy. Next we check the other two for TP accuracy, Le1 and Le2. The question then becomes: "How should one choose the small number of columns to use in the text mining case, when there are 10,000 columns, not just 4?"
Interestingly, each column, by itself, gives 100% TP accuracy using L, R and S (L once, then mostly R; S was used only when a class was down to just 1 or 2 points).
Next let's see if it works for text. I will take MG44d60w, classify as many docs as I can as an expert, then use the others as a test set. Once I have an LSR Hull model built from the training set, I will classify each of the test set documents using the FAUST Multiclass LSR Hull Classifier. I will then look at the test doc classifications and assess how good they are (basically to see whether the method reveals affinities that make sense and that I hadn't noticed when I put together the "expert training set" in the first place, therefore answering the question "Has new info been uncovered through SML?").

  10. 2.62 AvgDNN 2.44 MedDNN UDR(C2) L CT GP 17 1 1 18 4 1 19 4 1 20 4 1 21 2 1 22 3 1 23 2 1 24 3 2 26 4 1 27 4 1 28 2 1 29 2 1 30 1 1 31 2 1 32 1 1 33 1 no gap or PCCs Construct L=Ld where d=am/|am| = 0.58 0.01 0.80 0.08UDR(LX) gap 1 2 4 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 count 1 1 1 3 1 3 2 4 4 2 2 3 5 3 4 7 4 2 2 3 4 1 value 0 1 3 7 8 10 11 12 13 14 15 17 18 19 20 21 22 23 24 25 26 27 1 1 1 1 1 1 1 2 2 5 3 2 3 1 1 1 2 1 2 28 29 30 31 32 33 34 35 37 39 UDR(C3) L CT GP 0 2 2 2 1 1 3 1 1 4 2 1 5 2 1 6 1 4 10 1 1 11 2 2 13 1 1 14 1 APPENDIX DNNS (top) 16.0 i39 7.34 i7 6.32 i10 6.24 s42 5.56 i9 5.38 i36 5.38 i35 4.89 i15 4.89 e13 4.58 s23 4.35 i20 4.24 e15 4.24 i1 4.12 i32 4.12 i19 4.12 i18 Thanksgiving clustering (carve off clusters as one would carve a thanksgiving turkey) Let m be a furthest point from aAvgX (i.e., pt in X that maximizes SPTS, Sa=(X-a)o(X-a) ) If m is an outlier, carve {m} off from X. Repeat until m is a non-outlier. Construct L=Ld where d=am/|am| Carve off L-gapped cluster(s). Pick centroid, cc=mean of slice, SL: A. If (PCC2=PCD1) declare L-1[PCC1,PCC2] to be a cluster and carve it off (mask it off) of X; else (PCC2=PCI2 ) SLL-1[(3PCC1+PCC2)/4 ,PCC2) and look for a Scc or Rd,cc gap. If one is found, declare it to be a cluster and carve it off of X; Else add cc to the Cluster Centroid Set, CCS. B. Do A. from the high side of L also. Repeat until no new clusters carve off. If X (not completely carved up) use pkmeans on the remains with initial centroids, CCS One can also continue to carve using other vectors (e.g., mimj using pillars), before going to pkmeans. 
IRIS: Carve off outliers i39 i7 i10 s42 i9 i36 i35 i15 e13 s23 i20 e15 i1 i32 i19 i18.
m = i23 = (77 28 67 20) is the furthest point from AvgX = (57.89 30.70 36.18 11.38).
Construct L=Ld where d=am/|am| = (-0.12 -0.17 0.811 0.545).
UDR[L(X)]:
L+3_val   0  1  2  3  4  5  6  7 22 25 27 29 30 31 32 33 34 35 36 37 39 40
count     1  1  6 10 14 10  2  4  1  3  2  3  3  5  6  2  6  7  5  1  4  3
gap       1  1  1  1  1  1  1 15  3  2  2  1  1  1  1  1  1  1  1  2  1  1
L+3_val  41 42 43 44 45 46 47 48 49 50 53 54
count     2  3  5  5  4  1  4  4  4  1  1  1
gap       1  1  1  1  1  1  1  1  1  3  1
Carve off cluster C1=L-1[0,7] (s=48). No PCCs remain (except the 1st and last), so add Avg(X) to CCS={(57.8 30.7 36.1 11.3)}.
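The core of the Thanksgiving carve on this slide (take the mean a, find a furthest point m, project onto d=am/|am|, cut at gaps) can be sketched as below. This is a simplified illustration, not the pTree implementation: the function name and the `min_gap` parameter are assumptions, only gaps (not PCCs) are used, and outliers are assumed to have been carved off beforehand.

```python
import math

def carve_by_gaps(points, min_gap):
    """Split points into clusters at gaps >= min_gap along d = am/|am|.

    a is the mean of the points and m is a point furthest from a.
    Returns lists of point indices, ordered by projection value.
    """
    n, dim = len(points), len(points[0])
    a = [sum(p[j] for p in points) / n for j in range(dim)]

    def sq_dist_to_mean(p):
        return sum((p[j] - a[j]) ** 2 for j in range(dim))

    m = max(points, key=sq_dist_to_mean)
    am = [m[j] - a[j] for j in range(dim)]
    norm = math.sqrt(sum(c * c for c in am))
    if norm == 0:                      # all points coincide: one cluster
        return [list(range(n))]
    d = [c / norm for c in am]

    # L_d = X o d, sorted so that consecutive differences are the gaps.
    proj = sorted((sum(p[j] * d[j] for j in range(dim)), i)
                  for i, p in enumerate(points))
    clusters = [[proj[0][1]]]
    for (prev, _), (cur, i) in zip(proj, proj[1:]):
        if cur - prev >= min_gap:      # gap found: start a new cluster
            clusters.append([])
        clusters[-1].append(i)
    return clusters

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
    print(carve_by_gaps(pts, min_gap=3))
```

In the IRIS table above, the gap of 15 between L values 7 and 22 is the kind of cut this sketch would make; the remaining data has no comparable gap, which is why the slide falls back to adding a centroid to CCS.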

  11. FAUST LSR classifier on IRIS
X=X(X1,...,Xn,C); outliers are O=O(O1,...,On,OC)⊆X, initially empty; OT=Outlier_Threshold.
Carve off outliers from X into O: O:=O∪{x | Rankn-1Dis(x,X)≥OT}; O:=O∪{x | Rankn-2Dis(X,x)>OT}; ...
Define class hulls: Hk≡{z∈Rn | minFd,p,k ≤ Fd,p,k(z) ≤ maxFd,p,k ∀(d,p)∈dpSet, F=L,S,R}.
If y is in just one hull, declare y to be in that class.
Elseif y is in multiple hulls, MH, declare y to be in the Ck that minimizes dis(y,Ck), k∈MH (note dis(y,Ck)=dis(y,X)&Ck).
Else (y is in no hull), if dis(y,O)≡min{dis(y,o) | o∈O}=dis(y,oi)<OT, declare y to be in the class of oi; else declare y to be "other".
Notes:
1. This algorithm deals with singleton outliers but ignores doubleton and tripleton outliers, etc.
2. In the Elseif, rather than compute dis(y,Ck) (single link distance), one could use dis(y,meanCk) for the pre-computed class means.
We create hull boundaries for the standard basis vectors d=ek=(0..1 at k..0) and check for overlaps. Then the goal is to reduce False Positives.
We set OT=5 and carve off outliers: i7, i9, i10, i35, i36, i39, s42.
Start here. Class intervals [min, max] of Ld with p=origin, for classes S, E, I:
d=e1=PL: S 43 58, E 49 70, I 56 79
d=e3=SL: S 10 19, E 30 51, I 48 69
d=e4=SW: S 1 6, E 10 18, I 15 25
d=e2=PW: S 29 44, E 20 34, I 22 38
d=e1+e2: E 61.5 70.7, I 57.9 64.4
d=e1-e2: E 19 26.9, I 20.5 25.5
d=-e1+e2: E 17, I 13.4 16.3
Interval counts from the slide figure (unlabeled in the transcript): 49 44 6 14 30 10 15 31 11 3 13 5 26 29 49 35 15 12 32 1 2616 28 2326 152 6; 5 6 3 5 1 5 5 1 5 6 5 6
Points listed on the slide: e3 (69 31 49 15), e21 (59 32 48 18), e23 (63 25 49 15), e28 (67 30 50 17), e34 (60 27 51 16), i20 (60 22 50 15), i24 (63 27 49 18), i27 (62 28 48 18), i28 (61 30 49 18), i34 (63 28 51 15), i50 (59 30 51 18).
Note that the last 3 directions (e1+e2, e1-e2, -e1+e2) are applied recursively. We get 100% TP accuracy quickly with recursion (building hull trees). This isn't what the algorithm says to do, so we need to make this "LSR Hull Tree" algorithm precise. First, let's see how FAUST-L alone, as described, does wrt TPs.
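The hull definition above can be made concrete with a small sketch. It assumes plain Python lists rather than pTrees, unit-length d vectors, and hypothetical function names; it covers only the hull membership test (the first two branches of the classifier), not the outlier handling or the recursive hull trees.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lsr(x, d, p):
    """Return (L, S, R) for point x: L=(x-p)o d, S=(x-p)o(x-p), R=S-L^2."""
    diff = [xi - pi for xi, pi in zip(x, p)]
    L = dot(diff, d)
    S = dot(diff, diff)
    return (L, S, S - L * L)

def build_hulls(train, dp_set):
    """train: list of (point, label). Returns label -> {(i, f): (min, max)},
    where i indexes dp_set and f is 0, 1, 2 for L, S, R."""
    hulls = {}
    for x, label in train:
        box = hulls.setdefault(label, {})
        for i, (d, p) in enumerate(dp_set):
            for f, val in enumerate(lsr(x, d, p)):
                lo, hi = box.get((i, f), (val, val))
                box[(i, f)] = (min(lo, val), max(hi, val))
    return hulls

def classes_containing(y, hulls, dp_set, eps=1e-9):
    """Labels whose hull contains y for every (d, p) and every F in L, S, R."""
    hits = []
    for label, box in hulls.items():
        inside = all(box[(i, f)][0] - eps <= val <= box[(i, f)][1] + eps
                     for i, (d, p) in enumerate(dp_set)
                     for f, val in enumerate(lsr(y, d, p)))
        if inside:
            hits.append(label)
    return sorted(hits)

if __name__ == "__main__":
    train = [((0, 0), "A"), ((1, 0), "A"), ((1, 1), "A"),
             ((5, 5), "B"), ((6, 5), "B"), ((6, 6), "B")]
    dp_set = [((1, 0), (0, 0)), ((0, 1), (0, 0))]   # d=e1, e2; p=origin
    hulls = build_hulls(train, dp_set)
    print(classes_containing((0.5, 0.5), hulls, dp_set))
```

When `classes_containing` returns more than one label, the slide's Elseif branch would break the tie by distance to the class (or to the class mean); an empty result corresponds to the "no hull" branch.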

  12. m1 d maxL = pcd2L pcd1L pci2L minL = pci1L m4 Finding the Pillars of X(So, e.g., the k can be chosen intelligently in k-means) :Let m1 be a point in X that maximizes the SPTS, dis2(X,a)=(X-a)o(X-a) where aAvgX If m1 is an outlier (Check using Sm1or better using D2NN?), repeat until m1 is a non-outlier. AvX1 A point, m1, found in this manner is called a non-outlier pillar of X wrt a, or nop(X,a) ) Let m2  nop(X,m1) In general, if non-outlier pillars m1..mi-1 have been chosen, choose mi from nop(X,{m1,...,mi-1}) (i.e., mi maximizes k=1..i-1dis2(X,mk)and is a non-outlier). (Instead of using Smi or D2NN to eliminate outliers each round, one might get better pillars by constructing Lmi-1mi:XR, eliminating outliers that show up on L, then picking the pillar to be the mean (or vector of medians) of the slice L-1[(3PCC1+PCC2)/4 , PCC2) ? ) m3 m2 A PCC Pillar pkmeans clusterer: Assign each (object, class) a ClassWeightReals (all CW init at 0) Classes numbered as they are revealed. As we are identifying pillar mj's, compute Lmj= Xo(mj-mj-1) and 1. For the next larger PCI in Ld(C), left-to-right. 1.1a If followed by PCD, CkAvg(Ld-1[PCI,PCD]) (or VoM). If Ck is center of a sphere-gap (or barrel gap), declare Classk and mask off. 1.1b If followed by another PCI, declare next Classk=the sphere-gapped set around Ck=Avg( Ld-1[ (3PCI1+PCI2)/4,PCI2) ). Mask it off. 2. For the next smaller PCD in Ld from the left side. 2.1a If preceded by a PCI, declare next Classk= subset of Ld-1[PCI, PCD] sphere-gapped around Ck=Avg. Mask off. 2.1b If preceded by another PCD declare next Classk=subset of same, sphere-gapped around Ck=Avg(Ld-1( [PCD2,(PCD1+PCD2)/4] ). 
Mask off.
(Figure: scatter plot of the points of class @.)
A potential advantage of the FAUST Linear-Spherical-Radial (LSR) classifier:
The parallel part lets us build a pair of L, S, R hull segments for every pTree computation (the more the merrier).
The serial part allows the possibility of a better hull than the convex hull. E.g., in a linear step, if we use not only min and max but also PCIs and PCDs, we could potentially do the following on class=@: on each PCC interval (ill-defined, but here [pci1L,pcd1L], (pcd1L,pci2L), [pci2L,pcd2L]), build hull segments and OR them? By contrast, the convex hull (in orange on the slide) admits lots of false positives.
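The pillar selection described at the start of this slide (m1 furthest from the mean, then each mi maximizing the sum of squared distances to the pillars already chosen) can be sketched as follows. The sketch assumes outliers have already been carved off, so the non-outlier check is omitted, and the function names are hypothetical.

```python
def sqdist(u, v):
    """Squared Euclidean distance, dis2(u, v) = (u-v) o (u-v)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def find_pillars(points, k):
    """Pick up to k pillars from `points` (a list of tuples).

    m1 maximizes dis2(X, a) with a = Avg(X); m_i maximizes the sum over the
    earlier pillars m_1..m_{i-1} of dis2(X, m_j).
    """
    n, dim = len(points), len(points[0])
    a = tuple(sum(p[j] for p in points) / n for j in range(dim))
    pillars = [max(points, key=lambda x: sqdist(x, a))]
    while len(pillars) < k:
        candidates = [x for x in points if x not in pillars]
        if not candidates:
            break
        pillars.append(max(candidates,
                           key=lambda x: sum(sqdist(x, m) for m in pillars)))
    return pillars

if __name__ == "__main__":
    pts = [(0, 0), (10, 0), (0, 10), (1, 1), (9, 1)]
    print(find_pillars(pts, 3))
```

The returned pillars could then serve as initial centroids for k-means, which is the use the slide suggests for choosing k intelligently.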

  13. 2.62 AvgDNN 2.44 MedianDNN 2.44 i28 2.44 i42 2.44 i49 2.44 i47 2.44 i41 2.44 i46 2.44 e18 2.44 e9 2.44 s14 2.23 i21 2.23 i48 2.23 i11 2.23 s7 2.23 e24 2.23 i44 2.23 s12 2.23 s36 2.23 s44 2 s24 2 e35 2 e29 2 s43 2 s27 2 e17 2 e48 2 e40 2 s26 2 e4 2 e25 1.73 i13 1.73 i27 1.73 i24 1.73 s38 1.73 e45 1.73 i40 1.73 e39 1.73 e20 1.73 s35 1.73 s10 1.41 e26 1.41 s41 1.41 e50 1.41 s47 1.41 s4 1.41 e44 1.41 s46 1.41 e42 1.41 e47 1.41 e33 1.41 e46 1.41 e31 1.41 s5 1.41 e16 1.41 s39 1.41 e8 1.41 s31 1.41 s3 1.41 s30 1.41 s13 1.41 s29 1.41 i17 1.41 i38 1.41 s2 1.41 s28 1.41 s50 1.41 s22 1.41 e43 1.41 s20 1.41 e14 1.41 e32 1.41 s48 1.41 s9 1 s40 1 i33 1 i29 1 s18 1 s11 1 s8 1 s49 1 s1 3.87 e10 3.60 e11 3 e12 4.89 e13 1.41 e14 4.24 e15 1.41 e16 2 e17 2.44 e18 2.64 e19 1.73 e20 3 e21 3.31 e22 3.60 e23 2.23 e24 2 e25 1.41 e26 3.16 e27 3.16 e28 2 e29 2 e40 2.64 e41 1.41 e42 1.41 e43 1.41 e44 1.73 e45 1.41 e46 1.41 e47 2 e48 3.87 e49 1.41 e50 4.24 i1 2.64 i2 3.87 i3 2.44 i4 3 i5 2.64 i6 7.34 i7 2.64 i8 5.56 i9 6.32 i10 2.23 i11 3.46 i12 1.73 i13 2.64 i14 4.89 i15 3 i16 1.41 i17 4.12 i18 4.12 i19 4.35 i20 2.23 i21 3.16 i22 2.64 i23 1.73 i24 3 i25 3.46 i26 1.73 i27 2.44 i28 1 i29 3.46 i30 2.64 i31 4.12 i32 1 i33 3.31 i34 5.38 i36 2.44 i37 1.41 i38 16.0 i39 1.73 i40 2.44 i41 2.44 i42 2.64 i43 2.23 i44 2.44 i45 2.44 i46 2.44 i47 2.23 i48 2.44 i49 2.82 i50 DNN = 1 s1 1.41 s2 1.41 s3 1.41 s4 1.41 s5 3.31 s6 2.23 s7 1 s8 1.41 s9 1.73 s10 1 s11 2.23 s12 1.41 s13 2.44 s14 4.12 s15 3.60 s16 3.46 s17 1 s18 3.31 s19 1.41 s20 2.82 s21 1.41 s22 4.58 s23 2 s24 3 s25 2 s26 2 s27 1.41 s28 1.41 s29 1.41 s30 1.41 s31 2.82 s32 3.46 s33 3.46 s34 1.73 s35 2.23 s36 3 s37 1.73 s38 1.41 s39 1 s40 1.41 s41 6.24 s42 2 s43 2.23 s44 3.60 s45 1.41 s46 1.41 s47 1.41 s48 1 s49 1.41 s50 2.64 e1 2.64 e2 2.64 e3 2 e4 2.44 e5 3 e6 2.64 e7 1.41 e8 2.44 e9 DNNS = Distance to Nearest Neighbor Sorted 16.0 i39 7.34 i7 6.32 i10 6.24 s42 5.56 i9 5.38 i36 5.38 i35 4.89 i15 4.89 e13 4.58 s23 4.35 i20 4.24 e15 4.24 i1 4.12 
i32 4.12 i19 4.12 i18 4.12 s15 3.87 e49 3.87 e10 3.87 i3 3.74 e36 3.60 s16 3.60 s45 3.60 e11 3.60 e23 3.46 e30 3.46 s33 3.46 s34 3.46 i12 3.46 i26 3.46 i30 3.46 s17 3.31 s19 3.31 i34 3.31 e22 3.31 s6 3.31 e34 3.16 e27 3.16 i22 3.16 e28 3 e6 3 s37 3 i5 3 e21 3 i16 3 s25 3 e12 3 i25 2.82 e37 2.82 s32 2.82 i50 2.82 s21 2.64 e41 2.64 i31 2.64 i43 2.64 e2 2.64 i2 2.64 i8 2.64 e38 2.64 i23 2.64 e3 2.64 i14 2.64 e7 2.64 e19 2.64 i6 2.64 e1 2.44 i37 2.44 e5 2.44 i4 2.44 i45 outlier slider DNN or D2NN or D2NNS are powerful constructs REMEMBER! 1. The pTree Rule: Never throw a pTree away! 2. In the process of creating D2NN we create, for each xX, the mask pTree of all nearest neighbors of x (all those points that tie as being nearest to x), which BTW, in high dimension is likely to be a large number. This is useful information (reason #1: no ties, maybe that one point is also an outlier? or?) In RANKk(x) pTree code, you may be able to see how we can compute all RANKk(x)s (all k) in parallel with efficiency (sharing sub-procedures). DNNS (top portion) 16.0 i39 GAP 7.34 i7 8.68 6.32 i10 1.02 6.24 s42 0.07 5.56 i9 0.67 5.38 i36 0.18 5.38 i35 0 4.89 i15 0.48 4.89 e13 0 4.58 s23 0.31 4.35 i20 0.22 4.24 e15 0.11 4.24 i1 0 4.12 i32 0.11 4.12 i19 0 4.12 i18 0 If not, we can (serially) mask off the ties and apply RANKn-1 again to get RANKn-2 ( those points that are next nearest neighbors to x. I believe this has value too, e.g., if DNN(x)=1 and y is the only point in that mask of points distance=1 from x, and DNN(y)=1 and x is the only point distance=1 from y, then if RANKn-2(x)>outlier threshold+1, {x,y} is a doubleton outlier. With a little more work, tripleton and quadrupleton outliers can be identified, etc. At some point we have to stop and call the set a "small cluster" rather than an outlier polyton. If we construct tables, RANKk(x, Rkn-1Dis(x), PtrToRkn-1Mask(x),...,Rkn-kDis(x), PtrToRkn-kMask(x) ), we have a lot of global information about our dataset. 
It is a version of the "neighbor" network that is studied so actively for social networks, etc. (i.e., Rankn-1Mask(x) is a bit map of the edges emanating from x in the "nearest neighbors" network).
Task: Construct a theory of large networks (or an engineering handbook) using pTrees to identify edges (nearest neighbors). Rkn-2Mask(x) gives all points that are second closest to x in straight-line distance, which we don't get in standard network theory: if y is 2 hops from x, we know only that y is a nearest neighbor of a nearest neighbor of x, not how far away it is.
Next we suggest that the Rkk calculations may be done more efficiently using UDR in one fell swoop. Why?
1. The UDR provides all of them.
2. UDR takes care of the duplicate problem (e.g., when looking for the nearest neighbor, it may not be Rankn-1 due to duplicates).
3. In the process of building the UDR we get the Distribution Tree, which has lots of useful approximation information.
We note that we still have to build DNN, D2NN, D2NNS one row at a time.
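The singleton and doubleton outlier ideas on this slide can be illustrated with a brute-force sketch. It uses plain Euclidean distances instead of pTree rank computations, and the doubleton rule is a simplified assumption: a pair counts as a doubleton outlier when the two points are within OT of each other while every other point is farther than OT from both. The function names are hypothetical.

```python
import math

def dnn(points):
    """Distance from each point to its nearest other point (needs >= 2 points)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def singleton_outliers(points, ot):
    """Indices whose nearest-neighbor distance exceeds the threshold ot."""
    return [i for i, dist in enumerate(dnn(points)) if dist > ot]

def doubleton_outliers(points, ot):
    """Pairs (i, j) that are close to each other but far from everything else."""
    pairs = []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            others = [k for k in range(n) if k != i and k != j]
            if not others:
                continue
            d_ij = math.dist(points[i], points[j])
            far_i = min(math.dist(points[i], points[k]) for k in others)
            far_j = min(math.dist(points[j], points[k]) for k in others)
            if d_ij <= ot and far_i > ot and far_j > ot:
                pairs.append((i, j))
    return pairs

if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (20, 20), (50, 50), (51, 50)]
    print(singleton_outliers(pts, ot=5))
    print(doubleton_outliers(pts, ot=5))
```

Sorting the `dnn` values in descending order gives the DNNS list shown on the slide, and the gaps between consecutive sorted values are what the "outlier slider" would inspect.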

  14. Computing the Rank values and Rank pTrees, one at a time, using our pTree code.
Example: K=7-1=6, i.e., we look for the Rank6 (6th highest) value, which is also the 2nd lowest value.
X      10  5  6  7 11  9  3
P4,3    1  0  0  0  1  1  0
P4,2    0  1  1  1  0  0  0
P4,1    1  0  1  1  1  0  1
P4,0    0  1  0  1  1  1  1
(n=3) c=Count(P&P4,3)=3 < 6, so p=6-3=3; P=P&P'4,3 masks off the highest 3 (values of 8 or more). Bit {0}.
(n=2) c=Count(P&P4,2)=3 >= 3, so P=P&P4,2 masks off the lowest 1 (value below 4). Bit {1}.
(n=1) c=Count(P&P4,1)=2 < 3, so p=3-2=1; P=P&P'4,1 masks off the highest 2 (values of 8-2=6 or more). Bit {0}.
(n=0) c=Count(P&P4,0)=1 >= 1, so P=P&P4,0. Bit {1}.
RankKval = 0*2^3 + 1*2^2 + 0*2^1 + 1*2^0 = 5. P = MapRankKPts = 0 1 0 0 0 0 0; ListRankKPts = {2}.
Pseudocode:
RankKval=0; p=K; c=0; P=Pure1; /* Note: n=bitwidth-1. The RankK points are returned as the resulting pTree, P. */
For i=n to 0 { c=Count(P&Pi); If (c>=p) { RankKval=RankKval+2^i; P=P&Pi } else { p=p-c; P=P&P'i } };
return RankKval, P;
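The RankK loop on this slide translates almost line for line into Python. In this sketch the bit slices are ordinary lists of 0/1 values standing in for pTrees (an assumption, since real pTrees are compressed), and the function names are hypothetical.

```python
def bit_slices(values, bitwidth):
    """slices[i][r] is bit i of values[r] (slice 0 is the low-order bit)."""
    return [[(v >> i) & 1 for v in values] for i in range(bitwidth)]

def rank_k(values, k, bitwidth):
    """Return (k-th highest value, mask of the rows holding that value)."""
    slices = bit_slices(values, bitwidth)
    mask = [1] * len(values)           # P = Pure1
    p = k
    rank_val = 0
    for i in range(bitwidth - 1, -1, -1):
        with_bit = [m & b for m, b in zip(mask, slices[i])]
        c = sum(with_bit)              # c = Count(P & P_i)
        if c >= p:
            rank_val += 1 << i         # bit i of the answer is 1
            mask = with_bit            # P = P & P_i
        else:
            p -= c                     # skip the c larger candidates
            mask = [m & (1 - b) for m, b in zip(mask, slices[i])]  # P & P'_i
    return rank_val, mask

if __name__ == "__main__":
    x = [10, 5, 6, 7, 11, 9, 3]
    print(rank_k(x, 6, 4))             # the slide's example: K=6, 4 bits
```

Running it on the slide's column X = 10 5 6 7 11 9 3 with K=6 reproduces the trace above: the value 5 and a mask that selects only the second row.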

  15. What if there are duplicates?
Example 1 (K=6):
X      10  3  6  7 11  9  3
P4,3    1  0  0  0  1  1  0
P4,2    0  0  1  1  0  0  0
P4,1    1  1  1  1  1  0  1
P4,0    0  1  0  1  1  1  1
(n=3) c=Count(P&P4,3)=3 < 6, so p=6-3=3; P=P&P'4,3 masks off 1s. Bit {0}.
(n=2) c=Count(P&P4,2)=2 < 3, so p=3-2=1; P=P&P'4,2 masks off 1s. Bit {0}.
(n=1) c=Count(P&P4,1)=2 >= 1, so P=P&P4,1 masks off 0s (none). Bit {1}.
(n=0) c=Count(P&P4,0)=2 >= 1, so P=P&P4,0. Bit {1}.
The bits read 0011, i.e., the value 3, and P marks both copies of 3.
Example 2 (K=5):
X      10  3  7  7 11  9  3
P4,3    1  0  0  0  1  1  0
P4,2    0  0  1  1  0  0  0
P4,1    1  1  1  1  1  0  1
P4,0    0  1  1  1  1  1  1
(n=3) c=Count(P&P4,3)=3 < 5, so p=5-3=2; P=P&P'4,3 masks off 1s. Bit {0}.
(n=2) c=Count(P&P4,2)=2 >= 2, so P=P&P4,2 masks off 0s. Bit {1}.
(n=1) c=Count(P&P4,1)=2 >= 2, so P=P&P4,1 masks off 0s (none). Bit {1}.
(n=0) c=Count(P&P4,0)=2 >= 2, so P=P&P4,0. Bit {1}.
The bits read 0111, i.e., the value 7, and P marks both copies of 7.

  16. Same column, remaining ranks:
X      10  3  7  7 11  9  3
P4,3    1  0  0  0  1  1  0
P4,2    0  0  1  1  0  0  0
P4,1    1  1  1  1  1  0  1
P4,0    0  1  1  1  1  1  1
K=4:
(n=3) c=Count(P&P4,3)=3 < 4, so p=4-3=1; P=P&P'4,3 masks off 1s. Bit {0}.
(n=2) c=Count(P&P4,2)=2 >= 1, so P=P&P4,2 masks off 0s. Bit {1}.
(n=1) c=Count(P&P4,1)=2 >= 1, so P=P&P4,1 masks off 0s (none). Bit {1}.
(n=0) c=Count(P&P4,0)=2 >= 1, so P=P&P4,0. Bit {1}.
The bits read 0111, i.e., the value 7 again (the duplicate).
K=3:
(n=3) c=Count(P&P4,3)=3 >= 3, so P=P&P4,3 masks off 0s. Bit {1}.
(n=2) c=Count(P&P4,2)=0 < 3, so p=3-0=3; P=P&P'4,2 masks off 1s (none). Bit {0}.
(n=1) c=Count(P&P4,1)=2 < 3, so p=3-2=1; P=P&P'4,1 masks off 1s. Bit {0}.
(n=0) c=Count(P&P4,0)=1 >= 1, so P=P&P4,0. Bit {1}.
The bits read 1001, i.e., the value 9.

  17. Same column, last two ranks:
X      10  3  7  7 11  9  3
P4,3    1  0  0  0  1  1  0
P4,2    0  0  1  1  0  0  0
P4,1    1  1  1  1  1  0  1
P4,0    0  1  1  1  1  1  1
K=2:
(n=3) c=Count(P&P4,3)=3 >= 2, so P=P&P4,3 masks off 0s. Bit {1}.
(n=2) c=Count(P&P4,2)=0 < 2, so p=2-0=2; P=P&P'4,2 masks off 1s (none). Bit {0}.
(n=1) c=Count(P&P4,1)=2 >= 2, so P=P&P4,1 masks off 0s. Bit {1}.
(n=0) c=Count(P&P4,0)=1 < 2, so p=2-1=1; P=P&P'4,0 masks off 1s. Bit {0}.
The bits read 1010, i.e., the value 10.
K=1:
(n=3) c=Count(P&P4,3)=3 >= 1, so P=P&P4,3 masks off 0s. Bit {1}.
(n=2) c=Count(P&P4,2)=0 < 1, so p=1-0=1; P=P&P'4,2 masks off 1s (none). Bit {0}.
(n=1) c=Count(P&P4,1)=2 >= 1, so P=P&P4,1 masks off 0s. Bit {1}.
(n=0) c=Count(P&P4,0)=1 >= 1, so P=P&P4,0 masks off 0s. Bit {1}.
The bits read 1011, i.e., the value 11.
So what we get is really the same output as the UDR, but it seems more expensive to calculate, unless all we need is Rank(n-1). Even then we won't know for sure that there are no duplicates: we have to check Rank(n-2), Rank(n-3), ... until we see a non-duplicate.
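To illustrate the closing remark, the sketch below recovers the distinct values and their counts (the information a UDR would give directly) by calling the rank routine repeatedly and skipping past duplicates. It is an assumption-laden illustration: bit slices are plain Python lists rather than pTrees, the function names are hypothetical, and it makes one full rank pass per distinct value, which is exactly the extra cost the slide points out.

```python
def rank_k(values, k, bitwidth):
    """Return (k-th highest value, mask of the rows holding that value)."""
    mask = [1] * len(values)
    p, rank_val = k, 0
    for i in range(bitwidth - 1, -1, -1):
        bits = [(v >> i) & 1 for v in values]          # bit slice P_i
        with_bit = [m & b for m, b in zip(mask, bits)]
        c = sum(with_bit)
        if c >= p:
            rank_val += 1 << i
            mask = with_bit
        else:
            p -= c
            mask = [m & (1 - b) for m, b in zip(mask, bits)]
    return rank_val, mask

def distinct_values_by_rank(values, bitwidth):
    """List of (value, count) in descending value order, via repeated rank_k."""
    result = []
    k = 1
    while k <= len(values):
        val, mask = rank_k(values, k, bitwidth)
        count = sum(mask)          # the mask marks every copy of val
        result.append((val, count))
        k += count                 # jump over the duplicates of val
    return result

if __name__ == "__main__":
    x = [10, 3, 7, 7, 11, 9, 3]
    dist = distinct_values_by_rank(x, 4)
    print(dist)
    # Gaps between consecutive distinct values, as in a UDR listing.
    print([hi - lo for (hi, _), (lo, _) in zip(dist, dist[1:])])
```

Using the mask count to advance k is what avoids probing Rank(n-2), Rank(n-3), ... one at a time when duplicates are present.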

  18. UDR: Univariate Distribution Revealer (on Spaeth)

      UDR, applied to S, a column of numbers in bit-slice format (an SpTS), produces the DistributionTree of S, DT(S).

      Y (Spaeth data set):

        point  y1  y2
        y1      1   1
        y2      3   1
        y3      2   2
        y4      3   3
        y5      6   2
        y6      9   3
        y7     15   1
        y8     14   2
        y9     15   3
        ya     13   4
        yb     10   9
        yc     11  10
        yd      9  11
        ye     11  11
        yf      7   8

      yofM = 11 27 23 34 53 80 118 114 125 114 110 121 109 125 83

      Bit slices of yofM (p6', ..., p0' are the bitwise complements):

        p6  0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
        p5  0 0 0 1 1 0 1 1 1 1 1 1 1 1 0
        p4  0 1 1 0 1 1 1 1 1 1 0 1 0 1 1
        p3  1 1 0 0 0 0 0 0 1 0 1 1 1 1 0
        p2  0 0 1 0 1 0 1 0 1 0 1 0 1 1 0
        p1  1 1 1 1 0 0 1 1 0 1 1 0 0 0 1
        p0  1 1 1 0 1 0 0 0 1 0 0 1 1 1 1

      DT(yofM), shown as count/width [interval). Each node's mask is the AND of the slices (or complements) along its path, e.g. node [96,128) is p6 & p5.

        depth h=0:  15  [0,128)
        depth h=1:  5/64 [0,64)    10/64 [64,128)
        depth h=2:  3/32 [0,32)    2/32 [32,64)    2/32 [64,96)    8/32 [96,128)
        depth h=3:  1/16 [0,16)    2/16 [16,32)    1 [32,48)    1 [48,64)
                    0 [64,80)      2 [80,96)       2 [96,112)   6 [112,128)
        depth h=4:  0 [0,8)        1 [8,16)        1 [16,24)    1 [24,32)
                    1 [32,40)      0 [40,48)       1 [48,56)    0 [56,64)
                    2 [80,88)      0 [88,96)       0 [96,104)   2 [104,112)
                    3 [112,120)    3 [120,128)

      depth(DT(S)) ≤ b ≡ BitWidth(S); h = depth of a node; k = node offset.
      Node(h,k) has a pointer to pTree{x∈S | F(x) ∈ [k*2^(b-h+1), (k+1)*2^(b-h+1))} and its 1-count. In the figure, node(2,3) covers [96,128).

      Pre-compute, and enter into the ToC, all DT(Yk) plus those for selected linear functionals (e.g., d = main diagonals, ModeVector).

      Suggestion: in our pTree-base, every pTree (basic, mask, ...) should be referenced in ToC(pTree, pTreeLocationPointer, pTreeOneCount), and these OneCts should be repeated everywhere (e.g., in every DT). The reason is that the OneCts help us select the pertinent pTrees to access, and in fact are often all we need to know about a pTree to get the answers we are after.
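A DistributionTree can be built by repeatedly ANDing a node's mask with the next bit slice or its complement and recording the 1-count. The following Python sketch does this on the yofM column. It assumes uncompressed 0/1 lists in place of pTrees, and it indexes node (h, k) as the interval [k*2^(bitwidth-h), (k+1)*2^(bitwidth-h)) with bitwidth=7, which is this sketch's own convention; `distribution_tree` is a hypothetical name.

```python
def distribution_tree(values, bitwidth, max_depth):
    """Return {(h, k): 1-count}.

    Node (h, k) covers [k * 2**(bitwidth - h), (k + 1) * 2**(bitwidth - h)).
    """
    slices = [[(v >> i) & 1 for v in values] for i in range(bitwidth)]
    masks = {(0, 0): [1] * len(values)}                # root mask: every row
    for h in range(1, max_depth + 1):
        bit = slices[bitwidth - h]                     # slice that splits depth h-1 nodes
        for k in range(2 ** (h - 1)):
            parent = masks[(h - 1, k)]
            masks[(h, 2 * k)] = [m & (1 - b) for m, b in zip(parent, bit)]   # parent & p'
            masks[(h, 2 * k + 1)] = [m & b for m, b in zip(parent, bit)]     # parent & p
    return {node: sum(mask) for node, mask in masks.items()}


yofM = [11, 27, 23, 34, 53, 80, 118, 114, 125, 114, 110, 121, 109, 125, 83]
counts = distribution_tree(yofM, bitwidth=7, max_depth=4)
```

A practical implementation would stop expanding a node once its count reaches 0 or 1; the sketch expands every node to keep the indexing simple.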

  19. FAUST Oblique LSR Classification on IRIS150

      Ld with p=origin, min and max per class (S, E, I):

        d      S min  S max   E min  E max   I min  I max
        1000   43     59      49     71      49     80
        0100   23     44      20     34.1    22     38.1
        0010   10     19      30     51.1    18     69
        0001   1      6       10     18.1    14     26

      [Figure: class counts per cut interval along each ek.]

      Sample lists from the slide, in slide order (id followed by its four attribute values):

        i2 58 27 51 19; i7 49 25 45 17; i11 65 32 51 20; i14 57 25 50 20; i15 58 28 51 24; i20 60 22 50 15; i22 56 28 49 20; i24 63 27 49 18; i27 62 28 48 18; i28 61 30 49 18; i34 63 28 51 15; i42 69 31 51 23; i43 58 27 51 19; i47 63 25 50 19; i50 59 30 51 18.
        6 of the 16 occur.
        6 of the 6 occur.

        i4 63 29 56 18; i9 67 25 58 18; i17 65 30 55 18; i20 60 22 50 15; i24 63 27 49 18; i27 62 28 48 18; i28 61 30 49 18; i34 63 28 51 15; i35 61 26 56 14; i38 64 31 55 18; i39 60 30 18 18; i50 59 30 51 18.
        5 of the 6 occur.

        i4 63 29 56 18; i7 49 25 45 17; i8 73 29 63 18; i9 67 25 58 18; i17 65 30 55 18; i20 60 22 50 15; i24 63 27 49 18; i26 72 32 60 18; i27 62 28 48 18; i28 61 30 49 18; i30 72 30 58 16; i34 63 28 51 15; i35 61 26 56 14; i38 64 31 55 18; i39 60 30 18 18.

        i1 63 33 60 25; i7 49 25 45 17; i14 57 25 50 20; i15 58 28 51 24; i22 56 28 49 20; i43 58 27 51 19.
        1 of the 6 occurs, i7.

      We create hull boundaries for the d=ek standard basis vectors and check for overlaps. Then the only goal is to reduce False Positives.

      What does this tell us about FAUST LSR on IRIS? The LSR hulls are 96% True Positive accurate on IRIS using only the pre-computed min and max of each given column (PL, PW, SL, SW) as cut points, with no further pTree calculations beyond the attribute min and max pre-calculations. That's pretty good! Note that i7 and i20 are prominent outliers (see IRIS_DNNS on slide 4), so if we had eliminated outliers first using DNNS, the TP accuracy would be 97.3%.

      Next we address False Positives. How does one measure FP accuracy? One way would be to measure the area of Hull-Class for each Class. That would give us an FP accuracy for each Class, and the sum of those would give us an FP accuracy for the model. These areas are difficult to calculate, however, for many reasons. First, what do we mean by Class? The mathematical convex hull of the Class? How do we calculate area?

      An easier way would be to measure up a large set of IRIS samples, none of which are Setosa, Versicolor or Virginica. The problem with this approach is that other varieties may well share some or all measurements with S, E and I, so we would not expect to be able to separate them into an "other" class using the data we have. So a vector-space-based FP assessment might be preferable.

      Since the area of the symmetric difference is hard to calculate, how about measuring the maximum distance from any hull corner to its closest class point (or the sum of those distances)? Easier still, use the max distance to the main corners only. That's easy enough for strictly linear hulls, but what about hulls that have S and R components? Since the above is a linear hull, we use it. The main corners are:

        class  MIN VECTOR      MAX VECTOR      MnVecDis  MxVecDis
        s      43 23 10  1     59 44 19  6      4.1       4.9
        e      49 20 30 10     71 34 51 18      4.4       5.1
        i      49 22 18 14     80 38 69 26     14.2       5.4

      The sum of the distances to class corner vectors is 38.1; the average is 6.4.
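To make the hull idea on this slide concrete, here is a small Python sketch of the linear hull test with d=ek only: a sample is accepted for a class when every attribute lies between that class's pre-computed min and max. The interval values are the Ld min/max figures listed on the slide; the dictionary layout and the name `classify` are assumptions made for illustration.

```python
# Per-class [min, max] along d = 1000, 0100, 0010, 0001 (values from the slide).
HULLS = {
    "S": [(43, 59), (23, 44), (10, 19), (1, 6)],
    "E": [(49, 71), (20, 34.1), (30, 51.1), (10, 18.1)],
    "I": [(49, 80), (22, 38.1), (18, 69), (14, 26)],
}


def classify(y):
    """Return every class whose linear hull contains y (empty list means Other)."""
    return [
        cls
        for cls, box in HULLS.items()
        if all(lo <= value <= hi for value, (lo, hi) in zip(y, box))
    ]


in_s = classify((50, 34, 15, 2))      # only the S hull contains this point
in_e = classify((59, 28, 43, 13))     # only the E hull contains this point
ambiguous = classify((63, 28, 51, 15))  # sample i34 falls in both E and I hulls
other = classify((0, 0, 0, 0))        # outside every hull
```

The ambiguous case shows why further hull edges (other d vectors, or S and R boundaries) are needed: the axis-aligned boxes overlap.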

  20. FAUST LSR Classification on IRIS150, a new version

      Ld, d=1000, p=origin, MinL and MaxL for classes S, E, I: S 43 58; E 49 70; I 49 79.
      Ld, d=avgI-avgE, p=origin: E 5.74 15.9; I 13.6 16.6.
      p=AvgS = (50, 34, 15, 2).

      [Figure: class counts and R cut points per interval.]

      1. If you're classifying individual unclassified samples one at a time, applying these formulas gives 100% accuracy in terms of true positives (assuming the given training set fully characterizes the classes). We have used just d=1000, so many more edges could be placed on these hulls to eliminate false positives.

      2. If there is a whole table of unclassified samples to be classified (e.g., millions or billions), then it might be time-cost effective to convert that table to a pTreeSet and then convert these inequalities to pTree inequalities (EIN Ring technology), accomplishing the classification as one batch process (no loop required).

      Root step (L1000(y) = y1):
        if      43 ≤ L1000(y) < 49   {y isa S}
        elseif  49 ≤ L1000(y) ≤ 58   {y isa SEI}1
        elseif  58 < L1000(y) ≤ 70   {y isa EI}2
        elseif  70 < L1000(y) ≤ 79   {y isa I}
        else                         {y isa O}

      The {y isa SEI}1 recursive step:
        if      0 ≤ R1000,AvgS(y) ≤ 99       {y isa S}
        elseif  99 < R1000,AvgS(y) < 393     {y isa O}
        elseif  393 ≤ R1000,AvgS(y) ≤ 1096   {y isa E}
        elseif  1096 < R1000,AvgS(y) < 1217  {y isa O}
        elseif  1217 ≤ R1000,AvgS(y) ≤ 1826  {y isa I}
        else                                 {y isa O}

      The {y isa EI}2 recursive step:
        if      270 ≤ R1000,AvgS(y) < 792    {y isa I}
        elseif  792 ≤ R1000,AvgS(y) ≤ 1558   {y isa EI}3
        elseif  1558 < R1000,AvgS(y) ≤ 2568  {y isa I}
        else                                 {y isa O}

      The {y isa EI}3 recursive step:
        if      5.7 ≤ LAvE-AvI(y) < 13.6     {y isa E}
        elseif  13.6 ≤ LAvE-AvI(y) ≤ 15.9    {y isa EI}4
        elseif  15.9 < LAvE-AvI(y) ≤ 16.6    {y isa I}
        else                                 {y isa O}

      The {y isa EI}4 recursive step:
        if      22.69 ≤ RAvE-AvI,AvgE(y) < 31.02   {y isa E}
        elseif  31.02 ≤ RAvE-AvI,AvgE(y) ≤ 35.51   {y isa EI}5
        elseif  35.51 < RAvE-AvI,AvgE(y) ≤ 54.32   {y isa I}
        else                                       {y isa O}

      The {y isa EI}5 recursive step:
        if      LAvgE-AvgI,origin(y) = 1.78   {y isa E}
        elseif  LAvgE-AvgI,origin(y) = 6.26   {y isa I}
        else                                  {y isa O}

      The LSR Decision Tree algorithm is: build a decision tree for each ek (also for some ek combos?). Build branches to 100% TP (no class duplication exiting). Then y isa C iff y isa C in every tree, else y isa Other. At each node, build a branch for each pair of classes in each interval.

  21. FAUST LSR DT Classification on IRIS150, d=0100

      Instead of calculating the R's with respect to a freshly calculated Avg in each slice, we calculate R0100,AvgS, R0100,AvgE and R0100,AvgI once, then AND with the mask P(20 ≤ L0100,Origin < 22), and later AND with the masks P(22 ≤ L0100,Origin < 23), P(23 ≤ L0100,Origin ≤ 34), P(34 < L0100,Origin ≤ 38) and P(38 < L0100,Origin ≤ 44).

      L0100,Origin(y) min and max per class: S 23 44; E 20 34; I 22 38.

      Class counts per L0100,Origin interval:

        interval   S    E    I
        [20,22)    0    1    0
        [22,23)    0    2    1
        [23,34]    29   47   46
        (34,38]    15   0    3
        (38,44]    6    0    0

      R ranges on the side intervals (min max pairs as shown on the slide):
        On 34 < L0100,O ≤ 38:  R0100,AvgI: 1776 2746; 96 273
        On 34 < L0100,O ≤ 38:  R0100,AvgS: 0 55; 3139 3849
        On 22 ≤ L0100,O < 23:  R0100,AvgE: 15 18; 58 59

      R ranges on 23 ≤ L0100,O ≤ 34 (min max pairs as shown on the slide):
        R0100,AvgS: 0 66; 310 1750; 352 4104 (counts in overlap: 46, 12)
        R0100,AvgE: 793 1417; 3 234; 58 1103
        R0100,AvgI: 1892 1824; 36929 231403

      Recursive rounds in the 23 ≤ L0100,O ≤ 34 branch. Each round adds its condition to those of the rounds before it:

        Added condition                    Next functional      Ranges (min max; min max)    Counts in overlap
        352 ≤ R0100,AvgS ≤ 1750            LAvgEAvgI,Origin     53 79; 50 77                 44, 11
        53 ≤ LAvgEAvgI,Origin ≤ 77         RAvgEAvgI,AvgI       0 75.2; 2.8 134              40, 10
        (same)                             RAvgEAvgI,AvgE       0 75.2; 2.8 134              40, 10
        2.8 ≤ RAvgEAvgI,AvgE ≤ 75.2        LAvgEAvgI,Origin     53.7 76.2; 74.1 77           7, 7
        74.1 ≤ LAvgEAvgI,Origin ≤ 76.2     RAvgEAvgI,AvgE       15.4 75.2; 2 57.3            6, 4
        15.4 ≤ RAvgEAvgI,AvgE ≤ 57.3       LAvgEAvgI,Origin     74.2 75.6; 74.1 75.9         6, 1
        74.2 ≤ LAvgEAvgI,Origin ≤ 75.6     RAvgEAvgI,AvgE       15 37; 57                    6,0 and 0,1

      It takes 7 recursive rounds to separate E and I (to build this branch to 100% TP) in this branch of the e2=0100 (Sepal Width) tree. It seems clear we are mostly peeling off outliers a few at a time. Is that because we are not revising the Avg vectors as we go (to get the best angle)? On the next slide we make a fresh calculation of Avg for each subcluster.

      It also appears to be unnecessary to position the starting point of the AvgEAvgI vector at both AvgE and AvgI.

  22. FAUST LSR DT Classification on IRIS150

      L0100,Origin(y) min and max per class: S 23 44; E 20 34; I 22 38.
      L0010,Origin(y) min and max per class: S 10 19; E 30 51; I 18 69.

      [Figure: class counts per L interval.]

      On this slide we do the same as on the last, but make a fresh calculation of Avg for each recursive step. The rounds, with values as shown on the slide:

        On 23 ≤ L0100,O ≤ 34:  R0100,AvgS: 0 43 3201820 3994213 45,12
        On 23 ≤ L0100,O ≤ 34:  R0100,AvgE: 793 1417 3 234 581103 13,21
        On 30 ≤ L0010,O ≤ 51:  R0010,AvgE: 2.8 157.6 16.3199 33,14
        On 23 ≤ L0100,O ≤ 34 & 58 ≤ R0100,AvgS ≤ 234:  LAvgEAvgI,Origin: 5279 6683 7,13
        On 23 ≤ L0100,O ≤ 34 & 58 ≤ R0100,AvgS ≤ 234:  SBarrelAvgE: 68241.1 58.6272 13,18
        On 23 ≤ L0100,O ≤ 34 & 58 ≤ R0100,AvgS ≤ 234:  SBarrelAvgI: 36.1951 5.9417 7,14
        On 23 ≤ L0100,O ≤ 34 & 320 ≤ R0100,AvgS ≤ 1820:  LAvgEAvgI,Origin: 2334 2530 24,9
        On 30 ≤ L0010,O ≤ 51 & 16.3 ≤ R0100,AvgS ≤ 157.6:  LAvEAvI,O: 52.778.4 66.380 19,13
        On 23 ≤ L0100,O ≤ 34 & 58 ≤ R0100,AvgS ≤ 234 & 25 ≤ LAvEAvI,O ≤ 32:  SLinearAvgE: 266.1 27.8357 1,5
        On 23 ≤ L0100,O ≤ 34 & 58 ≤ R0100,AvgS ≤ 234 & 25 ≤ LAvEAvI,O ≤ 32:  SLinearAvgI: 7.4171 2.4135 6,11
        On 30 ≤ L0100,O ≤ 51 & 16.3 ≤ R0100,AvgS ≤ 157.6 & 66.3 ≤ LAvEAvI,O ≤ 78.4:  RAvgEAcI,AvE: 1416 2522.2 936.4 1748.2 5,6
        On 23 ≤ L0100,O ≤ 34 & 320 ≤ R0100,AvgS ≤ 1820 & 25 ≤ LAvgEAvgI,Origin ≤ 30:  RAvgEAvgI,AvgE: 088 4108 24,6
        On 30 ≤ L0100,O ≤ 51 & 16.3 ≤ R0100,AvgS ≤ 157.6 & 66.3 ≤ LAvEAvI,O ≤ 78.4 & 1416 ≤ RAvgEAcI,AvE ≤ 1449:  L: 1416 2522.2 936.4 1748.2 5,6
        On 23 ≤ L0100,O ≤ 34 & 58 ≤ R0100,AvgS ≤ 234 & 25 ≤ LAvEAvI,O ≤ 32 & 27 ≤ SLinearAvgE ≤ 66.1:  Sp: 0 11 23 1,00,5
        On 23 ≤ L0100,O ≤ 34 & 320 ≤ R0100,AvgS ≤ 1820 & 25 ≤ LAvgEAvgI,Origin ≤ 30 & 4 ≤ RAvgEAvgI,AvgE ≤ 88:  LAvgEAvgI,Origin: 31334961 34114397 18,5
        On the above & 3411 ≤ LAvgEAvgI,Origin ≤ 4397:  RAvgEAvgI,AvgE: 126 5.920.5 10,5
        On the above & 5.9 ≤ RAvgEAvgI,AvgE ≤ 20.5:  LAvgEAvgI,Origin: 3854.4 5457 1,1

      It again takes 7 recursive rounds to separate E and I in this branch of the e2=0100 (Sepal Width) tree. From this incomplete testing, it seems not to be beneficial to make expensive fresh Avg calculations.

      We pause the algorithm and try SBarrelAvgE and SBarrelAvgI in addition to LAvEAvI,O.

      Next, try inserting SLinearAvgE and SLinearAvgI in series with LAvEAvI,O instead of in parallel. This seems very beneficial! Use only the LinearAvg with the smallest count, in this case LinearAvgE?

  23. FAUST Oblique LSR Classification on IRIS150

      Ld with p=origin, min and max per class:

        d      S min  S max   E min  E max   I min  I max
        1000   43     59      49     71      49     80
        0100   23     44      20     34.1    22     38
        0010   10     19      30     51      18     69
        0001   1      6       10     18      14     26

      p=AvgS = (50, 34, 15, 2); p=AvgE = (59, 28, 43, 13); p=AvgI = (66, 30, 55, 20).

      [Figure: class counts per L interval and R cut points for p=AvgS, p=AvgE and p=AvgI.]

      In pTree pseudo-code:
        P(y<43)       = PO
        P(43≤y<49)    = PS
        P(49≤y≤58)    = PSEI
        P(58<y≤70)    = PEI
        P(70<y≤79)    = PI
        PO := PO or P(y>79)

  24. Seven-point example:

        Row  Attr1  Attr2
        1    0      0
        2    0      25
        3    0      50
        4    75     75
        5    0      100
        6    0      125
        7    0      150

      Eight-point example:

        Row  Attr1  Attr2
        1    0      0
        2    0      100
        3    0      0
        4    110    110
        5    0      114
        6    0      123
        7    0      145
        8    0      0

      [Figure: scatter plots of both examples with the gaps marked.]

      Ld,p = (X-p)od (if p=origin we use Ld = Xod) is a distance dominated functional, meaning dis(Ld,p(x), Ld,p(y)) ≤ dis(x,y) ∀x,y∈X. Therefore there is no conflict between Ld,p gap-enclosed clusters for different d's. I.e., consecutive Ld,p gaps ⇒ a separate cluster, always (but not necessarily vice versa). A PCI followed by a PCD ⇒ a separate cluster (with nesting issues to be resolved!).

      Recursion solves problems. E.g., in the eight-point example the gap isolating point 4 is revealed by a Le1(X)=Attr1 gap. Recursively restricting to {1,2,3,5,6,7,8} and applying Le2(X)=Attr2 reveals the 2 other gaps. This first example suggests that recursion can be important.

      A different example (the seven-point one) suggests that recursion order can also be important. Using the ordering d=e2, e1 recursively, Le2=Attr2 reveals no gaps, so Le1=Attr1 is applied to all of X and reveals only the gap around point 4. Using the ordering d=e1, e2 instead: Le1=Attr1 on X reveals a gap of at least 100 around point 4 (actual gap: 103.078), and Le2=Attr2 applied to X-{4} also reveals a gap of 50 between {1,2,3} and {5,6,7}. StD: ~30 (Attr1), ~55 (Attr2). Note that StD doesn't always reveal the best order!

      What about the other functionals, Sp = (X-p)o(X-p) and Rd,p = Sp - (Ld,p)^2? In an attempt to be more careful, we can only say that Sp (and therefore also Rd,p) is eventually distance dominated, meaning dis(Sp(x), Sp(y)) ≤ dis(x,y) provided 1 ≥ dis(p,x) + dis(p,y). Letting r = dis(p,x) = √Sp(x), s = dis(p,y) = √Sp(y) and r > s, we have r-s ≤ dis(x,y) and

        dis(Sp(x), Sp(y)) = r^2 - s^2 = (r-s)*(r+s) ≤ dis(x,y) * [dis(p,x) + dis(p,y)].

      When does FAUST Gap suffice for clustering? For text mining?
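The recursion described for the eight-point example can be sketched in a few lines of Python: cut at every gap of one attribute, then recurse into each piece with the next attribute. The gap threshold of 20 is an assumption chosen for this illustration (the slide does not state one), and `gaps`, `split_by_cuts` and `gap_cluster` are hypothetical names.

```python
def gaps(values, min_gap):
    """Midpoints of every gap of width >= min_gap between consecutive distinct values."""
    vs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(vs, vs[1:]) if b - a >= min_gap]


def split_by_cuts(rows, key, cuts):
    """Partition rows (id -> point) by which side of each cut point[key] falls on."""
    groups = {}
    for rid, point in rows.items():
        idx = sum(point[key] > c for c in cuts)
        groups.setdefault(idx, {})[rid] = point
    return [groups[i] for i in sorted(groups)]


def gap_cluster(rows, order, min_gap):
    """Apply L_{e_k} gap cuts recursively, one attribute per round, in the given order."""
    clusters = [rows]
    for key in order:
        next_clusters = []
        for cluster in clusters:
            cuts = gaps([p[key] for p in cluster.values()], min_gap)
            next_clusters.extend(split_by_cuts(cluster, key, cuts))
        clusters = next_clusters
    return [sorted(c) for c in clusters]


X8 = {1: (0, 0), 2: (0, 100), 3: (0, 0), 4: (110, 110),
      5: (0, 114), 6: (0, 123), 7: (0, 145), 8: (0, 0)}
clusters = gap_cluster(X8, order=(0, 1), min_gap=20)   # Attr1 first, then Attr2
```

With Attr1 first, point 4 is isolated by the Attr1 gap, and the Attr2 pass on the remaining rows then finds the two other gaps, separating {1, 3, 8}, {2, 5, 6} and {7}.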

  25. LSR on IRIS150

      o=origin; p∈Rn; d∈Rn, |d|=1; {Ck}k=1..K are the classes. An operation enclosed in a parallelogram means it is a pTree op, not a scalar operation (on just numeric operands).

      Lp,d ≡ (X - p) o d = Lo,d - pod
        minLp,d,k = min[Lp,d & Ck] = minLo,d,k - pod = min(Xod & Ck) - pod,  OR  = min(X&Ck) o d - pod
        maxLp,d,k = max[Lp,d & Ck] = maxLo,d,k - pod = max(Xod & Ck) - pod,  OR  = max(X&Ck) o d - pod

      Sp = (X - p)o(X - p) = -2Xop + So + pop = Lo,-2p + (So + pop), where So = XoX
        minSp,k = min[Sp & Ck] = min[(Xo(-2p) + XoX) & Ck] + pop,  OR  = min[(X&Ck)o(-2p) + XoX] + pop
        maxSp,k = max[Sp & Ck] = max[(Xo(-2p) + XoX) & Ck] + pop,  OR  = max[(X&Ck)o(-2p) + XoX] + pop

      Rp,d ≡ Sp - (Lp,d)^2
        minRp,d,k = min[Rp,d & Ck]
        maxRp,d,k = max[Rp,d & Ck]

      Consider all 3 functionals, L, S and R. What's the most efficient way to calculate all 3? I suggest that we use each of the functionals with each of the pairs (p,d) that we select for application (since, to get R, we need to compute L and S anyway). So it would make sense to develop an optimal (minimum work and time) procedure to create L, S and R for any (p,d) in the set.
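One way to share the work between the three functionals is to pre-compute XoX once and then derive L, S and R for each (p,d) from two dot-product columns. The Python sketch below illustrates this with ordinary lists standing in for SPTSs; `lsr` and its parameter names are assumptions for illustration, and the three rows of X are the AvgS, AvgE and AvgI vectors from slide 23.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lsr(X, p, d, XoX=None):
    """Return the columns (L, S, R) for one (p, d) pair.

    L = Xod - pod,  S = XoX + Xo(-2p) + pop,  R = S - L^2.
    XoX can be passed in so it is computed only once for all (p, d) pairs.
    """
    if XoX is None:
        XoX = [dot(x, x) for x in X]
    pod, pop = dot(p, d), dot(p, p)
    minus_2p = [-2 * c for c in p]
    L = [dot(x, d) - pod for x in X]
    S = [xx + dot(x, minus_2p) + pop for x, xx in zip(X, XoX)]
    R = [s - l * l for s, l in zip(S, L)]
    return L, S, R


X = [(50, 34, 15, 2), (59, 28, 43, 13), (66, 30, 55, 20)]
XoX = [dot(x, x) for x in X]                      # one-time pre-calculation
L, S, R = lsr(X, p=(50, 34, 15, 2), d=(1, 0, 0, 0), XoX=XoX)
```

Here L is the signed offset along d, S the squared distance to p, and R the squared radial distance from the line through p in direction d, so R is never negative when |d|=1.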

  26. LSR on IRIS150

      Diagonals used, by cluster label:

        C1,1: D=1000    C2,3: D=0100    C3,3: D=0010    C4,1: D=0001
        C5,1: D=1100    C6,1: D=1010    C7,1: D=1001    C8,1: D=0110
        C9,1: D=0101    Ca,1: D=0011    Cb,1: D=1110    Cc,1: D=1101
        Cd,1: D=1011    Ce,1: D=0111    Cf,1: D=1111    Cg,1: D=1-100
        Ch,1: D=10-10

      Linear hull cut points [L,H] as labeled on the slide. For each row, y isa O if yoD ∈ (-∞,L)∪(H,∞), and y isa O|S if yoD ∈ [L,H]:

        L    H     cluster
        55   169   Ce,1
        81   182   Cc,1
        68   117   C6,1
        3    46    Ci,1
        10   22    Ch,1
        84   204   Cg,1
        39   127   Cf,1
        71   137   Cd,1
        10   19    C4,1
        1    6     C5,1
        23   44    C3,3
        54   146   C7,1
        12   91    Cb,1
        26   61    Ca,1
        36   105   C9,1
        44   100   C8,1
        43   58    C2,3

      Dse = (9, -6, 27, 10). yoDse ranges [L,H]: S 495 802; E 1270 2010; I 1061 2725.
        y isa OTHER if yoDse ∈ (-∞,495)∪(802,1061)∪(2725,∞)
        y isa OTHER or S if yoDse ∈ C1,1 ≡ [495, 802]
        y isa OTHER or I if yoDse ∈ C1,2 ≡ [1061, 1270]
        y isa OTHER or E or I if yoDse ∈ C1,3 ≡ [1270, 2010]   (C1,3: 0 s, 49 e, 11 i)
        y isa OTHER or I if yoDse ∈ C1,4 ≡ [2010, 2725]

      Dei = (-3, -2, 3, 3). yoDei ranges [L,H]: E -117 -44; I -62 -3.
        y isa O if yoDei ∈ (-∞,-117)∪(-3,∞)
        y isa O or E or I if yoDei ∈ C2,1 ≡ [-62, -44]   (C2,1: 2 e, 4 i)
        y isa O or I if yoDei ∈ C2,2 ≡ [-44, -3]

      Dei = (6, -2, 3, 1). yoDei ranges [L,H]: E 420 459; I 480 501.
        y isa O if yoDei ∈ (-∞,420)∪(459,480)∪(501,∞)
        y isa O or E if yoDei ∈ C3,1 ≡ [420, 459]
        y isa O or I if yoDei ∈ C3,2 ≡ [480, 501]

      Continue this on clusters with OTHER + one class, so the hull fits tightly (reducing false positives), using diagonals?

      The amount of work yet to be done, even for only 4 attributes, is immense. For each D, we should fit boundaries for each class, not just one class. For 4 attributes, I count 77 diagonals * 3 classes = 231 cases. How many in the Enron email case with 10,000 columns? Too many for sure!

      For each D, should we not only cut at minCoD and maxCoD but also limit the radial reach for each class (barrel analytics)? Note that limiting the radial reach limits all other directions (other than the D direction) in one step and therefore by the same amount; i.e., it limits all directions assuming perfectly round clusters. Think about Enron: some words (columns) have high counts and others have low counts. Our radial reach threshold would be based on the highest count and would therefore admit many false positives. Could we cluster directions (words) by count and limit radial reach differently for different clusters?

  27. Dot Product SPTS computation: XoD = Σk=1..n Xk*Dk

      SPTS multiplication (note: pTree multiplication = &):

        X1*X2 = (2^1 p1,1 + 2^0 p1,0)(2^1 p2,1 + 2^0 p2,0)
              = 2^2 p1,1 p2,1 + 2^1 (p1,1 p2,0 + p2,1 p1,0) + 2^0 p1,0 p2,0

      With D = (3, 3), i.e. D1,1 = D1,0 = D2,1 = D2,0 = 1:

        XoD = 2^2 (1 p1,1 + 1 p2,1) + 2^1 (1 p1,0 + 1 p1,1 + 1 p2,0 + 1 p2,1) + 2^0 (1 p1,0 + 1 p2,0)

      First data set:

        X1  X2   p11 p10 p21 p20   X1*X2   XoD
        1   1    0   1   0   1     1       6
        3   0    1   1   0   0     0       9
        2   1    1   0   0   1     2       9

      Different data:

        X1  X2   p11 p10 p21 p20   X1*X2   XoD
        1   1    0   1   0   1     1       6
        3   3    1   1   1   1     9       18
        2   1    1   0   0   1     2       9

      [Figure: bit-slice arrays for the raw sets, the carries CAR(i-1,i) and the result slices PXoD,i.]

      /* Calc PXoD,i after PXoD,i-1.  CarrySet = CAR(i-1,i), RawSet = RSi */
      INPUT:   CAR(i-1,i), RSi
      ROUTINE: PXoD,i = RSi ⊕ CAR(i-1,i)
               CAR(i,i+1) = RSi & CAR(i-1,i)
      OUTPUT:  PXoD,i, CAR(i,i+1)

      We have extended the Galois field GF(2)={0,1}, with XOR=add and AND=mult, to pTrees.
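The XOR-for-sum, AND-for-carry idea on this slide can be exercised with a short Python sketch. It assumes each bit slice is held as a Python integer used as a row bitmask (bit r of the integer is row r), which stands in for a pTree, and it uses a standard ripple-carry full adder, a generalization of the slide's single raw-set routine. All function names are illustrative.

```python
def to_slices(column, width):
    """slices[i] is an int whose bit r equals bit i of column[r]."""
    return [sum(((v >> i) & 1) << r for r, v in enumerate(column)) for i in range(width)]


def from_slices(slices, nrows):
    """Inverse of to_slices: rebuild the column of numbers."""
    return [sum(((s >> r) & 1) << i for i, s in enumerate(slices)) for r in range(nrows)]


def add_slices(a, b):
    """Add two SPTSs slice by slice: XOR gives the sum bit, AND/OR give the carry."""
    width = max(len(a), len(b))
    a = a + [0] * (width - len(a))
    b = b + [0] * (width - len(b))
    out, carry = [], 0
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x & y) | (carry & (x ^ y))
    out.append(carry)
    return out


def mul_slices(a, b):
    """Multiply two SPTSs: AND forms each partial-product slice, add_slices sums them."""
    acc = [0]
    for i, ai in enumerate(a):
        partial = [0] * i + [ai & bj for bj in b]     # shift left by i bit positions
        acc = add_slices(acc, partial)
    return acc


X1, X2 = [1, 3, 2], [1, 3, 1]
s1, s2 = to_slices(X1, 2), to_slices(X2, 2)
product = from_slices(mul_slices(s1, s2), 3)
total = from_slices(add_slices(s1, s2), 3)
```

Every operation touches whole slices at once, so the per-row loop disappears; that is the property the slide is after.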

  28. Example: FAUST Oblique. XoD is used in CCC, TKO, PLC and LARC, and (x-X)o(x-X) = -2Xox + xox + XoX is used in TKO.

        X1  X2   p11 p10 p21 p20   XoD (D=x1)   XoD (D=x2)   XoD (D=x3)
        1   1    0   1   0   1     2            3            3
        3   0    1   1   0   0     3            9            6
        2   1    1   0   0   1     3            2            5

      Rank(N-1)(XoD) = Rank2(XoD) for D=x1:
        n=1, p=2: 3 ≥ 2, so 1*2^1; P=P&p1
        n=0, p=2: 2 ≥ 2, so 1*2^1 + 1*2^0 = 3; P=p0&P
        so -2x1oX = -6

      Rank(N-1)(XoD) = Rank2(XoD) for D=x2:
        n=3, p=2: 1 < 2, p=2-1=1, so 0*2^3; P=P&p'3
        n=2, p=1: 0 < 1, p=1-0=1, so 0*2^3 + 0*2^2; P=p'2&P
        n=1, p=1: 2 ≥ 1, so 0*2^3 + 0*2^2 + 1*2^1; P=p1&P
        n=0, p=1: 1 ≥ 1, so 0*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 3; P=p0&P
        so -2x2oX = -6

      Rank(N-1)(XoD) = Rank2(XoD) for D=x3:
        n=2, p=2: 2 ≥ 2, so 1*2^2; P=P&p2
        n=1, p=2: 1 < 2, p=2-1=1, so 1*2^2 + 0*2^1; P=p'1&P
        n=0, p=1: 1 ≥ 1, so 1*2^2 + 0*2^1 + 1*2^0 = 5; P=p0&P
        so -2x3oX = -10

      So in FAUST, we need to construct lots of SPTSs of the type "X dotted with a fixed vector", a costly pTree calculation. (Note that XoX is costly too, but it is a one-time calculation, i.e., a pre-calculation. xox is calculated for each individual x, but it is a scalar calculation and just a read-off of a row of XoX, once XoX is calculated.) Thus, we should optimize the XoD calculation as aggressively as we can! The methods on the previous slide seem efficient. Is there a better method?

      Then for TKO we need to compute ranks.

      RankK: p is what's left of K yet to be counted, initially p=K. KVal is the RankK value, initially 0.
        For i = bitwidth-1 down to 0
          if Count(P&Pi) ≥ p   { KVal = KVal + 2^i;  P = P&Pi }
          else /* < p */       { p = p - Count(P&Pi);  P = P&P'i }

  29. So let us look at ways of doing the work to calculate XoD = Σk=1..n Xk*Dk.

      As we recall, the task is to ADD bit slices, giving a result bit slice and a set of carry bit slices to carry forward. With D = (3, 3), i.e. D1,1 = D1,0 = D2,1 = D2,0 = 1:

        XoD = 2^2 (1 p1,1 + 1 p2,1) + 2^1 (1 p1,0 + 1 p1,1 + 1 p2,0 + 1 p2,1) + 2^0 (1 p1,0 + 1 p2,0)

        X1  X2   p11 p10 p21 p20   XoD
        1   1    0   1   0   1     6
        3   0    1   1   0   0     9
        2   1    1   0   0   1     9

      I believe we add by successive XORs, and the carry set is the raw set with one 1-bit turned off iff the sum at that bit is a 1-bit. Or we can characterize the carry as the raw set minus the result (always carry forward a set of pTrees plus one negative one).

      We want a routine that constructs the result pTree from a positive set of pTrees plus a negative set always consisting of 1 pTree. The routine is: successive XORs across the positive set, then XOR with the negative-set pTree (because the successive positive-set XOR gives us the odd values, and if you subtract one pTree, its 1-bits change odd to even and vice versa):

      /* For PXoD,i (after PXoD,i-1).  CarrySetPos = CSP(i-1,i), CarrySetNeg = CSN(i-1,i), RawSet = RSi, CSP(-1,0) = CSN(-1,0) = ∅ */
      INPUT:   CSP(i-1,i), CSN(i-1,i), RSi
      ROUTINE: PXoD,i = RSi ⊕ CSP(i-1,i) ⊕ CSN(i-1,i)
               CSN(i,i+1) = CSN(i-1,i) ∪ PXoD,i
               CSP(i,i+1) = CSP(i-1,i) ∪ RSi
      OUTPUT:  PXoD,i, CSN(i,i+1), CSP(i,i+1)

      First step of the example: CSP(-1,0) = CSN(-1,0) = ∅, so CSN(0,1) = CSN(-1,0) ∪ PXoD,0 and CSP(0,1) = CSP(-1,0) ∪ RS0.

      [Figure: bit-slice arrays for RS0, RS1, PXoD,0 and PXoD,1.]

  30. XoD = Σk=1..n Xk*Dk with pTrees: the result is the set of bit slices qN..q0, N = 2B + roof(log2 n) + 1.

      Xk*Dk = Dk Σb 2^b pk,b = Dk (2^B pk,B + .. + 2^0 pk,0) = 2^B Dk pk,B + .. + 2^0 Dk pk,0
            = (2^B Dk,B + .. + 2^0 Dk,0)(2^B pk,B + .. + 2^0 pk,0)
            = 2^(2B) (Dk,B pk,B) + 2^(2B-1) (Dk,B-1 pk,B + Dk,B pk,B-1) + .. + 2^0 Dk,0 pk,0

      XoD =   2^(2B)   Σk=1..n ( Dk,B pk,B )
            + 2^(2B-1) Σk=1..n ( Dk,B pk,B-1 + Dk,B-1 pk,B )
            + 2^(2B-2) Σk=1..n ( Dk,B pk,B-2 + Dk,B-1 pk,B-1 + Dk,B-2 pk,B )
            + 2^(2B-3) Σk=1..n ( Dk,B pk,B-3 + Dk,B-1 pk,B-2 + Dk,B-2 pk,B-1 + Dk,B-3 pk,B )
            + ...
            + 2^3 Σk=1..n ( Dk,B pk,0 + Dk,2 pk,1 + Dk,1 pk,2 + Dk,0 pk,3 )
            + 2^2 Σk=1..n ( Dk,2 pk,0 + Dk,1 pk,1 + Dk,0 pk,2 )
            + 2^1 Σk=1..n ( Dk,1 pk,0 + Dk,0 pk,1 )
            + 2^0 Σk=1..n ( Dk,0 pk,0 )

      For B=1 and n=2:

        XoD = 2^2 (D1,1 p1,1 + D2,1 p2,1) + 2^1 (D1,1 p1,0 + D1,0 p1,1 + D2,1 p2,0 + D2,0 p2,1) + 2^0 (D1,0 p1,0 + D2,0 p2,0)

        X1  X2   p11 p10 p21 p20
        1   1    0   1   0   1
        3   0    1   1   0   0
        2   1    1   0   0   1

      Example with D = (1, 2), i.e. D1,1=0, D1,0=1, D2,1=1, D2,0=0:
        q0 = p1,0 = (1, 1, 0); no carry
        q1 = (1, 1, 0); carry1 = (0, 0, 1)
        q2 = carry1 = (0, 0, 1); no carry

      Example with D = (3, 3), i.e. D1,1 = D1,0 = D2,1 = D2,0 = 1:
        q0 = (0, 1, 1); carry0 = (1, 0, 0)
        q1 = carry0 + raw1 = (1, 0, 0); carry1 = (1, 1, 1)
        q2 = carry1 + raw2 = (1, 0, 0); carry2 = (0, 1, 1)
        q3 = carry2 = (0, 1, 1)

      A carryTree is a valueTree or vTree, as is the rawTree at each level (rawTree = valueTree before the carry is included). In what form is it best to carry the carryTree over (for speediest processing)?
        1. As multiple pTrees added at the next level? (The pTrees at the next level are in that form and need to be added anyway.)
        2. As an SPTS, s1? (The next-level rawTree is an SPTS, s2; then s1,0 & s2,0 = q(next level) and carry(next level)?)

      CCC Clusterer: if DT (and/or DUT) is not exceeded at C, partition C further by cutting at each gap and PCC in CoD.

      For a table X(X1...Xn), the SPTS Xk*Dk is the column of numbers xk*Dk. XoD is the sum of those SPTSs, Σk=1..n Xk*Dk. So DotProduct involves just multi-operand pTree addition (no SPTSs and no multiplications). Engineering shortcut tricks would be huge!
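The level-by-level scheme above (form the raw value at bit level i, add the carry, keep the low bit as qi, and carry the rest forward as a value) can be sketched as follows. This Python version keeps raw and carry as per-row integer lists, i.e. as "value trees"; it assumes non-negative integer D and uncompressed slices, and `dot_by_slices` and `slices_to_values` are hypothetical names.

```python
def dot_by_slices(X, D, width):
    """Return the result slices q0, q1, ... of XoD; q[i][r] is bit i of (X[r] o D)."""
    n_rows = len(X)
    # p[k][b][r]: bit b of attribute k in row r
    p = [[[(row[k] >> b) & 1 for row in X] for b in range(width)] for k in range(len(D))]
    d_bits = [[(dk >> b) & 1 for b in range(width)] for dk in D]
    carry = [0] * n_rows
    q = []
    level = 0
    while level <= 2 * (width - 1) or any(carry):
        raw = [0] * n_rows                         # raw value tree at this level
        for k in range(len(D)):
            for i in range(width):
                j = level - i
                if 0 <= j < width and d_bits[k][i]:
                    for r in range(n_rows):
                        raw[r] += p[k][j][r]       # term D(k,i) * p(k,j) with i + j = level
        total = [a + c for a, c in zip(raw, carry)]
        q.append([t & 1 for t in total])           # q_level = low bit
        carry = [t >> 1 for t in total]            # everything else is carried forward
        level += 1
    return q


def slices_to_values(q):
    return [sum(bit << i for i, bit in enumerate(col)) for col in zip(*q)]


X = [(1, 1), (3, 0), (2, 1)]
q_33 = dot_by_slices(X, D=(3, 3), width=2)
q_12 = dot_by_slices(X, D=(1, 2), width=2)
```

The two calls reproduce the q slices of the worked examples for D = (3, 3) and D = (1, 2).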

  31. Question: Which primitives are needed and how do we compute them?

      X(X1...Xn). D2NN yields a 1.a-type outlier detector (top k objects x by dissimilarity from X-{x}). D2NN = each min[D2NN(x)].

      Writing ak,b ≡ xk,b - pk,b:

        (x-X)o(x-X) = Σk=1..n (xk-Xk)(xk-Xk)
                    = Σk=1..n ( Σb=B..0 2^b (xk,b - pk,b) ) ( Σb=B..0 2^b (xk,b - pk,b) )
                    = Σk (2^B ak,B + 2^(B-1) ak,B-1 + .. + 2^1 ak,1 + 2^0 ak,0)(2^B ak,B + 2^(B-1) ak,B-1 + .. + 2^1 ak,1 + 2^0 ak,0)

      Expanding one k by powers of 2:

          2^(2B)   ( ak,B ak,B )
        + 2^(2B-1) ( ak,B ak,B-1 + ak,B-1 ak,B )
        + 2^(2B-2) ( ak,B ak,B-2 + ak,B-1 ak,B-1 + ak,B-2 ak,B )
        + 2^(2B-3) ( ak,B ak,B-3 + ak,B-1 ak,B-2 + ak,B-2 ak,B-1 + ak,B-3 ak,B )
        + 2^(2B-4) ( ak,B ak,B-4 + ak,B-1 ak,B-3 + ak,B-2 ak,B-2 + ak,B-3 ak,B-1 + ak,B-4 ak,B ) ...

      and, collecting the doubled cross terms into the next higher power:

        = 2^(2B)   ( ak,B^2 + ak,B ak,B-1 )
        + 2^(2B-1) ( ak,B ak,B-2 )
        + 2^(2B-2) ( ak,B-1^2 + ak,B ak,B-3 + ak,B-1 ak,B-2 )
        + 2^(2B-3) ( ak,B ak,B-4 + ak,B-1 ak,B-3 )
        + 2^(2B-4) ak,B-2^2 ...

      Should we pre-compute all pk,i*pk,j, p'k,i*p'k,j and pk,i*p'k,j? Is D2NN then just multi-op pTree adds? When xk,b=1, ak,b = p'k,b, and when xk,b=0, ak,b = -pk,b. So is D2NN just multi-op pTree mults/adds/subtracts? Each D2NN row (each x∈X) is a separate calculation.

      ANOTHER TRY! X(X1...Xn). RKN (Rank K Nbr), K=|X|-1, yields a 1.a outlier detector (top y by dissimilarity from X-{x}). Install in RKN each RankK(D2NN(x)). (This is a one-time construction, but for, e.g., 1 trillion x's, |X|=N=1T, it is slow. Parallelization?)

      ∀x∈X, the square distance from x to its neighbors (near and far) is the column of numbers (vTree or SPTS)

        d2(x,X) = (x-X)o(x-X) = Σk=1..n |xk-Xk|^2 = Σk=1..n (xk-Xk)(xk-Xk) = Σk=1..n (xk^2 - 2 xk Xk + Xk^2)
                = -2 Σk xk Xk + Σk xk^2 + Σk Xk^2
                = -2xoX + xox + XoX

      with XoX = Σk=1..n Σi=B..0,j=B..0 2^(i+j) pk,i pk,j = Σi,j 2^(i+j) Σk pk,i pk,j. The steps, numbered as on the slide:
        1. Pre-compute the pTree products pk,i pk,j within each k.
        2. Calculate the sum Σi,j 2^(i+j) Σk pk,i pk,j (= XoX) one time; it is independent of x.
        3. Pick xox from XoX for each x and add it to 2.
        5. Compute -2xoX and add 3 to it.

      The -2xoX cost is linear in |X|=N. The xox cost is ~zero. XoX is one-time, amortized over x∈X (i.e., 1/N each), or precomputed. The addition cost, -2xoX + xox + XoX, is linear in |X|=N. So, overall, the cost is linear in |X|=N.

      Data parallelization? No! (We need all of X at each site.) Code parallelization? Yes! (After replicating X to all sites, each site creates and saves D2NN for its partition of X, then sends the requested number(s), e.g., RKN(x), back.)
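The decomposition d2(x,X) = -2xoX + xox + XoX can be sketched in Python with XoX computed once and xox read off from it. Plain lists stand in for SPTSs here, the four-point data set is made up for illustration, and `d2nn` and `top_k_outliers` are hypothetical names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def d2nn(X):
    """Squared distance from each x to its nearest neighbor in X - {x}."""
    XoX = [dot(x, x) for x in X]                  # one-time pre-calculation
    result = []
    for i, x in enumerate(X):
        xox = XoX[i]                              # read-off, no new computation
        # column d2(x, X) = -2 xoX + xox + XoX
        d2 = [-2 * dot(x, y) + xox + yoy for y, yoy in zip(X, XoX)]
        result.append(min(v for j, v in enumerate(d2) if j != i))
    return result


def top_k_outliers(X, k):
    """Indices of the k rows with the largest nearest-neighbor distance."""
    scores = d2nn(X)
    return sorted(range(len(X)), key=lambda i: scores[i], reverse=True)[:k]


X = [(0, 0), (0, 1), (1, 0), (10, 10)]            # hypothetical data, one clear outlier
scores = d2nn(X)
outliers = top_k_outliers(X, 1)
```

Each row's column costs one pass over X, matching the linear-in-N cost per x noted on the slide, and rows are independent, which is what makes the code-parallel split by partition of X possible.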

  32. LSR on IRIS150-3 Here we use the diagonals. d=e1 p=AVGs, L=(X-p)od 43 58 S 49 70 E 49 79I d=e4 p=AvgS, L=(X-p)od -2 4 S&L 7 16 E&L 11 23I&L d=e4 p=AvgS, L=(X-p)od -2 4 S&L 7 16 E&L 11 23I&L R(p,d,X) SEI 0 128 270 393 1558 3444 [43,49) S(16) 0 128 [49,58) E(24)I(6) 0 S(34) 99 393 1096 1217 1825 [70,79] I(12) 2081 3444 [58,70) E(26) I(32) 270 792 1558 2567 30ambigs, 5 errs -2,4) 50 -2,4) 50 [7,11) 28 [7,11) 28 [11,16) 22, 16 127.5 648.7 1554.7 2892 [11,16) 22, 16 5.7 36.2 151.06 611 [16,23] I=34 [16,23] I=34 E(50) I(7) 49 49 (36,7) 63 70 (11) d=e1 p=AS L=(X-p)od (-pod=-50.06) -7.06 7.94 S&L -1;06 19.94 E&L -1.06 28.94 I&L d=e1 p=AS L=(X-p)od (-pod=-50.06) -7.06 7.94 S&L -1;06 19.94 E&L -1.06 28.94 I&L d=e1 p=AS L=(X-p)od (-pod=-50.06) -7.06 7.94 S&L -1;06 19.94 E&L -1.06 28.94 I&L -8,-2 16 [-2,8) 34, 24, 6 0 99 393 1096 1217 1825 [8,20) 26, 32 270 792 1558 2567 [20,29] 12 E=22 I=7 p=AvgS E=22 I=8 p=AvgI E=17 I=7 p=AvgE E=26 I=5 p=AvgS -8,-2 16 -8,-2 16 [-2,8) 34, 24, 6 0 99 393 1096 1217 1825 [-2,8) 34, 24, 6 0 99 393 1096 1217 1825 [8,20) w p=AvgI 26, 32 0.62 34.9 387.8 1369 [8,20) w p=AvgE 26, 32 1.9 51.8 78.6 633 [20,29] 12 [20,29] 12 Only overlap L=[58,70), R[792,1557] (E(26),I(5)) With just d=e1, we get good hulls using LARC: While  Ip,d containing >1class, for next (d,p) create L(p,d)Xod-pod, R(p,d)XoX+pop-2Xop-L2 1.  MnCls(L), MxCls(L), create a linear boundary. 2.  MnCls(R), MxCls(R).create a radial boundary. 3. Use R&Ck to create intra-Ck radial boundaries Hk = {I | Lp,d includes Ck} <--E=6 I=4 p=AvgE <--E=25 I=10 p=AvgI d=e4 p=AvgS, L=(X-p)od -2 4 S&L 7 16 E&L 11 23I&L [16,23] I=34 -2,4) 50 [7,11) 28 [11,16) 22, 16 127.5 1555 2892 Here we try using other p points for the R step (other than the one used for the L step). d=e1 p=AvgS, L=Xod 43 58 S&L 49 70 E&L 49 79 I&L R & L I(1) I(42) For e4, the best choice of p for the R step is also p=AvgE. (There are mistakes in this column on the previous slide!) 
There is a best choice of p for the R step (p=AvgE) but how would we decide that ahead of time?
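The L and R steps of LARC can be sketched as follows. This is a minimal illustration, not the pTree implementation: the function name `larc_bounds` and the toy data are hypothetical, and plain NumPy arrays replace the SPTS arithmetic. For one (d,p) pair it computes L=(X-p)od and R=(X-p)o(X-p)-L^2, then records the per-class min and max of each as linear and radial boundaries.

```python
import numpy as np

def larc_bounds(X, y, d, p):
    """Per-class (min, max) of L=(X-p)od and R=(X-p)o(X-p)-L^2 for one (d,p)."""
    d = d / np.linalg.norm(d)              # the slides assume |d| = 1
    L = (X - p) @ d
    R = np.einsum("ij,ij->i", X - p, X - p) - L ** 2
    bounds = {}
    for k in np.unique(y):
        mask = y == k
        bounds[k.item()] = {
            "L": (L[mask].min(), L[mask].max()),   # linear boundary pair
            "R": (R[mask].min(), R[mask].max()),   # radial boundary pair
        }
    return bounds

# Toy data (not IRIS): two classes in the plane, projected on d=e1 with p=origin.
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 3.0], [5.0, 3.0]])
y = np.array(["S", "S", "E", "E"])
bounds = larc_bounds(X, y, d=np.array([1.0, 0.0]), p=np.array([0.0, 0.0]))
print(bounds)
```

Trying several p values for the R step, as the slide does, amounts to calling the function again with the same d and a different p and comparing how many points remain ambiguous.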

  33. LSR on IRIS150.
Dse = (9, -6, 27, 10); xoDse limits: S [-184,123], E [590,1331], I [381,2046].
y isa O if yoD ∈ (-∞,-184) ∪ (123,381) ∪ (2046,∞)
y isa O or S(50) if yoD ∈ C1,1 ≡ [-184,123]
y isa O or I(1) if yoD ∈ C1,2 ≡ [381,590]
y isa O or E(50) or I(11) if yoD ∈ C1,3 ≡ [590,1331]
y isa O or I(38) if yoD ∈ C1,4 ≡ [1331,2046]
SRR(AVGs,Dse) on C1,1: S [0,154].
y isa O if y isa C1,1 AND SRR(AVGs,Dse) ∈ (154,∞)
y isa O or S(50) if y isa C1,1 AND SRR(AVGs,Dse) ∈ [0,154]
SRR(AVGs,Dse) on C1,2: only one such I.
SRR(AVGs,Dse) on C1,3: E [2,137], I [7,143].
y isa O if y isa C1,3 AND SRR(AVGs,Dse) ∈ (-∞,2) ∪ (143,∞)
y isa O or E(10) if y isa C1,3 AND SRR ∈ [2,7)
y isa O or E(40) or I(10) if y isa C1,3 AND SRR ∈ [7,137) = C2,1
y isa O or I(1) if y isa C1,3 AND SRR ∈ [137,143]
etc.
Dei = (1, .7, -7, -4); xoDei limits on C2,1: E [1.4,19], I [-2,3].
y isa O if yoD ∈ (-∞,-2) ∪ (19,∞)
y isa O or I(8) if yoD ∈ [-2,1.4]
y isa O or E(40) or I(2) if yoD ∈ C3,1 ≡ [1.4,19]
SRR(AVGe,Dei) on C3,1: E [2,370], I [8,106].
y isa O if y isa C3,1 AND SRR(AVGs,Dei) ∈ [0,2) ∪ (370,∞)
y isa O or E(4) if y isa C3,1 AND SRR(AVGs,Dei) ∈ [2,8)
y isa O or E(27) or I(2) if y isa C3,1 AND SRR(AVGs,Dei) ∈ [8,106)
y isa O or E(9) if y isa C3,1 AND SRR(AVGs,Dei) ∈ [106,370]
We use the Radial steps to remove false positives from gaps and ends. We are effectively projecting onto a 2-dim range, generated by the D-line and the D⊥-line (which measures the perpendicular radial reach from the D-line). In the D projections, we can attempt to cluster directions into "similar" clusters in some way and limit the domain of our projections to one of these clusters at a time, accommodating "oval" shaped or elongated clusters and giving a better hull fit. E.g., in the Enron email case the dimensions would be words that have about the same count, reducing false positives.
LSR on IRIS150-2. We use the diagonals. Also we set a MinGapThres=2, which means we stay 2 units away from any cut.
d=e1=1000; xod limits: S [43,58], E [49,70], I [49,79].
y isa O if yoD ∈ (-∞,43) ∪ (79,∞)
y isa O or S(9) if yoD ∈ [43,47]
y isa O if yoD ∈ [43,47] & SRR ∈ (-∞,52) ∪ (60,∞)
y isa O or S(41) or E(26) or I(7) if yoD ∈ (47,60) (y ∈ C1,2)
y isa O or E(24) or I(32) if yoD ∈ [60,72] (y ∈ C1,3)
y isa O or E(17) if yoD ∈ [60,72] & SRR ∈ [1.2,20]
y isa O or E(7) or I(7) if yoD ∈ [60,72] & SRR ∈ [20,66]
y isa O or I(25) if yoD ∈ [60,72] & SRR ∈ [66,799]
y isa O if yoD ∈ [60,72] & SRR ∈ [0,1.2) ∪ (799,∞)
y isa O or I(11) if yoD ∈ (72,79]
y isa O if yoD ∈ [72,79] & SRR ∈ (-∞,49) ∪ (78,∞)
d=e2=0100 on C1,3; xod limits: E [22,34], I [22,34]: zero differentiation!
d=e2=0100 on C1,2; xod limits: S [30,44], E [20,32], I [25,30].
y isa O if yoD ∈ (-∞,18) ∪ (46,∞)
y isa O or E(3) if yoD ∈ [18,23)
y isa O if yoD ∈ [18,23) & SRR ∈ [0,21)
y isa O or E(13) or I(4) if yoD ∈ [23,28) (y ∈ C2,1)
y isa O or S(13) or E(10) or I(3) if yoD ∈ [28,34) (y ∈ C2,2)
y isa O or S(28) if yoD ∈ [34,46]
y isa O if yoD ∈ [34,46] & SRR ∈ [0,32] ∪ [46,∞)
d=e3=0010 on C2,2; xod limits: S [30,33], E [28,32], I [28,30].
y isa O if yoD ∈ (-∞,28) ∪ (33,∞)
y isa O or S(13) or E(10) or I(3) if yoD ∈ [28,33]
d=e4=0001; xod limits: E [12,18], I [18,24].
y isa O if yoD ∈ (-∞,1) ∪ (5,12) ∪ (24,∞)
y isa O or S(13) if yoD ∈ [1,5]
y isa O or E(9) if yoD ∈ [12,16)
y isa O if yoD ∈ [12,16) & SRR ∈ [0,208) ∪ (558,∞)
y isa O or E(1) or I(3) if yoD ∈ [16,24)
y isa O if yoD ∈ [16,24) & SRR ∈ [0,1198) ∪ (1199,1254) ∪ (1424,∞)
y isa O or E(1) if yoD ∈ [16,24) & SRR ∈ [1198,1199]
y isa O or I(3) if yoD ∈ [16,24) & SRR ∈ [1254,1424]

  34. LSR IRIS150. d=AvgEAvgI p=AvgE, L=(X-p)od -36 -25 S -14 11 E -17 33I d=AvgSAvgE p=AvgS, L=(X-p)od -6 4 S 18 42 E 11 64I d=AvgSAvgI p=AvgS, L=(X-p)od -6 5 S 17.5 42 E 12 65I [-14,11) (50, 13) 0 2.8 76 134 [11,33] I(36) [-17,-14)] I(1) [17.5,42) (50,12) 4.7 6 192 205 [18,42) (50,11) 2 6.92 133 137 [11,33] I(37) [42,64] 38 [12,17.5)] I(1) [11,18)] I(1) R(p,d,X) S E I 0 2 6 137 154 393 R(p,d,X) S E I .3 .9 4.7 150 204 213 R(p,d,X) S E I 0 2 32 76 357 514 38ambigs 16errs 30ambigs, 5 errs d=e3 p=AvgS, L=(X-p)od -5 5 S&L 15 37 E&L 4 55I&L d=e2 p=AvgS, L=(X-p)od -11 10 S&L -14 0 E&L -13 4I&L d=e4 p=AvgE, L=(X-p)od -13 -7 S&L -3 5 E&L 1 12I&L d=e4 p=AvgS, L=(X-p)od -2 4 S&L 7 16 E&L 11 23I&L d=e1 p=AS L=(X-p)od (-pod=-50.06) -7.06 7.94 S&L -1;06 19.94 E&L -1.06 28.94 I&L -5,4) 47 [4,15) 3 1 [15,37) 50, 15 157 297 536 792 [37,55] I=34 ,-13) 1 -13,-11 0, 2, 1 all=-11 -11,0 29,47,46 0 66 310 352 1749 4104 [0,4) [4, 15 3 6 -2,4) 50 -7] 50 [-3,1) 21 [7,11) 28 [1,5) 22, 16 .7 .7 4.8 4.8 [11,16) 22, 16 11 16 11 16 [16,23] I=34 [5,12] 34 -8,-2 16 [-2,8) 34, 24, 6 0 99 393 1096 1217 1825 [8,20) 26, 32 270 792 1558 2567 [20,29] 12 3, 1 E=32 I=14 E=22 I=16 E=22 I=16 E=32 I=14 E=18 I=12 E=26 I=5 9, 3 1, 1 2, 1 46,11 d=e3 p=AvgE, L=(X-p)od -32 -24 S&L -12 9 E&L -25 27I&L d=e1 p=AE L=(X-p)od (-pod=-59.36) -17 -1 S&L -11 11 E&L -11 20I&L d=e2 p=AvgE, L=(X-p)od -5 `17 S&L -8 7 E&L -6 11I&L ,-25) 48 -25,-12 2 11 -17-11 16 [-11,-1) 33, 21, 3 0 27 107 172 748 1150 [-12,9) 49, 15 2(17) 16 158 199 [9,27] I=34 [-1,11) 26, 32 1 51 79 633 [11,20] I12 ,-6) 1 [-6, -5) 0, 2, 1 15 18 58 59 [-5,7) 29,47, 46 3 58 234 793 1103 1417 [7,11) [11, 15 3 6 1 err E=5 I=3 E=47 I=22 E=22 I=16 E=46 I=14 E=7 I=4 E=39 I=11 E=47 I=12 21, 3 13, 21 E=26 I=11 E=45 I=12 d=e4 p=AvgI, L=(X-p)od -19 -14 S&L -10 -3 E&L -6 5I&L d=e3 p=AvgI, L=(X-p)od -44 -36 S&L -25 -4 E&L -37 14I&L d=e2 p=AvgI, L=(X-p)od -7 `15 S&L -10 4 E&L -8 9I&L d=e1 p=AI L=(X-p)od (-pod=-65.88) -22 -8 S&L -17 4 E&L -17 14I&L ,-25) 48 -25,-12 2 
1 1 [-17,-8) 33, 21, 3 38 126 132 730 1622 2181 [-6,-3) 22, 16 same range [5,12] 34 [-25,-4) 50, 15 5 11 318 453 [9,27] I=34 [-8,4) 26, 32 0 34 1368 730 [-8, -7) 2, 1 allsame [5, 9] 9, 2, 1 allsame ,-6) 1 [-7, 4) 29,46,46 5 36 929 1403 1893 2823 [6,11) [11, 15 3 6 S=9 E=2 I=1 E=2 I=1 E=2 I=1 d=e1 p=AvgS, L=Xod 43 58 S&L 49 70 E&L 49 79 I&L
Note that each L=(X-p)od is just a shift of Xod by -pod (for a given d). Next, we examine: for a fixed d, the SPTS Lp,d is just a shift of Ld ≡ Lorigin,d by -pod, so we get the same intervals to apply R to, independent of p (shifted by -pod). Thus, we calculate once lld=min Xod and hld=max Xod, then for each different p we shift these interval limit numbers by -pod, since these numbers are really all we need for our hulls (rather than going through the SPTS calculation of (X-p)od anew for each new p). There is no reason we have to use the same p on each of those intervals either. So on the next slide, we consider all 3 functionals, L, S and R. E.g., why not apply S first to limit the spherical reach (eliminate FPs)? S is calculated anyway.

  35. FAUST Oblique, LSR: Linear, Spherical, Radial classifier. Form class hulls using linear d boundaries through the min and max of Lk,d,p=(Ck&(X-p))od. On every interval Ik,p,d ∈ {[epi,epi+1) | epj=minLk,p,d or maxLk,p,d for some k,p,d}, add spherical and barrel boundaries with Sk,p and Rk,p,d similarly (use enough (p,d) pairs so that no 2 class hulls overlap). Points outside all hulls are declared as "other". Σ(all p,d) dis(y,Ik,p,d) = unfitness of y being classed in k. Fitness of y in k is f(y,k) = 1/(1-uf(y,k)).
On IRIS150, ∀d, precompute XoX, Ld=Xod, nk,L,d ≡ min(Ck&Ld) and xk,L,d ≡ max(Ck&Ld).
Ld (d=1000), samples 1-75: 51 49 47 46 50 54 46 50 44 49 54 48 48 43 58 57 54 51 57 51 54 51 46 51 48 50 50 52 52 47 48 54 52 55 49 50 55 49 44 51 50 45 44 50 51 48 51 46 53 50 70 64 69 55 65 57 63 49 66 52 50 59 60 61 56 67 56 58 62 56 59 61 63 61 64
Ld (d=1000), samples 76-150: 66 68 67 60 57 55 55 58 60 54 60 67 63 56 55 55 61 58 50 56 57 57 62 51 57 63 58 71 63 65 76 49 73 67 72 65 64 68 57 58 64 65 77 77 60 69 56 77 63 67 72 62 61 64 72 74 79 64 63 61 77 63 64 60 69 67 69 58 68 67 67 63 65 62 59
XoX, samples 1-75: 4026 3501 3406 3306 3996 4742 3477 3885 2977 3588 4514 3720 3401 2871 5112 5426 4622 4031 4991 4279 4365 4211 3516 4004 3825 3660 3928 4158 4060 3493 3525 4313 4611 4989 3588 3672 4423 3588 3009 3986 3903 2732 3133 4017 4422 3409 4305 3340 4407 3789 8329 7370 8348 5323 7350 6227 7523 4166 7482 5150 4225 6370 5784 6967 5442 7582 6286 5874 6578 5403 7133 6274 7220 6858 6955
XoX, samples 76-150: 7366 7908 8178 6691 5250 5166 5070 5758 7186 6066 7037 7884 6603 5886 5419 5781 6933 5784 4218 5798 6057 6023 6703 4247 5883 9283 7055 9863 8270 8973 11473 5340 10463 8802 10826 8250 7995 8990 6774 7325 8458 8474 12346 11895 6809 9563 6721 11602 7423 9268 10132 7256 7346 8457 9704 10342 12181 8500 7579 7729 11079 8837 8406 5148 9079 9162 8852 7055 9658 9452 8622 7455 8229 8445 7306
∀p (pre-compute?): Ld,p ≡ (X-p)od = Ld - pod; nk,L,d,p ≡ min(Ck&Ld,p) = nk,L,d - pod; xk,L,d,p ≡ max(Ck&Ld,p) = xk,L,d - pod. Here AS=AvgS=(50 34 15 2), AE=AvgE=(59 28 43 13), AI=AvgI=(66 30 55 20). Each entry below is nk,L,d,p xk,L,d,p for S, E, I.
d=1000 (e1): p=0000: S 43 58, E 49 70, I 49 79; p=AS: S -7 8, E -1 20, I -1 29; p=AE: S -16 -1, E -10 11, I -10 20; p=AI: S -23 -8, E -17 4, I -17 13.
d=0100 (e2): p=0000: S 23 44, E 20 34, I 22 38; p=AS: S -11 10, E -14 0, I -12 4; p=AE: S -5 16, E -8 6, I -6 10; p=AI: S -7 14, E -10 4, I -8 8.
d=0010 (e3): p=0000: S 10 19, E 30 51, I 18 69; p=AS: S -5 4, E 15 36, I 3 54; p=AE: S -33 -24, E -13 8, I -25 26; p=AI: S -45 -36, E -25 -4, I -37 14.
d=0001 (e4): p=0000: S 1 6, E 10 18, I 14 25; p=AS: S -1 4, E 8 16, I 12 23; p=AE: S -12 -7, E -3 5, I 1 12; p=AI: S -25 -20, E -16 -8, I -12 -1.
We have introduced 36 linear bookends to the class hulls, 1 pair for each of 4 ds, 3 ps, 3 classes. For fixed d and Ck, the pTree mask is the same over the 3 p's. However, we need to differentiate anyway to calculate R correctly.
That is, for each d-line we get the same set of intervals for every p (just shifted by -pod). The only reason we need to have them all is to accurately compute R on each min-max interval. In fact, we compute R on all intervals (even those where a single class has been isolated) to eliminate False Positives (if FPs are possible; sometimes they are not, e.g., if we are to classify IRIS samples known to be Setosa, vErsicolor or vIrginica, then there is no "other"). Assuming Ld, nk,L,d and xk,L,d have been pre-computed and stored, the cut-pt pairs (nk,L,d,p; xk,L,d,p) are computed without further pTree processing, by the scalar computations: nk,L,d,p = nk,L,d - pod and xk,L,d,p = xk,L,d - pod.
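The scalar shift can be sketched in a few lines. This is only an illustration: the helper name `shifted_cuts` is hypothetical, while the e1 cut points and the rounded class means AS, AE, AI are the ones listed on the slide.

```python
import numpy as np

# Precomputed per-class (min, max) of Ld = Xod for d = e1, as listed on the slide.
cuts_e1 = {"S": (43, 58), "E": (49, 70), "I": (49, 79)}
d = np.array([1, 0, 0, 0])

# Rounded class means from the slide.
points = {
    "AS": np.array([50, 34, 15, 2]),
    "AE": np.array([59, 28, 43, 13]),
    "AI": np.array([66, 30, 55, 20]),
}

def shifted_cuts(cuts, p, d):
    """Shift stored cut points by -pod; no pass over the data is needed."""
    shift = int(p @ d)
    return {k: (lo - shift, hi - shift) for k, (lo, hi) in cuts.items()}

for name, p in points.items():
    print(name, shifted_cuts(cuts_e1, p, d))
```

The printed values reproduce the d=1000 rows for p=AS, AE and AI, which is a quick consistency check on the stored cut points.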

  36. Analyze R: Rn→R1 (and S: Rn→R1?) projections on each interval formed by consecutive L: Rn→R1 cut-pts. LSR IRIS150, e1 only.
Sp ≡ (X-p)o(X-p) = XoX + L_{-2p} + pop; nk,S,p ≡ min(Ck&Sp), xk,S,p ≡ max(Ck&Sp).
Rp,d ≡ Sp - (Lp,d)^2 = L_{-2p+(2pod)d} + pop - (pod)^2 + XoX - (Ld)^2; nk,R,p,d ≡ min(Ck&Rp,d), xk,R,p,d ≡ max(Ck&Rp,d).
d=1000 with p=AS=(50 34 15 2), p=AE=(59 28 43 13), p=AI=(66 30 55 20); L limits for S, E, I: p=AS: -7 8, -1 20, -1 29; p=AE: -16 -1, -10 11, -10 20; p=AI: -23 -8, -17 4, -17 13.
Interval counts and R cut-pts: 34 246 24 126 2 1 132 730 1622 2281 26 32 0 342610 388 1369 34 246 0 279 5 171 186 748 998 26 32 1 517,4 79 633 16 1641 2391 12 17 220 16 723 1258 12 249 794 16 0 128 34 0 99 393 1096 1217 1826 24 6 12 2081 3445 26 32 270 792 26 5 1558 2568; with AI 17 220; with AE 1 517,4 78 633.
What is the cost for these additional cuts (at new p-values in an L-interval)? It looks like: make the one additional calculation, L_{-2p+(2pod)d}, then AND the interval masks, then AND the class masks. (Or, if we already have all interval-class masks, only one mask AND step.) Does this eliminate FPs better?
Recursion works wonderfully on IRIS: The only hull overlaps after only d=1000 are [overlap figure], and the 4 i's common to both are {i24 i27 i28 i34}. We could call those "errors". If on the L1000,avgE interval [-1,11) we recurse using SavgI, we get 7 4 36 540,4 72 170.
Ld, d=1000, p=origin: Setosa 43 58; vErsicolor 49 70; vIrginica 49 79.
If we have computed S: Rn→R1, how can we utilize it? We can, of course, simply put in spherical hull boundaries by centering on the class Avgs, e.g., Sp, p=AvgS: Setosa 0 154; vErsicolor 394 1767; vIrginica 369 4171 (E=50 I=11).
Thus, for IRIS at least, with only d=e1=(1000), with only the 3 ps avgS, avgE, avgI, using full linear rounds, 1 R round on each resulting interval and 1 S, the hulls end up completely disjoint. That's pretty good news! There is a lot of interesting and potentially productive (career building) engineering to do here. What is precisely the best way to intermingle p, d, L, R, S (minimizing time and False Positives)?
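As a sanity check on the algebra, the following sketch compares S and R computed directly from X-p with the expansions built from XoX, Ld and one extra L-type projection. It assumes |d|=1 as in the FAUST setup; the random data and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
p = rng.normal(size=4)
d = rng.normal(size=4)
d /= np.linalg.norm(d)                      # unit direction, |d| = 1

XoX = np.einsum("ij,ij->i", X, X)           # assumed precomputed
L_d = X @ d                                 # assumed precomputed
pod = p @ d

# S_p = XoX + L_{-2p} + pop
S_expanded = XoX + X @ (-2 * p) + p @ p
S_direct = np.einsum("ij,ij->i", X - p, X - p)

# R_{p,d} = S_p - (L_{p,d})^2, with the two L terms merged into one projection
R_expanded = XoX + X @ (-2 * p + 2 * pod * d) + p @ p - pod ** 2 - L_d ** 2
R_direct = S_direct - ((X - p) @ d) ** 2

print(np.allclose(S_direct, S_expanded), np.allclose(R_direct, R_expanded))
```

Because d is a unit vector, R is the squared perpendicular distance from the line through p in direction d, so it is never negative.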

  37. FAUST LSR Hull classifier, recursive LR on MG44d60w. We won't carve off outliers, since they're all sort of outliers! We first hull the classes. To do the hulling, for each class we choose as d the keyword that occurs in the maximum number of that class's docs, and use the Avg for p. Then we will look at the test doc classifications and assess how good they are (basically to see if the method reveals affinities that make sense and that I hadn't noticed when I put together the "expert training set" in the first place, therefore answering the question "Has new info been uncovered thru SML?").
First attempt: Try to isolate one class at a time. Take the column (word), ek, with maximum class count and use Lek.
C1: k=3 (baby) isolates C1 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (4/4 so 0% FN). 0 test cases are in class1.
C2: k=34 (King) isolates C2 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (3/3 so 0% FN). 0 test cases are in class2.
C3: k=47 (plum) isolates C3 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (3/8 so 62.5% FN). 0 test cases are in class3.
C4: k=47 (plum) isolates C4 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (2/4 so 50% FN). 2 test cases are in class4: test 17FEC Here sits the Lord Mayor. Here sit his two men...; test 30HDD Hey diddle diddle! The cat and the fiddle. The cow jumped...
C5: k=8 (bed) isolates C5 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (2/3 so 33.3% FN). 1 test case is in class5: test 03DDD Diddle diddle dumpling my son John. Went to bed with...
C6: k=528 (plum) isolates C6 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (3/5 so 40% FN). 0 test cases are in class6. FPs: 41OKC Old King Cole...; 01TBM Three Blind Mice...
C7: k=29 (girl) isolates C7 from the other classes and 0 test cases are in that hull, so 100% TP accuracy (1/3 so 66.7% FN). 1 test case is in class7: test 49WLG There was a little girl who had a little curl right in the...
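The keyword choice described above (for each class, the word occurring in the largest number of that class's documents) can be sketched as below. The function name `class_keywords`, the three-word vocabulary and the tiny document-term matrix are hypothetical stand-ins, not the MG44d60w data.

```python
import numpy as np

def class_keywords(doc_term, labels, vocab):
    """For each class, return (word, number of class docs containing it)."""
    present = doc_term > 0                       # term occurs in doc at least once
    result = {}
    for k in sorted(set(labels)):
        rows = np.array([lab == k for lab in labels])
        doc_counts = present[rows].sum(axis=0)   # class docs containing each word
        j = int(np.argmax(doc_counts))           # ties go to the first such word
        result[k] = (vocab[j], int(doc_counts[j]))
    return result

vocab = ["baby", "king", "plum"]
doc_term = np.array([
    [2, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 3, 1],
])
labels = [1, 1, 2, 2]
keywords = class_keywords(doc_term, labels, vocab)
print(keywords)
```

The selected column index then plays the role of k in Lek for that class.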

  38. Av: 0.05 0.11 0.09 0.07 0.05 0.05 0.07 0.07 0.07 0.07 0.05 0.05 0.09 0.05 0.05 0.05 0.07 0.07 0.05 0.09 0.05 .09 .05 .07 .11 .07 .07 .05 .05 .05 0.05 always away baby back bad bag bake bed boy bread bright brown buy cake child clean cloth cock crown cry cut day dish dog eat fall fiddle full girl green high .05 0.05 .07 . 05 .05 .05 .14 .05 .05 .05 0.11 .05 .14 .07 .05 .07 .07 .09 .05 .07 .11 0.05 07 .05 .05 .07 .05 .05 .05 hill house king lady lamb maid men merry moneymorn mother nose old pie pig plum round run sing son three thumb town tree two way wife woman wool Clustering MG44d60w C11 .42 .31 1.3 .14 .68 .99 2.3 .26 .68 1.01 2.3 15 2 2 1 C111 DO2DO3SO3 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 ffa0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 C111 .36 1.8 .33 .80 2.6 .44 .85 2.6 11 3 1 C1111 TO1 SO4 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 ffa0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 C1111 .3 1.3 .32 .83 2.2 .43 .86 2.6 7 3 1 C11111 TO2 SO5 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ffa0 01 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 C11111 .5 2.5 .36 .99 2.4 .47 .99 2.6 5 1 1 C111111 SO6 SO7 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ffa1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 
0 C1 .28 2.2 .3 .72 3.06 .44 .82 3.06 20 6 1 C11 C12SO2 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 ffa0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 .18 .14 3 <-Gaps Always Avg to ffa 0 .28 .52 3.6 mn .1 .39 .55 3.6 mx 27 14 2 1 Ct C1 C2 DO1SO1 44 Mother Goose Rythmes Vocab(60 content terms) 1 0 0 0 01TBM Three blind mice! See how they run! They all 1 0 0 0 02TLP This little pig went to market. This little pi 1 0 0 0 03DDD Diddle diddle dumpling my son John. Went to be 1 0 0 0 04LMM Little Miss Muffet sat on a tuffet, eating of 0 1 0 0 05HDS Humpty Dumpty sat on a wall. Humpty Dumpty ha 1 0 0 0 06SPP See a pin and pick it up. All the day you will 0 0 1 0 07OMH Old Mother Hubbard went to the cupboard to giv 1 0 0 0 08JSC Jack Sprat could eat no fat. His wife could ea 1 0 0 0 09HBD Hush baby. Daddy is near. Mamma is a lady and 1 0 0 0 10JAJ Jack and Jill went up the hill to fetch a pail 0 1 0 0 11OMM One misty moisty morning when cloudy was the w 1 0 0 0 12OWF There came an old woman from France who taught 1 0 0 0 13RRS A robin and a robins son once went to town to 1 0 0 0 14ASO If all the seas were one sea, what a great sea 0 1 0 0 15PCD Great A. little a. This is pancake day. Toss t 1 0 0 0 16PPG Flour of England, fruit of Spain, met together 1 0 0 0 17FEC Here sits the Lord Mayor. Here sit his two me 1 0 0 0 18HTP I had two pigeons bright and gay. They flew fr 0 1 0 0 21LAU The Lion and the Unicorn were fighting for the 0 1 0 0 22HLH I had a little husband no bigger than my thumb 1 0 0 0 23MTB How many miles is it to Babylon? Three score m 1 0 0 0 25WOW There was an old woman, and what do you think? 1 0 0 0 26SBS Sleep baby sleep. Our cottage valley is deep. 1 0 0 0 27CBC Cry baby cry. Put your finger in your eye and 0 1 0 0 28BBB Baa baa black sheep, have you any wool? 
Yes si 1 0 0 0 29LFW When little Fred went to bed, he always said h 0 0 1 0 30HDD Hey diddle diddle! The cat and the fiddle. The 1 0 0 0 32JGF Jack, come give me your fiddle, if ever you me 1 0 0 0 33BFP Buttons, a farthing a pair! Come, who will buy 0 0 0 1 35SSSffa Sing a song of sixpence a pocket full of rye 0 1 0 0 36LTT Little Tommy Tittlemouse lived in a little hou 0 1 0 0 37MBB Here we go round the mulberry bush, the mulber 0 1 0 0 38YLS If I had as much money as I could tell, I neve 0 1 0 0 39LCS A little cock sparrow sat on a green tree. And 0 1 0 0 41OKC Old King Cole was a merry old soul. And a merr 0 1 0 0 42BBC Bat bat, come under my hat and I will give you 1 0 0 0 43HHD Hark hark, the dogs do bark! Beggars are comin 1 0 0 0 44HLH The hart he loves the high wood. The hare she 1 0 0 0 45BBB Bye baby bunting. Father has gone hunting. Mot 1 0 0 0 46TTP Tom Tom the pipers son, stole a pig and away h 1 0 0 0 47CCM Cocks crow in the morn to tell us to rise and 0 1 0 0 48OTB One two, buckle my shoe. Three, four, knock at 1 0 0 0 49WLG There was a little girl who had a little curl 0 1 0 0 50LJH Little Jack Horner sat in the corner, eating o

  39. Thanksgiving Clustering MG44d60w using min to max L Gap 0.13 0 0.13 0.12 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0 0.26 0.12 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0 0.39 0.12 0.52 0 0.52 0 0.52 0 0.52 0 0.52 0 0.52 0 0.52 0.12 0.65 0 0.65 0 0.65 0 0.65 0 0.65 0.12 0.77 0 0.77 0.12 0.90 0 0.90 0 0.90 0 0.90 0.77 1.68 03DDD Diddle diddle dumpling my son John. Went to be 04LMM Little Miss Muffet sat on a tuffet, eating of 02TLP This little pig went to market. 06SPP See a pin and pick it up. All the day you 08JSC Jack Sprat could eat no fat. His wife could ea 18HTP I had two pigeons bright and gay. They flew fr 22HLH I had a little husband no bigger than my thumb 23MTB How many miles is it to Babylon? Three score m 25WOW There was an old woman, and what do you think? 36LTT Little Tommy Tittlemouse lived in a little hou 42BBC Bat bat, come under my hat and I will give you 43HHD Hark hark, the dogs do bark! Beggars are comin 49WLG There was a little girl who had a little curl 05HDS Humpty Dumpty sat on a wall. Humpty Dumpty ha 09HBD Hush baby. Daddy is near. Mamma is a lady and 11OMM One misty moisty morning when cloudy was the w 12OWF There came an old woman from France who taught 15PCD Great A. little a. This is pancake day. Toss t 27CBC Cry baby cry. Put your finger in your eye and 32JGF Jack, come give me your fiddle, if ever you me 33BFP Buttons, a farthing a pair! Come, who will buy 38YLS If I had as much money as I could tell, I neve 45BBB Bye baby bunting. Father has gone hunting. Mot 48OTB One two, buckle my shoe. Three, four, knock at 13RRS A robin and a robins son once went to town to 14ASO If all the seas were one sea, what a great sea 29LFW When little Fred went to bed, he always said h 37MBB Here we go round the mulberry bush, the mulber 44HLH The hart he loves the high wood. 
The hare she 47CCM Cocks crow in the morn to tell us to rise and 50LJH Little Jack Horner sat in the corner, eating o 01TBM Three blind mice! See how they run! They all 10JAJ Jack and Jill went up the hill to fetch a pail 17FEC Here sits the Lord Mayor. Here sit his two me 41OKC Old King Cole was a merry old soul. And a merr 46TTP Tom Tom the pipers son, stole a pig and away h 21LAU The Lion and the Unicorn were fighting for the 28BBB Baa baa black sheep, have you any wool? Yes si 07OMH Old Mother Hubbard went to the cupboard to giv 26SBS Sleep baby sleep. Our cottage valley is deep. 30HDD Hey diddle diddle! The cat and the fiddle. The 39LCS A little cock sparrow sat on a green tree. And 35SSS Sing a song of sixpence a pocket full of rye.
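A gap-based cut on a single L projection can be sketched like this. It is a simplified illustration: the names `gap_cut_points` and `gap_clusters` and the threshold `min_gap` are assumptions, and the cut is placed mid-gap as in the GAP clusterer description. The sample values are the distinct L values shown on the slide.

```python
import numpy as np

def gap_cut_points(values, min_gap):
    """Midpoints of every gap >= min_gap between consecutive distinct values."""
    v = np.unique(values)                    # sorted distinct projection values
    gaps = np.diff(v)
    idx = np.nonzero(gaps >= min_gap)[0]
    return [float((v[i] + v[i + 1]) / 2) for i in idx]

def gap_clusters(values, min_gap):
    """Label each value by the number of cut points lying below it."""
    cuts = gap_cut_points(values, min_gap)
    return np.searchsorted(cuts, values)

L = np.array([0.13, 0.26, 0.39, 0.52, 0.65, 0.77, 0.90, 1.68])
cuts = gap_cut_points(L, min_gap=0.5)
labels = gap_clusters(L, min_gap=0.5)
print(cuts, labels)
```

With this threshold only the final jump from 0.90 to 1.68 is wide enough to cut, so the last document is carved off on its own.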

  40. Thanksgiving clustering (carve off clusters as one would carve a Thanksgiving turkey). AvgDNN=2.62, MedDNN=2.44.
Let m be a furthest point from a ≡ AvgX (i.e., a point in X that maximizes the SPTS Sa=(X-a)o(X-a)). If m is an outlier, carve {m} off from X. Repeat until m is a non-outlier.
Construct L=Ld where d is next in dSet. Carve off L-gapped cluster(s). Pick centroid, cc=mean of slice, SL:
A. If (PCC2=PCD1) declare L^-1[PCC1,PCC2] to be a cluster and carve it off (mask it off) of X; else (PCC2=PCI2) SL ≡ L^-1[(3PCC1+PCC2)/4, PCC2) and look for an Scc or Rd,cc gap. If one is found, declare it to be a cluster and carve it off of X; else add cc to the Cluster Centroid Set, CCS.
B. Do A. from the high side of L also.
Repeat until no new clusters carve off. If X is not completely carved up, use pkmeans on the remains with initial centroids CCS.
DNNS (top): 16.0 i39, 7.34 i7, 6.32 i10, 6.24 s42, 5.56 i9, 5.38 i36, 5.38 i35, 4.89 i15, 4.89 e13, 4.58 s23, 4.35 i20, 4.24 e15, 4.24 i1, 4.12 i32, 4.12 i19, 4.12 i18.
IRIS: Carve off outliers with DNN>=5: i39 i7 i10 s42 i9 i36 i35. GT=4.
Sp (value, count, gap to next): 2 1 1; 3 1 1; 4 3 1; 5 7 1; 6 4 1; 7 3 1; 8 2 1; 9 4 1; 10 3 1; 11 4 1; 12 4 1; 13 2 1; 14 1 1; 15 2 1; 16 3 1; 17 2 1; 18 2 6; 24 3 1; 25 1.
Le4, PCC=90% (value, count, gap to next): 1 6 1; 2 28 1; 3 6 1; 4 7 1; 5 1 1; 6 1 4; 10 7 1; 11 3 1; 12 5 1; 13 13 1; 14 7 1; 15 12 1; 16 4 1; 17 1 1; 18 10 1; 19 5 1; 20 6 1; 21 6 1; 22 3 1; 23 7 1; 24 3 1; 25 2.
Resulting slices: 49s; 49e 3i; 1e 41i, where p=Av(¾,1)= 64.4 30.8 50.2 16.2.
Discovered there is a Versicolor plume led by {e8 e11 e44 e49} (I checked S centered at their average and found they are gapped away from all others by 3, and a scatter of other versicolor connects them to versicolor central).
Sp1 (value, count, gap to next): 1 1 2; 3 2 3; 6 1 1; 7 2 1; 8 2 1; 9 3 2; 11 2 1; 12 6 1; 13 2 1; 14 1 1; 15 3 1; 16 1 1; 17 4 1; 18 3 1; 19 1 1; 20 2 1; 21 7 1; 22 2 1; 23 1 1; 24 2 2; 26 3 8; 34 1.
e8 = 49 24 33 10; e11 = 50 20 35 10; e44 = 50 23 33 10; e49 = 51 25 30 11; emn = 49 20 30 10; emx = 70 34 51 18; eav = 59.36 27.7 42.6 13.26.
Taking Sp1=S centered at Av(last 4) = 50 23 32.75 10.25, the 4 are gapped away from the other 47 by 3.
i20 = 60 22 50 15; i30 = 72 30 58 16; i34 = 63 28 51 15; e21 = 59 32 48 18.
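The outlier-carving loop at the start of Thanksgiving clustering can be sketched as follows. This is a hedged illustration on plain arrays: the function name `carve_outliers` and the toy points are hypothetical, and "m is an outlier" is taken to mean that its distance to its nearest neighbor (DNN) is at least a threshold, in the spirit of the DNN>=5 rule used for IRIS.

```python
import numpy as np

def carve_outliers(X, threshold):
    """Repeatedly carve off the point furthest from the mean while its DNN >= threshold."""
    keep = np.ones(len(X), dtype=bool)
    carved = []
    while keep.sum() > 1:
        idx = np.flatnonzero(keep)
        pts = X[idx]
        a = pts.mean(axis=0)                                   # a = Avg of remaining X
        S_a = np.einsum("ij,ij->i", pts - a, pts - a)          # S_a = (X-a)o(X-a)
        m_local = int(np.argmax(S_a))                          # a furthest point m
        diff = pts - pts[m_local]
        dists = np.sqrt(np.einsum("ij,ij->i", diff, diff))
        dists[m_local] = np.inf                                # ignore m itself
        if dists.min() < threshold:                            # m is a non-outlier: stop
            break
        m = int(idx[m_local])
        carved.append(m)
        keep[m] = False
    return carved, keep

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])
carved, keep = carve_outliers(X, threshold=5.0)
print(carved, keep)
```

The surviving mask `keep` is what the later L-gap and PCC steps would operate on.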

  41. 2.62 AvgDNN 2.44 MedDNN 49s Le4 PCC=75% Le1 PCC=75% PCI1 C1 49e 3i L Ct Gp 43 1 1 44 3 2 46 4 1 47 2 1 48 5 1 49 5 1 50 10 1 51 9 1 52 4 1 53 1 1 54 6 1 55 7 1 56 6 1 57 8 1 58 7 1 59 3 1 60 5 1 61 5 1 62 4 1 63 9 1 64 7 1 65 5 1 66 2 1 67 7 1 68 3 1 69 4 1 70 1 1 71 1 1 72 2 1 73 1 1 74 1 2 76 1 1 77 3 2 79 1 L Ct Gp 1 6 1 2 28 1 3 6 1 4 7 1 5 1 1 6 1 4 10 7 1 11 3 1 12 5 1 13 13 1 14 7 1 15 12 1 16 4 1 17 1 1 18 10 1 19 5 1 20 6 1 21 6 1 22 3 1 23 7 1 24 3 1 25 2 PCD1 i20 60 22 50 15 i34 63 28 51 15 i30 72 30 58 16 PCD2 PCI2 1e 41i e21 59 32 48 18 PCD3 Le3 on C1 30 1 3 33 2 2 35 2 1 36 1 1 37 1 1 38 1 1 39 3 1 40 5 1 41 3 1 42 4 1 43 2 1 44 4 1 45 7 1 46 3 1 47 5 1 48 1 1 49 2 1 50 2 1 51 2 7 58 1 i30 72 30 58 16 Thanksgiving clustering-2 Let m be a furthest point from aAvgX (i.e., pt in X that maximizes SPTS, Sa=(X-a)o(X-a) ) If m is an outlier, carve {m} off from X. Repeat until m is a non-outlier. DNNS (top) 16.0 i39 7.34 i7 6.32 i10 6.24 s42 5.56 i9 5.38 i36 5.38 i35 4.89 i15 4.89 e13 4.58 s23 4.35 i20 4.24 e15 4.24 i1 4.12 i32 4.12 i19 4.12 i18 Construct L=Ld where d is next in dSet If a PCI is followed by another PCI, skip the second one. Same for a PCD. Therefore PCI1 will be followed by a PCD. A. declare each L-1[PCC,PCD] to be a cluster and carve it off (mask it off) of X; B. Do A. from the high side of L also. Repeat until no new clusters carve off. Recurse using a different d on each slice where a PCC was skipped. If X (not completely carved up) use pkmeans starting with a pillar set. IRIS: Carve off outliers with DNN>=5: i39 i7 i10 s42 i9 i36 i35. GT=4 PCI1 38s 5e PCD1 C1 11s 44e 34i PCI2 PCD2

  42. FAUST LSR classifier on IRIS. X=X(X1,...,Xn,C); outliers are O=O(O1,...,On,OC)⊆X, initially empty; OT=Outlier_Threshold. Carve off outliers from X into O (O:=O∪{x | Rank_{n-1}DNN(x,X)≥OT}; O:=O∪{x | D2NN(X,x)>OT} ...). (DkNN=distance to the kth Nearest Neighbor, meaning that in the distribution UDR(dis(x,X)) it is the (k+1)st value, since 0 is always the first value.) In the future, we'll use SQDNN...SQDkNN for the SPTS of squares of such distances.
Define class hulls: Hk ≡ {z∈Rn | minFd,p,k ≤ Fd,p,k(z) ≤ maxFd,p,k ∀(d,p)∈dpSet, F=L,S,R}. In the future, we'll call each pair of boundary pieces a boundary plate pair.
If y is in just one hull, declare y to be in that class. Elseif y is in multiple hulls, MH, declare y to be in the Ck that minimizes dis(y,Ck), k∈MH (note dis(y,Ck)=dis(y,X)&Ck). Else (y is in no hulls), if dis(y,O) ≡ min{dis(y,o) | o∈O} = dis(y,oi) < OT, declare y to be in the class of oi, else declare y to be other.
1. This algorithm deals with singleton outliers but ignores doubleton and tripleton outliers etc.
2. In Elseif, rather than compute dis(y,Ck) (single link distance), one could use dis(y,meanCk) for the pre-computed class means.
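The three-way decision rule above can be sketched as follows. Everything concrete here is an assumption for illustration: the function names, the dictionary layout of the hulls, the single functional along e1, and the toy points are not from the slides. Only the rule itself (one hull, several hulls, no hull) follows the text.

```python
import numpy as np

def in_hull(f_values, hull):
    """True if every functional value lies inside its [min, max] plate pair."""
    return all(lo <= f_values[key] <= hi for key, (lo, hi) in hull.items())

def classify(y, functionals, hulls, class_points, outliers, outlier_labels, OT):
    f_values = {key: f(y) for key, f in functionals.items()}
    hits = [k for k, hull in hulls.items() if in_hull(f_values, hull)]
    if len(hits) == 1:                       # in just one hull
        return hits[0]
    if hits:                                 # in multiple hulls: nearest class (single link)
        return min(hits, key=lambda k: np.linalg.norm(class_points[k] - y, axis=1).min())
    if len(outliers):                        # in no hull: try the carved-off outliers
        dist = np.linalg.norm(outliers - y, axis=1)
        i = int(np.argmin(dist))
        if dist[i] < OT:
            return outlier_labels[i]
    return "other"

e1 = np.array([1.0, 0.0])
functionals = {"L_e1": lambda z: float(z @ e1)}
hulls = {"S": {"L_e1": (0.0, 2.0)}, "E": {"L_e1": (1.5, 4.0)}}
class_points = {"S": np.array([[0.0, 0.0], [2.0, 0.0]]),
                "E": np.array([[1.5, 0.0], [4.0, 0.0]])}
outliers = np.array([[10.0, 0.0]])
outlier_labels = ["I"]

def demo(y):
    return classify(np.array(y), functionals, hulls, class_points, outliers, outlier_labels, OT=1.5)

print(demo([0.5, 0.0]), demo([1.8, 0.0]), demo([9.5, 0.0]), demo([7.0, 0.0]))
```

Swapping the single-link distance for the distance to a precomputed class mean, as note 2 suggests, would only change the `key` function in the multiple-hull branch.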
Ld d=e1 S43 58 E 49 70 I 56 79 Ld d=e3 S 10 19 E 30 51 I 48 69 Ld d=e4 S 1 6 E 10 18 I 15 25 Ld d=e1+e2 S 51.6 71.5 E49 72.2 I 58 82.7 Ld d=e1-e2 S7 15 E 16 28.3 I 19.7 37 Ld d=e1+e3 S3853 E 57 83 I 74 104 Ld d=e1-e3 S 2033 E 6.3 16.3 I 2 12.8 Ld d=e2 S 29 44 E 20 34 I 22 38 Ld d=e1+e4 S31 43.2 E 41 59.4 I 53 71 Ld d=e1-e4 S 29.6 39.6 E 26.8 39.6 I24 42 Ld d=e2+e3 S2841.8 E 38.8 56.6 I 50.9 75 Ld d=e2-e3 S 921 E -3.6 -16.9 I-31 -13.4 Ld d=e2+e4 S 21.9 34 E21 35.4 I 26.1 43 Ld d=e2-e4 S 19 29 E 4.9 12.8 I2 12.8 Ld d=e3+e4 S8 16 E 28 47.4 I 45.9 66 Ld d=e3-e4 S5 12 E 13 24.8 I 19 34 3 5 Le1+e2-e3 e1-e2+e3+e4 e1-e2-e3+e4 Le1+e2+e3 e1-e2+e3-e4 Le1-e2-e3 e1+e2-e3+e4 e1+e2+e3-e4 e1+e2+e3+e4 Le1-e2+e3 e1-e2-e3-e4 Le1+e2+e4 Le1-e2+e4 Le1-e2-e4 Le1+e2-e4 Le2+e3+e4 Le1+e3+e4 Le1-e3+e4 Le2-e3+e4 Le1-e3-e4 Le2+e3-e4 Le1+e3-e4 Le2-e3-e4 e1+e2-e3-e4 e21 59 32 48 18 e28 67 30 50 17 e34 60 27 51 16 i20 60 22 50 15 i24 63 27 49 18 i27 62 28 48 18 i28 61 30 49 18 i34 63 28 51 15 -5 2.5 -19.5 -6 -27.5 -13.5 -3.464 4.0414 -12.12 -1.154 -17.32 -6.928 34.063 49.652 20.207 31.754 18.475 30.599 42.5 60 57.5 82 73.5 102 41.5 56 47.5 67.5 56 80.5 11.547 21.361 32.331 50.806 44.455 69.282 11 20 33.5 52 48.5 71.5 9 17.5 22.5 37 28.5 48.5 -1.5 4.5 -3 5 -3 5 48.497 66.972 60.621 86.025 76.210 105.07 30.5 45 22.5 34.5 23.5 36.5 40.991 56.002 34.641 50.806 35.795 56.002 6.9282 13.856 20.207 31.754 27.135 42.723 4.618 10.39 5.196 16.16 2.886 16.74 42.723 60.621 46.188 66.972 56.002 79.096 9.2376 19.052 -5.196 3.4641 -11.54 1.7320 30.599 40.991 40.414 59.467 49.074 71.591 23.094 31.754 25.403 37.527 31.754 47.920 31.754 44.455 53.116 77.364 72.168 97.572 24.248 36.373 37.527 56.580 50.229 73.323 15.588 25.403 -4.041 6.9282 -12.70 -1.154 6.3508 15.011 -23.09 -9.237 -38.10 -21.36 17.897 27.712 13.279 21.361 14.433 23.671 28.5 42 10 20.5 5.5 16.5 We set OT=5 and carve off outliers: i7, i9, i10, i35, i36, i39, s42 I didn't check every pair of boundary plates but many of them. 
It appears that the 6 samples are so tight that we cannot separate them with Linear plates alone! So FAUST-L-TP-accuracy = 96%. Looking at the numbers, I don't expect S or R plates to help much.

  43. FAUST LSR Hull classifier, recursive L. X=X(X1,...,Xn,C); outliers are O=O(O1,...,On,OC)⊆X, initially ∅; OT=Outlier_Thres. Carve off outliers from X into O (O:=O∪{x | Rank_{n-1}DNN(x,X)≥OT}; O:=O∪{x | D2NN(X,x)>OT} ...).
5 6 6 13 6 9 e1+e2 e1+e3 e1+e2+e3 e1+e4 79.0 79.0 80.8 80.8 75.6 79.1 77.7 80.6 53.7 55.1 54.4 57.2 61.5 70.7 57.9 64.3 1 1 2 4 1 1 3 5 6 14 d=e2 d=e3 d=e4 d=e1 14 18 15 24 59 69 56 69 10 19 30 51 48 69 25 32 22 32
We set OT=5 and carve off outliers: i7, i9, i10, i35, i36, i39, s42. Count reductions are insufficient for e1 and e3, so use e4. There is TP purity except in Le2[48,51], so restrict to that interval (i.e., X = L^-1[48,51]); then restrict to Le4^-1[15,18]; restrict to Le1+e2^-1[61,64.3]; restrict to Le1+e3^-1[77.7,79.1]; restrict to Le1+e4^-1[54.4,55.1]: No overlap! So 100% TP accuracy already. This doesn't address FP rates, however.
A first cut description of the FAUST Linear Hull - Recursive algorithm: Carve off outliers from X into O as above. At any node in the Hull Tree, create children by moving to the next d∈dSet in each non-pure region, L^-1[opti,optj], until the sum of the counts decreases by at least THRESHOLD; then restrict to that region and recurse, until purity is reached.
