Understand the functions for mapping training sets, clustering data, identifying outliers, and classifying data linearly or spherically. Use FAUST concepts for advanced data analysis.
FAUST Analytics

$X(X_1..X_n) \subseteq R^n$, $|X| = N$. If $X$ is a classified training set with classes $C = \{C_1..C_K\}$, then $X = X(X_1..X_n, C)$. $d = (d_1..d_n)$, $p = (p_1..p_n) \in R^n$.

Functionals $F: R^n \to R$, $F = L, S, R$ (in terms of bit columns (compressed or not), of mappings from a PTS to a SPTS):

$L_{d,p} \equiv (X-p) \circ d = X \circ d - p \circ d$. Letting $L_d \equiv X \circ d$, $L_{d,p} = L_d - p \circ d$.

$S_p \equiv (X-p) \circ (X-p) = X \circ X + X \circ (-2p) + p \circ p = L_{-2p} + X \circ X + p \circ p$

$R_{d,p} \equiv S_p - L_{d,p}^2 = X \circ X + L_{-2p} + p \circ p - \left( (L_d)^2 - 2 (p \circ d)(X \circ d) + (p \circ d)^2 \right) = L_{-2p + 2(p \circ d) d} - (L_d)^2 + X \circ X + p \circ p - (p \circ d)^2$

$Fmin_{d,p,k} \equiv \min(F_{d,p} \,\&\, C_k) = \min F_{d,p,k}$, where $F_{d,p,k} = F_{d,p} \,\&\, C_k$.
$Fmax_{d,p,k} \equiv \max(F_{d,p} \,\&\, C_k) = \max F_{d,p,k}$.
$FPCC_{d,p,k,j} \equiv$ the $j$th precipitous count change (left-to-right) of $F_{d,p,k}$. Same notation for PCIs and PCDs (incr/decr).

GAP: Gap Clusterer. If the DensityThreshold, DT, isn't reached, cut C mid-gap of $L_{d,p} \,\&\, C$ using the next (d,p) from dpSet.

PCC: Precipitous Count Change Clusterer. If DT isn't reached, cut C at the PCCs of $L_{d,p} \,\&\, C$ using the next (d,p) from dpSet. Fusion step may be required? Use density, proximity, or use Pillar pkMeans (next slide).

TKO: Top K Outlier Detector. Use $rank_{n-1} S_x$ for the TopKOutlier-slider.

LIN: Linear Classifier. $y \in C_k$ iff $y \in LH_k \equiv \{ z \mid minL_{d,p,k} \le L_{d,p,k}(z) \le maxL_{d,p,k} \ \forall (d,p) \in dpSet \}$. $LH_k$ is a linear hull around $C_k$. dpSet is a set of (d,p) pairs, e.g., (Diag, DiagStartPt).

LSR: Linear Spherical Radial Classifier. $y \in C_k$ iff $y \in LSRH_k \equiv \{ z \mid minF_{d,p,k} \le F_{d,p,k}(z) \le maxF_{d,p,k} \ \forall (d,p) \in dpSet,\ F = L, S, R \}$.

$X \circ X$ can be pre-computed, one time. What should we pre-compute besides $X \circ X$? stats (min/avg/max/std); $X \circ p$; p = class_Avg/Med; $X \circ d$; $X \circ x$; $d^2(X,x)$; $Rk_i\, d^2(X,x)$; $L_{d,p}$, $R_{d,p}$.
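The L, S and R functionals and the LSR hull test above can be illustrated with a minimal sketch. It works on plain Python tuples rather than bit columns or pTrees, and the names `fit_hulls`, `classify`, the toy points and the dpSet are all hypothetical, chosen only for illustration. It also assumes each d is a unit vector, so that R reads as a squared radial distance.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def L(x, d, p):
    """Linear functional L_{d,p}(x) = (x - p) o d = x o d - p o d."""
    return dot(x, d) - dot(p, d)

def S(x, d, p):
    """Spherical functional S_p(x) = (x - p) o (x - p); d is unused."""
    diff = [a - b for a, b in zip(x, p)]
    return dot(diff, diff)

def R(x, d, p):
    """Radial functional R_{d,p}(x) = S_p(x) - L_{d,p}(x)^2 (d assumed unit length)."""
    return S(x, d, p) - L(x, d, p) ** 2

FUNCTIONALS = (L, S, R)

def fit_hulls(points, labels, dp_set):
    """For each class k, record the [min, max] of every F_{d,p} over that class."""
    hulls = {}
    for k in set(labels):
        members = [x for x, c in zip(points, labels) if c == k]
        hulls[k] = [
            (F, d, p,
             min(F(x, d, p) for x in members),
             max(F(x, d, p) for x in members))
            for F in FUNCTIONALS
            for d, p in dp_set
        ]
    return hulls

def classify(y, hulls):
    """Return every class whose LSR hull contains y (possibly none, possibly several)."""
    return sorted(
        k for k, bounds in hulls.items()
        if all(lo <= F(y, d, p) <= hi for F, d, p, lo, hi in bounds)
    )

# Hypothetical toy training set: two well-separated classes in R^2.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
dp_set = [((1, 0), (0, 0)), ((0, 1), (0, 0))]  # (d, p) pairs
hulls = fit_hulls(points, labels, dp_set)
print(classify((0.5, 0.5), hulls))
print(classify((3, 3), hulls))
```

A point that falls in no hull is left unclassified here; how to treat such points (nearest hull, outlier, and so on) is a design choice the slide does not settle.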
FAUST Clustering 1

L-Gap Clusterer: Cut a subcluster, C, mid-gap (of F&C) using the next (d,p) from dpSet, where F = L or S or R.

D=d35 (xod value: docs)
0: d26 d1 d27 d3 d44 d16 d6 d17 d47 d18 d10 d43 d12 d33 d14 d23 d49 d25 d45 d2 d29 d13 d9 d32
0.27: d28 d41 d42 d30 d21 d22 d15 d36 d11 d38 d46 d5 d8 d37 d48 d39 d4
0.55: d50 d7
3.60: d35
{35} cluster, {7, 50} cluster

D=.27s (xod value: docs)
0: d9 d49 d45
0.09: d6 d3 d33 d18 d44
0.18: d43 d25 d22 d12 d16 d2
0.27: d27 d23 d42 d15 d13 d47
0.36: d26 d29 d36
0.46: d38 d14 d48 d8 d10 d37
0.55: d32 d1 d5
0.64: d21 d4 d11 d17
0.92: d30
1.01: d41 d28
1.10: d39
1.29: d46
{28,30,39,41,46}

D=.64s (xod value: docs)
0: d26 d33 d3 d27 d45 d2 d44 d23 d9 d15 d49 d16 d38 d6 d18 d22
0.25: d1 d37 d43 d8 d29 d25 d42 d12 d47 d48
0.51: d32 d14 d4 d36 d13 d5
0.77: d10
1.03: d11
1.29: d17
1.54: d21
0's, .25s, .51s, d10, d11, d17, d21 clusters

Going back to D=d35, how close does HOB come? Bit slices of the D=d35 values:

value   2^1  2^0  2^-1  2^-2
0        0    0    0     0     (all docs with xod = 0)
0.27     0    0    0     1     (all docs with xod = 0.27)
0.55     0    0    1     0     (d50, d7)
3.60     1    1    1     0     (d35)

2^1, 2^0 bits separate out {35}
2^(-1) bit separates out {7,50}
2^(-2) bit separates out the .27s

Next, D=all_docs, GapThresh=.08 document sub-clustering:
C1 (.17 ≤ xod ≤ .25) = {2,3,6,16,18,22,42,43,49}
C2 (.34 ≤ xod ≤ .56) = {1,4,5,8,9,12,14,15,23,25,27,32,33,36,37,38,44,45,47,48}
C3 (.64 ≤ xod ≤ .86) = {10,11,13,17,21,26,28,29,30,39,41,50}
Single: 46 (xod=.99); 7 (=1.16); 35 (=1.47)

Next, on each Ck try D=Ck, Thres=.2

D=sum of all C1docs (xod value: docs)
0.42: d16 d2 d3 d42 d43 d22
0.63: d18 d49
0.85: d6
C11 (xod=.42) = {2,3,16,22,42,43}; doubleton {18,49}; singleton {6}

D=sum of all C11docs
0.57: d2 d3 d16 d22 d42 d43

D=sum of all C2docs
0.27: d23
0.36: d25 d4 d38
0.45: d15 d33 d12 d36
0.54: d8 d44 d47
0.63: d1 d37 d5 d32 d50
0.72: d27 d45 d9
0.81: d14

D=sum of all C3docs
0.56: d11
0.66: d17 d29
0.75: d13
0.85: d30 d10
0.94: d28 d26 d41 d50
1.03: d21
1.41: d39
C31 (.56 ≤ xod ≤ 1.03) = {10,11,13,17,21,26,28,29,30,41,50}; singleton {39}

D=sum of all C31docs
0.63: d17 d29 d11
0.84: d50 d13 d30
0.95: d26 d28 d10 d41
1.16: d21
C311 (.63) = {11,17,29}; C312 (.84) = {13,30,50}; C313 (.95) = {10,26,28,41}; singleton {21}

Other clustering methods later.

C11:
2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring.
22. Had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose.
42.
Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken.
43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns.

D=sum of all 44docs (xod value: docs)
0.17: d22 d49
0.21: d42 d2 d16
0.25: d18 d3 d43 d6
0.34: d23 d15 d44 d38 d25 d36
0.38: d33 d48 d8
0.43: d4 d12
0.47: d47 d9 d37
0.51: d5
0.56: d1 d32 d45 d14 d27
0.64: d10 d17 d21 d29 d11
0.69: d26 d50 d13
0.73: d30
0.77: d28
0.82: d41
0.86: d39
0.99: d46
1.16: d7
1.47: d35

C2:
1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?
4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the King's horses, and all the King's men cannot put Humpty Dumpty together again.
8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean.
9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!
15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low.
Those that come after may sing heigh ho! 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 48. One two, buckle my shoe. Three four, knock at the door. Five six, ick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. C311: 11. One misty moisty morning when cloudy was weather, I met an old man clothed all in leather. He began to compliment and I began to grin. How do And how do? And how do again 17. Here sits the Lord Mayor. Here sit his two men. 
Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. C312: 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! C313: 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
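The mid-gap cuts used throughout this slide can be sketched as follows. This is a minimal illustration, not the pTree-based implementation: it takes a plain mapping from document id to functional value (for example xod), and the name `gap_clusters` and the choice of threshold are assumptions. The sample values are a subset of the D=d35 column shown earlier.

```python
def gap_clusters(values, gap_thresh):
    """Split ids into clusters wherever consecutive sorted values differ by
    at least gap_thresh; each cut point is placed in the middle of the gap.

    values: dict mapping id -> functional value (e.g. xod)
    returns: (clusters, cut_points)
    """
    items = sorted(values.items(), key=lambda kv: (kv[1], kv[0]))
    clusters = [[items[0][0]]]
    cut_points = []
    for (_, prev_v), (cur_id, cur_v) in zip(items, items[1:]):
        if cur_v - prev_v >= gap_thresh:
            cut_points.append((prev_v + cur_v) / 2)  # mid-gap cut
            clusters.append([])                      # start a new cluster
        clusters[-1].append(cur_id)
    return clusters, cut_points

# Subset of the D=d35 values from the slide (doc id -> xod).
d35_values = {26: 0.0, 1: 0.0, 28: 0.27, 4: 0.27, 50: 0.55, 7: 0.55, 35: 3.60}
clusters, cuts = gap_clusters(d35_values, gap_thresh=0.2)
print(clusters)
```

With this threshold the sketch separates {35} and {7, 50}, as on the slide, and also splits the 0s from the .27s. When a gap is very close to the threshold, floating-point rounding can decide the cut, so working in integer hundredths may be safer.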
FAUST Cluster 1.2 OUTLIER: 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. WS0= 2 3 13 20 22 25 38 42 44 49 50 DS1= | WS1= 2 20 25 46 49 51 46 | DS2 46 DS0=|WS1= 7 10 17 23 25 28 33 34 37 40 43 45 50 35 |---| |DS2| |35 | OUTLIER: 35. Sing a song of sixpence, a pocket full of rye. 4 and 20 blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. Queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. WS0= 2 3 13 32 38 42 44 52 DS1 |WS1= 42(Mother) 7 9 |DS2|WS2=WS1 11 |7 27 |9 27 29 45 29 32 29 41 45 C1: Mother theme 7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. DS0|WS1 2 9 12 18 19 21 26 27 30 32 38 39 42 44 45 47 49 52 54 55 57 60 1 |DS1| WS2 12 19 26 39 44 10 |10 | DS2| WS3 13 | | 10 | DS3 10 17 37 14 39 21 41 26 44 28 30 47 50 OUTLIER: 10. Jack and Jill went up hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 
DS0| WS1=2 9 18 21 30 38 41 45 47 49 52 54 55 57 60 1 | DS1|WS2=2 9 18 30 39 45 55 13 | 39 |DS2 14 | |39 17 21 39 28 41 30 47 37 50 OUTLIER: 39. A little cock sparrow sat on a green tree. He chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie. Oh no, says the sparrow I will not make a stew. So he flapped his wings\,away he flew C3: men three 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. 
Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. WS0 38 52 DS1 WS1= 38 52 1 ---------- 5 17 23 28 36 48 C4: 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not k 21. Lion and Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 26. Sleep baby sleep. Our cottage valley is deep.Little lamb is on green with woolly fleece so soft, clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. 
Come, who will buy them of me? Buttons, a farthing a pair! 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! WS0=2 5 8 11 14 15 16 22 24 25 29 31 36 41 44 47 48 53 54 57 59 DS1|WS1(17wds)=2 5 11 15 16 22 24 25 29 31 41 44 47 48 54 57 59 4 6 8|DS2=DS1 12 15 18 21 25 26 30 33 37 43 44 47 49 50 DS0|WS1 2 5 8 13 14 15 16 22 24 25 29 36 41 44 47 48 51 54 57 59 13 |DS2|WS2 4 13 47 51 54 14 |13 |DS3 13 21 26 30 37 47 50 OUTLIER: 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 
real HOB Alternate WS0, DS0 WS0 22 38 44 52 DS1 WS1= 27 38 44 {fiddle(32 41) man(11 32) old(11 44) 11 DS2 32 11 41 22 44 C2 fiddle old man theme 11. One misty moisty morning when cloudy was weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do How do you do? How do you do again 32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many a joyous day my fiddle and I have had 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. OUTLIERS: 2. This little pig went to market. This little pig stayed home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 22. Had little husband no bigger than my thumb. Put him in a pint pot, there I bid him drum. Bought a little handkerchief to wipe his little nose, pair of little garters to tie little hose 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. DS0|WS1=6 7 8 14 43 46 48 51 53 57 2 3|DS2=DS1 16 22 42 Each of the 10 words occur in 1 doc, so all 5 docs are outliers OUTLIER:38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 
Notes: Using HOB, the final WordSet is the document cluster theme! When the theme is too long to be meaningful (C4), we can recurse on those (using the opposite starting set, DS0 or WS0?). The other thing we can note is that starting from DS0 almost always gave us an outlier (except for C5), and starting from WS0 almost always gave us clusters (except for the first one, 46). What happens if we reverse it? What happens if we just use WS0?
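One plausible reading of the alternating WordSet/DocSet reduction discussed in these notes is sketched below. The slides do not spell out the procedure, so everything here is an assumption: the "HOB cut" is taken to mean "keep the totals whose highest-order bit matches that of the maximum total", the loop starts from the word side, and the names `hob_floor`, `alternate` and the toy counts are hypothetical.

```python
def hob_floor(values):
    """Smallest value sharing the highest-order bit of max(values).
    Assumption: this is what a 'HOB cut' keeps. Returns 1 if nothing is positive."""
    m = max(values, default=0)
    return 1 << (m.bit_length() - 1) if m > 0 else 1

def alternate(counts, docs, words):
    """Alternately shrink the word set and the doc set until both are stable.

    counts: dict doc -> dict word -> integer count
    returns: (sorted docs, sorted words); the final words play the role of the theme
    """
    docs, words = set(docs), set(words)
    while True:
        # Word step: total each word over the current docs, keep the HOB-high ones.
        w_tot = {w: sum(counts[d].get(w, 0) for d in docs) for w in words}
        cut = hob_floor(w_tot.values())
        new_words = {w for w, t in w_tot.items() if t >= cut}
        # Doc step: total each doc over the surviving words, keep the HOB-high ones.
        d_tot = {d: sum(counts[d].get(w, 0) for w in new_words) for d in docs}
        cut = hob_floor(d_tot.values())
        new_docs = {d for d, t in d_tot.items() if t >= cut}
        if new_docs == docs and new_words == words:
            return sorted(docs), sorted(words)
        docs, words = new_docs, new_words

# Hypothetical toy corpus: d1 and d2 share the words a and b.
toy = {
    "d1": {"a": 1, "b": 1},
    "d2": {"a": 1, "b": 1},
    "d3": {"a": 1, "c": 1},
    "d4": {"c": 1},
}
print(alternate(toy, toy.keys(), {"a", "b", "c"}))
```

Because both sets can only shrink, the loop always terminates. Starting from the doc side instead would correspond to the DS0-first variant the notes ask about.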
real HOB Alternate WS0, DS0 recuring on C3 and C4 FAUST Cluster 1.2.1 DS0|WS1=41 47 57 (on C4) 21|DS2 WS2=41(morn) 57(way) 26| 37 DS3=DS2 30| 47 . 37 47 50 C4.1 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. WS0=2 5 11 15 16 22 24 25 29 31 44 47 54 59 DS1|WS1=2 5 15 16 22 24 25 44 47 54 59 4 DS2 WS2=2 15 16 24 25 44 47 54 59 6 4 DS3 WS3=WS2 8 6 4 12 8 8 15 12 12 18 21 21 21 25 25 25 26 26 26 30 30 30 43 43 43 50 50 49 50 DS0|WS1= 47 (plum) 21 DS2 WS2=WS1 26 21 30 50 50 C4.2.1 word47(plum) 21. Lion &Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake sent them out of town. 50. Little Jack Horner sat in corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! WS0= 2 15 16 23 24 27 36 DS1|WS1 = 2 15 16 25 44 59 4 |DS2 WS2=15 16 25 44 59 8 |4 DS3 WS3=15 16 44 59 12 |8 8 DS4 WS4=15 44 59 25 |12 12 12 DS5 WS544 59 26 |25 25 25 12 DS6=DS5 30 |26 26 26 25 C4.2.2 word44(old) word59(woman) 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 25. 
There was old woman. What do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. Final WordSet is too long. Recurse 4.2 OUTLIER: 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. WS0= 5 11 22 25 29 31 DS1 WS1=5 22 6 1518 49 DS2 6 C4.2.3 (day eat girl) 4. Little Miss Muffet sat on tuffet, eating curd, whey. Came big spider, sat down beside her, frightened Miss Muffet away 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 18. I had 2 pigeons bright and gay. They flew from me other day. What was the reason they did go? I can not tell, for I do not know. 33. Buttons, farthing pair! Come who will buy them? They are round, sound, pretty, fit for girls of city. Come, who will buy ? Buttons, farthing a pair 49. There was little girl had little curl right in the middle of her forehead. When she was good she was very good and when she was bad she was horrid. DS0|WS1=22 25 29 4 DS2 =WS1 8 |4 8 15|15 18Recursing 18|33 49 no change 33 43 49 DS0|WS1=1 2 3 15 16 23 24 27 30 36 49 60 26 |DS1=DS0 30 Doc26 and doc30 have none of the 12 words in commong so these two will come out outliers on the next recursion! OUTLIERS: 26. Sleep baby sleep. Cottage valley is deep.Little lamb is on green with woolly fleece soft, clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 30. Hey diddle diddle! Cat and the fiddle. Cow jumped over moon.Little dog laughed to see such sport, and dish ran away with spoon. DS0=|WS1=21 38 49 52 1 |DS1 |WS2=21 38 49 14 |1 |DS3=DS2 17 |14 28 |17 C31 [21]cut [38]men [49]run 1. Three blind mice! See how run! 
All ran after farmer's wife, cut off tails with carving knife. Ever see such thing in life as 3 blind mice? 14. If all seas were 1 sea, what a great sea that would be! And if all trees were 1 tree, what a great tree that would be! And if all axes were 1 axe, what a great axe that would be! if all men were 1 man what a great man he would be! And if great man took great axe and cut down great tree and let it fall into great sea, what a splish splash that would be! 17. Here sits Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! C32: [38]men [52] three 5. Humpty Dumpty sat on wall. Humpty Dumpty had great fall. All Kings horses, all Kings men cannot put Humpty Dumpty together again. 23. How many miles to Babylon? 3 score miles and 10. Can I get there by candle light? Yes, back again. If your heels are nimble, light, you may get there by candle light. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. WS0=38 52 DS1|WS1=WS0 5 | . 23 28 36 48 Doc43 and doc44 have none of the 6 words in commong so these two will come out outliers on the next recursion! OUTLIERS: 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. recurse on C3:
HOB Alternate WS0, DS0 FAUST Cluster 1.2.2 eat girl day men 33 4 15 5 49 18 36 8 32 men 11 fiddle old 41 1 run cut 17 men 14 28 three 23 three 48 three old morn 12 37 47 25 woman way 16 OUTLIERS: 2 3 6 10 13 16 22 26 30 35 38 39 42 43 44 46 Categorize clusters (hub-spoke, cyclic, chain, disjoint...)? Separate disjoint sub-clusters? Each of the 3 C423 words gives a disjoint cluster!Each of the 2 C32 work gives a disjoint sub-clusters also. C4231 day 15. Great A. little a. This is pancake day. Toss ball high. Throw ball low. Those come after sing heigh ho! 18. I had 2 pigeons bright and gay. They flew from me other day. What was reason they go? I can not tell, I do not know. C4232 eat 4. Little Miss Muffet sat on tuffet, eat curd, whey. Came big spider, sat down beside her, frightened away 8. Jack Sprat could eat no fat. Wife could eat no lean. Between them both they licked platter clean. C4233 girl 33. Buttons, farthing pair! Come who will buy them? They are round, sound, pretty, fit for girls of city. Come, who will buy ? Buttons, farthing a pair 49. There was little girl had little curl right in the middle of her forehead. When she was good she was very good and when she was bad she was horrid. C1: mother 7. Old Mother Hubbard went to cupboard to give her poor dog a bone. When she got there cupboard was bare, so poor dog had none. She went to baker to buy some bread. When she came back dog was dead. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. C2: fiddle old men {cyclic} 11. 1 misty moisty morning when cloudy was weather, Chanced to meet old man clothed all leather. 
He began to compliment,I began to grin. How do you do How do? How do again 32. Jack come give me your fiddle, if ever you mean to thrive. No I'll not give fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many joyous day fiddle and I've had 41. Old King Cole was merry old soul. Merry old soul was he. He called for his pipe, he called for his bowl, he called for his fiddlers 3. And every fiddler, had a fine fiddle, a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. C11 cut men run {cyclic} 1. Three blind mice! See how run! All ran after farmer's wife, cut off tails with carving knife. Ever see such thing in life as 3 blind mice? 14. If all seas were 1 sea, what a great sea that would be! And if all trees were 1 tree, what a great tree that would be! And if all axes were 1 axe, what a great axe that would be! if all men were 1 man what a great man he would be! And if great man took great axe and cut down great tree and let it fall into great sea, what a splish splash that would be! 17. Here sits Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! C321 men 5. Humpty Dumpty sat on wall. Humpty Dumpty had great fall. All Kings horses, all Kings men can't put Humpty together again. 36. Little Tommy Tittlemouse lived in little house. He caught fishes in other mens ditches. C322 three 23. How many miles to Babylon? 3 score 10. Can I get there by candle light? Yes, back again. If your heels are nimble, light, you may get there by candle light. 28. Baa baa black sheep, have any wool? Yes sir yes sir, 3 bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. 
Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. C4.1 morn way 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on cold and frosty morn. This is way wash our hands, wash our hands, wash our hands. This is way wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash r clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. C421 plum 21. Lion &Unicorn were fighting for crown. Lion beat Unicorn all around town. Some gave them white bread and some gave them brown. Some gave them plum cake sent them out of town. 50. Little Jack Horner sat in corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! C422 old woman 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 25. There was old woman. What do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.
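The observation on this slide that each C423 word and each C32 word gives a disjoint sub-cluster, while other themes are marked {cyclic}, can be checked mechanically. The sketch below is only an illustration; the function name `pairwise_disjoint` is hypothetical, and the word-to-document sets are read off the clusters listed above (C4231 to C4233, C321, C322, and the cut/men/run cluster of docs 1, 14 and 17).

```python
from itertools import combinations

def pairwise_disjoint(postings):
    """True if no document appears under more than one theme word.

    postings: dict word -> set of doc ids containing that word (within one cluster)
    """
    return all(a.isdisjoint(b) for a, b in combinations(postings.values(), 2))

# Theme words and their documents, as listed on the slide.
c423 = {"day": {15, 18}, "eat": {4, 8}, "girl": {33, 49}}
c32 = {"men": {5, 36}, "three": {23, 28, 48}}
cut_men_run = {"cut": {1, 14}, "men": {14, 17}, "run": {1, 17}}

print(pairwise_disjoint(c423))         # each word gives its own sub-cluster
print(pairwise_disjoint(c32))
print(pairwise_disjoint(cut_men_run))  # docs are shared around a cycle of words
```

A disjoint result suggests splitting the cluster by word; a shared-document result, as in the cut/men/run case, suggests keeping it together as one sub-graph.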
FAUST Cluster 1.2.3
[Figure: word-labeled document graph. Nodes are document numbers; edges are labeled with the words the documents share.]
We have captured only a few of the salient sub-graphs. Can we capture more of them? Of course we can capture a sub-graph for each word, but that might be 100,000. Let's stare at what we got and try to see what we might wish we had gotten in addition. A bake-bread sub-corpus would have been strong (docs {7, 21, 35, 42}). There are many others.
Using AVG+1:
       2   9  10  25  45  47
d21    0   0   1   0   0   1
d35    0   0   1   1   1   0
d39    1   1   0   0   1   0
d46    1   0   0   1   0   0
d50    0   1   0   1   1   1
HOB2 Alt (use other HOBs)
FAUST Cluster 1.2.4
[Figure: word-labeled document graph. Nodes are document numbers; edges are labeled with the words the documents share.]
wAvg+1, dAvg+1:
       away  boy  bread  eat  pie  plum
          2    9     10   25   45    47
d21       0    0      1    0    0     1
d35       0    0      1    1    1     0
d39       1    1      0    0    1     0
d46       1    0      0    1    0     0
d50       0    1      0    1    1     1
recurse: wAvg+2, dAvg-1:
       away  boy  bread  eat  pie
          2    9     10   25   45
d35       0    0      1    1    1
d39       1    1      0    0    1
d46       1    0      0    1    0
d50       0    1      0    1    1
And if we want to pull out a particular word cluster, just turn the word-pTree into a list.
w=baby:
       away  baby
          2     3
d9        0     1
d26       1     1
d27       0     1
d45       1           (one entry of this row is missing in the source)
w=boy:
       away  boy
          2    9
d28       0    1
d39       1    1
d50       0    1
For a particular doc cluster, just turn the doc-pTree into a list:
       child  old  woman
          15   44     59
d12        1    1      1
(d12: There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.)
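The two list extractions described above can be sketched in a few lines. This is only an illustration: it assumes the bits of the wAvg+1, dAvg+1 table are held as plain, uncompressed Python lists (one bit column per word), which stand in for real pTrees. The names `word_ptree`, `word_cluster` and `doc_cluster` are hypothetical and do not come from the slides.

```python
# Bits copied from the wAvg+1, dAvg+1 table: one bit column per word,
# one position per document, in the order given by `docs`.
docs = ["d21", "d35", "d39", "d46", "d50"]
word_ptree = {
    "away":  [0, 0, 1, 1, 0],
    "boy":   [0, 0, 1, 0, 1],
    "bread": [1, 1, 0, 0, 0],
    "eat":   [0, 1, 0, 1, 1],
    "pie":   [0, 1, 1, 0, 1],
    "plum":  [1, 0, 0, 0, 1],
}


def word_cluster(word):
    """Turn a word's bit column into the list of documents containing it."""
    return [doc for doc, bit in zip(docs, word_ptree[word]) if bit]


def doc_cluster(doc):
    """Turn a document's bit row into the list of words it contains."""
    i = docs.index(doc)
    return [word for word, column in word_ptree.items() if column[i]]


print(word_cluster("eat"))
print(doc_cluster("d35"))
```

With compressed pTrees the same idea applies, except that the 1-bit positions would be read from the tree rather than from a flat list.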
Glossary of graph theory (Wikipedia)
A graph G consists of vertices and edges. Every edge has two endpoints in the set of vertices, and is said to connect or join the two endpoints. An edge can thus be defined as a set of two vertices (or an ordered pair, in the case of a directed graph; see Section Direction). The two endpoints of an edge are also said to be adjacent to each other. [Figure: a pseudograph in which blue edges are loops and red edges are multiple edges of multiplicity 2 and 3; the multiplicity of the graph is 3.] Other models of graphs exist; e.g., a graph may be thought of as a Boolean binary function over the set of vertices or as a square (0,1)-matrix. [Figure: a labeled simple graph with vertex set V = {1, 2, 3, 4, 5, 6} and edge set E = {{1,2}, {1,5}, {2,3}, {2,5}, {3,4}, {4,5}, {4,6}}.] A hyperedge is an edge allowed to take on any number of vertices. A graph that allows any hyperedge is called a hypergraph. A simple graph can be considered a special case of the hypergraph, namely the 2-uniform hypergraph. Without qualification, an edge is assumed to consist of at most 2 vertices, and a graph is never confused with a hypergraph. A non-edge (or anti-edge) is an edge that is not present in the graph. The complement of a graph G is a graph with the same vertex set but with an edge set s.t. xy is an edge in the complement iff xy is not an edge in G. An edgeless (empty, null) graph is a graph with zero or more vertices, but no edges. The empty graph may also be the graph with no vertices and no edges. If it is a graph with no edges and any number n of vertices, it may be called the null graph on n vertices. (There is no consistency at all in the literature.) A graph is infinite if it has infinitely many vertices or edges or both; otherwise it is finite. An infinite graph where every vertex has finite degree is called locally finite. When stated without any qualification, a graph is usually assumed to be finite. See also continuous graph. Two graphs G and H are isomorphic (G ~ H) if there is a 1-1 correspondence (an isomorphism) between their vertices s.t.
2 vertices are adjacent in G iff the corresponding vertices are adjacent in H. A graph G is homomorphic to H if there is a mapping (a homomorphism) from V(G) to V(H) s.t. if 2 vertices are adjacent in G then their corresponding vertices are adjacent in H. A subgraph of G is a graph whose vertex set is a subset of that of G, and whose adjacency relation is a subset of that of G restricted to this subset. In the other direction, a supergraph of a graph G is a graph of which G is a subgraph. A graph G contains another graph H if some subgraph of G is H or is isomorphic to H. A subgraph H is a spanning subgraph, or factor, of G if it has the same vertex set as G. We say H spans G. A subgraph H of G is induced (or full) if, for any pair of vertices x and y of H, xy is an edge of H if and only if xy is an edge of G. I.e., H is an induced subgraph of G if it has exactly the edges that appear in G over the same vertex set. If the vertex set of H is the subset S of V(G), then H can be written G[S] and is said to be induced by S. A graph G is minimal with some property P provided that G has property P and no proper subgraph of G has property P. In this context, the term subgraph is usually understood to mean "induced subgraph." The notion of maximality is defined dually. The vertex set of G is denoted by V(G), or V when there is no danger of confusion. The order of a graph is the number of its vertices, i.e. |V(G)|. An edge (a set of two elements) is drawn as a line connecting two vertices, called endpoints or end vertices or endvertices. An edge with endvertices x and y is denoted by xy. The edge set of G is denoted by E(G), or E. An edge xy is incident to a vertex when this vertex is one of the endpoints x or y. The size of a graph is the number of its edges, i.e. |E(G)|. A loop is an edge whose endpoints are the same vertex. A link has two distinct endvertices. An edge is multiple if there is another edge with the same endvertices; otherwise it is simple.
The multiplicity of an edge is the number of multiple edges sharing the same end vertices; the multiplicity of a graph is the maximum multiplicity of its edges. A simple graph has no multiple edges or loops; a multigraph has multiple edges but no loops; and a pseudograph (which some authors also call a multigraph) has both multiple edges and loops. When stated without any qualification, a graph is usually assumed to be simple, except in the literature of category theory, where it refers to a quiver. Graphs whose edges or vertices have names or labels are known as labeled, else unlabeled. The difference between a labeled and an unlabeled graph is that the latter has no specific set of vertices or edges; it is regarded as another way to look upon an isomorphism type of graphs. (Thus this usage distinguishes graphs with identifiable vertex or edge sets on the one hand from isomorphism types/classes of graphs on the other.) (Graph labeling refers to the assignment of labels (usually natural numbers) to edges/vertices, subject to certain rules depending on the situation. This should not be confused with a graph's merely having distinct labels or names on its vertices.)
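As a small illustration of the definitions on this slide, the labeled simple graph with V = {1, ..., 6} given above can be written down directly as sets. This is a minimal sketch; the helper names (`order`, `size`, `complement`, `induced_subgraph`, `is_spanning_subgraph`) are chosen here for illustration and follow the glossary terms.

```python
from itertools import combinations

# The labeled simple graph from the glossary: edges are 2-element sets.
V = {1, 2, 3, 4, 5, 6}
E = {frozenset(e) for e in [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]}


def order(vertices):
    """Order of a graph: the number of its vertices, |V(G)|."""
    return len(vertices)


def size(edges):
    """Size of a graph: the number of its edges, |E(G)|."""
    return len(edges)


def complement(vertices, edges):
    """Same vertex set; xy is an edge iff xy is not an edge of G."""
    all_pairs = {frozenset(pair) for pair in combinations(sorted(vertices), 2)}
    return all_pairs - edges


def induced_subgraph(edges, subset):
    """Edge set of G[S]: every edge of G with both endpoints in S."""
    return {e for e in edges if e <= subset}


def is_spanning_subgraph(sub_vertices, sub_edges, vertices, edges):
    """H spans G if H is a subgraph of G with the same vertex set."""
    return sub_vertices == vertices and sub_edges <= edges


print(order(V), size(E))
print(sorted(sorted(e) for e in induced_subgraph(E, {1, 2, 5})))
```

Representing an edge as a `frozenset` mirrors the definition of an edge as a set of two vertices; a directed graph would use ordered pairs instead.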
Glossary of graph theory 2 (Wikipedia)
A graph G is maximal w.r.t. a property P if G has property P and no proper supergraph H of G has property P. A graph that does not contain H as an induced subgraph is said to be H-free; e.g., triangle-free graphs do not have a triangle subgraph. A universal graph in a class K of graphs is a simple graph s.t. every element in K can be embedded in it as a subgraph. A walk is an alternating sequence of vertices and edges s.t. each edge's endpoints are the preceding and following vertices in the sequence. A walk is closed if its first and last vertices are the same, and open if they are different. The length l of a walk is the number of edges it uses. For an open walk, l=n–1, where n is the number of vertices visited (a vertex is counted each time it is visited). A trail is a walk with all edges distinct. A closed trail is a tour or circuit. Traditionally, a path referred to what is now an open walk. Nowadays, when stated without any qualification, a path is usually understood to be simple, meaning that no vertices (and thus no edges) are repeated. (The term chain has also been used to refer to a walk in which all vertices and edges are distinct.) The closed equivalent to this type of walk, a walk that starts and ends at the same vertex but otherwise has no repeated vertices or edges, is called a cycle. Like path, this term traditionally referred to any closed walk, but now is usually understood to be simple by definition. Paths and cycles of n vertices are often denoted by Pn and Cn, respectively. (Some authors use the length instead of the number of vertices.) A cycle with odd length is an odd cycle; otherwise it is an even cycle. One theorem is that a graph is bipartite iff it contains no odd cycles. (See also complete bipartite graph.) A graph is acyclic if it contains no cycles; unicyclic if it contains exactly one cycle; and pancyclic if it contains cycles of every possible length (from 3 to the order of the graph). A wheel graph is a graph with n vertices (n ≥ 4), formed by connecting a single vertex to all vertices of Cn-1.
The girth of a graph is the length of a shortest (simple) cycle, and the circumference is the length of a longest (simple) cycle. The girth and circumference of an acyclic graph are defined to be infinity ∞. A path or cycle is Hamiltonian (or spanning) if it uses all vertices exactly once. A graph that contains a Hamiltonian path is traceable; and one that contains a Hamiltonian path for any given pair of (distinct) end vertices is a Hamiltonian connected graph. A graph that contains a Hamiltonian cycle is a Hamiltonian graph. A trail or circuit (or cycle) is Eulerian if it uses all edges precisely once. A graph that contains an Eulerian trail is traversable. A graph that contains an Eulerian circuit is an Eulerian graph. Two paths are internally disjoint (some call it independent) if they do not have any vertex in common, except the first and last ones. A theta graph is the union of three internally disjoint (simple) paths that have the same two distinct end vertices. A tree is a connected acyclic simple graph. For directed graphs, each vertex has at most one incoming edge. A vertex of degree 1 is called a leaf, or pendant vertex. An edge incident to a leaf is a leaf edge, or pendant edge. (Some people define a leaf edge as a leaf and then define a leaf vertex on top of it. These two sets of definitions are often used interchangeably.) A non-leaf vertex is an internal vertex. Sometimes, one vertex of the tree is distinguished and called the root (in which case the tree is called rooted). Rooted trees are often treated as directed acyclic graphs with edges pointing away from the root. A subtree of the tree T is a connected subgraph of T. A forest is an acyclic simple graph. For directed graphs, each vertex has at most one incoming edge. (That is, a forest is a tree with the connectivity requirement removed; a graph containing multiple disconnected trees.) A subforest of the forest F is a subgraph of F. A spanning tree is a spanning subgraph that is a tree. Every graph has a spanning forest.
But only a connected graph has a spanning tree. A special kind of tree called a star is K1,k. An induced star with 3 edges is a claw. A caterpillar is a tree in which all non-leaf nodes form a single path. A k-ary tree is a rooted tree s.t. every internal vertex has k children. A 1-ary tree is a path. A 2-ary tree is also called a binary tree.
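Two of the statements on this slide lend themselves to a short check in code: a graph is bipartite iff it contains no odd cycle, and a tree is a connected acyclic simple graph. The sketch below tests bipartiteness by breadth-first 2-colouring and tests for a tree using the standard equivalent condition "connected with |V| - 1 edges" (an assumption brought in here, not stated on the slide). Function names are illustrative.

```python
from collections import deque


def adjacency(vertices, edges):
    """Build an adjacency map for an undirected simple graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_bipartite(adj):
    """2-colour by BFS; an edge inside one colour class closes an odd cycle."""
    colour = {}
    for start in adj:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in colour:
                    colour[w] = 1 - colour[u]
                    queue.append(w)
                elif colour[w] == colour[u]:
                    return False
    return True


def is_tree(adj):
    """Connected and acyclic, checked as: connected with |V| - 1 edges."""
    n = len(adj)
    if n == 0:
        return False
    edge_count = sum(len(nbrs) for nbrs in adj.values()) // 2
    if edge_count != n - 1:
        return False
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == n


example = adjacency(range(1, 7), [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)])
star = adjacency(range(4), [(0, 1), (0, 2), (0, 3)])  # K1,3, a claw
print(is_bipartite(example), is_tree(star))
```

The example graph from the first glossary slide contains the triangle 1-2-5, an odd cycle, so it is not bipartite; the star K1,3 is both a tree and bipartite.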
Glossary of graph theory 3
Cliques
[Figure: K5, a complete graph. If a subgraph looks like this, the vertices in that subgraph form a clique of size 5.] The complete graph Kn of order n is a simple graph with n vertices in which every vertex is adjacent to every other. A clique in a graph is a set of pairwise adjacent vertices (any subgraph induced by a clique is complete, so the two terms are used interchangeably). A maximal clique is not a subset of any other clique. The clique number ω(G) of a graph G is the order of a largest clique in G. Connectivity extends the concept of adjacency and is essentially a form (and measure) of concatenated adjacency. A graph is connected if there is a path between any 2 vertices; otherwise, the graph is disconnected. A graph is totally disconnected if there is no path connecting any pair of vertices. A cut vertex, or articulation point, is a vertex whose removal disconnects the remaining subgraph. A cut set, or vertex cut or separating set, is a set of vertices whose removal disconnects the remaining subgraph. A bridge is an analogous edge. If there is still a path between any 2 vertices even after removing any k - 1 vertices, the graph is k-vertex-connected or k-connected (iff it has k internally disjoint paths between any 2 vertices). The vertex connectivity or connectivity κ(G) of a graph G is the minimum number of vertices that need to be removed to disconnect G. In network theory, a giant component is a connected subgraph that contains a majority of the entire graph's nodes. A bridge, or cut edge or isthmus, is an edge whose removal disconnects a graph. (For example, all the edges in a tree are bridges.) A cut vertex is an analogous vertex. A disconnecting set is a set of edges whose removal increases the number of components. An edge cut is the set of all edges which have one vertex in some proper vertex subset S and the other vertex in V(G)\S. A bond is a minimal (but not necessarily minimum), nonempty set of edges whose removal disconnects a graph.
A graph is k-edge-connected if any subgraph formed by removing any k - 1 edges is connected. The edge connectivity κ'(G) of a graph G is the minimum number of edges needed to disconnect G. One well-known result is that κ(G) ≤ κ'(G) ≤ δ(G). A component is a maximally connected subgraph. A block is either a maximally 2-connected subgraph, a bridge (together with its vertices), or an isolated vertex. A biconnected component is a 2-connected component. An articulation point (also known as a separating vertex) of a graph is a vertex whose removal from the graph increases its number of connected components. A biconnected component is a subgraph induced by a maximal set of nodes that has no separating vertex. A weaker concept is a strongly connected component of a directed graph: a subgraph where all nodes are reachable by all other nodes (reachability = existence of a path between the nodes). A directed graph can be decomposed into strongly connected components by running the depth-first search (DFS) algorithm twice: first on the graph itself, and next on the transpose graph in decreasing order of the finishing times of the first DFS. Given a directed graph G, the transpose GT is the graph G with all the edge directions reversed. A hypercube graph is a regular graph with 2^n vertices, 2^(n−1)·n edges, and n edges touching each vertex. It can be obtained as the one-dimensional skeleton of the geometric hypercube. A knot in a directed graph is a collection of vertices and edges with the property that every vertex in the knot has outgoing edges, and all outgoing edges from vertices in the knot terminate at other vertices in the knot. Thus it is impossible to leave the knot while following the directions of the edges. A minor G2=(V2,E2) of G1=(V1,E1) is an injection from V2 to V1 such that every edge in E2 corresponds to a path (disjoint from all other such paths) in E1, s.t. every vertex in V1 is in one or more paths, or is part of the injection from V2 to V1. An embedding of G2=(V2,E2) to G1=(V1,E1) is an injection from V2 to V1 s.t.
every E2 edge corresponds to a path in G1. In graph theory, degree, especially that of a vertex, is usually a measure of immediate adjacency. An edge connects two vertices; these two vertices are said to be incident to that edge or, equivalently, that edge is incident to those two vertices. All degree-related concepts have to do with adjacency or incidence. The degree, or valency, dG(v) of a vertex v is the number of edges incident to v (loops counted twice). A vertex of degree 0 is an isolated vertex. A vertex of degree 1 is a leaf. The total degree of a graph is the sum of the degrees of all its vertices. For a graph without loops, it is equal to the number of incidences between vertices and edges. The handshaking lemma states that the total degree is always equal to two times the number of edges, loops included. A degree sequence is a list of the degrees of a graph in non-increasing order (e.g. d1 ≥ d2 ≥ … ≥ dn). A sequence of non-increasing integers is realizable if it is a degree sequence of some graph. Two vertices u and v are called adjacent if an edge exists between them. We denote this by u ~ v or u ↓ v. The set of neighbors of v, that is, vertices adjacent to v not including v itself, forms an induced subgraph called the (open) neighborhood of v and denoted NG(v). When v is also included, it is called a closed neighborhood and denoted by NG[v]. When stated without any qualification, a neighborhood is assumed to be open. The subscript G is usually dropped when there is no danger of confusion; the same neighborhood notation may also be used to refer to sets of adjacent vertices rather than the corresponding induced subgraphs. For a simple graph, the number of neighbors that a vertex has coincides with its degree. A dominating set of a graph is a vertex subset whose closed neighborhood includes all vertices of the graph. A vertex v dominates another vertex u if there is an edge from v to u. A vertex subset V dominates another vertex subset U if every vertex in U is adjacent to some vertex in V.
The minimum size of a dominating set is the domination number γ(G). In computers, a finite, directed or undirected graph (with n vertices) is often represented by its adjacency matrix: an n-by-n matrix whose entry in row i and column j gives the number of edges from the i-th to the j-th vertex. Spectral graph theory studies relationships between the properties of a graph and its adjacency matrix or other matrices of the graph. The maximum degree Δ(G) of a graph G is the largest degree over all vertices; the minimum degree δ(G) is the smallest. A graph s.t. every vertex has the same degree is regular. It is k-regular if every vertex has degree k. A 0-regular graph is an independent set. A 1-regular graph is a matching. A 2-regular graph is a vertex disjoint union of cycles. A 3-regular graph is cubic, or trivalent.
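The two-pass DFS decomposition into strongly connected components described on this slide (first DFS on the graph, second DFS on the transpose in decreasing order of finishing times) can be sketched as follows. This is a minimal recursive version intended for small graphs; the function name and the dictionary-of-successors input format are assumptions made for the example.

```python
def strongly_connected_components(graph):
    """graph maps each vertex to a list of its successors (heads of its arcs)."""
    vertices = set(graph)
    for successors in graph.values():
        vertices.update(successors)

    # Pass 1: DFS on the graph itself, recording vertices as they finish.
    visited, finish_order = set(), []

    def dfs_forward(u):
        visited.add(u)
        for v in graph.get(u, ()):
            if v not in visited:
                dfs_forward(v)
        finish_order.append(u)

    for u in sorted(vertices):
        if u not in visited:
            dfs_forward(u)

    # Transpose graph: every arc (u, v) becomes (v, u).
    transpose = {u: [] for u in vertices}
    for u, successors in graph.items():
        for v in successors:
            transpose[v].append(u)

    # Pass 2: DFS on the transpose in decreasing order of finishing time.
    assigned, components = set(), []

    def dfs_backward(u, component):
        assigned.add(u)
        component.add(u)
        for v in transpose[u]:
            if v not in assigned:
                dfs_backward(v, component)

    for u in reversed(finish_order):
        if u not in assigned:
            component = set()
            dfs_backward(u, component)
            components.append(component)
    return components


digraph = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: []}
print(strongly_connected_components(digraph))
```

Each tree found in the second pass is one strongly connected component. For large graphs an iterative DFS would avoid Python's recursion limit.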
Glossary of graph theory 4
A k-factor is a k-regular spanning subgraph. A 1-factor is a perfect matching. A partition of the edges of a graph into k-factors is a k-factorization. A graph is biregular if it has unequal maximum and minimum degrees and every vertex has one of those two degrees. A strongly regular graph is a regular graph s.t. any adjacent vertices have the same number of common neighbors as other adjacent pairs and any nonadjacent vertices have the same number of common neighbors as other nonadjacent pairs. Independent means pairwise disjoint or mutually nonadjacent. Independence is a form of immediate nonadjacency. An isolated vertex is a vertex not incident to any edges. An independent set, or coclique, or stable set or staset, is a set of vertices of which no pair is adjacent. Since the graph induced by any independent set is an empty graph, the two terms are usually used interchangeably. Two subgraphs are edge disjoint if they have no edges in common, and vertex disjoint if they have no vertices (and thus also no edges) in common. Unless specified otherwise, a set of disjoint subgraphs is assumed to be pairwise vertex disjoint. The independence number α(G) of a graph G is the size of the largest independent set of G. A graph can be decomposed into independent sets (its vertex set can be partitioned into pairwise disjoint independent subsets). Such subsets are called partite sets, or simply parts. A graph that can be decomposed into 2 partite sets is bipartite; 3 sets, tripartite; k sets, k-partite; and an unknown number of sets, multipartite. A 1-partite graph is the same as an independent set (empty graph). A 2-partite graph is the same as a bipartite graph. If G can be decomposed into k partite sets, it is k-colourable. A complete multipartite graph is a graph in which vertices are adjacent if and only if they belong to different partite sets.
A complete bipartite graph is also referred to as a biclique; if its partite sets contain n and m vertices, respectively, then the graph is denoted Kn,m. A k-partite graph is semiregular if each of its partite sets has a uniform degree; equipartite if each partite set has the same size; and balanced k-partite if the partite sets differ in size by at most 1. The matching number of a graph G is the size of a largest matching, i.e., a largest set of pairwise vertex disjoint edges, of G. A spanning matching (AKA perfect matching) covers all vertices. The complexity of a graph denotes the quantity of information that the graph contains, measured, e.g., by counting the number of its spanning trees, or by the value of a certain formula involving the number of vertices, edges, and proper paths in the graph. The distance dG(u, v) between two (not necessarily distinct) vertices u and v in a graph G is the length of a shortest path between them. When u and v are identical, their distance is 0. When u and v are unreachable from each other, their distance is defined to be infinity ∞. The eccentricity εG(v) of a vertex v in a graph G is the maximum distance from v to any other vertex. The diameter diam(G) of a graph G is the maximum eccentricity over all vertices in the graph; and the radius rad(G), the minimum. When there are two or more components in G, diam(G) and rad(G) are defined to be infinity ∞. Trivially, diam(G) ≤ 2 rad(G). Vertices with maximum eccentricity are called peripheral vertices. Vertices of minimum eccentricity form the center. A tree has at most two center vertices. The Wiener index of a vertex v, WG(v), is the sum of the distances between v and all other vertices. The Wiener index of a graph G, W(G), is the sum of the distances over all pairs of vertices. An undirected graph's Wiener polynomial is defined to be Σ q^d(u,v) over all unordered pairs of vertices u and v. The Wiener index and polynomial are of particular interest to mathematical chemists. The k-th power G^k of a graph G is a supergraph formed by adding an edge between all pairs of vertices of G with distance at most k.
A 2nd power of a graph is also called a square. A k-spanner is a spanning subgraph S s.t. every 2 vertices are at most k times as far apart on S as on G. The number k is the dilation. k-spanners are used for studying geometric network optimization. A crossing is a pair of intersecting edges. A graph is embeddable on a surface if its vertices and edges can be arranged on it without any crossing. The genus of a graph is the lowest genus of any surface on which the graph can embed. A planar graph is one which can be drawn on the (Euclidean) plane without any crossing; and a plane graph, one which is drawn in such fashion. In other words, a planar graph is a graph of genus 0. The example graph is planar; the complete graph on n vertices, for n > 4, is not planar. Also, a tree is necessarily a planar graph. When a graph is drawn without any crossing, any cycle that surrounds a region without any edges reaching from the cycle into the region forms a face. 2 faces on a plane graph are adjacent if they share a common edge. A dual G* of a plane graph G is a graph whose vertices represent the faces, including any outer face, of G and are adjacent in G* iff their corresponding faces are adjacent in G. The dual of a planar graph is always a planar pseudograph (e.g. consider the dual of a triangle). In the familiar case of a 3-connected simple planar graph G (isomorphic to a convex polyhedron P), the dual G* is also a 3-connected simple planar graph (and isomorphic to the dual polyhedron P*). Since there is a sense of "inside" and "outside" on a plane, we can identify an "outermost" region that contains the entire graph if the graph does not cover the entire plane. Such an outermost region is an outer face. An outerplanar graph can be drawn in the plane s.t. all its vertices are adjacent to the outer face. An outerplane graph is drawn thus. The minimum number of crossings that must appear when a graph is drawn on a plane is called the crossing number.
The minimum number of planar graphs needed to cover a graph is the thickness of the graph. A weighted graph associates a label (weight) with every edge in the graph. Weights are usually real numbers. They may be restricted to rational numbers or integers. Certain algorithms require further restrictions on weights; for instance, Dijkstra's algorithm works properly only for non-negative weights. The weight of a path in a weighted graph is the sum of the weights of the selected edges. Sometimes a non-edge (a vertex pair with no connecting edge) is indicated by labeling it with a special weight representing infinity. Sometimes cost is used instead of weight. When stated without qualification, a graph is always assumed to be unweighted. In some writing on graph theory the term network is a synonym for a weighted graph. A network may be directed or undirected, and it may contain special vertices (nodes), such as a source or a sink. The classical network problems include: minimum cost spanning tree, shortest paths, and maximal flow (and the max-flow min-cut theorem).
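The distance-based quantities defined on this slide (distance, eccentricity, diameter, radius, center, Wiener index) can all be computed from breadth-first search on an unweighted graph. The sketch below applies them to the labeled simple graph from the first glossary slide; the function names are illustrative, and unreachable pairs are given distance infinity as in the definitions.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (edge counts) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricity(adj, v):
    """Maximum distance from v to any vertex; infinity if some vertex is unreachable."""
    dist = bfs_distances(adj, v)
    return max(dist.values()) if len(dist) == len(adj) else math.inf


def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)


def radius(adj):
    return min(eccentricity(adj, v) for v in adj)


def center(adj):
    """Vertices of minimum eccentricity."""
    r = radius(adj)
    return {v for v in adj if eccentricity(adj, v) == r}


def wiener_index(adj):
    """Sum of distances over all unordered pairs of vertices."""
    total = 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if len(dist) < len(adj):
            return math.inf
        total += sum(dist.values())
    return total // 2  # every unordered pair was counted twice


edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = {v: set() for v in range(1, 7)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(diameter(adj), radius(adj), sorted(center(adj)), wiener_index(adj))
```

For weighted graphs, BFS would be replaced by a shortest-path algorithm such as Dijkstra's, subject to the weight restriction noted above.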
Glossary of graph theory 5
A directed arc, or directed edge, is an ordered pair of endvertices represented graphically as an arrow drawn between the endvertices, where the first vertex is called the initial vertex or tail, and the second is called the terminal vertex or head (it appears at the arrow head). An undirected edge disregards any sense of direction and treats both endvertices interchangeably. A loop in a digraph, however, keeps a sense of direction and treats both head and tail identically. A set of arcs are multiple, or parallel, if they share the same head and the same tail. A pair of arcs are anti-parallel if one's head/tail is the other's tail/head. A digraph, or directed graph, or oriented graph, is analogous to an undirected graph except that it contains only arcs. A mixed graph may have both directed and undirected edges; it generalizes both directed and undirected graphs. Without qualification, a graph is almost always assumed to be undirected. A digraph is called simple if it has no loops and at most one arc between any pair of vertices. A digraph is usually assumed to be simple. A quiver is a directed graph which is specifically allowed, but not required, to have loops and more than one arc between any pair of vertices. In a digraph Γ, we distinguish the out-degree dΓ+(v), the number of edges leaving a vertex v, and the in-degree dΓ-(v), the number of edges entering a vertex v. If the graph is oriented, the degree dΓ(v) of a vertex v is equal to the sum of its out- and in-degrees. The out-neighborhood, or successor set, N+Γ(v) of a vertex v is the set of heads of arcs going from v. The in-neighborhood, or predecessor set, N-Γ(v) is the set of tails of arcs going into v. A source is a vertex with 0 in-degree; and a sink, 0 out-degree. A vertex v dominates another vertex u if there is an arc from v to u. A vertex subset S is out-dominating if every vertex not in S is dominated by some vertex in S; and in-dominating if every vertex in S is dominated by some vertex not in S. A kernel in a (directed?)
graph G is an independent set S s.t. every vertex in V(G) \ S dominates some vertex in S. In undirected graphs, kernels are maximal independent sets. A digraph is kernel perfect if every induced sub-digraph has a kernel. An Eulerian digraph is a digraph with equal in- and out-degrees at every vertex. The zweieck of an undirected edge e = (u, v) is the pair of diedges (u, v) and (v, u), which form the simple dicircuit. An orientation is an assignment of directions to the edges of an undirected or partially directed graph. When stated without any qualification, it is usually assumed that all undirected edges are replaced by a directed one in an orientation. Also, the underlying graph is usually assumed to be undirected and simple. A tournament is a digraph s.t. each pair of vertices is connected by exactly 1 arc. In other words, it is an oriented complete graph. A directed path is an oriented simple path such that all arcs go the same direction, meaning all internal vertices have in- and out-degrees 1. A vertex v is reachable from another vertex u if there is a directed path that starts from u and ends at v. Note that in general the condition that u is reachable from v does not imply that v is also reachable from u. If v is reachable from u, then u is a predecessor of v and v is a successor of u. If there is an arc from u to v, then u is a direct predecessor of v, and v is a direct successor of u. A digraph is strongly connected if every vertex is reachable from every other following the directions of the arcs. A digraph is weakly connected if its underlying undirected graph is connected. A weakly connected graph can be thought of as a digraph in which every vertex is "reachable" from every other but not necessarily following the directions of the arcs. A strong orientation is an orientation that produces a strongly connected digraph.
A directed cycle, or just a cycle when the context is clear, is an oriented simple cycle such that all arcs go the same direction, meaning all vertices have in- and out-degrees 1. A digraph is acyclic if it does not contain any directed cycle. A finite, acyclic digraph with no isolated vertices necessarily contains at least one source and at least one sink. An arborescence (out-tree, branching, oriented tree) is an oriented tree s.t. all vertices are reachable from a single vertex. An in-tree is an oriented tree s.t. a single vertex is reachable from every other.
Directed acyclic graphs (DAGs)
The partial order structure of directed acyclic graphs (or DAGs) gives them their own terminology. If there is a directed edge from u to v, then u is a parent of v and v is a child of u. If there is a directed path from u to v, then u is an ancestor of v and v is a descendant of u. The moral graph of a DAG is the undirected graph created by adding an (undirected) edge between all parents of the same node (sometimes called marrying), and then replacing all directed edges by undirected edges. A DAG is perfect if, for each node, the set of parents is complete (i.e. no new edges need to be added when forming the moral graph).
Colouring
[Figure: an example of a 4-critical graph. Its chromatic number is 4 but all of its proper subgraphs have a chromatic number less than 4. This graph is also planar.] Vertices in graphs can be given colours to identify or label them. Although they may actually be rendered in diagrams in different colours, working mathematicians generally pencil in numbers or letters (usually numbers) to represent the colours. A k-colouring of G(V,E) is a map ϕ : V → {1, ..., k} that assigns every vertex a colour, with the condition that adjacent vertices cannot be assigned the same colour. The chromatic number χ(G) is the smallest k for which G has a k-colouring. Given a graph and a colouring, the colour classes of the graph are the sets of vertices given the same colour.
A graph is k-critical if its chromatic number is k but all its proper subgraphs have chromatic number < k. An odd cycle is 3-critical, and the complete graph on k vertices is k-critical. A graph invariant is a property of a graph G, usually a number or a polynomial, that depends only on the isomorphism class of G. Examples are the order, genus, chromatic number, and chromatic polynomial of a graph.
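The moral graph construction from the DAG part of this glossary (marry all parents of the same node, then drop the edge directions) can be sketched as below. The input format, a dictionary mapping each node to the set of its parents, and the function names are assumptions made for this example.

```python
from itertools import combinations


def moral_graph(parents):
    """parents maps each node of a DAG to the set of its parent nodes.

    Returns the undirected edge set of the moral graph.
    """
    edges = set()
    for child, node_parents in parents.items():
        # Replace every directed edge parent -> child by an undirected edge.
        for parent in node_parents:
            edges.add(frozenset((parent, child)))
        # "Marry" the parents: join every pair of parents of the same node.
        for p, q in combinations(sorted(node_parents), 2):
            edges.add(frozenset((p, q)))
    return edges


def is_perfect(parents):
    """A DAG is perfect if moralization adds no new edges."""
    undirected = {frozenset((p, c)) for c, ps in parents.items() for p in ps}
    return moral_graph(parents) == undirected


# a -> c <- b, c -> d: the parents a and b of c are not adjacent.
dag = {"c": {"a", "b"}, "d": {"c"}}
print(sorted(sorted(e) for e in moral_graph(dag)))
print(is_perfect(dag))
```

In the example, moralization adds the edge between a and b, so this DAG is not perfect; adding the arc a -> b to the DAG would make it perfect.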
FAUST HULL Classification 1
Using the clustering of FAUST Clustering1 as classes, we extract 80% from each class as TrainingSet (w class=cluster#). How accurate is FAUST Hull Classification on the remaining 20% plus the outliers (which should be "Other")?

Full classes from slide: FAUST Clustering1
C11 ={2,3,16,22,42,43}
C2  ={1,4,5,8,9,12,14,15,23,25,27,32,33,36,37,38,44,45,47,48}
C311={11,17,29}
C312={13,30,50}
C313={10,26,28,41}

80% Training Set
C11 ={2,16,22,42,43}
C2  ={1,5,8,9,12,15,25,27,32,33,36,37,38,44,47,48}
C311={11,17}
C312={30,50}
C313={10,28,41}

20% Test Set
C11 ={3}
C2  ={4,14,23,45}
C311={29}
C312={13}
C313={26}

OUTLIERS {18,49} {6} {39} {21} {46} {7} {35}    O={18 49 6 39 21 46 7 35}

L [MIN MAX] per class, for each D and p:
D          p       | C11          | C2           | C311         | C312         | C313
D1=TS      avTS    | [-.09 .106]  | [.106 .439]  | [.572 .572]  | [.305 .439]  | [.505 .771]
D11=C11    avC11   | [.63 .63]    | [0 .63]      | [0 0]        | [.31 .31]    | [0 .31]
D2=C2      avC2    | [0 .22]      | [.44 .77]    | [.66 .66]    | [.11 .22]    | [.44 .66]
D311=C311  avC311  | [0 0]        | [0 .66]      | [1.33 1.66]  | [0 .33]      | [0 .33]
D312=C312  avC312  | [0 .31]      | [0 .31]      | [0 .31]      | [1.58 1.58]  | [0 .31]
D313=C313  avC313  | [0 .22]      | [0 .44]      | [0 .44]      | [.22 .22]    | [1.34 1.56]

D1=TS p=avTS Sp: [1.9 2.1]C11 [1.8 3.8]C2 [2.4 3.4]C311 [4.6 4.7]C312 [4.2 5.4]C313

Use Lpd, Sp, Rpd with p=ClassAvg and d=unitized ClassSum. All 6 class hulls separated using Lpd, p=CLavg, D=CLsum. D311 separates C311, D312 separates C312 and D313 separates C313 from all others. D2 separates C11 and C2.
Now, remove some false positives with S and R using the same p's and d's:

D11=C11   p=avC11   Sp  [1.6]C11 [2.4 4.4]C2 [3.4 4 4]C311 [5]C312 [5.4 6]C313
D2=C2     p=avC2    Sp  [2 2.3]C11 [1.8 3.5]C2 [2.5 3.5]C311 [5 5.1]C312 [4.5 5.8]C313
D311=C311 p=avC311  Sp  [4.2]C11 [2.2 6.2]C2 [1.2]C311 [6.2 7.2]C312 [6.2 8.2]C313
D312=C312 p=avC312  Sp  [3.5 4.5]C11 [4.5 6.5]C2 [5.5]C311 [2.5]C312 [6.5 7.5]C313
D313=C313 p=avC313  Sp  [3.5 4.2]C11 [2.8 6.2]C2 [3.8 6.2]C311 [6.5]C312 [2.5 3.5]C313

Sp removes a lot of the potential for false positives. (Many of the classes lie a single distance from p.)

D1=TS     p=avTS    Rpd [1.3 1.4]C11 [1.3 1.9]C2 [1.5 1.8]C311 [2.1]C312 [2.0 2.2]C313
D11=C11   p=avC11   Rpd [1.2]C11 [1.4 2]C2 [1.7 2]C311 [2.2 2.]]C312 [2.2 2.4]C313
D2=C2     p=avC2    Rpd [1.3 1.4]C11 [1.3 1.8]C2 [1.6 1.8]C311 [2.2]C312 [2.1 2.4]C313
D311=C311 p=avC311  Rpd [1.4]C11 [1.2 2]C2 [1.1]C311 [2.2]C312 [2.2 2.4]C313
D312=C312 p=avC312  Rpd [1.3 1.4]C11 [1.4 2]C2 [1.7 1.9]C311 [1.5]C312 [2.2 2.4]C313
D313=C313 p=avC313  Rpd [1.3 1.4]C11 [1.3 2]C2 [1.6 2]C311 [2.2]C312 [1.5 1.8]C313

Rpd removes even more of the potential for false positives.
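To make the role of the three functionals concrete, here is a minimal sketch in plain Python that computes per-class [min, max] intervals of L, S and R. It assumes the definitions used earlier in the deck, L = (x-p)o d, S = (x-p)o(x-p) and R = S - L^2, with p taken as a class average and d as the unitized class sum. The 2-D points and all function names are hypothetical, chosen so that L cannot tell the two classes apart while S and R can.

```python
import math

def unitize(d):
    """Scale d to unit length."""
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]

def L(x, d, p):
    """Linear functional: (x - p) o d."""
    return sum((xi - pi) * di for xi, pi, di in zip(x, p, d))

def S(x, p):
    """Spherical functional: (x - p) o (x - p)."""
    return sum((xi - pi) ** 2 for xi, pi in zip(x, p))

def R(x, d, p):
    """Radial functional: squared distance from the line through p along d."""
    return S(x, p) - L(x, d, p) ** 2

def class_intervals(rows, labels, f):
    """Map each class to the (min, max) of f over its rows."""
    out = {}
    for x, c in zip(rows, labels):
        v = f(x)
        lo, hi = out.get(c, (v, v))
        out[c] = (min(lo, v), max(hi, v))
    return out

# Hypothetical training points: class A on the x-axis, class B three units above.
rows = [(1.0, 0.0), (2.0, 0.0), (1.0, 3.0), (2.0, 3.0)]
labels = ["A", "A", "B", "B"]

a_rows = [x for x, c in zip(rows, labels) if c == "A"]
p = [sum(col) / len(a_rows) for col in zip(*a_rows)]   # class average of A
d = unitize([sum(col) for col in zip(*a_rows)])        # unitized class sum of A

l_hull = class_intervals(rows, labels, lambda x: L(x, d, p))
s_hull = class_intervals(rows, labels, lambda x: S(x, p))
r_hull = class_intervals(rows, labels, lambda x: R(x, d, p))
```

In this toy setup the L intervals of A and B coincide, so an L-only hull would accept B points as A. The S and R intervals are disjoint, which is the false-positive removal the slide describes.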
FAUST Hull Classification 2 (TESTING)
Test Set: C11={3} C2={4,14,23,45} C311={29} C312={13} C313={26} O={18 49 6 39 21 46 7 35}

D1=TS p=avTS
  Lpd [-.09 .11]C11 [.11 .44]C2 [.57]C311 [.31 .44]C312 [.51 .77]C313
  Sp  [1.9 2.1]C11 [1.8 3.8]C2 [2.4 3.4]C311 [4.6 4.7]C312 [4.2 5.4]C313
  Rpd [1.3 1.4]C11 [1.3 1.9]C2 [1.5 1.8]C311 [2.1]C312 [2.0 2.2]C313
D11=C11 p=avC11
  Lpd [.63]C11 [0 .63]C2 [0]C311 [.31]C312 [0 .31]C313
  Sp  [1.6]C11 [2.4 4.4]C2 [3.4 4 4]C311 [5]C312 [5.4 6]C313
  Rpd [1.2]C11 [1.4 2]C2 [1.7 2]C311 [2.2 2.]]C312 [2.2 2.4]C313
D2=C2 p=avC2
  Lpd [0 .22]C11 [.44 .77]C2 [.66]C311 [.11 .22]C312 [.44 .66]C313
  Sp  [2 2.3]C11 [1.8 3.5]C2 [2.5 3.5]C311 [5 5.1]C312 [4.5 5.8]C313
  Rpd [1.3 1.4]C11 [1.3 1.8]C2 [1.6 1.8]C311 [2.2]C312 [2.1 2.4]C313
D311=C311 p=avC311
  Lpd [0]C11 [0 .66]C2 [1.3 1.6]C311 [0 .33]C312 [0 .33]C313
  Sp  [4.2]C11 [2.2 6.2]C2 [1.2]C311 [6.2 7.2]C312 [6.2 8.2]C313
  Rpd [1.4]C11 [1.2 2]C2 [1.1]C311 [2.2]C312 [2.2 2.4]C313
D312=C312 p=avC312
  Lpd (values as shown on the slide) .31 C11 .31 C2 .31 C311 1.58 C312 .31 C313
  Sp  [3.5 4.5]C11 [4.5 6.5]C2 [5.5]C311 [2.5]C312 [6.5 7.5]C313
  Rpd [1.3 1.4]C11 [1.4 2]C2 [1.7 1.9]C311 [1.5]C312 [2.2 2.4]C313
D313=C313 p=avC313
  Lpd [0 .22]C11 [0 .44]C2 [0 .44]C311 [.22]C312 [1.3 1.6]C313
  Sp  [3.5 4.2]C11 [2.8 6.2]C2 [3.8 6.2]C311 [6.5]C312 [2.5 3.5]C313
  Rpd [1.3 1.4]C11 [1.3 2]C2 [1.6 2]C311 [2.2]C312 [1.5 1.8]C313

ε=.8 predicted Class: 11 2 2 2 311(all 311|2 all) 312(all 312|313 a Other . . . . . . .
D=TS, p=avTS (O = outlier, true class Other)
                              Predicted CLASS          Final
Rpd   Sp    Lpd   trueCL doc | R      S        L    | predicted
1.41  2.19  -0.4  11     d3  | 2      Oth      11   | Other
1.40  2.06  -0.3  2      d4  | 2      2        11   | Other
1.92  3.71  0.01  2      d14 | Oth    2        11   | Other
1.38  1.99  -0.2  2      d23 | 2|11   2|11     Oth  | Other
1.97  3.92  -0.1  311    d29 | Oth    312|313  11   | Other
2.22  4.99  -0.2  312    d13 | 313    313      11   | Other
2.60  6.78  -0.0  313    d26 | Oth    Oth      11   | Other
1.40  2.13  -0.3  O      d6  | 2|11   2        11   | Other
2.50  6.37  0.34  O      d7  | 313    Oth      2    | Other
1.40  2.06  -0.3  O      d18 | 2|11   2|11     Oth  | Other
2.42  5.92  -0.1  O      d21 | 313    Oth      Oth  | Other
3.46  12.2  0.47  O      d35 | Oth    Oth      Oth  | Other
2.60  6.78  -0.0  O      d39 | Oth    Oth      11   | Other
2.35  5.57  0.14  O      d46 | Oth    Oth      2    | Other
1.41  2.19  -0.4  O      d49 | 2      2        Oth  | Other

8/15 = 53% correct using only D=TS, p=AvgTS. Note: accuracy is likely to get worse as we consider more D's.

Let's think about the TrainingSet quality resulting from clustering. This is a poor-quality TrainingSet (from clustering Mother Goose Rhymes). MGR is a difficult corpus to cluster because almost every document is isolated (an outlier), so the clustering is vague (no two MGRs deal with the same topic, so their word use is quite different).

Instead of tightening the class hulls by replacing CLASSmin and CLASSmax with CLASSfpci (fpci = first precipitous count increase) and CLASSlpcd (lpcd = last precipitous count decrease), we might loosen the class hulls (since we know the classes are somewhat arbitrary) by expanding the [CLASSmin, CLASSmax] interval as follows: let A = Avg{CLASSmin, CLASSmax} and R (for radius) = A-CLASSmin (= CLASSmax-A also). Use [A-R-ε, A+R+ε]. Setting ε=.8 increases accuracy to 100% (assuming all Others stay Other).

Finally, it occurs to me that clustering to produce a TrainingSet, then setting aside a TestSet, gives a good way to measure the quality of the clustering. If the TestSet part classifies well under the TrainingSet part, the clustering must have been high quality (it produced a good TrainingSet for classification). This clustering quality test method is probably not new (check the literature?). If it is new, we might have a paper here?
(discuss this quality measure and assess using different ε's?)
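The ε-expansion and the hold-out quality measure can be sketched as follows. This is an illustration under stated assumptions, not the deck's implementation: a sample is assigned to every class whose expanded interval [A-R-ε, A+R+ε] contains it for all functionals, ties are reported as "c1|c2" in the style of the table above, and anything matching no class is "Other". The hull values, test values and function names are hypothetical.

```python
def expand(interval, eps):
    """Widen (min, max) to (A - R - eps, A + R + eps), A = midpoint, R = half-width."""
    lo, hi = interval
    a = (lo + hi) / 2
    r = a - lo
    return (a - r - eps, a + r + eps)

def predict(values, hulls, eps=0.0):
    """values: {functional: value}; hulls: {class: {functional: (min, max)}}."""
    hits = []
    for cls, per_functional in hulls.items():
        inside = True
        for name, interval in per_functional.items():
            lo, hi = expand(interval, eps)
            if not lo <= values[name] <= hi:
                inside = False
                break
        if inside:
            hits.append(cls)
    return "|".join(sorted(hits)) if hits else "Other"

def accuracy(test_set, hulls, eps):
    """Fraction of (values, true_class) pairs predicted exactly right."""
    correct = sum(predict(v, hulls, eps) == truth for v, truth in test_set)
    return correct / len(test_set)

# Hypothetical hulls learned from a training part of a clustering.
hulls = {
    "A": {"L": (0.0, 1.0), "S": (1.0, 2.0)},
    "B": {"L": (3.0, 4.0), "S": (5.0, 6.0)},
}
# Hypothetical held-out samples; the last one is an outlier.
test_set = [
    ({"L": 1.2, "S": 2.1}, "A"),
    ({"L": 2.8, "S": 4.9}, "B"),
    ({"L": 9.0, "S": 9.0}, "Other"),
]
for eps in (0.0, 0.3):
    print(eps, accuracy(test_set, hulls, eps))
```

Sweeping ε this way is one option for the assessment suggested above: too small an ε rejects true class members, while too large an ε lets hulls overlap and pulls outliers into classes.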
APPENDIX
FAUST Clustering 2: Other variations of the FAUST Clustering1 Algorithm
Functional GapCluster Dendrogram. D=sum of all docs in subcluster, but use all gaps!

word#  word    df#  Av
1      always  2    .05
2      away    5    .11
3      baby    4    .09
4      back    3    .07
5      bad     2    .05
6      bag     2    .05
7      bake    3    .07
8      bed     3    .07
9      boy     3    .07
10     bread   3    .07
11     bright  2    .05
12     brown   2    .05
13     buy     4    .09
14     cake    2    .05
15     child   2    .05
16     clean   2    .05
17     cloth   3    .07
18     cock    3    .07
19     crown   2    .05
20     cry     4    .09
21     cut     2    .05
22     day     4    .09
23     dish    2    .05
24     dog     3    .07
25     eat     5    .11
26     fall    3    .07
27     fiddle  3    .07
28     full    2    .05
29     girl    2    .05
30     green   2    .05
31     high    2    .05
32     hill    2    .05
33     house   2    .05
34     king    3    .07
35     lady    2    .05
36     lamb    2    .05
37     maid    2    .05
38     men     6    .14
39     merry   2    .05
40     money   2    .05
41     morn    2    .05
42     mother  5    .11
43     nose    2    .05
44     old     6    .14
45     pie     3    .07
46     pig     2    .05
47     plum    3    .07
48     round   3    .07
49     run     4    .09
50     sing    2    .05
51     son     3    .07
52     three   5    .11
53     thumb   2    .05
54     town    3    .07
55     tree    2    .05
56     two     2    .05
57     way     3    .07
58     wife    2    .05
59     woman   2    .05
60     wool    2    .05
df# min=2, max=6

Dendrogram labels (value, docs), grouped by value:
0.17 d22 d49
0.21 d42 d2 d16
0.25 d18 d3 d43 d6
0.34 d23 d15 d44 d38 d25 d36
0.38 d33 d48 d8
0.43 d4 d12
0.47 d47 d9 d37
0.51 d5
0.56 d1 d32 d45 d14 d27
0.64 d10 d17 d21 d29 d11
0.69 d26 d50 d13
0.73 d30
0.77 d28
0.82 d41
0.86 d39
0.99 d46
1.16 d7
1.47 d35

Further dendrogram labels, in the order extracted:
1.6 d26 0.77 d9 0.8 d32 1.2 d14 0.89 d4 1.34 d12 0 d21 .2 d10 .6 d17 1.54 d47 1.54 d37 0.94 d18 0.94 d6 0.63 d3 0.63 d43 0.94 d15 1.17 d44 0.70 d38 1 d1 1 d45 1 d27 0.47 d23 0.47 d25 0.47 d36 1.37 d50 1.37 d13 .4 d11 .4 d29

22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose. 49. There was a little girl who had a little curl right in the middle of her forehead.
When she was good she was very very good and when she was bad she was horrid. 2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know. 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean. 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair! 48. One two, buckle my shoe. Three four, knock at the door. Five six, ick up sticks. Seven eight, lay them straight. Nine ten. 
a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. 11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 50. Little Jack Horner sat in the corner, eating of Christmas pie. 
He put in his thumb and pulled out a plum and said What a good boy am I!
Functional GapClusterer FAUST Clustering3 (HOB clustering1) D=sum of all docs in subcluster but use HOB! 12 15 18 37 44 48 15 44 12 18 37 48 12 18 48 37 18 12 48 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. 1 d45 1 d27 1 d1 d7 d35 3 4 6 8 9 23 25 33 36 38 43 47 2 16 22 42 49 1.. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. 
Chin chopper, chin chopper, chin chopper, chin! 21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. 26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! d7 d35 1 5 10 11 13 14 17 21 26 27 28 29 30 32 39 41 45 46 50 .4 d11 .4 d29 1.37 d50 1.37 d13 3 4 8 9 33 38 47 6 23 25 36 43 47 3 4 8 9 33 38 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 27 45 39 46 1 5 10 11 13 14 17 21 26 28 29 30 32 50 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 48. 
One two, buckle my shoe. Three four, knock at the door. Five six, ick up sticks. Seven eight, lay them straight. Nine ten. a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen. maids a waiting. Nineteen twenty, my plate is empty. 4 8 9 33 38 3 d48 d33 d8 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair! 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 8 4 9 33 38 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 0.63 d3 0.63 d43 0.94 d18 0.94 d6 41 0.47 d23 0.47 d25 0.47 d36 2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. 
If you'll tell me this riddle, I will give you a ring. 22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose. 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. 49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. 0.17 d22 0.17 d49 0.21 d42 0.21 d2 0.21 d16
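As a rough illustration of the HOB variant, the sketch below groups documents by the position of the highest-order set bit of their functional value, so that values in [2^e, 2^(e+1)) fall into one group. Reading HOB as "high-order bit" and cutting exactly at powers of two are assumptions about the deck's method. The values, document names and function names are hypothetical, and values are assumed non-negative, with zeros forming their own group.

```python
import math
from collections import defaultdict

def high_order_bit(value):
    """Exponent e with 2**e <= value < 2**(e+1); None for value <= 0."""
    if value <= 0:
        return None
    _mantissa, exponent = math.frexp(value)  # value = mantissa * 2**exponent, 0.5 <= mantissa < 1
    return exponent - 1

def hob_clusters(values):
    """Group document ids by the high-order bit of their functional value."""
    groups = defaultdict(list)
    for doc, v in values.items():
        groups[high_order_bit(v)].append(doc)
    return dict(groups)

# Hypothetical functional values for five documents.
values = {"a": 0.0, "b": 0.3, "c": 0.27, "d": 0.6, "e": 3.6}
clusters = hob_clusters(values)
```

Compared with cutting at every gap, this needs only the top bit slice in which each value is set, at the price of fixed cut points that may split a natural group straddling a power of two.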
FAUST Clustering4 DS1=all docs s.t. WS0wc(doc)>2 Converge using HOB WS0=words wc(MG)>½max (>3) DS1=all docs s.t. WS0wc(doc)>2. DS1=all docs s.t. WS0wc(doc)>2. Converge using HOB Converge using HOB Remove C1. WS0=words wc(MG')>½max (>3). Remove C2. WS0=words wc(MG')>½max (>3). C1 (mother theme) 7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. WS0 2 3 13 20 22 25 38 42 44 49 52 ------------------------------------- DS1 |WS1 3 13 20 42 7 |------------------ 27 |DS2 |WS2 3 13 42 45 | 7 |------------ 46 | 9 |DS3=C1 |27 | 7 |45 | 9 |27 |45 C2 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 17. Here sits the Lord Mayor. Here sit his 2 men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 32. Jack come and give me your fiddle, if ever you mean to thrive. 
No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. WS0 2 22 25 38 44 49 52 ----------------------------- DS1 | WS1 2 25 27 38 44 49 52 1 | ----------------------- 4 | DS2 11 | 1 17 | 4 30 | 11 32 | 17 41 | 30 46 | 32 | 41 | 46 This is not as good a cluster as C!. Lets try starting with DS0=docs dc(MG')>½max (>6.5) C2 (pie theme) 35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. 39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says sparrow, I'll not make a stew. So he flapped his wings and away he flew. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I! DS0 |WS1=30 45 26 |-------------------- 35 |DS1 |WS2=9 25 30 45 39 |26 |--------------- |35 |DS2|WS3=9 25 45 |39 |35 |----------- |50 |39 |DS3 | |50 |35 | | |39 | | |50 C3 1. Three blind mice! See how they run! 
They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. WS0=2 22 38 44 49 52 ------------------------- DS1 |WS1=2 27 38 44 49 52 1 |-------------------- 11 | DS2 17 | 1 30 | 11 32 | 17 41 | 30 46 | 32 | 41 | 46 This is not a good cluster! Lets Again starting with DS0=docs dc(MG'')>½max (>3.5) C3 (crown and brown theme?) 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. 
Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. DS0 |WS1=1 8 12 19 26 32 41 47 54 57 60 10 29 |---------------------------------- 13 37 |DS1 |WS2=12 19 14 44 |10 |----------------------------- 21 47 |21 |DS2 26 | |10 28 | |21 Remove C3. Start with DS0=docs dc(MG''')>½max (>3.5) C4 (morning theme) 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. DS0 |WS1= 1 8 41 57 60 13 | ---------------------------- 14 | DS1 |WS2= 1 8 41 57 26 | 26 | ---------------------- 28 | 29 | DS2 |WS3= 8 41 57 29 | 37 | 29 | ---------------- 37 | 47 | 37 | DS3 |WS4= 41 57 44 | | 47 | 37 |----------- 47 | | | 47 | DS4 | | | | 37 | | | | 47 Remove C4. Start with DS0=docs dc(MG''')>½max (>3.5)
FAUST Clustering4 (continued) Remove C3. Start with DS0=docs dc(MG''')>½max (>3.5) C5 (sheep theme? (But 13 is an internal class outlier!)) Let's consider an alternative C5 starting with DS0 instead of WS0! 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. WS0 1 3 4 5 6 8 11 13 15 16 20 22 25 26 29 31 36 38 44 48 51 52 54 59 60 ------------------------------------ DS1 |WS1 1 3 4 6 9 13 15 16 20 28 30 | 36 47 51 52 54 6 13 |------------------------------- 26 28 |DS2 13 26 28 C5 (sleep-lamb hub(26) and spokes(28,29) theme? 26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. DS0 |WS1 1 60 13 |-------------- 14 |DS1 |WS2 1 60 26 |26 | 28 |28 | 29 |29 | 44 | | Remove C5. Start with DS0=docs dc(MG''')>½max (>2.5) C6 fall (and men) theme 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 14. 
If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!

Trace: DS0 |WS1=13 26 | 31 38 5 33|---------- 8 38|DS1|WS2=26 14 15| | 38 12 44| 5 |------ 13 48|14 |DS2 5 14

Remove C6. Start with DS0=docs dc(MG''')>½max (>2.5)

C7: hub(buy,13,33) spoke(high,15,44) theme
13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again.
15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho!
33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair!
44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will.

Trace: DS0|WS1= 13 31 8 |-------------- 12 |DS1 |WS2=13 31 13 | 13 |--------- 15 | 15 | 33 | 33 | 38 | 44 | 44 48|

Remove C7. Start with DS0=docs dc(MG''')>½max (>1.5)

C8: old people theme
11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again
25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.

Trace: DS0|WS1 5 22 25 | 44 52 59 all|--------------- |DS1|WS2=44 59 | 6 |----------- |12 |DS2 |WS3= |25 |12 |44 59 | |25 |

Remove C8. Start with DS0=docs dc(MG''')>½max (>1.5)

C9: theme?
4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day.
18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know.
49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid.

Trace: DS0|WS1 5 22 25 53 all|-------------- |DS1|WS2 5 22 25 | 6 |--------------- | 8 |DS2|WS3 5 22 25 |18 | 4 | |22 | 6 | |49 |18 49

Remove C9. Start with DS0=docs dc(MG''')>½max (>1.5)

C10: theme?
2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean.
16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring.
22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose.
23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light.
36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other mens ditches.
37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This
38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken.
43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns.
48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.

Trace: DS0|WS1 all all|------- |DS1 |all
FAUST ARM1

Any relationship (e.g., a text corpus) is a labeled bipartite graph. How do we describe relationships graphically? The graphical metadata (type level) is an Entity-Relationship diagram; the graphical instances are labeled bipartite graphs.

Relationship types and the example labels shown on the slide:
text relationship: Document (doc#, wc, author) contains Word (word#, dc, part of speech); edge label tf. Instance: docs doc1..docN, words word1..wordn; wc=3, author=Bob; dc=df=2, PoS=verb; tf=8.
market basket relationship: customer (cust#, ic, zip) buys item (item#, cc, supplier); edge label quantity. Instance: customers cus1..cusN, items item1..itemn; ic=3, zip=58103; cc=2, supplier=acme; q=3.
recommender relationship: customer rates item; edge label rating. Instance: customers cus1..cusN, items item1..itemn; ic=3, zip=58103; cc=2, supplier=acme; rating=5.
social network relationship: member (mem#, fc, zip, descr) befriends member; edge label type. Instance: members mem1..memN on both sides; fc=3, zip=58103; fc=2, descr=old; type=spouse.

The incidence counts (as well as any other entity attribute) can be used to define sub-graphs, and then we can search for stable (convergent) sub-graphs under that definition. For the doc-word relationship we used wc≥2 & dc≥2. Next we will try wc≥2 & dc≥1. After that we will try wc≥1 & dc≥2.

In all cases, implement as two (redundant) pTreeSets, one for each entity, with a 1 bit iff there is an edge. One PTS is a rotation of the other. If there is a numeric edge label (e.g., tf), each SPTS is stored as its bitslices; otherwise it is a single bitmap.

term_frequency (tf) labels each edge with the number of times the word occurs in the document.
doc_frequency (df) (or doc count (dc)) labels each word with the number of docs the word occurs in.
word_count (wc) labels each doc with the number of words it contains.

We typically lower-bound threshold each of these labels.
First, transform the corpus to an existential corpus (either a word exists in a doc or it doesn't), using the lower bound tf ≥ 1?
Second, lower bound [and/or upper bound] df (e.g., df ≥ 2 requires each word to occur in at least 2 docs)?
Third, lower bound wc (e.g., wc ≥ 2 requires each doc to contain at least 2 words)?
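The three thresholding steps can be sketched as below. This is a minimal illustration, not the slide author's implementation: the function name `stable_subgraph`, the dictionary layout of `tf`, and the toy corpus are assumptions made for the example. Because dropping documents can push a word's df back under its bound, the sketch repeats the df and wc filters until nothing changes, which is one reading of the "stable (convergent) sub-graph" idea above.

```python
from collections import Counter


def stable_subgraph(tf, min_tf=1, min_df=2, min_wc=2):
    """Return the (doc, word) edges that survive the three lower bounds.

    tf maps (doc, word) -> term frequency (hypothetical layout).
    """
    # Step 1: existential corpus, keep an edge iff tf >= min_tf.
    edges = {(doc, word) for (doc, word), n in tf.items() if n >= min_tf}
    while True:
        # Step 2: drop words that occur in fewer than min_df docs.
        df = Counter(word for _, word in edges)
        kept = {(doc, word) for doc, word in edges if df[word] >= min_df}
        # Step 3: drop docs that contain fewer than min_wc words.
        wc = Counter(doc for doc, _ in kept)
        kept = {(doc, word) for doc, word in kept if wc[doc] >= min_wc}
        if kept == edges:  # nothing removed: the sub-graph is stable
            return edges
        edges = kept


# Toy corpus (made up for illustration only).
TOY_TF = {
    ("d1", "a"): 2, ("d1", "b"): 1,
    ("d2", "a"): 1, ("d2", "b"): 3,
    ("d3", "a"): 1, ("d3", "c"): 1,
}
SURVIVORS = stable_subgraph(TOY_TF)
print(sorted(SURVIVORS))
```

In the toy corpus, word `c` fails df ≥ 2, which leaves `d3` with a single word, so `d3` then fails wc ≥ 2.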
FAUST ARM2

Suppose we have a corpus of 1.7 million documents, a vocabulary of 100,000 words and an average document size of 20 words (think emails). Vertical or horizontal data structuring? I.e., do we bitmap (both ways?) or just use a simple edge table in MySQL? We don't want to be accused of cutting a board with a hammer (I've actually done that ;-) just because we have a great hammer! We can grab a great saw when it's the right tool (e.g., MySQL).

Horizontal: Edge(edge#, doc, word) has 1.7M*20 = 34M rows (each ~40 bits); that's 1,360,000,000 bits.
Vertical (assuming we are capturing tf = term frequency, with a max of 7, i.e., 3 bits):
DocPTreeSet: 3*1,700,000 = 5,100,000 DocPTrees, each 100,000 bits deep, so 510,000,000,000 bits.
WordPTreeSet: 3*100,000 = 300,000 WordPTrees, each 1,700,000 bits deep, so 510,000,000,000 bits.

So it might not be a bad idea to have three versions of the corpus, Edge(Edge#,Doc,Word), DocPTreeSet, WordPTreeSet, or even four versions: EdgeD(Edge#,Doc,Word), EdgeW(Edge#,Doc,Word), DocPTreeSet, WordPTreeSet, where EdgeD is ordered on Doc (same ordering as the doc ordering in DocPTreeSet) and EdgeW is ordered on Word (same ordering as the word ordering in WordPTreeSet).

Four versions (example with 3 docs and 7 words):

EdgeD            EdgeW
E#  D#  W#       E#  D#  W#
1   1   7        8   3   1
2   1   6        5   2   2
3   1   3        3   1   3
4   2   4        4   2   4
5   2   2        9   3   4
6   3   7        2   1   6
7   3   6        7   3   6
8   3   1        6   3   7
9   3   4        1   1   7

DocPTS (columns W1..W7)
D1  0 0 1 0 0 1 1
D2  0 1 0 1 0 0 0
D3  1 0 0 1 0 1 1

WordPTS (columns D1..D3)
W1  0 0 1
W2  0 1 0
W3  1 0 0
W4  0 1 1
W5  0 0 0
W6  1 0 1
W7  1 0 1

SELECT Doc from EdgeD where Word=W4 = D2, D3
WordPTS(W4) = 0 1 1

Let's assume we don't capture term frequency (just the existential data, word exists in doc); then SELECT Doc from EdgeD where Word=W4 is just the list version of WordPTreeSet(W4), etc.

Our main interest (and, it appears, Treeminer's) is in data mining large text corpuses such as emails, tweets, etc. Therefore we will use as our example dataset the following 44 Mother Goose Rhymes with a vocabulary of 60 synonymized content words.

Document row shown on the slide (bitmap over the 60 words):
01TBM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0
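The relation between the edge table and the two bitmap sets in the 3-document example can be checked with a short sketch. The edge triples are the EdgeD rows from the slide; the function names (`doc_pts`, `word_pts`, `select_docs`) and the constant names are hypothetical, chosen only for this illustration. The last lines redo the storage arithmetic for the 1.7M-document scenario.

```python
# (edge#, doc#, word#) rows of EdgeD, as listed on the slide.
EDGES = [(1, 1, 7), (2, 1, 6), (3, 1, 3), (4, 2, 4), (5, 2, 2),
         (6, 3, 7), (7, 3, 6), (8, 3, 1), (9, 3, 4)]
N_DOCS, N_WORDS = 3, 7


def doc_pts(edges, n_docs, n_words):
    """One bitmap per document, indexed by word (existential, no tf)."""
    rows = {d: [0] * n_words for d in range(1, n_docs + 1)}
    for _, d, w in edges:
        rows[d][w - 1] = 1
    return rows


def word_pts(edges, n_docs, n_words):
    """One bitmap per word, indexed by document: the rotation of doc_pts."""
    rows = {w: [0] * n_docs for w in range(1, n_words + 1)}
    for _, d, w in edges:
        rows[w][d - 1] = 1
    return rows


def select_docs(edges, word):
    """List version of: SELECT Doc FROM EdgeD WHERE Word = word."""
    return [d for _, d, w in edges if w == word]


DOC_PTS = doc_pts(EDGES, N_DOCS, N_WORDS)
WORD_PTS = word_pts(EDGES, N_DOCS, N_WORDS)
print(WORD_PTS[4], select_docs(EDGES, 4))

# Storage arithmetic for the large scenario described above.
HORIZONTAL_BITS = 1_700_000 * 20 * 40          # 34M rows of ~40 bits
DOC_PTREESET_BITS = 3 * 1_700_000 * 100_000    # 3 bit slices per document
WORD_PTREESET_BITS = 3 * 100_000 * 1_700_000   # 3 bit slices per word
print(HORIZONTAL_BITS, DOC_PTREESET_BITS, WORD_PTREESET_BITS)
```

The bitmap `WORD_PTS[4]` and the list returned by `select_docs(EDGES, 4)` carry the same information, which is the point the slide makes about the existential case.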
FAUST ARM3

Look for Convergent, Dense Sub-Corpuses (this is somewhat ARM-like data mining). The algorithm will be called CDSC(w=0%, d=15%, DS0=doc1):

DS0 = {doc1}
WS1 = Voc(DS0) = {words in > 0% of DS0}
DS1 = {docs with > 15% of WS1}
WS2 = Voc(DS1) = {words in > 0% of DS1}
DS2 = {docs with > 15% of WS2}
...

CDSC(w=0%, d=15%, DS0=35SSS) converges to the Sub-Corpus DS2={7,35,50}, WS2={4,7,9,10,13,17,23,24,25,28,33,34,37,40,42,43,44,45,47,50,53}. ED = 25/(3*21) = 25/63 = 39.7%, whereas the original MG corpus EdgeDensity was 167/(44*60) = 167/2640 = 6.3%.

CDSC(w=0%, d=15%, DS0=7OMH) converges to the Sub-Corpus DS3={7,13,35}, DS3 Vocab={4,7,10,13,17,23,24,25,28,33,34,37,40,42,43,44,45,47,50,51,54}. ED = 25/(3*21) = 25/63 = 39.7%.

Thresholds used:
15% of 13 = 1.95 Vocab(DocSet1)
15% of 21 = 3.15 Vocab(DocSet2)
15% of 22 = 3.3 Vocab(DocSet2)
15% of 21 = 3.15 Vocab(DocSet3)
15% of 7 = 1.05 Vocab(DocSet1)

Notes: We may need HighDocumentCount, since a singleton DocSet with its vocab has EdgeDensity = 100%. A doubleton DS with its vocab will have high EdgeDensity too (in some sense the EdgeDensity measures the Vocab overlap of the two documents!). Lower EDThresh for large DocSets, e.g., for DSsize>2, ED=doubletonED/DocSetSize*VocabSize?

Bitmaps shown on the slide (each distinct row listed once).

Document rows (bitmaps over words 1..60). The seeds are DS0 = 35SSS and DS0 = 07OMH. Document panels, in slide order: DS1 (=DS2): 07OMH, 35SSS, 50LJH; then 07OMH, 13RRS, 35SSS, 45BBB; then 07OMH, 13RRS, 35SSS.
07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0
35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0

Word rows (bitmaps over the 44 documents, in column order 1-18, 21-23, 25-30, 32, 33, 35-39, 41-50). Word panels, in slide order: 13 rows (bake, bread, cloth, dish, eat, full, house, king, maid, money, nose, pie, sing); 21 rows (the 13 plus back, boy, buy, dog, mother, old, plum, thumb); 7 rows (back, bake, bread, buy, dog, mother, old); 22 rows, each printed twice (the 21 rows without boy and thumb, plus baby, son, town).
baby (3)    0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
back (4)    0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bake (7)    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
boy (9)     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
bread (10)  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
buy (13)    0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
cloth (17)  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
dish (23)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dog (24)    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
eat (25)    0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1
full (28)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
house (33)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
king (34)   0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
maid (37)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
money (40)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
mother (42) 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
nose (43)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
old (44)    0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
pie (45)    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
plum (47)   0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
sing (50)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
son (51)    0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
thumb (53)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
town (54)   0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
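The CDSC iteration defined on this slide can be sketched as follows. The function names (`vocab`, `qualifying_docs`, `cdsc`, `edge_density`) and the `max_iter` guard are assumptions for illustration. The word sets are the 1-bit positions of the 07OMH, 13RRS, 35SSS, 45BBB and 50LJH bitmap rows; restricting the corpus to these five documents is a simplification made here, so the run only mirrors the slide's full 44-document results, it does not prove them.

```python
# Word-number sets for five documents (1-bit positions of their bitmap rows).
DOCS = {
    7: {4, 7, 10, 13, 24, 42, 44},
    13: {4, 13, 47, 51, 54},
    35: {7, 10, 17, 23, 25, 28, 33, 34, 37, 40, 43, 45, 50},
    45: {3, 13, 42},
    50: {9, 25, 45, 47, 53},
}


def vocab(docs, doc_set, w=0.0):
    """Words occurring in more than fraction w of the documents in doc_set."""
    counts = {}
    for d in doc_set:
        for word in docs[d]:
            counts[word] = counts.get(word, 0) + 1
    return {word for word, c in counts.items() if c > w * len(doc_set)}


def qualifying_docs(docs, word_set, d=0.15):
    """Documents containing more than fraction d of word_set."""
    return {doc for doc, words in docs.items()
            if len(words & word_set) > d * len(word_set)}


def cdsc(docs, seed, w=0.0, d=0.15, max_iter=100):
    """Alternate WS = Voc(DS) and DS = docs(WS) until DS stops changing."""
    doc_set = set(seed)
    for _ in range(max_iter):
        word_set = vocab(docs, doc_set, w)
        new_doc_set = qualifying_docs(docs, word_set, d)
        if new_doc_set == doc_set:
            return doc_set, word_set
        doc_set = new_doc_set
    return doc_set, vocab(docs, doc_set, w)  # iteration cap reached


def edge_density(docs, doc_set, word_set):
    """Fraction of the |DS| x |WS| incidence cells that hold an edge."""
    edges = sum(len(docs[d] & word_set) for d in doc_set)
    return edges / (len(doc_set) * len(word_set))


DS_A, WS_A = cdsc(DOCS, {35})
DS_B, WS_B = cdsc(DOCS, {7})
print(sorted(DS_A), len(WS_A), round(100 * edge_density(DOCS, DS_A, WS_A), 1))
print(sorted(DS_B), len(WS_B), round(100 * edge_density(DOCS, DS_B, WS_B), 1))
```

On this restricted corpus the seed 35 run stops at documents 7, 35, 50 and the seed 7 run stops at 7, 13, 35, each with a 21-word vocabulary and 25 edges, in line with the 25/63 figure quoted above.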
FAUST ARM4

Convergent, Dense Sub-Corpuses: CDSC(w=0%, d=10%, DS0=doc1)

DS0 = {doc1}
WS1 = Voc(DS0) = {words in > 0% of DS0}
DS1 = {docs with > 10% of WS1}
WS2 = Voc(DS1) = {words in > 0% of DS1}
DS2 = {docs with > 10% of WS2}
...

CDSC(w=0%, d=10%, DS0=35SSS) converges to DS4={7,10,13,21,35,50}, WS4={4,7,9,10,12,13,14,17,19,23,24,25,26,28,32,33,34,37,40,42,43,44,45,47,50,51,53,54}. ED = 41/(28*6) = 24.4%. Lowering the DS%ofVocab from 15% to 10% decreases ED (because it increases DSSize from 3 to 6?).

Thresholds used:
10% of 13 = 1.3
10% of 21 = 2.1
10% of 23 = 2.3
10% of 26 = 2.6

Bitmaps shown on the slide (each distinct row listed once).

Document rows (bitmaps over words 1..60). Document panels, in slide order: 07OMH, 35SSS, 50LJH; then 07OMH, 13RRS, 35SSS, 50LJH; then 07OMH, 13RRS, 21LAU, 35SSS, 50LJH. The rows 35SSS and 10JAJ are shown next to the algorithm text.
07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10JAJ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0
21LAU 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0

Word rows (bitmaps over the 44 documents, in column order 1-18, 21-23, 25-30, 32, 33, 35-39, 41-50). Word panels, in slide order: 13 rows (bake, bread, cloth, dish, eat, full, house, king, maid, money, nose, pie, sing); 21 rows (the 13 plus back, boy, buy, dog, mother, old, plum, thumb); then a final panel covering the 28 words of WS4, in which most rows are printed twice.
back (4)    0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bake (7)    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
boy (9)     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
bread (10)  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
brown (12)  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
buy (13)    0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
cake (14)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
cloth (17)  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
crown (19)  0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dish (23)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dog (24)    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
eat (25)    0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1
fall (26)   0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
full (28)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hill (32)   0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
house (33)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
king (34)   0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
maid (37)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
money (40)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
mother (42) 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
nose (43)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
old (44)    0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
pie (45)    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
plum (47)   0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
sing (50)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
son (51)    0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
thumb (53)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
town (54)   0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
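Because a document qualifies only when it holds strictly more than d% of the current vocabulary, each threshold line above translates into a minimum whole number of matching words. The sketch below makes that conversion explicit; the helper name `min_matching_words` is hypothetical, and integer percentages are used to avoid floating-point edge cases.

```python
def min_matching_words(pct, vocab_size):
    """Smallest integer count strictly greater than pct% of vocab_size."""
    return (pct * vocab_size) // 100 + 1


# Vocabulary sizes quoted on the ARM3 and ARM4 slides.
for size in (13, 21, 23, 26):
    print(size, min_matching_words(10, size), min_matching_words(15, size))
```

For the 21-word vocabulary, d=15% requires at least 4 matching words while d=10% requires only 3, which is one way to see why the lower setting admits more documents.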
FAUST ARM5

Convergent, Dense Sub-Corpuses: CDSC(w=0%, d=10%, DS0={7,35})

DS0 = {7,35}
WS1 = Voc(DS0) = {words in > 0% of DS0}
DS1 = {docs with > 15% of WS1}
WS2 = Voc(DS1) = {words in > 0% of DS1}
DS2 = {docs with > 15% of WS2}
...

CDSC(w=0%, d=10%, DS0={7,35}) converges to DS0={7,35}, WS1={4,7,10,13,17,23,24,25,28,33,34,37,40,42,43,44,45,50}. ED = 20/(18*2) = 55.6%.

Threshold used: 15% of 18 = 2.7

So far:
ED*DSSizes: 55.6*2 = 111.2; 39.7*3 = 119; 24.4*6 = 146; 6.3*44 = 277.
DSS progression: 2, 3, 6, 44.
ED*DSS progression: 111, 119, 146, 277.
DSSs = 1, 4, 42; *8 gives 8, 32, 336.
Subtract from ED*DSS: 111, 111, 114, -59.
Using this (highly adjusted and odd) invariant, the 3 sub-corpuses measure out about the same, and higher than the MG corpus.

Note: ED of a single document with its vocabulary is 100%. Lower bound DocCount, or at least give DocCount along with the density (or maybe DocCount*EdgeDensity)? It is not yet clear what the x%Vocab Document qualification gives us and what convergence under that condition gives us. Would it be best to start by finding large DSs with high ED and work downward using some downward closure condition?

Bitmaps shown on the slide.

Document rows (bitmaps over words 1..60):
07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0

Word rows (bitmaps over the 44 documents, in column order 1-18, 21-23, 25-30, 32, 33, 35-39, 41-50):
back (4)    0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bake (7)    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0
boy (9)     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1
bread (10)  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
buy (13)    0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
cloth (17)  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
dish (23)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
dog (24)    0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
eat (25)    0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1
full (28)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
house (33)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
king (34)   0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
maid (37)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0
money (40)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
mother (42) 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
nose (43)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
old (44)    0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
pie (45)    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1
plum (47)   0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
sing (50)   0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
thumb (53)  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
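The "ED*DSS" bookkeeping on this slide can be reproduced with a few lines. The edge counts, document-set sizes and vocabulary sizes are the ones quoted on the ARM3 to ARM5 slides. Reading the adjustment as "subtract 8 × (DSS − 2)" is an interpretation of the numbers 8, 32, 336 given above, not something the slide states in words, and the variable names are made up for this sketch. ED is rounded to one decimal, as on the slide, before multiplying.

```python
# (label, edges, number of docs, vocabulary size) as quoted on the slides.
CASES = [
    ("{7,35}", 20, 2, 18),
    ("{7,35,50}", 25, 3, 21),
    ("{7,10,13,21,35,50}", 41, 6, 28),
    ("full MG corpus", 167, 44, 60),
]

ADJUSTED = []
for label, edges, n_docs, n_words in CASES:
    ed_pct = round(100 * edges / (n_docs * n_words), 1)  # edge density in %
    ed_times_dss = ed_pct * n_docs
    # Assumed reading of the slide's adjustment: subtract 8 * (DSS - 2).
    adjusted = ed_times_dss - 8 * (n_docs - 2)
    ADJUSTED.append(round(adjusted))
    print(label, ed_pct, round(ed_times_dss, 1), round(adjusted))
```

With that reading, the three sub-corpuses land at 111, 111 and 114 while the full corpus lands at -59, matching the progression listed on the slide.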
1 2 3 4 5 6 WORD 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 08JSC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 09HBD 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12OWF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 27CBC 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 29LFW 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 38YLS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 08JSC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 09HBD 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 DS2 12OWF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 27CBC 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 29LFW 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 38YLS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 41OKC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 DS3 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 30HDD 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 
0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 41OKC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 c h i l d 1 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l e a n 1 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 g r e e n 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 l a m b 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 w o o l 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a w a y 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 a w a y 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 b a g 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a g 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b e d 8 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 b e d 8 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
1 b u y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 b u y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 c h i l d 1 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c h i l d 1 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l e a n 1 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l e a n 1 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c o c k 1 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 c o c k 1 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 c r y 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 c r y 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 f u l l 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 f u l l 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 g r e e n 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 g r e e n 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 l a d y 3 5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 l a d y 3 5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 l a m b 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 l a m b 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 
0 m e r r y 3 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 m e r r y 3 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 m o n e y 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 m o n e y 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 o l d 4 4 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 o l d 4 4 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 t h r e e 5 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 t h r e e 5 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 t r e e 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 t r e e 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 w i f e 5 8 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w i f e 5 8 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w o m a n 5 9 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w o m a n 5 9 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w o o l 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w o o l 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 a l w a y s 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a l w a y s 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a l w a y s 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a c k 4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a c k 4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l o t h 1 7 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 c l o t h 1 7 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 d i s h 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d i s h 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d o g 2 4 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 d o g 2 4 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 f i d d le 2 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 f i d d le 2 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 h o u s e 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 h o u s e 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 k i n g 3 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 
0 0 0 0 1 0 0 0 0 0 0 0 0 0 k i n g 3 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 m e n 3 8 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 m e n 3 8 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 n o s e 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 n o s e 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 p i g 4 6 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 p i g 4 6 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 p i u m 4 7 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 p i u m 4 7 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 r u n 4 9 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 r u n 4 9 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 s i n g 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 s i n g 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 s o n 5 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 s o n 5 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 t h u m b 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 t h u m b 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 D O C u m e nt 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 21 2 3 5 6 7 8 9 30 2 3 5 6 7 8 9 41 2 3 4 5 6 7 8 9 50 FAUST ARM 6 Convergent, Dense Sub-Corpuses: CDSC(w=0%, d=15%, DS0={26}) then CDSC(w=0%, d=10%,DS0=26) DS0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 DS1 DS4
Using 10%, it converges to DS4={7,26,28,30,35,39,41,46,50} with a 39-word Vocab and an EdgeDensity of 58/(39*9) = 58/351 = 16.5%.
So far, EdgeDens*DSSize: 55.6*2=111.2, 39.7*3=119, 24.4*6=146, 16.5*9=149, 6.3*44=277 (DSSizes 2, 3, 6, 9, 44). The 4 Deltas from DSS=2 are 1, 4, 7, 42; multiplied by 8 they are 8, 32, 56, 336. Subtracting these 8*Delta values from ED*DSS, we get scores of 111, 111, 114, 93, -59.
10% of 7 = .7, Vocab(DS1). 10% of 26 = 2.6, Vocab(DocSet2). Additions: 10% of 43 = 4.3, Vocab(DocSet3-DocSet2). Additions: 10% of 39 = 3.9, Vocab(DocSet4).
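The size-penalized score quoted above (ED*DSS minus 8*Delta) is simple arithmetic, and a short sketch makes the rule explicit. The function name `penalized_scores` and the parameter `penalty` are illustrative names, not part of FAUST; the edge densities and sub-corpus sizes are the ones quoted in the slide. The slide rounds ED*DSS before subtracting, so the unrounded values below can differ from the slide's scores by a fraction.

```python
def penalized_scores(edge_density_pct, ds_sizes, penalty=8):
    """Score each sub-corpus as ED * DSSize - penalty * (DSSize - smallest DSSize).

    edge_density_pct: edge densities in percent, one per sub-corpus.
    ds_sizes: document-set sizes, smallest first.
    penalty: weight on the size delta (8 in the slide).
    """
    base = ds_sizes[0]
    scores = []
    for ed, size in zip(edge_density_pct, ds_sizes):
        delta = size - base  # 0, 1, 4, 7, 42 for the slide's sizes
        scores.append(ed * size - penalty * delta)
    return scores


# Values quoted in the slide.
edge_density_pct = [55.6, 39.7, 24.4, 16.5, 6.3]
ds_sizes = [2, 3, 6, 9, 44]

scores = penalized_scores(edge_density_pct, ds_sizes)
for size, score in zip(ds_sizes, scores):
    print(f"DSSize={size:2d}  score={score:.1f}")
```

Under this rule the 44-document sub-corpus goes negative because its size penalty (336) exceeds its ED*DSS product, which is the point of subtracting the delta term.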
1 2 3 4 5 6 FAUST ARM 7 Convergent Dense Sub-Corpuses HOB CDSC(HOB) Start with densest doc (35SSS 13 wds). Alternating between WSn=WS(DSn) and DSn+1=DS(WSn), ORing CountSPTS from the high side until RootCount>1. Continue this until stable (either the DS or WS is unchanged. WORD 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 DS2 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 DS1 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 DS2 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 DS1 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 DS2 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 DS1 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 WS1 WS2 WS2 WS1 b a c k 4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b u y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 t r e e 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 p i u m 4 7 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l o t h 1 7 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 
0 d i s h 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 f u l l 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 h o u s e 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 k i n g 3 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 m a i d 3 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 m o n e y 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 n o s e 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 s i n g 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a c k 4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b u y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 d o g 2 4 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 o l d 4 4 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 b a c k 4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b u 
y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 p i u m 4 7 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 t h u m b 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 DS1 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 DS3 Count Vector 0 0 0 1 0 0 2 0 1 2 0 0 1 0 0 0 1 0 0 0 0 0 1 1 2 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 1 2 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 SPTS 10 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 SPTS 00 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 OR from high side until non-singleton SPTS 10 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 RootCount=4! Stop ORing. Convert to list and get those 4 word-pTrees as WS2. Construct the SPTS, CountWS2. OR from high side until non-singleton... 
With DS1={35SSS}, HOB converges to DS2={7,35,50}, WS2={7,10,25,45}, ED = 8/(3*4) = 66.7%.
Incidentally, throwing out docs/words gives denser sub-corpuses, e.g., (DS3={7,35}, WS2, ED=75%), (DS3, {7,10,25}, ED=83.3%), (DS3, {7,10,51}, ED=83.3%), (DS4={35,50}, WS2, ED=75%), etc.
Next, 07OMH and 50LJH without pTree details.
With DS1={07OMH}, the HOB algorithm converges to Sub-Corpus DS={7,13,35,45}, WS={4,7,10,13,42}, ED = 11/(4*5) = 11/20 = 55%.
Starting with DS1={50LJH}, the HOB algorithm converges to Sub-Corpus DS={35,39,50}, WS={9,25,45}, ED = 7/9 = 77.8%.
Union of the three: DS={7 13 35 39 45 50}, WS={4 7 9 10 13 25 42 45 47}, ED = 20/54 = 37%. Counts per doc: 5 3 4 2 2 4; counts per word: 2 2 2 2 3 2 2 3 2.
Conclusions: the convergent Sub-Corpuses appear to be very dense in general.
Theorem: Starting with each doc (from the densest), create all HOB-stable sub-corpuses. Prove this gets all maximal dense sub-corpuses (doubt if it's true). Maximal means up to downward closures. What downward closure is there? I find: (DS,WS) dense => (DS',WS') dense, where DS' = DS with any subset of the sparsest docs removed, and the same for WS'.
RootCount=3: convert to list and get the 3 doc-pTrees as DS2. Construct the SPTS, CountDS2. OR from the high bit until non-singleton. WS2 WS1 RootCount=1: singleton. DS singleton. Same tripleton: DS2=DS3.
Done b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 D O C 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 21 2 3 5 6 7 8 9 30 2 3 5 6 7 8 9 41 2 3 4 5 6 7 8 9 0 cou nt SP TS 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 cou nt Ve ct or 0 0 0 1 0 0 2 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 4 0 0 0 1 0 1 0 0 0 1 0 0 0 2 cou nt SP TS 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cou nt SP TS 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 cou nt SP TS 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cou nt SP TS 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cou nt SP TS 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cou nt SP TS 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 cou nt SP TS 0 0 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 1 1 1 1 1 1 0 0 0 1 0 1 0 0 cou nt SP TS 2|1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 cou nt SP TS 3|2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cou nt SP TS 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 cou nt SP TS 3|2|1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 cou nt Ve ct or 0 0 0 1 1 0 2 1 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 13 1 1 1 1 1 1 0 0 0 1 0 1 0 2 OR OR Now consider union of 3 corpuses above DS = {7 13 35 39 45 50} WS={4 7 9 10 13 25 42 45 47} ED = 20/54 = 37%
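The alternation described on this slide (WSn=WS(DSn), DSn+1=DS(WSn), ORing count bit slices from the high side until RootCount>1) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the pTree implementation: plain Python sets stand in for pTrees, the names `DOCS`, `high_order_select`, `words_of`, `docs_of`, `hob_converge` and `edge_density` are hypothetical, and the corpus is only the six document rows transcribed from the bit matrix above (the slide's own run uses all 44 documents).

```python
# Word indices (1..60) of the 1-bits in six document rows, transcribed from
# the slide's bit matrix. Toy subset only; the full corpus has 44 documents.
DOCS = {
    7:  {4, 7, 10, 13, 24, 42, 44},
    13: {4, 13, 47, 51, 54},
    35: {7, 10, 17, 23, 25, 28, 33, 34, 37, 40, 43, 45, 50},
    39: {2, 9, 18, 30, 39, 45, 55},
    45: {3, 13, 42},
    50: {9, 25, 45, 47, 53},
}


def high_order_select(counts):
    """OR the bit slices of the counts from the high side, stopping as soon
    as more than one item is selected (RootCount > 1) or the bits run out."""
    top = max(counts.values(), default=0).bit_length() - 1
    selected = set()
    for bit in range(top, -1, -1):
        selected |= {key for key, c in counts.items() if (c >> bit) & 1}
        if len(selected) > 1:
            break
    return selected


def words_of(doc_ids):
    """WS(DS): count each word over the doc set, then select by high-order bits."""
    counts = {}
    for d in doc_ids:
        for w in DOCS[d]:
            counts[w] = counts.get(w, 0) + 1
    return high_order_select(counts)


def docs_of(word_ids):
    """DS(WS): count each doc's words inside the word set, then select likewise."""
    counts = {d: len(words & word_ids) for d, words in DOCS.items()}
    return high_order_select(counts)


def hob_converge(start_docs, max_rounds=20):
    """Alternate WS(DS) and DS(WS) until either the DS or the WS is unchanged."""
    ds = set(start_docs)
    ws = words_of(ds)
    for _ in range(max_rounds):
        new_ds = docs_of(ws)
        new_ws = words_of(new_ds)
        if new_ds == ds or new_ws == ws:
            return new_ds, new_ws
        ds, ws = new_ds, new_ws
    return ds, ws


def edge_density(doc_ids, word_ids):
    """Fraction of (doc, word) pairs in DS x WS that are 1-bits."""
    edges = sum(len(DOCS[d] & word_ids) for d in doc_ids)
    return edges / (len(doc_ids) * len(word_ids))


ds, ws = hob_converge({35})
print(sorted(ds), sorted(ws), round(100 * edge_density(ds, ws), 1))
```

On this six-document subset the sketch reaches the same sets the slide reports for the starts 35SSS, 07OMH and 50LJH; that agreement is a property of the toy data and is not a check of the full 44-document computation.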
1 2 3 4 5 6 FAUST ARM 8 CDSC(HOB)2 WORD 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 DS1 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 09HBD 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12OWF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 15PCD 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 27CBC 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29LFW 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 38YLS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 09HBD 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27CBC 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 DS1 WS1 WS2: 3 42 a l w a y s 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 c h i l d 1 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l e a n 1 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 g r e e n 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 l a m b 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 w o o l 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 b a g 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 c r y 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 f u l l 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 t h r e e 5 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 w o o l 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 d35 7 d26 7 d7 7 d39 6 d28 6 d46 6 d21 5 d10 5 d50 5 d13 5 d41 5 d30 4 d37 4 d17 4 d44 4 d1 4 d14 4 d29 4 d47 3 d27 WSC 2 2 4 0 0 0 1 1 1 1 0 0 1 0 2 1 1 1 0 2 0 1 1 0 1 0 0 1 0 2 1 0 1 1 1 2 1 0 1 2 0 4 1 1 2 0 0 0 0 2 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 
0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 With DS1={26SBS}, HOB converges to DS={9 27 45} WS={3 42), ED=6/6= 100%. taking all docs and all words, DS={9 12 13 15 26 27 29 35 38 39 45 46} WS={1 2 3 15 30 36 42 60), ED=23/96= 23.9%. DS2: 2 3 9 20 25 30 34 36 37 39 40 45 49 52 DS3: 4 35 39 46 50 DS4: 4 35 39 46 50 With DS1={28BBB}, HOB converges to DS={4 35 39 46 50} WS={2 9 25 45}, ED=12/20= 60%. WS3: 3 42 WS3: 2 9 25 45 D O C 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 21 2 3 5 6 7 8 9 30 2 3 5 6 7 8 9 41 2 3 4 5 6 7 8 9 0 WS2: 35 39 46 W S C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 1 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w s c 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 6 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 So far we have used 7 9 12 13 15 26 27 29 35 38 39 45 46 50
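The ED figures on this slide are RootCounts of word columns restricted to a document set, divided by |DS|*|WS|. The sketch below shows that computation with uncompressed integer bitmasks standing in for word-pTrees; real pTrees are compressed, and the names `mask`, `root_count`, `WORD_PTREES` and `edge_density` are hypothetical. The two word columns (3 = baby, 42 = mother) are transcribed from the word rows on this slide.

```python
def mask(ids):
    """Build a bitmask with bit d set for every document id d in ids."""
    m = 0
    for i in ids:
        m |= 1 << i
    return m


def root_count(m):
    """Number of 1-bits in the mask (the RootCount of the stand-in pTree)."""
    return bin(m).count("1")


# Document ids containing each word, read off the word rows of the slide.
WORD_PTREES = {
    3:  mask({9, 26, 27, 45}),      # baby
    42: mask({7, 9, 27, 29, 45}),   # mother
}


def edge_density(doc_ids, word_ids):
    """ED = sum over words of RootCount(word-pTree AND DS-mask) / (|DS| * |WS|)."""
    ds_mask = mask(doc_ids)
    edges = sum(root_count(WORD_PTREES[w] & ds_mask) for w in word_ids)
    return edges / (len(doc_ids) * len(word_ids))


# Sub-corpus reached from DS1={26SBS} according to the slide.
print(edge_density({9, 27, 45}, {3, 42}))
```

Adding doc 26 back to the document set lowers the density, since 26SBS contains baby but not mother; this is the downward-closure effect the earlier slide notes for removing sparse docs.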
01TBM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 2 3 4 5 6 FAUST ARM 9 CDSC(HOB)3 WORD 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 DS2 01TBM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10JAJ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 14ASO 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 17FEC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 21LAU 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 29LFW 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30HDD 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 37MBB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 
0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 41OKC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 44HLH 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 47CCM 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 a w a y 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 c l o t h 1 7 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 b u y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 d i s h 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c r y 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 d a y 2 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 f u l l 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 h o u s e 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 m e n 3 8 0 0 0 0 1 0 0 0 
0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 k i n g 3 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 m a i d 3 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 o l d 4 4 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 m o n e y 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 r u n 4 9 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 n o s e 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 t h r e e 5 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 s i n g 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a w a y 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 r u n 4 9 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 DS1 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 Use WS1Cbit 3DS={35} Voc(DS) ={7,10,17, 23,25,28,33, 34,37 40, 43,45,50}) ED= 100% DS2=WS1Cbit3|2(=WS1Cbit2)={1,7,10,13,14,17,21,26,28,29, 30,35,37,39,41,44,46,47,50} WS2=vocabDS2=all but 5,22, 29,59 ED=105/(19*56)=10% Instead take WS2=DS2bit2={2,49} ED=9/(19*2)= 24% DS2C: 2 4 1 2 0 1 2 2 3 3 1 2 2 1 1 1 2 3 2 2 2 0 2 2 3 2 2 2 0 2 1 2 1 2 1 1 1 2 2 1 2 2 1 3 3 1 3 1 4 1 2 3 1 2 2 1 2 1 0 2 DS2b2:0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 WS1=CDCbit2={2,3,13,20,22, 25,38,42,44,49,52} CDC 2 5 4 3 2 2 3 3 3 3 2 2 4 2 2 
2 3 3 2 4 2 4 2 3 5 3 3 2 2 2 2 2 2 3 2 2 2 6 2 2 2 5 2 6 3 2 3 3 4 2 3 5 2 3 2 2 3 2 2 2 CDCb2 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 CDCb1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 CDCb0 0 1 0 1 0 0 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 1 0 0 1 1 0 1 0 0 1 0 0 0 DS2=WS1Cb2={46} ED=4/(11*1)= 36.3% WS1=CDCbit2={2,3,13,20,22, 25,38,42,44,49,52} DS2=WS1Cbit 2|1 ={1,4,7,9,11, 17,27,28,29,30,32,41,45,46} ED=14/(14*11)= 9% 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 WS1=Voc(DS1) D O C 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 21 2 3 5 6 7 8 9 30 2 3 5 6 7 8 9 41 2 3 4 5 6 7 8 9 0 w s 1 c 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 1 0 1 0 0 0 1 1 0 0 0 1 3 0 0 1 1 1 0 0 0 1 2 0 0 1 0 1 0 0 0 1 2 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 2 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 3 0 0 1 1 2 0 0 1 0 2 0 0 1 0 2 0 0 1 0 2 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 1 1 4 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 1 WS1Cbit3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1Cbit2 1 0 0 0 0 0 1 0 0 1 0 0 1 1 0 0 1 0 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 0 1 0 1 1 0 0 1 WS1C 4 2 2 2 3 2 7 3 3 5 3 3 5 4 3 2 4 2 6 2 2 2 7 3 6 4 5 3 3 13 2 4 3 7 5 2 2 4 3 6 4 3 2 5
01TBM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 2 3 4 5 6 FAUST ARM10 Frequent 1DocSets WS1 ={7,10,17,23,25,28, 33,34,37,40,43,45,50} WORD 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 DS2 ={35} DS1 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 W11 3 25 16 30 36 60 D2 26 DS1 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 W1 9 18 30 29 45 55 D2 39 DS1 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 DS1 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 W1 4 7 10 13 24 42 44 DS1 21LAU 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 WS1 ={10 12 14 19 47 54} DS1 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 WS1 ={6 9 20 28 52 60} DS1 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 WS1 ={2 20 25 45 49 51} DS1 10JAJ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={12 19 26 32 44} DS1 13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 WS1 ={4 13 47 51 54} DS1 30HDD 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 WS1 ={23 24 27 49} DS1 41OKC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 WS1 ={27 34 
39 44 52} DS1 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 WS1 ={9 25 45 47 53} DS1 01TBM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 WS1 ={21 49 52 58} DS1 14ASO 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 WS1 ={21 49 52 58} DS1 17FEC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 WS1 ={18 38 49 56} DS1 29LFW 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={1 2 8 42} DS1 37MBB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 WS1 ={17 41 48 57} DS1 44HLH 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={11 31 32 35} DS1 47CCM 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 WS1 ={8 18 41 57} DS1 05HDS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={26 34 38} DS1 08JSC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 WS1 ={16 25 58} DS1 09HBD 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={3 35 42} DS1 11OMM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={17 38 44} DS1 12OWF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 WS1 ={15 44 59} DS1 15PCD 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 WS1 ={22 31 50}
DS1 27CBC 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={3 20 42}
DS1 32JGF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={22 27 38}
DS1 33BFP 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={13 29 48}
DS1 38YLS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={20 36 40}
DS1 48OTB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 WS1 ={37 52 56}
DS1 02TLP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 WS1 ={46 57}
DS1 03DDD 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 WS1 ={8 51}
DS1 04LMM 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 WS1 ={2 25}
Of the 2-word docs remaining (16 18 22 23 25 36 42 43 49), only the following are nonsingular 100% dense subcorpuses.
D2 ={4 46} ED=4/4=100% WS1 ={4 59}
DS1 04LMM 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D2 ={12 25} ED=4/4=100%
So there are only 2 nontrivial 100% convergent subcorpuses, and both have only 2 docs and 2 words. In fact, no convergence steps were required (in each case the subcorpus converged immediately).
F1DocSets={1 7 10 13 14 17 21 26 28 29 30 35 37 39 41 44 46 47 50}
To find all frequent 2DSs, AND pairwise, then calculate the counts. Is there an easier way? Frequency and Density are related.
Finding ALL frequent DocSets is still hard, since we have to loop through all candidate frequent 2DocSets, calculating the root count of each AND. Finding frequent sets amounts to applying our CDSC algorithm once (not applying it until convergence) and using the full count (100%) instead of a percentage like 15% or 10%. Thus it is CDSC(100%). If we were to take ARM all the way to convergence, we would end up with a DocSet and a WordSet with the property that the DocSet is frequent and the WordSet is frequent. Then we could look for confident DocSet rules AND confident WordSet rules. A confident DocSet rule, A⇒B, means: the set of words that occur in every B doc contains most of the words in the set of words that occur in every A doc. A confident WordSet rule, U⇒V, means: the set of docs containing every V word contains most of the docs in the set of docs that contain every U word. That's a strong association condition! But it may almost never hold in large corpuses.
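The pairwise-AND search and the DocSet rule confidence described above can be sketched as follows. This is a minimal illustration, not the slides' pTree implementation: plain Python integers stand in for the bit vectors, the word sets are copied from the WS1 listings for docs 4, 35, 46 and 50, and the names `support`, `frequent_2docsets`, `confidence` and the `min_support` threshold are assumptions introduced here.

```python
from itertools import combinations

# Word sets for four rhymes, copied from the WS1 listings (doc id -> word numbers).
DOC_WORDS = {
    4: {2, 25},
    35: {7, 10, 17, 23, 25, 28, 33, 34, 37, 40, 43, 45, 50},
    46: {2, 20, 25, 45, 49, 51},
    50: {9, 25, 45, 47, 53},
}


def to_mask(words):
    """Pack a set of word numbers into one int used as a bit vector."""
    mask = 0
    for w in words:
        mask |= 1 << w
    return mask


MASKS = {doc: to_mask(words) for doc, words in DOC_WORDS.items()}


def support(doc_set):
    """Count the words occurring in every doc of doc_set (popcount of the AND)."""
    docs = list(doc_set)
    mask = MASKS[docs[0]]
    for doc in docs[1:]:
        mask &= MASKS[doc]
    return bin(mask).count("1")


def frequent_2docsets(min_support):
    """AND every pair of docs and keep pairs sharing at least min_support words."""
    return [pair for pair in combinations(sorted(MASKS), 2)
            if support(pair) >= min_support]


def confidence(a, b):
    """Fraction of the words common to all A docs that are also in every B doc."""
    base = support(a)
    if base == 0:  # hypothetical guard: no common words in A, report zero confidence
        return 0.0
    return support(set(a) | set(b)) / base


if __name__ == "__main__":
    print(frequent_2docsets(2))
    print(confidence({4}, {46}), confidence({46}, {4}))
```

The loop over `combinations` is exactly the quadratic candidate scan the text calls hard; the sketch does nothing to avoid it, it only makes the cost visible.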
M G44d60w: 44 MOTHER GOOSE RHYMES with a synonymized vocabulary of 60 WORDS
1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice?
2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home.
3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John.
4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away.
5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the King's horses, and all the King's men cannot put Humpty Dumpty together again.
6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day.
7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead.
8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean.
9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper.
11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do? And how do you do? And how do you do again?
12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France.
13. A robin and a robin's son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again.
14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be!
15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho!
16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring.
17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin!
18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know.
21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town.
22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose.
23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light.
25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet.
26. Sleep baby sleep. Our cottage valley is deep. The little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where the woodbines creep. Be always like the lamb so mild, a kind and sweet and gentle child. Sleep baby sleep.
27. Cry baby cry. Put your finger in your eye and tell your mother it was not I.
28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane.
29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs.
30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had.
33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair!
35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other men's ditches.
37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is the way we wash our hands, wash our hands, wash our hands. This is the way we wash our hands, on a cold and frosty morning. This is the way we wash our clothes, wash our clothes, wash our clothes. This is the way we wash our clothes, on a cold and frosty morning. This is the way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning.
38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell.
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken.
43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns.
44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will.
45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in.
46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.
47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise.
48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty.
49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid.
50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!
Av: .05 .11 .09 .07 .05 .05 .07 .07 .07 .07 .05 .05 .09 .05 .05 .05 .07 .07 .05 .09 .05 .09 .05 .07 .11 .07 .07 .05 .05 .05 .05 .05 .05 .07 .05 .05 .05 .14 .05 .05 .05 .11 .05 .14 .07 .05 .07 .07 .09 .05 .07 .11 .05 .07 .05 .05 .07 .05 .05 .05
Words (word# 1 to 60, in order): always away baby back bad bag bake bed boy bread bright brown buy cake child clean cloth cock crown cry cut day dish dog eat fall fiddle full girl green high hill house king lady lamb maid men merry money morn mother nose old pie pig plum round run sing son three thumb town tree two way wife woman wool
word# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
df# 2 5 4 3 2 2 3 3 3 3 2 2 4 2 2 2 3 3 2 4 2 4 2 3 5 3 3 2 2 2 2 (min=2)
word# 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
df# 2 2 3 2 2 2 6 2 2 2 5 2 6 3 2 3 3 4 2 3 5 2 3 2 2 3 2 2 2 (max=6)
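Elsewhere in these slides a word set is selected as "bit 2" of the corpus document-count column (CDCbit2). A small sketch, assuming the df# row above is that count column and that bit k means the 2^k binary digit of each count, reproduces the selection. The function name `bit_slice` is a hypothetical helper, not something the slides define.

```python
# Document frequency (df#) for words 1..60, copied from the table above.
DF = [
    2, 5, 4, 3, 2, 2, 3, 3, 3, 3, 2, 2, 4, 2, 2, 2, 3, 3, 2, 4,
    2, 4, 2, 3, 5, 3, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 6, 2, 2,
    2, 5, 2, 6, 3, 2, 3, 3, 4, 2, 3, 5, 2, 3, 2, 2, 3, 2, 2, 2,
]


def bit_slice(counts, bit):
    """Return the 1-based positions whose count has binary digit 2**bit set."""
    return [pos for pos, count in enumerate(counts, start=1)
            if (count >> bit) & 1]


if __name__ == "__main__":
    # Since max(df) is 6, bit 2 picks exactly the words with df >= 4.
    print(bit_slice(DF, 2))
```

Because no count exceeds 7, the bit-2 slice doubles as a "df at least 4" filter, which is why a single high-order bit is enough to pick out the most widely shared words in this corpus.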
D O C 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 21 2 3 5 6 7 8 9 30 2 3 5 6 7 8 9 41 2 3 4 5 6 7 8 9 0 a l w a y s 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a w a y 2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 b a b y 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 b a c k 4 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a d 5 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 b a g 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b a k e 7 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 b e d 8 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 b o y 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 b r e a d 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b r i g ht 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 b r o w n 1 2 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 b u y 1 3 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 c a k e 1 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 c h i l d 1 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l e a n 1 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c l o t h 1 7 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 c o c k 1 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 c r o w n 1 9 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c r y 2 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 c u t 2 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d a y 2 2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d i s h 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 d o g 2 4 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 e a t 2 5 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 f a l l 2 6 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 f i d d le 2 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 f u l l 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 g i r l 2 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 g r e e n 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 h i g h 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 h i l l 3 2 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 h o u s e 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 k i n g 3 4 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 l a d y 3 5 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 l a m b 3 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 m a i d 3 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 m e n 3 8 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 m e r r y 3 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 m o n e y 4 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 m o r n 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 m o t h er 4 2 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 n o s e 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 o l d 4 4 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 p i e 4 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 p i g 4 6 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 p i u m 4 7 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 r o u n d 4 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 r u n 4 9 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 s i n g 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 s o n 5 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 t h r e e 5 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 t h u m b 5 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 t o w n 5 4 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 t r e e 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 t w 0 5 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 w a y 5 7 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 w i f e 5 8 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w o m a n 5 9 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 w o o l 6 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 3 4 5 6 WORD 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 01TBM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 02TLP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 03DDD 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 04LMM 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 05HDS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 06SPP 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 07OMH 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 08JSC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 09HBD 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10JAJ 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11OMM 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12OWF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 13RRS 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 14ASO 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 15PCD 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 1 0 0 0 0 0 0 0 0 0 0 16PPG 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 17FEC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 18HTP 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21LAU 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 22HLH 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 23MTB 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 25WOW 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 26SBS 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 27CBC 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28BBB 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 29LFW 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 30HDD 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 32JGF 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 33BFP 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 35SSS 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 36LTT 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 37MBB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 38YLS 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39LCS 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 41OKC 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 42BBC 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 43HHD 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 44HLH 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 45BBB 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 46TTP 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 47CCM 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 48OTB 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 49WLG 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50LJH 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
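The ED percentages quoted on the earlier slides, such as ED=4/4=100% for D2={4 46} and ED=4/(11*1)=36.3% for DS2={46}, are consistent with reading ED as the fraction of 1s in the doc-by-word submatrix picked out by a DocSet and a WordSet. The sketch below works under that reading, which is an interpretation rather than a definition given in the slides. The function name `edge_density` is hypothetical, and the word sets are copied from rows 04LMM and 46TTP of the matrix above.

```python
# Word numbers set to 1 in two rows of the doc-by-word matrix above.
DOC_WORDS = {
    4: {2, 25},                    # 04LMM
    46: {2, 20, 25, 45, 49, 51},   # 46TTP
}

# Word set quoted in the slides as WS1 = CDCbit2.
CDC_BIT2 = {2, 3, 13, 20, 22, 25, 38, 42, 44, 49, 52}


def edge_density(doc_words, doc_set, word_set):
    """Fraction of (doc, word) cells in the chosen submatrix that hold a 1."""
    docs = list(doc_set)
    words = set(word_set)
    ones = sum(len(doc_words[doc] & words) for doc in docs)
    return ones / (len(docs) * len(words))


if __name__ == "__main__":
    print(edge_density(DOC_WORDS, [4, 46], {2, 25}))   # the 2-doc, 2-word subcorpus
    print(edge_density(DOC_WORDS, [46], CDC_BIT2))     # doc 46 against CDCbit2
```

Using per-document word sets keeps the sketch short; with bit columns the same count would come from ANDing each selected document row with the WordSet mask and summing the root counts.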
HOB2DS1: Go down the HOBs of the count SPTSs one at a time, with the full vocabulary. Then try for a downward closure on subcorpuses.
Row labels on the slide: CDC = CorpusDocCnt, DSC = DocSetCount, DSCP = DSCPtrees, CWC, WSC, WSCP.
[Table: document-by-word bit matrix for docs d1..d50 with per-document and per-word counts and their bit slices; the cell values were not recoverable from the slide.]
Relationships and ARM:
In Market Basket Research (MBR), we introduced the relationship, cash-register transactions, T, between customers, C, and purchasable items, I, and briefly discussed what strong rules tell us in that context.
In Software Engineering (SE): the relationship between Aspects, T, and Code Modules, I (t is related to i iff module i is part of aspect t).
In Bioinformatics: the relationship between experiments, T, and genes, I (t is related to i iff gene i expresses at threshold level during experiment t).
In Text Mining: the relationship between Documents, D, and Words, W (w is related to d iff w ∈ d). A strong D-rule means two things: the DSets A and C have many common words, and if a word occurs in every document of the DSet A, it occurs in every doc of C with high probability.
In any Entity Relationship diagram: a "part of" relationship in which i ∈ I is part of t ∈ T (t is related to i iff i is part of t); and an "ISA" relationship in which i ∈ I ISA t ∈ T (t is related to i iff i IS A t) . . .
Given a Transaction-Item Relationship, there are two ways to process it:
1. vertical processing of a Horizontal Transaction Table (HTT), or
2. horizontal processing of a Vertical Transaction Table (VTT).
In 1., a HTT is processed through vertical scans to find all Frequent I-sets (I-sets with support ≥ minsupp, e.g., I-sets "frequently" found in transaction market baskets). In 2., a VTT is processed through horizontal operations to find all Frequent I-sets. Then each Frequent I-set found is analyzed to determine if it is the support set of a strong rule. Finding all Frequent I-sets is the hard part.
The APRIORI Algorithm takes advantage of the "downward closure" property for Frequent I-sets: if an I-set is frequent, then all its subsets are also frequent. E.g., in MBR, if A is an I-subset of B and all of B is in a basket, then certainly all of A is in that basket too.
Therefore Supp(A) ≥ Supp(B) whenever A ⊆ B (downward closure).
First, APRIORI scans to determine all Frequent 1-item sets (they contain 1 item and are therefore called 1-Itemsets); next, APRIORI uses downward closure to efficiently find candidates for Frequent 2-Itemsets; next, APRIORI scans to determine which of those candidate 2-Itemsets are actually Frequent; and so on, until there are no candidates remaining (on the next slide we walk through an example using both a HTT and a VTT).
[Figure: diagram linking customers c1..c5 (C) to items i1..i4 (I), with itemsets A and C marked.]

Horizontal Transaction Table (HTT)
TID   Items
----------------------
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Vertical Transaction Table (VTT)
TID   1 2 3 4 5
----------------------
100   1 0 1 1 0
200   0 1 1 0 1
300   1 1 1 0 1
400   0 1 0 0 1
----------------------
supp  2 3 3 1 3    (1-Iset supports: 2 1s, 3 2s, 3 3s, 1 4, 3 5s)

minsupp is set by the querier at 1/2, minconf at 3/4 (note minsupp and minconf can be expressed as counts: 4 transactions, so minsupp=2, minconf=3).
Frequent 1-Itemsets (supp ≥ 2): items 1, 2, 3, 5 with supports 2 3 3 3.
Start by finding Frequent 1-ItemSets (downward closure property of "frequent": any subset of a frequent itemset is frequent).
APRIORI: Iteratively find Frequent k-itemsets, k=1,2,... Then find all strong rules supported by each frequent Itemset. (Ck = candidate k-itemsets, Fk = frequent k-itemsets.)
Given any relationship between two entities (e.g., between customers and items) there are always two ARM problems to analyze. E.g., we analyzed Itemset rules, A⇒C (call them I-rules), using info recorded on which customer transactions contained those itemsets. With this, we can intelligently shelve items, accurately order items (Supply Chain Management), etc. There are also C-rules.
The support [ratio] of itemset A, supp(A), is the fraction of Ts such that A ⊆ T(I); e.g., if A={i1,i2} and C={i4} then supp(A) = |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5. Here | | means set size = count of elements in the set.
The support [ratio] of rule A⇒C, supp(A⇒C), is the support of A ∪ C = |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5.
The confidence of rule A⇒C, conf(A⇒C), is supp(A⇒C) / supp(A) = (2/5) / (2/5) = 1.
Data Miners typically want to find all STRONG RULES, A⇒C, with supp(A⇒C) ≥ minsupp and conf(A⇒C) ≥ minconf (minsupp, minconf are threshold levels). A Strong rule indicates two things: high support means it's non-trivial (A and C are found together in many market baskets at checkout), and high confidence means that the implication rule is highly likely to be true. Note conf(A⇒C) is also just the conditional probability of t being related to C, given that t is related to A (e.g., the conditional probability that the market basket contents, T(I), contains C, given that T(I) contains A).
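The level-wise APRIORI search described above can be sketched in a few lines. This is a minimal illustration on the four-transaction HTT of the example, not the presenter's implementation; the names `HTT`, `support_count` and `apriori` are hypothetical, and minsupp is expressed as a count of 2.

```python
from itertools import combinations

# Horizontal Transaction Table from the example (TID -> items).
HTT = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
MINSUPP_COUNT = 2  # minsupp = 1/2 of 4 transactions


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori(transactions, minsupp_count):
    """Return {k: {itemset: count}} holding the frequent k-itemsets."""
    items = sorted(set().union(*transactions.values()))
    candidates = [frozenset([i]) for i in items]  # C1
    frequent = {}
    k = 1
    while candidates:
        # Scan: count every candidate k-itemset.
        counts = {c: support_count(c, transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsupp_count}
        if not level:
            break
        frequent[k] = level  # Fk
        # Join: unions of two frequent k-itemsets that have k+1 items.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune (downward closure): every k-subset must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent


frequent = apriori(HTT, MINSUPP_COUNT)
for k, level in frequent.items():
    print(k, sorted((sorted(s), n) for s, n in level.items()))
```

The prune step is where downward closure pays off: a candidate is counted against the data only if all of its smaller subsets already proved frequent.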
HTT Example ARM, uncompressed Ptrees (note: 1-count at Ptree root)

Build Ptrees (Scan D). Each Ptree is listed with its root 1-count and its bit column over TIDs 100, 200, 300, 400:
Ptree        count  bits
----------------------
P1           2      1010
P2           3      0111
P3           3      1110
P4           1      1000
P5           3      0111
P1^P2        1      0010
P1^P3        2      1010
P1^P5        1      0010
P2^P3        2      0110
P2^P5        3      0111
P3^P5        2      0110
P1^P2^P3     1      0010
P1^P3^P5     1      0010
P2^P3^P5     2      0110

C1 -> F1 = L1 = {1}{2}{3}{5}
C2 -> F2 = L2 = {13}{23}{25}{35}
C3 Isets {2 3 5}, {1 2 3}, {1 3 5} with supports 2, 1, 1 -> F3 = L3 = {235}
{123} need not be scanned for since {12} is not frequent. {135} need not be scanned for since {15} is not frequent.

Core of Apriori:
Use only large (k – 1)-itemsets to generate candidate large k-itemsets.
Use database scan and pattern matching to collect counts for the candidate itemsets.

Bottleneck of Apriori: candidate generation.
Huge candidate sets: 10^4 large 1-itemsets may generate 10^7 candidate 2-itemsets. To discover a large pattern of size 100, e.g., {a1…a100}, we need to generate 2^100 ≈ 10^30 candidates.
Multiple scans of database: needs (n+1) scans, where n = length of the longest pattern.

Other ARM methods:
FP-Growth: builds a linked data structure precounting counts.
Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below threshold cannot be frequent.
Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans.
Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
Sampling: mining on a subset of the given data, with a lower support threshold plus a method to determine completeness.
Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent.

A supplemental text document on ARM (with additional topics and discussions) is at http://www.cs.ndsu.nodak.edu/~perrizo/classes/785/hk6.html
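The uncompressed Ptree counts above amount to ANDing bit columns and counting the 1 bits. A minimal sketch follows, assuming each item's column is stored as a 4-bit integer with the leftmost bit for TID 100; the names `P` and `and_ptrees` are hypothetical, and real Ptrees are compressed tree structures rather than plain integers.

```python
# Bit columns of the Vertical Transaction Table, one bit per transaction
# in TID order 100, 200, 300, 400 (leftmost bit = TID 100).
P = {1: 0b1010, 2: 0b0111, 3: 0b1110, 4: 0b1000, 5: 0b0111}
N_TRANSACTIONS = 4


def and_ptrees(items):
    """AND the bit columns of the given items; return (bit string, 1-count)."""
    mask = (1 << N_TRANSACTIONS) - 1  # all ones: every transaction qualifies
    for item in items:
        mask &= P[item]
    bits = format(mask, "0{}b".format(N_TRANSACTIONS))
    return bits, bits.count("1")


for itemset in [(1, 3), (2, 5), (1, 2), (2, 3, 5)]:
    print(itemset, and_ptrees(itemset))
```

The 1-count of the ANDed column is the support count of the itemset, so no horizontal scan of the transactions is needed once the columns exist.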
ARM-7
1-ItemSets don't support Association Rules (they either have no antecedent or no consequent). 2-Itemsets do support ARs.
Are there any Strong Rules supported by Frequent (= Large) 2-ItemSets (at minconf = .75)?
{1,3}   conf({1}⇒{3}) = supp{1,3}/supp{1} = 2/2 = 1 ≥ .75   STRONG!
        conf({3}⇒{1}) = supp{1,3}/supp{3} = 2/3 = .67 < .75
{2,3}   conf({2}⇒{3}) = supp{2,3}/supp{2} = 2/3 = .67 < .75
        conf({3}⇒{2}) = supp{2,3}/supp{3} = 2/3 = .67 < .75
{2,5}   conf({2}⇒{5}) = supp{2,5}/supp{2} = 3/3 = 1 ≥ .75   STRONG!
        conf({5}⇒{2}) = supp{2,5}/supp{5} = 3/3 = 1 ≥ .75   STRONG!
{3,5}   conf({3}⇒{5}) = supp{3,5}/supp{3} = 2/3 = .67 < .75
        conf({5}⇒{3}) = supp{3,5}/supp{5} = 2/3 = .67 < .75
Are there any Strong Rules supported by Frequent (= Large) 3-ItemSets?
{2,3,5} conf({2,3}⇒{5}) = supp{2,3,5}/supp{2,3} = 2/2 = 1 ≥ .75   STRONG!
        conf({2,5}⇒{3}) = supp{2,3,5}/supp{2,5} = 2/3 = .67 < .75
        No subset antecedent of {2,5} can yield a strong rule either (i.e., no need to check conf({2}⇒{3,5}) or conf({5}⇒{2,3})), since both denominators will be at least as large and therefore both confidences will be at least as low.
        conf({3,5}⇒{2}) = supp{2,3,5}/supp{3,5} = 2/2 = 1 ≥ .75   STRONG!
        conf({3}⇒{2,5}) = supp{2,3,5}/supp{3} = 2/3 = .67 < .75
DONE!
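The rule check above can be reproduced mechanically from the support counts of the frequent itemsets. The sketch below is illustrative only: `SUPP`, `MINCONF` and `strong_rules` are hypothetical names, and for simplicity it tests every non-empty proper antecedent rather than applying the subset-antecedent shortcut mentioned on the slide.

```python
from itertools import combinations

# Support counts of the frequent itemsets found in the example.
SUPP = {frozenset(k): v for k, v in {
    (1,): 2, (2,): 3, (3,): 3, (5,): 3,
    (1, 3): 2, (2, 3): 2, (2, 5): 3, (3, 5): 2,
    (2, 3, 5): 2,
}.items()}
MINCONF = 0.75


def strong_rules(supp, minconf):
    """Return sorted (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, count in supp.items():
        if len(itemset) < 2:
            continue  # 1-itemsets cannot be split into antecedent and consequent
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                a = frozenset(ante)
                conf = count / supp[a]  # supp(A u C) / supp(A)
                if conf >= minconf:
                    cons = tuple(sorted(itemset - a))
                    rules.append((tuple(sorted(a)), cons, conf))
    return sorted(rules)


rules = strong_rules(SUPP, MINCONF)
for ante, cons, conf in rules:
    print(ante, "=>", cons, "conf =", conf)
```

Every antecedent looked up here is itself frequent (downward closure again), which is why `supp[a]` never misses.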
HOB-CDSC: Start with the densest doc (35SSS). Then always choose using the highest count (except when doing so results in a singleton, in which case include the 2nd highest count also).
[Table: document-by-word bit matrices for the DocSets DS0, DS1, DS2 and WordSets WS1, WS2 of each run; the cell values were not recoverable from the slide.]
Document rows shown: 07OMH, 13RRS, 35SSS, 39LCS, 45BBB, 50LJH.
Word rows shown (word, id): back 4, bake 7, boy 9, bread 10, buy 13, cloth 17, dish 23, dog 24, eat 25, full 28, house 33, king 34, maid 37, money 40, mother 42, nose 43, old 44, pie 45, plum 47, sing 50, thumb 53.
Starting with 35SSS, the algorithm converges to Sub-Corpus DS={7,35,50}, WS={7,10,25,45}, with EdgeDensity = 8/(3*4) = 8/12 = 66.7%.
Starting with 07OMH, the algorithm converges to Sub-Corpus DS={7,13,35,45}, WS={4,7,10,13,42}, with EdgeDensity = 11/(4*5) = 11/20 = 55%.
Starting with 50LJH, the algorithm converges to Sub-Corpus DS={35,39,50}, WS={9,25,45}, with EdgeDensity = 7/(3*3) = 7/9 = 77.8%.
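The EdgeDensity figures above appear to be the number of (doc, word) incidences inside the sub-corpus divided by |DS|·|WS|. A small sketch under that reading follows; `edge_density` and the toy edge set are hypothetical, while the three edge counts are the ones reported on the slide.

```python
def edge_density(edges, docs, words):
    """Fraction of the |docs| x |words| possible (doc, word) pairs present."""
    present = sum(1 for d in docs for w in words if (d, w) in edges)
    return present / (len(docs) * len(words))


# Toy incidence set (hypothetical data, only to exercise the function).
TOY_EDGES = {("d1", "a"), ("d1", "b"), ("d2", "a")}
toy_density = edge_density(TOY_EDGES, ["d1", "d2"], ["a", "b"])

# Reported runs: (edges found, |DS|, |WS|) for starts 35SSS, 07OMH, 50LJH.
REPORTED = [(8, 3, 4), (11, 4, 5), (7, 3, 3)]
percentages = [round(100 * e / (nd * nw), 1) for e, nd, nw in REPORTED]
print(toy_density, percentages)
```

A density of 100% would mean every document of DS contains every word of WS, i.e. the sub-corpus is a complete bipartite subgraph of the document-word relationship.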
FAUST Classification1
Using the clustering of FAUST Clustering1, we extract 80% from each cluster as the TrainingSet (with class = cluster#). How accurate is FAUST Hull Classification on the remaining 20% plus the outliers (which should be classified "other")?

Full classes from slide 15:
C11  = {2,3,16,22,42,43}
C2   = {1,4,5,8,9,12,14,15,23,25,27,32,33,36,37,38,44,45,47,48}
C311 = {11,17,29}
C312 = {13,30,50}
C313 = {10,26,28,41}

Training Set:
C11  = {2,16,22,42,43}
C2   = {1,5,8,9,12,15,25,27,32,33,36,37,38,44,47,48}
C311 = {11,17}
C312 = {30,50}
C313 = {10,28,41}

Test Set:
C11  = {3}
C2   = {4,14,23,45}
C311 = {29}
C312 = {13}
C313 = {26}

OUTLIERS: {18,49} {6} {39} {21} {46} {7} {35}, i.e. O = {18 49 6 39 21 46 7 35}

[Table: TrainingSet PTreeSet, the bit matrix of the training documents over words 1..60 with class labels and per-word sums; the cell values were not recoverable from the slide.]

Class hulls: [MIN, MAX] of the functional on each class, for each D
D          C11            C2             C311           C312           C313
--------------------------------------------------------------------------------
TrainSet   [0.13, 0.33]   [0.33, 0.66]   [0.79, 0.79]   [0.53, 0.66]   [0.73, 0.99]
C2         [0,    0.22]   [0.44, 0.77]   [0.66, 0.66]   [0.11, 0.22]   [0.44, 0.66]
C311       [0,    0   ]   [0,    0.66]   [1.33, 1.66]   [0,    0.33]   [0,    0.33]
C312       [0,    0.31]   [0,    0.31]   [0,    0.31]   [1.58, 1.58]   [0,    0.31]
C313       [0,    0.22]   [0,    0.44]   [0,    0.44]   [0.22, 0.22]   [1.34, 1.56]
C11        [0.63, 0.63]   [0,    0.63]   [0,    0   ]   [0.31, 0.31]   [0,    0.31]
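The MIN/MAX values above define, for each D, an interval per class; a point belongs to a class hull when its functional value falls inside that class's interval for every D. The sketch below shows only this interval test, using three of the six D rows from the slide. The names `HULLS` and `classify` are hypothetical, and the projection values passed in are made-up inputs, since computing them requires the training matrix that did not survive extraction.

```python
# [MIN, MAX] per class for three of the projections reported on the slide.
HULLS = {
    "C311": {"C11": (0, 0), "C2": (0, 0.66), "C311": (1.33, 1.66),
             "C312": (0, 0.33), "C313": (0, 0.33)},
    "C312": {"C11": (0, 0.31), "C2": (0, 0.31), "C311": (0, 0.31),
             "C312": (1.58, 1.58), "C313": (0, 0.31)},
    "C313": {"C11": (0, 0.22), "C2": (0, 0.44), "C311": (0, 0.44),
             "C312": (0.22, 0.22), "C313": (1.34, 1.56)},
}
CLASSES = ["C11", "C2", "C311", "C312", "C313"]


def classify(projections, hulls=HULLS):
    """Return classes whose interval contains the value for every D, else ['other']."""
    hits = [k for k in CLASSES
            if all(hulls[d][k][0] <= projections[d] <= hulls[d][k][1]
                   for d in hulls)]
    return hits or ["other"]


# Hypothetical projection values for three unseen documents.
print(classify({"C311": 1.5, "C312": 0.2, "C313": 0.1}))
print(classify({"C311": 0.2, "C312": 1.58, "C313": 0.22}))
print(classify({"C311": 5.0, "C312": 5.0, "C313": 5.0}))
```

Returning "other" when no hull contains the point is what lets the outliers be rejected instead of being forced into the nearest class.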
FAUST Clustering 1.1: Converge using real HOB (high bit only)
WS=WordSet is always defined by dc(DS)>Avg(dc(previousDS)).
DS=DocSet is always defined by wc(DS)>Avg(wc(previousDS)).

WS0 = 2 3 4 7 8 9 10 13 17 18 20 22 24 25 26 27 34 38 42 44 45 47
DS1 = 7 13 30 35 39 41 46 50
WS1 = 2 4 7 9 10 13 23 24 25 27 34 39 44 45 47 49 50
DS2 = 7 13 30 35 39 41 46 50
7. Old Mother Hubbard went to cupboard to give her poor dog bone. When she got there cupboard was bare, poor dog had none. She went to baker to buy him some bread. When she came back dog was dead.
13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again.
30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon.
35. Sing a song of sixpence, a pocket full of rye. 4 and 20 blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose.
39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew.
41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three. And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three.
46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street.
50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!

Converge using just real HOB (high bit only), alternating WS0 and DS0.

WS0 = 2 3 13 20 22 25 38 42 44 49 50
DS1 = 46
WS1 = 2 20 25 46 49 51
DS2 = 46
OUTLIER: 46 (Tom Tom the piper's son; text above).

DS0 = 35
WS1 = 7 10 17 23 25 28 33 34 37 40 43 45 50
DS2 = 35
OUTLIER: 35 (Sing a song of sixpence; text above).

WS0 = 2 3 13 32 38 42 44 52
DS1 = 7 9 11 27 29 32 41 45
WS1 = 42 (Mother)
DS2 = 7 9 27 29 45
WS2 = 42
C1: Mother theme
7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead.
9. Hush baby. Daddy is near. Mamma is a lady and that is very clear.
27. Cry baby cry. Put your finger in your eye and tell your mother it was not I.
29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs.
45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in.

DS0 = 1 10 13 14 17 21 26 28 30 37 39 41 44 47 50
WS1 = 2 9 12 18 19 21 26 27 30 32 38 39 42 44 45 47 49 52 54 55 57 60
DS1 = 10
WS2 = 12 19 26 39 44
DS2 = 10
WS3 (entries not recoverable from the slide)
DS3 = 10
OUTLIER: 10. Jack and Jill went up hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper.

WS0 = 22 38 44 52
DS1 = 11 32 41
WS1 = 27 38 44 {fiddle(32 41) man(11 32) old(11 44)}
DS2 = 11 32 44
C2: fiddle old man theme
11. One misty moisty morning when cloudy was weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do How do you do? How do you do again
32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I'd give my fiddle they will think I've gone mad. For many a joyous day my fiddle and I have had
41. Old King Cole (text above).
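The alternation between WordSets and DocSets above can be sketched as a fixed-point iteration. The slide does not spell out exactly which counts are averaged, so the following is only one possible reading, stated as an assumption: a word is kept when its document count inside the current DS exceeds the average over the whole vocabulary, and a doc is kept when its word count inside the new WS exceeds the average over the whole corpus. The name `alternate` and the toy corpus are hypothetical.

```python
def alternate(doc_words, start_docs, max_rounds=20):
    """Alternate WS and DS thresholding until both sets stop changing."""
    vocab = sorted(set().union(*doc_words.values()))
    docs, words = set(start_docs), set()
    for _ in range(max_rounds):
        # Word counts restricted to the current DocSet (assumed meaning of dc).
        dc = {w: sum(1 for d in docs if w in doc_words[d]) for w in vocab}
        avg_dc = sum(dc.values()) / len(dc)
        new_words = {w for w, c in dc.items() if c > avg_dc}
        # Doc counts restricted to the new WordSet (assumed meaning of wc).
        wc = {d: len(doc_words[d] & new_words) for d in doc_words}
        avg_wc = sum(wc.values()) / len(wc)
        new_docs = {d for d, c in wc.items() if c > avg_wc}
        if new_docs == docs and new_words == words:
            break  # fixed point reached
        docs, words = new_docs, new_words
    return docs, words


# Toy corpus (hypothetical): doc -> set of word ids.
TOY = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0, 3}, "d": {4}}
result = alternate(TOY, TOY.keys())
print(result)
```

When the iteration collapses to a single document, as in the runs for docs 10, 35 and 46 above, that document is treated as an outlier rather than a cluster.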
Graphs
[Figure: a graph with 23 1-vertex cliques (the vertices), 42 2-vertex cliques (the edges, 6 of them maximal), 19 3-vertex cliques (light/dark blue triangles) and 2 4-vertex cliques (dark blue areas). The 11 light blue triangles are maximal cliques. The 2 dark blue 4-cliques are both maximum and maximal, and the clique number of the graph is 4.]
A clique in an undirected graph is a subset of its vertices such that every 2 vertices in the subset are connected by an edge. Finding a clique of a given size (the clique problem) is NP-complete. The term "clique" comes from Luce & Perry (1949), who used complete subgraphs in social networks to model cliques of people (groups of people who all know each other). Cliques have applications in bioinformatics.
A complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. Simple means no loop edges and no more than 1 edge between any two different vertices (each edge = a distinct pair of vertices).
An independent set or stable set is a set of vertices in a graph, no two of which are adjacent. A maximal independent set is either an independent set such that adding any other vertex to the set forces the set to contain an edge, or the set of all vertices of the empty graph. A maximum independent set is an independent set of largest possible size for a given graph G. This size is called the independence number of G, and denoted α(G). The problem of finding such a set is called the maximum independent set problem and is an NP-hard optimization problem. As such, it is unlikely that there exists an efficient algorithm for finding a maximum independent set of a graph. Every maximum independent set is also maximal, but the converse implication does not necessarily hold.
A set is independent if and only if it is a clique in the graph's complement, so the two concepts are complementary. The complement or inverse of a graph G is a graph H on the same vertices such that 2 distinct vertices of H are adjacent iff they are not adjacent in G.
A complete bipartite graph is a graph whose vertices can be partitioned into two subsets V1 and V2 such that no edge has both endpoints in the same subset, and every possible edge that could connect vertices in different subsets is part of the graph. That is, it is a bipartite graph (V1, V2, E) such that for every 2 vertices v1 ∈ V1 and v2 ∈ V2, v1v2 is an edge in E.
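The duality between cliques and independent sets stated above is easy to check on a small graph. The following sketch uses adjacency sets; the function names and the toy graph (a triangle 1-2-3 with a pendant vertex 4 attached to 3) are illustrative assumptions, not part of the slides.

```python
from itertools import combinations


def is_clique(adj, vertices):
    """True if every 2 of the given vertices are adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def is_independent(adj, vertices):
    """True if no 2 of the given vertices are adjacent."""
    return all(v not in adj[u] for u, v in combinations(vertices, 2))


def complement(adj):
    """Graph on the same vertices where u, v are adjacent iff they were not."""
    nodes = set(adj)
    return {u: nodes - adj[u] - {u} for u in adj}


# Toy graph: triangle 1-2-3 plus vertex 4 attached to 3.
G = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
H = complement(G)

print(is_clique(G, [1, 2, 3]), is_independent(H, [1, 2, 3]))
print(is_independent(G, [1, 4]), is_clique(H, [1, 4]))
```

These checks only verify a given vertex set; finding a maximum clique or maximum independent set remains the hard problem noted above.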
Every tree is bipartite. Cycle graphs with an even number of vertices are bipartite. Every planar graph whose faces all have even length is bipartite. Special cases of this are grid graphs and squaregraphs, in which every inner face consists of 4 edges and every inner vertex has four or more neighbors.[9]
The complete bipartite graph on m and n vertices, denoted K_{m,n}, is the bipartite graph G = (U, V, E), where U and V are disjoint sets of size m and n, respectively, and E connects every vertex in U with all vertices in V. It follows that K_{m,n} has mn edges.[10] Closely related to the complete bipartite graphs are the crown graphs, formed from complete bipartite graphs by removing the edges of a perfect matching.[11]
Hypercube graphs, partial cubes, and median graphs are bipartite. In these graphs, vertices may be labeled by bitvectors in such a way that 2 vertices are adjacent iff the corresponding bitvectors differ in a single position. A bipartition may be formed by separating the vertices whose bitvectors have an even number of ones from the vertices with an odd number of ones. Trees and squaregraphs are examples of median graphs, and every median graph is a partial cube.[12]
A graph is bipartite iff it does not contain an odd cycle. Equivalently, a graph is bipartite iff it is 2-colorable (i.e., its chromatic number is less than or equal to 2).
The biadjacency matrix of a bipartite graph (U, V, E) is a (0,1)-matrix of size |U| × |V| that has a one for each pair of adjacent vertices and a zero for nonadjacent vertices.[20] Biadjacency matrices may be used to describe equivalences between bipartite graphs, hypergraphs, and directed graphs.
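The 2-colorability characterization gives a direct bipartiteness test: color the graph by breadth-first search and report failure when an edge joins two vertices of the same color (which happens exactly when there is an odd cycle). The sketch below assumes an adjacency-list dictionary as the graph representation; the function names and the small generated graphs are hypothetical illustrations, not taken from the slides.

```python
from collections import deque

def two_color(adj):
    """Return a vertex -> 0/1 coloring if the graph is bipartite, else None."""
    color = {}
    for start in adj:                      # handle every connected component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:
                    return None            # same color on both ends: odd cycle
    return color

def cycle(n):
    """Cycle graph on n >= 3 vertices."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def hypercube(dim):
    """Vertices are integers read as bitvectors; neighbors differ in one bit."""
    return {v: [v ^ (1 << b) for b in range(dim)] for v in range(1 << dim)}

def complete_bipartite(m, n):
    """K_{m,n} with parts U (size m) and W (size n)."""
    U = [("u", i) for i in range(m)]
    W = [("w", j) for j in range(n)]
    adj = {u: list(W) for u in U}
    adj.update({w: list(U) for w in W})
    return U, W, adj

def biadjacency(U, W, adj):
    """|U| x |W| (0,1)-matrix: 1 where u and w are adjacent, else 0."""
    return [[1 if w in adj[u] else 0 for w in W] for u in U]

print(two_color(cycle(6)) is not None)   # True: even cycle
print(two_color(cycle(5)) is None)       # True: odd cycle is not bipartite

# In the hypercube, the coloring found from vertex 0 is the parity of ones.
cube_colors = two_color(hypercube(3))
print(all(cube_colors[v] == bin(v).count("1") % 2 for v in cube_colors))  # True

U, W, k34 = complete_bipartite(3, 4)
print(sum(len(nbrs) for nbrs in k34.values()) // 2)   # 12 edges = 3 * 4
```

For K_{m,n} the biadjacency matrix returned by biadjacency(U, W, k34) is the all-ones matrix of size m × n, which matches the edge count mn.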