Check out the ebook on FPGAs and DP. Two points about the topic:

1. Thinking about FPGA DM together with the raging debate about the efficacy of non-SQL, key-value, Hadoop... DP (google: DeWitt, Hadoop), the following success path for pTrees leaps to mind.

Hadoop et al. store and catalogue Big Data across massive clusters of computers. SQL DBMSs can't manage data that big. But what the SQL people [DeWitt, Stonebraker et al.] claim is that non-SQL leaves you high and dry when it comes to processing that Big Data. The Hadoop response: "Yes, but we never intended otherwise. And BTW, you guys can't even store and catalog big data! And if you don't have it, YOU aren't processing it (and neither is anyone else!)"

Is this an opening for pTrees? YES! Assume a Hadoop, big data, distributed TrainingSet (identified as the Hadoop data for a particular high-value Prediction/Classification task). Assume we have conversion routines to convert it to a single PTS (PTreeSet), possibly multilevel and compressed. Now that we have a PTS for Classification, we use FPGA implementations of Md's SPTSA (ScalarPTreeSetAlgebra) and FAUST algorithms (incl. SVD) to [nearly instantaneously] produce answers. Note that each PTS is really the concatenation of many SPTSs, so Md's SPTSA and FAUST apply immediately once we have the PTS. How many SPTSs does a PTS of a distributed Hadoop TrainingSet produce? Exactly as many as the dimension of the Hadoop TrainingSet.

pTree Unbelievers (pTUs) will say: "Sure, but let's see those 'conversion routines' that convert the Hadoop, big data, distributed TrainingSet [needed for a particular high-value Classification task] to a PTS." Our answer: pTrees don't perform magic! pTrees are the best tool for only certain tasks. If the customer can't give us a well-defined description of the Training Table for a classification task (i.e., says "Here's all our Hadoop data. Mine it!"), we should just walk away (as should everyone), because the customer is asking for Mining Magic (MM).
And the only bona fide MM is the candy. We'll admit, however, that defining the classification task and producing the TrainingTable for it IS HARD!

Looking for the low-hanging fruit: for which Big Data classification can we identify/convert a TrainingSet to a PTS? It only takes 1 big winner! Don't be everything to everybody.

A strategy? Suppose we come upon a Big Data Classification task that the non-SQL people are already doing for big money. That means they are identifying the pieces in Hadoop which make up the TrainingTable, and then they are using a [pathetic] combination of distributed map-reduce routines to mine it, one TrainingTable virtual record at a time. We ought to be able to take their "map" code (to identify the Training Data), strip their "reduce" code down to "gather raw data" into a TT, and replace their VPHD code with a "produce a PTS" conversion. Then we FAUST it.

2. I'm curious what Dr. Wettstein's reaction is to the opening in the abstract regarding the cause of the industry's move to parallel programming. It seems the author plays down the reaching of the miniaturization limit (paramagnetic limit) and focuses on power consumption and heat dissipation issues as causes. Of course, I suppose, if further breakthroughs in miniaturization had occurred (more elements per die), those breakthroughs would also have expressed themselves as power/heat breakthroughs? So it's like watching a sports game: what you see depends upon where your seat is. Students: pay attention to anything Dr. Wettstein has time to share on this! He was in direct contact with Intel Research during this time, so his "seat" was right behind home plate, while this author's may have been in the cheap seats.

Synthesis Lectures on Data Management, edited by M. Tamer Ozsu of the University of Waterloo. Now available: Data Processing on FPGAs by Jens Teubner (Technical University of Dortmund, Germany) and Louis Woods (ETH Zurich).
After thinking about it and discussing it with several people, I think we should code in C++, even though C# got the nod last meeting ;-) The reason is that C++ is a research language (while C# and Java may be the best industry production languages). I have had this experience before and I gave in. The result (in the case of the previous large development project, SMILEY) was a system that no one used after the research evolved a bit. There are all sorts of reasons, which we can discuss next week. But also, we have a fantastic resource in Greg Wettstein (and Bryan Mesich) and a foundation of Recommender code in C++ which is absolutely first rate. We should use it! Once the research stabilizes, we can do a production version in C#. A case in point is the SVD recommender: we don't know how it will end up (I don't understand Funk's algs yet. Anybody?).

The bottom line is speed in Horizontal Processing of Vertical Data (HPVD). We need to be able to control everything, including memory allocation/deallocation, I/O mgmt (including cache), AND/OR/COMP level-0 and level-1 algorithm coding, and HOBBIT style (bit-slice) shortcutting in Md's ScalarPTreeSet Algebra, just to name a few.

Here's a first test of that assertion: Can you define a Level-1 (or even level-0) PTreeSet in C# which takes only 1 memory bit per data bit without compromising bit-level processing speed? Maybe you can, but in SMILEY we ended up using a byte per bit because it was easier coding. Here's a second test: Can you code Dr. Wettstein's Logical Processing and 1-bit-counting speed enhancements (found in the Netflix code) in C#?

I guess what I'm trying to get across is that coding speed is not the issue; execution speed is. If we don't get maximal execution speed, we've got nothing! It isn't that we have a lot of completely new basic algorithms (we do have some, of course); it's that we have an approach (HPVD at the bit-slice level, possibly compressed to a tree) which facilitates orders-of-magnitude speedup of most DM algs. With orders-of-magnitude speedup, we can implement algorithms which will run in acceptable time that used to take too long to be usable (e.g., training up an SVD classifier with 1,000,000 features). I believe that, to do that, we need HPVD (pTrees). And I believe that to do pTrees right we need to use C++. Prove me wrong by training a 1M-feature SVD using C#? I actually hope I'm wrong ;-)
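To make the "1 memory bit per data bit" and "AND/OR/COMP plus 1-bit counting" ideas concrete, here is a minimal sketch of a level-0 bit-slice layout for one scalar column. It is written in Python purely to show the logic, not the speed: Python integers stand in for packed bit vectors, and the helper names (`to_bit_slices`, `column_sum`, `count_equal`) are hypothetical, not taken from the Netflix code or from Md's algebra.

```python
def to_bit_slices(column, width):
    """Return `width` integers; bit r of slice k is bit k of column[r]."""
    slices = [0] * width
    for row, value in enumerate(column):
        for k in range(width):
            if (value >> k) & 1:
                slices[k] |= 1 << row
    return slices


def popcount(bits):
    """Count the 1-bits of a non-negative integer (the "1-bit count")."""
    return bin(bits).count("1")


def column_sum(slices):
    """Sum of the column from slice counts alone: sum_k 2^k * count(slice_k)."""
    return sum(popcount(s) << k for k, s in enumerate(slices))


def count_equal(slices, value, n_rows):
    """Count rows equal to `value` using only AND and COMP on the slices."""
    all_rows = (1 << n_rows) - 1
    mask = all_rows
    for k, s in enumerate(slices):
        # AND with the slice where the target bit is 1, with its complement otherwise
        mask &= s if (value >> k) & 1 else (~s & all_rows)
    return popcount(mask)


ratings = [3, 5, 1, 3, 4, 3]              # one scalar column, values fit in 3 bits
slices = to_bit_slices(ratings, width=3)  # 3 bit slices, one bit per row each
total = column_sum(slices)                # same value as sum(ratings)
threes = count_equal(slices, 3, len(ratings))
```

The point of the layout is that both `column_sum` and `count_equal` never touch an individual row; they work on whole slices at once, which is where a C++ or FPGA implementation would get its speed.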
Line Search Details

Feature-vector columns are a b c d e f g h i j k l m n ... followed by LRATE and MSE. In the three searches below every feature-vector entry in a row has the same value, so only that value is listed.

Search 1
  Delta    mse at successive steps
  -0.05    5.571428571  4.787528571  3.956937381  3.124580148  2.370216629  1.827544757  1.712786894
  0.001    1.694534417  1.683969759  1.683990218
  0.001    1.683990218  1.683817009  1.683713544  1.683680240

  a..n (each)   LRATE    MSE
  1.75          0.0001   1.712786894
  1.71          0.0001   1.683990218
  1.72          0.0001   1.683680240

Search 2
  Delta    mse at successive steps
  -0.1     2.742857142  1.757028571  1.726552903
  -0.001   1.725127451  1.722583169  1.720124307  1.717750375  1.715460883  1.713255343

  a..n (each)   LRATE    MSE
  2             0.0001   2.742857142
  1.78          0.0001   1.726552903
  1.78          0.0001   1.725127451

Search 3
  Delta    mse at successive steps
  -0.1     486.9428571  300.2339285  182.0785776  108.0757974  62.37291917  34.69604087  18.40735293  9.235135416  4.444223928  2.294761042  1.689520299  1.684954260
  0.001    1.683737218  1.683737131

  a..n (each)   LRATE    MSE
  5             0.0001   486.9428571
  1.72          0.0001   1.684954260
  1.72          0.0001   1.683737218

Using the new dataset with 20 movies and 51 users. Initial: 1 1 1 1 .... 1

  Delta    mse at successive steps
  -0.1     5.8  2.848214285  1.780423107  1.744331269
  -0.001   1.741238800  1.738234884  1.735319015  1.732490689  1.729749404  1.727094662

  user features (each)   movie features (each)   LRATE    MSE
  1                      5                       0.0001   5.8
  0.80                   4.00                    0.0001   1.744331269
  0.79                   3.98                    0.0001   1.727094662

Note: 3.0625 = 1.75*1.75, 3.1442 = 0.79*3.98
Columns in all tables below: user features a b c d e f g h i j k l m n o p q r s t | movie features 1 2 3 4 5 6 7 8 | LRATE | MSE

Here is the result after 1 round when using a fixed-increment line search to minimize mse with respect to the LRATE used:

.52 1.47 .52 1 1.47 1 1.31 .68 1 .52 1.47 1 1.15 1.63 .52 1 .84 1.15 1.63 1.15 | 3.04 3.02 3.06 3.07 2.95 3.00 3.05 3.04 | .0525 | 0.4470620

Without line search, using Funk's LRATE=.001, it takes 81 rounds to arrive at approximately the same mse (and a nearly identical feature vector):

.76 1.38 .61 .99 1.38 .99 1.34 .74 0.99 .61 1.38 .99 1.16 1.50 .61 .99 .82 1.12 1.50 1.13 | 3.04 3.01 3.04 3.06 2.92 2.98 3.02 3 | .001 | 0.44721854

Going from the round 1 result (LRATE=.0525), we do a second round and again do a fixed-increment line search:

.52 1.47 .52 1 1.47 1 1.31 .68 1 .52 1.47 1 1.15 1.63 .52 1 .84 1.15 1.63 1.15 | 3.04 3.02 3.06 3.07 2.95 3.00 3.05 3.04 | .0525 | 0.447062
.92 1.48 .50 .99 1.47 .98 1.46 .66 .98 0.50 1.47 .98 1.22 1.63 .50 0.98 .75 1.15 1.63 1.17 | 3.07 3.03 3.07 3.09 2.92 2.99 3.04 3.03 | .050 | 0.387166
.84 1.47 .50 .99 1.47 .99 1.43 .66 .99 0.50 1.47 .99 1.21 1.63 .50 0.98 .77 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .040 | 0.371007
.76 1.47 .51 .99 1.47 .99 1.40 .67 .99 0.51 1.47 .99 1.19 1.63 .51 0.99 .79 1.15 1.63 1.16 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .030 | 0.368960
.76 1.47 .51 .99 1.47 .99 1.40 .67 .99 0.51 1.47 .99 1.19 1.63 .51 0.99 .79 1.15 1.63 1.16 | 3.06 3.03 3.07 3.08 2.93 3.00 3.04 3.03 | .020 | 0.380975

We came up with an approximately minimized mse at LRATE=.030. Going from the result at LRATE=.03, we do another round:

.75 1.47 .50 .99 1.47 .98 1.44 .66 .99 0.51 1.47 .99 1.21 1.63 .50 0.98 .76 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.92 2.99 3.04 3.03 | .020 | 0.351217
.75 1.47 .50 .99 1.47 .99 1.42 .66 .99 0.51 1.47 .99 1.20 1.63 .50 0.99 .77 1.15 1.63 1.17 | 3.06 3.03 3.07 3.08 2.92 3.00 3.04 3.03 | .010 | 0.362428

Going from the result at LRATE=.02, we do the same for the next round:

.74 1.47 .50 .99 1.47 .98 1.46 .66 .98 0.50 1.47 .98 1.22 1.63 .50 0.98 .75 1.15 1.63 1.17 | 3.07 3.04 3.07 3.09 2.91 2.99 3.04 3.02 | .010 | 0.351899

LRATE=.02 stable, near-optimal? (No further line search.) After 200 rounds at LRATE=.02 (note that it took ~2000 rounds without line search and ~219 with line search):

.83 1.39 .48 .91 1.40 .86 1.52 .61 .98 0.60 1.51 .98 1.15 1.74 .48 0.99 .64 1.10 1.62 1.45 | 3.28 3.28 3.04 3.48 1.69 2.98 3.12 2.65 | .020 | 0.199358

Comparing this feature vector to the one we got with ~2000 rounds at LRATE=.001 (without line search), we see that we arrive at a very different feature vector:

1.48 2.54 .90 1.6 2.6 1.55 2.68 1.15 1.73 1.08 2.67 1.73 2.06 3.08 .90 1.76 1.16 2.07 2.90 2.75 | 1.86 1.74 1.71 1.94 .87 1.61 1.7 1.5 | .001, no ls
.83 1.39 .48 .91 1.40 .86 1.52 .61 .98 .60 1.51 .98 1.15 1.74 .48 0.99 .64 1.10 1.62 1.45 | 3.28 3.28 3.04 3.48 1.69 2.98 3.12 2.65 | .020, w ls

However, the UserFeatureVector portions differ by a constant multiplier, and the MovieFeatureVector portions differ by a different constant. If we divide the LR=.001 vector by the LR=.020 vector, we get the following multiplier vector. One vector is not a dilation of the other, but if we split the user portion from the movie portion, they are!!! What does that mean!?!?!?!

1.77 1.81 1.85 1.75 1.84 1.79 1.76 1.86 1.75 1.78 1.76 1.75 1.79 1.76 1.85 1.76 1.81 1.86 1.78 1.89 | .56 .53 .56 .55 .51 .54 .54 .56 | ".001/.020"
user portion: 1.80 avg, 0.04 std; movie portion: 0.54 avg, 0.01 std

Another interesting observation is that 1 / 1.8 = .55, that is, 1 / AVGufv = AVGmfv. They are reciprocals of one another!!! This makes sense, since it means that if you double the ufv you have to halve the mfv to get the same predictions.
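The reciprocal relationship can be checked directly. The sketch below assumes the single-feature model used throughout these slides, where the prediction for (user, movie) is the product of the two features; the three user and three movie values are taken from the LRATE=.020 vector above, and the function name `predictions` is only illustrative.

```python
def predictions(ufv, mfv):
    """Rank-1 prediction table: prediction[u][m] = ufv[u] * mfv[m]."""
    return [[u * m for m in mfv] for u in ufv]


ufv = [0.83, 1.39, 0.48]   # a few user features from the LRATE=.020 vector
mfv = [3.28, 3.04, 1.69]   # a few movie features from the same vector
c = 1.8                    # the average user-side multiplier observed above

scaled_ufv = [c * u for u in ufv]   # multiply the user portion by c
scaled_mfv = [m / c for m in mfv]   # multiply the movie portion by 1/c

base = predictions(ufv, mfv)
scaled = predictions(scaled_ufv, scaled_mfv)

# Largest difference between corresponding predictions (float rounding only)
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(base, scaled)
    for a, b in zip(row_a, row_b)
)
```

Since (c*u)*(m/c) = u*m for any nonzero c, every pair (c*ufv, mfv/c) gives the same predictions and therefore the same mse, which is consistent with the two training runs landing on different vectors with the same quality.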
The bottom line is that the predictions are the same! What is the nature of the set of vectors that [nearly] minimize the mse? It is not a subspace (it is not closed under scalar multiplication), but it is clearly closed under "reciprocal scalar multiplication" (multiplying the mfv's by the reciprocal of the ufv's multiplier). What else can we say about it?

So, we get an order of magnitude speedup from line search. It may be more than that, since we may be able to do all the LRATE calculations in parallel (without recalculating the error matrix or feature vectors????). Or there may be a better search mechanism than fixed-increment search. A binary-type search? Others?
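As a sketch of the two search mechanisms just mentioned, the code below compares a fixed-increment search with a binary-type (ternary) search over LRATE. Everything here is an assumption made for illustration: `mse_at` stands for any function that returns the mse obtained with a given LRATE, `toy_mse` is a made-up curve with its minimum at .03, and the default constants (start .001, step .005, tolerance .00001) are simply the ones the spreadsheet macro in this deck hardcodes.

```python
def fixed_increment_search(mse_at, start=0.001, step=0.005, tol=1e-5, max_steps=10_000):
    """Raise L by `step` for as long as the mse keeps dropping by more than `tol`."""
    best_l, best_mse = start, mse_at(start)
    evals = 1
    for _ in range(max_steps):
        cand = best_l + step
        cand_mse = mse_at(cand)
        evals += 1
        if cand_mse < best_mse - tol:
            best_l, best_mse = cand, cand_mse
        else:
            break  # mse stopped improving: keep the previous L
    return best_l, best_mse, evals


def ternary_search(mse_at, lo, hi, iters=40):
    """Shrink [lo, hi] around the minimum; assumes mse_at is unimodal on it."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if mse_at(a) < mse_at(b):
            hi = b   # the minimum cannot lie to the right of b
        else:
            lo = a   # the minimum cannot lie to the left of a
    best = (lo + hi) / 2
    return best, mse_at(best)


def toy_mse(l):
    """Made-up mse curve with its minimum at L = 0.03 (not real data)."""
    return 0.35 + 40.0 * (l - 0.03) ** 2


l_fixed, mse_fixed, evals = fixed_increment_search(toy_mse)
l_tern, mse_tern = ternary_search(toy_mse, 0.0, 0.1)
```

On this toy curve the fixed-increment search can only land on the grid point nearest the minimum, while the ternary search narrows in on it; whether the real mse-versus-LRATE curve is unimodal enough for that to work is an open question that would need to be tested.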
Spreadsheet layout (columns A..T = users a..t; rows 2-9 = movies 1-8; column U = movie features; row 10 = feature vector fv; L = learning rate).

Ratings, rows 2-9 (ratings listed left to right; blank cells omitted, so the user column of each rating is not shown):

  row   ratings        U (f(m))
  2     3 3 5 2 5 3    3
  3     2 5 1 2 3 5    3
  4     3 3 3 5 5      2
  5     5 3 4          3
  6     2 1 2          1
  7     4 1 1 4        3
  8     4 3 2 5        3
  9     1 4 5 3        2

  row 10 (fv, as displayed):  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3 1 3 3 2 | LRATE 0.001 | omse 0.1952090

Cell formulas:

  A22:  +A2-A$10*$U2                                               /* error for u=a, m=1 */
  A30:  +A10+$L*(A$22*$U$2+A$24*$U$4+A$26*$U$6+A$29*$U$9)          /* updates f(u=a) */
  U29:  +U9+$L*(($A29*$A$30+$K29*$K$30+$N29*$N$30+$P29*$P$30)/4)   /* updates f(m=8) */
  AB30: +U29                          /* copies the f(m=8) feature update into the new feature vector, nfv */
  W22:  @COUNT(A22..T22)              /* counts the number of actual ratings (users) for m=1 */
  X22:  [W3] @SUM(W22..W29)           /* adds ratings counts for all 8 movies = training count */
  AD30: [W9] @SUM(SE)/X22             /* averages the se's, giving the mse */
  A52:  +A22^2                        /* squares all the individual errors */

Working errors and new feature vector (nfv), rows 21-30 (errors as displayed, left to right; blank cells omitted):

  row   errors           U    W (count)   X (total)
  22    0 0 0 ** 0 **    3    6           35
  23    0 0 ** 0 ** 0    3    6
  24    0 0 0 ** 0       2    5
  25    0 ** **          3    3
  26    0 0 **           1    3
  27    ** ** ** 0       3    4
  28    ** 1 0 **        3    4
  29    ** ** 0 0        2    4

  row 30 (nfv, as displayed): 0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | L 0.001 | mse 0.1952063

Square errors (range SE), rows 52-59, columns A..T: every cell displays 0 except H53, H58 and A59, which display 1.

Output list, rows 61-72 (fv as displayed | L | mse):

  61  0 1 0 0 1 0 1 0 0 0 1 0 1 1 0 0 0 1 1 1 | 3 3 3 3. 2 2 3 2 | 0.125 | 0.225073
  62  0 1 0 0 1 0 1 0 0 0 1 0 1 1 0 1 0 1 1 1 | 3 3 3 3. 1 2 3 2 | 0.141 | 0.200424
  63  0 1 0 0 1 0 1 0 0 0 1 0 1 1 0 1 0 1 1 1 | 3 3 3 3. 1 3 3 2 | 0.151 | 0.197564
  64  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.151 | 0.196165
  65  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.151 | 0.195222
  66  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195232
  67  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195228
  68  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195224
  69  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195221
  70  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195218
  71  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195214
  72  0 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 1 1 | 3 3 2 3. 1 3 3 2 | 0.001 | 0.195211

Macro \a (stored at Z2..Z4), step by step:

  /rvnfv~fv~                           copies nfv into fv as values (Range Value copy)
  {goto}L~{edit}+.005~                 increments L by .005
  /XImse<omse-.00001~/xg\a~            IF mse is still decreasing, recalc mse with the new L
  .001~                                resets L=.001 for the next round
  {goto}se~/rvfv~{end}{down}{down}~    "value copy" fv to the output list
  /xg\a~                               starts over with the next round

Notes: In 2 rounds mse is as low as Funk gets it in 2000 rounds. After 5 rounds mse is lower than ever before (and appears to be bottoming out). I know I shouldn't hardcode parameters! Experiments should be done to optimize this line search (e.g., with some binary search for a low mse).

Since we have the resulting individual square_errors for each training pair, we could run this, then mask the pairs with se(u,m) > Threshold, then do it again after masking out those that have already achieved a low se. But what do I do with the two resulting feature vectors? Do I treat it like a two-feature SVD, or do I use some linear combo of the resulting predictions of the two (or it could be more than two)? We need to test which works best (or other modifications) on Netflix data. Maybe on those test pairs for which the training row and column have some high errors, we apply the second feature vector instead of the first? Maybe we invoke CkNN for test pairs in this case (or use all 3 and a linear combo?). This is powerful! We need to optimize the calculations using pTrees!!!
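For readers who do not use spreadsheet macros, one round of the cell formulas above can be sketched as follows. This is a reading of the formulas, not the author's code: the user update sums error times movie feature over the movies that user rated (as in A30), the movie update averages error times the already-updated user feature over that movie's raters (as in U29, which divides by the count), and the returned mse is that of the features passed in. The function name `one_round` and the tiny rating set are hypothetical.

```python
def one_round(ratings, ufv, mfv, lrate):
    """ratings: {(user, movie): rating}. Returns (new_ufv, new_mfv, mse_of_inputs)."""
    # A22-style cells: error = rating - f(u) * f(m), only where a rating exists
    err = {(u, m): r - ufv[u] * mfv[m] for (u, m), r in ratings.items()}

    # A30-style cells: f(u) += L * sum over rated movies of err(u, m) * f(m)
    new_ufv = dict(ufv)
    for (u, m), e in err.items():
        new_ufv[u] += lrate * e * mfv[m]

    # U29-style cells: f(m) += L * average over raters of err(u, m) * updated f(u)
    sums = {m: 0.0 for m in mfv}
    counts = {m: 0 for m in mfv}
    for (u, m), e in err.items():
        sums[m] += e * new_ufv[u]
        counts[m] += 1
    new_mfv = {
        m: mfv[m] + (lrate * sums[m] / counts[m] if counts[m] else 0.0)
        for m in mfv
    }

    # AD30-style cell: mean of the squared errors over all training pairs
    mse = sum(e * e for e in err.values()) / len(err)
    return new_ufv, new_mfv, mse


# Tiny made-up example: 2 users, 2 movies, 3 ratings
ratings = {("a", 1): 3, ("a", 2): 2, ("b", 1): 5}
ufv = {"a": 1.0, "b": 1.0}
mfv = {1: 3.0, 2: 3.0}

ufv1, mfv1, mse0 = one_round(ratings, ufv, mfv, 0.01)
_, _, mse1 = one_round(ratings, ufv1, mfv1, 0.01)
```

A line search in the sense of the macro would call the update with increasing `lrate` from the same starting features and keep the last value for which the mse still dropped.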
A larger example: 20 movies, 51 users (same as last time, except I found errors in my code, which I corrected).

[Spreadsheet printout: the ratings grid occupies rows 1-20 (movies) and columns A..AY (users); the cell positions of the individual ratings did not survive extraction. The feature vectors occupy rows 102-111, with user features in columns A..AY and movie features in the columns that follow, then Lrate and MSE.]

Rows 102-111, user features (identical to two decimals in all ten rows, except where noted):

  A..N:    0.76 0.97 0.97 0.75 0.72 1.29 0.88 0.86 0.97 1.18 0.86 0.72 0.97 1.29
  O..AB:   0.97 0.72 0.89 0.97 0.97 0.97 0.56 0.72 0.53 0.80 1.62 0.97 0.89 0.66   (AA drops from 0.89 to 0.88 in the last three rows)
  AC..AM:  0.97 0.64 0.56 0.97 0.97 0.86 0.97 1.13 1.07 1.59 1.13
  AN..AY:  0.97 0.48 0.97 1.29 0.80 1.07 0.80 1.29 0.97 0.89 1.53 1.42

Rows 102-111, movie features, Lrate and MSE:

  row   BA     BP..BT (each)   Lrate    MSE
  102   3.09   3.09            0.0079   1.252787373
  103   3.09   3.09            0.0001   1.252778817
  104   3.09   3.09            0.0001   1.252777738
  105   3.09   3.09            0.0001   1.252777438
  106   3.09   3.09            0.0001   1.252777289
  107   3.09   3.09            0.0001   1.252777139
  108   3.09   3.09            0.0001   1.252776991
  109   3.09   3.09            0.0001   1.252776843
  110   3.09   3.09            0.0001   1.252776695
  111   3.09   3.09            0.0001   1.252776548

Columns BB..BO, rows as printed from top to bottom:

  1.68 1.68 1.69 1.67 1.68 1.68 1.68 1.68 1.68 1.67 1.68 1.67 1.70 1.68
  1.69 in every column
  1.71 in every column (repeated over the following rows)
  2.81 2.80 2.83 2.80 2.81 2.77 2.81 2.80 2.81 2.80 2.81 2.80 2.83 2.82
  2.84 2.84 2.84 2.84 2.84 2.83 2.84 2.84 2.84 2.84 2.84 2.84 2.84 2.84
  2.84 in every column

The two steps of the initial line search (printed in red on the slide):

  Step 1 (Lrate 0.0005, MSE 1.749577428)
    A..N:    0.97 0.99 0.99 0.99 0.98 1.00 0.99 0.99 0.99 1.00 0.99 0.98 0.99 1.01
    O..AB:   0.99 0.98 0.99 0.99 0.99 0.99 0.97 0.98 0.98 0.99 1.01 0.99 0.99 0.99
    AC..AM:  0.99 0.99 0.97 0.99 0.99 0.99 0.99 1.00 1.00 1.00 1.01
    AN..BA:  0.99 0.98 0.99 1.00 0.99 1.00 0.99 1.01 0.99 0.99 1.03 1.03 3.00 3.00
    BB..BO:  3.00 in every column
    BP..BS:  3.00 3.00 3.00 3.00

  Step 2 (Lrate 0.0035, MSE 1.278489789)
    A..N:    0.78 0.99 0.99 0.81 0.77 1.22 0.92 0.90 0.99 1.18 0.90 0.77 0.99 1.27
    O..AB:   0.99 0.77 0.92 0.99 0.99 0.99 0.61 0.77 0.62 0.84 1.45 0.99 0.92 0.85
    AC..AM:  0.99 0.76 0.61 0.99 0.99 0.90 0.99 1.14 1.08 1.28 1.15
    AN..BA:  0.99 0.65 0.99 1.22 0.84 1.08 0.84 1.29 0.99 0.92 1.52 1.43 3.02 3.01
    BB..BO:  3.01 3.01 3.02 3.01 3.01 3.01 3.01 3.01 3.01 3.01 3.01 3.01 3.02 3.01
    BP..BS:  3.01 3.02 3.01 3.01

These last two lines are printouts of the two steps in the initial line search (on the way to the first result line at MSE=1.252787373). The two vectors should be collinear (generate the same line), or else I am not doing line search!! They are clearly not collinear. Thus I have at least one more code mistake. This is why a C# version is desperately needed!! How is that coming?
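One way to automate that sanity check is sketched below, under the assumption that "collinear" means both line-search points lie on one line through the starting feature vector, i.e. their displacements from the start are parallel. The function name, the tolerance and the toy vectors are hypothetical; only the two step sizes (.0005 and .0035) are borrowed from the printout.

```python
import math


def is_collinear(start, p1, p2, tol=1e-6):
    """True if p1 - start and p2 - start are parallel (Cauchy-Schwarz equality)."""
    d1 = [a - s for a, s in zip(p1, start)]
    d2 = [b - s for b, s in zip(p2, start)]
    dot = sum(x * y for x, y in zip(d1, d2))
    n1 = math.sqrt(sum(x * x for x in d1))
    n2 = math.sqrt(sum(y * y for y in d2))
    if n1 == 0.0 or n2 == 0.0:
        return True  # a zero-length step is trivially on the line
    # |d1 . d2| equals |d1| * |d2| exactly when the two steps are parallel
    return abs(abs(dot) - n1 * n2) <= tol * n1 * n2


start = [1.0, 1.0, 1.0]          # toy starting feature vector
direction = [0.5, -0.25, 2.0]    # toy search direction

# Two genuine line-search points: same direction, different step sizes
good1 = [s + 0.0005 * d for s, d in zip(start, direction)]
good2 = [s + 0.0035 * d for s, d in zip(start, direction)]

# A point that has drifted off the line in one coordinate
bad2 = [good2[0] + 0.01, good2[1], good2[2]]

on_line = is_collinear(start, good1, good2)
off_line = is_collinear(start, good1, bad2)
```

Running a check like this after every line-search step would flag the kind of bug described above as soon as the direction changes between steps.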
Where are we now wrt PSVD?

Clearly line search is a good idea. How good? (Speedup? Accuracy comparisons?)

What about 2nd [3rd?, 4th?, ...] feature vector training? How do we generate those? (Probably just a matter of understanding Funk's code.)

Which "retraining under mask" steps are breakthroughs? Do they improve accuracy markedly? Improve speed markedly?

What speedup shortcuts can we [as mindless engineers ;-)] come up with, in particular to execute Md's PTreeSet Algebra procedures? These speedups can be "mindless" or "magic"; we'll take them either way! By "mindless" I mean only that trial and error is probably the best way to find lucky speedups, unless you fully understand the mathematics. Maybe Dr. Ubhaya can do the math for us? I will suggest the following: "The more the Mathematics is understood, the better the mindless engineering tricks work!"

In RECOMMENDERs, we have people (users, customers, websearchers...) and things (products, movies, items, documents, webpages or?). We also often have text (product descriptions, movie features, item descriptions, document contents, webpage contents...), which can be handled as entity description columns or by introducing a third entity, terms (content terms, stems of content terms, ...).
So we have three entities and three relationships in a cyclic 2-hop rolodex structure (or what we called the BUP, "Bi-partite, Uni-partite on Part", structure). A lifetime of fruitful research lurks in this arena. We can use one relationship to restrict (mask entity instances in) an adjacent relationship. I firmly believe pTree structuring is the way to do this. We can also add a people-to-people relationship (a la Facebook friends) and enrich the information content significantly.

We should add tweets to this somehow. Since I don't tweet, I'm probably not the one to suggest how this should fit in, but I will anyway ;-) Tweets seem to be mini-documents describing documents, or mini-documents describing people, or possibly even mini-documents describing terms (e.g., if a buzzword becomes hot in the media, people tweet about it????).

Let's call this research arena the VERTICAL RECOMMENDER arena. It's hot! Who's going to be the Master Chef in this Hell's Kitchen?
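As a small illustration of using one relationship to mask an adjacent one, the sketch below stores each relationship vertically as one bitmask per entity instance, so that restriction is a single AND. All entity names, data and helper names are made up for the example, and plain Python integers stand in for pTrees.

```python
users = ["ann", "bob", "cy"]
movies = ["m0", "m1", "m2", "m3"]
terms = ["space", "heist", "romance"]

# user -> bitmask over movies (bit i set = this user rated movies[i])
rated = {"ann": 0b0011, "bob": 0b0110, "cy": 0b1000}

# term -> bitmask over movies (bit i set = movies[i] is described by the term)
described_by = {"space": 0b0101, "heist": 0b0010, "romance": 0b1000}


def decode(mask, names):
    """Turn a bitmask back into the list of entity names it selects."""
    return [name for i, name in enumerate(names) if (mask >> i) & 1]


def movies_for(user, term):
    """Movies the user rated, restricted (masked) by the term relationship."""
    return decode(rated[user] & described_by[term], movies)


def users_who_rated_any(movie_mask):
    """Users with at least one rated movie inside the given movie mask."""
    return [u for u in users if rated[u] & movie_mask]


ann_space = movies_for("ann", "space")
space_fans = users_who_rated_any(described_by["space"])
```

A people-to-people relationship could be added the same way, as one bitmask over users per user, and then used to mask the rating relationship before training.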