
Type-0 P-trees

Example: dimension = 1, fanout = 2^dimension = 2, depth = 5, NumberOfPotentialLeaves = PL = 32 = fanout^depth.


Presentation Transcript


  1. Type-0 P-trees

     Example: dimension = 1, fanout = 2^dimension = 2, depth = 5, NumberOfPotentialLeaves = PL = 32 = fanout^depth.

     [Figure: the 32 potential leaf offsets of the example, with a short leaf bit vector drawn under each leaf that exists.]

     TypeBit: 0 = pure0 segments omitted (pure1s switched on).

     Physical structure, option 1: Leaf Exists Array (LA) with a Purity Array (PA). The LA lists the offsets of the leaves that exist:

     LA = 0 1 2 3 8 9 11 12 13 14 15 20 21 22 23 28 29 30 31

     Physical structure, option 2: Leaf Exists Map (LM, size = PL = 32) with a Purity Map (PM). The LM holds one bit per potential leaf offset 0..31 and is preceded by the type bit (0):

     LM = 1111 0000 1101 1111 0000 1111 0000 1111

     Compress by only having a purity bit when the LM bit = 1? (i.e., the PM becomes a PuritySwitch map)

     Note: Hopefully, NumberOfActualLeaves = AL << PL. The only leaves that get stored are the "impure" leaves (not pure0 and not pure1).

     Note: The leaves are bit vectors, so they can be compressed the same way (i.e., the physical structure can be nested).
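As a rough illustration of the Type-0 layout on this slide, the sketch below splits a bit vector into fixed-length leaves and derives a Leaf Map, a Purity Map and the stored impure leaves. The function name `encode_type0`, the use of bit strings, and the reading of the LM bit as "leaf is not pure0" and the PM bit as "leaf is pure1" are assumptions made for this example, not definitions taken from the slides.

```python
def encode_type0(bits: str, leaf_len: int):
    """Hypothetical helper: derive (leaf_map, pure_map, impure_leaves) for a Type-0 P-tree.

    Assumed conventions (not confirmed by the slides):
      leaf_map[i] = 1 if leaf i contains at least one 1 (it is not pure0)
      pure_map[i] = 1 if leaf i contains only 1s (it is pure1)
      impure_leaves maps leaf offset -> bit string, for mixed leaves only
    """
    if len(bits) % leaf_len != 0:
        raise ValueError("bit vector length must be a multiple of leaf_len")
    leaves = [bits[i:i + leaf_len] for i in range(0, len(bits), leaf_len)]
    leaf_map = [int("1" in leaf) for leaf in leaves]
    pure_map = [int("0" not in leaf) for leaf in leaves]
    impure_leaves = {i: leaf for i, leaf in enumerate(leaves)
                     if "0" in leaf and "1" in leaf}
    return leaf_map, pure_map, impure_leaves


if __name__ == "__main__":
    lm, pm, stored = encode_type0("0000" "1111" "1010" "0000", leaf_len=4)
    print(lm, pm, stored)
```

Only the mixed leaf at offset 2 is stored in this toy input; the pure0 and pure1 leaves are recoverable from the two maps alone.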

  2. Purity Switch; Type-1 P-trees

     TypeBit: 0 = pure0 segments omitted (pure1s switched on). Physical structure: Leaf Exists Array (LA) over leaf offsets 0 1 2 3 8 9 11 12 13 14 15 20 21 22 23 28 29 30 31, with a Purity Array (PA).

     TypeBit: 1 = pure1 segments omitted (pure0s switched on). Physical structure: either a Leaf Exists Array (LA) with a Purity Array (PA), or a Leaf Exists Map (LM, size = NOPL = 32) with a Purity Map (PM).

     [Figure: the LA/PA and LM/PM of the example drawn for both type bits, with the leaf bit vectors beneath.]

     Given the Type-0 Ptree above, the same Ptree can be expressed as a Type-1 Ptree, using either a Leaf Existence Array (LA) with purity switches in each array component (PA), or a Leaf Existence Map (LM) and a Purity Map (PM).

  3. How should Ptrees be stored?

     [Figure: a cube of six Ptrees (0p1, 0p2, 1p3, 0p4, 1p5, 0p6, where the leading digit is the type bit), each with its Leaf Map, its Pure Map and its impure leaves. An impure leaf such as 101000101000011 has # bits = leaf length.]

     The tempting way to store these structures is to cluster by Ptree (horizontally across the rows of the cube). But since those leaves never get ANDed with one another, wouldn't it be better to cluster by Leaf Offset (vertically down the cube), since these are precisely the bit vectors that get ANDed together?

     If there is good compression (not too many impure leaves per Ptree), then storing each Leaf Offset Collection (a vertical slice of the cube) on a page (or extent) would mean that only that page needs to be brought in when an actual AND is called for (and prefetching is straightforward).

     The collection of LMs and PMs could be stored separately, on one additional extent, since they are processed separately from the impure leaves.

  4. P-tree operation: COMPLEMENT

     [Figure: a Ptree and its complement, drawn as leaf offsets with the leaf bit vectors beneath.]

     COMPLEMENTing a P-tree: flip the Type Bit and complement the leaves. That's all! (If the structure is nested, complementing a leaf means flipping its TypeBit and complementing its leaves.)
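The complement rule can be checked on a toy structure. The sketch below is a minimal model, assuming a flat (non-nested) P-tree held as a type bit, a leaf map, a purity map and a dictionary of impure leaves; the `PTree` class, its field names and the `decode` helper are hypothetical and exist only to show that flipping the type bit and complementing the stored leaves yields the bitwise complement.

```python
from dataclasses import dataclass, field


@dataclass
class PTree:
    # Hypothetical flat model of the slide's structure (assumed field names).
    type_bit: int    # 0: pure0 leaves are omitted; 1: pure1 leaves are omitted
    leaf_len: int    # number of bits per leaf
    leaf_map: list   # 1 = leaf exists (it is not of the omitted pure kind)
    pure_map: list   # 1 = existing leaf is pure, of the non-omitted kind
    leaves: dict = field(default_factory=dict)  # offset -> impure bit string


def complement(p: PTree) -> PTree:
    """Flip the type bit and complement every stored leaf; the maps are reused."""
    flipped = {i: "".join("1" if b == "0" else "0" for b in leaf)
               for i, leaf in p.leaves.items()}
    return PTree(1 - p.type_bit, p.leaf_len,
                 list(p.leaf_map), list(p.pure_map), flipped)


def decode(p: PTree) -> str:
    """Expand the compressed form back into the full bit string."""
    out = []
    for i, exists in enumerate(p.leaf_map):
        if not exists:
            out.append(str(p.type_bit) * p.leaf_len)      # omitted pure leaf
        elif p.pure_map[i]:
            out.append(str(1 - p.type_bit) * p.leaf_len)  # switched-on pure leaf
        else:
            out.append(p.leaves[i])                       # stored impure leaf
    return "".join(out)


if __name__ == "__main__":
    p = PTree(0, 4, [0, 1, 1, 0], [0, 1, 0, 0], {2: "1010"})
    print(decode(p), decode(complement(p)))
```

Under these assumed conventions the two maps do not change at all, which is why the operation is so cheap.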

  5. P-tree operation: AND

     [Figure: the six Ptrees 0p1, 0p2, 1p3, 0p4, 1p5, 0p6 with their Leaf Maps and Pure Maps over leaf positions 0..9, and their impure leaves.]

     Assume impure leaves are clustered by Leaf Offset (vertically down the cube), and the collection of LMs (and PMs) is stored separately on one additional extent.

     1. AND all 0-LMs --> A
     2. Scan left to right across A for the next 1 bit. If that position in any 1-PM = 1, then GOTO 2; else fetch & AND the nonpure leaves --> B; GOTO 2
     3. A forms the LM of the result and the Bs are the nonpure leaves.

     E.g., p1 ^ p3 ^ p6:

     1. 1001100001 ^ 1000001001 ^ 1000010000 = 1000000000
     2. pos = 1, PM3(1) = 0, so fetch & AND:
          p1  101000101000011
          p3  111100101000011
          p6  000011000010101
          res 000000000000001
     3. Result Ptree: 0-Leaf_Map: 1000000000; 0-Pure_Map: 0000000000; impure leaves: 000000000000001; root-count = 1

     In ASM, is there an AND-and-COUNT operation? It seems that counting the 1-bits of the result as the AND result bits are produced would save a massive scan to get the root count.
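The worked example on this slide can be replayed in a few lines. The sketch below ANDs equal-length bit strings and counts the 1-bits of the result in the same call, which is the "AND and COUNT" idea raised at the end of the slide. The function name `and_and_count` is an assumption for illustration; the input strings are the leaf maps and the p1, p3, p6 leaves quoted on the slide.

```python
def and_and_count(vectors):
    """AND equal-length bit strings; return (result_bits, number_of_1_bits)."""
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("all bit vectors must have the same length")
    acc = (1 << width) - 1          # start from all 1s, the identity for AND
    for v in vectors:
        acc &= int(v, 2)
    return format(acc, f"0{width}b"), bin(acc).count("1")


if __name__ == "__main__":
    # Step 1 of the slide's example: AND of the leaf maps.
    print(and_and_count(["1001100001", "1000001001", "1000010000"]))
    # Step 2: AND of the fetched impure leaves of p1, p3 and p6.
    print(and_and_count(["101000101000011",
                         "111100101000011",
                         "000011000010101"]))
```

In Python the count is a second pass over the integer; on hardware with a population-count instruction the count could be accumulated word by word as the AND is produced.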

  6. Vertical Data Assistant (VDA)

     A VDA is a Windows (or Windows CE) application that can data mine massive datasets efficiently? (Note that a separate application can be built to convert and store the data properly.)

  7. A file, R(A1..An), contains horizontal structures (a set of horizontal records).

     R(A1 A2 A3 A4), in decimal and in 3-bit binary:

       A1 A2 A3 A4      R[A1] R[A2] R[A3] R[A4]
        2  7  6  1       010   111   110   001
        2  7  6  0       010   111   110   000
        2  6  5  1       010   110   101   001
        2  7  5  7       010   111   101   111
        4  2  1  4       100   010   001   100
        2  2  1  5       010   010   001   101
        7  0  1  4       111   000   001   100
        6  0  1  4       110   000   001   100

     The horizontal structures (records) are scanned vertically, giving one bit slice per bit position: R12 R11 R10 R22 R21 R20 R32 R31 R30 R42 R41 R40, where Rij is bit j of attribute Ai and bit 2 is the high-order bit. Each slice is one column of the binary table above.

     [Figure: the twelve bit slices compressed into 1-level LOA Ptrees P12 P11 P10 P22 P21 P20 P32 P31 P30 P42 P41 P40.]

     The file is processed vertically (vertical scans) to build the Ptrees; the Ptrees are then processed horizontally, through AND.
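The vertical scan can be sketched as follows, using the eight records of this slide in their decimal form. The function name `bit_slices` and the dictionary keyed by slice name are assumptions for illustration; the naming `Rij` (attribute i, bit j, with bit 2 as the high-order bit) follows the column labels on the slide.

```python
def bit_slices(records, bits_per_attr=3):
    """Turn horizontal records into vertical bit slices, one string per bit position."""
    slices = {}
    for a in range(len(records[0])):                  # attribute index, 0-based
        for b in range(bits_per_attr - 1, -1, -1):    # high-order bit first
            name = f"R{a + 1}{b}"
            slices[name] = "".join(str((rec[a] >> b) & 1) for rec in records)
    return slices


# Decimal values of R(A1 A2 A3 A4) as listed on the slide.
RECORDS = [
    (2, 7, 6, 1), (2, 7, 6, 0), (2, 6, 5, 1), (2, 7, 5, 7),
    (4, 2, 1, 4), (2, 2, 1, 5), (7, 0, 1, 4), (6, 0, 1, 4),
]

if __name__ == "__main__":
    for name, column in bit_slices(RECORDS).items():
        print(name, column)
```

Each resulting string is one column of the file read top to bottom; those columns are what would then be compressed into P12 ... P40.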

  8. AND-ing a Set of Basic Ptrees

     [Figure: a set of basic Ptrees, each drawn as a list of leaf offsets with the leaf bit vectors beneath, and the step-by-step trace of their AND. At each step the slide shows the TOL set and the MAXID taken from it:]

     TOL = {0, 0, 2, 2}      MAXID = 2
     TOL = {3, 3, 6, 6}      MAXID = 6
     TOL = {8, 3, 6, 6}      MAXID = 8
     TOL = {8, 16, 6, 6}     MAXID = 16
     TOL = {20, 16, 6, 6}    MAXID = 20
     TOL = {21, ∞, ∞, ∞}     MAXID = ∞   (ANDing is over)

  9. Algorithm: AND-ing a set of Basic P-trees

     Algorithm: AND
     INPUT: Basic P-tree Set ≡ [Pn-1, Pn-2, ..., P1, P0]
     OUTPUT: The resultant P-tree after AND-ing

         Pt(N, d, D, ς = 0, SI = {}, SR = {})
         FI = {0, 0, 0, …, 0}   // pointers to index list
         DONE = false
         while (!DONE)
             MAX_ID = max(P0.SI[FI[0]], P1.SI[FI[1]], …, Pn-1.SI[FI[n-1]])
             SEQUENCE = PURE1_SEQUENCE
             for i = 0 to n - 1
                 if (Pi.SI.element(MAX_ID))
                     if (SEQUENCE.length > 1)
                         SEQUENCE = SEQUENCE & Pi.SR[MAX_ID]
                     End if
                     FI[i] = Pi.SI.index(MAX_ID) + 1
                 else if (Pi.SI.index(MAX_ID) != Pi.SI.END)
                     FI[i] = Pi.SI.index(MAX_ID)
                     break
                 else if (MAX_ID == Pi.SI.END)
                     DONE = true
                 End if
             End for
             if (count(SEQUENCE) > 0)
                 Pt.SI.push(MAX_ID);
                 Pt.SR.push(SEQUENCE);
             End if
         End while
         return Pt
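One possible reading of this algorithm is a multi-way intersection of sorted index lists that ANDs the leaves whenever every operand has the same leaf offset. The sketch below follows that reading under simplifying assumptions: each basic P-tree is a sorted list of `(leaf_offset, leaf_as_int)` pairs, a missing offset means a pure0 leaf, and pointer handling is done with plain list positions rather than the slide's `SI.index`/`SI.END` calls. The name `and_basic` is hypothetical.

```python
def and_basic(ptrees, leaf_bits):
    """AND basic (type bit 0) P-trees given as sorted [(offset, leaf_int), ...] lists."""
    if not ptrees:
        return []
    pure1 = (1 << leaf_bits) - 1             # PURE1_SEQUENCE: all 1s
    ptr = [0] * len(ptrees)                  # FI: one pointer per index list
    result = []
    while all(ptr[i] < len(p) for i, p in enumerate(ptrees)):
        max_id = max(p[ptr[i]][0] for i, p in enumerate(ptrees))
        seq = pure1
        aligned = True
        for i, p in enumerate(ptrees):
            # Skip offsets below max_id: another operand is pure0 there.
            while ptr[i] < len(p) and p[ptr[i]][0] < max_id:
                ptr[i] += 1
            if ptr[i] == len(p):
                return result                # one list is exhausted: ANDing is over
            if p[ptr[i]][0] != max_id:
                aligned = False              # this operand jumped past max_id
                break
            seq &= p[ptr[i]][1]
        if aligned:
            if seq:                          # keep only leaves that are not all 0
                result.append((max_id, seq))
            ptr = [k + 1 for k in ptr]
    return result


if __name__ == "__main__":
    a = [(0, 0b101), (2, 0b110), (5, 0b011)]
    b = [(2, 0b011), (3, 0b111), (5, 0b100)]
    print(and_basic([a, b], leaf_bits=3))
```

Offsets present in only one operand never reach the result, because the other operand is pure0 at that offset.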

  10. AND-ing a Set of Mixed Ptrees

      [Figure: a set of mixed Ptrees, each drawn as a list of leaf offsets with leaf bit vectors or pure markers beneath, and the step-by-step trace of their AND. At each step the slide shows the TOL set and the MAX taken from it:]

      TOL = {0, 0, 2, 2}        MAX = 2
      TOL = {3, 3, 6, 6}        MAX = 6
      TOL = {9, 7, 7, 7}        MAX = 9
      TOL = {9, 10, 16, 7}      MAX = 16
      TOL = {20, 17, 17, 17}    MAX = 20
      TOL = {21, 25, 28, 22}    MAX = 28
      TOL = {21, ∞, 28, ∞}      MAX = ∞   (ANDing is over)

  11. Algorithm: AND-ing a set of Mixed P-trees

      Algorithm: AND
      INPUT: Mixed P-tree Set ≡ [Pn-1, Pn-2, ..., P1, P0]
      OUTPUT: The resultant P-tree after AND-ing

          Pt(N, d, D, ς = 0, SI = {}, SR = {})
          FI = {0, 0, 0, …, 0}   // pointers to index list
          DONE = false
          while (!DONE)
              MAX_ID = max(P0.SI[FI[0]], P1.SI[FI[1]], …, Pn-1.SI[FI[n-1]])
              SEQUENCE = PURE1_SEQUENCE
              for i = 0 to n - 1
                  if (Pi.SI.element(MAX_ID))
                      if (Pi.SR[MAX_ID].length > 1)
                          SEQUENCE = SEQUENCE & Pi.SR[MAX_ID]
                      else if (Pi.SR[MAX_ID] == {0})
                          SEQUENCE = {0}
                          break
                      else if (Pi.SR[MAX_ID] == {1})
                          SEQUENCE = SEQUENCE
                      End if
                      FI[i] = Pi.SI.index(MAX_ID) + 1
                  else if (MAX_ID != Pi.SI.END)
                      FI[i] = Pi.SI.index(MAX_ID) + 1
                  else if (MAX_ID == Pi.SI.END && Pi.ς == 0)
                      DONE = true
                      break
                  End if
              End for
              if (count(SEQUENCE) > 0)
                  Pt.SI.push(MAX_ID);
                  Pt.SR.push(SEQUENCE);
              End if
          End while
          return Pt

  12. P-tree Operation: AND-ing a Set of Complemented P-trees

      [Figure: a set of complemented Ptrees (type bit 1), each drawn as a list of leaf offsets with leaf bit vectors or pure markers beneath, and the first steps of their AND. At each step the slide shows the TOL set and the MIN taken from it:]

      TOL = {0, 0, 0, 0}    MIN = 0
      TOL = {1, 5, 6, 6}    MIN = 1
      TOL = {2, 5, 6, 6}    MIN = 2

  13. Algorithm: AND-ing a set of Complemented P-trees

      Algorithm: AND
      INPUT: Complemented P-tree Set ≡ [Pn-1, Pn-2, ..., P1, P0]
      OUTPUT: The resultant P-tree after AND-ing

          Pt(N, d, D, ς = 1, SI = {}, SR = {})
          FI = {0, 0, 0, …, 0}   // pointers to index list
          DONE = false
          while (!DONE)
              MIN_ID = min(P0.SI[FI[0]], P1.SI[FI[1]], …, Pn-1.SI[FI[n-1]])
              // assuming each P-tree index list ends with ∞
              if (MIN_ID == ∞) DONE = true
              SEQUENCE = PURE1_SEQUENCE
              for i = 0 to n - 1
                  if (Pi.SI.element(MIN_ID))
                      if (Pi.SR[MIN_ID].length > 1)
                          SEQUENCE = SEQUENCE & Pi.SR[MIN_ID]
                      else
                          SEQUENCE = 0
                          break
                      End if
                      FI[i] = FI[i] + 1
                  End if
              End for
              Pt.SI.push(MIN_ID)
              if (count(SEQUENCE) > 0)
                  Pt.SR.push(SEQUENCE)
              else
                  Pt.SR.push(0)
              End if
          End while
          return Pt
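For complemented P-trees the omitted leaves are pure1, so the AND behaves like a union of the index lists: a leaf offset appears in the result if any operand stores it. The sketch below illustrates that under assumptions of its own: each P-tree is a sorted list of `(leaf_offset, leaf_as_int)` pairs, a stored value of 0 stands for a pure0 leaf, and exhausted lists are detected by position instead of an ∞ sentinel. The name `and_complemented` is hypothetical.

```python
def and_complemented(ptrees, leaf_bits):
    """AND type-bit-1 P-trees given as sorted [(offset, leaf_int), ...] lists."""
    pure1 = (1 << leaf_bits) - 1             # an omitted leaf is all 1s
    ptr = [0] * len(ptrees)
    result = []
    while True:
        heads = [p[ptr[i]][0] for i, p in enumerate(ptrees) if ptr[i] < len(p)]
        if not heads:                        # every index list is exhausted
            break
        min_id = min(heads)
        seq = pure1
        for i, p in enumerate(ptrees):
            # Operands that do not store min_id are pure1 there: AND identity.
            if ptr[i] < len(p) and p[ptr[i]][0] == min_id:
                seq &= p[ptr[i]][1]
                ptr[i] += 1
        result.append((min_id, seq))         # 0 here marks a pure0 leaf
    return result


if __name__ == "__main__":
    a = [(1, 0b110), (4, 0b000)]
    b = [(1, 0b011), (3, 0b101)]
    print(and_complemented([a, b], leaf_bits=3))
```

Unlike the slide's pseudocode, this version does not stop early at a pure0 leaf; it still advances every pointer at `min_id`, which keeps the bookkeeping simple at the cost of a few extra ANDs.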

  14. P-tree Operation: COUNT

      [Figure: a Ptree and its complement, drawn as leaf offsets with the leaf bit vectors beneath.]

      Algorithm: COUNT
      Input: P(N, d, D, ς, SI, SR)
      Output: count of 1's in P

          SUM = 0
          L = |SI|
          for i = 0 to L - 1
              if (P.SR_i == 1)
                  SUM = SUM + N/2^(dD)
              else
                  SUM = SUM + count(P.SR_i)
          done for
          if (P.ς == 1)
              SUM = SUM + (N - |SI| * N/2^(dD))
          return SUM
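The COUNT algorithm translates almost line for line. In the sketch below, `leaf_size` plays the role of the per-leaf bit count in the slide's formula, stored leaves are bit strings, and the one-character strings "1" and "0" are assumed to mark pure1 and pure0 leaves. The function name `count_ones` and this string encoding are assumptions for illustration.

```python
def count_ones(n_bits, leaf_size, type_bit, stored_leaves):
    """Count the 1-bits of a P-tree from its stored leaves.

    stored_leaves: list of bit strings; "1" marks a pure1 leaf and "0" a pure0 leaf
    (assumed encoding). With type_bit == 1, every leaf that is not stored is pure1.
    """
    total = 0
    for leaf in stored_leaves:
        if leaf == "1":
            total += leaf_size               # whole leaf is 1s
        else:
            total += leaf.count("1")         # impure leaf, or the "0" marker
    if type_bit == 1:
        # Omitted leaves are pure1: add all bits not covered by a stored leaf.
        total += n_bits - len(stored_leaves) * leaf_size
    return total


if __name__ == "__main__":
    print(count_ones(16, 4, 0, ["1010", "1"]))
    print(count_ones(16, 4, 1, ["1010", "0"]))
```

The first call counts 2 bits from the impure leaf plus 4 from the pure1 leaf; the second adds the 8 bits of the two omitted pure1 leaves to the 2 bits of the impure leaf.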
