1 / 8

Algorithm for the Aggregate Function SUM

Learn about algorithms for evaluating aggregate functions such as sum, average, max, min, median, rank, and top-k using P-trees.

yjorgensen
Download Presentation

Algorithm for the Aggregate Function SUM

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Algorithm for the Aggregate Function SUM Algorithm 4.1 Evaluating sum () with P-tree. total = 0.00; For i = 0 to n {total = total + 2i * RootCount (Pi);} Return total SUM function: Sum function can total a field of numerical values. P4,3P4,2P4,1 P4,0 0 1 1 1 0 0 0 0 1 0 1 1 1 1 10 5 6 7 11 9 3 1 0 0 0 1 1 0 1 0 1 1 1 0 1 • For example, if we want to know the total number of products which were sold out in relation S, the procedure is showed on left {3} {3} {5} {5} 51 23 * + 22 * + 21 * + 20 * =

  2. Algorithm of Aggregates AVERAGE, MAX Algorithm 4.2 Evaluating max () with P-tree. max = 0.00; c = 0; /*Pc is set all 1s*/ For i=n to 0 { c=Count (Pc AND Pi); If (c >= 1) Pc = Pc AND Pi; max = max + 2i } Return max; Average function: Average function will show the average value in a field. It can be calculated from function COUNT and SUM. Average () = Sum ()/Count (). Steps IFPos Bits P4,3P4,2P4,1 P4,0 1. Pc = P4,3 RootCount (Pc) = 3 >=1 0 1 1 1 0 0 0 0 1 0 1 1 1 1 10 5 6 7 11 9 3 1 0 0 0 1 1 0 1 0 1 1 1 0 1 {1} {0} 2. RootCount (Pc ANDP4,2) = 0 <1 Pc = Pc AND P’4,2 3. RootCount (Pc ANDP4,1 ) = 2 >=1 Pc = Pc AND P4,1 {1} 4. RootCount (Pc ANDP4,0 ) = 1 >=1 {1} {1} {1} {1} 11 23 * + 22 * + 21 * + 20 * = {0}

  3. Algorithm of Aggregate Function MIN Algorithm 4.3. Evaluating Min () with P-tree. min=0.00; c = 0; /*Pc is set all 1s*/ For i=n to 0 { c=RootCount (Pc AND NOT (Pi)); If (c >= 1) Pc=Pc AND NOT (Pi); Else min = min + 2i; } Return min; Steps IFPos Bits P4,3P4,2P4,1 P4,0 1. Pc = P’4,3 RootCount (Pc) = 4 > =1 0 1 1 1 0 0 0 0 1 0 1 1 1 1 10 5 6 7 11 9 3 1 0 0 0 1 1 0 1 0 1 1 1 0 1 {0} 2. RootCount (Pc ANDP’4,2) = 1 >=1 Pc = Pc AND P’4,2 {0} 3. RootCount (Pc ANDP’4,1 ) = 0 <1 Pc = Pc AND P4,1 {1} 4. RootCount (Pc ANDP’4,0 ) = 0 <1 {1} {0} {0} {1} {1} 23 * + 22 * + 21 * + 20 * = 3

  4. Algorithms of Aggregate Function MEDIAN, RANK, TOP-K Algorithm 4.4. Evaluating Median () with P-tree median=0.00; pos=N/2; for rank pos=K; c = 0; /*Pc is set all 1s for single attribute*/ For i=n to 0 { c=RootCount (Pc&Pi); If (c >= pos) median=median+2i; Pc=Pc&Pi; Else pos=pos-c; Pc=Pc &NOT(Pi);} Return median; Rank (K) function returns the value that is the kth largest value in a field. Steps IFPos Bits P4,3P4,2P4,1 P4,0 1. Pc = P4,3 RootCount (Pc) = 3 < 4 Pc = P’4,3 pos = 4 – 3 = 1 0 1 1 1 0 0 0 0 1 0 1 1 1 1 10 5 6 7 11 9 3 1 0 0 0 1 1 0 1 0 1 1 1 0 1 {0} 2. RootCount (Pc ANDP4,2) = 3 >=1 Pc = Pc AND P4,2 {1} 3. RootCount (Pc ANDP4,1 ) = 2 >=1 Pc = Pc AND P4,1 {1} 4. RootCount (Pc ANDP4,0 ) = 1 >=1 {1} 7 23 * + 22 * + 21 * + 20 * = {0} {1} {1} {1} Top-k: (largest k values): Find rank k value Vk. Find all tuples  Vk using EINRING.

  5. LOC 0 0 0 0 1 P1,4 P1,3 P1,2 P1.1 P1.0P’1,4 P’1,3 P’1,2 P’1.1 P1.0 PNY PNY 1 0 1 1 0 0 0 PMN 0 1 0 0 1 0 1 PCH 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 Iceberg Query Operation Using P-trees SELECT Loc, Type, Sum (# Product) FROM Relation S GROUPBY Loc, Type HAVING Sum (# Product) >= 15 Step one: Build value P-trees for the 4 values, {Loc| New York, Minneapolis, Chicago}, of attribute Loc. Calculation of value P-tree PNY. Because binary value of New York is 00001, we will get (1). PNY = P’1,4 AND P’1,3 AND P’1,2 AND P’1,1 AND P1,0 (1)

  6. PNotebook PDesktop PPrinter PFAX 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 After getting all the value P-trees for each location, we calculate the total number of products sold in each place. We still use the value, New York, as our example. Sum(# product | New York) = 23 * RootCount (P4,3 AND PNY) + 22 * RootCount (P4,2 AND PNY) + 21 * RootCount (P4,1 AND PNY) + 20 * RootCount (P4,0 AND PNY) = 8 * 1 + 4 * 2 + 2 * 3 + 1 * 1 = 23 (2) Table shows total number of products sold out in each location. Because threshold is 15, eliminate Chicago. Step 2: Similarly, build P-trees for every value of attribute Type. Attribute Type has 4 values {Type | Notebook, desktop, Printer, Fax}. Figure shows P-tree of 4 values of attrib Type. Similarly we get the summary table for each value of attribute Type. According to the threshold, T equals 15, only value P-tree of notebook will be used in the future.

  7. Calculate total notebooks sold in New York by Sum(# Product | New York) = 23 * RootCount (P4,3 ANDPNY AND Notebook) + 22 * RootCount (P4,2 AND PNY AND Notebook) + 21 * RootCount (P4,1 AND PNY AND Notebook) + 20 * RootCount (P4,0 ANDPNY AND Notebook) = 8 * 1 + 4 * 1 + 2 * 2 + 1* 1 = 17 (3) PMN PNotebook PMN AND Notebook 0 0 0 0 1 0 0 0 1 0 0 1 0 1 1 0 0 1 1 0 0 AND = Iceberg Step three: We only generate candidate Loc and Type pairs for local store and Product type, which can pass T. By Performing And op on PNY with PNotebook, obtain P-treePNYAND Notebook By performing And operations on PMN with P Notebook, obtain value P-tree PMNAND Notebook PNY PNotebookPNY AND Notebook 1 0 1 1 0 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 0 = AND

  8. We calculate the total number of notebook sold out in Minneapolis by formula 4. Sum (# product | Minneapolis) = 23 * RootCount (P4,3 AND PMNAND Notbook) + 22 * RootCount (P4,2 AND PMN ANDNotbook) + 21 * RootCount (P4,1 AND PMNAND Notbook) + 20 * RootCount (P4,0 AND PMNAND Notbook) = 8 * 1 + 4 * 0 + 2 * 1 + 1 * 1 = 11 (4) Finally, we obtain the summary table 5. According to the threshold T=15, we can see that only group pair “New York And Notebook” pass our threshold T. From value P-tree PNYAND Notebook, we can see that tuple 1 and 4 are in the results of our iceberg query example. PNYAND Notebook 1 0 0 1 0 0 0

More Related