530 likes | 678 Views
V Storage Manager. Shahram Ghandeharizadeh Computer Science Department University of Southern California. Simple Tests. Arbitrary Designed to stress your software and ensure its robustness: Repeated creation of the same datazone name.
E N D
V Storage Manager Shahram Ghandeharizadeh Computer Science Department University of Southern California
Simple Tests • Arbitrary • Designed to stress your software and ensure its robustness: • Repeated creation of the same datazone name. • Deletion of records in iteration j when they were deleted in iteration (j-1) • Etc. • Designed to be scalable. • Your test can be implemented with a variable A that captures the number of records in alpha (10,000). • Value of all other important parameters can be a function of A, e.g., number of records for beta, number of iterations in Test 3. • Start with a small value for A, say 100. Once your tests are working, increase the value of A to 1,000. Once this works, increase A to 10,000. • Do not be surprised to find your code break when you increase the value of A from 100 to 1,000. This is the reality of developing robust software: Size matters!
Suggestion • Focus on a single-threaded version of your implementation. • Once all tests are running, extend to analyze the impact of multi-threading. • This may require a re-visit of your designs.
Execution of your program consists of two kinds of dynamic memory: Heap and Stack. Use of malloc and new allocates memory from Heap. The programmer is responsible to free this memory and return it to heap. Invocation of a method uses a stack. All variables declared in a method are placed on the stack. When the method returns, its stack is freed. Heap versus Stack Code static data stack heap
In method Test 1, the character array named “payload” is declared on the stack when Test 1 is invoked. Its memory is freed when Test 1 completes execution. The programmer is NOT responsible fore managing the memory assigned to payload because it is a local variable managed using the stack. Test 1 () { char[10000] payload; vdt vptr; vptr.set_data(payload); …. } Heap versus Stack (Example) Code static data stack heap
In method Test 1, the character array named “payload” is assigned memory from the heap (using new). The variable payload is on the stack! The memory pointed to by “payload” is allocated from the heap. The programmer is responsible for freeing this memory using delete. Test 1 () { char *payload; vdt vptr; payload = new char[1000]; vptr.set_data(payload); …. } Heap versus Stack (Example) Code static data stack heap
The following is FALSE: Memory allocated in method X can be freed only in method X. See the example as proof. Cause: Debugging C/C++ programs is difficult. It is easy to corrupt memory if you are not careful! These errors are difficult to find. They are also stressful, resulting in beliefs that are not true Urban legend is born. How to avoid these kinds of conceptual traps? Write small programs to verify a belief that sounds too good to be true. It is simple and avoids digressions that waste your time and cause a lot of heart ache. GenMem::GenMem(Vdt *v) { char *cptr; cptr = new char[10]; v->set_data(cptr); memcpy(cptr, "Shahram", 7); } int _tmain(int argc, _TCHAR* argv[]) { Vdt vptr; char *cptr; GenMem *GM = new GenMem(&vptr); cptr = (char *) vptr.get_data(); delete cptr; printf ("Exiting this simple test."); return 0; } Urban Legends about Heap
Variant Indexes by P. O’Neil and D. Quass Shahram Ghandeharizadeh Computer Science Department University of Southern California
Key Assumptions • A read-mostly database that is updated infrequently. • Complex indexes to speedup queries. • Focuses on physical designs to enhance performance.
Example Data Warehouse • McDonalds keeping track of different sandwich purchases. SALES TIME Cid Pid Day Amt dollar_cost Unit_sales PROD Day Week Month Year Holliday Weekday Pid Name Size Weight Package_type
Key Observations: A handful of products, a PROD table with tens of rows. Many millions of rows for SALES tables. Example Data Warehouse SALES TIME Cid Pid Day Amt dollar_cost Unit_sales PROD Day Week Month Year Holliday Weekday Pid Name Size Weight Package_type
A B+-Tree On the Pid of Sales • Assuming McDonald’s sales 12 different products B+-tree Leaf page (Big Mac, (1,1), (1, 3), (1, 4), (2,4), …. Joe, Big Mac, Lab day, … Jane, Happy Meal, Pres day, … Shideh, Happy Meal, Pres day, … Mary, Fries, Pres day, … Kam, Happy Meal, Pres day, … Harry, Big Mac, Pres day, … Henry, Big Mac, Pres day, … Bob, Big Mac, Pres day, …
A B+-Tree On the Pid of Sales • Assuming McDonald’s sales 12 different products B+-tree Leaf page (Big Mac, (1,1), (1, 3), (1, 4), (2,4), …. What happens with a SALES table consisting of a million rows? Joe, Big Mac, Lab day, … Jane, Happy Meal, Pres day, … Shideh, Happy Meal, Pres day, … Mary, Fries, Pres day, … Kam, Happy Meal, Pres day, … Harry, Big Mac, Pres day, … Henry, Big Mac, Pres day, … Bob, Big Mac, Pres day, …
A B+-Tree on Major Holidays • A B+-tree index on different holidays of the SALES table. B+-tree Leaf page (Pres day, (1,2), (1, 3), (1, 4), (2,1), …. Joe, Big Mac, Lab day, … Jane, Happy Meal, Pres day, … Shideh, Happy Meal, Pres day, … Mary, Fries, Pres day, … Kam, Happy Meal, Pres day, … Harry, Big Mac, Pres day, … Henry, Big Mac, Pres day, … Bob, Big Mac, Pres day, …
A B+-Tree on Major Holidays • A B+-tree index on different holidays of the SALES table. B+-tree Leaf page (Pres day, (1,2), (1, 3), (1, 4), (2,1), …. Value List Joe, Big Mac, Lab day, … Jane, Happy Meal, Pres day, … Shideh, Happy Meal, Pres day, … Mary, Fries, Pres day, … Kam, Happy Meal, Pres day, … Harry, Big Mac, Pres day, … Henry, Big Mac, Pres day, … Bob, Big Mac, Pres day, …
A B+-Tree on Major Holidays • A B+-tree index on different holidays of the SALES table. B+-tree Leaf page (Pres day, (1,2), (1, 3), (1, 4), (2,1), …. Value List RID List Joe, Big Mac, Lab day, … Jane, Happy Meal, Pres day, … Shideh, Happy Meal, Pres day, … Mary, Fries, Pres day, … Kam, Happy Meal, Pres day, … Harry, Big Mac, Pres day, … Henry, Big Mac, Pres day, … Bob, Big Mac, Pres day, …
Conjunctive Queries • Count number of Big Mac Sales on “President’s Day” assuming a B+-tree on product (pid) and day of SALES • With RID-Lists • Get the Value-List for “Big Mac” using the B+-tree, obtain RID-List1. • Get the Value-List for “President’s Day” using the B+-tree, obtain RID-List2. • Compute set-intersect of RID-List1 and RID-List2 • Count the number of RIDs in the intersection set. • Is there a better way? • Yes, use bit-maps and logical bit-wise operands.
Bitmap Indexes • Use a bitmap to represent the existence of a record with a certain attribute value. • Example: If a record has the indexed attribute value “Big Mac” then its corresponding entry in the bitmap is set to one. Otherwise, it is a zero.
A Bitmap • A Bitmap B is defined on T as a sequence of M bits. • For each row r with row number j that has the property P, we set bit j in B to one; all other bits are set to zero. • Assuming fix sized disk pages that hold p records, RID of record j is (j/p, j%p). Page is j/p, slot number is j%p. Pres Day, 0100001100111111110000011001….. Record 0
A Bitmap • A Bitmap B is defined on T as a sequence of M bits. • For each row r with row number j that has the property P, we set bit j in B to one; all other bits are set to zero. • Assuming fix sized disk pages that hold p records, RID of record j is (j/p, j%p). Page is j/p, slot number is j%p. Pres Day, 0100001100111111110000011001….. Record 1
A Bitmap • A Bitmap B is defined on T as a sequence of M bits. • For each row r with row number j that has the property P, we set bit j in B to one; all other bits are set to zero. • Assuming fix sized disk pages that hold p records, RID of record j is (j/p, j%p). Page is j/p, slot number is j%p. Pres Day, 0100001100111111110000011001….. Record 2
A B+-Tree on Major Holidays • A B+-tree index on different holidays of the SALES table. B+-tree Leaf page (Pres day, 01111111…. Joe, Big Mac, Lab day, … Jane, Happy Meal, Pres day, … Shideh, Happy Meal, Pres day, … Mary, Fries, Pres day, … Kam, Happy Meal, Pres day, … Harry, Big Mac, Pres day, … Henry, Big Mac, Pres day, … Bob, Big Mac, Pres day, …
Logical Bit-Wise Operations • Three key operands: AND, OR, NOT • Assume a bit map consisting of 4 bits: • 0011 AND 0101 = 0001 • 0011 OR 0101 = 0111 • NOT 0011 = 1100 • This paper assumes bit maps consisting of millions, if not billions, of bits. In Example 3.1, they assume a bitmap consisting of 100,000,000 bits, 12.5 Mega bytes. • A large bit map is stored in a sequence of disk pages. Each disk page full of bits is termed a fragment. • Some bit positions may correspond to non-existent rows. An Existence Bitmap (EBM) has exactly those 1 bits corresponding to existing rows.
Summary ANY QUESTIONS?
Range Predicate • SELECT target-list • FROM T • WHERE C-range C-range = {C > c1, C >= c1, C = c1, C <= c1, C < C1, C between c1 and c2} • How to process with a bit-map index?
Range Predicate • SELECT target-list • FROM T • WHERE C-range C-range = {C > c1, C >= c1, C = c1, C <= c1, C < C1, C between c1 and c2} • How to process with a bit-map index?
Range Predicate • SELECT target-list • FROM T • WHERE C-range C-range = {C > c1, C >= c1, C = c1, C <= c1, C < C1, C between c1 and c2} • How to process with a bit-map index?
Conjunctive Queries • Count number of Big Mac Sales on “President’s Day” assuming a B+-tree on product (pid) and day of SALES • With RID • Get the Value-List for “Big Mac” using the B+-tree, obtain RID-List1. • Get the Value-List for “President’s Day” using the B+-tree, obtain RID-List2. • Compute set-intersect of RID-List1 and RID-List2 • Count the number of RIDs in the intersection set. • With bit maps • Get the Value-List for “Big Mac” using the B+-tree, obtain bit-map1. • Get the Value-List for “President’s Day” using the B+-tree, obtain bit-map2. • Recall Existence Bitmap (EBM) identify rows that exist. • Let RES = logical AND of bit-map1, bit-map2, and EBM. • Count the number of bits set to one to identify how many Big Macs were sold on “President’s Day”.
Projection Index • Reminiscent of vertical partitioning. • Once the qualifying records are found, the projection index enables the system to find the amt attribute value of the record with a few disk I/Os. amt cid pid holliday amt 450 699 598 799 520 … 450 699 598 799 520 … Labor day Labor day Presidents day Labor day Labor day …..
Projection Index (Definition) • Page 41, first paragraph of Section 2.2
Projection Index (Example Usage) • Page 41, middle of left hand column:
Bit-Sliced Indexes: Motivation • Assume the “Amt” values are in dollars and as follows: 1 3 5 7 3 3 1
Bit-Sliced Indexes: Motivation • Assume the “Amt” values are in dollars and as follows. Their binary representation is: 1 3 5 7 3 3 1 001 011 101 111 011 011 011
Bit-Sliced Indexes: Motivation • Now, number the order of records as before: 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 011
Bit-Sliced Indexes: Motivation • Construct a Bit-Sliced index: 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 011 Bit 0, 1111111 Bit 1, 0101111 Bit 2, 0011000
Bit-Sliced Indexes: Motivation • To compute the sum of all records using the existence bit-map bnn (1111111): 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 001 ? Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
Bit-Sliced Indexes: Motivation • To compute the sum of all records using the existence bit-map bnn (1111111): 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 001 1 * (7 records with bit 0 set to 1) + 2 * (4 records with bit 1 set to 1) + 4 * (2 records with bit 2 set to 1) Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
Bit-Sliced Indexes: Motivation • To compute the sum of all records using the existence bit-map bnn (1111111): 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 001 1 * (7 records with bit 0 set to 1) + 2 * (4 records with bit 1 set to 1) + 4 * (2 records with bit 2 set to 1) = (1 * 7) + (2 * 4) + (4 * 2) = 23 Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
Bit-Sliced Indexes: Definition • Interpret the value of the “Amt” column as an integer number of pennies, represented as a binary number with N+1 bits. Define
Bit-Sliced Indexes: Definition • Interpret the value of the “Amt” column as an integer number of pennies, represented as a binary number with N+1 bits. Define Why maintain Bn?
Bit-Sliced Indexes: Definition • Interpret the value of the “Amt” column as an integer number of pennies, represented as a binary number with N+1 bits. Define The result of a scalar such as SUM involving a null will itself be a null. Example, see: http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html
Bit-Sliced Index • 20 Bitmaps for the “Amt” column represents quantities up to 220 – 1 pennies, $10,485.75. • If we assume normal sales range up to $100.00, and all values are as likely to occur, a Value-List index would have nearly 10,000 different values. A Bitmap representation would lose its effectiveness. However, Bit-sliced indexes continue to perform well.
Example with Value-List Index • Assume SALES table has 100 million rows. Each row is 200 bytes in length. Disk page is 4 Kbytes, holding 20 rows. • Query: SELECT SUM(AMT) FROM SALES WHERE condition • Bitmap Bf = the Foundset • Bitmap Bv for each value • Bnn = Existance bitmap
Example with Bit-Sliced Indexes • Query: SELECT SUM(AMT) FROM SALES WHERE condition • Bitmap Bf = the Foundset • Bitmap Bv for each value • Bnn = Existance bitmap • 20 bits: Bit 0, 01010101010… Bit 1, 10101011111… … Bit 19, 0000000001…
Other Aggregate Functions • Ignore MEDIAN & Column-Product. • SELECT AGG(C) FROM T WHERE condition • AGG(C) is COUNT, SUM, AVG, MIN, MAX
Range Queries SELECT target-list FROM T WHERE C-range C-range = {C > c1, C >= c1, C = c1, C <= c1, C < C1, C between c1 and c2}
Bit-Sliced Indexes • Assume c1 = 3, {011} 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 001 BGT = BLT = 0000000 BEQ = 1111111 If bit 2 is on in constant c1 {} Else BGT = 0000000 | (1111111 & 0011000) = 0011000 BEQ = 1111111 & ~(0011000) = 1111111 & 1100111 = 1100111 Iteration 1 on Bit 2: BLT = 0000000 BGT = 0011000 BEQ = 1100111 Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
Bit-Sliced Indexes • Assume c1 = 3, {011} 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 001 BLT = 0000000 BGT = 0011000 BEQ = 1100111 If bit 1 is on in constant c1 BLT = BLT | (BEQ & NOT(B1)) = 0000000 | (1100111 & ~(0101110)) = 0000000 | (1100111 & 1010001) = 1000001 BEQ = BEQ & B1 = 1100111 & 0101110 = 0100110 Iteration 2 on Bit 1: BLT = 1000001 BGT = 0011000 BEQ = 0100110 Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000
Bit-Sliced Indexes • Assume c1 = 3, {011} 0 1 2 3 4 5 6 1 3 5 7 3 3 1 001 011 101 111 011 011 001 BLT = 1000001 BGT = 0011000 BEQ = 0100110 If bit 0 is on in constant c1 BLT = BLT | (BEQ & NOT(B1)) = 1000001 | (0100110 & ~(1111111)) = 1000001 | (0100110 & 0000000) = 1000001 BEQ = BEQ & B1 = 0100110 & 1111111 = 0100110 Iteration 3 on Bit 0: BLT = 1000001 BGT = 0011000 BEQ = 0100110 Bit 0, 1111111 Bit 1, 0101110 Bit 2, 0011000