Concurrent Cache-Oblivious B-trees Using Transactional Memory

Jim Sukha and Bradley Kuszmaul, MIT CSAIL, June 10, 2006.

Presentation Transcript


  1. Concurrent Cache-Oblivious B-trees Using Transactional Memory. Jim Sukha, Bradley Kuszmaul. MIT CSAIL, June 10, 2006

  2. Thought Experiment Imagine that, one day, you are assigned the following task: Enclosed is code for a serial, cache-oblivious B-tree. We want a reasonably efficient parallel implementation that works for disk-resident data. Attached: COB-tree.tar.gz PS. We want to be able to restore the data to a consistent state after a crash, too. PPS. Our deadline is next week. Good luck!

  3. Concurrent COB-tree? Question: How can one program a concurrent, cache-oblivious B-tree? Approach: We employ transactional memory. What complications does I/O introduce?

  4. Potential Pitfalls Involving I/O Suppose our data structure resides on disk. • We might need to make explicit I/O calls to transfer blocks between memory and disk. But a cache-oblivious algorithm doesn’t know the block size B! • We might need buffer management code if the data doesn’t fit into main memory. • We might need to unroll I/O if we abort a transaction that has already written to disk.

  5. Our Solution: Libxac • We have implemented Libxac, a page-based transactional memory system that operates on disk-resident data. Libxac supports ACID transactions on a memory-mapped file. • Using Libxac, we can implement complex data structures that operate on disk-resident data, such as a cache-oblivious B-tree.

  6. Libxac Handles Transaction I/O • We might need to make explicit I/O calls to transfer blocks between memory and disk. Libxac provides xMmap, a function analogous to mmap, so we can operate on disk-resident data without knowing the block size. • We might need buffer management code if the data doesn’t fit into main memory. As with mmap, the OS automatically buffers pages in memory. • We might need to unroll I/O if we abort a transaction that has already written to disk. Since Libxac implements multiversion concurrency control, we still have the original version of a page even if a transaction aborts.

  7. Outline • Programming with Libxac • Cache-Oblivious B-trees

  8. Example Program with Libxac

    /* Assumes the Libxac header (not named on the slide) is included;
       it declares xInit, xMmap, xbegin, xend, xMunmap, xShutdown,
       DURABLE, SUCCESS, and FAILURE. */
    int main(void) {
        int* x;
        int status = FAILURE;

        /* Runtime initialization. For durable transactions,
           logs are stored in the specified directory.* */
        xInit("/logs", DURABLE);

        /* Transactionally map the first page of the input file. */
        x = xMmap("input.db", 4096);

        /* Transaction body, retried until it commits. The body can be
           a complex function (e.g., a cache-oblivious B-tree insert!). */
        while (status != SUCCESS) {
            xbegin();
            x[0]++;
            status = xend();
        }

        xMunmap(x);     /* Unmap the region. */
        xShutdown();    /* Shut down the runtime. */
        return 0;
    }

  * Currently Libxac logs transaction commits, but we haven't implemented the recovery program yet.

  9. Libxac Memory Model • Aborted transactions are visible to the programmer (thus, the programmer must explicitly retry the transaction). Control flow always proceeds from xbegin() to xend(), so the transaction body can contain system/library calls. • At xend(), all changes to the xMmap'ed region are discarded on FAILURE or committed on SUCCESS. • Aborted transactions always see a consistent state. Read-only transactions can always succeed. * Libxac supports concurrent transactions across multiple processes, not threads.

  10. Implementation Sketch • Libxac detects memory accesses by using a SIGSEGV handler to catch memory protection violations on mmap'ed pages. • This mechanism is slow for normal (in-memory) transactions: • Time for mmap and the SIGSEGV handler: ~10 µs • It is efficient, however, if we must perform disk I/O to log transaction commits: • Time to access disk: ~10 ms

  11. Is xMmap practical? Experiment: on a 4-processor AMD Opteron, we performed 100,000 insertions of elements with random keys into a B-tree, with each insert as a separate transaction. Libxac and BDB (Berkeley DB) both implement group commit. The B-tree and the COB-tree both use Libxac. Note that none of the three data structures has been properly tuned. Conclusion: we should be able to achieve good performance.

  12. Outline • Programming with Libxac • Cache-Oblivious B-trees

  13. What is a Cache-Oblivious B-tree? • A cache-oblivious B-tree (e.g., [BDFC00]) is a dynamic dictionary data structure that supports searches, insertions/deletions, and range queries. • A cache-oblivious algorithm or data structure does not know system parameters (e.g., the block size B). • Theorem [FLPR99]: a cache-oblivious algorithm that is optimal for a two-level memory hierarchy is also optimal for a multi-level hierarchy.

  14. Cache-Oblivious B-Tree Example [Figure: a static cache-oblivious tree (internal keys 21; 10, 45; 4, 16, 38, 54; leaf keys 4, 10, 16, 21, 38, 45, 54, 83) indexing a packed memory array (PMA) whose slots hold keys and gaps (--).] • The COB-tree can be divided into two pieces: • A packed memory array that stores the data in order, but contains gaps. • A static cache-oblivious binary tree that indexes the packed memory array.

  15. Cache-Oblivious B-Tree Insert [Figure: the static tree and PMA from slide 14.] To insert a key of 37:

  16. Cache-Oblivious B-Tree Insert [Figure: the static tree and PMA from slide 14, with the key 37 to be inserted.] • To insert a key of 37: • Find the correct section of the PMA using the static tree.

  17. Cache-Oblivious B-Tree Insert [Figure: the static tree and PMA from slide 14, with the key 37 to be inserted.] • To insert a key of 37: • Find the correct section of the PMA using the static tree. • Insert into the PMA. This step may cause a rebalance of the PMA.

  18. Cache-Oblivious B-Tree Insert [Figure: the PMA after 37 has been inserted and the PMA rebalanced; the static tree still holds its old keys.] • To insert a key of 37: • Find the correct section of the PMA using the static tree. • Insert into the PMA. This step may cause a rebalance of the PMA. • Fix the static tree.

  19. Cache-Oblivious B-Tree Insert [Figure: the rebalanced PMA and the fixed static tree (internal keys 21; 10, 40; 4, 16, 37, 56; leaf keys 4, 10, 16, 21, 37, 40, 56, 83).] • To insert a key of 37: • Find the correct section of the PMA using the static tree. • Insert into the PMA. This step may cause a rebalance of the PMA. • Fix the static tree.

  20. Cache-Oblivious B-Tree Insert [Figure: the PMA and static tree after inserting 37, as in slide 19.] Insert is a complex operation. If we wanted to use locks, what would the locking protocol be? What is the right (cache-oblivious?) lock granularity?

  21. Conclusions A page-based TM system such as Libxac • Is a good match for disk-resident data structures: the per-page overheads of TM are small compared to the cost of I/O. • Is easy to program with: Libxac allows us to program a concurrent, disk-resident data structure with ACID properties as though it were stored in memory.
