
Aether : A Scalable Approach to Logging



Presentation Transcript


  1. Databases @ Carnegie Mellon. Ryan Johnson†‡, Ippokratis Pandis†‡, Radu Stoica‡, Manos Athanassoulis‡, Anastasia Ailamaki†‡. †Carnegie Mellon University, ‡École Polytechnique Fédérale de Lausanne. Aether: A Scalable Approach to Logging. VLDB 2010

  2. Scalability is key! • Modern hardware needs software parallelism • OLTP is inherently parallel at the request level • Very good at providing high concurrency • But internal serializations limit execution parallelism. Need for scalable OLTP components

  3. Logging is crucial for OLTP (e.g., the Amazon outage*) • Fault tolerance • Crash recovery • Transaction abort/rollback • Performance • Log changes for durability (no in-place updates) • Write dirty pages back asynchronously. Need an efficient and scalable logging solution • * http://www.datacenterknowledge.com/archives/2010/05/13/car-crash-triggers-amazon-power-outage/

  4. Logging is a bottleneck for scalability. [Figure: multicore CPUs (CPU-1 … CPU-N, private L1s, shared L2) with data and a centralized log in RAM, flushed to disk] (1) At commit, the transaction must yield for the log flush • synchronous I/O on the critical path • locks held for a long time • two context switches per commit. (2) Every transaction must insert records into the log buffer • a centralized main-memory structure • a source of contention. Working around the bottlenecks: • Asynchronous commit • Replacing logging with replication and fail-over. Workarounds compromise durability
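To make the two serialization points concrete, here is a minimal C++ sketch of the baseline behavior described on this slide (hypothetical names, not the Shore-MT code): every insert funnels through one latch, and every commit blocks its worker thread on the flush I/O.

    #include <cstddef>
    #include <cstdint>
    #include <mutex>
    #include <vector>

    struct NaiveLog {
        std::mutex        latch;       // single point of contention
        std::vector<char> buffer;      // centralized log buffer
        uint64_t          tail_lsn = 0;

        // Bottleneck (2): every thread serializes on the same latch
        // and holds it for the whole copy.
        uint64_t insert(const char* rec, std::size_t len) {
            std::lock_guard<std::mutex> g(latch);
            buffer.insert(buffer.end(), rec, rec + len);
            return tail_lsn += len;
        }

        // Bottleneck (1): the worker blocks on synchronous I/O while its
        // transaction still holds locks, paying two context switches.
        void commit(uint64_t lsn) { flush_to(lsn); }

        void flush_to(uint64_t) { /* write() + fsync() up to lsn */ }
    };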

  5. Does “correct” logging have to be so slow? • Locks are held for a long time • Not actually used during the flush • An indirect way to enforce isolation • Two context switches per commit • Transactions are nearly stateless at commit time • Easy to migrate transactions between threads • The log buffer is a source of contention • The log orders incoming requests, not threads • Log records can be combined. No! Aether: uncompromised, yet scalable logging

  6. Agenda • Logging-related problems • Aether logging • Reducing lock contention • Reducing context switching • Scalable log buffer implementation • Conclusions

  7. Bottleneck 1: Amplified lock contention. [Figure: timeline of Xct 1 (working, commit, flush I/O in the log manager) and Xct 2 (waiting in the lock manager until Xct 1 is done)] Other transactions wait for locks while the log flush I/O completes

  8. Early Lock Release (ELR), in the case of a single log • Finish the transaction • Release locks before commit • Insert the transaction's commit record • Wait until the log record is flushed • Dependent xcts are serialized at the log buffer • No extra overhead; the idea has been around for 30 years, but so far nobody uses it. With ELR, other transactions do not wait for locks held during log flushes
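A minimal C++ sketch of the ELR commit path, assuming hypothetical Transaction / LockManager / Log interfaces (this is not the Shore-MT API); the point is only the ordering of the steps above.

    #include <cstdint>

    struct Transaction { void report_committed() { /* ack the client */ } };
    struct LockManager { void release_all(Transaction&) {} };
    struct Log {
        uint64_t insert_commit_record(Transaction&) { return 0; /* dummy LSN */ }
        void     flush_to(uint64_t) { /* wait for durability */ }
    };

    void commit_with_elr(Transaction& xct, LockManager& locks, Log& log) {
        uint64_t commit_lsn = log.insert_commit_record(xct);

        // ELR: drop locks *before* waiting for the flush. Any dependent
        // transaction that saw our updates gets a higher commit LSN, so
        // the log buffer itself serializes it behind us.
        locks.release_all(xct);

        log.flush_to(commit_lsn);  // durability point
        xct.report_committed();    // only now acknowledge the client
    }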

  9. ELR benefits. Setup: Sun Niagara T2 (64 HW contexts), 64 GB RAM; memory-resident TPC-B in Shore-MT; Zipfian distribution on transaction inputs. [Figure: throughput results] ELR is simple and sometimes very useful

  10. Agenda • Logging-related problems • Aether logging • Reducing lock contention • Reducing context switching • Scalable log buffer implementation • Conclusions

  11. Bottleneck 2: Excessive context switching. Setup: Sun Niagara T2 (64 HW contexts), memory-resident TPC-B in Shore-MT. [Figure: timeline of Xct 1 and Xct 2, each commit incurring a context switch around the flush I/O] • One context switch per log flush → pressure on the OS scheduler. Must decouple thread scheduling from log flushes

  12. Flush Pipelining • The scheduler is in the critical path and wastes CPU • Multi-core HW only amplifies the problem • But a transaction is nearly stateless at commit • Detach the transaction state from the worker thread • Pass it to a log writer • Worker threads do not block at commit time [Figure: timeline of Xct 1 and Xct 2 on worker threads 1 and 2]

  13. Flush Pipelining (cont.) [Figure: detached transactions Xct 1–Xct 4 flow from the two worker threads through the dedicated log writer] A staged-like mechanism = low scheduling costs; a sketch of the handoff follows.
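A minimal C++ sketch of Flush Pipelining, again with invented types rather than the Shore-MT implementation: workers hand off nearly stateless committing transactions and move on, while one log writer thread flushes and completes them in batches (group commit).

    #include <condition_variable>
    #include <cstdint>
    #include <deque>
    #include <mutex>

    struct PendingCommit { uint64_t commit_lsn; /* detached xct state */ };

    class FlushPipeline {
        std::mutex mtx;
        std::condition_variable cv;
        std::deque<PendingCommit> queue;   // commits awaiting the next flush

    public:
        // Worker thread: enqueue and return immediately to new work;
        // no blocking and no context switch at commit time.
        void submit(PendingCommit pc) {
            { std::lock_guard<std::mutex> g(mtx); queue.push_back(pc); }
            cv.notify_one();
        }

        // Dedicated log writer: one I/O covers every queued commit
        // (LSNs arrive in increasing order, so the last one suffices).
        void writer_loop() {
            for (;;) {
                std::unique_lock<std::mutex> g(mtx);
                cv.wait(g, [this] { return !queue.empty(); });
                std::deque<PendingCommit> batch;
                batch.swap(queue);
                g.unlock();
                flush_log_up_to(batch.back().commit_lsn);  // single flush
                for (auto& pc : batch) finish(pc);         // ack the clients
            }
        }

        void flush_log_up_to(uint64_t) { /* write() + fsync() */ }
        void finish(PendingCommit&)    { /* report committed */ }
    };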

  14. Impact of Flush Pipelining. Setup: Sun Niagara T2 (64 HW contexts), memory-resident TPC-B in Shore-MT. [Figure: throughput results] Matches asynchronous-commit throughput without compromising durability

  15. Agenda • Logging-related problems • Aether logging • Reducing lock contention • Reducing context switching • Scalable log buffer implementation • Conclusions

  16. Bottleneck 3: Log buffer contention • A centralized log buffer → contention, which depends on • the number of participating threads • the size of the modifications (KiBs per record in the case of physical logging) [Figure: timeline of Xct 1–Xct 3 waiting on the log-buffer latch and on flush I/O]

  17. Eliminating critical sections • Inspiration: elimination-based backoff* • Critical sections can cancel each other out • E.g., stack push/pop operations: • Attempt to acquire the mutex • If that fails, back off and wait in an elimination array • If an opposite request is already waiting there, the two eliminate each other without acquiring the mutex [Figure: pushes and pops meeting in the "station area" beside the stack] Adapt elimination-based backoff for DB logging; a toy sketch follows. • * D. Hendler, N. Shavit, and L. Yerushalmi. “A Scalable Lock-free Stack Algorithm.” In Proc. SPAA, 2004
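As a toy illustration of the stack example only (the log adaptation is on the next slide): one exchange slot in C++, loosely after Hendler et al., with invented names. A push and a pop that collide here trade the value directly, and neither ever touches the stack or its mutex.

    #include <atomic>
    #include <chrono>
    #include <optional>
    #include <thread>

    struct EliminationSlot {
        enum State { EMPTY, CLAIMED, PUSH_WAITING, MATCHED };
        std::atomic<State> state{EMPTY};
        int value = 0;

        // A push that lost the stack mutex parks here; true = eliminated.
        bool try_push(int v) {
            State e = EMPTY;
            if (!state.compare_exchange_strong(e, CLAIMED)) return false;
            value = v;                               // slot privately owned
            state.store(PUSH_WAITING, std::memory_order_release);
            std::this_thread::sleep_for(std::chrono::microseconds(10));
            e = PUSH_WAITING;
            if (state.compare_exchange_strong(e, EMPTY))
                return false;                        // timed out; retry on stack
            while (state.load(std::memory_order_acquire) != MATCHED) { }
            state.store(EMPTY);                      // hand the slot back
            return true;
        }

        // A pop that lost the mutex looks for a waiting push to cancel.
        std::optional<int> try_pop() {
            State e = PUSH_WAITING;
            if (!state.compare_exchange_strong(e, CLAIMED)) return std::nullopt;
            int v = value;
            state.store(MATCHED, std::memory_order_release);
            return v;
        }
    };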

  18. Accessing the log buffer • Break the log insert into three logical steps: (a) Reserve space by updating the head LSN (b) Copy the log record (memcpy) (c) Make the insert visible by updating the tail LSN, in LSN order • Steps (a) + (c) can be consolidated: • Accumulate requests off the critical path • Send only the group leader to fight for the critical section • Move (b) out of the critical section (see the sketch below)
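A minimal C++ sketch of the consolidated insert, with invented names and one fresh group object per batch (the real consolidation array recycles a few slots, as slide 21 shows). One leader performs the (a) reservation once per group while every member does its memcpy (b) outside the latch; step (c) is noted but omitted.

    #include <atomic>
    #include <cstdint>
    #include <cstring>
    #include <mutex>

    class ConsolidatedLog {
        static constexpr uint64_t CLOSED = uint64_t(1) << 63;

        struct Group {
            std::atomic<uint64_t> size{0};  // pooled bytes; CLOSED once sealed
            std::atomic<uint64_t> base{0};  // 0 = leader has not reserved yet
        };

        std::mutex latch;     // the contended critical section
        char*      buf;
        uint64_t   head = 1;  // next free LSN (starts at 1 so base==0 is "unset")

    public:
        explicit ConsolidatedLog(char* backing) : buf(backing) {}

        uint64_t insert(Group& g, const char* rec, uint64_t len) {
            // Join the group: my_off is this record's offset within it.
            uint64_t my_off = g.size.fetch_add(len);
            if (my_off & CLOSED) return 0;  // sealed; caller retries elsewhere

            if (my_off == 0) {
                // (a) Leader: one latch acquisition reserves space for the
                // whole group; sealing stops late joiners.
                std::lock_guard<std::mutex> lk(latch);
                uint64_t total = g.size.fetch_or(CLOSED);  // bytes pooled so far
                g.base.store(head, std::memory_order_release);
                head += total;
            } else {
                // Followers never touch the latch; wait for the reservation.
                while (g.base.load(std::memory_order_acquire) == 0) { }
            }

            uint64_t lsn = g.base.load(std::memory_order_relaxed) + my_off;
            std::memcpy(buf + (lsn - 1), rec, len);  // (b) outside the latch
            // (c) Publishing the group's tail LSN is likewise done once per
            // group, by the last member to finish; omitted in this sketch.
            return lsn;
        }
    };

The latch is acquired once per group instead of once per record, so contention stops growing with the thread count, which is the (C) design on the next slide.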

  19. Design evolution. [Figure: per-design timelines showing mutex held, start/finish, waiting, and copy-into-buffer phases] • (B) Baseline: the entire insert happens inside the critical section • (C) Consolidation array: contention(# threads) = O(1) • (D) Decoupled buffer insert: contention(work) = O(1) • (CD) Hybrid design: decouples contention from both the number of threads and the average log entry size

  20. Performance as contention increases. Microbenchmark with a bimodal distribution of record sizes (48 B and 160 B; 120 B average). [Figure: throughput vs. contention for each design] The hybrid solution combines the benefits of both

  21. Sensitivity to slot count. [Figure: heat map of throughput (MB/s, roughly 400–1700, shown as color/height) over # threads (0–60) and # slots (1–10)] Relatively insensitive to slot count (3 or 4 slots are good enough for most cases)

  22. The case against distributed logging • Distributing TPC-C log records over 8 logs • 1 ms of wall time, ~200 in-flight transactions, 30 commits • Horizontal blue line = 1 log • Diagonal lines = dependencies (new = black, older = grey). Large overhead from tracking dependencies and from over-flushing

  23. Agenda • Logging-related problems • Aether logging • Reducing lock contention • Reducing context switching • Scalable log buffer implementation • Conclusions

  24. Putting it all together. Setup: Sun Niagara T2 (64 HW contexts), memory-resident TPC-B. [Figure: throughput vs. # threads; +15% to +60% over the baseline, with the gap increasing with the number of threads] Eliminates the current log bottlenecks and future-proofs the system against contention

  25. Conclusions • Logging is an essential component of OLTP • It simplifies recovery and improves performance without the need to physically partition the data • ...but all lurking bottlenecks must be addressed • Aether is a holistic approach to logging • Leverages existing techniques (Early Lock Release) • Reduces context switches (Flush Pipelining) • Eliminates log contention (consolidation-based backoff) • Can achieve 2 GB/s of log throughput per node. Thank you!
