Highly Available ACID Memory Vijayshankar Raman
Introduction
• Why ACID memory?
  • non-database apps:
    • want updates to critical data to be atomic and persistent
    • synchronization is useful when multiple threads are accessing critical data
  • databases:
    • concurrency control and recovery logic runs through most of the database code
    • extremely complicated, and hard to get right
    • bugs lead to data loss -- disastrous!
Project goal
• Take recovery logic out of apps
• Build a simple user-level library that provides recoverable, transactional memory
  • all the logic in one place => easy to debug and maintain
  • easy to make use of hardware advances
• Use replication and persistent memory for recovery -- instead of writing logs
  • simpler to implement
  • simpler for applications to use ??
Questions to answer
• Program simplicity vs. performance
  • how much do we lose by replicating instead of logging?
• On a cluster, can we use replication directly for availability?
  • traditionally, availability is handled on top of the recovery system
Outline
• Introduction
• Acid Memory API
• Single Node design & implementation
• Evaluation
• High Availability: multiple node design and implementation
• Evaluation
• Conclusion
Acid Memory API
• Transaction manager interface
  • TransactionManager(database name, acid memory area)
• Transaction interface
  • beginTransaction()
  • getLock(memory region1, READ/WRITE)
  • getLock(memory region2, READ/WRITE)
  • ...
  • memory region = virtual address prefix
  • commit/abort() -- all locks released
• Combines concurrency control with recovery
  • recovery is done on write-locked regions
  • supports fine-granularity locking => cannot use VM (which protects whole pages) for recovery
  • applications can modify data directly
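The API slide names a memory region by a virtual-address prefix. Read literally (our interpretation, not spelled out in the talk), a region is every address that shares the top k bits of some base address, i.e. an aligned power-of-two block, so two regions are always either nested or disjoint. The hypothetical `Region`, `overlaps`, and `conflicts` helpers below sketch the checks a lock manager would need to decide whether two getLock requests collide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation of a lock region as a virtual-address prefix:
// all addresses whose top `prefixBits` bits equal those of `base`.
// This reading of "memory region = virtual address prefix" is an assumption.
struct Region {
    std::uintptr_t base;  // any address inside the region
    unsigned prefixBits;  // leading address bits that are fixed (0..pointer width)

    // Mask selecting the fixed (prefix) bits; guards the shift-by-width case.
    std::uintptr_t mask() const {
        constexpr unsigned width = sizeof(std::uintptr_t) * 8;
        if (prefixBits == 0) return 0;
        return ~std::uintptr_t{0} << (width - prefixBits);
    }

    bool contains(std::uintptr_t addr) const {
        return (addr & mask()) == (base & mask());
    }
};

// Aligned prefix blocks are nested or disjoint, so they overlap exactly when
// the coarser region (shorter prefix) contains the finer region's base.
bool overlaps(const Region& a, const Region& b) {
    const Region& coarser = (a.prefixBits <= b.prefixBits) ? a : b;
    const Region& finer = (a.prefixBits <= b.prefixBits) ? b : a;
    return coarser.contains(finer.base);
}

// READ locks are mutually compatible; a WRITE lock conflicts with any overlap.
enum class Mode { READ, WRITE };

bool conflicts(const Region& a, Mode ma, const Region& b, Mode mb) {
    return overlaps(a, b) && (ma == Mode::WRITE || mb == Mode::WRITE);
}
```

With this encoding, locking a 64-byte field inside a 4 KB structure and locking the whole structure are recognized as overlapping, which is what lets fine-grained and coarse-grained locks coexist.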
Implementation
[Figure: the acid memory area (master copy) is mmap'd from a disk file; a separate mirror area holds copies of write-locked regions]
• Assume non-volatile memory (NVRAM, battery backup)
• Assume a persistent file cache
• The acid memory area is mmap'd from a file
  • persistence => writes are permanent
• getLock(WRITE): copy the region onto the mirror area
• On transaction abort / system crash:
  • undo changes on all write-locked regions using the copy in the mirror area
• The only overhead of recovery is a memcpy on each write lock
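The write-lock/undo path can be sketched as follows. To keep it self-contained, an ordinary in-memory byte array stands in for the mmap'd acid memory area and a second array for the mirror area; the class and method names are hypothetical, and the lock manager itself is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch: `area_` plays the acid memory area (persistent in the
// real design) and `mirror_` the mirror area. Only the undo logic is shown.
// Assumes each byte is write-locked at most once per transaction, as a lock
// manager would treat a repeated getLock on a held region as a no-op.
class MirroredArea {
public:
    explicit MirroredArea(std::size_t bytes) : area_(bytes, 0), mirror_(bytes, 0) {}

    // Applications modify data directly through this pointer.
    unsigned char* data() { return area_.data(); }

    // getLock(WRITE): the only recovery overhead is one memcpy of the region
    // into the same offset of the mirror area.
    void writeLock(std::size_t offset, std::size_t len) {
        assert(offset <= area_.size() && len <= area_.size() - offset);
        std::memcpy(mirror_.data() + offset, area_.data() + offset, len);
        locked_.push_back({offset, len});
    }

    // Abort (or recovery after a crash): copy every write-locked region back.
    void abort() {
        for (const Span& r : locked_)
            std::memcpy(area_.data() + r.offset, mirror_.data() + r.offset, r.len);
        locked_.clear();
    }

    // Commit: writes to the area are already permanent, so just drop the undo copies.
    void commit() { locked_.clear(); }

private:
    struct Span { std::size_t offset, len; };
    std::vector<unsigned char> area_, mirror_;
    std::vector<Span> locked_;
};
```

Because commit does no copying at all, the cost of recoverability is paid entirely at write-lock time and grows with the size of the locked region rather than with the number of individual updates.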
Evaluation
• Overhead of acid memory
  • read lock: 35 usec (lock manager overhead)
  • write lock: 35 usec + 5.5 usec/KB (memcpy cost)
  • much less than methods that write a log to disk
• Ease of programming
  • the application only needs to acquire locks to become recoverable
  • it can manipulate the data directly and does not have to call a special function on every update
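Taking the two measured figures at face value, and assuming the memcpy term scales linearly with the locked region size s (in KB), which is an extrapolation rather than a reported measurement, the per-lock cost works out as:

```latex
\begin{aligned}
t_{\text{read}} &= 35\ \mu\text{s} \\
t_{\text{write}}(s) &= 35\ \mu\text{s} + 5.5\ \mu\text{s/KB} \times s \\
t_{\text{write}}(1\ \text{KB}) &= 35 + 5.5 = 40.5\ \mu\text{s} \\
t_{\text{write}}(4\ \text{KB}) &= 35 + 22 = 57\ \mu\text{s}
\end{aligned}
```

For small regions the fixed lock-manager cost dominates; the copy only becomes the larger term once a region exceeds roughly 35 / 5.5 ≈ 6.4 KB.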
Example: suppose I want to transfer $1M from A’s account to B’s.
With ACID memory:
    /* a points to A’s account */
    /* b points to B’s account */
    trans = new Transaction(transMgr);
    trans->getLock(a, WRITE);
    trans->getLock(b, WRITE);
    *a = *a - 1000000;
    *b = *b + 1000000;
    trans->commit();
Using logging (Update() creates the needed logs):
    BeginTransaction();
    getLock(A’s account, WRITE);
    getLock(B’s account, WRITE);
    read(A’s account, a);
    read(B’s account, b);
    a = a - 1000000;
    b = b + 1000000;
    Update(A’s account, a);
    Update(B’s account, b);
    commit();
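For comparison, the logging column can be made concrete with a hypothetical in-memory undo log: every `update()` first records the old value and then writes the new one, so rollback replays the records backwards. `UndoLog`, `update`, and `transfer` are illustrative names, not from the talk; the log lives in memory rather than on disk, locking is omitted, and the overdraft check exists only to exercise the abort path.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical undo log standing in for the "Using logging" column.
class UndoLog {
public:
    // Update(): log (address, old value), then apply the new value.
    void update(long* where, long newValue) {
        records_.emplace_back(where, *where);
        *where = newValue;
    }

    void commit() { records_.clear(); }

    // Undo in reverse order so the oldest logged value is restored last.
    void abort() {
        for (auto it = records_.rbegin(); it != records_.rend(); ++it)
            *it->first = it->second;
        records_.clear();
    }

private:
    std::vector<std::pair<long*, long>> records_;
};

// The transfer written against the log: one update() call per write.
bool transfer(UndoLog& log, long* from, long* to, long amount) {
    long a = *from;                 // read(A's account, a)
    long b = *to;                   // read(B's account, b)
    log.update(from, a - amount);   // Update(A's account, a)
    log.update(to, b + amount);     // Update(B's account, b)
    if (*from < 0) {                // illustrative rule: reject overdrafts
        log.abort();
        return false;
    }
    log.commit();
    return true;
}
```

The difference in programming effort is the number of library calls: here every integer write must go through `update()`, whereas the ACID-memory version takes one write lock per account and then uses plain stores.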
Performance comparison: acid memory vs. logging
• Consider a transaction updating integers in a 1KB data structure
  • acid memory: write-lock the data structure
  • logging: write-lock the structure and update each integer separately
• Logging each individual update is a bit faster, up to a point
• Acid memory gives acceptable performance with very easy programmability
[Figure: time (in microseconds) vs. number of integer writes, for acid memory and for logging]
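The shape of this comparison can be reasoned about symbolically. Assume, for illustration only, that both versions pay one 35 usec write lock, that acid memory additionally copies the 1 KB structure at the 5.5 usec/KB rate from the Evaluation slide, and that logging costs a fixed c_log usec per integer update (a symbol, not a measured value); the stores themselves are the same in both and cancel out:

```latex
\begin{aligned}
T_{\text{acid}}(n) &= 35 + 5.5 \times 1 = 40.5\ \mu\text{s} \quad (\text{independent of } n) \\
T_{\text{log}}(n) &= 35 + n\,c_{\text{log}} \\
T_{\text{log}}(n) < T_{\text{acid}}(n) &\iff n\,c_{\text{log}} < 5.5 \iff n < \frac{5.5}{c_{\text{log}}}
\end{aligned}
```

Under these assumptions only the memcpy competes with the log records, so logging wins for a small number of writes and acid memory wins beyond n = 5.5 / c_log, which matches the "faster, up to a point" observation.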
Replication for availability
[Figure: a transaction processing monitor replicating across several DBMS instances]
• Traditionally, availability has been handled in a separate layer, above recovery
• Can we handle both recovery and availability via the same mechanism?
Architecture
[Figure: a client and a transaction handler; the owner (with the lock manager) and several replicas, each holding a copy of the data]
• Transactions are run by a transaction handler
• All lock requests must go to the owner
• Data in all replicas must be kept in sync
• Balance load by partitioning data
  • a different owner for each partition
• Failure model
  • fail-stop: nodes never send incorrect messages to others
  • failed nodes never recover data after a crash
  • the network never fails
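The load-balancing rule (one owner per partition, every lock request routed to that owner) can be sketched with a hypothetical static placement: partition p is owned by node p mod N and copied onto the next R-1 nodes. The placement scheme and all names are assumptions for illustration; the talk does not say how partitions are assigned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical static placement of partitions onto nodes.
struct Placement {
    std::size_t numNodes;        // nodes in the cluster
    std::size_t replicas;        // copies of each partition, owner included
    std::size_t partitionBytes;  // size of one data partition

    // Which partition an offset in the acid memory area falls into.
    std::size_t partitionOf(std::size_t offset) const { return offset / partitionBytes; }

    // Owner of a partition: all lock requests for it are sent here.
    std::size_t ownerOf(std::size_t partition) const { return partition % numNodes; }

    // Owner first, then the other replicas on the following nodes.
    std::vector<std::size_t> replicasOf(std::size_t partition) const {
        assert(replicas <= numNodes);
        std::vector<std::size_t> nodes;
        for (std::size_t i = 0; i < replicas; ++i)
            nodes.push_back((ownerOf(partition) + i) % numNodes);
        return nodes;
    }
};
```

Because consecutive partitions get consecutive owners, lock-manager load spreads across all nodes instead of concentrating on one.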
[Figure: same architecture: client, transaction handler, owner with the lock manager, and replicas holding data]
• Reads: the client gets data from a random replica
• Writes: must update all replicas
  • on commit, the transaction sends the new data to the owner
  • the owner propagates the update atomically to all replicas
  • 3-phase non-blocking commit protocol: always ensure that someone can take over the propagation if the owner crashes
• If the owner crashes, fail over to a replica
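The commit path can be simulated in a single process to show why the third phase matters. The sketch below is a textbook three-phase commit (can-commit, pre-commit, do-commit) plus the standard termination rule under the failure model above (fail-stop nodes that never come back, reliable network): if the owner crashes, a surviving replica takes over and commits if any survivor reached pre-commit, otherwise it aborts. It is not the talk's actual protocol; direct calls replace messages, every replica votes yes, and `crashAfterPhase` is a hypothetical knob for injecting an owner crash.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class State { Initial, Ready, PreCommitted, Committed, Aborted };

// One node's copy of a data item; rs[0] is the owner.
struct Replica {
    long value = 0;               // committed copy of the data item
    std::optional<long> pending;  // new value shipped in phase 1
    State state = State::Initial;
    bool alive = true;
};

void commitLocal(Replica& r) {
    r.value = *r.pending;
    r.pending.reset();
    r.state = State::Committed;
}

void abortLocal(Replica& r) {
    r.pending.reset();
    r.state = State::Aborted;
}

// Termination rule run by the replica that takes over from a crashed owner:
// if any survivor reached pre-commit, the outcome was already fixed as commit;
// otherwise no node can have committed, so abort. (Full 3PC would first
// re-send pre-commit to Ready nodes; that step is collapsed here.)
bool takeOver(std::vector<Replica>& rs) {
    bool anyPreCommitted = false;
    for (const Replica& r : rs)
        if (r.alive && (r.state == State::PreCommitted || r.state == State::Committed))
            anyPreCommitted = true;
    for (Replica& r : rs) {
        if (!r.alive || r.state == State::Committed) continue;
        if (anyPreCommitted) commitLocal(r); else abortLocal(r);
    }
    return anyPreCommitted;
}

// Owner-driven propagation; crashAfterPhase is 0 (no crash), 1, or 2.
// Returns true if the update committed on the surviving replicas.
bool propagate(std::vector<Replica>& rs, long newValue, int crashAfterPhase) {
    assert(!rs.empty() && rs[0].alive);
    // Phase 1 (can-commit): ship the new data; every live replica votes yes.
    for (Replica& r : rs)
        if (r.alive) { r.pending = newValue; r.state = State::Ready; }
    if (crashAfterPhase == 1) { rs[0].alive = false; return takeOver(rs); }

    // Phase 2 (pre-commit): announce that the outcome will be commit.
    for (Replica& r : rs)
        if (r.alive) r.state = State::PreCommitted;
    if (crashAfterPhase == 2) { rs[0].alive = false; return takeOver(rs); }

    // Phase 3 (do-commit): install the new value on every live replica.
    for (Replica& r : rs)
        if (r.alive) commitLocal(r);
    return true;
}
```

After a crash in phase 1 the survivors abort; after phase 2 they finish the commit without waiting for the dead owner. In two-phase commit, a replica left in the Ready state could decide neither way on its own, which is the blocking case the extra phase avoids.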
Evaluation
• Very fast recovery: 424 usecs
• Get fast transactions without non-volatile memory
• Writes are slower
  • 4n messages at commit if there are n replicas
  • still, this is faster than logging to disk
• Homogeneous software: every replica runs the same code, so all are susceptible to the same bugs
Conclusions
• Acid memory is easier to use
• Performance relative to logging is not too bad
• Replication gives fast recovery
Future Work
• Using the cache for replication
• When/how much to replicate?
Evaluation, w.r.t. the logging-based approach
• Ease of implementation
  • very little to code, mostly lock manager stuff
  • whereas a traditional DBMS needs:
    • a specialized buffer manager
    • a log manager
    • a complex recovery mechanism
How to make the file cache persistent
• Rio (Chen et al., 1996)
  • place the file cache in non-volatile memory
  • protect it against OS crashes using VM protection
  • flush pages in the file cache to disk files on reboot
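Rio's protection idea can be illustrated at user level with POSIX `mmap` and `mprotect`: keep the cached pages read-only so a stray store faults instead of silently corrupting them, and open a write window only around intended updates. This is a user-space analogy, not Rio's in-kernel implementation; `ProtectedCache` and its methods are hypothetical, and the code assumes a POSIX system that supports `MAP_ANONYMOUS`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical user-level analogue of Rio's VM protection: pages stay
// read-only except inside write(), so a wild store elsewhere raises SIGSEGV
// instead of corrupting data that is meant to survive a crash.
class ProtectedCache {
public:
    explicit ProtectedCache(std::size_t pages)
        : page_(static_cast<std::size_t>(sysconf(_SC_PAGESIZE))), len_(pages * page_) {
        void* p = mmap(nullptr, len_, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        base_ = static_cast<unsigned char*>(p);
    }
    ~ProtectedCache() { munmap(base_, len_); }
    ProtectedCache(const ProtectedCache&) = delete;
    ProtectedCache& operator=(const ProtectedCache&) = delete;

    const unsigned char* data() const { return base_; }
    std::size_t size() const { return len_; }

    // Unprotect only the pages being written, copy, then re-protect them.
    bool write(std::size_t offset, const void* src, std::size_t n) {
        if (offset > len_ || n > len_ - offset) return false;
        if (n == 0) return true;
        // mprotect needs a page-aligned start, so widen the range to whole pages.
        std::size_t first = offset / page_ * page_;
        std::size_t end = (offset + n + page_ - 1) / page_ * page_;
        if (mprotect(base_ + first, end - first, PROT_READ | PROT_WRITE) != 0) return false;
        std::memcpy(base_ + offset, src, n);
        return mprotect(base_ + first, end - first, PROT_READ) == 0;
    }

private:
    std::size_t page_;  // system page size
    std::size_t len_;   // mapping length, a whole number of pages
    unsigned char* base_ = nullptr;
};
```

Unprotecting only the touched pages keeps the write window as small as possible, which is the point of the technique: an erroneous store from unrelated code hits a read-only page and traps.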