Transactional Flash. V. Prabhakaran, T. L. Rodeheffer, L. Zhou (MSR, Silicon Valley), OSDI 2008. Shimin Chen, Big Data Reading Group.
Introduction • SSDs expose the same block-level API as disks • A lost opportunity • Goal: new abstractions that better match the nature of the new medium as well as the needs of file systems and databases
Idea: Transactional Flash (TxFlash) • An SSD (with new features) • Addressing: a linear array of pages • Supports read and write operations • Supports a simple transactional construct • Each transaction consists of a series of write operations • Atomicity • Isolation • Durability
Why is this useful? • The transaction abstraction is required in many places: file system journals, etc. • Each application implements its own • Complexity • Redundant work • Reliability of the implementation • It would be great if the storage layer provided a transactional API
Previous Work: disk-based • Copy-on-Write + Logging • Fragmentation → poor read performance • Checkpointing and cleaning • Cleaning cost • SSDs mitigate these problems • SSDs already do CoW for flash-related reasons • Random read accesses are fast
Outline • Introduction • The Case for TxFlash • Commit Protocols • Implementation • Evaluation • Conclusion
TxFlash Architecture & API • [Figure: TxFlash architecture; labels: "Core of TxFlash", "In-progress transactions: conflicting writes are not issued"] • WriteAtomic(p1…pn): pages p1…pn are in one transaction; followed by write(p1)…write(pn); provides atomicity, isolation, durability • Abort: aborts an in-progress transaction
Simple Interface • WriteAtomic: multi-page writes • Useful for file systems • Not a full-fledged transaction: no reads within a transaction • Reduces complexity • Backward compatible
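To make the interface concrete, here is a minimal in-memory sketch of how WriteAtomic, write, and Abort might fit together. The class ToyTxFlash and its method names are hypothetical and are not taken from the paper; the sketch models only atomicity for a single open transaction (no flash layout, no concurrency, no durability).

```python
class ToyTxFlash:
    """Hypothetical in-memory stand-in for a TxFlash-like device (illustration only)."""

    def __init__(self):
        self.committed = {}   # page number -> data visible to reads
        self.pending = {}     # page number -> data staged by the open transaction
        self.expected = None  # pages declared by write_atomic, or None if no transaction

    def write_atomic(self, pages):
        """Declare that the listed pages will be written as one transaction."""
        if self.expected is not None:
            raise RuntimeError("a transaction is already in progress")
        if not pages:
            raise ValueError("a transaction needs at least one page")
        self.expected = set(pages)
        self.pending = {}

    def write(self, page, data):
        """Plain write outside a transaction; staged write inside one."""
        if self.expected is None:
            self.committed[page] = data
            return
        if page not in self.expected:
            raise ValueError("page was not declared in write_atomic")
        self.pending[page] = data
        # The transaction commits once every declared page has been written.
        if set(self.pending) == self.expected:
            self.committed.update(self.pending)
            self.expected, self.pending = None, {}

    def abort(self):
        """Drop the in-progress transaction; staged writes never become visible."""
        self.expected, self.pending = None, {}

    def read(self, page):
        return self.committed.get(page)
```

In this toy model, a reader sees either all pages of a transaction or none of them, which is the property a file system journal commit relies on.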
Flash is good for this purpose • Copy-on-write: already supported by the FTL • Fast random reads • High concurrency • Multiple flash chips inside • New device: • A new interface is more likely to be adopted
Outline • Introduction • The Case for TxFlash • Commit Protocols • Implementation • Evaluation • Conclusion
Traditional Commit • First write to a log: • Intention record: (data, page# & version#, transaction ID) • … • Intention record • Commit record • A transaction is committed if and only if its commit record exists • Intention records are then applied to the original data • Once the modifications are done, the records can be garbage collected
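The commit rule above can be sketched as a small recovery routine. This is an illustrative model only: the tuple layout of the log records and the function name recover are assumptions, not the format used by any real journal.

```python
def recover(log):
    """Rebuild page contents from a log of intention and commit records.

    Assumed (hypothetical) record layout:
      ("intent", txid, page, version, data)
      ("commit", txid)
    A transaction counts as committed only if its commit record is present.
    """
    committed_ids = {rec[1] for rec in log if rec[0] == "commit"}
    pages = {}  # page -> (version, data) of the newest committed intention
    for rec in log:
        if rec[0] == "intent" and rec[1] in committed_ids:
            _, _, page, version, data = rec
            if page not in pages or version > pages[page][0]:
                pages[page] = (version, data)
    return pages
```

Intentions that belong to a transaction without a commit record are simply ignored, which is why the commit record has to be written after all other writes and adds latency.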
Traditional Commit on SSDs • Optimizations: • All writes can be issued in parallel • Do not update the original data; just update the remap table • Problem: commit record • Extra latency after the other writes • Garbage collection is complicated: • Must know whether all the updates have completed
New Proposal (1): Simple Cyclic Commit • No commit record • Intention records of the same transaction use next-links to form a cycle • (data, page# & version#, next page# & version#) • A transaction is committed if and only if all of its intention records are written • Flash page (4KB) + metadata (128B) are co-located
Solution: • Any uncommitted intention on stable storage must be erased before any new writes are issued to the same page or to a page it references
Operations • Initialization: • Set version# to 0 and the next-link to self • Transaction • Garbage Collection: • For any uncommitted intention • For a committed page if a newer version is committed • Recovery: scan all pages, then look for cycles
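The "scan all pages, then look for cycles" step can be sketched as follows. This is a simplified illustration, not the paper's recovery algorithm: it assumes every record of a committed transaction is still on storage (no garbage-collected versions), and the dictionary layout and the name committed_records are hypothetical.

```python
def committed_records(stored):
    """Return the intention records that lie on a complete next-link cycle.

    `stored` maps (page, version) -> (next_page, next_version) for every
    intention record found while scanning the device (assumed layout).
    """
    committed = set()
    for start in stored:
        seen = []
        cur = start
        # Follow next-links until a record is missing or a record repeats.
        while cur in stored and cur not in seen:
            seen.append(cur)
            cur = stored[cur]
        # The walk closed back on its starting record: the cycle is complete.
        if cur == start:
            committed.update(seen)
    return committed
```

A freshly initialized page (version 0, next-link to itself) forms a one-record cycle, so it counts as committed under this check.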
New Proposal (2): Back Pointer Cyclic Commit • Another way to deal with the ambiguity • Intention record: • (data, page# & version#, next-link, link to last committed version)
A3 is a straddler of A2 • Straddlers add some complexity to garbage collection and recovery
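The slide does not define a straddler, so the sketch below works under an assumption: a stored intention of the same page straddles a target version when its own version is higher and its back pointer (last committed version) is lower than the target's version, which would mean the target was never committed. The Intention tuple and has_straddler are hypothetical names used only for illustration.

```python
from collections import namedtuple

# back = version number of the last committed version of the same page
Intention = namedtuple("Intention", "page version back")

def has_straddler(target, stored):
    """True if some stored intention of the same page skips over `target`.

    Assumed definition: `other` straddles `target` when
    other.back < target.version < other.version.
    """
    return any(
        other.page == target.page
        and other.back < target.version < other.version
        for other in stored
    )
```

With A1 (back 0), A2 (back 1), and A3 (back 1) on storage, A3 points back past A2, so A2 has a straddler under this definition and A1 does not.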
Outline • Introduction • The Case for TxFlash • Commit Protocols • Implementation • Evaluation • Conclusion
Implementation • Simulator • DiskSim trace-driven SSD simulator (USENIX’08) + modifications for TxFlash • Supports transactions of maximum size 4MB • Pseudo-device driver for recording traces • TxExt3: • Employs TxFlash for the Ext3 file system • Transaction: Ext3 journal commit
Experimental Setup • TxFlash device: • 32GB: 8x 4GB flash packages • 4 I/O operations within every flash package • 15% of space reserved for garbage collection • Workload on top of Ext3: • IOzone: micro benchmark (no sync writes) • Linux-build (no sync writes) • Maildir (sync writes) • TPC-B: simulate 10,000 credit-debit-like operations on TxExt3 file system (sync writes) • Synthetic workloads
Unlike database logging, transaction sizes are large: there are no sync writes, and data pages are included in the transaction
TxFlash vs. SSD • Remove WriteAtomic from traces • Use SSD simulator • SSD does not provide any transaction guarantees (so should have better performance)
Space comparison: TxFlash needs 25% more main memory than the SSD • 4+1 MB per 4GB flash package → (4 MB + 1 MB) × 8 packages = 40 MB for the 32GB TxFlash device
End-to-end performance • TxFlash: • Run pseudo-device driver on real SSD • The performance is close to that of TxFlash • Ext3: • Use SSD as journal • SSD cache is disabled in both cases
Summary • TxFlash: • Adds a transactional interface to the SSD • Cyclic commit protocols • A nice solution for file system journaling