290 likes | 408 Views
SIGMOD’07. Design of Flash-Based DBMS: An In-Page Logging Approach. Sang-Won Lee School of Info & Comm Eng Sungkyunkwan University Suwon, Korea 440-746 wonlee@ece.skku.ac.kr. Bongki Moon Department of Computer Science University of Arizona Tucson, AZ 85721, U.S.A. bkmoon@cs.arizona.edu.
E N D
SIGMOD’07 Design of Flash-Based DBMS: An In-Page Logging Approach Sang-Won Lee School of Info & Comm Eng Sungkyunkwan University Suwon, Korea 440-746 wonlee@ece.skku.ac.kr Bongki Moon Department of Computer Science University of Arizona Tucson, AZ 85721, U.S.A. bkmoon@cs.arizona.edu COMPUTER SCIENCE DEPARTMENT
Outline Flash memory Disk-Based DBMS on Flash Memory Flash-Based DBMS: In-Paging Logging approach Reviews COMPUTER SCIENCE DEPARTMENT
Flash Memory Flash memory is a type of electrically-erasable programmable read-only memory (EEPROM) Page is the unit of read and write operations Typical value: 2KB Write operation can only clear bits (change their value from 1 to 0). The only way to change value from 0 to 1 is erase an entire region memory. This region has fixed-size, called erase units, erase block or just block. Typical value: 128KB for large flash memory COMPUTER SCIENCE DEPARTMENT
Characteristics of Flash No in-place update the data item need to be erased first before writing it again. An erase unit (16KB or 128 KB) is much larger than a sector (512 bytes). No mechanical latency Flash memory is an electronic device without moving parts Provides uniform random access speed without seek/rotational latency Asymmetric read & write speed Read speed is typically at least twice faster than write speed Write (and erase) optimization is critical COMPUTER SCIENCE DEPARTMENT
Magnetic Disk vs Flash Memory Magnetic Disk : Seagate Barracuda 7200.7 ST380011A NAND Flash : Samsung K9WAG08U1A 16 Gbits SLC NAND Unit of read/write: 2KB, Unit of erase: 128KB COMPUTER SCIENCE DEPARTMENT
Category of Flash Memory NAND Vs. NOR Flash NOR: high erase cost (several seconds), directly addressable (access is by bit or byte) NAND: relative low erase cost (several ms), access is by pages MLC Vs. SLC NAND Flash MLC (Multiple Level Cell): it stores multiple bits per cell, but significantly slower read and write speeds; 10x lower read/write lifetime SLC (Single Level Cell): it stores only one single bit per cell SLC flash has much better performance, lifetime, and reliability properties than MLC COMPUTER SCIENCE DEPARTMENT
Small flash Vs Large flash Small flash memory was widely used for PDA, MP3, mobile phone, sensor network… Advantages: size, weight, shock resistance, power consumption, noise … Typical size: a few gigabytes Recently, some vendors developed large flash memory called Flash SSD (Solid State Disk) Mainly used for notebook PC. Apple AirBook / Thinkpad X300 Typical size: > 16G COMPUTER SCIENCE DEPARTMENT
Different FTLs for Large and Small Flash Memory Page-mapping FTL (used for small flash memory) maintains the mapping information between the logical page and the physical page separately Log-structured achitecture Large memory for its mapping information must be reconstructed by scanning the whole flash memory at start-up, and this may result in long mount time Block-mapping FTL Small memory for its mapping information Any update causes a whole block rewrite (that is why random writes are so slow!) In real production, there are some optimizations for improving concentrated updates COMPUTER SCIENCE DEPARTMENT
Flash memory for server application More recently, because of the advantages of flash memory and the increasing capacity, there is a new trend that use large flash memory for database server application Jim Gray said: Tape is Dead,Disk is Tape,Flash is Disk! COMPUTER SCIENCE DEPARTMENT
Outline Flash memory Disk-Based DBMS on Flash Memory Flash-Based DBMS: In-Paging Logging approach My reviews COMPUTER SCIENCE DEPARTMENT
Disk-Based DBMS on Flash Memory What happens if disk-based DBMS runs on Flash memory? Due to No In-place Update, it writes the whole block into another clean block Consume free blocks quickly causing frequent garbage collection and erase SQL: Update / Insert / Delete Update Buffer Mgr. Page : 4KB Erase Unit: 128KB Dirty Block Write Flash Memory Data Block Area COMPUTER SCIENCE DEPARTMENT
Disk-Based DBMS Performance Run SQL queries on a commercial DBMS Sequential scan or update of a table Non-sequential read or update of a table (via B-tree index) Experimental settings Storage: Magnetic disk vs M-Tron SSD (Samsung flash chip) Data page of 8KB 10 tuples per page, 640,000 tuples in a table (64,000 pages, 512MB) COMPUTER SCIENCE DEPARTMENT
Disk-Based DBMS Performance Read performance : The result is not surprising at all • Hard disk • Read performance is poor for non-sequential accesses, mainly because of seek and rotational latency • Flash memory • Read performance is insensitive to access patterns COMPUTER SCIENCE DEPARTMENT
Disk-Based DBMS Performance Write performance • Hard disk • Write performance is poor for non-sequential accesses, mainly because of seek and rotational latency • Flash memory • Write performance is poor (worse than disk) for non-sequential accesses due to out-of-place update and erase operations • Demonstrate the need of write optimization for DBMS running on Flash COMPUTER SCIENCE DEPARTMENT
Outline Flash memory Disk-Based DBMS on Flash Memory Flash-Based DBMS: In-Paging Logging approach My reviews COMPUTER SCIENCE DEPARTMENT
In-Page Logging (IPL) Approach Design Principles Take advantage of the characteristics of flash memory Fast read speed Overcome the “erase-before-write” limitation of flash memory Minimize the changes to the DBMS architecture Limited to buffer manager and storage manager COMPUTER SCIENCE DEPARTMENT
Design of the IPL Logging on Per-Page basis in both Memory and Flash • An In-memory log sector can be associated with a buffer frame in memory • Allocated on demand when a page becomes dirty • An In-flash log segment is allocated in each erase unit update-in-place Database Buffer in-memory data page (8KB) in-memory log sector (512B) Flash Memory 15 data pages (8KB each) Erase unit: 128KB log area (8KB): 16 sectors …. …. The log area is shared by all the data pages in an erase unit COMPUTER SCIENCE DEPARTMENT
IPL Write • Whenever an update is performed on a data page, the in-memory copy of data page is updated immediately. In addition, IPL buffer manager adds a log record to the in-memory log sector • When a dirty page is evicted by replacement policy or the in-memory log sector is full, • the content of data page is not written to flash memory. • Instead, In-memory log sector is written to the in-flash log segment update-in-place Update / Insert / Delete Sector : 512B physiological log Page : 8KB Buffer Mgr. Block : 128KB Flash Memory Data Block Area COMPUTER SCIENCE DEPARTMENT
IPL Read When a page is read from flash, the current version is computed on the fly Pi Apply the “physiological action” to the copy read from Flash (CPU overhead) Buffer Mgr. Re-construct the current in-memory copy Read from Flash • Original copy of Pi • All log records belonging to Pi(IO overhead) data area (120KB): 15 pages Flash Memory log area (8KB): 16 sectors COMPUTER SCIENCE DEPARTMENT
IPL Merge When all free log sectors in an erase unit are consumed Log records are applied to the corresponding data pages The current data pages are copied into a new erase unit Merge 15 up-to-date data pages A Physical Flash Block log area (8KB): 16 sectors clean log area Bold Bnew COMPUTER SCIENCE DEPARTMENT
Why IPL can improve write performance of DBMS? The number of disk writes doesn’t decrease Actually, #writes may increase because: It introduces excess disk writes if the log sector is full The merge operation introduces overhead Then why can IPL improve write performance? IPL overcomes the erase-before-write property of flash Reduces the number of erasures COMPUTER SCIENCE DEPARTMENT
IPL Simulation with TPC-C TPC-C Log Data Generation Run a commercial DBMS to generate reference streams of TPC-C benchmark HammerOra utility used for TPC-C workload generation Each trace contains log records of physiological updates as well as physical page writes Average length of a log record: 20 ~ 50B TPC-C Traces 100M.20M.10u: 100MB DB, 20 MB buffer, 10 simulated users 1G.20M.100u: 1GB DB, 20 MB buffer, 100 simulated users 1G.40M.100u: 1GB DB, 40 MB buffer, 100 simulated users Parameter setting Write (2KB): 200 us Merge (128KB): 20 ms COMPUTER SCIENCE DEPARTMENT
Log Segment Size vs Merges TPC-C Write frequencies are highly skewed (and low temporal locality) Erase units containing hot pages consume log sectors quickly Could cause a large number of erase operations More storage but less frequent merges with more log sectors COMPUTER SCIENCE DEPARTMENT
Estimated Write Performance Performance trend with varying buffer sizes The size of log segment was fixed at 8KB Estimated write time With IPL = (# of sector writes)× 200us + (# of merges) × 20ms Without IPL = × (# of page writes) × 20ms is the probability that a page write causes erase operation COMPUTER SCIENCE DEPARTMENT
Support for Recovery IPL helps realize a lean recovery mechanism Additional logging: transaction log and list of dirty pages Transaction Commit Similarly to flushing log tail An in-memory log sector is forced out to flash if it contains at least one log record of a committing transaction No explicit REDO action required at system restart Transaction Abort De-apply the log records of an aborting transaction Use selective merge instead of regular merge, because it’s irreversible If committed, merge the log record If aborted, discard the log record If active, carry over the log record to a new erase unit To avoid a thrashing behavior, allow an erase unit to have overflow log sectors No explicit UNDO action required COMPUTER SCIENCE DEPARTMENT
Conclusion Clear and present evidence that Flash can replace Disk IPL approach demonstrates its potential for TPC-C type database applications by Overcoming the “erase-before-write” limitation Exploiting the fast and uniform random access IPL also helps realize a lean recovery mechanism COMPUTER SCIENCE DEPARTMENT
Outline Flash memory Disk-Based DBMS on Flash Memory Flash-Based DBMS: In-Paging Logging approach Reviews COMPUTER SCIENCE DEPARTMENT
Reviews IPL hurts read performance For each read operation, it has to read data page and log sector page Read performance will be about 2X slower No Experiment Result The authors only give the result through the I/O access simulation Simulation The data size of simulation is too small (1G). Didn’t show the overall performance of TPC-C. (most operations in TPC-C are read operations) COMPUTER SCIENCE DEPARTMENT
Any Questions? Q & A COMPUTER SCIENCE DEPARTMENT