Hash in a Flash: Hash Tables for Solid State Devices

Hash in a Flash: Hash Tables for Solid State Devices • S M Faisal*, Shirish Tatikonda‡, Tyler Clemons*, Charu Aggarwal†, Srinivasan Parthasarathy* • *The Ohio State University, Columbus, Ohio • ‡IBM Almaden Research Center, San Jose, California • †IBM T.J. Watson Center, Yorktown Heights, New York

Motivation and Introduction • Data is growing at a fast pace • Scientific data, Twitter, Facebook, Wikipedia, WWW • Traditional data mining and IR algorithms require random out-of-core data access • Data is often too large to fit in memory, so frequent random disk access is expected

Motivation and Introduction (2) • Traditional hard disk drives can keep pace with storage requirements but NOT with random access workloads • Moving parts impose physical limitations • They also contribute to rising energy consumption • Flash devices have emerged as an alternative • Lack moving parts • Faster random access • Lower energy usage • But they have several drawbacks…

Flash Devices • Limited lifetime • Support only a limited number of rewrites • Rewrites require erasures, also known as cleans • Erasures also impact response time • Erasures are incurred at the block level • Blocks consist of pages; a page (4 KB to 8 KB) is the smallest I/O unit • Poor random write performance • Incurs many erasures and lowers lifetime • Efficient sequential write performance • Lowers erasures and increases lifetime

On Flash Devices, DM, and IR • Flash Devices provide fast random read access • Common for many IR and DM algorithms and data structures • Hash Tables are common in both DM and IR • Useful for associating keys and values • Counting Hash Tables associate keys with a frequency • This is found in many algorithms that track word frequency • We will examine one such algorithm common in both DM and IR (TF-IDF) • They exhibit random access for writes and reads • Random Writes are an issue for Flash Devices
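As a minimal in-memory illustration of the counting hash table mentioned above, the sketch below tallies term frequencies. The function name `count_terms` and the sample tokens are hypothetical and not part of the original work; the point is only that every update is a read-modify-write at an effectively random table location.

```python
from collections import Counter

def count_terms(tokens):
    """Return a counting hash table mapping each term to its frequency."""
    counts = Counter()
    for tok in tokens:
        # Each update reads and rewrites one (effectively random) table slot.
        counts[tok] += 1
    return counts

counts = count_terms("flash hash flash table hash flash".split())
```

When such a table lives on flash instead of RAM, each of these increments would become a random write, which is the access pattern the designs in the following slides try to avoid.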

Hash Tables for Flash Devices must: • Reduce erasures/cleans and reduce random writes to the SSD • Batch updates • Maintain reasonable query times • The data structure must not incur unreasonable disk overhead • Nor should it impose unreasonable memory requirements

Our approach • Our approach makes two key contributions: • We optimize our designs for a counting hash table • This has not been done by previous approaches • (A. Anand ’10), (D. Andersen ’09), (B. Debnath ’10), (D. Zeinalipour-Yazti ’05) • The primary hash table resides on the flash device • Many designs use the SSD as a cache for the HDD • (D. Andersen ’09), (B. Debnath ’10) • We anticipate data sets with high random access and throughput requirements

Hash Tables for Flash Devices must: • Reduce erasures/cleans and reduce random writes to the SSD • Batch updates • Create an in-memory structure • Target semi-random updates or block-level updates • Maintain reasonable query times • The data structure must not incur unreasonable disk overhead • Carefully index keys on disk • Nor should it impose unreasonable memory requirements • The memory requirement is bounded by a fixed parameter

Memory Bounded (MB) Buffering • Figure: example (key, value) updates (64,2) and (12,7) • Updates are quickly combined in memory • Updates are hashed into a bucket in RAM • When the buffer is full, updates are batched to the corresponding disk buckets • If the disk buckets are full, the overflow region is invoked

Memory Bounded (MB) Buffering • Two-way hash • On-disk closed hash table • Hash at page level • Update at block level • Linear probing for collisions • In-memory open hash table • Hash at block level • Combine updates • Flush with merge() operation • Overflow segment • Holds the excess of the closed hash table
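The buffering idea above can be sketched as a toy model. This is an illustration under stated assumptions, not the authors' implementation: the class name `MBTable`, the parameters `num_blocks` and `buffer_capacity`, integer keys, and the mod-based block mapping are all hypothetical, and page-level hashing, linear probing, and the overflow segment are deliberately left out.

```python
from collections import defaultdict

class MBTable:
    """Toy model of Memory Bounded buffering (names and sizes are illustrative)."""

    def __init__(self, num_blocks=4, buffer_capacity=3):
        self.num_blocks = num_blocks
        self.buffer_capacity = buffer_capacity       # max distinct keys held in RAM
        self.buffer = defaultdict(dict)              # block id -> {key: pending delta}
        self.buffered_keys = 0
        self.disk = [dict() for _ in range(num_blocks)]  # stands in for on-disk blocks
        self.block_writes = 0                        # one per block touched by a flush

    def _block_of(self, key):
        return key % self.num_blocks                 # assumes integer keys

    def update(self, key, delta=1):
        b = self._block_of(key)
        if key not in self.buffer[b] and self.buffered_keys == self.buffer_capacity:
            self.flush()                             # buffer full: batch to disk first
        bucket = self.buffer[b]
        if key not in bucket:
            self.buffered_keys += 1
        bucket[key] = bucket.get(key, 0) + delta     # combine updates in memory

    def flush(self):
        for b, bucket in self.buffer.items():
            if not bucket:
                continue                             # nothing pending for this block
            block = self.disk[b]
            for key, delta in bucket.items():
                block[key] = block.get(key, 0) + delta
            self.block_writes += 1                   # one batched write per dirty block
        self.buffer.clear()
        self.buffered_keys = 0

    def query(self, key):
        b = self._block_of(key)
        return self.disk[b].get(key, 0) + self.buffer.get(b, {}).get(key, 0)

t = MBTable(num_blocks=4, buffer_capacity=3)
for k in [64, 12, 64, 64, 12, 7, 5]:
    t.update(k)
```

In this run seven updates reach the simulated disk as two block writes, because repeated keys are combined in RAM and keys that map to the same block are written together.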

Can we improve MB? • Reduces number of write operations to flash device • Batch Updates only when memory buffer is full • Updates are semi-random • (Key,Value) changes are maintained in memory • Query times are reasonable • Memory buffer search is fast • Relatively fast SSD random access and linear probing (See Paper) • Prefetch pages • MB has disadvantages • Sequential Page Level operations are preferred • Fewer block updates • Limited by the amount of available memory • Think large disk datasets. • Updates may be numerous

Introduce an On Disk Buffer • Batch updates from memory to disk are page level • Reduce expensive block level writes (time and cleans) • Increase Sequential writes • Increase buffering capability • Reduce expensive non semi-random Block Updates • May decrease cleans • Search space increases during queries • Incurred only if inserting and reading concurrently • However, less erasure time will decrease latency

On Disk Buffering • Change Segment (CS) • Sequential Log Structure • sequential writes • stage() operation • Flushes memory to CS • Fast Page Level Operations • merge() operation • Invoked when CS is full • Combines CS with Data Segment • Less frequent than stage() • What is the structure of the CS?
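A rough sketch of how stage() and merge() might interact is shown below. It is only a model under assumptions: the class name `StagedTable`, the capacities, and the choice to run merge() at the next stage() once the CS is full are hypothetical, and pages and the data segment are represented by Python dicts.

```python
class StagedTable:
    """Toy model: RAM buffer -> change segment (CS) log -> data segment."""

    def __init__(self, ram_capacity=2, cs_capacity_pages=2):
        self.ram = {}                      # key -> pending delta
        self.ram_capacity = ram_capacity
        self.cs = []                       # sequential log of pages (each a dict)
        self.cs_capacity_pages = cs_capacity_pages
        self.data = {}                     # stands in for the on-disk data segment
        self.page_writes = 0               # sequential page writes done by stage()
        self.merges = 0                    # block-level merge() invocations

    def update(self, key, delta=1):
        if key not in self.ram and len(self.ram) == self.ram_capacity:
            self.stage()
        self.ram[key] = self.ram.get(key, 0) + delta

    def stage(self):
        """Flush the RAM buffer to the CS as one sequentially appended page."""
        if len(self.cs) == self.cs_capacity_pages:
            self.merge()                   # CS is full: fold it into the data segment
        self.cs.append(dict(self.ram))
        self.page_writes += 1
        self.ram.clear()

    def merge(self):
        """Combine every CS page with the data segment, then empty the CS."""
        for page in self.cs:
            for key, delta in page.items():
                self.data[key] = self.data.get(key, 0) + delta
        self.cs.clear()
        self.merges += 1

    def query(self, key):
        # A query must consult the data segment, the CS, and the RAM buffer.
        total = self.data.get(key, 0) + self.ram.get(key, 0)
        return total + sum(page.get(key, 0) for page in self.cs)

t = StagedTable(ram_capacity=2, cs_capacity_pages=2)
for k in [1, 2, 1, 3, 4, 5, 1, 6, 7]:
    t.update(k)
```

Here three stage() calls trigger only one merge(), which mirrors the slide's point that merge() is less frequent than stage(); the query method also shows why the search space grows once a CS is added.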

Change Segment Structure v1 Buckets are assigned specific Change Segment Buckets. Change Segment Buckets are shared by multiple RAM buffer buckets.

Memory Disk Bounded Buffer (MDB) • Associate a CS block to k data blocks • Semi random writes • Only merge() full CS blocks • Frequently updated blocks may incur numerous (k-1) merge() operations • Query times incur an additional block read • Packed with unwanted data
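One way to picture the association of a CS block with k data blocks is a contiguous grouping, sketched below. The grouping rule (integer division by k) and both function names are assumptions made for illustration; the slides do not specify how data blocks are assigned to CS blocks.

```python
def cs_block_for(data_block, k):
    """Map a data block to its shared CS block, assuming contiguous groups of k."""
    return data_block // k

def data_blocks_for(cs_block, k, num_data_blocks):
    """List the data blocks that share the given CS block under the same assumption."""
    start = cs_block * k
    return list(range(start, min(start + k, num_data_blocks)))

shared = cs_block_for(5, 4)
group = data_blocks_for(shared, 4, 6)
```

Under this mapping a query for a key in data block 5 reads that block plus CS block 1, and the CS block may also hold updates for data block 4, which is the "unwanted data" cost noted on the slide.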

Change Segment Structure v2 As buckets are flushed, they are written sequentially to the change segment one page at a time

MDB-L • No Partitions in CS • Allows frequently updated blocks to have maximum space • merge() all blocks when CS is full • Potentially expensive • Very infrequent • Queries are supported by pointers • As blocks are staged onto the CS, their pages are recorded for later retrieval • Prefetch
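The pointer mechanism described above might look like the following sketch: an unpartitioned log of pages plus an in-memory index that records where each block's pages landed. The class name `LogChangeSegment`, its methods, and the use of dicts as pages are hypothetical stand-ins, not the authors' data layout.

```python
from collections import defaultdict

class LogChangeSegment:
    """Toy unpartitioned CS: a sequential page log plus per-block page pointers."""

    def __init__(self):
        self.pages = []                      # log entries: (block id, {key: delta})
        self.page_index = defaultdict(list)  # block id -> positions of its pages

    def stage(self, block_id, updates):
        """Append one page for block_id and record its position for later retrieval."""
        self.page_index[block_id].append(len(self.pages))
        self.pages.append((block_id, dict(updates)))

    def pending(self, block_id, key):
        """Sum pending deltas for key, reading only the pages of its block."""
        total = 0
        for pos in self.page_index.get(block_id, []):
            total += self.pages[pos][1].get(key, 0)
        return total

cs = LogChangeSegment()
cs.stage(0, {64: 2})
cs.stage(3, {7: 1})
cs.stage(0, {64: 1, 12: 7})
```

Because the index lists pages 0 and 2 for block 0, a query for key 64 skips the page belonging to block 3 instead of scanning the whole log.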

Expectations • MB will incur more cleans than MDB or MDBL • Frequent merge() operations will incur block erasures • MDB and MDBL will incur slightly higher query times • Due to the addition of the CS • MDB and MDBL will have superior I/O performance • Most operations are page level • Fewer erasures → lower latency

Experimental Setup (Application) • TF-IDF • Term Frequency-Inverse Document Frequency • Word importance is highest for infrequent words • Requires a counting hash table • Useful in many data mining and IR applications (document classification and search)
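For readers unfamiliar with TF-IDF, the sketch below shows where the counting hash tables come in. It uses one common variant, tf(t, d) · log(N / df(t)) with term frequency normalized by document length; the slides do not say which variant was used, so this weighting, the function name `tf_idf`, and the sample documents are assumptions.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    df = Counter()                       # counting hash table: term -> document frequency
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)                # counting hash table: term -> count in this doc
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores

docs = [["flash", "hash", "flash"], ["hash", "table"]]
scores = tf_idf(docs)
```

A term that appears in every document ("hash" here) scores zero, while terms confined to one document score higher, matching the slide's statement that importance is highest for infrequent words.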

Experimental Setup (DataSets) • 100,000 Random Wikipedia articles • 136M keywords • 9.7M entries • MemeTracker (Aug 2009 dump) • 402M total entries • 17M unique

Experimental Setup (Method) • 1M random queries were issued during the insertion phase • 10 random workloads; queried keys need not be in the table • Measure query performance, I/O time, and cleans • Used three SSD configurations • One Single Level Cell (SLC) vs. two Multi Level Cell (MLC) configurations • MLC is more popular: cheaper per GB but with a shorter lifetime • SLC has a lower internal error rate and faster response times (see paper for specific configurations) • DiskSim and the Microsoft SSD plugin • Used for benchmarking and fine-tuning our SSD

Results (AVERAGE Query Time) Varying the in-memory buffer, as a percentage of the data segment, reduces the average query time by only fractions of a second. This suggests that the majority of the query time is incurred by the disk.

Results (AVERAGE Query Time) Varying the on-disk buffer, as a percentage of the data segment, decreases the average query time substantially for MDBL. This reduction is seen in both datasets. MDB requires block reads in the CS.

Results (AVERAGE Query Time) Using the Wiki dataset, we compared SLC with MLC. Performance was consistent across the configurations.

Results (AVERAGE I/O) In this experiment, we set the in-memory buffer to 5% and the CS to 12.5% of the primary hash table size. Simulation time is highest for MB because of the block erasures (next slide). MDBL is faster than MDB because of the increased page-level operations.

Results (Cleans/Erasures) Cleans are extremely low for both MDB and MDBL relative to MB. This is caused by the page-level sequential operations. Queries are affected by cleans because the SSD must allocate resources to cleaning and moving data.

Discussion and Conclusion • Flash devices are gaining popularity • Low latency, high random read performance, low energy • Limited lifetime, poor random write performance • Hash tables are useful data structures in many data mining and IR algorithms • They exhibit random write patterns • Challenging for flash devices • We have demonstrated that a proper hash table for flash devices will have • An in-memory buffer for batched memory → disk updates • An on-disk data buffer with page-level operations

Future work • Our current designs rely on hash functions that use the mod operator • Extendible hashing • Checkpoint methods for crash recovery • Evaluate on a real SSD • DiskSim is great for fine-tuning and examining statistics

Questions?
