Data Indexing for Stateful, Large-scale Data Processing
Dionysios Logothetis, Kenneth Yocum
University of California, San Diego
Processing large-scale data today
MapReduce (Dryad/Hadoop)
• Scalable, fault-tolerant bulk-data processing
• Groupwise processing: embarrassingly parallel workloads
• General: supports relational queries too, e.g. joining two datasets
Parallel DBs
• 20 years of work
• Fast and efficient for joins [Pavlo et al., SIGMOD ’09]
Really two philosophies
• DBs: structured data that is preloaded, which allows indexing
• MapReduce: grab the data, process it, use the result; indexing may be wasteful
The case for stateful bulk processing
Incremental processing
• On bulk data that arrives continuously, e.g. web crawls
• The state of the art is to recompute from scratch when the data changes
• Grossly inefficient
Key idea: incorporate state into bulk processing
Challenge: efficient stateful groupwise processing
• Incorporate state into the programming model
• Efficient architecture, with fast access to state
Bulk-incremental Processing Systems (BIPS)
• Supports stateful computations
• User-defined function G(∙)
• Multiple input and output flows
• Access to persistent state, modeled as a loopback flow
• Input and state grouped by key; G(∙) called for every key
[Figure: a single processing stage G(key, F_state, ΔF_1, ΔF_2) with input flows F_in1 and F_in2, a loopback state flow F_state, and output flows F_out1 and F_out2; input and state records are keyed (k, v) pairs.]
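To make the model concrete, here is a minimal sketch of a single stage in Python, assuming flows are lists of (key, value) records and state is a dict; the name `bips_stage` and this representation are illustrative assumptions, not the system's actual API.

```python
# Minimal sketch of a BIPS-style stateful groupwise stage (illustrative only).
from collections import defaultdict

def bips_stage(G, state, *input_flows):
    """Group input records and state by key, then call G once per key.

    G(key, state_value, *deltas) -> (new_state_value, output_records).
    Outer-grouping: G runs for every key in the state or in any input.
    """
    grouped = [defaultdict(list) for _ in input_flows]
    for flow, groups in zip(input_flows, grouped):
        for k, v in flow:
            groups[k].append(v)

    keys = set(state) | {k for g in grouped for k in g}
    new_state, outputs = {}, []
    for k in keys:
        deltas = [g.get(k, []) for g in grouped]
        s, out = G(k, state.get(k), *deltas)
        new_state[k] = s
        outputs.extend(out)
    return new_state, outputs
```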
Stateful groupwise processing
[Figure: incremental count example. Round 1: the input flow carries keyed records and the state is empty, so G(green, Ø, Δ), G(blue, Ø, Δ), and G(red, Ø, Δ) emit counts of 2, 2, and 1, which flow back as state. Round 2: a new batch arrives with records only for blue; G runs again with the saved counts as state, and blue's count is updated from 2 to 3.]
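The counting example, run through the `bips_stage` sketch above (the keys and counts mirror the slide; the record values are irrelevant to a count):

```python
def count(key, prev, delta):
    # New count = previous count for this key (if any) plus new records.
    c = (prev or 0) + len(delta)
    return c, [(key, c)]

state, _ = bips_stage(count, {}, [("green", 1), ("green", 1),
                                  ("blue", 1), ("blue", 1), ("red", 1)])
# state == {"green": 2, "blue": 2, "red": 1}

state, _ = bips_stage(count, state, [("blue", 1)])
# Outer-grouping still calls G("green", 2, []) and G("red", 1, []).
# state == {"green": 2, "blue": 3, "red": 1}
```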
Inner-grouping with state
• Current models support only outer-grouping
• Read the whole state, call G(∙) for every key
• The BIPS model also allows inner-grouping
• Call G(∙) only if there is a matching input key
• Use the input to select which state to update
[Figure: with input only for blue, inner-grouping calls G(blue, 2, Δ) alone, updating blue's count from 2 to 3; the green and red state entries are untouched.]
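A sketch of the inner-grouping variant, again with assumed names: the stage iterates over the input keys only and fetches just the matching state entries, so with an indexed store the untouched state is never read.

```python
from collections import defaultdict

def bips_stage_inner(G, state, input_flow):
    """Inner-grouping: invoke G only for keys present in the input."""
    grouped = defaultdict(list)
    for k, v in input_flow:
        grouped[k].append(v)

    outputs = []
    for k, delta in grouped.items():
        s, out = G(k, state.get(k), delta)  # point lookup, not a full scan
        state[k] = s
        outputs.extend(out)
    return state, outputs

# With the second batch above, only G("blue", 2, [1]) runs;
# the green and red entries are neither read nor rewritten.
```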
Storing state in tables
• Storing state in a file (HDFS/GFS) forces the stage to read the whole state
• Maintaining indexed state allows selective access based on the input, avoiding unnecessary data transfers
• Bigtable [Chang et al., OSDI ’06]: stages store state in a table indexed by the state key
Randomly reading part of the state must be faster than sequentially reading the whole state.
[Figure: the same counts stored as a flat file, which must be scanned in full, versus an indexed table, which supports per-key reads.]
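The two access patterns, sketched side by side. The tab-separated file layout and the `table.get(key)` interface are hypothetical stand-ins (for an HDFS file and a Bigtable/Hypertable client, respectively); the point is what each option forces the stage to read.

```python
def load_state_file(path):
    """File-backed state (HDFS/GFS style): the stage must scan all N records."""
    state = {}
    with open(path) as f:
        for line in f:                        # one sequential pass over everything
            k, v = line.rstrip("\n").split("\t")
            state[k] = int(v)
    return state

def load_state_table(table, input_keys):
    """Table-backed state: read only the h*N entries the input touches."""
    return {k: table.get(k) for k in input_keys}  # random point lookups
```

This pays off only when randomly reading the h·N touched records finishes before a sequential scan of all N records would, which is what the cost model below quantifies.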
BIPS prototype
• Leverages Hadoop, modified to support
• Stateful groupwise processing
• Inner-grouping
• … and others
• Hypertable for storing state: an open-source “Bigtable”
Using table-based storage
• Which workloads benefit from the index?
• Experiment: incremental count over 1M state records, with state stored on HDFS or Hypertable
• Break-even at 17% of state accessed
The index helps only a small range of workloads.
Predicting the benefit of an index
• What types of workloads benefit more? What random read rate is required?
• Simple cost model: the running time T of an operator depends on
• The random-to-sequential read rate ratio R_ran/R_seq
• The fraction of state accessed, h
• T = T_read,I + T_sort,I + T_read,S + T_write,S (read input, sort input, read state, write state)
• T_read,S is either the time to sequentially read all N state records (N/R_seq) or the time to randomly read the h·N records touched (h·N/R_ran)
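A sketch of the cost model in code. The parameter names and the assumption that the non-indexed stage also rewrites the full state file are placeholders; only the four-term structure of T comes from the slide.

```python
def running_time(n_in, n_state, h, r_seq, r_ran, r_sort, indexed):
    """T = T_read,I + T_sort,I + T_read,S + T_write,S (records / records-per-sec)."""
    t_read_in = n_in / r_seq
    t_sort_in = n_in / r_sort
    if indexed:
        t_read_state = h * n_state / r_ran    # random reads of the h*N touched records
        t_write_state = h * n_state / r_ran   # assume writes go at the random rate
    else:
        t_read_state = n_state / r_seq        # sequential scan of all N records
        t_write_state = n_state / r_seq       # assume the whole state file is rewritten
    return t_read_in + t_sort_in + t_read_state + t_write_state

# Considering only the read-state term, the index wins while
# h * N / R_ran < N / R_seq, i.e. while h < R_ran / R_seq; the input and
# write terms shift the actual break-even point around this simple ratio.
```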
Predicting the benefit of an index
• Fix the random-to-sequential throughput: what is the maximum % of state accessed for which there is a gain?
• The table store (BT) helps only if less than 20% of the state is accessed
• More workloads benefit as random reads approach sequential speed
[Figure: break-even and 50%-gain curves; the BT point sits at 20% of state accessed.]
Leveraging Solid State Disks
• Table stores are built on top of magnetic disks
• Random read rate is an order of magnitude lower than sequential
• SSDs improve random read performance
• 200x higher than magnetic disks
• A good candidate for serving indexes
Random state access on an SSD
• Developed a proof-of-concept indexed storage system on an SSD
• Increases the random-to-sequential read ratio to 37%
• Break-even at 65% of state accessed (up from 20% with BT)
• Raw SSD performance leaves room for an even wider range of workloads
[Figure: break-even and 50%-gain curves for BT, the SSD prototype, and raw SSD performance.]
SSD cost efficiency
• SSDs are good for implementing indexes… but they are expensive
• Cost per capacity ($/GB) is high: 30 times higher than magnetic disks
• Cost per bandwidth ($/MB/s) is low: 50 times lower than magnetic disks
• Cost efficiency: cost per performance
• Cost: price per capacity, C
• Performance: job throughput, 1/T_read,S
• System I runs on an HD and sequentially accesses all N records; System II runs on an SSD and randomly accesses h·N records
• The two systems break even when C_HD / C_SSD = R_seq,HD / (R_ran,SSD / h)
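A sketch of the comparison, with all concrete prices and rates left as parameters; only the break-even relation itself comes from the slide.

```python
def cost_per_performance_hd(c_hd, r_seq_hd):
    """System I: HD, sequential scan of all N records -> C_HD * N / R_seq,HD.
    N cancels when comparing the two systems, so it is dropped here."""
    return c_hd / r_seq_hd

def cost_per_performance_ssd(c_ssd, r_ran_ssd, h):
    """System II: SSD, random reads of h*N records -> C_SSD * h*N / R_ran,SSD."""
    return c_ssd * h / r_ran_ssd

def max_h_for_ssd(cost_ratio_ssd_to_hd, r_ran_ssd, r_seq_hd):
    """Largest fraction of state accessed for which the SSD is still the
    more cost-efficient option: h < (C_HD/C_SSD) * (R_ran,SSD / R_seq,HD)."""
    return (1.0 / cost_ratio_ssd_to_hd) * (r_ran_ssd / r_seq_hd)
```

With the slide's numbers, a 30x price premium keeps the cutoff below 5% of state accessed, while a projected 2x premium raises it to roughly 70%.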
SSD cost efficiency
• Cost efficiency depends on the cost ratio, the % of state accessed, and the random read rate: C_HD / C_SSD = R_seq,HD / (R_ran,SSD / h)
[Figure: required random read rate for the SSD to be more cost-efficient. At today's relative cost (30x), the SSD wins when less than 5% of the state is accessed; at the relative cost projected in 5 years (2x, e.g. FusionIO), it wins when less than 70% is accessed.]
Summary
• DBs use indexes to speed up operations; bulk-incremental processing can benefit too
• A model for stateful bulk processing that allows the use of indexes
• Table stores on magnetic disks do not perform well
• Leverage SSDs for better random reads