The end of an architectural era: (it’s time for a complete rewrite)
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland
VLDB, 2007
Presented by: Suprio Ray
The I/O Gap
• Disk capacity doubles every 18 months
• Memory size doubles every 18 months
• Disk bandwidth doubles every 10 years (R. Freitas et al., FAST, 2008)
• Memory is ~6000 times faster than disk (in latency)
• Avoid accessing disk (if possible)
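The doubling rates above can be turned into a rough back-of-the-envelope calculation. The sketch below assumes a 10-year horizon (a choice made here, not a figure from the slides) and compares how capacity and bandwidth grow over that period. The time needed to scan a full disk scales with capacity divided by bandwidth, so the ratio of the two growth factors shows how much that scan time stretches.

```python
def growth_factor(years: float, doubling_years: float) -> float:
    """Growth of a quantity that doubles every `doubling_years` years."""
    return 2.0 ** (years / doubling_years)


# Assumption for illustration only: look one decade ahead.
HORIZON_YEARS = 10.0

# Doubling periods taken from the slide.
capacity_growth = growth_factor(HORIZON_YEARS, 1.5)    # every 18 months
bandwidth_growth = growth_factor(HORIZON_YEARS, 10.0)  # every 10 years

# Full-scan time is capacity / bandwidth, so it grows by this ratio.
scan_time_growth = capacity_growth / bandwidth_growth

print(f"capacity grows  x{capacity_growth:.1f}")
print(f"bandwidth grows x{bandwidth_growth:.1f}")
print(f"full-scan time grows x{scan_time_growth:.1f}")
```

Under these assumptions capacity grows by roughly two orders of magnitude while bandwidth only doubles, which is the gap that motivates keeping OLTP data in memory.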
One size does not fit all
Goal: Build a custom, high-performance OLTP database
• OLTP
  • Amazon: 42 TB
  • Typical: less than a TB
• Data Warehouse
  • Yahoo: 2 PB
  • eBay: 1.4 PB
• Search engines (text)
  • Google: 850 TB
• Scientific
  • US Department of Energy (NERSC): 3.5 PB
• Stream processing
Overview
• Motivation
• OLTP overheads
• System architecture
• Transaction management
• Evaluation
• Conclusion and discussion
Database System Architecture
[Figure: component diagram of a conventional database system]
• Query Processing: SQL query → Parser → (relational algebra) → Query Rewriter and Optimizer → (query execution plan) → Execution Engine
• Transaction Management: calls from transactions (read, write) → Transaction Manager; Concurrency Controller (Lock Table); Recovery Manager (Log)
• Shared components: Statistics & Catalogs & System Data; Buffer Manager; Data + Indexes
OLTP Overheads
• Logging: log records must be written to disk for durability
• Locking: needed to read or write a record
• Latching: needed for updates to shared data structures
• Buffer management: caches disk pages in memory
H-Store system architecture
• Shared-nothing, main-memory, row-store relational database
• Node
  • hosts one or more sites
• Site
  • single-threaded
  • one site per core
• Relation
  • divided into one or more partitions, or
  • cloned
• Partition
  • replicated and hosted on multiple sites
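A minimal sketch of how a key might be mapped to a partition and then to the sites that host its replicas. The slide only says that relations are divided into partitions and that partitions are replicated across sites; the hash-based placement, the `Layout` class, and the site names below are assumptions made for illustration, not H-Store's actual layout logic.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Layout:
    """Hypothetical physical layout: partition id -> sites holding a replica."""
    num_partitions: int
    replicas: dict[int, list[str]]

    def partition_of(self, key: str) -> int:
        # Assumption: hash partitioning. crc32 is used because it is
        # deterministic across runs, unlike Python's built-in str hash.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def sites_for(self, key: str) -> list[str]:
        # Every site listed here hosts a full copy of the key's partition.
        return self.replicas[self.partition_of(key)]


layout = Layout(
    num_partitions=2,
    replicas={0: ["site-a", "site-b"], 1: ["site-c", "site-d"]},
)
print(layout.sites_for("warehouse-7"))
```

A cloned relation could be modelled in the same structure by listing every site as a replica of its single partition.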
Runtime model
• Stored procedure interface for transactions
  • Unique name
  • Control and SQL commands
• SQL command execution
  • the execution plan is annotated
  • and passed to the transaction manager
  • plans are transmitted
  • results are passed back to the initiator
System deployment
• Cluster deployment framework (CDF) accepts
  • a set of stored procedures
  • database schema
  • sample workload
  • available sites
• CDF produces
  • a set of compiled stored procedures
  • physical DB layout
Transaction variants
• Single-sited
  • All queries can be executed on just one node
• One-shot
  • Individual queries can be executed on single nodes
• Two-phase
  • Phase 2 can be executed without integrity violation
• Strongly two-phase
  • Either all replicas continue or all abort
• Sterile
  • Order of execution doesn’t matter
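The first two variants can be illustrated with a small classifier. This is a sketch under simplifying assumptions: each query is described only by the set of sites it must touch, `classify` and the site names are hypothetical, and any further conditions a real system would check (for example, whether one query depends on another's result) are ignored.

```python
def classify(query_sites: list[set[str]]) -> str:
    """Classify a transaction from the sites each of its queries touches.

    query_sites[i] is the set of sites that query i must execute on.
    """
    if not query_sites:
        raise ValueError("a transaction needs at least one query")

    all_sites = set().union(*query_sites)
    if len(all_sites) == 1:
        # Every query runs on the same single site.
        return "single-sited"
    if all(len(sites) == 1 for sites in query_sites):
        # Each query fits on one site, but different queries use different sites.
        return "one-shot"
    # At least one query spans several sites.
    return "general"


print(classify([{"s1"}, {"s1"}]))
print(classify([{"s1"}, {"s2"}]))
print(classify([{"s1", "s2"}]))
```

The two-phase, strongly two-phase, and sterile properties depend on what the queries do rather than where they run, so they are left out of this sketch.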
Transaction management
• Replica synchronization
  • Read any replica; update all replicas
• Transaction ordering
  • Each transaction is timestamped
• Concurrency control considerations
  • OLTP transactions are very short-lived
  • Single-threaded execution avoids page latching
  • Concurrency control is not needed for some transaction classes (single-sited / one-shot / sterile)
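The "read any replica; update all replicas" rule can be sketched with in-memory dictionaries standing in for sites. The `ReplicatedPartition` class and the site names are hypothetical, and the sketch leaves out failures, ordering, and network communication entirely.

```python
import random


class ReplicatedPartition:
    """Toy model: each replica is a dict held by one named site."""

    def __init__(self, site_names: list[str]) -> None:
        self.replicas: dict[str, dict] = {name: {} for name in site_names}

    def read(self, key):
        # Read any replica: all copies are kept identical by write(),
        # so any site can answer.
        site = random.choice(list(self.replicas))
        return self.replicas[site].get(key)

    def write(self, key, value) -> None:
        # Update all replicas: the write is applied to every copy.
        for store in self.replicas.values():
            store[key] = value


partition = ReplicatedPartition(["site-a", "site-b", "site-c"])
partition.write("stock:42", 17)
print(partition.read("stock:42"))
```

Because every write reaches every copy, the read returns the same value regardless of which replica is picked.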
Concurrency control strategy
• Basic strategy
  • Wait a short time for conflicting transactions with lower timestamps
  • If none are found, execute the subplan and send the result
  • Otherwise, issue an abort
• Intermediate strategy
  • Wait for a length of time approximated by MaxD * average_round_trip_message_delay
• Advanced strategy
  • If needed, abort a transaction using optimistic CC rules
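A rough sketch of the basic decision rule and the intermediate wait estimate. The function names are hypothetical, the actual waiting is not modelled (the caller is assumed to have already collected the timestamps of conflicting transactions seen during the wait), and `max_d` is treated as a plain number because the slides do not define MaxD further.

```python
def intermediate_wait(max_d: int, avg_round_trip_delay: float) -> float:
    """Wait length for the intermediate strategy: MaxD * average round trip delay."""
    return max_d * avg_round_trip_delay


def basic_decision(txn_timestamp: int, conflicting_timestamps: list[int]) -> str:
    """Decide what to do once the wait period is over.

    conflicting_timestamps holds the timestamps of conflicting transactions
    that showed up while waiting (an assumption of this sketch).
    """
    if any(ts < txn_timestamp for ts in conflicting_timestamps):
        # A conflicting transaction with a lower timestamp was found.
        return "abort"
    # None found: execute the subplan and send the result.
    return "execute"


print(intermediate_wait(max_d=3, avg_round_trip_delay=0.002))
print(basic_decision(10, [12, 15]))
print(basic_decision(10, [7]))
```

The advanced strategy is not sketched here, since the slides only name it.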
Evaluation – experimental setup
• Benchmark: a variant of TPC-C
  • all transaction classes made one-shot and strongly two-phase
  • all transaction classes implemented as stored procedures
• Databases
  • H-Store
  • a popular commercial RDBMS, X
• Hardware
  • Dual-core 2.8 GHz system
  • 4 GB RAM
  • 4 × 250 GB SATA disk drives
Evaluation – results
• Metric: transactions/second per core
• H-Store is 82 times faster than X
* performance record published by TPC-C
H-Store limitations
• The database must fit into the available memory
• A cluster-wide power failure can cause the loss of committed transactions
• A limited subset of SQL '99 is supported
  • DDL operations like ALTER and DROP aren't supported
• Challenging operations model
  • Changing the schema or reconfiguring hardware requires first saving and shutting down the system
• No WAN support (single data center)
  • In case of a network partition, some queries will not execute
Conclusion
• Demise of the general-purpose database (prediction)
• H-Store is a custom, main-memory database optimized for OLTP
• H-Store shows a significant performance advantage over a popular relational database
Discussion
• Raw speed vs. ease of use
  • Limited DDL support; changing the schema or a node requires a reboot
• “Separation of concerns”
  • Is it a good idea to embed application logic in stored procedures?
• Custom vs. general-purpose query language
  • SQL to be replaced with Ruby on Rails?
• No WAN support: single data center assumption
  • CAP theorem
  • Catastrophic failure scenario