Breaking databases for fun and publications: availability benchmarks

Aaron Brown, UC Berkeley ROC Group, HPTS 2001.

Presentation Transcript


  1. Breaking databases for fun and publications: availability benchmarks
     Aaron Brown, UC Berkeley ROC Group
     HPTS 2001

  2. Motivation
     • Drinking the availability Kool-Aid
        • availability is the key metric for modern apps
     • Database stack’s availability is especially important
        • guardians of the world’s hard state
        • almost any user’s request for electronic information hits a database stack
           • web services, directories, enterprise apps, ...
     • Can we trust database software stacks in the face of failure?

  3. Availability benchmarking 101
     • Availability benchmarks quantify system behavior under failures, maintenance, and recovery
     • They require
        • a realistic workload for the system: TPC-C
        • quality-of-service metrics: txn rates, OK and aborted
        • fault injection to simulate failures: single-disk errors
     [Figure: QoS metric over time; labels mark the normal-behavior band (99% conf.), the failure, the resulting QoS degradation, and the repair time]
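The measurement idea on this slide (a band of normal behavior, then degradation and repair time after a fault) can be sketched in a few lines. This is a minimal sketch, not the study's tooling: it assumes QoS is sampled as throughput at fixed intervals, reads "99% conf." hypothetically as the fault-free mean plus or minus 2.576 sample standard deviations, and defines repair time as the time from fault injection until QoS re-enters the band for good. The names `normal_band` and `repair_time` are illustrative.

```python
from statistics import mean, stdev

# Two-sided z-score for a 99% normal band (assumed interpretation of "99% conf.").
Z_99 = 2.576


def normal_band(baseline):
    """Return (low, high) bounds of normal behavior from fault-free QoS samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return mu - Z_99 * sigma, mu + Z_99 * sigma


def repair_time(samples, fault_index, band, interval_s=1.0):
    """Seconds from fault injection until QoS re-enters the band and stays there.

    Returns 0.0 if QoS never leaves the band, or None if it never recovers
    before the end of the run.
    """
    low, high = band
    after = samples[fault_index:]
    last_bad = None
    for i, q in enumerate(after):
        if not (low <= q <= high):
            last_bad = i  # remember the most recent out-of-band sample
    if last_bad is None:
        return 0.0
    if last_bad == len(after) - 1:
        return None  # still degraded at the end of the measurement
    return (last_bad + 1) * interval_s
```

With a baseline of `[100, 102, 98, 101, 99]` the band is roughly 95.9 to 104.1; a run that drops to 40, 55 and 70 after the fault and then returns to 99 reports a repair time of three intervals.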

  4. Well, what happens?
     • Setup
        • 3-tier: Microsoft SQL Server/COM+/IIS & business logic
        • TPC-C-like workload; faults injected into DB data & log
     • Results
        • DBMS tolerates transient and recoverable failures, reflecting errors back via transaction aborts
        • middleware highly unstable: degrades or crashes when DBMS fails or undergoes lengthy recovery
     [Figures: "Sticky uncorrectable write error, log disk" and "Disk hang during write to data disk"; annotations mark where the database fails and the middleware degrades, where the middleware causes degraded performance, where the middleware crashes, and where the database recovers]

  5. Summary
     • Database is pretty resilient
        • transaction abort == good error-reflection mechanism
     • Middleware/applications suck (well, at least this instance of them)
     • Robustness is end-to-end
        • user cannot distinguish DBMS and middleware failures
        • failure recovery must go beyond the DBMS
     • Achievable Grand Challenges?
        • build and run availability benchmarks on your systems
        • tolerate and recover from non-failstop system-level faults
     • Does performance matter?

  6. Backup slides

  7. Experimental setup
     • Database
        • Microsoft SQL Server 2000, default configuration
     • Middleware/front-end software
        • Microsoft COM+ transaction monitor/coordinator
        • IIS 5.0 web server with Microsoft’s tpcc.dll HTML terminal interface and business logic
        • Microsoft BenchCraft remote terminal emulator
     • TPC-C-like OLTP order-entry workload
        • 10 warehouses, 100 active users, ~860 MB database
     • Measured metrics
        • throughput of correct NewOrder transactions (txn/min)
        • rate of aborted NewOrder transactions (txn/min)
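Both measured metrics are per-minute rates of NewOrder transactions. A minimal sketch of how they could be derived from a per-transaction log, assuming each record carries a completion time in seconds, a transaction type, and an outcome of "ok" or "aborted"; the `TxnRecord` layout and `per_minute_rates` name are hypothetical and do not describe BenchCraft's actual output format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TxnRecord:
    t_end: float  # completion time, seconds since run start (hypothetical field)
    kind: str     # transaction type, e.g. "NewOrder" or "Payment"
    outcome: str  # "ok" or "aborted"


def per_minute_rates(records, kind="NewOrder"):
    """Return {minute: (correct_txns, aborted_txns)} for one transaction type."""
    ok, aborted = Counter(), Counter()
    for r in records:
        if r.kind != kind:
            continue
        minute = int(r.t_end // 60)
        if r.outcome == "ok":
            ok[minute] += 1
        elif r.outcome == "aborted":
            aborted[minute] += 1
    if not ok and not aborted:
        return {}
    last = max(set(ok) | set(aborted))
    # Include empty minutes so a hang shows up as zero throughput, not a gap.
    return {m: (ok[m], aborted[m]) for m in range(last + 1)}
```

Filling in empty minutes matters for availability runs: a stalled DBMS produces no records at all, and omitting those minutes would hide exactly the behavior being measured.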

  8. Experimental setup (2)
     • Database installed in one of two configurations:
        • data on emulated disk, log on real (IBM) disk
        • data on real (IBM) disk, log on emulated disk
     [Diagram: Front End and DB Server connected by 100 Mb Ethernet, plus a Disk Emulator that supplies the emulated disk.
      Front end: MS BenchCraft RTE; IIS + MS tpcc.dll; MS COM+.
      DB server: SQL Server 2000; Adaptec 3940; DB data/log disks.
      Disk emulator: Intel P-II/300, 128 MB DRAM, Windows NT 4.0; ASC VirtualSCSI lib.; AdvStor ASC-U2W; emulated disk on UltraSCSI; emulator backing disk (NTFS); Adaptec 2940.
      Other hardware labels: AMD K6-2/333, 128 MB DRAM, Windows 2000 AS; Intel P-III/450, 256 MB DRAM, Windows 2000 AS; IBM 18 GB 10k RPM disks; IDE and SCSI system disks.
      Legend: Fast/Wide SCSI bus, 20 MB/sec]
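The disk emulator sits between the database and a backing store so faults can be injected on demand. A minimal in-memory sketch of that idea, assuming a block interface of fixed-size sectors; the class `FaultyDisk`, its fault fields, and the exception types are hypothetical and only mirror fault kinds named in these slides (sticky uncorrectable read/write errors, transient errors, disk power failure). It does not model hangs and is not the ASC VirtualSCSI library's API.

```python
class MediumError(IOError):
    """Sector error reported back to the caller (hypothetical)."""


class DeviceGone(IOError):
    """The device no longer responds, e.g. after a simulated power failure."""


class FaultyDisk:
    """In-memory block device with injectable faults (illustrative sketch)."""

    def __init__(self, n_sectors, sector_size=512):
        self.sector_size = sector_size
        self.sectors = [bytes(sector_size) for _ in range(n_sectors)]
        self.sticky_read = set()   # sectors whose reads always fail
        self.sticky_write = set()  # sectors whose writes always fail
        self.transient = {}        # sector -> failures left before I/O succeeds
        self.powered = True

    def _check(self, lba):
        if not self.powered:
            raise DeviceGone("disk powered off")
        if self.transient.get(lba, 0) > 0:
            # Recoverable fault: fail now, succeed once the count runs out.
            self.transient[lba] -= 1
            raise MediumError(f"transient error at sector {lba}")

    def read(self, lba):
        self._check(lba)
        if lba in self.sticky_read:
            raise MediumError(f"uncorrectable read at sector {lba}")
        return self.sectors[lba]

    def write(self, lba, data):
        if len(data) != self.sector_size:
            raise ValueError("write must be exactly one sector")
        self._check(lba)
        if lba in self.sticky_write:
            raise MediumError(f"uncorrectable write at sector {lba}")
        self.sectors[lba] = bytes(data)

    def power_fail(self):
        """Simulate losing power: every later I/O raises DeviceGone."""
        self.powered = False
```

A retrying caller rides out a transient fault, while a sticky fault fails every attempt; that distinction is what separates the tolerated faults from the ones that surfaced as aborts or recovery in the results.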

  9. Results
     • All results are from single-fault micro-benchmarks
        • 14 different fault types
        • injected once for each of data and log partitions
     • 4 categories of behavior detected
        1) normal
        2) transient glitch
        3) degraded
        4) failed
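The four categories can be approximated as a decision rule over a run's per-minute measurements. A minimal sketch, assuming per-minute counts of correct and aborted NewOrder transactions and a normal throughput band from fault-free runs. The thresholds are hypothetical choices, not the study's criteria: zero correct transactions over the last few minutes counts as failed, below-band throughput at the end counts as degraded, and excess aborts or a brief dip that recovers counts as a transient glitch. The `max_normal_aborts` parameter exists because TPC-C deliberately rolls back about 1% of NewOrder transactions, so some aborts are normal.

```python
def classify(ok_per_min, aborted_per_min, band, max_normal_aborts=0, tail=3):
    """Map one post-fault run to one of the four behavior categories (sketch)."""
    if not ok_per_min:
        raise ValueError("need at least one minute of measurements")
    low, _high = band
    tail_ok = ok_per_min[-tail:]  # the last few minutes of the run
    if all(x == 0 for x in tail_ok):
        # No correct transactions at the end: the system hung or aborts everything.
        return "failed"
    if any(x < low for x in tail_ok):
        # Still serving, but throughput never returned to the normal band.
        return "degraded"
    excess_aborts = any(a > max_normal_aborts for a in aborted_per_min)
    brief_dip = any(x < low for x in ok_per_min)
    if excess_aborts or brief_dip:
        # An error surfaced (e.g. an aborted transaction) but QoS recovered.
        return "transient glitch"
    return "normal"
```

For example, with a band of (90, 110), a run of `[100, 99, 100, 100]` correct transactions and `[0, 1, 0, 0]` aborts classifies as a transient glitch, while `[100, 60, 60, 60]` classifies as degraded.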

  10. Type 1: normal behavior
     • System tolerates fault
     • Demonstrated for all sector-level faults except:
        • sticky uncorrectable read, data partition
        • sticky uncorrectable write, log partition

  11. Type 2: transient glitch
     • One transaction is affected, aborts with error
     • Subsequent transactions using same data would fail
     • Demonstrated for one fault only:
        • sticky uncorrectable read, data partition

  12. Type 3: degraded behavior
     • DBMS survives error after running log recovery
     • Middleware partially fails, results in degraded performance
     • Demonstrated for one fault only:
        • sticky uncorrectable write, log partition

  13. Type 4: failure
     • Example behaviors (10 distinct variants observed)
     [Figures: "Disk hang during write to data disk" and "Simulated log disk power failure"]
     • DBMS hangs or aborts all transactions
     • Middleware behaves erratically, sometimes crashing
     • Demonstrated for all fatal disk-level faults
        • SCSI hangs, disk power failures
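Several of these failures are hangs, where a request simply never completes. One way a front-end tier could tell a hang apart from an error reflected back as an abort is to bound each call with a deadline. A minimal sketch using Python's `concurrent.futures`; the `run_with_watchdog` name, the 0.5-second default budget, and the pool size are hypothetical, and the sketch only detects the hang: it cannot cancel the stuck call, which keeps its worker thread busy.

```python
import concurrent.futures as cf

# Shared worker pool (hypothetical sizing). A hung call keeps its worker busy.
_pool = cf.ThreadPoolExecutor(max_workers=4)


def run_with_watchdog(fn, timeout_s=0.5):
    """Call fn() in a worker thread and classify the outcome.

    Returns ("ok", result), ("error", exception) or ("hang", None).
    """
    future = _pool.submit(fn)
    done, _not_done = cf.wait([future], timeout=timeout_s)
    if not done:
        # Python cannot kill the stuck thread; the caller can only stop waiting.
        return ("hang", None)
    exc = future.exception()
    if exc is not None:
        # Error reflected back by the callee, e.g. a transaction abort.
        return ("error", exc)
    return ("ok", future.result())
```

Because stuck calls are never reclaimed, repeated hangs eventually exhaust the pool; a real middleware tier would need to isolate or restart the affected component rather than keep submitting work.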