
Motivation: Increasing Error Rates

Resiliency-Aware Data Management. Matthias Boehm (1), Wolfgang Lehner (1), Christof Fetzer (2). TU Dresden: (1) Database Technology Group, (2) Systems Engineering Group. August 30, 2011.


Presentation Transcript


  1. Resiliency-Aware Data Management
     Matthias Boehm (1), Wolfgang Lehner (1), Christof Fetzer (2)
     TU Dresden: (1) Database Technology Group, (2) Systems Engineering Group
     August 30, 2011

  2. Motivation: Increasing Error Rates
     • Increasing component error rates
       • Decreasing feature sizes (new tech generations)
       • Reduced voltage supply
       • Static (hard) vs. dynamic (soft) errors
       • 8% increase in error rate per tech generation [Borkar05]
       • 25,000 – 70,000 FIT / Mbit [Schroeder09]
     • Increasing system error rates
       • Increasing scale
       • Number of components (core, transistor)
       • Memory capacities
     • Example: fixed error rate per component
       • P(component fails) = 0.01 for each component
       • P(at least one component fails) = 0.039
     [Figure: cosmic radiation (95% neutrons) hitting memory and CPU components]
     → Errors and error-prone behavior will become the normal case
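
The example on slide 2 can be reproduced with a short calculation. The sketch below assumes that components fail independently and that the figure shows four components; both are inferred from the stated result (1 − 0.99^4 ≈ 0.039) rather than given in the transcript, and the function name is hypothetical.

```python
def p_any_failure(p_component: float, n_components: int) -> float:
    """Probability that at least one of n independent components fails.

    Assumes every component fails independently with the same probability.
    """
    return 1.0 - (1.0 - p_component) ** n_components


if __name__ == "__main__":
    # With a fixed per-component error rate, the system error rate grows
    # with the number of components.
    for n in (1, 4, 100, 1000):
        print(f"n={n:5d}  P(at least one failure)={p_any_failure(0.01, n):.3f}")
```

Under these assumptions, the same per-component rate that gives about 0.039 for four components approaches certainty as the component count grows, which is the scaling argument the slide makes.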

  3. Motivation: Resiliency Costs
     • Implicit (silent) vs. explicit (detected/corrected) errors
     • State of the art: error detection and correction at HW/OS level
     • State of the art: resilient memory
       • ECC / parity bits / memory scrubbing / full data redundancy
       • ECC with extended Hamming codes: (7+1,4), i.e., (8,4); (16,11); (32,26); (64,57)
     • State of the art: resilient computing
       • Computation redundancy
       • Double Modular Redundancy (DMR): Task A and Task A' are compared (=?)
       • Triple Modular Redundancy (TMR): Task A, Task A', and Task A'' are combined by voting
     → Such resiliency mechanisms cause "resiliency costs"
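
The two computation-redundancy schemes named on slide 3 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the fault-injecting helper `make_flaky_task` are hypothetical, and results are assumed to be hashable so they can be counted.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


def run_dmr(task: Callable[[], T]) -> T:
    """Double modular redundancy: run twice and compare.

    A mismatch is detected but cannot be corrected without re-execution.
    """
    first, second = task(), task()
    if first != second:
        raise RuntimeError("DMR mismatch: error detected, re-execution needed")
    return first


def run_tmr(task: Callable[[], T]) -> T:
    """Triple modular redundancy: run three times and take the majority vote."""
    results = [task() for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("TMR vote failed: all three results differ")
    return value


def make_flaky_task(faulty_call: int) -> Callable[[], int]:
    """Hypothetical task whose result is corrupted on one chosen invocation."""
    calls = {"n": 0}

    def task() -> int:
        calls["n"] += 1
        result = sum(range(10))
        # Simulate a soft error on the selected call (0 means never).
        return result + 1 if calls["n"] == faulty_call else result

    return task


if __name__ == "__main__":
    print(run_tmr(make_flaky_task(faulty_call=2)))  # single fault is outvoted
```

The sketch also makes the cost visible: DMR doubles and TMR triples the computation, regardless of whether an error actually occurs.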

  4. Motivation: Resiliency Costs (2)
     • Resiliency cost categories
       • Performance overhead (throughput, latency)
       • Memory overhead
       • Energy consumption
       • Monetary HW costs
     • Resiliency costs at OS level
       • Memory overhead (capacity, bandwidth)
       • Computation overhead
       • Energy consumption (increased time)
     • Resiliency costs at HW level
       • Monetary HW costs (chipset, ECC RAM)
       • Energy consumption (time, chip space)
       • Computation overhead
     [Figure: system stack (Data Management, OS / Middleware, HW Infrastructure); CPU (0, 1, 2, 3), L3, ECC memory controller, ECC RAM]
     → Increasing error rates ~ increasing resiliency costs!
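
To give the memory-capacity overhead a concrete scale, the sketch below computes the share of check bits for the extended Hamming codes listed on slide 3. It reads each pair as (n, k) with n total bits and k data bits; using (n − k) / k as the overhead measure is a simplifying assumption, and the names are hypothetical.

```python
# (n, k) pairs as listed on slide 3: n total bits, k data bits.
EXTENDED_HAMMING_CODES = [(8, 4), (16, 11), (32, 26), (64, 57)]


def memory_overhead(n: int, k: int) -> float:
    """Check bits relative to data bits for an (n, k) block code."""
    return (n - k) / k


if __name__ == "__main__":
    for n, k in EXTENDED_HAMMING_CODES:
        print(f"({n},{k}): {n - k} check bits, "
              f"{memory_overhead(n, k):.1%} capacity overhead")
```

Longer code words amortize the check bits over more data, so the relative capacity overhead shrinks as the block size grows.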

  5. Vision of Resiliency-Aware Data Management

  6. Vision Overview
     • Problem of the state of the art
       • Resiliency awareness at HW/OS level (general-purpose)
       • Increasing error rates
       • Increasing resiliency costs
     • Key observation
       • Different resiliency requirements
       • Data management context knowledge
     • Resiliency-aware data management
       • Exploit context knowledge of query processing and data storage
       • Efficiency (reduced resiliency costs)
       • Effectiveness (detection/correction)
     [Figure: queries Qi, updates Ui, and input streams, ranging from nice-to-have analytics to mission-critical queries, enter Data Management (Data System, Access System, Storage System), which uses HW/OS primitives and configuration of the OS / Middleware and HW Infrastructure layers]

  7. Resilient Database Challenges
     • C1: Resilient Query Processing
     • C2: Resilient Data Storage
     • C3: Resiliency-Aware Optimization

  8. C1: Resilient Query Processing
     • Challenge
       • Problem: missing/invalid tuples (explicit/implicit)
       • Goal: reliable query results by error correction / error-tolerant algorithms
     • Example (advanced analytics)
       • Q: Ψk=365(γ(σa<107 R ⋈ S ⋈ T ⋈ U))
       • Computation redundancy
     [Figure: a plan and a guard plan for Q with different join orders over R, S, T, U, joined by a check operator; labels: Plan Scheduling, Operator Semantics, Intermediate Results]
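
One way to read the plan / guard plan / check figure on slide 8 is as redundant execution of two equivalent plans whose results are compared. The sketch below follows that reading under several assumptions: three relations instead of four, a row count as the aggregate γ, the selection constant taken as it appears in the transcript, and the forecasting operator Ψ omitted. All function names are hypothetical.

```python
def equi_join(left, right, key="k"):
    """Naive nested-loop equi-join of two lists of dict rows on a shared key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def main_plan(R, S, T):
    """Early selection on R, then (R join S) join T; aggregate = row count."""
    selected = [row for row in R if row["a"] < 107]
    return len(equi_join(equi_join(selected, S), T))


def guard_plan(R, S, T):
    """Same query with a different join order and late selection."""
    joined = equi_join(R, equi_join(T, S))
    return len([row for row in joined if row["a"] < 107])


def checked_execute(plan, guard, *relations):
    """Run both plans and accept the result only if they agree."""
    result, guard_result = plan(*relations), guard(*relations)
    if result != guard_result:
        raise RuntimeError(f"check failed: {result} != {guard_result}")
    return result


if __name__ == "__main__":
    R = [{"k": 1, "a": 100}, {"k": 2, "a": 200}, {"k": 3, "a": 50}]
    S = [{"k": 1, "s": "x"}, {"k": 3, "s": "y"}, {"k": 3, "s": "z"}]
    T = [{"k": 1, "t": 1.0}, {"k": 3, "t": 2.0}]
    print(checked_execute(main_plan, guard_plan, R, S, T))
```

Because the two plans touch intermediate results in a different order, an error that corrupts one plan's intermediate result is unlikely to corrupt the other in the same way, which is what makes the final comparison meaningful.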

  9. C1: Resilient Query Processing (2)
     • Example (advanced analytics, cont.)
       • AR(2), MSE, L-BFGS-B, C40 Energy Demand
       • P(error) = 0.01
       • val ∈ [0, max]
       • N = 100
     [Figure labels: Approximate Query Results, Error-Tolerant Algorithms, Error-Proportional Overhead]

  10. C2: Resilient Data Storage
     • Challenge
       • Problem: data loss/corruption (explicit/implicit)
       • Goal: data stability by data redundancy and error correction
     • Example (data partitioning)
       • Table R (a, b, c)
       • Data redundancy (synopsis and replicas)
     • Optimization
       • Exploit the (complementary) layouts of the multiple replicas
       • E.g., different sorting orders, partitioning schemes, compression schemes, etc.
     [Figure: Table R with Synopsis SR and Table R' with Synopsis SR'; labels: Test Scheduling, Multiple Replicas, Workload Characteristics, Time-based / on-the-fly error detection and correction]
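
The combination of a synopsis and replicas with complementary layouts on slide 10 can be illustrated with a small sketch. It assumes the synopsis is an order-independent checksum (row count plus a sum of per-row CRC32 values), so one synopsis value is valid for both sort orders; the slides do not say what the synopses contain, and all names are hypothetical.

```python
import zlib


def synopsis(rows):
    """Order-independent checksum: (row count, sum of per-row CRC32 mod 2^32)."""
    total = 0
    for row in rows:
        total = (total + zlib.crc32(repr(row).encode())) & 0xFFFFFFFF
    return len(rows), total


def detect_and_repair(replica, stored_synopsis, other_replica, sort_key):
    """Return a valid copy of `replica`.

    If the replica no longer matches its synopsis, rebuild it from the other
    replica, re-sorted into this replica's layout.
    """
    if synopsis(replica) == stored_synopsis:
        return replica
    if synopsis(other_replica) != stored_synopsis:
        raise RuntimeError("both replicas fail the synopsis check")
    return sorted(other_replica, key=sort_key)


if __name__ == "__main__":
    rows = [(1, 30, "x"), (2, 10, "y"), (3, 20, "z")]  # Table R (a, b, c)
    r_by_a = sorted(rows, key=lambda t: t[0])   # replica R, sorted by a
    r_by_b = sorted(rows, key=lambda t: t[1])   # replica R', sorted by b
    stored = synopsis(r_by_a)

    r_by_a[1] = (2, 11, "y")                    # simulate silent corruption
    repaired = detect_and_repair(r_by_a, stored, r_by_b, lambda t: t[0])
    print(repaired)
```

The replicas are not wasted copies in this design: each layout can serve a different part of the workload, while the synopsis check can run on a schedule or on the fly during scans.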

  11. C3: Resiliency-Aware Optimization
     • Challenge
       • Problem: search space of QP/DS, HW heterogeneity
       • Goal: multi-objective optimization (performance, accuracy, energy, resiliency)
     • Example (frequency/voltage scaling (DFS, DVS))
       • 1) Choose frequency level
       • 2) Select voltage scheme
       • 3) Optimize voltage
       • E.g., decreased frequency/voltage
     [Figure: plan of query Q annotated with DFS/DVS settings and their positive/negative effects on performance, errors, energy (convex), and accuracy; label: Multi-Objective, Global, Architecture-Aware Optimization]

  12. Conclusion
     • Problem of the state of the art
       • General-purpose resiliency mechanisms at HW/OS level
       • Increasing error rates → increasing resiliency costs
     • Summary
       • Vision of "Resiliency-Aware Data Management"
       • Challenge: Resilient Query Processing
       • Challenge: Resilient Data Storage
       • Challenge: Resiliency-Aware Optimization
       • Research directions and more in the paper!
     • Conclusion / new opportunities
       • Resiliency-aware data management can reduce resiliency costs
       • Research opportunity: reconsideration of many DB aspects w.r.t. resiliency
       • Collaboration opportunity: interdisciplinary research field (HW, OS, Systems, DB)

  13. Choose your Resiliency Level!

  14. Resiliency-Aware Data Management
     Matthias Boehm (1), Wolfgang Lehner (1), Christof Fetzer (2)
     TU Dresden: (1) Database Technology Group, (2) Systems Engineering Group
     August 30, 2011

  15. Background and Related Work

  16. Background and Related Work
     • Taxonomy
       • Faults (tech defects), errors (system-internal), failures (system-external)
       • Static vs. dynamic errors (memory / computation)
         • Static (hard / permanent): cosmic radiation, dynamic variability, aging
         • Dynamic (soft / transient): static variability, aging
       • Implicit vs. explicit errors
         • Implicit: silent errors → general-purpose techniques (ECC, etc.)
         • Explicit: detected or corrected errors
     • Related work at DB level
       • Error-aware frameworks (e.g., MapReduce/Hadoop) → general-purpose techniques
       • Recovery processing / replication [Upadhyaya11] → reacting to explicit errors
       • Implicit: [Graefe09], [Borisov11], [Simitsis10] → specific DM aspects
     → Holistic resilient data management

  17. Choose your Resiliency Level!

  18. TX Level vs. Resiliency Level
     • Similarities
       • Different application requirements on integrity
         • TX: physical and operational integrity
         • Resiliency: physical integrity
       • Ensuring integrity incurs cost overheads
       • Context knowledge can be exploited for reducing costs
         • TX: TX scheduling (logical serialization)
         • Resiliency: challenges and use cases
     • Differences
       • Configuration granularity
         • TX: different TX levels can be handled concurrently
         • Resiliency: configuring HW parameters can have a global influence on multiple queries on that HW component
       • Scope
         • TX: integrity for the running query or TX (assumption: the DB is transformed from one consistent state to another by TXs only)
         • Resiliency: computation and data integrity
