Cooperative cross-layer protection for resource constrained Mobile Multimedia systems Prof. Nikil Dutt Prof. Nalini Venkatasubramanian Prof. Lichun Bao Kyoungwoo Lee (final defense) Nov. 26, 2008
Contents • Thesis Motivation • Thesis Proposal – Cooperative, Cross-layer Methods • PPC (Partially Protected Caches) • EAVE (Error-Aware Video Encoding) • CC-PROTECT (Cooperative, Cross-layer Protection) • Thesis Contribution and Future Direction
Mobile Multimedia Embedded Systems Resource-limited mobile devices! Main problem is to achieve low power with high performance, high QoS, and high reliability Map Routing 3D Graphics Image Browsing Animation Mobile TV Web Browsing Video Streaming Satellite TV Video Conferencing
Reliability • Reliability is an emerging and critical concern in mobile devices • New enhanced technology makes devices vulnerable to errors due to high complexity and high integration • Exponential increase of soft error rate as technology scales [Baumann, 05] • Mobile applications are running close to humans • In pervasive computing, failures of healthcare mobile devices can have serious consequences • Redundancy techniques incur high overheads of power and performance • TMR (Triple Modular Redundancy) may exceed 200% overheads without optimization [Nieuwland, 06] • Challenging to optimize multiple properties (e.g., performance, power, QoS, and reliability) in mobile embedded systems
Soft error is becoming an every second concern! • Soft Error Rate (SER) – FIT (Failures in Time) = number of errors in 10^9 hours
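As a worked illustration of the FIT unit: a component rated at F FIT is expected to fail F times per 10^9 device-hours, so under a constant failure rate its mean time to failure is 10^9 / F hours. The sketch below is illustrative only; the function names and the sample value of 1000 FIT are assumptions, not figures from the thesis.

```python
HOURS_PER_BILLION = 1e9      # FIT counts failures per 10^9 device-hours
HOURS_PER_YEAR = 24 * 365    # ignoring leap years


def fit_to_mttf_hours(fit):
    """Mean time to failure in hours for a constant failure rate given in FIT."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_BILLION / fit


def fit_to_mttf_years(fit):
    """Same conversion, expressed in years."""
    return fit_to_mttf_hours(fit) / HOURS_PER_YEAR


def system_fit(component_fits):
    """FIT rates of independent components that must all work simply add up."""
    return sum(component_fits)


if __name__ == "__main__":
    # Hypothetical example: a 1000 FIT component fails about once per 114 years.
    print(round(fit_to_mttf_years(1000), 1))
```

Because FIT rates add, a system built from many low-FIT parts can still have a short MTTF, which is one way to read the concern about soft error rates growing with integration.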
Errors and Failures in Mobile Embedded Systems Application Middleware/ OS Hardware Network • Faults or Errors can cause Failures Bug Packet Loss Exception Soft Error
Errors and Error Control Schemes at Hardware Hardware Application Network MW/ OS • Hardware failures are increasing as technology scales • (e.g.) SER increases by up to 1000 times [Mastipuram, 04] • Redundancy techniques are expensive • (e.g.) ECC-based protection in caches can incur 95% performance penalty [Li, 05] • FIT: Failures in Time (10^9 hours) • MTTF: Mean Time To Failure • MTBF: Mean Time b/w Failures • TMR: Triple Modular Redundancy • EDC: Error Detection Codes • ECC: Error Correction Codes • RAID: Redundant Array of Inexpensive Disks
Errors and Error Control Schemes at Software Hardware Application Network MW/ OS • Software errors become dominant as system’s complexity increases • (e.g.) Several bugs per kilo lines • Hard to debug, and redundancy techniques are expensive • (e.g.) Backward recovery with checkpoints is inappropriate for real-time applications • QoS: Quality of Service
Errors and Error Control Schemes in Networks Hardware Application Network MW/ OS • Network is unreliable (especially, wireless networks) • Joint approaches across OSI layers have been investigated for minimal costs [Vuran, 06][Schaar, 07] • SNR: Signal to Noise Ratio • MTTR: Mean Time To Recovery • CRC: Cyclic Redundancy Check • MIMO: Multiple-Input Multiple-Output
Conventional Approaches • Most redundancy techniques incur overheads in terms of performance, power, area, etc. • Conventional TMR (Triple Modular Redundancy) can incur 200% overheads without optimization. • Backward Recovery with Checkpoints cannot guarantee the completion time of a task. • Recently proposed techniques have focused on the cost reduction without losing reliability • However, they still incur overheads
Thesis Problem Statement • Study tradeoffs among system properties • (e.g.) Redundancy incurs energy overheads while DVS increases SER significantly • Examine errors and error control schemes across system abstraction layers • (e.g.) network errors & error-resilient video encoding, soft errors & ECC or EDC, etc. • Maximize reliability with minimal costs of power and performance for mobile embedded systems • Mainly focus on soft error reduction for mobile multimedia embedded systems
Cross-Layer Methods • Cross-layer approaches: • Aim at system-level optimization • Integrate and coordinate techniques across system layers • Classification [Srivastava, 05] • Top-down, Bottom-up, or Both directions • Top-down – PPC, PDVS (GRACE), etc. • Bottom-up – EAVE, etc. • Both directions – CC-PROTECT, etc. • Coupling or Merging layers • Dynamo [Mohapatra], xTune [Kim], etc. Merging Bottom-up Top-down Coupling • PDVS – Practical Dynamic Voltage Scaling
Cross-Layer Approaches – GRACE Application Operating System Hardware • GRACE project @ UIUC [W. Yuan Ph.D. thesis in ’04 and A. F. Harris III, Ph.D. thesis in ’06] • QoS/Power tradeoffs • Primarily OS adaptation for power management in multimedia mobile devices • Network adaptation for power management in multimedia communications [GRACE, 05]
Cross-Layer Approaches – DYNAMO & FORGE Application Middleware/ OS Hardware Proxy Server (NW & MW) • DYNAMO middleware for FORGE project @ UCI [S. Mohapatra Ph.D. thesis in ’05 and R. Cornea Ph.D. thesis in ’07] • QoS/Performance/Power tradeoffs for mobile embedded systems • Middleware-driven coordination and proxy-based cooperation • Content transcoding at the application layer • Network traffic shaping at the network layer • Backlight (LCD display) setting at the hardware layer • NIC shutdown, CPU DVS/DFS at the hardware layer 1 2 3 4
Handheld Server Cross-Layer Approaches – xTune Formal Method Proxy Server System Realization Application Middleware/ OS Hardware • xTune framework @ UCI and SRI [M. Kim Ph.D. thesis in ’08] • QoS/Power/Timeliness adaptation for distributed real-time embedded systems • A Formal Methodology for cross-layer tuning and verifiable timeliness of Mobile Embedded Systems
Thesis Proposed Contribution • Thesis proposes a cross-layer design methodology for mobile multimedia embedded systems with minimal costs • Reliability/QoS/Power/Performance system optimization for mobile multimedia systems • Cooperative, Cross-Layer Protection • PPC, EAVE, & CC-PROTECT • Low-cost reliability
Overview of Thesis Proposals • PPC (Partially Protected Caches) • EAVE (Error-Aware Video Encoding) • CC-PROTECT (Cooperative, Cross-layer Protection) [Figure: the three proposals mapped onto the Application, MW/OS, and Hardware layers of a mobile video application connected through error-prone networks]
Contents Application Hardware Middleware/ OS Network • Thesis Motivation • Thesis Proposal – Cooperative, Cross-layer Methods • PPC (Partially Protected Caches) • EAVE • CC-PROTECT • Thesis Contribution and Future Direction
Conventional Protection for Caches • Caches are the component most affected by soft errors • Conventional Protected Caches • Unaware of fault tolerance at applications • Implement a redundancy technique such as ECC to protect all data for every access • Overkill for multimedia applications • ECC (e.g., a Hamming Code) incurs a performance penalty of up to 95%, a power overhead of up to 22%, and an area cost of up to 25% Unaware of Application High Cost Cache ECC
PPC (Partially Protected Caches) • Observation • Not all data are equally failure critical • Multimedia data vs. control variables • Propose PPC architectures to provide an unequal protection for mobile multimedia systems [Lee, CASES06][Lee, TVLSI08] • Unprotected cache and Protected cache at the same level of memory hierarchy • Protected cache is typically smaller to keep power and delay the same as or less than those of Unprotected cache PPC Unprotected Cache Protected Cache How to Partition Data? Memory
PPC Unprotected Cache Protected Cache Memory PPC for Multimedia Applications • Propose a selective data protection [Lee, CASES06] • Unequal protection at hardware layer exploiting error-tolerance of multimedia data at application layer • Simple data partitioning for multimedia applications • Multimedia data is failure non-critical • All other data is failure critical Power/Delay Reduction Fault Tolerance
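The simple partitioning rule on this slide can be sketched as a page-to-cache mapping. This is a minimal illustration, assuming each page has already been tagged as multimedia data or not (for example by the application or compiler); the `Page` type, the `is_multimedia` flag, and the cache labels are hypothetical names, not part of the PPC papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    number: int
    is_multimedia: bool  # hypothetical tag supplied by the application


def partition(pages):
    """Map each page number to the cache that should hold it.

    Multimedia data is treated as failure non-critical and goes to the
    unprotected cache; everything else is failure critical and goes to
    the protected cache.
    """
    mapping = {}
    for page in pages:
        mapping[page.number] = "unprotected" if page.is_multimedia else "protected"
    return mapping


if __name__ == "__main__":
    pages = [Page(0, True), Page(1, False), Page(2, True)]
    print(partition(pages))
```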
PPC for General Applications • DPExplore [Lee, PPCDIPES08] • Explore the partitioning space by exploiting the vulnerability of each data page • Vulnerable time • Data is vulnerable during the time that ends with it being read by the CPU or written back to Memory • Pages with high vulnerable time are failure critical • Vulnerable time closely estimates failure rate • Reduce the number of simulations needed to estimate the failure rate [Figure: timeline t0 to t3 of a data item with Incoming, Read, Write, and Eviction events, marking vulnerable and invulnerable intervals]
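One reading of the vulnerable-time idea can be sketched as follows. The sketch assumes a per-item event trace with the four event kinds named on the slide, and it assumes that an interval counts as vulnerable only if it ends in a CPU read or in the eviction of dirty data (a write-back); an interval that ends in a write is treated as invulnerable because the old value is overwritten. The trace format and function names are hypothetical, and this is not the DPExplore implementation.

```python
def vulnerable_time(events):
    """Sum the vulnerable intervals of one cached data item.

    events: list of (time, kind) tuples sorted by time, where kind is one of
    "incoming", "read", "write", "evict".
    """
    total = 0
    dirty = False
    previous = None  # time of the previous event; None while not cached
    for time, kind in events:
        if previous is not None:
            # The interval is vulnerable if an error in it would be consumed:
            # either read by the CPU or written back to memory on eviction.
            if kind == "read" or (kind == "evict" and dirty):
                total += time - previous
        if kind == "incoming":
            dirty = False
        elif kind == "write":
            dirty = True
        if kind == "evict":
            previous, dirty = None, False
        else:
            previous = time
    return total


def rank_pages(page_traces):
    """Rank pages by total vulnerable time, most vulnerable first.

    page_traces: dict mapping a page id to a list of event traces.
    """
    totals = {
        page: sum(vulnerable_time(trace) for trace in traces)
        for page, traces in page_traces.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    trace = [(0, "incoming"), (3, "read"), (5, "write"), (9, "evict")]
    print(vulnerable_time(trace))  # intervals 0-3 and 5-9 count, 3-5 does not
```

Pages at the top of the ranking would be the candidates for the protected cache; how many pages to take is a separate design decision that the sketch does not address.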
Summary – PPC • All data are not equally failure critical • Propose a PPC architecture to provide unequal protection • Support an unequal protection at hardware layer by exploiting error-tolerance and vulnerability at application • Present cost-efficient reliability • Related Publications • [Lee, CASES06] – PPC for multimedia embedded systems • [Lee, PPCDIPES08] – PPC for general applications • [Lee, TVLSI08] – PPC and design space exploration • Under submission • [Lee, TODAES??] – Partitioning techniques for general applications and instruction caches Application Data & Code Error-tolerance of MM data Vulnerability of Data & Code Page Partitioning Algorithms Failure Non-Critical Failure Critical FNC & FC are mapped into Unprotected & Protected Caches Unprotected Cache Protected Cache PPC
Contents Application Middleware/ OS Hardware Network • Thesis Motivation • Thesis Proposal – Cooperative, Cross-layer Methods • PPC • EAVE (Error-Aware Video Encoding) • CC-PROTECT • Thesis Contribution and Future Direction
Active Error Exploitation – Intentional Frame Drop • Intentional Frame Drop (one way to actively exploit errors) can result in energy reduction for each operation • FDT-1 affects the following components with respect to power, performance, and QoS in mobile video applications Mobile Video Application Enc Tx Rx Dec CPU WNI WNI CPU FDT-1 FDT-2 FDT-3 Packet Loss • FDT: Frame Drop Type • Enc: Encoding, Dec: Decoding • WNI: Wireless Network Interface Error-prone Networks
Error-Aware Video Encoding • Propose EE-PBPAIR [Lee, DIPES08] • Intentionally drop frames at video encoding • Reduce the energy consumption for video encoding • Maintain the video quality by exploiting error-resilience of PBPAIR Intentional frame drop Packet Loss Error-Aware Video Encoder (EAVE) Error- Resilient Video Error- Aware Video Original Video Error-Controller (e.g., frame dropping) Error-Resilient Encoder (e.g., PBPAIR) EIR • EIR: Error Injection Rate Error-prone Networks
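The error controller in front of the encoder can be sketched as a filter that drops a fraction EIR of the incoming frames before they are encoded. The version below spreads the drops evenly using a running credit; that policy, the function name, and the use of plain list items as frames are assumptions made for illustration, not the EE-PBPAIR implementation.

```python
def drop_frames(frames, eir):
    """Split frames into (kept, dropped) so that about eir of them are dropped.

    eir is the Error Injection Rate, a fraction in [0, 1).
    Drops are spread evenly: a credit grows by eir per frame and a frame is
    dropped each time the credit reaches 1.
    """
    if not 0.0 <= eir < 1.0:
        raise ValueError("eir must be in [0, 1)")
    kept, dropped = [], []
    credit = 0.0
    for frame in frames:
        credit += eir
        if credit >= 1.0:
            credit -= 1.0
            dropped.append(frame)  # never encoded, so its encoding energy is saved
        else:
            kept.append(frame)     # passed on to the error-resilient encoder
    return kept, dropped


if __name__ == "__main__":
    kept, dropped = drop_frames(list(range(8)), 0.25)
    print(kept, dropped)
```

Only the kept frames reach the error-resilient encoder, which is where the energy reduction for encoding would come from.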
Summary – EAVE • Intentional Frame Drop is one way to exploit errors actively • Propose an error-aware video encoding (EE-PBPAIR) • Present a knob (EIR) to adjust the amount of errors considering the QoS feedback • Maintain the video quality using error-resilience of PBPAIR • Related Publication • [Lee, DIPES08] – EE-PBPAIR • Considering Submission • [Lee, TECS??] – Generalized idea for error-resilient video encodings Error Resilient Video Encoder Error-Aware Video Data Application Error Rate = PLR + EIR Error Controller Network or Decoding Side EIR PLR & QoS Middleware Energy Reduction CPU, Memory, and WNI Hardware • EIR: Error Injection Rate • PLR: Packet Loss Rate
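The EIR knob and the relation Error Rate = PLR + EIR can be sketched as a small feedback rule. Everything beyond that relation is an assumption for illustration: the step size, the cap on the total error rate, the use of a single quality number (for example PSNR) as QoS feedback, and the function names are all hypothetical.

```python
def total_error_rate(plr, eir):
    """Error rate seen by the error-resilient encoder: network loss plus injected drops."""
    return plr + eir


def next_eir(eir, plr, measured_quality, target_quality,
             step=0.02, max_error_rate=0.5):
    """Adjust EIR by one step based on QoS feedback (hypothetical policy).

    If quality is below target, inject fewer errors; otherwise inject more,
    but keep plr + eir at or below max_error_rate.
    """
    if measured_quality < target_quality:
        return max(0.0, eir - step)
    headroom = max(0.0, max_error_rate - plr)
    return min(eir + step, headroom)


if __name__ == "__main__":
    eir = 0.10
    eir = next_eir(eir, plr=0.05, measured_quality=33.0, target_quality=32.0)
    print(eir, total_error_rate(0.05, eir))
```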
Contents Application Hardware Middleware/ OS Network • Thesis Motivation • Thesis Proposal – Cooperative, Cross-layer Methods • PPC • EAVE • CC-PROTECT (Cooperative Cross-layer Protection) • Thesis Contribution and Future Direction
Errors and Error Control Schemes – No Coupling Application Middleware/ OS Network Hardware • Different errors and their protection techniques have not been considered jointly • No coupling and no cooperation • Cooperating control schemes in a cross-layer manner can open a new avenue Mobile Video Application Packet Loss Soft Error Error-prone Networks
PPC still incurs overheads due to ECC-protection • Propose PPC architectures to provide an unequal protection for mobile multimedia systems [Lee, TVLSI08] • Unprotected cache and Protected cache at the same level of memory hierarchy • PPC still incurs overheads due to highly expensive ECC-protection at the protected cache • 29% energy reduction compared to the protected cache • 10% energy overhead compared to the unprotected cache PPC Unprotected Cache Protected Cache Memory
PBPAIR is energy-inefficient in an error-free network • PBPAIR is error-resilient and energy-efficient in general • PBPAIR may not be energy-efficient in the case of an error-free network Packet Loss PLR PBPAIR Intra_Threshold • PBPAIR: Probability-Based Power Aware Intra Refresh [Kim, 06]
Outline of CC-PROTECT • Application: Error-Aware Video Encoder (EAVE) = Error-Controller (e.g., frame drop) + Error-Resilient Encoder (e.g., PBPAIR), producing Error-Aware Video from Original Video • MW/OS: Monitor & Translate SER; Trigger Selective DFR; Support EAVE & PPC • Recovery options: BER (Backward Error Recovery), DFR (Drop & Forward Recovery) (frame K, frame K+1) • Parameters and feedback: Error Injection Rate & Frame Loss Rate, QoS Loss, SER • Hardware: PPC with Unprotected Cache and EDC-based Protected Cache (Error detection); Data Mapping • Errors: Soft Error, Packet Loss, Frame Drop; Error-prone Networks
Energy Saving • EDC impact: 17% Reduction compared to HW-PROTECT, 4% Reduction compared to BASE • EDC + DFR impact: 36% Reduction compared to HW-PROTECT, 26% Reduction compared to BASE • EDC + DFR + PBPAIR (CC-PROTECT) impact: 56% Reduction compared to HW-PROTECT, 49% Reduction compared to BASE • Configurations combine Application (Error-Prone or Error-Resilient) and Hardware (Unprotected or Protected) • BASE = Error-prone video encoding + unprotected cache • HW-PROTECT = Error-prone video encoding + PPC with ECC • APP-PROTECT = Error-resilient video encoding + unprotected cache • MULTI-PROTECT = Error-resilient video encoding + PPC with ECC • CC-PROTECT1 = Error-prone video encoding + PPC with EDC • CC-PROTECT2 = Error-prone video encoding + PPC with EDC + DFR • CC-PROTECT = Error-resilient video encoding + PPC with EDC + DFR
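The paired percentages on this slide can be cross-checked with a little arithmetic. If a configuration saves a fraction r_HW relative to HW-PROTECT and r_BASE relative to BASE, and both refer to the same energy measurement (an assumption), then E_HW / E_BASE = (1 - r_BASE) / (1 - r_HW). The sketch below applies this to the three reported pairs; the function and variable names are illustrative.

```python
def implied_hw_over_base(reduction_vs_hw, reduction_vs_base):
    """Energy ratio HW-PROTECT / BASE implied by one pair of reported reductions."""
    return (1.0 - reduction_vs_base) / (1.0 - reduction_vs_hw)


# (reduction vs. HW-PROTECT, reduction vs. BASE) as reported on the slide
REPORTED = {
    "EDC": (0.17, 0.04),
    "EDC + DFR": (0.36, 0.26),
    "EDC + DFR + PBPAIR": (0.56, 0.49),
}

ratios = {name: implied_hw_over_base(*pair) for name, pair in REPORTED.items()}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: HW-PROTECT / BASE = {ratio:.3f}")
```

All three pairs imply that HW-PROTECT consumes about 1.16 times the energy of BASE, so the reported numbers are mutually consistent under that assumption.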
Summary – CC-PROTECT Application Middleware/ OS Hardware • Propose CC-PROTECT approach, which coordinates existing schemes across layers to mitigate the impact of soft errors on the failure rate and video quality in mobile video encoding systems • PPC (Partially Protected Caches) with EDC (Error Detection Codes) at hardware layer • DFR (Drop and Forward Recovery) at middleware • PBPAIR (Probability-Based Power Aware Intra Refresh) at application layer • Demonstrate the effectiveness of low-cost (about 50%) reliability (1,000x) at the minimal cost of QoS (less than 1%) • Related Publication • [Lee, ACMMM08] – CC-PROTECT • Considering Submission • [Lee, ACMTOMCCAP??] – Tradeoff space exploration with CC-PROTECT PBPAIR - Error Resilience DFR - Error Correction ECC EDC Unprotected Cache Protected Cache
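The cooperation between EDC and DFR can be sketched as an encoding loop in which a detected soft error causes the current frame to be dropped and encoding to continue with the next frame, rather than re-encoding. The exception type, the `encode_frame` callback, and the way detection is signalled are hypothetical stand-ins; the real mechanism in CC-PROTECT involves the middleware and cache hardware and is not reproduced here.

```python
class SoftErrorDetected(Exception):
    """Hypothetical signal that the EDC-protected cache detected corrupted data."""


def encode_with_dfr(frames, encode_frame):
    """Encode frames with Drop & Forward Recovery.

    encode_frame(frame) returns the encoded frame or raises SoftErrorDetected.
    Returns (encoded_frames, indices_of_dropped_frames).
    """
    encoded, dropped = [], []
    for index, frame in enumerate(frames):
        try:
            encoded.append(encode_frame(frame))
        except SoftErrorDetected:
            # Detection only (EDC), no correction: drop frame K and move
            # forward to frame K+1 instead of rolling back and re-encoding.
            dropped.append(index)
    return encoded, dropped


if __name__ == "__main__":
    def flaky_encoder(frame):
        if frame == "f2":
            raise SoftErrorDetected()
        return frame.upper()

    print(encode_with_dfr(["f0", "f1", "f2", "f3"], flaky_encoder))
```

From the decoder's point of view a frame dropped this way looks like a lost frame, which is the kind of loss the error-resilient encoding at the application layer is meant to tolerate.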
Contents Application Hardware Middleware/ OS Network • Thesis Motivation • Thesis Proposal – Cooperative, Cross-layer Methods • PPC • EAVE • CC-PROTECT • Thesis Contribution and Future Direction
Overall Thesis Contribution Application Middleware/ OS Hardware Network • Cross-layer methodology to design mobile multimedia embedded systems with minimal costs • Effective Cross-layer approaches for reliability • Low-cost reliability • Expanded trade-off space • Extended applicability of existing techniques Packet Loss Frame Drop Soft Error
Effectiveness of Thesis Proposals (Energy Saving) • PPC: 29% energy reduction, as compared to a conventional protected cache with ECC • EAVE: 37% energy reduction, as compared to a conventional video encoding • CC-PROTECT: 56% energy reduction, as compared to a conventional composition of protections
Publication Application Middleware/ OS Hardware Network [Lee, ACMMM08] K. Lee, A. Shrivastava, M. Kim, N. Dutt, and N. Venkatasubramanian, “Mitigating the impact of hardware defects on multimedia applications – A cross-layer approach”, In ACM International Conference on Multimedia, Oct. 2008. [Lee, TVLSI08] K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian, “Partially protected caches to reduce failures due to soft errors in multimedia applications”, In IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 2008, to appear. [Lee, DIPES08] K. Lee, M. Kim, N. Dutt, and N. Venkatasubramanian, “Error exploiting video encoder to extend energy/QoS tradeoffs for mobile embedded systems”, In 6th IFIP Working Conference on Distributed and Parallel Embedded Systems (DIPES), Sep. 2008. [Lee, PPCDIPES08] K. Lee, A. Shrivastava, N. Dutt, and N. Venkatasubramanian, “Data partitioning techniques for partially protected caches to reduce soft error induced failures”, In 6th IFIP Working Conference on Distributed and Parallel Embedded Systems (DIPES), Sep. 2008. [Lee, CASES06] K. Lee, A. Shrivastava, I. Issenin, N. Dutt, and N. Venkatasubramanian, “Mitigating soft error failures for multimedia applications by selective data protection”, In Int. Conference on Compilers, Architecture, & Synthesis for Embedded Systems (CASES), Oct. 2006. [Lee, ICME05] K. Lee, N. Dutt, and N. Venkatasubramanian, “Experimental Study on Energy Consumption of Video Encryption for Mobile Handheld Devices”, In IEEE International Conference on Multimedia and Expo (ICME 05), Poster Session, July 2005. [Mohapatra, IPDPS05] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, and N. Venkatasubramanian, “A cross-layer approach for power-performance optimization in distributed mobile systems”, In Next Generation Software Program in conjunction with IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2005.
Future Direction Application Middleware/ OS Hardware Network • Error Rate Translation/Integration • Different types of errors • Different components across system layers • Cross-layer methods for distributed embedded systems (Horizontal Expansion) • Network-aware methods • Context-aware approaches Mobile Video Application Bug Packet Loss Exception Soft Error Error-prone Networks
Why Cross-Layer Approach? • Cross-layer interactions and conflicts arise between system properties • DVS increases SER exponentially • Over protection or under protection • All ECC for multimedia data is an overkill • Cross-layer approaches can maximize the reliability with minimal power and performance overheads • Benefits of Cross-layer approaches • Global system view • Coordination for intelligent selection • Adaptation • Cross-layer approaches have been promising to save the resources at the cost of QoS [Mohapatra, 05][Yuan, 04] • DVS: Dynamic Voltage Scaling • SER: Soft Error Rate • ECC: Error Correction Codes • QoS: Quality of Service
Thesis Proposed Contribution: CC-PROTECT • Cooperative Cross-layer Protection (CC-PROTECT) by exploiting error-awareness and error control schemes across system abstraction layers • Contribution • Present cost-efficient reliability methods (cooperative cross-layer protection) • Open expanded tradeoff spaces and operating points • Rediscover applicability of existing approaches for other purposes
Performance vs. Capacity • Total energy available from a battery is a design issue and is fixed at design time, along with its weight and size • Stark contrast between the linear growth rate of battery capacity and the exponential technology improvement rate of system components [Udani] Sanjay Udani and Jonathan Smith, “Power management in mobile computing”
Generalized Fault Tolerance Techniques • Modular Redundancy • N-Version Programming • Error-Control Coding • Checkpoints and Rollbacks • Recovery Blocks [Chetan, SPC04] S. Chetan, A. Ranganathan, and R. Campbell, “Towards Fault Tolerant Pervasive Computing”, in SPC ’04 [Somani, IEEECom97] A. K. Somani and N. H. Vaidya, “Understanding Fault Tolerance and Reliability”, in IEEE Computer ’97 vol. 30 issue 4
1) Modular Redundancy • Modular Redundancy • Multiple identical replicas of hardware modules • Voter mechanism • Compare outputs and select the correct output Tolerate most hardware faults Effective but expensive fault Data Producer A Consumer voter Producer B
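The voter mechanism can be sketched as a majority vote over the outputs of the replicas. This is a minimal software illustration, assuming the outputs are hashable values and that the replicas are plain callables; in hardware TMR the replicas and the voter are circuits, and the function names here are hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas.

    Raises ValueError when no value has a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value


def run_redundant(replicas, data):
    """Run every replica on the same input and vote on the results."""
    return majority_vote([replica(data) for replica in replicas])


if __name__ == "__main__":
    good = lambda x: x * 2
    faulty = lambda x: x * 2 + 1  # simulates a replica hit by a fault
    print(run_redundant([good, faulty, good], 21))
```

With three replicas a single faulty output is outvoted, but every input is processed three times, which is where the roughly 200% overhead of unoptimized TMR comes from.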
2) N-version Programming • N-version Programming • Different versions by different teams • Different versions may not contain the same bugs • Voter mechanism Tolerate some software bugs Data Producer A Consumer voter Program i fault Program j Programmer K Programmer L
3) Error-Control Coding • Error-Control Coding • Replication is effective but expensive • Error-Detection Coding and Error-Correction Coding • (example) Parity Bit, Hamming Code, CRC Much less redundancy than replication fault Data Producer A Consumer Error Control Data
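As a concrete instance of the examples named on this slide, the sketch below implements a single even-parity bit (detection only) and the standard Hamming(7,4) code (single-error correction). The bit ordering p1 p2 d1 p3 d2 d3 d4 is the conventional one; the function names are illustrative, and this is not meant to model the cache ECC or EDC used in the thesis.

```python
def parity_bit(bits):
    """Even-parity bit: detects any odd number of flipped bits, corrects none."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error was detected, else the 1-based
    position of the corrected bit.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome is the position of the bad bit
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # inject a single-bit soft error
    print(hamming74_decode(word))
```

Hamming(7,4) adds 3 check bits per 4 data bits, far less than the 200% redundancy of triplication, while a single parity bit is cheaper still but can only detect an error.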
4) Checkpoints & Rollbacks • Checkpoints and Rollbacks • Checkpoint • A copy of an application’s state • Save it in storage immune to the failures • Rollback • Restart the execution from a previously saved checkpoint Recover from transient and permanent hardware and software failures Data Producer A Consumer Application State K Rollback state (K-1) state K fault Checkpoint
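Checkpoint and rollback can be sketched with an in-memory copy of the application state. This is a simplification: the slide calls for storage that is immune to the failures, whereas a deep copy in the same process is only a stand-in. The class and function names, the use of RuntimeError to represent a detected fault, and the retry limit are assumptions.

```python
import copy


class Checkpointed:
    """Application state plus one saved copy of it."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        """Save a copy of the current state."""
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        """Discard the current state and restore the last checkpoint."""
        self.state = copy.deepcopy(self._saved)


def run_with_rollback(app, step, max_retries=3):
    """Run step(app.state); on a detected fault, roll back and retry.

    Returns True if the step eventually succeeded, False otherwise.
    """
    app.checkpoint()
    for _ in range(max_retries + 1):
        try:
            step(app.state)
            return True
        except RuntimeError:
            app.rollback()  # undo partial updates made before the fault
    return False


if __name__ == "__main__":
    app = Checkpointed({"frames_done": 0})
    attempts = {"n": 0}

    def flaky_step(state):
        state["frames_done"] += 1
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("transient fault")

    print(run_with_rollback(app, flaky_step), app.state)
```

The retry loop also shows why backward recovery is awkward for real-time work: the number of re-executions, and therefore the completion time, is not known in advance.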
5) Recovery Blocks • Recovery Blocks • Multiple alternates to perform the same functionality • One Primary module and Secondary modules • Different approaches • Select a module with output satisfying acceptance test • Recovery Blocks and Rollbacks • Restart the execution from a previously saved checkpoint with secondary module Tolerate software failures Data Producer A Consumer Application Block X Block X2 Block Y Block Z Rollback state (K-1) state K fault Checkpoint
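A recovery block can be sketched as a loop over alternates, each started from the same saved state, with an acceptance test deciding whether an output is good enough. The function names and the sorting example are hypothetical and chosen only to make the sketch runnable.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try the primary module, then the secondaries, until one passes.

    alternates: callables ordered primary first; each receives its own copy
    of state, which plays the role of restarting from the checkpoint.
    """
    for module in alternates:
        trial_state = copy.deepcopy(state)
        try:
            result = module(trial_state)
        except Exception:
            continue  # a crash counts as failing the acceptance test
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


if __name__ == "__main__":
    def buggy_sort(values):
        return values  # primary with a software bug: forgets to sort

    def simple_sort(values):
        return sorted(values)

    def is_sorted(values):
        return all(a <= b for a, b in zip(values, values[1:]))

    print(recovery_block([3, 1, 2], [buggy_sort, simple_sort], is_sorted))
```

Unlike N-version programming, only one alternate runs at a time, so the cost of the secondaries is paid only when the primary fails its acceptance test.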