Adaptive Video Coding to Reduce Energy on General Purpose Processors • Daniel Grobe Sachs, Sarita Adve, Douglas L. Jones • University of Illinois at Urbana-Champaign • http://www.cs.uiuc.edu/grace • grace@cs.uiuc.edu
Introduction • Wireless multimedia increasingly common • Recent advances reduce constraints: • 2 GHz+ processors • High-speed wireless networks • Systems are now energy-limited • Energy management essential
Adaptation • Adaptation key to energy management • Hardware adaptation already common • Software adaptation also possible • Challenges • How do we control adaptations? • How do we coordinate different adaptations?
GRACE Project • Targets mobile multimedia devices • Coordinated adaptation of all system layers • Hardware, application, network, OS • Complete cross-layer adaptation framework • Preserves separation between layers
Goals of this work • Target wireless video transmission • Adapt application: Adaptive video encoder • Adapt hardware: Adaptive CPU • Implement part of GRACE framework • Trade off between CPU and network energy
Contributions • Apply existing adaptive-CPU research • Energy-adaptive video encoder • Trades off between network and CPU energy • Allows adaptation with fixed QoS • Cross-layer adaptation framework • Coordinates app and CPU adaptation • Preserves logical separation between layers • 20% energy savings over existing systems
Presentation Overview • System model • System architecture and design • Cross-layer adaptation process • Results
System Model • Total energy = CPU energy + network energy • [System diagram: video capture feeds the adaptive video encoder running on the adaptive CPU, which transmits over the wireless network; a control unit coordinates the encoder and CPU adaptations]
CPU Hardware Adaptation [MICRO'01] • Reduce performance to save energy • Voltage and frequency scaling • Lower frequency → lower voltage → lower energy • Architecture adaptation • Issue width • Active functional units (ALUs, etc.) • Instruction window size
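The frequency/voltage bullet above follows the standard dynamic-power relation, E_dynamic ≈ N_cycles · C_eff · V²: running slower permits a lower supply voltage, so the same work costs less energy. Below is a minimal sketch with illustrative numbers (not the paper's measurements) and the idealized assumption that voltage scales down in proportion to frequency.

```python
# Minimal sketch of the DVFS energy relation. Constants are illustrative, and
# the assumption that halving frequency allows halving voltage is idealized.

def dynamic_energy_joules(cycles, c_eff_farads, voltage_v):
    """Switching energy only: E ≈ N_cycles * C_eff * V^2."""
    return cycles * c_eff_farads * voltage_v ** 2

# The same 10M-cycle frame: half frequency takes twice as long to encode,
# but at (idealized) half voltage it costs roughly a quarter of the energy.
full_speed = dynamic_energy_joules(10e6, 1e-9, 1.2)   # ≈ 14.4 mJ
half_speed = dynamic_energy_joules(10e6, 1e-9, 0.6)   # ≈ 3.6 mJ
```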
Adaptive Encoder • Based on the TMN H.263 encoder • Changed to a logarithmic motion search • Encoder adapts for energy • Trades off between network and CPU energy • More computation → fewer bits • Adapts motion search and DCT • Both are computationally expensive • Eliminating them primarily affects rate
Adaptive Encoder Details • Motion search and DCT thresholds (sketched below) • Terminate motion search early when the SAD is under threshold • Skip the DCT if the block's SAD is under threshold • Transmit a "DCT flag" bit for each 8x8 block • Extends the H.263 standard • Adaptation effect: • Setting both thresholds to infinity • Reduces CPU load by ~50% • Increases data rate by 2× or more
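A minimal sketch of the two thresholds, assuming 8x8 blocks stored as NumPy arrays; `motion_search`, `code_block`, `sad_threshold`, and `dct_threshold` are hypothetical names standing in for the paper's knobs, not the TMN encoder's actual routines.

```python
# Minimal sketch of threshold-based encoder adaptation (illustrative only;
# the real TMN H.263 encoder is far more involved).
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two same-sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def motion_search(block, ref, center, sad_threshold, step=8):
    """Logarithmic (step-halving) search that stops early once the SAD drops
    below the threshold: less computation, possibly a worse match (more bits)."""
    h, w = ref.shape
    best = center
    best_sad = sad(block, ref[best[0]:best[0] + 8, best[1]:best[1] + 8])
    while step >= 1:
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = best[0] + dy, best[1] + dx
                if 0 <= y <= h - 8 and 0 <= x <= w - 8:
                    s = sad(block, ref[y:y + 8, x:x + 8])
                    if s < best_sad:
                        best, best_sad = (y, x), s
        if best_sad < sad_threshold:        # early termination
            break
        step //= 2
    return best, best_sad

def dct_2d(block):
    """Naive unnormalized 8x8 type-II DCT (real encoders use fast transforms)."""
    n = block.shape[0]
    k = np.arange(n)
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    return basis @ block @ basis.T

def code_block(block, ref_block, dct_threshold):
    """Skip the DCT when the residual SAD is under threshold; a per-8x8-block
    'DCT flag' bit in the extended bitstream would signal this choice."""
    if sad(block, ref_block) < dct_threshold:
        return None                         # DCT skipped: fewer cycles, coarser block
    return dct_2d(block.astype(float) - ref_block.astype(float))
```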
Adaptation Control • When do we adapt? • Adapt before every frame • What configurations do we choose? • Must minimize total CPU+network energy • Must complete frame within its allocated time • How do we find the optimal configurations?
Optimization • Application and CPU reconfiguration are linked • Application reconfiguration changes the workload • CPU reconfiguration changes performance • App config affects the optimal CPU config … and vice versa • Two-stage approach: 1. For each app config, find the CPU config and energy 2. Pick the lowest-energy application configuration
Optimization Algorithm 1. For each app config, find: • Best CPU config — completes in time with least energy [MICRO'01] • CPU energy = instruction count × energy per instruction [MICRO'01] • Network energy = byte count × energy per byte [measured on WaveLAN] • Total energy = CPU energy + network energy 2. Pick the app config with the lowest total energy • Requires predicted instruction and byte counts for the next frame
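A minimal sketch of this two-stage selection, assuming the predicted counts and per-configuration energy constants are already available; `AppConfig`, `epi_nj` (energy per instruction, nJ), `ips` (instructions per second), and `best_cpu_energy` are hypothetical placeholders, with the last standing in for the MICRO'01 CPU optimizer.

```python
# Minimal sketch of the two-stage per-frame selection. All names and numbers
# are hypothetical, and best_cpu_energy assumes at least one CPU configuration
# can meet the frame deadline.
from dataclasses import dataclass

@dataclass
class AppConfig:
    name: str
    pred_instructions: float   # predicted instruction count for the next frame
    pred_bytes: float          # predicted encoded bytes for the next frame

def best_cpu_energy(instructions, deadline_s, cpu_configs):
    """Cheapest CPU configuration that still finishes the frame in time."""
    feasible = [(c["epi_nj"] * instructions * 1e-9, c)      # energy in joules
                for c in cpu_configs
                if instructions / c["ips"] <= deadline_s]    # meets deadline
    return min(feasible, key=lambda e: e[0])

def choose_configs(app_configs, cpu_configs, deadline_s, net_nj_per_byte):
    """Stage 1: energy of every app config; Stage 2: pick the lowest total."""
    table = []
    for app in app_configs:
        cpu_e, cpu_cfg = best_cpu_energy(app.pred_instructions,
                                         deadline_s, cpu_configs)
        net_e = app.pred_bytes * net_nj_per_byte * 1e-9
        table.append((cpu_e + net_e, app, cpu_cfg))
    return min(table, key=lambda t: t[0])

# Hypothetical usage for a 30 fps frame deadline:
cpu_configs = [{"name": "slow", "ips": 4e8, "epi_nj": 0.8},
               {"name": "fast", "ips": 1.2e9, "epi_nj": 2.0}]
apps = [AppConfig("all-on", 3.0e7, 4000), AppConfig("thresholds-inf", 1.5e7, 9000)]
total_e, app_cfg, cpu_cfg = choose_configs(apps, cpu_configs, 1 / 30, 1500)
```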
Adaptation Process: Stage 1 • [Diagram: for each app configuration (Conf 1 … Conf n), the CPU and network predictors estimate the next frame's instruction and byte counts; the CPU optimizer finds the CPU configuration; the CPU and network energy estimators predict CPU and network energy, which are summed into the app-configuration energy table]
Adaptation Process: Stage 2 • [Diagram: from the app-configuration energy table, pick the lowest-energy entry; the CPU adaptor and application adaptor apply the chosen configuration; then capture, encode, and transmit the frame]
Predictors • How do we predict instruction and byte counts? • Fixed software → use previous frame's data • With adaptive software this no longer works! • Solution: offline profiling • Encode reference sequences offline • Transition randomly between app configurations • Fit predictors to the transitions between configs • Map the last frame's instruction and byte counts to the new app config • Linear, 1st-order predictors (sketched below)
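A minimal sketch of such a first-order linear predictor; the coefficient values and configuration names below are hypothetical, while in the real system each transition's coefficients come from the offline profiling described above.

```python
# Minimal sketch of a first-order linear predictor for the next frame's
# instruction (or byte) count. Coefficients and config names are hypothetical;
# the real coefficients are fitted offline from profiled reference sequences.

def predict_next(last_count, from_config, to_config, coeffs):
    """Map last frame's count under one app config to an estimate for the
    next frame under another: count' = a * count + b."""
    a, b = coeffs[(from_config, to_config)]
    return a * last_count + b

# Example: predicted instruction count when switching from config "C0" to "C2".
coeffs = {("C0", "C2"): (0.55, 1.2e6)}            # hypothetical fitted values
est_instructions = predict_next(3.0e6, "C0", "C2", coeffs)
```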
Experiments • RSIM CPU simulator • State-of-the-art CPU, memory • Princeton Wattch energy model • Reported energy typical of modern CPUs • Simulation Conditions: • Fixed and adaptive CPU • Fixed and adaptive software • Foreman sequence
Fixed vs Adaptive Systems • [Bar chart, CPU + network energy in J: Fixed system 30.49, Adaptive S/W 21.23, Adaptive H/W 7.36, Adaptive system 6.25] • Adaptive hardware saves 70% over the fixed system • Adaptive application saves 30% on fixed hardware and 20% on adaptive hardware (total savings of 80%)
Algorithm Comparison • Baseline: Fixed software, adaptive hardware • Adaptive software: • Adaptive DCT/motion thresholds • Instruction, byte count for next frame predicted • Oracle • Instruction and byte count for next frame exact • Adapt-Once • Adapt once at start of encoding • Minimize total energy across entire sequence
Algorithm Comparison • [Bar chart, CPU + network energy in J: Fixed 7.36, Adapt-Once 6.55, Adaptive 6.25, Oracle 6.09] • Energy consumption of Adaptive is within 3% of Oracle • Simple predictors are sufficient for energy savings • Adaptive saves 5% over Adapt-Once • Frame-by-frame adaptation can save energy
Other test cases • Low-power CPU • Network energy dominated • Software adaptation did not save energy • Carphone sequence • Little inter-frame variation • One-shot adaptation was sufficient • Adapt-Once, Adaptive, and Oracle used the same energy • Adaptive software saved ~15%
Conclusions • A new framework for coordinated CPU/application adaptation • Combined benefits of both adaptations • Preserves separation between layers • Adaptive applications save energy: • Up to 20% on adaptive hardware • Up to 30% on fixed hardware