Reduced Energy Decoding of MPEG Streams Malena Mesarina, HP Labs/UCLA CS Dept Yoshio Turner, HP Labs Multimedia Computing and Networking Jan 2002
System Environment • Portable client – limited battery life • Multimedia server – ample compute/storage • Application – stored media streaming with MPEG decoding performed by the client
Problem • Tradeoff – client energy consumption increases with media stream quality • User should be able to choose the operating point to balance quality and battery life • Goal: improve the energy/QoS tradeoff by reducing the energy consumption required for each level of media quality
Approach • Idea: exploit the ample resources of servers to improve client battery life • Client supports a discrete set of voltages and clock frequencies • Lowering the voltage lowers both speed and energy consumption • Dynamic Voltage Scaling – DVS • Server pre-processes (offline) stored media • Computes frame decoding order • Assigns voltage/frequency per frame • Transmits schedule to client for DVS execution
Contributions • New DVS scheduling algorithm • Minimizes CPU energy consumption • Satisfies timing constraints • Satisfies buffering constraints • Quantification of the energy-QoS tradeoff • Evaluation of the impact of DVS and client design parameters (processor speed, buffering) on the energy-QoS tradeoff
Decoding Hardware Organization [Figure: input FIFO, decoder, reference buffers (past: I0, future: P1), video display buffer, audio display buffer; decoding order: I0 P1 B2 B3 P2 ...]
Naïve Scheduling is Bad • Naive scheduling = EDF (earliest deadline first) task order + greedy voltage assignment [Figure: voltage plots for the audio and video streams]
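A minimal sketch of such a naive scheduler may help show why it performs poorly. The slide does not define "greedy voltage assignment"; here it is assumed to mean "give each task, in EDF order, the lowest voltage that still meets its own deadline". The task tuples, the integer time units, and the energy model (cycles * V^2) are illustrative assumptions, not taken from the slides.

```python
import math

def naive_schedule(tasks, settings):
    """tasks: list of (name, cycles, deadline); settings: list of (volt, freq).

    freq is in cycles per time unit (assumed). Returns (total_energy, plan),
    or None if some task cannot meet its deadline.
    """
    t = 0
    energy = 0.0
    plan = []
    # EDF: process tasks in order of increasing deadline.
    for name, cycles, deadline in sorted(tasks, key=lambda task: task[2]):
        # Greedy (assumed meaning): try the lowest voltage first.
        for volt, freq in sorted(settings):
            finish = t + math.ceil(cycles / freq)
            if finish <= deadline:
                energy += cycles * volt ** 2  # assumed energy model
                plan.append((name, volt, finish))
                t = finish
                break
        else:
            return None  # no setting meets this task's deadline
    return energy, plan

SETTINGS = [(2.0, 100), (1.0, 50)]  # hypothetical (volt, cycles per time unit)

# The first task greedily runs slow, which leaves too little time for the second.
print(naive_schedule([("x", 100, 2), ("y", 200, 3)], SETTINGS))
```

In the second task set a feasible schedule exists (run both tasks at the high setting), but the greedy choice for the first task consumes the slack the second task needs.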
DVS Scheduling Algorithm • Goal: minimize energy consumption • For a uniprocessor client, find the voltage-frequency setting per frame and the interleaved order of decoding frames • Subject to the following constraints • Frames within a stream are in a fixed decoding order • Frame decode interdependence (I-, P-, B-frames) • Display rates for video (33 fps) and audio (44 kHz) • Audio/video synchronization: 80 ms • Limited client display buffer capacity
DVS Scheduling Algorithm (continued) • Approach: dynamic programming • Find the energy optimal subschedule that completes the first i video and j audio tasks by time t, over search space (i,j,t). Report the best results over all possible t for the full media. • Search space is reduced by exploiting our knowledge of the constraints
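The dynamic program over (i, j, t) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes integer time units, an energy model of cycles * V^2, and that synchronization and buffer constraints have already been folded into a per-task minimum start time and deadline. The function name and tuple layouts are hypothetical, and the time-window and dead-end pruning are omitted.

```python
import math

def dvs_min_energy(video, audio, settings):
    """video, audio: lists of (cycles, min_start, deadline) in decoding order.
    settings: list of (volt, freq), freq in cycles per time unit (assumed).

    Returns the minimum total energy, or None if no feasible schedule exists.
    """
    n, m = len(video), len(audio)
    # best[(i, j)][t] = least energy to finish i video and j audio tasks at time t
    best = {(0, 0): {0: 0.0}}
    for i in range(n + 1):
        for j in range(m + 1):
            states = best.get((i, j), {})
            # The next task is either the next video or the next audio task.
            candidates = []
            if i < n:
                candidates.append((video[i], (i + 1, j)))
            if j < m:
                candidates.append((audio[j], (i, j + 1)))
            for t, energy in states.items():
                for (cycles, min_start, deadline), nxt in candidates:
                    for volt, freq in settings:
                        finish = max(t, min_start) + math.ceil(cycles / freq)
                        if finish > deadline:
                            continue  # this setting misses the deadline
                        cost = energy + cycles * volt ** 2
                        slot = best.setdefault(nxt, {})
                        if cost < slot.get(finish, math.inf):
                            slot[finish] = cost
    final = best.get((n, m))
    # Report the best result over all completion times t.
    return min(final.values()) if final else None

SETTINGS = [(2.0, 100), (1.0, 50)]  # hypothetical (volt, cycles per time unit)
print(dvs_min_energy([(100, 0, 2)], [(100, 0, 3)], SETTINGS))
```

Each (i, j) entry keeps one energy value per completion time, so the table grows with the number of reachable completion times; bounding that range is what makes the full problem tractable.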
Main Challenges • Frame decoding inter-dependencies: B-frames depend on future P-frames • Decoding order not equal to display order • Construct a mapping function from frame decode number to frame display number in order to compute correct deadlines • Limited buffer capacity • Algorithm must have overflow avoidance mechanism • Multiple voltage levels and possible frame decoding orders • Intractable search space, pruning necessary
Fixed Display Buffer Capacity • Overflow prevention: translate buffer constraints to timing constraints • Assign minimum decoding start times to tasks • Suppose the display buffer is full (contains previously decoded frames) • The earliest time to enqueue (min start time) B5 is when the head frame I0 leaves the buffer to be displayed • The head frame I0 is identified using the frame display order and buffer capacity
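One way to read this translation is sketched below, under simplifying assumptions that are not stated on the slide: frames enter the display buffer in display order, frame k is displayed at first_display_time + k * frame_period, and a slot frees up at the instant the head frame is displayed. All names are hypothetical.

```python
def min_start_time(display_index, capacity, frame_period, first_display_time=0):
    """Earliest time a frame may be enqueued without overflowing the buffer.

    With a full buffer of `capacity` frames ahead of it, the frame with display
    number k must wait until frame k - capacity (the head) has been displayed.
    """
    head = display_index - capacity
    if head < 0:
        return 0  # buffer cannot be full yet; no waiting needed
    return first_display_time + head * frame_period

# Frame with display number 5, buffer of 2 frames, 33 ms per frame:
# the head frame is number 3, displayed at 3 * 33 ms.
print(min_start_time(5, 2, 33))
```

The result is a per-task lower bound on start time, which lets a scheduler treat buffer capacity as just another timing constraint.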
Key to Tractable Execution • Limit the number of combinations of (i, j, t) • Limit the range of subschedule completion times t (time windows) • Limit the combinations of (i,j) by detecting “dead-end” subschedules • Result: a small number of (i,j) pairs, each with a small time window
Limiting Completion Times: Time Window • A window represents possible completion times of i video and j audio tasks. • Lower Bound, Tmin(i,j): earliest time when the last task in both streams can complete • Upper Bound, Tmax(i,j): latest time when the last task in both streams can complete • Tmax – Tmin ~ (1/frame_rate) * buffersize
Time Window Example [Figure: time axis showing the window from tmin[i,j] to tmax[i,j] for the i-th video frame and the window from tmin[i+1,j] to tmax[i+1,j] for the (i+1)-th video frame]
Only some (i,j) subschedules lead to a complete schedule [Figure: video and audio display buffers, marking the video frame and the audio frame currently in display] • Scheduling (i,j) = (10,14) is POSSIBLE, BUT • Scheduling (i,j) = (10,15) is NOT POSSIBLE because the AUDIO BUFFER OVERFLOWS! • (i,j) is limited by the buffer size • N = #frames, B = buffer size, Ts = 1/frame rate • Algorithm complexity: O(N * B) * O(B * Ts) = O(N * B^2 * Ts)
Performance Evaluation: Energy vs QoS Exploration • Variability in frame execution times • Potential for energy reduction? • Energy savings vs picture quality • For what range of quality is DVS helpful? • How much improvement is in that range? • Impact of client design parameters on energy vs QoS • How does processor speed change the tradeoff? • Will extra buffering ease schedulability? Reduce energy?
Energy-QoS tradeoff: Fast Processor + Fixed Buffer Size • Pentium 3 (1.9V@500MHz, 1.4V@316MHz) • Display buffers: 2 for video, 2 for audio • Scale factor = frame pixels/max frame pixels [Figure: two plots against scale factor, with series "dvs", "hi volt", and "lo volt"; annotated values 47% and 19%] • Energy improvement over the range of high-resolution scale factors, 0.6 to 1
Energy-QoS tradeoff: Slow Processor + Variable Buffer Size • Pentium 2 (1.7V@300MHz, 1.4V@225MHz) • Variable buffering: (video,audio) = (1,1), (3,3), (6,3) • Increasing buffering does not improve energy significantly • Extra buffers enable decoding of higher QoS video
Summary and Conclusions • Offline algorithm finds a low energy schedule that respects: • Timing constraints (display rate, synchronization) • Limited memory at client • DVS significantly reduces energy consumption • Increasing buffer size • Has no significant impact on energy, but • Enables higher video quality
Future Work • Online scheduling • Offline schedule represents lower bound on energy • Exploration of other tradeoff media parameters (frame rate, display brightness) • Implementation with progressive coding schemes (JPEG2000)
Experimental Setup • Fixed voltage/frequency processors: P3 and P2 • Computed time/energy per frame at fixed voltage • Extrapolated time/energy per frame at other operational core voltages • Assumptions: • Frequency is inversely proportional to gate delay • Cycles/frame remains constant for different frequencies • Power dissipation is constant for a given voltage setting
Extrapolation Example • Given: Vhi, Fhi, τhi, Thi, Phi • Flo = Fhi * τhi/τlo = Fhi * [Vhi/(Vhi − Vt)^2] / [Vlo/(Vlo − Vt)^2] (1) • Tlo = cycles/Flo = Fhi * Thi/Flo (2) • Plo = Phi * (Flo * Vlo^2)/(Fhi * Vhi^2) • Elo = Plo * Tlo (3)
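The extrapolation steps can be sketched in a few lines. This assumes the common gate-delay model in which delay is proportional to V/(V − Vt)^2; the threshold voltage Vt and the sample numbers below are illustrative placeholders, not values from the slides.

```python
import math

def gate_delay(v, vt):
    """Relative gate delay under the assumed model: delay ~ V / (V - Vt)^2."""
    return v / (v - vt) ** 2

def extrapolate(v_hi, f_hi, t_hi, p_hi, v_lo, vt):
    """Extrapolate per-frame frequency, time, power, and energy to voltage v_lo."""
    # (1) Frequency is inversely proportional to gate delay.
    f_lo = f_hi * gate_delay(v_hi, vt) / gate_delay(v_lo, vt)
    # (2) Cycles per frame stay constant: cycles = f_hi * t_hi.
    t_lo = f_hi * t_hi / f_lo
    # Power scales with frequency times voltage squared.
    p_lo = p_hi * (f_lo * v_lo ** 2) / (f_hi * v_hi ** 2)
    # (3) Energy per frame.
    e_lo = p_lo * t_lo
    return f_lo, t_lo, p_lo, e_lo

# Hypothetical measurement: 1.9 V, 500 MHz, 10 ms per frame, 20 W; Vt = 0.5 V assumed.
f_lo, t_lo, p_lo, e_lo = extrapolate(1.9, 500e6, 0.010, 20.0, 1.4, 0.5)
print(f_lo, t_lo, p_lo, e_lo)
```

Under these formulas the frequency terms cancel in the energy, so Elo/Ehi reduces to (Vlo/Vhi)^2: running slower saves energy only because the voltage drops.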
Related Work • Problem we address • Real-time scheduling of non-preemptable tasks with precedence constraints • Other real-time schedulers treat different cases • [1] Liu and Layland, “Scheduling algorithms for multiprogramming in a hard-real-time environment” • [2] Yao et al. “A scheduling model for reduced CPU energy” • No precedence constraints and preemptable tasks • [3] Hong et al. “Power optimization of variable voltage core-based systems” • Heuristics for non-preemptable tasks but no precedence constraints
Frame Interdependence • Map frame number i in decoding order to frame number d(i) in display order
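A possible construction of the mapping d(i) is sketched below. It assumes the usual MPEG reordering, in which each reference frame (I or P) is decoded before the B-frames that precede it in display order, so a reference frame is displayed only after the B-frames that follow it in the decode stream. The function name and the frame-type encoding are hypothetical.

```python
def decode_to_display(frame_types):
    """Map decode index i to display index d(i).

    frame_types: sequence of "I", "P", or "B" in decoding order.
    """
    d = [None] * len(frame_types)
    held = None        # decode index of the reference frame awaiting display
    next_display = 0
    for i, ftype in enumerate(frame_types):
        if ftype == "B":
            # B-frames are displayed as soon as they are decoded.
            d[i] = next_display
            next_display += 1
        else:
            # A new reference frame releases the previously held one.
            if held is not None:
                d[held] = next_display
                next_display += 1
            held = i
    if held is not None:
        d[held] = next_display  # flush the last reference frame
    return d

# Decoding order I P B B P corresponds to display order I B B P P.
print(decode_to_display(["I", "P", "B", "B", "P"]))  # [0, 3, 1, 2, 4]
```

With d(i) in hand, the deadline of the i-th decoded frame can be derived from the display time of frame d(i).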