MARS: Adaptive Remote Execution Scheduler for Multithreaded Mobile Devices
Asaf Cidon*, Tomer M. London*, Sachin Katti, Christos Kozyrakis, Mendel Rosenblum
Stanford University
*Equal contributors
New Class of Mobile Applications
• Computer Vision
• Motion Sensing
• Augmented Reality
Mobile Client Trends
• Mobile CPU performance is increasing
• Mobile devices are hitting an ‘energy wall’
• Can we improve performance and reduce energy consumption?
• Opportunity: network bandwidth is increasing, so utilize the cloud
[Chart: maximum bandwidth (Mb/s)]
Static Client-Server Partitioning Doesn’t Work
• Dynamic resources:
  • Network bandwidth and latency
  • Available CPU, memory
• Same code, different platforms:
  • Smartphones (single-core, multi-core)
  • Tablets
MARS: Adaptive Remote Execution
• Opportunistically offload computations to a remote server
  • Enhance computational capabilities
  • Decrease energy consumption
• Make dynamic decisions
  • Adapt to network and CPU variability
[Figure: Mobile Device and Data Center]
Agenda
• Design of MARS
• Simulator Results and Analysis
• Conclusions
Existing Remote Execution Systems
[Figure: systems plotted by the unit of remote execution (VM or RPC) against the target of performance optimization (single-thread application, multi-threaded application, or system)]
• Cloudlets [Satyanarayanan et al., ‘09]
• CloneCloud [Kirsch et al., ‘11]
• MAUI [Cuervo et al., ‘10]
• Odessa [Ra et al., ‘11]
• Chroma [Balan et al., ‘03]
• MARS “Cloud-on-Chip”
MARS “Cloud-on-Chip”: System Scheduling
• Previous systems: application partitioning
• MARS: system scheduling
[Figure: an RPC queue holding RPCs from several processes (for example RPC 1 Process 1, RPC 1 Process 2, RPC 2 Process 3), dispatched either to local cores (local execution) or to remote cores (remote execution)]
Greedy Algorithm
• Higher POR: better performance gain from offloading
• Higher EOR: better energy saving from offloading
[Figure: decision branches labeled “EOR ≥ ?” and “EOR < ?”]
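The slide does not define how the two ranks are computed, so the following is only a minimal sketch under stated assumptions: POR is taken to be the ratio of local to remote execution time, and EOR the ratio of local to remote energy spent by the device. Both formulas, the `RpcEstimate` type, and the `eor_threshold` parameter are hypothetical and chosen only so that a higher rank means a bigger gain from offloading, as the slide states.

```python
from dataclasses import dataclass


@dataclass
class RpcEstimate:
    """Hypothetical per-RPC cost estimates (not from the slides)."""
    local_time_s: float     # estimated time to run on the device
    remote_time_s: float    # estimated time to run remotely, incl. network
    local_energy_j: float   # device energy if run locally
    remote_energy_j: float  # device energy if offloaded (radio, waiting)


def por(est: RpcEstimate) -> float:
    # Assumed definition: values above 1 mean offloading is faster.
    return est.local_time_s / est.remote_time_s


def eor(est: RpcEstimate) -> float:
    # Assumed definition: values above 1 mean offloading saves energy.
    return est.local_energy_j / est.remote_energy_j


def passes_eor_check(est: RpcEstimate, eor_threshold: float) -> bool:
    # Mirrors the "EOR >= ?" branch; the threshold value is left open.
    return eor(est) >= eor_threshold
```

Under these assumptions, an RPC that runs twice as fast remotely but costs the device twice the energy has a POR of 2.0 and an EOR of 0.5, which illustrates why the two ranks can disagree.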
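Controller Algorithm
• Priority queue, sorted by Performance Offload Rank (POR): RPC 3 (POR 2.5), RPC 5 (POR 1.9), RPC 6 (POR 1.8), RPC 4 (POR 1.3), RPC 2 (POR 0.4)
• Remote server available: check EOR threshold
• Local core available
• G (Greediness) trades off utilization and energy efficiency
[Figure: “Remote Server Available” points at the top of the queue and leads to “Check EOR Threshold”; “Local Core Available” points at the bottom of the queue; an EOR axis is labeled Local, Both, Remote]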
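The controller slide can be sketched roughly as follows. This is an illustration, not the authors' implementation: the `Rpc` and `Controller` names, the `eor_threshold` parameter, the choice to scan past RPCs that fail the EOR check, and the choice to hand the lowest-POR RPC to a free local core are all assumptions read off the figure layout. The greediness parameter G is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rpc:
    name: str
    por: float  # Performance Offload Rank
    eor: float  # energy rank checked against a threshold before offloading


class Controller:
    """Hypothetical controller: one queue shared by local and remote cores."""

    def __init__(self, eor_threshold: float) -> None:
        self.eor_threshold = eor_threshold
        self.queue: List[Rpc] = []  # kept sorted by POR, highest first

    def submit(self, rpc: Rpc) -> None:
        self.queue.append(rpc)
        self.queue.sort(key=lambda r: r.por, reverse=True)

    def on_remote_available(self) -> Optional[Rpc]:
        # Assumption: offload the highest-POR RPC that passes the EOR check.
        for i, rpc in enumerate(self.queue):
            if rpc.eor >= self.eor_threshold:
                return self.queue.pop(i)
        return None  # nothing is worth offloading right now

    def on_local_core_available(self) -> Optional[Rpc]:
        # Assumption: run the RPC that gains least from offloading locally.
        return self.queue.pop() if self.queue else None
```

With the POR values from the slide and made-up EOR values, a free remote server would skip an RPC whose EOR falls below the threshold even if it has the highest POR, while a free local core would take RPC 2 (POR 0.4) from the bottom of the queue.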
Remote Execution Applications
• Augmented Reality
• Face Recognition
[Figure: pipelines over input pictures (Pic) with Barcode Detection, Rendering, and Recognition stages]
Simulator Methodology
• Trace-driven simulation
• Clients:
  • Nokia N900 (single core)
  • NVIDIA Tegra 250 (multicore)
• Server:
  • Amazon EC2 Opteron 2007
• Networks:
  • Outdoor Wi-Fi
  • Indoor Wi-Fi
  • 3G
Nokia N900 Power Consumption
• Wi-Fi: performance and energy are highly correlated
• 3G: performance and energy trade off against each other
Conclusions
• Can’t always be greedy
• Performance and energy trade-off
• MARS is optimized for multiple parallel applications and cores
• MARS “Cloud-on-Chip”: validation of system-level remote execution scheduling
• 57% performance increase, 33% energy savings