Job Scheduling on Amazon EC2 Nathaniel Hart 5/19/14
What is Amazon EC2?
• Instant, configurable server instances
• Pay only for what you use
• Easy to scale
• Frustrating
  • Instances come bare-bones
  • User configured
• MPI can run on it
  • But the latency in a shared system can kill it, unless you pay extra for a cluster that is in the same rack.

What is Job Management?
• System level
  • Coordinating local and cloud resources
• Cluster level
  • Dispatching jobs to all available servers in an equal manner
  • Job migration
• Server level
  • Scheduling jobs to run efficiently
System Level Job Management
[Diagram: clients, Amazon EC2, corporate data center]
Cloudbursting using CometCloud
Diagram source: Hyunjoo, Kim et al.

Cost Savings Using Cloudbursting
Source: Li, Yin et al.
Job Dispatching vs. Job Scheduling

Job dispatching (cluster level)
• Load balance between available EC2 instances
• Focused on maximizing use of all instances
• Requires system-wide awareness (master node)

Job scheduling (server level)
• Assigns tasks to computing resources within an instance
• Focused on using instance resources most efficiently
• Should be able to execute in relative isolation (worker node)
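To make the split concrete, here is a minimal sketch, not taken from any of the cited papers. It assumes each job carries an estimated cost, that the master dispatches each job to the instance with the least assigned work, and that each worker orders its own jobs shortest-first. The names `dispatch` and `schedule_locally`, the instance IDs, and both policies are hypothetical choices for illustration only.

```python
import heapq


def dispatch(jobs, instance_ids):
    """Master node: give each job to the instance with the least assigned work.

    jobs is a list of (job_name, estimated_cost) pairs. The least-loaded
    policy is an assumption for illustration, not a method from the sources.
    """
    # The heap is the master's system-wide view: (assigned_work, instance_id).
    heap = [(0.0, iid) for iid in instance_ids]
    heapq.heapify(heap)
    assignment = {iid: [] for iid in instance_ids}
    for job_name, est_cost in jobs:
        load, iid = heapq.heappop(heap)
        assignment[iid].append((job_name, est_cost))
        heapq.heappush(heap, (load + est_cost, iid))
    return assignment


def schedule_locally(jobs):
    """Worker node: order its own jobs without any knowledge of other instances.

    Shortest-estimated-job-first is an assumed local policy.
    """
    return [name for name, _ in sorted(jobs, key=lambda job: job[1])]


if __name__ == "__main__":
    jobs = [("a", 4), ("b", 2), ("c", 1), ("d", 3)]
    plan = dispatch(jobs, ["i-1", "i-2"])
    for iid, assigned in plan.items():
        print(iid, schedule_locally(assigned))
```

The dispatcher needs the load of every instance, which is why it lives on the master; the local scheduler only ever sees its own queue.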
Cluster Level Job Management
[Diagram: clients, decider, cluster of instances, cluster detail]
Job Migration
• Load balancing:
  • If a VM is starved, send it a job.
  • Incurs a time penalty, and can get out of hand quickly if not managed.
• Requires management
  • Limit job migration to when jobs cannot be scheduled normally
    • Closer to the end of a job, some instances may be starved of tasks.
  • Limit job migration to a fixed interval
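The two limits above can be sketched as a small simulation. This is an illustrative toy under stated assumptions, not an implementation from the cited work: every instance finishes one job per tick, a starved instance is one with an empty queue, and a donor must keep at least one job. The names `migrate_once`, `run`, and `migration_interval` are hypothetical.

```python
from collections import deque


def migrate_once(queues):
    """Send one waiting job to each starved (empty) instance.

    queues maps an instance ID to a deque of job names. Returns the list of
    (job, donor, receiver) moves that were made.
    """
    moves = []
    for starved, queue in queues.items():
        if queue:
            continue  # not starved, nothing to do
        donor = max(queues, key=lambda iid: len(queues[iid]))
        # Only migrate when the donor keeps at least one job for itself.
        if len(queues[donor]) > 1:
            job = queues[donor].pop()  # take the job that would run last
            queue.append(job)
            moves.append((job, donor, starved))
    return moves


def run(queues, ticks, migration_interval):
    """Simulate ticks, attempting migration only every migration_interval ticks."""
    log = []
    for tick in range(1, ticks + 1):
        for queue in queues.values():
            if queue:
                queue.popleft()  # simplification: one job completes per tick
        if tick % migration_interval == 0:
            log.extend((tick, *move) for move in migrate_once(queues))
    return log


if __name__ == "__main__":
    queues = {
        "i-1": deque(["j1", "j2", "j3", "j4", "j5", "j6"]),
        "i-2": deque(["k1"]),
    }
    print(run(queues, ticks=4, migration_interval=2))
```

Checking for starvation only on the interval bounds how often the time penalty of a migration can be paid, at the cost of leaving an instance idle for up to one interval.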
Scheduling Comparison
Source: Li, Yin et al.
Closing Thoughts
• This was a summary of the methods I found. There appear to be many solutions, and each author claims that their own approach works wonderfully.
• The only constants are the problems:
  • Volatile job execution time / resource requirements
  • Emergent properties and unknowns at the start of a job can drastically affect the job scheduling needs
  • Tradeoffs between computation and communication
  • Need for reliability
  • Need for cost efficiency
Works Sourced
• Amazon Web Services, Inc. “AWS Simple Icons for Architecture Diagrams”. https://aws.amazon.com/architecture/icons/. Retrieved May 19, 2014.
• Hyunjoo, Kim et al. “Investigating the Use of Autonomic Cloudbursts for High-Throughput Medical Image Registration”. Retrieved from IEEE Xplore Digital Library.
• Leslie, Luke M. et al. “Exploiting Performance and Cost Diversity in the Cloud”. 2013 IEEE Sixth International Conference on Cloud Computing. Retrieved from IEEE Xplore Digital Library.
• Li, Yin et al. “H-PFSP: Efficient Hybrid Parallel PFSP Protected Scheduling for MapReduce System”. 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. Retrieved from IEEE Xplore Digital Library.
• Lu, Peng, et al. “Workload Characteristic Oriented Scheduler for MapReduce”. 2012 IEEE 18th International Conference on Parallel and Distributed Systems. Retrieved from IEEE Xplore Digital Library.
• Moschakis, Ioannis A., Karatza, Helen D. “Parallel Job Scheduling on a Dynamic Cloud Model with Variable Workload and Active Balancing”. 2012 16th Panhellenic Conference on Informatics. Retrieved from IEEE Xplore Digital Library.
• Moschakis, Ioannis A., Karatza, Helen D. “Performance and Cost evaluation of Gang Scheduling in a Cloud Computing System with Job Migrations and Starvation Handling”. Retrieved from IEEE Xplore Digital Library.
• Nahir, Amir, Ariel Orda, and Danny Raz. “Schedule First, Manage Later: Network-Aware Load Balancing”. 2013 Proceedings IEEE INFOCOM. Retrieved from IEEE Xplore Digital Library.