SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale
Ke Wang
Data-Intensive Distributed Systems Laboratory
Computer Science Department, Illinois Institute of Technology
April 8th, 2013, ACM HPC Symposium

Outline
• Introduction & Motivation
• Long-Term Aims and Contributions
• SimMatrix Architecture
• Implementation
• Evaluation
• Related Work
• Conclusion & Future Work

Manycore Computing
Today (2013): Multicore Computing
• O(10) cores commodity architectures
• O(100) cores proprietary architectures
• O(1000) GPU hardware threads
Near future (~2019): Manycore Computing
• ~1000 cores/threads commodity architectures
Source: Pat Helland, Microsoft, The Irresistible Forces Meet the Movable Objects, November 9th, 2007

Exascale Computing
Today (2013): 10 Petascale Computing
• O(100K) nodes
• O(1M) cores
Near future (~2019): Exascale Computing
• ~1M nodes (10X)
• ~1B processor-cores/threads (1000X)
Source: Top500 Performance Development, http://top500.org/static/lists/2011/11/TOP500_201111_Poster.pdf

Major Challenges of Exascale Computing
Memory and Storage
• Minimizing data movement through the memory hierarchy (e.g. persistent storage, solid state memory, volatile memory, caches, and registers)
Concurrency and Locality
• Harnessing the many orders of magnitude of increased parallelism fueled by the many-core computing era (accelerator, GPU, MIC)
Resiliency
• Making both the infrastructure (hardware) and applications fault tolerant in the face of a decreasing mean-time-to-failure (MTTF)
Energy and Power
• 20MW limitation

MTC: Many-Task Computing
• Bridge the gap between HPC and HTC
• Applied in clusters, grids, and supercomputers
• Loosely coupled apps with HPC orientations
• Many activities coupled by file system ops
• Many resources over short time periods

MTC Middleware
• Falkon
  • Fast and Lightweight Task Execution Framework
  • http://datasys.cs.iit.edu/projects/Falkon/index.html
• Swift
  • Parallel Programming System
  • http://www.ci.uchicago.edu/swift/index.php

Long-Term Aims
• Address major exascale computing challenges:
  • Memory and Storage
  • Concurrency and Locality
  • Resiliency
• Explore scheduling architecture and techniques to enable MTC at exascale
• Analyze, design and implement a distributed data-aware execution fabric (MATRIX) supporting HPC/MTC workloads at exascale
• Integrate MATRIX with parallel programming systems (e.g. Swift, Charm++, MapReduce) and with the FusionFS distributed file system

This Work's Contributions
• Architect, design and implement a job scheduling system simulator, SimMatrix, at the node/core level
• Performance evaluation of SimMatrix against SimGrid and GridSim; evaluation done up to millions of nodes, billions of cores, and tens of billions of tasks
• Support for homogeneous/heterogeneous systems, various programming models (HPC/MTC), and scheduling strategies (centralized/distributed/hierarchical)

Overview: Job Scheduling Systems
• Efficiently manage the distributed computing power of workstations, servers, and supercomputers in order to maximize job throughput and system utilization
• Load balancing is critical
• Different scheduling strategies
  • Centralized scheduling hinders scalability
  • Hierarchical scheduling has long job turnaround times
  • Distributed scheduling is a promising approach to exascale
• Work stealing: a distributed scheduling strategy
  • Starved processors steal tasks from overloaded ones

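
The work-stealing idea above can be sketched in a few lines of Java. This is only an illustration, not SimMatrix's actual code: the class names `NodeSketch` and `WorkStealingSketch`, the choice of the most loaded candidate as the victim, and the "steal half of the queue" policy are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** A simulated node holding a queue of task ids (hypothetical structure). */
final class NodeSketch {
    final int id;
    final Deque<Integer> tasks = new ArrayDeque<>();

    NodeSketch(int id) {
        this.id = id;
    }
}

final class WorkStealingSketch {
    /**
     * A starved node (the thief) picks the most loaded candidate and moves
     * half of that node's queued tasks to its own queue.
     * Returns the number of tasks stolen (0 if there is nothing to steal).
     */
    static int steal(NodeSketch thief, List<NodeSketch> candidates) {
        NodeSketch victim = null;
        for (NodeSketch n : candidates) {
            if (n != thief && (victim == null || n.tasks.size() > victim.tasks.size())) {
                victim = n;
            }
        }
        if (victim == null) {
            return 0;
        }
        int toSteal = victim.tasks.size() / 2; // assumed policy: steal half
        for (int i = 0; i < toSteal; i++) {
            thief.tasks.addLast(victim.tasks.pollLast());
        }
        return toSteal;
    }

    public static void main(String[] args) {
        NodeSketch idle = new NodeSketch(0);
        NodeSketch busy = new NodeSketch(1);
        for (int t = 0; t < 10; t++) {
            busy.tasks.addLast(t);
        }
        int stolen = steal(idle, java.util.Arrays.asList(idle, busy));
        System.out.println("stolen=" + stolen + " idle=" + idle.tasks.size()
                + " busy=" + busy.tasks.size());
    }
}
```

In a real distributed scheduler the thief would learn the candidates' loads through messages rather than by reading their queues directly; the sketch skips that step to keep the load-balancing effect visible.
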
SimMatrix Architecture
Figure 1: SimMatrix architectures; the left part is the centralized one, with a single dispatcher (head node) talking to all compute nodes; the right part is the distributed topology, with a dispatcher sitting in each compute node
[Figure labels: Client, Submit tasks, Dispatcher, Arbitrary Node]

Simulations
• Continuous time simulation
  • Abandoned the idea of creating a separate thread per simulated node: on our 48-core system with 256GB of memory, we were limited to 32K threads
• Discrete event simulation
  • A viable approach (today) to explore scheduling techniques at exascale (millions of nodes and billions of cores)
  • Created a unique object per simulated node, and converted every behavior (state change) into an event

At the Heart of SimMatrix: Global Event Queue
• All events are inserted into the queue, which is sorted by ascending occurrence time
• Handle the first event, advance the simulation time, and update the event queue
• Implemented with Java's red-black-tree-based TreeSet, which ensures Θ(log n) time for insert and remove
Figure 2: Event State Transition Diagram

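
A minimal sketch of such a time-ordered event queue, built on `java.util.TreeSet` as the slide describes. The `Event` fields, the `seq` tie-breaker, and the event type strings are assumptions for illustration and are not taken from the SimMatrix source. The tie-breaker matters because `TreeSet` treats elements that compare as equal as duplicates, so two events with the same occurrence time would otherwise collapse into one.

```java
import java.util.TreeSet;

/** A simulated event; the fields are illustrative, not SimMatrix's actual ones. */
final class Event implements Comparable<Event> {
    final double time; // simulated occurrence time in seconds
    final long seq;    // tie-breaker so equal-time events are not dropped
    final String type; // hypothetical event label

    Event(double time, long seq, String type) {
        this.time = time;
        this.seq = seq;
        this.type = type;
    }

    @Override
    public int compareTo(Event other) {
        int byTime = Double.compare(time, other.time);
        return byTime != 0 ? byTime : Long.compare(seq, other.seq);
    }
}

/** Global event queue ordered by ascending occurrence time. */
final class EventQueueSketch {
    private final TreeSet<Event> queue = new TreeSet<>();
    private long nextSeq = 0;
    private double now = 0.0;

    /** Schedules an event 'delay' seconds after the current simulated time. */
    void schedule(double delay, String type) {
        queue.add(new Event(now + delay, nextSeq++, type));
    }

    /** Removes the earliest event and advances the clock; returns null when empty. */
    Event step() {
        Event first = queue.pollFirst();
        if (first != null) {
            now = first.time;
        }
        return first;
    }

    double now() {
        return now;
    }

    boolean isEmpty() {
        return queue.isEmpty();
    }

    public static void main(String[] args) {
        EventQueueSketch sim = new EventQueueSketch();
        sim.schedule(5.0, "TASK_END");
        sim.schedule(1.0, "TASK_DISPATCH");
        sim.schedule(1.0, "STEAL_REQUEST");
        while (!sim.isEmpty()) {
            Event e = sim.step();
            System.out.println(sim.now() + " " + e.type);
        }
    }
}
```

Handling an event would normally schedule follow-up events (for example, a dispatch scheduling the matching task end), which is how the simulated clock jumps from one state change to the next without a thread per node.
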
Simulator Features
• Node load information
  • Load: number of busy cores
  • A nested hash map groups nodes by load and provides extremely fast lookup of the next available nodes
• Dynamic task submission
  • Aims to reduce the task waiting time and the memory footprint
• Dynamic poll interval
  • Exponential backoff to reduce the number of messages and increase the speed of simulation

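
One way to read the load-grouping idea is a map from load (number of busy cores) to the set of nodes currently at that load. The sketch below follows that reading; the class name `LoadIndexSketch`, its methods, and the choice of `HashMap` plus `HashSet` are assumptions, and the actual SimMatrix data structure may be organized differently.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Groups node ids by load (number of busy cores) for fast lookup. */
final class LoadIndexSketch {
    private final Map<Integer, Set<Integer>> nodesByLoad = new HashMap<>();
    private final Map<Integer, Integer> loadOfNode = new HashMap<>();

    /** Records a node's new load, moving it between load groups. */
    void setLoad(int nodeId, int busyCores) {
        Integer old = loadOfNode.put(nodeId, busyCores);
        if (old != null) {
            Set<Integer> group = nodesByLoad.get(old);
            group.remove(nodeId);
            if (group.isEmpty()) {
                nodesByLoad.remove(old);
            }
        }
        Set<Integer> target = nodesByLoad.get(busyCores);
        if (target == null) {
            target = new HashSet<>();
            nodesByLoad.put(busyCores, target);
        }
        target.add(nodeId);
    }

    /** Returns some node with fewer than coresPerNode busy cores, or -1 if none. */
    int findAvailable(int coresPerNode) {
        for (int load = 0; load < coresPerNode; load++) {
            Set<Integer> group = nodesByLoad.get(load);
            if (group != null && !group.isEmpty()) {
                return group.iterator().next();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LoadIndexSketch index = new LoadIndexSketch();
        index.setLoad(1, 4); // node 1 is fully busy on a 4-core node
        index.setLoad(2, 2); // node 2 has two free cores
        System.out.println(index.findAvailable(4));
    }
}
```

The lookup cost depends on the number of cores per node rather than on the number of nodes, which is the property that matters when simulating millions of nodes.
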
Implementation
• SimMatrix is developed in Java
  • Sun 64-bit JDK version 1.7.0_03
• Code accessible at:
  • http://datasys.cs.iit.edu/~kewang/software.html
• SimMatrix has no other dependencies

Experiment Environment
• Fusion system:
  • fusion.cs.iit.edu
  • 48 AMD Opteron cores at 800MHz (only one core is needed)
  • 256GB RAM
  • 64-bit Linux kernel 2.6.31.5
  • Sun 64-bit JDK version 1.7.0_23

Metrics
• Throughput
  • Number of tasks finished per second, calculated as total-number-of-tasks / simulation-time
• Efficiency
  • The ratio between the ideal simulation time of completing a given workload and the real simulation time. The ideal simulation time is calculated as the average task execution time multiplied by the number of tasks per core.
• CPU time / time per task
• Memory / memory per task

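
The throughput and efficiency definitions above translate directly into code. The sketch below is only a restatement of those two formulas; the class and method names are hypothetical, and "tasks per core" is taken to mean total tasks divided by total cores.

```java
/** Throughput and efficiency as defined on the Metrics slide (names are illustrative). */
final class MetricsSketch {
    /** Tasks finished per second of simulated time. */
    static double throughput(long totalTasks, double simulationTimeSeconds) {
        return totalTasks / simulationTimeSeconds;
    }

    /** Ideal time: average task execution time multiplied by tasks per core. */
    static double idealTime(double avgTaskSeconds, long totalTasks, long totalCores) {
        return avgTaskSeconds * ((double) totalTasks / totalCores);
    }

    /** Efficiency: ideal simulation time divided by real simulation time. */
    static double efficiency(double idealSeconds, double realSeconds) {
        return idealSeconds / realSeconds;
    }

    public static void main(String[] args) {
        // Made-up example numbers: 1000 tasks of 5 s each on 100 cores,
        // finishing after 62.5 s of simulated time.
        double ideal = idealTime(5.0, 1000, 100);
        System.out.println("throughput = " + throughput(1000, 62.5) + " tasks/sec");
        System.out.println("efficiency = " + efficiency(ideal, 62.5));
    }
}
```

With these example numbers the ideal time is 50 s, so the efficiency is 50 / 62.5 = 0.8 and the throughput is 16 tasks/sec.
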
Workloads (Sleep Tasks)
• Synthetic workloads:
  • Uniform distribution with an average task execution time of 5000s (AVE_5K); also a homogeneous workload with all tasks having 1 sec execution time (ALL_1)
• Realistic application workloads:
  • Obtained from real traces taken from running MTC applications on Blue Gene/P over a 17-month period
  • 34.8M tasks with a minimum runtime of 0 seconds, maximum runtime of 1469.62 seconds, average runtime of 95.20 seconds, and standard deviation of 188.08

Validation
Validate SimMatrix against state-of-the-art MTC systems (e.g. Falkon, MATRIX).
The simulator makes simplifying assumptions, for example about the network. It is also difficult to model communication congestion, resource sharing and its effects on performance, and the variability that comes with real systems.
We believe the relatively small differences (2.8% and 5.85%) demonstrate that SimMatrix is accurate enough to produce convincing results (at least at modest scales).

Resource Requirements up to Exascale
1M nodes, 1B tasks and 10B tasks
Memory
• Centralized: 14.1GB
• Distributed: 142.1GB
CPU Time
• Centralized: 17.4 hours
• Distributed: 162.8 hours
Still relatively moderate

Centralized vs. Distributed Scheduling
AVE_5K: efficiency drops to 0.05% for centralized scheduling, but remains above 90% for distributed scheduling at exascale.
ALL_1: centralized scheduling saturates at 8 nodes with an upper-bound throughput of 1000 tasks/sec; distributed scheduling starts to saturate at 32K nodes and finally achieves a throughput of 75M tasks/sec.
Reason for saturation: in the final stage, work stealing requires so many messages as the system scales up that they saturate the network, the processing capacity, or both.
Solution: set an upper bound on the poll interval, and use sufficiently long tasks to amortize the cost of so many messages. (AVE_12 tasks can achieve 90% efficiency at exascale with a throughput of 75M tasks/sec.)

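
The combination of exponential backoff and an upper bound on the poll interval can be sketched as follows. The class name `PollBackoffSketch`, the doubling factor, and the reset-on-success behavior are assumptions for illustration; the slides do not state the exact growth factor or bound used in SimMatrix.

```java
/** Poll interval that doubles after each failed steal attempt, up to a cap. */
final class PollBackoffSketch {
    private final double initialSeconds;
    private final double maxSeconds;
    private double currentSeconds;

    PollBackoffSketch(double initialSeconds, double maxSeconds) {
        if (initialSeconds <= 0 || maxSeconds < initialSeconds) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        this.initialSeconds = initialSeconds;
        this.maxSeconds = maxSeconds;
        this.currentSeconds = initialSeconds;
    }

    /** Returns the wait before the next poll, then doubles it up to the cap. */
    double nextInterval() {
        double wait = currentSeconds;
        currentSeconds = Math.min(currentSeconds * 2.0, maxSeconds);
        return wait;
    }

    /** Called after a successful steal: start again from the initial interval. */
    void reset() {
        currentSeconds = initialSeconds;
    }

    public static void main(String[] args) {
        PollBackoffSketch backoff = new PollBackoffSketch(1.0, 8.0);
        for (int i = 0; i < 5; i++) {
            System.out.println(backoff.nextInterval());
        }
    }
}
```

With an initial interval of 1 s and a cap of 8 s, successive failed attempts wait 1, 2, 4, 8, and 8 seconds. The cap keeps an idle node from backing off so far that it misses newly available work.
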
SimMatrix vs. SimGrid and GridSim
Comparison: centralized scheduling
Scale: GridSim 256 nodes, SimGrid 65K nodes, SimMatrix 1M nodes
Time per task: GridSim increases, SimGrid stays constant, SimMatrix decreases and then stays almost constant
Memory per task: GridSim and SimGrid decrease and then stay constant, SimMatrix keeps decreasing
Conclusion: SimMatrix is more resource efficient at large scales

Application Domains of SimMatrix
• Data Centers: large-scale data centers (e.g. Google, Amazon) are composed of thousands of servers (10 to 100× more in the near future) geographically distributed around the world. Load balancing among all the servers with data-intensive workloads is very important, yet non-trivial. SimMatrix can be used to study different network topologies connecting the servers, as well as data-aware scheduling, which could be applied to the scheduling of data centers.
• Grid Environment: SimMatrix can be configured not only as a homogeneous scheduling system but also as a heterogeneous one. Different Grids could configure SimMatrix and do scheduling individually without interacting with each other.

Application Domains of SimMatrix
• Workflow System: although SimMatrix currently relies on high-level workflow systems (Swift, Charm++) to manage data flow and task dependencies, we could extend SimMatrix to simulate a workflow system with dependent tasks. We have already run SimMatrix up to exascale with an MTC workload obtained from the Swift workflow system, and achieved ~87% efficiency.

Application Domains of SimMatrix
• Many-core Simulation: instead of configuring SimMatrix as an exascale system, we also configured it as a single many-core chip node with up to thousands of cores in a 2D/3D mesh topology. We applied work stealing at the core level within one many-core node, and found that at the thousand-core level a 2D mesh topology needs at least 13 hops of neighbors, while a 3D mesh needs at least 5, in order to achieve high system efficiency.

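
To make the hop counts more concrete, the sketch below counts how many cores lie within a given number of hops of one core in a 2D or 3D mesh. It assumes that "hops of neighbors" means the Manhattan-distance radius within which a core looks for steal victims; that reading, and the class and method names, are assumptions and not taken from the SimMatrix source.

```java
/** Counts cores reachable within a hop radius in a 2D or 3D mesh (illustrative only). */
final class MeshNeighborsSketch {
    /**
     * Counts cores within maxHops (Manhattan distance) of core (x, y, z) in a
     * width x height x depth mesh, excluding the core itself.
     * Passing depth = 1 (and z = 0) gives a 2D mesh.
     */
    static int countWithinHops(int x, int y, int z,
                               int width, int height, int depth, int maxHops) {
        int count = 0;
        for (int i = 0; i < width; i++) {
            for (int j = 0; j < height; j++) {
                for (int k = 0; k < depth; k++) {
                    int hops = Math.abs(i - x) + Math.abs(j - y) + Math.abs(k - z);
                    if (hops > 0 && hops <= maxHops) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 32 x 32 2D mesh (1024 cores), core near the center, radius of 13 hops.
        System.out.println(countWithinHops(16, 16, 0, 32, 32, 1, 13));
        // 10 x 10 x 10 3D mesh (1000 cores), core near the center, radius of 5 hops.
        System.out.println(countWithinHops(5, 5, 5, 10, 10, 10, 5));
    }
}
```

A 3D mesh reaches more cores per hop than a 2D mesh of the same size, which is consistent with the slide's observation that the 3D topology needs a smaller hop radius.
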
Related Work
• Real Job Scheduling Systems:
  • Condor (University of Wisconsin), Bradley et al., 2013
  • PBS (NASA Ames), Corbatto et al., 2013
  • SLURM (LLNL), Danny et al., 2013
  • Falkon (University of Chicago), Raicu et al., SC07
• Job Scheduling System Simulators:
  • SimJava (University of Edinburgh), Wheeler et al., 2004 (thread-based)
  • GridSim (University of Melbourne, Australia), Buyya et al., 2010 (thread-based)
  • SimGrid (INRIA), Lucas et al., 2013 (parallel DES)

Conclusion & Future Work
• Conclusion:
  • Exascale computing will bring several challenges, which need to be solved by new programming models
  • MTC could potentially address the exascale challenges; however, efficient job scheduling systems at extreme scales are needed
  • SimMatrix is lightweight enough to enable the study of different scheduling strategies and architectures at exascale
• Future Work:
  • Explore different network topologies (fat tree, 3D/4D, InfiniBand)
  • Workflow and task dependency simulation
  • Simulation of different HPC and MTC workloads

More Information
• More information:
  • http://datasys.cs.iit.edu/~kewang/
  • http://datasys.cs.iit.edu/projects/SimMatrix/
• Contact:
  • kwang22@hawk.iit.edu
• Questions?