1 / 35

SimMatrix: SIMulator for MAny -Task computing execution fabRIc at eXascale

SimMatrix: SIMulator for MAny -Task computing execution fabRIc at eXascale. Ke Wang Data-Intensive Distributed Systems Laboratory Computer Science Department Illinois Institute of Technology April 8 th , 2013 ACM HPC Symposium. Outline. Introduction & Motivation

maris
Download Presentation

SimMatrix: SIMulator for MAny -Task computing execution fabRIc at eXascale

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale Ke Wang Data-Intensive Distributed Systems Laboratory Computer Science Department Illinois Institute of Technology April 8th, 2013 ACM HPC Symposium

  2. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  3. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  4. Manycore Computing Pat Helland, Microsoft, The Irresistible Forces Meet the Movable Objects, November 9th, 2007 SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale 4 Today (2013): Multicore Computing • O(10) cores commodity architectures • O(100) cores proprietary architectures • O(1000) GPU hardware threads Near future (~2019): Manycore Computing • ~1000 cores/threads commodity architectures

  5. Exascale Computing Top500 Performance Development, http://top500.org/static/lists/2011/11/TOP500_201111_Poster.pdf 5 Today (2013): 10 Petascale Computing • O(100K) nodes • O(1M) cores Near future (~2019): Exascale Computing • ~1M nodes (10X) • ~1B processor-cores/threads (1000X)

  6. Major Challenges of Exascale Computing SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale 6 Memory and Storage • minimizing data movement through the memory hierarchy (e.g. persistent storage, solid state memory, volatile memory, caches, and registers) Concurrency and Locality • harnessing the many magnitude orders of increased parallelism fueled by the many-core computing era (Accelerator, GPU, MIC) Resiliency • making both the infrastructure (hardware) and applications fault tolerant in face of a decreasing mean-time-to-failure (MTTF). Energy and Power • 20MW limitation

  7. MTC: Many-Task Computing • Bridge the gap between HPC and HTC • Applied in clusters, grids, and supercomputers • Loosely coupled apps with HPC orientations • Many activities coupled by file system ops • Many resources over short time periods SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  8. MTC Middleware • Falkon • Fast and Lightweight Task Execution Framework • http://datasys.cs.iit.edu/projects/Falkon/index.html • Swift • Parallel Programming System • http://www.ci.uchicago.edu/swift/index.php SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  9. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  10. Long-Term Aims • Address major exascale computing challenges: • Memory and Storage • Concurrency and Locality • Resiliency • Explore scheduling architecture and techniques to enable MTC at exascale • Analyze, design and implement a distributed data-aware execution fabric (MATRIX) supporting HPC/MTC workloads at exascale • Integrate MATRIX with parallel programming systems (e.g. Swift, Charm++, MapReduce) and with the FusionFS distributed file system SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  11. This Work’s Contributions • Architect, design and implement a job scheduling system simulator, SimMatrix, at the node/core level • Performance evaluation among SimMatrix, SimGrid and GridSim; evaluation done up to millions of nodes, billions of cores, and tens of billions of tasks • Supports of homogenous/heterogeneous systems, various programming models (HPC/MTC), and scheduling strategies (centralized/distributed/hierarchical) SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  12. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  13. OverviewJob Scheduling Systems • Efficiently manage the distributed computing power of workstations, servers, and supercomputers in order to maximize job throughput and system utilization. • Load balancing is critical • Different scheduling strategies • Centralized scheduling hinders the scalability • Hierarchical scheduling has long job turnaround time • Distributed scheduling is a promising approach to exascale • Work Stealing – a distributed scheduling strategy • Starved processors steal tasks from overloaded ones SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  14. SimMatrix Architecture Submit tasks Submit tasks Client Client Dispatcher Arbitrary Node Figure 1: SimMatrixarchitectures; the left part is the centralized one with a single dispatcher (head node) talking to all compute nodes, the right part is the distributed topology with a dispatcher sitting in each compute node SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  15. Simulations • Continuous time simulation • Abandoned the idea of creating a separate thread per simulated node: we found that on our 48-core system with 256GB of memory, we were limited to 32K threads • Discrete event simulation • Aviable approach (today) to explore scheduling techniques at exascale(millions of nodes and billions of cores) • Created an unique object per simulated node, and converted any behavior (state change) to an event SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  16. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  17. At the Heart of SimMatrixGlobal Event Queue • All events are inserted to the queue, sorted based on the occurrence time ascending • Handle the first event, advance the simulation time and update the event queue • Implemented as red-black tree based “TreeSet” in Java, which ensures Θ(log⁡𝑛 ) time for insert & remove Figure 2: Event State Transition Diagram SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  18. Simulator Features • Node load information • Load: Number of busy cores • Nested hash map groups nodes based on load, provides extremely fast lookup for the next available nodes • Dynamic Task Submission • Aims to reduce the task waiting time, the memory foot-print • Dynamic Poll interval • Exponential backoff to reduce the number of messages and increase speed of simulation SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  19. Implementation • SimMatrix is developed in JAVA • Sun 64-bit JDK version 1.7.0_03 • Code accessible at: • http://datasys.cs.iit.edu/~kewang/software.html • SimMatrix has no other dependencies SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  20. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  21. Experiment Environment • Fusion system: • fusion.cs.iit.edu • 48 AMD Opteron cores at 800MHz (Only need one core) • 256GB RAM • 64-bit Linux kernel 2.6.31.5 • Sun 64-bit JDK version 1.7.0_23 SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  22. Metrics • Throughput • Number of tasks finished per second. Calculated as total-number-of-tasks/simulation-time. • Efficiency • The ratio between the ideal simulation time of completing a given workload and the real simulation time. The ideal simulation time is calculated by taking the average task execution time multiplied by the number of tasks per core. • CPU Time/Time per task • Memory/Memory per task SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  23. Workloads (Sleep tasks) • Synthetic workloads: • Uniform distribution with average task execution time of 5000s (AVE_5K); also homogeneous workload with all tasks having 1 sec execution time (ALL_1) • Realistic application workloads: • Obtained from real traces taken from running MTC applications on Blue Gene/P over a 17-month period. • 34.8M tasks with the minimum runtime of 0 seconds, maximum runtime of 1469.62 seconds, average runtime of 95.20 seconds, and standard deviation of 188.08 SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  24. Validation Validate SimMatrix against the state-of-the-art MTC systems (e.g. Falkon, MATRIX) Simulator makes simplifying assumptions, such as the network. It is also difficult to model communication congestion, resource sharing and the effects on performance, and the variability that comes with real systems. We believe the relatively small differences (2.8% and 5.85%) demonstrate that SimMatrix is accurate enough to produce convincible results (at least at modest scales). SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  25. Resource Requirement up to Exascale1M Nodes, 1B tasks and 10B tasks Memory • Centralized: 14.1GB • Distributed: 142.1GB CPU Time • Centralized: 17.4 hours • Distributed: 162.8 hours Still relatively moderate SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  26. Centralized vs. Distributed Scheduling AVE_5K: efficiency drops to 0.05% for centralized, but remains 90%+ for distributed at exascale ALL_1: centralized saturates at 8 nodes with upper bound throughput of 1000 task/sec, distributed starts to saturate at 32K nodes, and finally achieves throughput of 75M task/sec Reason of saturation: the final stage, work stealing requires too many messages as the system scales up, to the point where the number of messages is saturating either the network and/or processing capacity Solution: set an upper bound of the poll interval; having sufficiently long tasks to amortize the cost of so many messages. (AVE_12 tasks can achieve 90% efficiency at exascale with throughput of 75M task/sec) SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  27. SimMatrix vs. SimGrid and GridSim Comparison: Centralized scheduling Scale: GridSim 256 nodes, SimGrid 65K nodes, SimMatrix 1M nodes Time Per Task: GridSim is increasing, SimGrid keeps constant, SimMatrix decreases and then almost keeps constant Memory Per Task: GridSim and SimGrid are decreasing , then keep constant, SimMatrix keeps decreasing Conclusion: SimMatrix is more resource efficient at large scales SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  28. Application Domains of SimMatrix • Data Centers: large-scale data centers (e.g. Google, Amazon) are composed of thousands of (10 to 100× in near future) servers geographically distributed around the world. Load balancing among all the servers with data-intensive workloads is very important, yet non-trivial. SimMatrix is able to study different network topologies connecting all the servers and data-aware scheduling, which could be applied in scheduling of data centers. • Grid Environment: not only could SimMatrix be configured as homogeneous scheduling system, it can also be tuned into heterogeneous one. Different Grids could configure SimMatrix and do scheduling individually without interaction with each other. SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  29. Application Domains of SimMatrix Workflow System: although SimMatrix relies on high level workflow systems (Swift, Charm++) to manage the data-flow and task dependency now, we could develop SimMatrix to simulate workflow system with dependent tasks. We have already run SimMatrix with MTC workload achieved from Swift workflow system up to exascale, and achieved ~87% efficiency SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  30. Application Domains of SimMatrix Many-core Simulation: instead of configuring SimMatrix as an exascale system, we also configured it as a single many-core chip node up to thousands of cores with 2D/3D mesh topology. We applied work-stealing at the core level within one many-core node, and found that up to thousand cores level, 2D mesh topology needs at least 13 hops of neighbors, while 3D mesh needs at least 5, in order to achieve high system efficiency. SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  31. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  32. Related Work • Real Job Scheduling Systems: • Condor (University of Wisconsin), Bradley et al, 2013 • PBS (NASA Ames) , Corbattoet al, 2013 • SLURM (LLNL), Danny et al. 2013 • Falkon (University of Chicago), Raicu et al, SC07 • Job Scheduling System Simulators: • SimJava(University of Edinburgh), Wheeler et al, 2004 (thread-based) • GridSim(University of Melbourne, Australia), Buyyaet al, 2010 (thread-based) • SimGrid (INRIA), Lucas et al, 2013 (Parallel DES) SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  33. Outline • Introduction & Motivation • Long-Term Aims and Contributions • SimMatrix Architecture • Implementation • Evaluation • Related Work • Conclusion & Future Work SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  34. Conclusion & Future Work • Conclusion: • Exascale computing will bring several challenges, which need to be solved by new programming models. • MTC could potentially address the exascale challenges, however, efficient job scheduling systems at extreme scales are needed. • SimMatrix is light-weight enough to enable the study of different scheduling strategies and architectures at exascale • Future Work: • Explore different network topologies (fat tree, 3D/4D, InfiniBand) • Work flow and task dependency simulation • Different workloads of both HPC and MTC simulation SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

  35. More Information • More information: • http://datasys.cs.iit.edu/~kewang/ • http://datasys.cs.iit.edu/projects/SimMatrix/ • Contact: • kwang22@hawk.iit.edu • Questions? SimMatrix: SIMulator for MAny-Task computing execution fabRIc at eXascale

More Related