Characterizing NAS Benchmark Performance on Shared Heterogeneous Networks
Jaspal Subhlok, Shreenivasa Venkataramaiah, Amitoj Singh
University of Houston
Heterogeneous Computing Workshop, April 15, 2002
Mapping/Adapting Distributed Applications on Networks
[Figure: application task graph (Model, Data, Pre, Sim 1, Sim 2, Vis, Stream) to be mapped onto a network of hosts]
Automatic node selection
Select 4 nodes for execution: choice is easy
[Figure: network of compute nodes m-1 through m-8 and routers, with busy nodes, a congested route, and the selected nodes marked]
Automatic node selection
Select 5 nodes: choice depends on the application
[Figure: the same network of compute nodes m-1 through m-8 and routers, with busy nodes, a congested route, and the selected nodes marked]
Mapping/Adapting Distributed Applications on Networks
[Figure: application task graph (Model, Data, Pre, Sim 1, Sim 2, Vis, Stream) to be mapped onto a network of hosts]
• Discover application characteristics and model performance in a shared heterogeneous environment
• Discover network structure and available resources (e.g., NWS, REMOS)
• Algorithms to map/remap applications to networks
Methodology for Building Application Performance Signature
Performance signature = model to predict application execution time under given network conditions
• Execute the application on a controlled testbed
• Measure system-level activity during execution, such as CPU, communication, and memory usage
• Analyze and discover program-level activity (message sizes, sequences, synchronization waits)
• Develop a performance signature
• No access to source code/libraries assumed
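The steps above can be made concrete with a minimal sketch of what such a signature might look like. The fields and the linear cost model below are illustrative assumptions, not the paper's actual model:

```python
from dataclasses import dataclass

@dataclass
class PerformanceSignature:
    """Resource usage measured on the dedicated testbed (illustrative fields)."""
    compute_seconds: float   # time the benchmark spends computing on unloaded nodes
    link_bytes: float        # traffic measured on the application's busiest link

def predict_runtime(sig: PerformanceSignature,
                    cpu_share: float = 1.0,
                    link_bps: float = 100e6) -> float:
    """Estimated execution time: computation stretched by the CPU share
    the application actually receives, plus wire time for the measured
    traffic at the available bandwidth."""
    comm_seconds = sig.link_bytes * 8 / link_bps
    return sig.compute_seconds / cpu_share + comm_seconds
```

For example, a run that computes for 60 s and moves 75 MB on its busiest link would be predicted to take 66 s on the dedicated 100 Mbps testbed, and 180 s with half the CPU and a 10 Mbps link.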
Discovering application characteristics
Benchmarking on a controlled testbed and analysis; model as a performance signature
[Figure: executable application code running on a testbed of 500 MHz Pentium Duos connected by 100 Mbps links through an Ethernet switch (crossbar)]
• Capture patterns of CPU loads and traffic during execution
Results in this paper
Measure performance with resource sharing, after benchmarking on a controlled testbed
[Figure: executable application code on the testbed of 500 MHz Pentium Duos, 100 Mbps links, Ethernet switch (crossbar); capture patterns of CPU loads and traffic during execution]
Demonstrate that measured resource usage on a testbed is a good predictor of performance on a shared network for NAS benchmarks
Experiment Procedure
• Resource utilization of NAS benchmarks measured on a dedicated testbed
• CPU probes based on the "top" and "vmstat" utilities
• Bandwidth measured using "iptraf", "tcpdump", and SNMP queries
• Performance of NAS benchmarks measured with competing loads and limited bandwidth
• Dummynet and NISTnet employed to limit bandwidth
• All measurements presented are on 500 MHz Pentium Duos, 100 Mbps network, TCP/IP, FreeBSD
• All results on Class A, MPI, NAS benchmarks
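Probes of this kind reduce to parsing tool output and differencing counters. A sketch under two stated assumptions: the FreeBSD vmstat layout (last three columns are user, system, idle percentages), and standard IF-MIB octet counters for the SNMP side; the sample values are made up:

```python
def cpu_utilization(vmstat_line: str) -> float:
    """Fraction of CPU busy in one vmstat sample, assuming the last three
    columns are user, system, and idle percentages (FreeBSD layout;
    Linux vmstat appends extra columns, so the slice would differ)."""
    user, system, idle = (int(f) for f in vmstat_line.split()[-3:])
    return (user + system) / 100.0

def link_mbps(octets_before: int, octets_after: int, interval_s: float) -> float:
    """Average link load in Mbit/s between two SNMP ifInOctets readings
    taken interval_s seconds apart."""
    return (octets_after - octets_before) * 8 / interval_s / 1e6
```

Sampling these at a fixed interval during a benchmark run yields the CPU-load and traffic patterns the slides describe.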
Discovered Communication Structure of NAS Benchmarks
[Figure: communication graphs over four nodes (0-3) for each benchmark: BT, CG, IS, LU, MG, SP, and EP]
Performance with competing computation loads
• Increase beyond 50% is due to lack of coordinated (gang) scheduling and synchronization
• Correlation between low CPU utilization and smaller increase in execution time (e.g., MG shows only ~60% CPU utilization)
• Execution time is lower if the least busy node has a competing load (20% difference in the busyness level for CG)
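The correlation between CPU utilization and slowdown has a simple first-order reading, sketched below. This is an illustrative model, not the paper's: it assumes one compute-bound competitor under fair scheduling and ignores the synchronization and gang-scheduling effects the slide notes:

```python
def slowdown_with_competing_load(cpu_util: float) -> float:
    """First-order slowdown estimate: with one compute-bound competitor
    and fair CPU scheduling, the busy fraction of the run takes twice as
    long, while the idle/communication fraction is unchanged."""
    return (1.0 - cpu_util) + 2.0 * cpu_util   # simplifies to 1 + cpu_util
```

Under this model MG's ~60% utilization predicts roughly a 1.6x runtime, consistent with lower-utilization codes suffering smaller increases.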
Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on one link
Close correlation between link utilization and performance with a shared or slow link
Performance with Limited Bandwidth (reduced from 100 to 10 Mbps) on all links Close correlation between total network traffic and performance with all shared or slow links
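The correlation between measured traffic and bandwidth-limited performance suggests scaling only the communication part of the runtime. A minimal sketch, assuming communication does not overlap computation (an illustrative simplification, not the paper's model):

```python
def runtime_at_new_bandwidth(base_seconds: float,
                             total_bytes: float,
                             old_bps: float = 100e6,
                             new_bps: float = 10e6) -> float:
    """Remove the wire time of the benchmarked traffic at the old rate
    and add it back at the new rate; computation time is unchanged."""
    return base_seconds - total_bytes * 8 / old_bps + total_bytes * 8 / new_bps
```

For example, a 100 s run that moves 125 MB spends 10 s on the wire at 100 Mbps; at 10 Mbps that grows to 100 s, for a predicted 190 s total.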
Results and Conclusions (not the last slide)
• Computation and communication patterns can be captured by passive, nearly non-intrusive monitoring
• Benchmarked resource usage pattern is a strong indicator of performance with sharing
• Strong correlation between application traffic and performance with low-bandwidth links
• CPU utilization during normal execution is a good indicator of performance with node sharing
• Synchronization and timing effects were not dominant for NAS Benchmarks
Discussion and Ongoing Work (the last slide)
• Capture application-level data exchange patterns from network probes (e.g., MPI message sequences and sizes)
• Slowdown differs for different message sizes
• Infer the main synchronization/waiting patterns
• Impact of unbalanced execution and lack of gang scheduling
• Capture impact of CPU scheduling policy for accurate prediction with sharing
• Policies try to compensate for waits
Goal: build a quantitative "performance signature" to estimate execution time under any given network conditions, and use it in a resource management prototype system