This presentation covers Effective System Performance benchmark results for an SSS test cluster, discussing scalability, durability, and anomalies observed across the component software, along with the testing methodology and an analysis of the results.
SSS Test Results: Scalability, Durability, Anomalies
Todd Kordenbrock, Technology Consultant, Scalable Computing Division, thkorde@sandia.gov
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Overview • Effective System Performance Benchmark • Scalability • Service Node • Cluster Size • Durability • Anomalies
The Setup • The physical machine • dual processor 3GHz Xeon • 2 GB RAM • FC3 and VMWare 5 • The VMWare cluster • 1 service node • 4 compute nodes • OSCAR 1.0 on Redhat 9 • The 64 virtual node cluster • 16 WarehouseNodeMonitors running on each compute node
[Cluster diagram: dual-processor Xeon host running VMWare; service1 runs the SystemMonitor, and compute1 through compute4 each host 16 Warehouse NodeMonitors (NodeMon 1 through NodeMon n, assigned round-robin across the four compute nodes).]
Effective System Performance Benchmark • Developed by the National Energy Research Scientific Computing Center • System utilization test, NOT a throughput test • Focused on O/S attributes • launch time, accounting, job scheduling • Constructed to be processor-speed independent • Low resource usage (besides network) • Two variants: Throughput and Multimode • The final result is the ESP Efficiency Ratio
ESP Efficiency Ratio • Calculating the ESP Efficiency Ratio • CPUsecs = sum(jobsize * runtime * job count) • AMT = CPUsecs/syssize • ESP Efficiency Ratio = AMT/observed runtime
ESP2 Efficiency (64 nodes) • CPUsecs = 680251.75 • AMT = 680251.75/64 = 10628.93 • Observed Runtime = 11586.7169 • ESP Efficiency Ratio = 0.9173
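The arithmetic above can be reproduced in a few lines of Python; this is only a cross-check of the reported figures (the per-job-type breakdown behind CPUsecs is not shown in the slides):

```python
# Cross-check of the ESP efficiency figures reported above.
# In a full run, cpu_secs would be accumulated as
# sum(job_size * runtime * job_count) over every job type in the mix.

cpu_secs = 680251.75           # total CPU-seconds of work in the test mix
sys_size = 64                  # virtual nodes in the test cluster
observed_runtime = 11586.7169  # wall-clock seconds for the whole run

amt = cpu_secs / sys_size                # ideal runtime with perfect packing
esp_efficiency = amt / observed_runtime  # 1.0 would mean perfect utilization

print(f"AMT            = {amt:.2f} s")           # ~10628.93
print(f"ESP efficiency = {esp_efficiency:.4f}")  # ~0.9173
```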
Scalability • Service Node Scalability (Load Testing) • Bamboo (Queue Manager) • Gold (Accounting) • Cluster Size • Warehouse scalability (Status Monitor) • Maui scalability (Scheduler)
[SSS architecture diagram, repeated on the original slides to indicate the component under load test: Meta Scheduler, Meta Monitor, Meta Manager, Security/Access Control (interacts with all components), Node System Monitor, Accounting, Scheduler, Configuration & Build Manager, Resource Allocation Management, Job Queue Manager, Process Manager & Monitor, User DB, Data Migration, Usage Reports, User Utilities, Checkpoint/Restart, File System, High Performance Communication & I/O, Application Environment.]
Warehouse Scalability • Initial concerns • per process file descriptor (socket) limits • time required to gather status from 1000s of nodes • Discussed with Craig Steffen • had the same concerns • experienced file descriptor limits • resolved with a hierarchical configuration • no tests on large clusters, just simulations
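To make the file descriptor concern concrete, here is a minimal sketch (illustrative only, not Warehouse code) that compares the sockets a flat, one-level configuration would need against the per-process descriptor limit, which is the pressure a hierarchical configuration relieves:

```python
import resource

def descriptor_headroom(num_nodes, fds_per_node=1, reserved=64):
    """Rough check: can a single monitor process hold one socket per node?

    num_nodes    -- compute nodes reporting directly to this monitor
    fds_per_node -- sockets held open per monitored node (assumed 1)
    reserved     -- descriptors kept back for logs, pipes, config files, etc.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    needed = num_nodes * fds_per_node + reserved
    return needed <= soft_limit, needed, soft_limit

fits, needed, limit = descriptor_headroom(num_nodes=1024)
print(f"need ~{needed} descriptors, soft limit is {limit}: "
      f"{'fits' if fits else 'exceeds limit -- fan in through a hierarchy'}")
```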
Scalability Conclusions • Bamboo • Gold • Warehouse • Maui
Durability • What is durability? • A few terms regarding starting and stopping • Easy Tests • Hard Tests
Durability and Other Terms • Durability Testing – examines a software system's ability to react to and recover from failures and conditions external to the system itself • Warm Start/Stop – an orderly startup/shutdown of the SSS services on a particular node • Cold Start/Stop – a warm start/stop paired with a system boot/shutdown on a particular node
Easy Tests • Compute Node Warm Stop • 30 sec delay between stop and Maui notification • race condition • Compute Node Warm Start • 10 sec delay between start and Maui notification • jobs in the queue do not get scheduled, new jobs do • Compute Node Cold Stop • 30 sec delay between stop and Maui notification • race condition
More Easy Tests • Single Node Job Failure • mpd to queue manager communication • Resource Hog (the stress utility) • disk • memory • network
More Easy Tests • Resource Exhaustion • compute node • disk – no failures • service node • disk – gold fails in logging package
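The disk exhaustion cases were driven by filling the target filesystem until writes fail; a sketch of that kind of driver (hypothetical path and size cap, not the actual test script) follows:

```python
import errno
import os

def fill_disk(path="/tmp/exhaust.dat", chunk_mb=64, max_gb=32):
    """Write zero-filled chunks until the filesystem returns ENOSPC or the
    safety cap is reached, then leave the file in place so the services can
    be exercised against a full disk."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    written_mb = 0
    try:
        with open(path, "wb") as f:
            while written_mb < max_gb * 1024:
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())
                written_mb += chunk_mb
        print(f"hit the {max_gb} GB cap before the disk filled")
    except OSError as exc:
        if exc.errno != errno.ENOSPC:
            raise
        print(f"disk full after ~{written_mb} MB -- run the exhaustion tests now")
    # remove the file afterwards with os.remove(path)

if __name__ == "__main__":
    fill_disk()
```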
Hard Tests • Compute Node Failure/Restore • Current release of warehouse fails to reconnect • Service Node Failure/Restore • Requires a restart of mpd on all compute nodes • Compute Node Network Failure/Restore • 30 sec delay between failure and Maui notification • race condition • 20 sec delay between restore and Maui notification
More Hard Tests • Service Node Network Failure/Restore • 30 sec delay between failure and Maui notification • race condition • 20 sec delay between restore and Maui notification • If outage >10 sec, mpd can't reconnect to computes
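The failure-to-notification delays quoted throughout these tests can be measured with a simple polling harness; the sketch below is illustrative only, and node_is_marked_down() is a hypothetical stand-in for whatever query is made against the scheduler (for Maui, e.g. parsing showq or checknode output):

```python
import time

def node_is_marked_down(node):
    """Hypothetical probe: return True once the scheduler reports the node
    as down (e.g. by parsing the output of Maui's showq or checknode)."""
    raise NotImplementedError("replace with a real scheduler query")

def measure_notification_delay(node, poll_interval=1.0, timeout=120.0):
    """Kill or unplug the node first, then call this: it polls the scheduler
    and returns the seconds elapsed before the failure became visible, or
    None if the scheduler never noticed within the timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if node_is_marked_down(node):
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None
```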
Durability Conclusions • Bamboo • Gold • Warehouse • Maui
Anomalies Discovered • Maui • Jobs in the queue do not get scheduled after a service node warm restart • If max runtime expires on the last job in the queue, repeated attempts are made to delete it; the account is charged actual runtime + max runtime • Otherwise, the last job in the queue doesn't get charged until another job is submitted • Maui loses connections to other services
More Anomalies • Warehouse • warehouse_SysMon exits after ~8 hrs (current release) • warehouse_SysMon doesn't reconnect to power-cycled compute nodes (current release) • Gold • “Quotation Create” pair fails with a missing column error • gquote succeeds, glsquote fails with a similar error • CPU usage spikes when the gold.db file gets large (>64MB) – an sqlite problem?
More Anomalies • happynsm • /etc/init.d/nsmup needs a delay to allow the server time to initialize • Is NSM in use at this time? • emng.py throws errors • After a few hundred jobs, errors begin showing up in /var/log/messages • Jobs continue to execute, but slowly and without events
Conclusions • Overall scalability is good. Warehouse needs to be tested on a large cluster. • Overall durability is good. Some problems with warehouse have been resolved in the latest development release.
ToDo List • Develop and execute tests for the BLCR module • Retest on a larger cluster • Get the latest release of all the software and retest • Formalize this information into a report