Tools at Scale - Requirements and Experience
Mary Zosel, LLNL ASCI / PSE
ASCI Simulation Development Environment Tools Project
Prepared for SciComp 2000, La Jolla, CA, Aug 14-16, 2000
UCRL: VG - 139702
Presentation Outline: • Overview of Systems • Requirements for Scale • Experience/Progress in debugging and tuning
ASCI WHITE • 8192 P3 CPUs • NightHawk 2 nodes • Colony Switch • 12.3 TF peak • 160 TB disk • 28 tractor trailers • Classified Network • Full system at IBM; 120 nodes in new home at LLNL - remainder due late Aug.
White joins these IBM platforms at LLNL • 128 cpu - SNOW - (8-way P3 NH 1 nodes - Colony) • Experimental software development platform - Unclassified • 1344 cpu - BLUE - (4-way 604e silver nodes / TB3MX) • Production unclassified platform • 16 cpu - BABY - (4-way 604e silver nodes / TB3MX) • Experimental development platform - first stop for new system software • 64 cpu - ER - (4-way 604e silver nodes / TB3MX) • Backup production system “parts” - and experimental software • 5856 cpu - SKY (3 sectors of 488 silver nodes - connected with TB3MX and 6 HPGN IP routers) - Classified production system. • When White is complete, ~2/3 of SKY will become the unclassified production system
Why the big machines? • The purpose of ASCI is new 3-D codes for use in place of testing for Stockpile Certification. • The ASCI program plan calls for a series of application milepost demonstrations of increasingly complex calculations, which require the very large platforms. • Last year - 1000 cpu requirement • This year - 1500 cpu requirement • Next year - ~4000 cpu requirement • Tri-lab resource -> multiple code teams with large scale requirements
What does this imply for the development environment? Pressure … Stress … Pressure • Deadlines: multiple code teams working against time • Long Calculations: need to understand and optimize the time requirements of each component to plan for production runs • Large Scale: easy to push past the knee of scalability - and past the Troutbeck US limit of 1024 tasks • Large Memory: n**2 buffer management schemes hurt (see the sketch below) • Access Contention: not easy to get large test runs - especially for tool work
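A minimal sketch (not from the original slides) of the n**2 buffer-management pattern warned about above: if every MPI task keeps a dedicated buffer for every other task, per-task memory grows linearly with the task count and total memory across the machine grows as its square. The buffer size and layout here are illustrative assumptions.

```c
/* Hypothetical illustration of per-peer buffer growth; the sizes are made up. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define PER_PEER_BYTES (64 * 1024)   /* assumed per-peer buffer size */

int main(int argc, char **argv)
{
    int rank, ntasks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    /* One buffer per peer: harmless at 64 tasks, painful at 4000+. */
    char **peer_buf = malloc(ntasks * sizeof(char *));
    for (int p = 0; p < ntasks; p++)
        peer_buf[p] = malloc(PER_PEER_BYTES);

    if (rank == 0)
        printf("per-task buffer memory: %.1f MB at %d tasks\n",
               ntasks * (double)PER_PEER_BYTES / (1024.0 * 1024.0), ntasks);

    for (int p = 0; p < ntasks; p++)
        free(peer_buf[p]);
    free(peer_buf);
    MPI_Finalize();
    return 0;
}
```

At a few dozen tasks the per-peer bookkeeping is invisible; at several thousand tasks it can dominate the memory budget, which is why codes aimed at the largest platforms move toward fixed-size pools or on-demand buffers.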
What Tools are in use? Staying with standards helps make tools usable • Languages/Compilers: C, C++, Fortran from both IBM and KAI • Runtime: OpenMP and MPI (see the hybrid sketch below) • Production codes not using PVM, shmem, direct LAPI use, etc., and direct use of pthreads is very limited • Debugging / Tuning: TotalView, LCF, Great Circle, ZeroFault, Guide, Vampir, xprofiler, pmapi / papi, and hopefully new IBM tools
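For reference, a minimal hybrid MPI + OpenMP sketch in C (not from the original slides) of the standards-based model the production codes follow: MPI tasks between nodes, OpenMP threads within a node. Compiler flags and thread counts are assumptions; the actual codes use the IBM and KAI compilers listed above.

```c
/* Minimal hybrid MPI + OpenMP example; each thread reports its place
 * in the task/thread hierarchy.  Build with an MPI C compiler and the
 * compiler's OpenMP flag (flags vary by vendor).                      */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, ntasks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    #pragma omp parallel
    {
        printf("task %d of %d, thread %d of %d\n",
               rank, ntasks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```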
Debugging --- LLNL Experience • Users DO want to use the debugger with large # cpus • There have been lots of frustrations - but there is progress and expectation of further improvements • Slow to attach / start … what was hours is now minutes • Experience / education helps avoid some problems ... • Need large memory settings in ld • Now have MP_SYNC_ON_CONNECT off by default • Set startup timeouts (MP_TIMEOUT) • “Sluggish but tolerable” describes a recent 512 cpu session • Local feature development aimed at scale ... • Subsetting, collapse, shortcuts, filtering, … both CLI and X versions • Etnus continuing to address scalability
[TotalView screenshots: root window collapsed, showing task 4 in a different state, and the same root window opened to show all tasks]
[TotalView screenshot: cycling through message state - example of the thumb-screw control on the message window]
Performance … status quo is less promising • MPI scale is an issue - OpenMP reduces the problem • Understanding thread performance is an issue • Users DO want to use the tools - this is new • They need estimates for their large code runs … • Is my job running or hung? (see the timing sketch below) • Tools aren’t yet ready for scale - including size-of-code scaling • Several tools do not support threads • Problems often not in the user’s code
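A hedged sketch (not from the slides) of the kind of lightweight timing users add to estimate phase costs before a large run and to see whether a job is still making progress: time a phase with MPI_Wtime and reduce the minimum and maximum across tasks so load imbalance is visible. The phase function is a hypothetical stand-in for one application component.

```c
/* Illustrative phase timing with MPI_Wtime and a min/max reduction. */
#include <mpi.h>
#include <stdio.h>

/* do_phase() stands in for one component of the application. */
static void do_phase(void) { /* ... application work ... */ }

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    do_phase();
    double dt = MPI_Wtime() - t0;

    double tmin, tmax;
    MPI_Reduce(&dt, &tmin, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&dt, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    /* Periodic prints like this also show that the job is alive, not hung. */
    if (rank == 0)
        printf("phase time: min %.2f s  max %.2f s\n", tmin, tmax);

    MPI_Finalize();
    return 0;
}
```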
List of sample problems - User observes that …
• … as the number of tasks grows, the code becomes relatively slower and slower. The sum of the CPU time and the system time doesn't add up to the wall-clock time – and this missing time is the component growing the fastest. [Diagnosis – bad adaptor software configuration was causing excessive fragmentation and retransmission of MPI messages]
• … unexplained code slow-down from previous runs, and nothing in the code has changed. [Diagnosis – orphaned processes on one node slowed down the entire code]
• … the threaded version of the code is much slower than straight MPI. [Diagnosis – the code had many small malloc calls and was serializing through the malloc code; see the sketch after this list]
• … a certain part of the code takes 10 seconds to run while the problem is small – and then, after a call to a memory-intensive routine, the same portion of code takes 18 seconds to run. [Diagnosis – not sure, but believed to be memory heap fragmentation causing paging]
• … the job runs faster on Blue (604e system) than it does on Snow (P3 system). [Diagnosis – not yet known; wonder about the flow-control default setting]
• … a non-blocking message-test code is taking up to 15 times longer to run on Snow than it does on Blue. [Diagnosis – not yet known; flow control setting doesn’t help]
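To illustrate the malloc-serialization diagnosis above, here is a hedged OpenMP sketch (not the actual LLNL application code): the first loop pays for a tiny malloc/free per item under the allocator's internal lock, while the second allocates one scratch buffer per thread up front and stays out of malloc in the hot loop. Sizes and counts are made up for illustration.

```c
/* Contrast: per-item malloc under contention vs. per-thread preallocation. */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_ITEMS    100000
#define ITEM_BYTES 64

int main(void)
{
    /* Slow pattern: a tiny malloc/free per item; all threads contend
     * for the allocator lock. */
    #pragma omp parallel for
    for (int i = 0; i < N_ITEMS; i++) {
        char *tmp = malloc(ITEM_BYTES);
        tmp[0] = (char)i;                 /* stand-in for real work */
        free(tmp);
    }

    /* Faster pattern: one allocation per thread, reused across items. */
    #pragma omp parallel
    {
        char *scratch = malloc(ITEM_BYTES);   /* per-thread scratch buffer */
        #pragma omp for
        for (int i = 0; i < N_ITEMS; i++)
            scratch[0] = (char)i;             /* stand-in for real work */
        free(scratch);
    }

    printf("done\n");
    return 0;
}
```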
What are we doing about this? • PathForward contracts: KAI/Pallas, Etnus, MSTI • Infrastructure development: to facilitate new tools / probes • supports click-back to source • currently QT on DPCL … future??? • Probe components: memory usage, MPI classification • Lightweight CoreFile … and Performance Monitors • External observation … Monitor, PS, VMSTAT … • Testing new IBM beta tools • Sys admins starting a performance regression database
[Diagram: Tool work in progress - tool infrastructure supporting MPI classification, performance monitor, and memory performance tools]
The faster I go, the behinder I get … we ARE making progress, but the problems are getting harder and coming in faster ...
It's a Team Effort:
Rich Zwakenberg - debugging
Karen Warren
Bor Chan
John May - performance tools
Jeff Vetter
John Gyllenhaal
Chris Chambreau
Mike McCracken
John Engle - compiler support
Linda Stanberry - MPI related
Bronis deSupinski
Susan Post - system testing
Brian Carnes - general
Mary Zosel
Scott Taylor - emeritus
John Ranelletti