Tools at Scale - Requirements and Experience


Presentation Transcript


  1. Tools at Scale - Requirements and Experience
     Mary Zosel, LLNL ASCI / PSE
     ASCI Simulation Development Environment Tools Project
     Prepared for SciComp 2000, La Jolla, CA, Aug 14-16, 2000
     UCRL: VG-139702

  2. Presentation Outline:
     • Overview of systems requirements for scale
     • Experience/progress in debugging and tuning

  3. ASCI WHITE
     • 8192 P3 CPUs
     • NightHawk 2 nodes
     • Colony switch
     • 12.3 TF peak
     • 160 TB disk
     • 28 tractor trailers
     • Classified network
     Full system at IBM; 120 nodes in new home at LLNL - remainder due late Aug.

  4. White joins these IBM platforms at LLNL
     • 128 CPUs - SNOW - (8-way P3 NH 1 nodes / Colony)
       • Experimental software development platform - unclassified
     • 1344 CPUs - BLUE - (4-way 604e silver nodes / TB3MX)
       • Production unclassified platform
     • 16 CPUs - BABY - (4-way 604e silver nodes / TB3MX)
       • Experimental development platform - first stop for new system software
     • 64 CPUs - ER - (4-way 604e silver nodes / TB3MX)
       • Backup production system "parts" - and experimental software
     • 5856 CPUs - SKY - (3 sectors of 488 silver nodes, connected with TB3MX and 6 HPGN IP routers) - classified production system
     • When White is complete, ~2/3 of SKY will become the unclassified production system

  5. Why the big machines?
     • The purpose of ASCI is new 3-D codes for use in place of testing for stockpile certification.
     • The ASCI program plan calls for a series of application milepost demonstrations of increasingly complex calculations, which require the very large platforms.
       • Last year - 1000 CPU requirement
       • This year - 1500 CPU requirement
       • Next year - ~4000 CPU requirement
     • Tri-lab resource -> multiple code teams with large-scale requirements

  6. What does this imply for the development environment? Pressure and stress.
     • Deadlines: multiple code teams working against time
     • Long calculations: need to understand and optimize the time requirements of each component to plan for production runs
     • Large scale: easy to push past the knee of scalability - and past the Troutbeck US limit of 1024 tasks
     • Large memory: n**2 buffer management schemes hurt (see the sketch below)
     • Access contention: not easy to get large test runs - especially for tool work
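To make the large-memory point concrete, here is a minimal C sketch (an illustration added here, not from the slides; the 64 KB per-peer buffer size is an assumption) of why per-peer buffer schemes scale as n**2: if every task reserves a fixed buffer for every other task, aggregate buffer memory grows quadratically, approaching 4 TB at 8192 tasks.

    #include <stdio.h>

    /* Hypothetical illustration: if each of n tasks preallocates one
     * communication buffer per peer, total buffer memory is
     * n * (n-1) * buf, i.e. O(n**2).  Sizes below are assumptions. */
    int main(void)
    {
        const double buf_kb = 64.0;   /* assumed per-peer buffer size */
        const int task_counts[] = { 512, 1024, 4096, 8192 };

        for (int i = 0; i < 4; i++) {
            int n = task_counts[i];
            double per_task_mb  = (n - 1) * buf_kb / 1024.0;
            double aggregate_gb = (double)n * (n - 1) * buf_kb
                                  / (1024.0 * 1024.0);
            printf("%5d tasks: %8.1f MB per task, %8.1f GB aggregate\n",
                   n, per_task_mb, aggregate_gb);
        }
        return 0;
    }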

  7. What tools are in use? Staying with standards helps make tools usable.
     • Languages/compilers:
       • C, C++, Fortran from both IBM and KAI
     • Runtime: OpenMP and MPI (see the sketch below)
       • Production codes are not using pvm, shmem, direct LAPI, etc., and direct use of pthreads is very limited
     • Debugging/tuning:
       • TotalView, LCF, Great Circle, ZeroFault, Guide, Vampir, xprofiler, pmapi/papi, and hopefully new IBM tools
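For concreteness, a minimal hybrid MPI + OpenMP hello-world in C, sketching the standard programming model the slide names (illustrative only, not code from the presentation):

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Minimal hybrid MPI + OpenMP example: MPI tasks across nodes,
     * OpenMP threads within a node -- the standard combination the
     * slide describes for production codes. */
    int main(int argc, char **argv)
    {
        int rank, nranks;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            printf("MPI task %d of %d, OpenMP thread %d of %d\n",
                   rank, nranks,
                   omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }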

  8. Debugging --- LLNL Experience
     • Users DO want to use the debugger with large numbers of CPUs
     • There have been lots of frustrations - but there is progress and expectation of further improvements
       • Slow to attach/start ... what was hours is now minutes
     • Experience/education helps avoid some problems ...
       • Need large memory settings in ld
       • Now have MP_SYNC_ON_CONNECT off by default
       • Set startup timeouts (MP_TIMEOUT)
     • "Sluggish but tolerable" describes a recent 512 CPU session
     • Local feature development aimed at scale ...
       • Subsetting, collapse, shortcuts, filtering, ... both CLI and X versions
     • Etnus continuing to address scalability

  9. New attach option to get a subset of tasks

  10. Root window collapsed - shows task 4 in a different state. Same root window opened to show all tasks.

  11. Cycle through message state - example of the thumb-screw on the message window.

  12. Performance ... status quo is less promising
     • MPI scale is an issue - OpenMP reduces the problem
     • Understanding thread performance is an issue
     • Users DO want to use the tools - this is new
       • They need estimates for their large code runs ...
       • Is my job running or hung? (see the sketch below)
     • Tools aren't yet ready for scale - including size-of-code scaling
       • Several tools do not support threads
     • Problems are often not in the user's code
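A lightweight heartbeat is one common way to answer the "running or hung?" question without attaching a tool; a minimal C/MPI sketch (hypothetical, not from the presentation; do_timestep() stands in for the application's real work):

    #include <mpi.h>
    #include <stdio.h>
    #include <time.h>

    /* Hypothetical heartbeat sketch: rank 0 prints elapsed time each
     * step, so a stalled log file distinguishes "hung" from "slow". */
    static void do_timestep(void) { /* application work goes here */ }

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        time_t start = time(NULL);
        for (int step = 0; step < 100; step++) {
            do_timestep();
            MPI_Barrier(MPI_COMM_WORLD);  /* all tasks reached this step */
            if (rank == 0) {
                printf("heartbeat: step %d, %.0f s elapsed\n",
                       step, difftime(time(NULL), start));
                fflush(stdout);           /* make progress visible now */
            }
        }

        MPI_Finalize();
        return 0;
    }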

  13. List of sample problems - user observes that ...
     • ... as the number of tasks grows, the code becomes relatively slower and slower. The sum of the CPU time and the system time doesn't add up to wall-clock time - and this missing time is the component growing the fastest. [Diagnosis - bad adaptor software configuration was causing excessive fragmentation and retransmission of MPI messages.]
     • ... unexplained code slow-down from previous runs, and nothing in the code has changed. [Diagnosis - orphaned processes on one node slowed down the entire code.]
     • ... the threaded version of the code is much slower than straight MPI. [Diagnosis - the code had many small malloc calls and was serializing through the malloc code; see the sketch below.]
     • ... a certain part of the code takes 10 seconds to run while the problem is small - and then, after a call to a memory-intensive routine, the same portion of code takes 18 seconds to run. [Diagnosis - not sure, but believed to be memory heap fragmentation causing paging.]
     • ... the job runs faster on Blue (604e system) than it does on Snow (P3 system). [Diagnosis - not yet known; wonder about the flow-control default setting.]
     • ... a non-blocking message-test code is taking up to 15 times longer to run on Snow than it does on Blue. [Diagnosis - not yet known; the flow-control setting doesn't help.]
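The malloc-serialization diagnosis above is a classic threaded-code pitfall. A minimal OpenMP sketch (hypothetical, not from the presentation) of the anti-pattern and one common fix, hoisting the allocation out of the hot loop:

    #include <stdlib.h>
    #include <string.h>

    #define N 1000000
    #define CHUNK 64

    /* Anti-pattern: every iteration mallocs a tiny buffer.  With a
     * lock-protected allocator, all threads serialize through malloc. */
    void slow_version(void)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            char *buf = malloc(CHUNK);   /* contended allocator lock */
            memset(buf, 0, CHUNK);
            free(buf);
        }
    }

    /* One common fix: allocate once per thread, outside the loop. */
    void fast_version(void)
    {
        #pragma omp parallel
        {
            char *buf = malloc(CHUNK);   /* one allocation per thread */
            #pragma omp for
            for (int i = 0; i < N; i++)
                memset(buf, 0, CHUNK);
            free(buf);
        }
    }

    int main(void) { slow_version(); fast_version(); return 0; }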

  14. What are we doing about this?
     • PathForward contracts: KAI/Pallas, Etnus, MSTI
     • Infrastructure development: to facilitate new tools/probes
       • Supports click-back to source
       • Currently Qt on DPCL ... future???
     • Probe components: memory usage, MPI classification (see the sketch below)
     • Lightweight CoreFile ... and performance monitors
     • External observation ... Monitor, ps, vmstat ...
     • Testing new IBM beta tools
     • Sys admins starting a performance regression database
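As an illustration of how an MPI-classification probe can be slid under an unmodified code, a minimal sketch in C using the standard MPI profiling (PMPI) interface; this shows the generic interposition technique, not the actual LLNL probe:

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal MPI-classification probe via the standard PMPI
     * profiling interface: intercept MPI_Send, tally calls and
     * bytes, report at MPI_Finalize.  Link ahead of the MPI library. */
    static long send_calls = 0;
    static long send_bytes = 0;

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        MPI_Type_size(type, &size);
        send_calls++;
        send_bytes += (long)count * size;
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("task %d: %ld sends, %ld bytes\n",
               rank, send_calls, send_bytes);
        return PMPI_Finalize();
    }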

  15. Tool Work In Progress [diagram]: tool infrastructure, MPI classification, memory performance tool, performance monitor tools

  16. "The faster I go, the behinder I get" ... we ARE making progress, but the problems are getting harder and coming in faster ...
      It's a Team Effort:
      • Debugging: Rich Zwakenberg, Karen Warren, Bor Chan
      • Performance tools: John May, Jeff Vetter, John Gyllenhaal, Chris Chambreau, Mike McCracken
      • Compiler support: John Engle
      • MPI related: Linda Stanberry, Bronis deSupinski
      • System testing: Susan Post
      • General: Brian Carnes, Mary Zosel
      • Emeritus: Scott Taylor, John Ranelletti
