OS, MESSAGE PASSING & RUNTIME TOOLS
• Parallel software promotion philosophy
• OpenSource - How rosy is the promise?
• MPI2 - What features?
• RTS - How far parallel do we need to go?
• How might OpenSource accelerate new tools?
Panel Comments by Mary Zosel, ASCI PSE / ASDE, LLNL
For Fourth Workshop on Distributed Supercomputers
Philosophy - for promoting a parallel simulation development environment
• Standards - promote and encourage use
• Set high platform software expectations
• Software in procurements
• ISV support gives portability and a 2nd source
• Keep academia involved
• Need their ideas & need their students
• Local prototypes where needed
• Preferably in partnership with a commercial partner
• Full local support only as a last resort
• It’s fun when new - but a costly burden later
So where in this picture does OpenSource fit? It facilitates academia and prototyping, but the support issue is a concern.
UCRL-VG-137868
OpenSource - Does it measure up to the promise?
Disclaimer: I haven’t been actively involved in this area, but on second look it isn’t as promising as it first seems.
There is a lot of good and successful OpenSource software. But there are also red flags …
• One promising OpenSource tool we picked up was so full of platform-specific “.h” files that we couldn’t make it build anywhere else.
• Another OpenSource promise, for a key library we were counting on, evaporated.
• The lawyers are still there - and source release isn’t easy.
• All the usual GNU software restriction issues …
• Software-police issues will be interesting …
MPI2 - What do the users need?
• MPI-I/O
• Thread safety … actually need more support than MPI2 gives us
• Dynamic process control - starting to get queries about this
• Language bindings
• They say they want one-sided
• Various “abstraction” features (info, error…)
Runtime tools for 1000s of CPUs
• Yes - the users are asking for debugger support.
• My code seems to be hung - what’s it doing?
• My code is growing after a couple of hours - why?
• Where is all my memory going and why?
• (Similar set of questions for performance issues.)
• Easy to provide? No …
• Tool infrastructure needs to be designed for scalability
• Obvious GUI and data presentation issues
• User debug time ties up resources - another challenge
• Access to resources for development - even “on-site”
• But there are solutions in the works … e.g.
• Variety of collapsing and filtering of data
• Macros together with a good CLI look promising
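To make the "collapsing and filtering of data" bullet concrete, here is a minimal sketch of one way a tool could answer "my code seems to be hung - what's it doing?" for a large job. It assumes the tool has already collected one location string per task; the function names and sample data are hypothetical and are not taken from any tool shown in these slides.

```python
from collections import defaultdict


def compress_ranks(ranks):
    """Turn a list of task ids such as [0, 1, 2, 5] into the string '0-2,5'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:
            prev = r
            continue
        # The run ended: record it as a single id or as a range.
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


def collapse_by_location(locations):
    """Group tasks by reported location.

    Returns (location, task count, compressed rank list) rows,
    largest group first, ties broken by location name.
    """
    groups = defaultdict(list)
    for rank, where in locations.items():
        groups[where].append(rank)
    rows = [(where, len(r), compress_ranks(r)) for where, r in groups.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical sample: 1024 tasks, almost all waiting in the same collective.
locations = {rank: "MPI_Allreduce" for rank in range(1024)}
locations[17] = "read_input"
locations[512] = "MPI_Recv"
locations[513] = "MPI_Recv"

summary = collapse_by_location(locations)
for where, count, ranks in summary:
    print(f"{count:5d} tasks in {where}: {ranks}")
```

With this sample the 1024 per-task entries collapse to three rows, so the outlier tasks stand out instead of being buried in a per-task listing.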
ORIGINAL STRUCT ARRAY
Just the values of the “val1” struct member
Sorted array values
Checksum of same array
LCB
A view of task and thread state can be dumped any time the application is stopped. The color code tells how many processes are where.
Root window collapsed
Same Root window opened to show all tasks
Can set any of the counters to any of its settings
Window menu: Set Counter 1, Set Counter 2, Set Counter 3, Set Counter 4, Activate Counters, Stop Counters, Update Counters, Zero Counters, Close Window, Close All Similar Windows, Save Window to File..., Reexecute Last Save Window, Help
Counter settings: Nothing, MFLOPS, % branch mispredictions, L2 Data cache miss rate, CPU Cycles, Instructions Completed, Instruction Cache Misses, Integer Instructions Completed, Floating Instructions Completed, dtlb misses (not speculative), Branch Mispredictions, Time Base bit transition, Reservations requested
Values by thread
Info about which task is using max and min memory
Memory info about all the tasks
Can watch how (and which) tasks grow
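The memory view described above can be reduced to a small summary. This is a sketch only, assuming the tool has per-task memory samples over time; the function name, the sample values, and the units (MB) are illustrative assumptions, not data from the tool in the screenshot.

```python
def memory_summary(samples):
    """Summarize per-task memory samples.

    samples maps task id -> list of memory readings, oldest first.
    Returns the tasks with the largest and smallest latest reading,
    plus the tasks whose usage grew, biggest growth first.
    """
    latest = {task: series[-1] for task, series in samples.items()}
    max_task = max(latest, key=latest.get)
    min_task = min(latest, key=latest.get)

    # Growth is measured from the first sample to the latest one.
    growth = {task: series[-1] - series[0] for task, series in samples.items()}
    growing = sorted(
        (task for task, delta in growth.items() if delta > 0),
        key=lambda task: -growth[task],
    )
    return {
        "max": (max_task, latest[max_task]),
        "min": (min_task, latest[min_task]),
        "growing": [(task, growth[task]) for task in growing],
    }


# Hypothetical readings in MB for four tasks at three points in time.
samples = {
    0: [100, 100, 101],
    1: [100, 180, 260],
    2: [90, 90, 90],
    3: [100, 99, 100],
}

report = memory_summary(samples)
print("max memory:", report["max"])
print("min memory:", report["min"])
print("growing tasks:", report["growing"])
```

Reporting only the extremes and the growing tasks keeps the output the same size whether the job has four tasks or several thousand.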
Will OpenSource help RTS tools, and how?
The biggest barrier to more tools - especially from academia - is the lack of a standard interface to the parallel runtime environment: there is no easy way to attach to and communicate with a parallel job.
If the parallel OpenSource community could come up with a (simple) scalable parallel control-daemon interface, that would be a big help in opening this area to development.
There are several places interested in a “kit” of parallel-tools infrastructure components - but this item is the big drawback to portability.