What’s Working in HPC
Nicole Wolter, Mike McCracken, Allan Snavely, Lorin Hochstein, Taiga Nakamura, Vic Basili
DARPA HPCS, October 2006
HPC at a glance
• Allocations
  • Often multi-site
  • Moving data (and porting code) between sites is often necessary
• Wide variety of systems
  • 8 sites, 20 systems: 32–3,060 CPUs and 0.16–16 TFlops
  • ~10 architectures, 7 non-Linux OSs, 7 Linux variants
  • Other systems exist (this is just teragrid.org)
• Shared resources
  • Thousands of users sharing systems
  • Systems shoot for >90% utilization
  • Systems are batch-scheduled, have time limits, and have priority policies
  • Example: a 1,024-processor job averages ~24 hr wait time on DataStar (DS)
Highly multidisciplinary
• What kinds of programs are running?
  • Simulation
  • Visualization
  • Validation
• Huge data requirements
  • Terabyte files are common
  • Permanent fast storage is scarce
FORTRAN. Really.
• Also C, C++, and others
• Code lasts for decades
• Programming models
  • MPI (see the sketch below)
  • Also OpenMP, PGAS languages
• Tool support
• Dedicated support personnel
• Regular system maintenance
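Since MPI is the dominant programming model named above, a minimal sketch may help readers unfamiliar with it. This is a hypothetical example, not code from any application in the study; it shows only the boilerplate every C MPI program shares.

```c
/* Minimal MPI program in C -- a hypothetical sketch, not drawn from any
 * code studied here. Build with an MPI compiler wrapper, e.g.:
 *   mpicc hello.c -o hello && mpirun -np 4 ./hello
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start the MPI runtime         */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id (0..size-1) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes     */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut the runtime down cleanly */
    return 0;
}
```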
Workflow
• Copy data in
  • Wait ? hrs for data transfer
• Submit
  • Wait 8–24 hrs in queue, <=18 hrs run
• Copy data out / archive data (wait)
  • Wait ? hrs for data transfer
• Check results
• Visualize
  • Wait ?
• Analyze
Goals
• Understand HPC development strategies
  • Discern common performance evaluation and improvement tactics
  • Assess performance-enhancing tools
  • Assess developers’ adeptness at predicting performance
  • Evaluate developers’ knowledge of the domain science versus the computer science
  • Evaluate hybrid versus “purebred” codes
  • Developers’ views on improving hardware versus improving software
• Understand system usage
Process
• Evaluate HPC system logs and help tickets
• Developer interviews (consultants, users/developers)
  • Consultant: focused design effort; not the original developer or scientist
  • User: scientist, limited modifications
  • Developer: designed the original code
• SDSC Summer Institute survey
Conjectures
• HPC users all have similar concerns and difficulties with productivity.
• Users with the largest allocations and the most expertise tend to be the most productive.
• A computer science background is crucial to success in performance optimization.
• Visualization is not on the critical path to productivity in HPC in most cases.
• HPC programmers would require dramatic performance improvements to consider making major structural changes to their code.
• Lack of publicity and education is the main roadblock to adoption of performance and parallel debugging tools.
• Computational performance is usually the limiting factor for productivity on HPC systems.
Conjecture 1: HPC users all have similar concerns and difficulties with productivity
• Assumption
  • HPC users are a homogeneous community
• Background
  • HPC users all come to HPC centers to capitalize on extended resources
• Evaluation
  • Classes of users
  • Resource demands
  • System usage trends (flowchart)
  • Top perceived bottlenecks
• Conclusion
  • Not true
Conjecture 2: Users with the largest allocations and most experience are the most productive
• Assumption
  • The more you know…
• Background
  • Large allocations get preferential treatment on large systems
    • Queue priority
    • Knowledge
• Evaluation
  • Queue wait time
  • Reliability
  • Porting
• Conclusion
  • Not always true
Conjecture 3: A computer science background is crucial to success in performance optimization
• Assumption
  • Code developers are computer scientists
• Background
  • Many HPC users are physical scientists
• Evaluation
  • Project funding
  • SAC (Strategic Application Collaboration)
• Conclusion
  • Not true
Conjecture 4: Time to solution is the limiting factor for productivity on HPC systems
• Assumptions
  • Users are motivated to improve performance
• Background
  • Code maintenance is done by the physical scientists funded to produce scientific results
• Evaluation
  • Satisficing (users are satisfied with “good enough” performance)
  • Users request help with running longer jobs, not with performance evaluation
  • Job logs (job size versus runtime)
• Conclusion
  • Not true
Conjecture 5: Visualization is not on the critical path to productivity in HPC
• Assumptions
  • Visualization is utilized at the end of the production cycle
• Background
  • Visualization usage
    • Validation
    • Publication
• Evaluation
  • Utilization frequency (flowchart)
• Conclusion
  • Not true
Conjecture 6: HPC programmers would demand dramatic performance improvements to consider major structural changes to their code
• Assumptions
  • People shy away from change
• Background
  • Community codes last for tens of years
  • The predominant languages used on HPC systems are Fortran and C
• Evaluation
  • Compensation in return for a code rewrite (responses vary)
• Conclusion
  • Not true
Conjecture 7: Lack of publicity is the main roadblock to adoption of performance and parallel debugging tools
• Assumptions
  • People use tools to help debug and profile codes to improve performance
• Background data
  • Number of performance tools available
  • Number of debuggers available
• Evaluation
  • Debuggers
    • Hard to use
    • Possibly impossible to scale
  • Performance evaluation
    • Not on the critical path
    • Top tools used: print statements and timers (see the sketch below)
• Conclusion
  • Not true
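For concreteness, here is a hypothetical sketch of the “print statements and timers” style of performance evaluation that users reported. The function name solver_step and its placeholder body are illustrative assumptions; MPI_Wtime() is the standard MPI wall-clock timer.

```c
/* Hypothetical sketch of "print statements and timers" profiling, the top
 * approach users reported. solver_step() is a stand-in for real work. */
#include <stdio.h>
#include <mpi.h>

static double solver_step(void)
{
    /* Placeholder computation so the example is runnable. */
    double x = 0.0;
    for (long i = 1; i <= 10000000L; i++)
        x += 1.0 / (double)i;
    return x;
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();       /* wall-clock time before the region */
    double result = solver_step();
    double t1 = MPI_Wtime();       /* ... and after                     */

    if (rank == 0)                 /* print once, not once per rank     */
        printf("solver_step = %f, took %.3f s\n", result, t1 - t0);

    MPI_Finalize();
    return 0;
}
```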
Conclusion
• Productivity != development time + runtime performance
• HPC users are heterogeneous
• Performance is considered a constraint, not a goal
• The paper can be found at: http://www.sdsc.edu/PMaC/HPCS/hpcs_productivity.html
EXTRAS
• User classifications
• Chart: DS wait times
• Chart: DS run times
User Classifications
• Marquee user:
  • Large allocation: greater than or equal to 1,000,000 service units (SUs)
  • Maximum job size can be the full system
  • VIP because of the large allocation (at SDSC there are currently only 2 users who run on the full system; this will most likely scale up with the push for petascale)
• Normal user:
  • Allocation of approximately 100,000 SUs
  • Job sizes normally range from 64 to 256 processors
• Small user:
  • Small allocation; not many user accounts. Accounts are through the AAP (Academic Associates Program)
  • These accounts are usually less than or equal to 10,000 SUs (typically 300–3,000); university course accounts, short in duration
• Benchmarkers and computer scientists:
  • Dynamic system usage; they can run from 1 CPU to the full system
  • Usually not long in duration, either as individual runs or in attention to one project
  • Usually fewer than a dozen runs per application
Wait Time Distribution, Grouped by Job Size (DS job logs from January 2003 to April 2006)
Run Time Distribution, Grouped by Job Size (DS job logs from January 2003 to April 2006)