What’s Working in HPC. Nicole Wolter, Mike McCracken, Allan Snavely, Lorin Hochstein, Taiga Nakamura, Vic Basili. DARPA HPCS, October 2006.
HPC at a glance • Allocations • Often multi-site • Moving data (and porting code) between sites is often necessary • Wide variety of systems • 8 sites, 20 systems: 32 to 3,060 CPUs and 0.16 to 16 TFlops • ~10 architectures, 7 non-Linux OSs, 7 Linux variants • Other systems exist (this is just teragrid.org) • Shared resources • Thousands of users sharing systems • Systems shoot for >90% utilization • Systems are batch-scheduled, have time limits, and have priority policies • Example: 1,024-processor job, ~24 hr average wait time on DS
Highly multidisciplinary • What kinds of programs are running? • Simulation • Visualization • Validation • Huge data requirements • Terabyte files are common • Permanent fast storage is scarce
FORTRAN. Really. • Also C, C++, other • Code lasts for decades • Programming models • MPI • Also OpenMP, PGAS languages • Tool support • Dedicated support personnel • Regular system maintenance
Workflow • Copy data in • wait ? hrs for data transfer • Submit • wait 8-24 hrs in queue, <=18 hrs run • Copy data out / archive data (wait) • wait ? hrs for data transfer • Check results • Visualize • wait ? • Analyze
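A rough sketch, in Python, of how the stages of this workflow add up to one turnaround. The queue wait (8-24 hrs) and run limit (<=18 hrs) come from the slide. The transfer times are shown as "?" on the slide, so they are left as caller-supplied inputs, and the transfer values in the example are purely hypothetical. The class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class WorkflowTimes:
    """Hours spent in each stage of one copy-in / run / copy-out cycle."""
    transfer_in: float   # "?" on the slide; the caller must supply a value
    queue_wait: float    # slide gives 8-24 hrs
    run: float           # slide gives <= 18 hrs
    transfer_out: float  # "?" on the slide; the caller must supply a value

    def turnaround(self) -> float:
        """Total wall-clock hours from first copy to archived results."""
        return self.transfer_in + self.queue_wait + self.run + self.transfer_out

    def running_fraction(self) -> float:
        """Share of the turnaround during which the job is actually running."""
        return self.run / self.turnaround()


# Hypothetical example: 2 hrs each way for transfers (an assumption),
# with the slide's upper bounds for queue wait (24 hrs) and run time (18 hrs).
cycle = WorkflowTimes(transfer_in=2.0, queue_wait=24.0, run=18.0, transfer_out=2.0)
print(f"turnaround: {cycle.turnaround():.0f} hrs, "
      f"running: {cycle.running_fraction():.0%}")
```

Visualization and analysis are left out of the sum because the slide gives no durations for them.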
Goals • Understand HPC development strategies • Discern common performance evaluation and improvement tactics • Assess performance-enhancing tools • Assess developers’ adeptness at predicting performance • Evaluate developers’ knowledge of the domain science versus computer science • Evaluate hybrid versus “purebred” codes • Developers’ views on improving hardware versus improving software • Understand system usage
Process • Evaluate HPC system logs and help tickets • Developer interviews (Consultants, Users/Developers) • Consultants: focused design effort; not the original developer or scientist • User: scientist who makes limited modifications • Developer: designs the original code • SDSC Summer Institute survey
Conjectures • HPC users all have similar concerns and difficulties with productivity. • Users with the largest allocations and the most expertise tend to be the most productive. • A computer science background is crucial to success in performance optimization. • Visualization is not on the critical path to productivity in HPC in most cases. • HPC programmers would require dramatic performance improvements to consider making major structural changes to their code. • Lack of publicity and education is the main roadblock to adoption of performance and parallel debugging tools. • Computational performance is usually the limiting factor for productivity on HPC systems.
Conjecture 1: HPC users all have similar concerns and difficulties with productivity • Assumption • HPC users are a homogeneous community • Background • HPC users all come to HPC centers to capitalize on extended resources • Evaluation • Classes of users • Resource demands • System usage trends (flowchart) • Top perceived bottlenecks • Conclusion • Not True
Conjecture 2: Users with the largest allocations and most experience are the most productive • Assumption • The more you know... • Background • Large allocations get preferential treatment on large systems • Queue priority • Knowledge • Evaluation • Queue wait time • Reliability • Porting • Conclusion • Not always True
Conjecture 3: A computer science background is crucial to success in performance optimization • Assumption • Code developers are computer scientists • Background • Many HPC users are physical scientists • Evaluation • Project funding • SAC (Strategic Application Collaboration) • Conclusion • Not True
Conjecture 4: Time to solution is the limiting factor for productivity on HPC systems • Assumptions • Users are motivated to improve performance • Background • Code Maintenance is done by the physical scientist funded to produce scientific results • Evaluation • Satisficing (satisfied with “good enough” performance) • Users request help with running longer jobs, not performance evaluation. • Job logs (job size versus runtime) • Conclusion • Not True
Conjecture 5: Visualization is not on the critical path to productivity in HPC • Assumptions • Visualization is used at the end of the production cycle • Background • Visualization usage • Validation • Publication • Evaluation • Utilization frequency (flowchart) • Conclusion • Not True
Conjecture 6: HPC programmers would demand dramatic performance improvements to consider major structural changes to their code • Assumptions • People shy away from change • Background • Community codes last for tens of years • Predominant languages used on HPC systems are Fortran and C • Evaluation • Compensation in return for code rewrite (responses vary) • Conclusion • Not True
Conjecture 7: Lack of publicity is the main roadblock to adoption of performance and parallel debugging tools • Assumptions • People use tools to help debug and profile codes to improve performance • Background data • Number of performance tools available • Number of debuggers available • Evaluation • Debuggers • Hard to use • Possibly impossible to scale • Performance Evaluation • Not Critical Path • Top tool used: print statements and timers • Conclusion • Not True
Conclusion • Productivity != Development Time + Runtime Performance • HPC users are heterogeneous • Performance is considered a constraint, not a goal • Paper can be found at: http://www.sdsc.edu/PMaC/HPCS/hpcs_productivity.html
EXTRAS • User Classifications • Chart DS Wait times • Chart DS Run times
User Classifications • Classifying Users: • Marquee User: • Large allocations, greater than or equal to 1,000,000 Service Units (SUs). • Maximum job size can be the full system. • VIP because of large allocation. (At SDSC there are currently only 2 users who run on the full system; this will most likely scale up with the push for petascale.) • Normal User: • Allocation approximately 100,000 SUs. • Job sizes normally range from 64 to 256 processors. • Small User: • Small allocation, not many user accounts. Accounts through AAP (Academic Associates Program). • These accounts are usually less than or equal to 10,000 SUs (usually 300-3,000); university course accounts, short in duration. • Benchmarkers and Computer Scientists: • Dynamic system usage. They can run from 1 CPU to the full system. Usually not long in duration, either as an individual run or in attention span to one project. • Usually fewer than a dozen runs per application.
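The allocation thresholds above can be sketched as a small Python function. This is an illustration only, not the authors' method. The slide gives fixed points (>= 1,000,000 SUs, about 100,000 SUs, <= 10,000 SUs) but no boundaries in between, so treating everything between the small and marquee thresholds as "normal" is an assumption. The `is_benchmarker` flag is also hypothetical, since the slide defines that class by usage pattern rather than allocation size.

```python
def classify_user(allocation_sus: int, is_benchmarker: bool = False) -> str:
    """Map an allocation in Service Units (SUs) to a user class from the slide.

    Assumptions (not from the slide): allocations strictly between the
    small and marquee thresholds count as "normal", and benchmarkers are
    identified by a caller-supplied flag.
    """
    if allocation_sus < 0:
        raise ValueError("allocation must be non-negative")
    if is_benchmarker:
        return "benchmarker"
    if allocation_sus >= 1_000_000:   # slide: marquee users
        return "marquee"
    if allocation_sus <= 10_000:      # slide: small users
        return "small"
    return "normal"                   # assumed: everything in between


for sus in (3_000, 100_000, 1_000_000):
    print(sus, classify_user(sus))
```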
Wait time Distribution, Grouped by Job Size (DS job logs from January 2003 to April 2006)
Run time Distribution, Grouped by Job Size (DS job logs from January 2003 to April 2006)
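A minimal Python sketch of the kind of grouping these two charts describe: bucket job-log records by processor count, then summarize wait or run time per bucket. The record layout (`procs`, `wait_hrs`, `run_hrs`), the bin edges, and the sample records are all hypothetical. The text does not give the real DS log format or the bins used in the charts, and the sample values are made up, not measurements.

```python
from collections import defaultdict
from statistics import median

# Hypothetical job-size bins (processor counts); None means "no upper bound".
SIZE_BINS = [(1, 63), (64, 256), (257, 1024), (1025, None)]


def bin_label(procs: int) -> str:
    """Return the label of the size bin that contains this processor count."""
    for low, high in SIZE_BINS:
        if procs >= low and (high is None or procs <= high):
            return f"{low}+" if high is None else f"{low}-{high}"
    raise ValueError(f"invalid processor count: {procs}")


def median_hours_by_size(jobs, field):
    """Group job records by size bin and take the median of one time field."""
    groups = defaultdict(list)
    for job in jobs:
        groups[bin_label(job["procs"])].append(job[field])
    return {label: median(values) for label, values in groups.items()}


# Made-up records for illustration only.
sample_jobs = [
    {"procs": 8, "wait_hrs": 1.0, "run_hrs": 2.0},
    {"procs": 128, "wait_hrs": 6.0, "run_hrs": 10.0},
    {"procs": 64, "wait_hrs": 4.0, "run_hrs": 12.0},
    {"procs": 1024, "wait_hrs": 24.0, "run_hrs": 18.0},
]
print(median_hours_by_size(sample_jobs, "wait_hrs"))
print(median_hours_by_size(sample_jobs, "run_hrs"))
```

The median is used here because queue waits tend to be skewed. A full distribution, as in the charts, would keep the per-bin lists instead of reducing them to one number.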