High Performance Computing, Clusters, and Productivity

jpoynter

Presentation Transcript


  1. High Performance Computing, Clusters, and Productivity LNXI SOS8 Presentation - April 04

  2. HPC Market Reality • The high performance computing market can be a sustainable business, provided commodity technologies are leveraged effectively • The success and rapid growth of Linux clusters are due not only to their use in the scientific community but also to their growing use by commercial entities for day-to-day production • History has shown that special-purpose architectures targeted solely at the HPC community are generally not widely adopted by the commercial world because of their high price/productivity • System productivity must be considered before, during, and after installation, and designed into the architecture and support models • Vendor involvement, especially for Linux clusters, is critical throughout the life of the system for maintaining productivity

  3. The technology was compelling… BBC (03/02/69) - The supersonic airliner, Concorde, has made a "faultless" maiden flight. The Anglo-French plane took off from Toulouse and was in the air for just 27 minutes before the pilot made the decision to land.

  4. …but eventually the bottom line won. Associated Press (10/24/03) - It's both a technological marvel and financial failure and today the bottom line wins as the Concorde makes its final flight.

  5. On the other hand…

  6. Southwest is not blinking lights Bob

  7. Two Different Approaches…
     Metric                        Concorde        Southwest
     Average Roundtrip Fares       $10,700         $90.03
     Avg. Customers Served/Year    93,000          45,200,000
     Average Miles/Year            11.1 million    72 million
     …Two Very Different Measurements of Productivity?

  8. Common Metrics • Top500.org (LINPACK) • Performance does not equate to productivity • TCO (total cost of ownership) • A system may be inexpensive to own but still not productive • What matters • Price/Productivity: effective productivity over time for the total price involved

  9. Productivity • Productivity is the ratio between the amount of goods or services produced and the resources or expense that go into producing them • Productivity therefore implies a ratio between Price and Product; expressed as Price/Product, it is the cost per unit of output, so lower is better • Product: a successful run of the code the system was purchased for • Price: the total cost of the system over its lifetime, including software and hardware acquisition, system cost, support, development, infrastructure, and labor
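As a rough illustration of the Price/Product ratio defined above, the sketch below sums the lifetime cost categories listed on this slide and divides by the number of successful production runs. The function names, the category keys, and every dollar figure are hypothetical placeholders, not numbers from the presentation.

```python
from typing import Mapping


def lifetime_price(costs: Mapping[str, float]) -> float:
    """Total cost of the system over its life: the sum of all cost categories."""
    return sum(costs.values())


def price_per_product(costs: Mapping[str, float], successful_runs: int) -> float:
    """Price/Product: lifetime price divided by successful runs (lower is better)."""
    if successful_runs <= 0:
        raise ValueError("successful_runs must be positive")
    return lifetime_price(costs) / successful_runs


# Hypothetical cost categories mirroring the slide's list; the figures are made up.
costs = {
    "acquisition (hardware and software)": 1_600_000,
    "system": 400_000,
    "support": 300_000,
    "development": 500_000,
    "infrastructure": 400_000,
    "labor": 800_000,
}

print(price_per_product(costs, successful_runs=4_000))  # 1000.0
```

Two systems with the same lifetime price can differ sharply on this metric if one of them completes fewer successful runs, which is the distinction the slides draw between price and price/productivity.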

  10. Focus on Productivity • Recent panel at SC2003: HPC Productivity • The productivity of an HPC system is measured by factors that may not be associated with hardware speed. These factors include program execution time, as well as software development time and other direct and indirect costs. • David Kuck, Manager of the Software and Solutions Group at Intel: "PR (getting on to the top 500 list) tends to make people ignore real productivity issues." • DARPA/IPTO’s High Productivity Computing Systems • Goal: • Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2007 – 2010)

  11. System Architecture and Optimization • It’s all about the application! • The system maps to the application • No system is “one size fits all” • Analysis of application requirements • Memory • Inter-process communications • Bandwidth • Latency • Floating-point/integer needs • Existing/new codes • Parallelism • …… • Identification of bottlenecks to generate optimal system design/selection/price • System design trade-offs need to be optimized for the best resulting price/productivity • Cluster or SMP? Why not both!
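One common way to reason about the latency and bandwidth requirements listed above is the simple linear (Hockney-style) model, in which sending an n-byte message costs roughly T(n) = α + n/β, where α is the latency and β the bandwidth. This model is not from the presentation, and the two interconnect parameter sets below are assumed values chosen only to show how small messages are latency-bound while large ones are bandwidth-bound.

```python
def message_time(n_bytes: int, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Linear model: a fixed per-message latency plus size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def latency_fraction(n_bytes: int, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Share of the total message time spent on latency rather than transfer."""
    return latency_s / message_time(n_bytes, latency_s, bandwidth_bytes_per_s)


# Assumed interconnects: (latency in seconds, bandwidth in bytes/second).
interconnects = {
    "low-latency fabric": (5e-6, 250e6),
    "commodity Ethernet": (50e-6, 100e6),
}

for name, (alpha, beta) in interconnects.items():
    for size in (8, 1_024, 1_048_576):
        t = message_time(size, alpha, beta)
        share = latency_fraction(size, alpha, beta)
        print(f"{name:20s} {size:>9d} B  {t * 1e6:10.1f} us  latency share {share:.0%}")
```

An application dominated by many small messages gains most from lower latency, while one that streams large blocks gains most from bandwidth, which is why the application analysis has to come before the interconnect choice.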

  12. System Architecture: Linux Clusters Being Used in Large-Scale Production Systems • Los Alamos, Pink: 10 TFLOPS, Linux Networx E2, 2,048 Intel processors • Los Alamos, Lightning: 11.26 TFLOPS, Linux Networx Evolocity, 2,816 Opteron processors • Lawrence Livermore, MCR: 11.2 TFLOPS, Linux Networx E2, 2,304 Intel processors

  13. System Architecture: Linux Clusters Being Used in Multi-Application, Scientific Computing Production Environments As part of the Technology Insertion 2004 (TI-04) program, the Department of Defense High Performance Computing Modernization Program (HPCMP) selects Linux Networx for the Army Research Laboratory Major Shared Resource Center’s (MSRC) 2,132-processor Linux cluster. When the system is fully deployed in mid-2004, this Evolocity II cluster will be the HPCMP’s largest deployment of an Intel processor-based Linux cluster, and the solution will adopt Intel 64-bit extension technology.

  14. System Architecture: Leveraging the Commodity • A company focused on scientific computing will need to leverage commodity technology to be viable • The market isn’t large enough to afford non-recurring engineering dedicated to advancing niche technology • Government funding is not what it used to be • Need to take advantage of economies of scale • Focus development and resources on filling critical gaps and/or providing the “glue” • When using commodity technologies, systems engineering is absolutely critical • Understanding of all subsystems • Disciplined engineering approach • System integration and test • Keeping up with technology • Leverage both commodity hardware and software

  15. Supporting Software and Tools • In-house and vendor expertise • Stability and supportability • Integration and compatibility of tools • Versions and version compatibility • Profilers and tools that provide insight enabling restructuring of code for optimal performance • System management and administration tools • Programmability • Programming a cluster is probably now as easy as, if not easier than, programming an SMP or vector machine • Programming tools and expertise yield large variations in Price/Productivity • Compilers • Variations in compiling, in compiler selection as well as in configuration and optimization, yield 5%-30% differences in performance • Debugging • Having the correct tools and proper insight minimizes time to production
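To connect the 5%-30% compiler variation above to a Price/Product ratio, the sketch below assumes a fixed lifetime price and a workload whose number of completed runs scales linearly with performance. Both the linear-scaling assumption and the numbers are illustrative, not measurements from the presentation; `price_per_run` is a hypothetical helper.

```python
def price_per_run(lifetime_price: float, baseline_runs: float, speedup: float) -> float:
    """Price per successful run when throughput rises by a relative speedup.

    speedup=0.05 means 5% higher performance, so (under an idealized
    linear-scaling assumption) 5% more runs fit in the same system lifetime.
    """
    return lifetime_price / (baseline_runs * (1.0 + speedup))


price = 4_000_000.0  # hypothetical lifetime price
runs = 4_000         # hypothetical runs with the baseline compiler setup

for gain in (0.0, 0.05, 0.30):
    print(f"{gain:.0%} higher performance -> ${price_per_run(price, runs, gain):,.2f} per run")
```

Because the price is unchanged, every percent gained through compiler selection and tuning lowers the cost per run at essentially no hardware cost.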

  16. Installation and Acceptance • Minimizing impact on ongoing operations • Seamless fit and integration with existing infrastructure • Facility • Network • Storage • Getting the system up, running, and producing • Meeting all acceptance criteria • Example: • Vendor “deploys” system but takes 4 months to get it working and producing output (4 months out of a 3-year life cycle = 11% reduction in production)
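The 11% figure is the delay divided by the life cycle expressed in months, 4 / 36 ≈ 0.111. The sketch below reproduces that arithmetic and, assuming the lifetime price stays the same and output is spread evenly over the system's life, shows how much a delayed start raises Price/Product. The function names are hypothetical.

```python
def lost_production_fraction(delay_months: float, life_cycle_years: float) -> float:
    """Fraction of the system's production life lost to a delayed start."""
    return delay_months / (life_cycle_years * 12)


def effective_price_multiplier(loss: float) -> float:
    """Factor by which Price/Product rises when the same price buys (1 - loss) of the output."""
    return 1.0 / (1.0 - loss)


loss = lost_production_fraction(delay_months=4, life_cycle_years=3)
print(f"{loss:.1%}")                                # 11.1%
print(f"{effective_price_multiplier(loss):.3f}")    # 1.125
```

In other words, a 4-month slip on a 3-year system makes every remaining run about 12.5% more expensive.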

  17. Hardware and Software System Maintenance • Hardware • Architecture should minimize downtime • Rapid turnaround from vendor • Upgrades • Scalability • Software • Operating System, Middleware, Drivers, Application level • Open Source • Linux offers no single “throat to choke” • Upgrades • Double-Edged Sword • Performance • Risks • Downtime • Revision synchronization • Most people can get the system to work the first time…

  18. Porting/Optimizing Applications • In-house codes • Third-party commercial codes • Performance modeling • Examining algorithm structure to enable re-architecture of code • Can yield significant performance deltas: observed up to 1000% • Optimization • Restructuring • Compiling • Hardware/Operations/Applications ratio: $1 / $0.75 / $1.75 • Most people spend $$$ on the system that could have been better spent on optimizing their software.
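Assuming the Hardware/Operations/Applications ratio describes relative spending (for every $1 on hardware, roughly $0.75 on operations and $1.75 on applications), applications account for half of the total: 1.75 / (1 + 0.75 + 1.75) = 0.5. A small sketch of that split; `split_budget` and the $3.5M total are hypothetical.

```python
from typing import Dict, Tuple


def split_budget(total: float,
                 ratio: Tuple[float, float, float] = (1.0, 0.75, 1.75)) -> Dict[str, float]:
    """Split a total budget across hardware, operations, and applications by ratio."""
    labels = ("hardware", "operations", "applications")
    weight = sum(ratio)
    return {label: total * r / weight for label, r in zip(labels, ratio)}


shares = split_budget(3_500_000)  # hypothetical total budget
for area, amount in shares.items():
    print(f"{area:12s} ${amount:>12,.0f}")
```

Under this reading, the applications side (porting, modeling, and optimization) is the largest single line item, which is the slide's point about where money is better spent.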

  19. Education • Who • Administrators • It is easier to teach Linux to a system administrator than to teach system administration to a “Linux person” • Application Engineers • End-Users • What • Operating System • Compilers • Tools • When • As soon as the decision is made to consider a Linux cluster • Why • Directly impacts productivity.

  20. Criticality of Vendor Involvement on Productivity • More important for HPC than for many other market areas • Always pushing the leading edge • Mainly unique systems and applications • But it is essential for clusters • It’s not just your vendor, it’s your vendor’s vendors, because clusters are an assembly of many components • Knowledge of and experience with all components, and relationships with their vendors, are critical • Hardware • Interconnects, Storage, Processors • Software • Operating system, Compilers, Tools, Management, Filesystems

  21. Criticality of Vendor Involvement On Productivity • Vendor needs to be engaged over the lifetime of the system • Pre-Installation • Working with customer and component suppliers to provide optimal architecture for target application • Facility impact and design • Project planning and Integration • Installation • Minimize impact on ongoing operations • Time to production • Post-Installation • Training and Education • Availability of experience and skills • Ability to pull together component vendors to resolve issues and provide service • Vendor participation to ensure productivity (avoid dump and run)

  22. Linux Networx Value Linux Networx provides cluster computing systems that deliver maximum sustained performance and high return on investment. We achieve high customer satisfaction by delivering Five Points of Proven Value: • Rigorous System Engineering, Q/A, and validation process with every cluster system • Full pre-ship system build-up and testing, followed by rapid on-site installation • Delivering complete systems with optimized applications, the latest cluster technologies, and open source tools • Total Cluster Management from one interface • Cluster Services, support, and Linux cluster training

  23. Conclusion • Linux clusters’ rapid growth in the high performance computing and commercial communities is due to high productivity at a low price point • System Productivity must be considered before, during and after installation - designed into the architecture and support models • High performance computing production class clusters require a vendor with strong systems engineering, firm component vendor relationships, and committed involvement throughout the life of the system
