Advanced Cluster Computing Consortium (AC3) First Annual Meeting “Roadmaps to the Future of Cluster Computing”. Held at Cornell Theory Center, 2nd June 2000. Meeting review by Kenji Takeda (ktakeda@soton.ac.uk), School of Engineering Sciences. The author thanks Microsoft Research for their support.
Talk Outline • Industry Standard Cluster Computing: R&D to the Enterprise • Future of High Performance Computing: Intel Roadmap • Cluster Computing Roadmap: Dell • Cluster Benchmarks: Dell and CTC • Cluster Computing with Windows 2000: MSR • Cluster Computing Made Easy: New Tools for Scalable Servers and Services (CTC) • Mining Large Databases: Present and Future (CTC) • Performance, Scalability and Future Plans: MSTI • Cluster Computing at NCSA • Panel Sessions • Reflections and Conclusions
Thomas Coleman, Director, Cornell Theory Center AC3 Background • Cornell Theory Center has many years of supercomputing experience • Needed a new mission once IBM SP2 work ended • Support computational science and push boundaries • Formed AC3 with major industry partners: Dell, Intel and Microsoft “Increase the space/domain where large-scale problems of computational science are effectively solved using industry standard cluster computing”
David Lifka, Associate Director, Cornell Theory Center Industry Standard Cluster Computing: R&D to the Enterprise “Cluster computing is ready for Prime Time. It doesn’t have to be hard” – David Lifka, CTC • Proof by example • Installed the 256 CPU Dell Velocity Cluster: 64 x quad 550MHz Xeon nodes with a Giganet interconnect • Site installation took 10 hours • Two weeks from installation to full production service • Over 100 Cornell projects now use the cluster • Over 60 corporate partners involved • Want to use Windows and move away from UNIX
David Lifka, Associate Director, Cornell Theory Center Industry Standard Solutions • Microsoft Windows NT/2000 • Market volume drives the market in new directions • 80% of the market is Windows NT/2000 • Administration skill base widely available • Future killer apps • New generation brought up on Windows. They expect high-level feature functionality and more than a command-line interface • Big Iron Supercomputers • 4-5 times more expensive than a Windows cluster solution • High maintenance costs • Performance and reliability gap closed
David Lifka, Associate Director, Cornell Theory Center Windows 2000 Issues • Major reliability improvements over NT 4.0 • Windows pervades all aspects of the server market • Deployable across the enterprise • Coordinated development • Desktop to Teraflops with one OS, leading to lower TCO and consistent user interfaces • CTC moving all its services to Windows 2000: • Email, print servers, backup, file servers, web servers, etc…
David Lifka, Associate Director, Cornell Theory Center CTC Systems Growth • AC3 Velocity cluster has spawned huge interest • New clusters coming online: • Velocity+: 64 x dual 733MHz PIII systems with Giganet • National Plant Genomics Cluster: 48 CPUs, Gbit ethernet • Social Economics Research Cluster: 32 CPUs. Cheaper than upgrading memory on the existing SGI system! Looking to move US National Census data servers to Windows 2000 soon • AFS servers for Windows 2000: 7 x dual PIII systems • 8 serial nodes, Poweredge 2450 servers with 1 Gbyte memory per node • Testing 16- and 32-way systems (Unisys, Sequent and NEC) • Early testing of Itanium and Windows 2000 64-bit
Timothy Mattson, Senior Research Scientist, Intel Corporation Future of High Performance Computing Roadmap: Intel • Intel has been in the supercomputer business for a long time • ASCI Red still the world’s fastest machine, after its PIII upgrade • Changing definition of the supercomputer • 1980s: Vector SMP (all custom components) • 1990s: MPP (COTS CPUs, everything else custom) • 2000s: Clusters (COTS everything) • Why has clustering only now taken off? • PCs have closed the performance gap • COTS networking has hit major performance leagues with Gigabit ethernet, Giganet, Myrinet…
Timothy Mattson, Senior Research Scientist, Intel Corporation Intel Processor Roadmap • Itanium highlights: 800MHz and up, 20 ops/clock cycle, 2 Gflops on LINPACK 1000, 2.1 Gbytes/s bus for 4-way SMP, 128 integer and 128 FP registers • Roadmap chart: IA-64 line runs Itanium → McKinley → Madison and Deerfield; IA-32 line runs Xeon → Cascades → Foster → future IA-32
Timothy Mattson, Senior Research Scientist, Intel Corporation COTS Networking: VIA • VIA (Virtual Interface Architecture) spearheaded by Intel, Microsoft and Compaq, together with 130 other companies • Sets up a direct data channel that bypasses the kernel • VIA is here today – mature and stable • VIA has its problems though: • PCI bottleneck, although improving with 2nd generation PCI-66 cards • Targeted at clusters, not the mass market • Infiniband is the future…
Timothy Mattson, Senior Research Scientist, Intel Corporation COTS Networking: Infiniband • Scalable, high-performance I/O for the mass market • Extends native message passing from CPU to memory to SAN and beyond… • Done using a Host Channel Adapter (HCA) to different I/O devices, including other nodes • 1st generation devices due Q3 2001 • Probably not best for HPC. Optimised for small-medium (e-business) clusters • Intel aiming “to be the leader in Infiniband for clustering and e-business solutions” “Infiniband is a great hardware implementation of VIA”
Timothy Mattson, Senior Research Scientist, Intel Corporation Community Cluster Development Kit • Clusters are good for research labs but too fiddly • They are too hard to set up and use, there is little support, too many options with no clear winners, and too many learning curves to climb • Need fully integrated common cluster computing stacks, therefore Intel is supporting the… Community Cluster Computing Development Kit A snapshot of best-known methods, but not a new standard “It’s the software, stupid!”
Reza Rooholamini, Cluster Development, Dell Cluster Computing Roadmap: Dell • Scalable Enterprise Computing • Convergence of High Availability and High Performance Computing • HPC is a building block for SEC: • Firewalls • Application clusters • Data mining engines
Reza Rooholamini, Cluster Development, Dell Dell Cluster Solutions • HPC Product Approach: • Collaborate with universities and research institutes • Partner with major component providers • Prototyping, benchmarking and sizing • Case studies and white papers
Jenwei Hsieh, Cluster Development, Dell and George Coulouris, CTC Cluster Benchmarks (Dell) • 32-CPU Dell test systems: • 8 x Dell 6350 4-way SMPs. Fast ethernet, Gigabit ethernet, Giganet and Myrinet • 16 x Dell 2450 2-way SMPs. Fast ethernet, Gigabit ethernet, Giganet and Myrinet • NAS Parallel benchmarks: • Quad-processor nodes significantly slower (30%) than dual-processor nodes • Single processors faster than dual-processor systems • BUT 4-way has the best price/performance • Giganet (MPI/Pro) better than Myrinet (MPICH-GM)
Jenwei Hsieh, Cluster Development, Dell and George Coulouris, CTC Cluster Benchmarks (CTC) • Giganet Bandwidth • 113 Mbytes/s using the raw Giganet cLAN driver • 87 Mbytes/s using MPI/Pro, up to 103 Mbytes/s for very large messages • NAS Parallel benchmarks: • LU and BT scale linearly with Giganet, and up to 16 nodes with Fast Ethernet
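Bandwidth figures like those above are typically produced by a simple MPI ping-pong test. As a rough illustration only (not the CTC benchmark code), here is a minimal C/MPI sketch of such a measurement; the 1 Mbyte message size and repetition count are arbitrary choices.

```c
/* Minimal MPI ping-pong bandwidth sketch (illustrative, not the CTC code).
 * Run with exactly two ranks, e.g. mpiexec -n 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (1 << 20)   /* 1 Mbyte messages (arbitrary choice) */
#define REPS      100

int main(int argc, char **argv)
{
    int rank, i;
    MPI_Status status;
    char *buf = malloc(MSG_BYTES);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* Each repetition moves the message there and back again. */
        double mbytes = 2.0 * REPS * (double)MSG_BYTES / 1.0e6;
        printf("Bandwidth: %.1f Mbytes/s\n", mbytes / (t1 - t0));
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```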
Jenwei Hsieh, Cluster Development, Dell and George Coulouris, CTC Real Application Benchmarks (CTC) • Protein folding simulations • Windows-based visualisation tools developed, see www.tc.cornell.edu/reports/NIH/resource/CapBiologyTools • FEM code with 1.5 million degrees of freedom • Superlinear scaling to 128 CPUs with PIII-733MHz and Giganet • Per-node CPU utilisation decreases as the number of SMP CPUs increases • [Figure: PIII cluster speedup vs. number of processors, compared with the SP2]
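For reference, the speedup and “superlinear scaling” quoted above follow the standard textbook definitions (not stated on the slide itself); superlinear behaviour typically appears when the per-processor working set starts to fit in cache:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
\text{superlinear scaling} \iff S(p) > p \iff E(p) > 1
```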
Todd Needham, Manager of Research Programs, Microsoft Cluster Computing with Windows 2000 • $3 million+ annual commitment to HPC research • Supported projects include: • MPICH on Windows 2000. Argonne National Labs • NCSA VMI driver for Myrinet and Giganet • Maui scheduler (from Utah). www.cs.byu.edu • UTK: SInRG Grid Environment • Globus. Ported to Windows NT. Working on Windows 2000 support using Active Directory services • Condor scheduler • Parallel visualisation. Kai Li using OpenGL on Windows 2000 • NCSA. High Performance DCOM over VIA
Todd Needham, Manager of Research Programs, Microsoft Enterprise Windows 2000 • Union of HPC and e-business technology • 100% overlap of tools. eg: cluster management • Need to improve the out-of-the-box experience • MS built an 800 CPU Celeron 400MHz cluster to test embarrassingly parallel (EP) applications and DCOM scalability • MSR Cambridge: • Performance prediction tools as a runtime component in user applications • MS Redmond: • Winsock Direct, data mining, scalable servers
Todd Needham, Manager of Research Programs, Microsoft Future Technologies for Windows HPC • Parallel file systems • Development tools and debuggers • Toolworks and Totalview • Parallel and Scalable commercial applications • Better desktop cluster transparency. eg: Jack Dongarra’s Excel interface to NetSolve • Visual Studio v7. IDE for 3rd party plug-ins • 64-bit Windows 2000
Ken Birman, Professor, Computer Science, Cornell University Cluster Computing Made Easy: New Tools for Scalable Servers and Services • ISIS, HORUS and ENSEMBLE Virtual Synchrony execution model (1987-98) • Groups of processes with multicast comms between them • Notification of failures and rejoins • State transfer, allow addition of nodes to running job • HORUS and ENSEMBLE are modular, with plug & play software components • NYSE, Swiss Stock Exchange • French Air Traffic Control • Next Generation AEGIS System
Ken Birman, Professor, Computer Science, Cornell University QUINTET • Focus on management • e-Business solutions. Huge real clusters managed as single entities, such as Hotmail • Exploit high performance networks • Scalable cluster management • Cluster-aware application development • Enterprise clusters come in many flavours No single management system is suitable for all needs
Ken Birman, Professor, Computer Science, Cornell University 5 Lessons Learned for Scalability • Turn scale to an advantage • Progress under all circumstances • Avoid transparency at the server side (it always hurts; the last 5% is impossible) • Do not solve all problems in the communications stack • Exploit intelligent, non-portable runtimes
Ken Birman, Professor, Computer Science, Cornell University Quintet Design • Build a component framework for design and construction of cluster management systems • Farm Manager • node membership and failure detection • reliable comms and lightweight state-sharing • Farm Services • Cluster Designer • Tool to construct islands of specialised clusters with farms • Generate cluster profiles • Collection of User Interfaces and Visualisation tools
Ken Birman, Professor, Computer Science, Cornell University Quintet Configuration • Automatic component configuration for core comms • Exploit SANs • Security/secrecy • Failure detection • Membership consensus • Message ordering • Consensus membership (on AC3 Velocity cluster) • Changes: clean 200 µs, dirty 500-7000 µs • Component membership changes: 50-70 µs • Fault tolerant distributed lock manager • Lock acquire: 70-100 µs • Node initialisation: 400 µs for 40,000 locks
Ken Birman, Professor, Computer Science, Cornell University Cluster Profiles • Application development cluster • Process, job, installation and version control • Debug service, distributed logging, MS Visual Studio integration and resource measurement • Game server cluster • 10,000 user Quake server • Client management services, application load request routing, synchronisation, state sharing, shared VM services • Wolfpack/MS Cluster Services compatible profile Quintet first public release (Alpha) in Q3 2000
Johannes Gehrke, Assistant Professor, Computer Science, Cornell University Mining Large Databases: Present and Future • Data mining is reaching maturity • DBMS technology: high availability, maintainability, seamless integration with business processes • Current technology: • Scalable data mining algorithms • Consolidation in the industry • The technology is “crossing the chasm” into mainstream use
Johannes Gehrke, Assistant Professor, Computer Science, Cornell University Data Mining: Future Technology • Autopilot: automatic algorithm and parameter selection • Privacy: the internet may provide the first tools for users to control access to data about themselves • Scalability: market basket data and ‘clickstream’ data. eg: Yahoo logs 2-4 Gbytes/hr to data mine • Data Stream model: • Model maintenance • Change detection • Trend detection: find sequences in slow-moving data
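As a concrete, deliberately generic illustration of the change-detection idea in the data stream model (not an algorithm presented in the talk), the sketch below compares the means of two adjacent windows of the stream and flags a change when they drift apart; the window size and threshold are made-up values.

```c
/* Generic sliding-window change detection sketch (illustrative only):
 * compare the mean of the latest window against the mean of the previous
 * window and flag a change when they differ by more than THRESHOLD. */
#include <stdio.h>
#include <math.h>

#define WINDOW    100
#define THRESHOLD 5.0

static double current[WINDOW];
static double prev_mean = 0.0;
static int    filled = 0;        /* values in the current window */
static int    windows_seen = 0;  /* completed windows so far     */

/* Feed one value from the stream; returns 1 when a change is flagged. */
static int observe(double x)
{
    current[filled++] = x;
    if (filled < WINDOW)
        return 0;

    double mean = 0.0;
    for (int i = 0; i < WINDOW; i++)
        mean += current[i];
    mean /= WINDOW;

    filled = 0;
    windows_seen++;

    int changed = (windows_seen > 1) && (fabs(mean - prev_mean) > THRESHOLD);
    prev_mean = mean;
    return changed;
}

int main(void)
{
    /* Toy stream: a level shift halfway through. */
    for (long t = 0; t < 1000; t++) {
        double x = (t < 500) ? 10.0 : 20.0;
        if (observe(x))
            printf("change detected near observation %ld\n", t);
    }
    return 0;
}
```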
Rossen Dimitrov, MPI Software Technology Performance, Scalability, Future Plans: MPI Software Technology • MSTI’s objectives in software design: • Performance • Scalability • Functionality • Ease of Use • Reliability • Robustness • Achieve production quality of support at reasonable price • Mitigate risk, control cost of ownership
Rossen Dimitrov, MPI Software Technology MPI/Pro Features • User-level thread safety • Asynchronous and synchronous completion notification. User runtime switch (½ RTT quoted) • Interrupt driven for lower CPU overhead, higher latency (42 µs) • Polling for low latency (19 µs), higher CPU utilisation • Independent message progress • Low CPU overhead, high degree of overlapping • Optimised collective communications, derived datatypes, persistent mode of communications • Increased internal concurrency • Multi-driver support: Giganet, SMP and TCP
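The “independent message progress” and “overlapping” bullets refer to letting transfers proceed while the application computes. The sketch below shows the generic non-blocking MPI pattern that benefits from such a library; this is plain MPI rather than any MPI/Pro-specific API, and the buffer sizes and work loop are placeholders.

```c
/* Generic communication/computation overlap sketch using non-blocking MPI.
 * With an MPI library that makes independent progress, the neighbour
 * exchange can proceed while do_local_work() runs; otherwise progress may
 * only happen inside MPI_Waitall. */
#include <mpi.h>
#include <stdio.h>

#define N 100000

static double sendbuf[N], recvbuf[N], workbuf[N];

static void do_local_work(void)
{
    /* Placeholder computation that touches neither sendbuf nor recvbuf. */
    for (int i = 0; i < N; i++)
        workbuf[i] = workbuf[i] * 0.5 + 1.0;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Request reqs[2];
    MPI_Status  stats[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    /* Post the ring exchange first ... */
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... compute while the messages are (ideally) in flight ... */
    do_local_work();

    /* ... and only block when the received data is actually needed. */
    MPI_Waitall(2, reqs, stats);

    if (rank == 0)
        printf("exchange complete on %d ranks\n", size);

    MPI_Finalize();
    return 0;
}
```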
Rossen Dimitrov, MPI Software Technology MSTI Future Developments • Support Model: • Value proposition is quality and support • Support only model (free downloads available) • Goal is to make cluster computing a business • MPI/Pro • MPI-2 support (2001) • Interconnect configuration tool • Cluster CoNTroller • Time sharing through Windows sessions • Gang scheduling • Windows 2000 Directory Services
Rob Pennington, Technical Program Manager, Cluster Computing, NCSA Cluster Computing at NCSA • NCSA, NSF funded National Center 1986-present • Large number of parallel computer systems • 7 x SGI Origin 2000 systems = 1536 processors • 1 x Exemplar = 64 processors • 256 processor NT supercluster • 100 Windows NT CPUs in test beds and for serial jobs • 100 Tbytes disk store. Generate about 1 Tbyte every 2 weeks • Applications move easily to clusters, due to source level portability
Rob Pennington, Technical Program Manager, Cluster Computing, NCSA Challenges • Technical and application challenges • Compilers, performance tools, MPI debugging • Storage performance is the biggest problem, as clusters are unbalanced system architectures • Administration tools • Heterogeneous systems • Integration with the Grid • Organisational challenges • Integration with existing infrastructure • Managing user accounts
Rob Pennington, Technical Program Manager, Cluster Computing, NCSA Clusters in the Alliance • Three large clusters for members of the Alliance • NT Supercluster @ NCSA. 256 CPUs • Roadrunner cluster @ University of New Mexico. 512 CPUs • Argonne National Lab IBM cluster. 512 CPUs “Develop locally, run globally” • Local clusters used for development and parameter studies • Require compatible environments for development and job scheduling across Windows and UNIX • Constantly evaluating technologies – OS, CPUs, interconnect, middleware
Rob Pennington, Technical Program Manager, Cluster Computing, NCSA Evolution of Cluster Systems • Growth from 192 cluster CPUs in 1998 to 1600+ cluster CPUs in 2000 • Job startup streamlined. From 15 mins (in 1998) for a 128 node job to 1 minute now • Significant user requirement for serial nodes • Reliability issues • Windows NT nodes NEVER blue screen • One hardware failure per 100 machines per month • Peripheral failures only, not motherboards or CPUs • Use OpenGL cluster monitor tool to keep track of nodes
Rob Pennington, Technical Program Manager, Cluster Computing, NCSA NCSA Cluster Performance • Quantum Chromodynamics (QCD) memory-intensive code • Memory leaks found in HPVM, now fixed (version 1.9) • 5% slower using dual CPUs than single CPUs • Not suitable for quad-processor systems at all • ARPI3D CFD code • Code had inefficient MPI. Recoded to improve performance • Compute time scales well now, MPI part stays constant • I/O is a major bottleneck with this code • NT scales better for I/O than Linux
Rob Pennington, Technical Program Manager, Cluster Computing, NCSA Clusters Futures • 2000: Teraflop clusters possible with 1000 1GHz IA-32 nodes • 2001: Teraflop machines with around 350 IA-64 nodes (assuming 3 GFlop CPU performance) • Major problem is I/O bottleneck though, and SANs are expensive! • Possible to use I/O nodes, with fibre-channel and Myrinet TCP to cross-mount file systems
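As a rough sanity check of these estimates (assuming roughly 1 Gflop/s per 1GHz IA-32 node, and the 3 Gflop/s per IA-64 CPU quoted above):

```latex
1000 \times 1~\text{Gflop/s} = 1~\text{Tflop/s}, \qquad
350 \times 3~\text{Gflop/s} \approx 1.05~\text{Tflop/s}
```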
High Performance Computing with Clusters: Panel Session I • For big application codes, use Cygwin tools for building (www.sourceware.cygwin.com) • Use scripts to wrap native Windows compilers, make them look like UNIX ones • Can be tedious to get around compiler flag and filename conventions • Wish list: • C++ standard compliance • C++ compiler robustness • Performance and debugging tools
High Performance Computing with Clusters: Panel Session II • Molecular dynamics code users are happy: • Velocity (550 MHz Xeons) 2.2-2.4x faster than the previous SP2 • Velocity+ (733MHz PIII) 1.3-1.4x faster than the Velocity cluster • Intel C compiler about 30% faster than MS Visual C++ on a stochastic processes code (lots of random number generation) • Windows 2000 runs faster than Windows NT on real applications
Future for HPC Software on Windows Platform: Panel Session I • Open question (from NAG): can Windows provide a transparent cluster? The pieces are coming together • Software vendors cite supporting different flavours of Linux as a problem • Intel maintains that HPC is very important to them • Todd Needham of Microsoft speculates: “Windows 2000 on Itanium Rocks!” • Microsoft sees 100% overlap in OS components for Enterprise Computing and HPC
Future for HPC Software on Windows Platform: Panel Session II • MPI Software Technology Inc. sees many different types of HPC users • They support Windows NT, Windows 2000, and different UNIXes • Different problems with different OSs • Windows: memory pinning time higher than on Linux. Better security than Linux. Lack of tools on Windows is crippling • Linux: SMP support not great. Many variants are a problem • Windows cluster out-of-the-box experience is not great • Not many production deployments of Windows clusters, so people are not taking it seriously – yet! • The Beowulf group has a strong quasi-community
Future for HPC Software on Windows Platform: Panel Session III Five-year prognosis for Windows clusters • Performance + Security + IT + TCO issues prevail • Bright future with a level playing field. Good for competition • Academia will be biased towards using Linux • Outside academia, usage will be more Windows 2000 oriented • User beware! Petaflop computing will need a new paradigm, though, to supersede MPI
Reflections and Conclusions • Cornell Theory Centre has demonstrated Industry Standard Windows Clusters by example • Performance is as good as, or better than, Big Iron • HPC is becoming mainstream as a business tool • Convergence in hardware and software between e-business/Enterprise Computing and HPC • Cluster management software is maturing fast • Lack of software development tools is a key problem
More Information… For more information about the Cornell Theory Center Advanced Cluster Computing Consortium (AC3) see: http://www.tc.cornell.edu/ For more information about Windows Clusters in general see: http://www.windowsclusters.org