The IEEE CS Task Force on Cluster Computing (TFCC)
William Gropp, Mathematics and Computer Science, Argonne National Lab, www.mcs.anl.gov/~gropp
Thanks to Mark Baker, University of Portsmouth, UK, http://www.dcs.port.ac.uk/~mab
A Little History • In 1998 there was clearly enormous interest in clusters, so it seemed natural to set up a focused group in this area. • A Cluster Computing Task Force was proposed to the IEEE CS. • The TFCC was approved and started operating in February 1999 – it has now been running for just over two years. gropp@mcs.anl.gov
Proposed Activities • Act as an international forum to promote cluster computing research and education, and participate in setting up technical standards in this area. • Be involved with issues related to the design, analysis and development of cluster systems, as well as the applications that use them. • Sponsor professional meetings, produce publications, set guidelines for educational programs, and help co-ordinate academic, funding agency, and industry activities. • Organize events and hold a number of workshops spanning the range of activities sponsored by the Task Force. • Publish a bi-annual newsletter to help the community keep abreast of activities in the field. gropp@mcs.anl.gov
IEEE CS Task Forces • A TF is expected to have a finite term of existence, normally 2-3 years – continued existence beyond that point is generally not appropriate. • A TF is expected either to increase its scope of activities such that establishment of a Technical Committee (TC) is warranted, or to be merged into existing TCs. • The TFCC will submit an application to the CS to become a TC later this year. gropp@mcs.anl.gov
Why a separate TFCC? • It brings together all the activities/technologies used with Cluster Computing into one area – so instead of tracking four or five IEEE TCs there is one... • Cluster Computing is NOT just parallel computing, distributed computing, operating systems, or the Internet; it is a mix of them all, and consequently different. • The TFCC is an appropriate body for focusing activities and publications associated with Cluster Computing. gropp@mcs.anl.gov
http://www.ieeetfcc.org gropp@mcs.anl.gov
TFCC Mailing Lists • Three email lists have currently been set up: • tfcc-l@bucknell.edu – a discussion list open to anyone interested in the TFCC – see the TFCC page for information on how to subscribe. • tfcc-exe@port.ac.uk – a closed executive committee mailing reflector. • tfcc-adv@port.ac.uk – a closed advisory committee mailing reflector. gropp@mcs.anl.gov
Annual Conference – ClusterXY • 1st IEEE International Workshop on Cluster Computing (Cluster 1999), Melbourne, Australia, December 1999, about 105 attendees from 16 countries. http://www.clustercomp.org • 2nd IEEE International Conference on Cluster Computing (Cluster 2000), Chemnitz, Germany, November 2000, anticipated about 160 attendees. http://www.tu-chemnitz.de/cluster2000 • 3rd IEEE International Conference on Cluster Computing (Cluster 2001), Newport Beach, California, October 8-11, 2001, expecting 250-300 attendees. http://andy.usc.edu/cluster2001 gropp@mcs.anl.gov
Associated Events - GRID’XY • 1st IEEE/ACM International Workshop on Grid Computing (Grid2000), Bangalore, India, December 17, 2000 (attendees from 15 countries). http://www.gridcomputing.org • 2nd IEEE/ACM International Workshop on Grid Computing (Grid2001), at SC2001, November 2001 gropp@mcs.anl.gov
Supercomputing • “Birds of a Feather” sessions at SC99 and SC2000. • The aim of these meetings is to gather interested parties and bring them up to date, as well as to present a set of short talks and start discussion on a variety of topics… • There will probably be another at SC01, depending on community interest. gropp@mcs.anl.gov
Other Activities • Book donation program • Cluster Computing Archive • www.ieeetfcc.org/ClusterArchive.html • TopClusters Project • www.TopClusters.org • TFCC Whitepaper • www.dcs.port.ac.uk/~mab/tfcc/WhitePaper • TFCC Newsletter • www.eg.bucknell.edu/~hyde/tfcc gropp@mcs.anl.gov
TopClusters Project • http://www.TopClusters.org • TFCC collaboration with Top500 project. • Numeric, I/O, Web, Database, and Application level benchmarking of clusters. • Joint BOF with Top500 at SC2000 on Cluster-based benchmarking. • Ongoing effort… gropp@mcs.anl.gov
TFCC Whitepaper • A Whitepaper on Cluster Computing, submitted to the International Journal of High-Performance Applications and Supercomputing, November 2000. • A snapshot of the state of the art of Cluster Computing. • Preprint: www.dcs.port.ac.uk/~mab/tfcc/WhitePaper/ gropp@mcs.anl.gov
TFCC Membership • Over 300 registered members. • Membership is free and open to all, but a few benefits may be restricted (e.g. reduced conference registration fees for IEEE members). • Over 450 on the TFCC mailing list <tfcc-l@bucknell.edu>. gropp@mcs.anl.gov
Future Plans • We plan to submit an application to the IEEE CS Technical Activities Board (TAB) to attain full Technical Committee status. • The TAB sees the TFCC as a success, and we hope that our application will be successful. • Obviously, if we achieve TC status we will need the continuing assistance of the TFCC's current volunteers, and we will need to encourage new ones… gropp@mcs.anl.gov
Summary • A successful conference series has been started, with commercial sponsorship. • Promoting cluster-based technologies through TFCC sponsorship. • Helping the community with our book donation program. • Engendering debate and discussion through our mailing list. • Keeping the community informed with our information-rich TFCC Web site. gropp@mcs.anl.gov
Scalable Clusters • TopClusters.org list: • 26 clusters with 128+ nodes • 8 with 500+ nodes • 34 with 64-127 nodes • Most run Linux • Most dedicated to applications • Where are scalable tools developed and tested? • Caveats: • Does not include MPP-like systems (IBM SP, SGI Origin, Compaq, Intel TFLOPS, etc.) • Not a complete list • Only clusters explicitly contributed to TopClusters.org gropp@mcs.anl.gov
What is Scalability? • Most common definition in use: • Works for n+1 nodes if it works for n, for small n • Practical definition • Operations complete “fast enough” • 0.5 to 3 seconds for “interactive” • Operations are reliable • Approach to scalability must not be fragile gropp@mcs.anl.gov
Issues in Clusters and Scalability • Developing and Testing Tools • Requires convenient access to large-scale system • Can this co-exist with production computing? • Too many different tools • Why not adopt Unix philosophy? • Example solution: Scalable Unix Tools • Following slides thanks to Rusty Lusk and Emil Ong gropp@mcs.anl.gov
What Are the Scalable Unix Tools? • Parallel versions of common Unix commands like ps, ls, cp, …, with appropriate semantics • A few new commands in the same spirit but without a serial counterpart • Designed for users • New this spring: release of a high-performance implementation based on MPI • One of the original “official” Ptools projects • Original definition published • Proceedings of the Scalable High Performance Computing Conference • http://www.mcs.anl.gov/~gropp/papers/1994/shpcc-paper.ps gropp@mcs.anl.gov
Motivation • Basic Unix commands (ls, grep, find, …) are quintessential tools. • Simple syntax and semantics (except maybe find's syntax) • Same component interface (lines of text, stdin, stdout) • Unix redirection ( <, >, and especially | ) allows tools to be easily combined into powerful command lines (see the small example below) • “Old-fashioned”: no GUI, little interactivity gropp@mcs.anl.gov
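A small illustrative example of that composition; the commands are standard, but the process name being counted is just an assumption for the sketch:

  # How many mpd daemons are running on this node?
  # (the [m]pd pattern keeps grep from matching its own command line)
  ps aux | grep '[m]pd' | wc -l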
Motivation, continued • Many parallel machines have Unix and at least partially distinct file systems on each node. • A user needs simple and familiar ways to • Copy a file to local file space on each node • Find all processes running on all nodes • Test for conditions on all nodes • Avoid getting swamped with output • On large machines these commands are not useful unless they take advantage of parallelism in their execution. gropp@mcs.anl.gov
Design Goals • Familiar to Unix users • Similar names (we chose pt<Unix-name>) • Same arguments, similar semantics • Interact well with traditional Unix commands, facilitating construction of powerful command lines • Run at interactive speeds (requires scalability in parallel process manager startup and handling of I/O) gropp@mcs.anl.gov
Part I: Parallel Versions of Traditional Commands • ptcp ptmv ptrm ptln ptmkdir ptrmdir ptchmod ptchgrp ptchown pttest[ao] • Select nodes to run on by • -all • -m <file of hostnames> • -M <hostlist>, e.g. ‘donner dasher blitzen’ or ‘ccn%d@1-32,42,65-96’ (see the usage sketch below) gropp@mcs.anl.gov
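A minimal usage sketch of those node-selection options; only the flags listed above are used, and the file, directory, and node names are hypothetical:

  # Copy a data file to /tmp on every node of the cluster
  ptcp -all input.dat /tmp/input.dat

  # Remove it again, but only on nodes ccn1-ccn32 and ccn42
  ptrm -M 'ccn%d@1-32,42' /tmp/input.dat

  # Create a scratch directory on three explicitly named nodes
  ptmkdir -M 'donner dasher blitzen' /tmp/scratch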
Part II: Traditional Commands Producing Lots of Output • ptcat, ptls, ptfind • Have potential to produce lots of output, and the source node is also of interest • With the -h option, output is labelled by node:
  ptls -M node%d@1-3 -h
  [node1]
  myfile1
  [node2]
  [node3]
  myfile1
  myfile2
gropp@mcs.anl.gov
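Because the output is labelled by node, it can be post-processed with ordinary serial Unix tools. A hedged sketch; the directory and file pattern are hypothetical, and the find-style arguments assume ptfind follows its serial counterpart as described above:

  # List /tmp on every node, labelled by node, and page through the result
  ptls -all -h /tmp | less

  # Look for leftover core files in /tmp on nodes ccn1-ccn8
  ptfind -M 'ccn%d@1-8' -h /tmp -name 'core*'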
Performance of ptcp • Copying a single 10 MB file to 241 nodes in 14 seconds. • [Charts: Time to Copy 10 MB file; Total Bandwidth] gropp@mcs.anl.gov
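A rough consistency check on those figures: delivering 10 MB to each of 241 nodes in 14 seconds corresponds to an aggregate rate of about 241 × 10 MB / 14 s ≈ 172 MB/s, presumably what the total-bandwidth chart summarized.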
Watching ptcp
  # Copy a large file to every node:
  ptcp -all bigfile BIGFILE

  # Then repeatedly ask each node how much of BIGFILE has arrived:
  # ls -s reports the size in blocks, awk scales that to a rough
  # percentage (the divisor 98 matches this file's size), and the
  # per-host "percentage N blue red" lines are piped to ptdisp.
  while true; do
    ptexec -all 'echo "`hostname`: `ls -s BIGFILE | awk "{print \\"percentage \\" \$1/98 \\" blue red\\"}"`"' | ptdisp -h
  done
gropp@mcs.anl.gov
[Screenshots: ptdisp “Percentage of Completion” displays during the copy] gropp@mcs.anl.gov
Availability • Open source • Available from http://www.mcs.anl.gov/sut • All source and man pages included • Configure, make, on Linux, Solaris, Irix, AIX • Needs an MPI implementation with mpirun • Developed with Linux, MPICH, and MPD on Chiba City at Argonne gropp@mcs.anl.gov
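A minimal build sketch under those assumptions; the slide mentions only configure and make, so the install step and any configure options are hypothetical:

  # Requires an MPI implementation providing mpirun
  ./configure
  make
  make install   # hypothetical; not listed on the slide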
Chiba City Scalability Testbed • http://www-unix.mcs.anl.gov/chiba/ gropp@mcs.anl.gov
Some Other Efforts in Scalable Clusters • Large Programs • DOE Scientific Discovery through Advanced Computing (SciDAC) • NSF Distributed Terascale Facility (DTF) • OSCAR • Goal is a “cluster in a box” CD • PVFS (Parallel Virtual File System) • Many Smaller Efforts • www.beowulf.org, etc. • Commercial Efforts • Scyld, etc. gropp@mcs.anl.gov