Programming the CoW! Tools to start with on the new cluster.
What’s it good for? Net DOOM? It should be good for computation, and to a lesser extent visualization.
So it should be great for high-granularity (coarse-grained) computations. That is, design your programs to have long processing cycles and infrequent inter-node communication, and you should be just fine.
How do we program it?
Shared Memory – A global memory space is available to all nodes. Nodes use synchronization primitives to avoid contention.
Message Passing – Every node has only private memory space. All communications between nodes have to be explicitly directed.
[Diagram: shared memory – every NODE attached to one common MEMORY; message passing – each NODE with its own private MEM.]
Thread Matrix Multiply: workers split L, and each multiplies its part with all of R to get a part of RES.
[Diagram: L x R = RES, with L partitioned among the workers.]
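A minimal sketch of that idea in C with pthreads; the matrix size, worker count, file name, and fill values are made up for illustration:

/* matmul_threads.c – row-partitioned threaded matrix multiply (illustrative sizes) */
#include <pthread.h>
#include <stdio.h>

#define N        256       /* square matrices, just to keep the sketch short */
#define NWORKERS 4

static double L[N][N], R[N][N], RES[N][N];

/* Each worker owns a band of rows of L and fills in the matching rows of RES,
   reading all of R. */
static void *worker(void *arg)
{
    long id    = (long)arg;
    int  first = id * N / NWORKERS;
    int  last  = (id + 1) * N / NWORKERS;
    for (int i = first; i < last; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += L[i][k] * R[k][j];
            RES[i][j] = sum;
        }
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];
    for (int i = 0; i < N; i++)              /* fill L and R with something */
        for (int j = 0; j < N; j++) {
            L[i][j] = i + j;
            R[i][j] = i - j;
        }
    for (long w = 0; w < NWORKERS; w++)
        pthread_create(&tid[w], NULL, worker, (void *)w);
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tid[w], NULL);
    printf("RES[0][0] = %g\n", RES[0][0]);
    return 0;
}

No locks are needed because each worker writes a disjoint band of RES; L, R, and RES are simply visible to every thread. Build with something like cc -O2 matmul_threads.c -lpthread.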
On the cluster we have no hardware support for shared memory, so message passing is the natural alternative. Unix supports sockets for MP. People have built higher-level MP libraries on top of sockets that make life easier. Two that I am familiar with are PVM and MPI.
PVM: Parallel Virtual Machine. Started in 1989. http://www.csm.ornl.gov/pvm A PVM is a virtual machine made up of a collection of independent nodes. It has a lot of support for heterogeneous clusters. It's easy to use, though its performance may be somewhat lower than MPI's.
PVM Each node runs one pvmd daemon. Each node can run one or more tasks. Tasks use the pvmd to communicate with other tasks. Tasks can start new tasks, stop tasks, or delete nodes from the PVM at will. Tasks can be grouped. PVM comes with a console program that lets you control the PVM easily.
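To make the task model concrete, here is a hedged sketch of a tiny PVM master and worker in C. The program names, message tags, and worker count are invented for illustration; a real build links against libpvm3 and puts the binaries where $PVMBIN points.

/* master.c – enrolls in the PVM, spawns workers, hands each an int, collects replies */
#include <stdio.h>
#include "pvm3.h"
#define NWORKERS 4
#define TAG_WORK 1
#define TAG_DONE 2
int main(void)
{
    int tids[NWORKERS];
    pvm_mytid();                                      /* enroll this task in the PVM */
    int started = pvm_spawn("worker", NULL, PvmTaskDefault, "", NWORKERS, tids);
    for (int w = 0; w < started; w++) {               /* send each worker its input */
        int work = w;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&work, 1, 1);
        pvm_send(tids[w], TAG_WORK);
    }
    for (int w = 0; w < started; w++) {               /* collect the replies */
        int result;
        pvm_recv(-1, TAG_DONE);                       /* -1 = from any task */
        pvm_upkint(&result, 1, 1);
        printf("got %d\n", result);
    }
    pvm_exit();
    return 0;
}

/* worker.c – receives one int from its parent, doubles it, sends it back */
#include "pvm3.h"
#define TAG_WORK 1
#define TAG_DONE 2
int main(void)
{
    int master = pvm_parent();
    int work, result;
    pvm_recv(master, TAG_WORK);
    pvm_upkint(&work, 1, 1);
    result = 2 * work;
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&result, 1, 1);
    pvm_send(master, TAG_DONE);
    pvm_exit();
    return 0;
}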
PVM: Setup
#Where PVM is installed.
setenv PVM_ROOT /home/demarle/sci/distrib/mps/pvm3
#What type of machine this node is.
setenv PVM_ARCH LINUX
#Where the ssh command is.
setenv PVM_RSH /local/bin/ssh
#Where your PVM applications are.
setenv PVMBIN $PVM_ROOT/bin/LINUX
#Where the pvm executables are.
setenv PATH ${PATH}:$PVM_ROOT/lib
setenv PATH ${PATH}:$PVM_ROOT/bin/LINUX
PVM CONSOLE:
[demarle@labnix13 scisem]$ pvm
pvm> add labnix14
add labnix14
1 successful
                    HOST     DTID
                labnix14    80000
pvm> conf
conf
2 hosts, 1 data format
                    HOST     DTID     ARCH   SPEED        DSIG
                labnix14    40000    LINUX    1000  0x00408841
                labnix13    80000    LINUX    1000  0x00408841
pvm> quit
quit
Console: exit handler called
pvmd still running.
[demarle@labnix13 scisem]$
PVM CONSOLE, continued:
[demarle@labnix13 scisem]$ cord_racer
Suspended
[demarle@labnix13 scisem]$ pvm
pvmd already running.
pvm> ps
ps
                HOST      TID   FLAG 0x  COMMAND
            labnix13    40016      4/c        -
            labnix13    40017    6/c,f    adsmd
Use "pvm> help" to get a list of commands.
Use "pvm> kill" to kill tasks, "pvm> delete" to delete nodes from the PVM, and "pvm> halt" to stop every pvm task and daemon.
Message Passing Matrix Multiply: workers split L and R. They always multiply their L’, and take turns broadcasting their R’.
[Diagram: L x R = RES, with both L and R partitioned among the workers.]
MPI: Message Passing Interface. Started in 1992. http://www-unix.mcs.anl.gov/mpi/index.html Goal – to standardize message passing so that parallel code can be portable. Unlike PVM it does not specify the virtual machine environment; for instance, it does not say how to start a program. Its operations are more basic than PVM's: it's supposed to be lower level and faster.
MPICH A free implementation of the MPI standard. http://www-unix.mcs.anl.gov/mpi/mpich Plus it comes with some extras, like scripts that give you some of PVM’s niceties. • mpirun – a script to start your programs with. • mpicc, mpiCC, mpif77, and mpif90 – compiler wrapper scripts. • MPE – a set of performance analysis and program visualization tools.
MPI: Setup
#Where MPI is installed.
setenv MYMPI /home/demarle/sci/distrib/mps/mpi/mpich-1.2.3
#Where the ssh command is.
setenv RSHCOMMAND /local/bin/ssh
#Where the executables are.
setenv PATH ${PATH}:${MYMPI}/bin
Uses a file to specify which machines you can use:
${MYMPI}/util/machines/machines.LINUX
To start an executable:
mpirun <-dbg-gdb> -np # filename
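Putting the pieces together, here is a hedged C sketch of the message-passing matrix multiply described a few slides back: every rank keeps its own rows of L and RES, and the ranks take turns broadcasting their rows of R. The matrix size, fill values, and file name are made up for illustration, and N is assumed to divide evenly by the number of ranks.

/* bcast_matmul.c – each rank owns N/P rows of L and of R; ranks take turns
   broadcasting their rows of R, and everyone accumulates into its rows of RES. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N 256                       /* must be divisible by the number of ranks */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int B = N / nprocs;             /* rows per rank */
    double *Lp   = malloc(B * N * sizeof(double));   /* my rows of L     */
    double *Rp   = malloc(B * N * sizeof(double));   /* my rows of R     */
    double *Rbuf = malloc(B * N * sizeof(double));   /* broadcast buffer */
    double *RESp = calloc(B * N, sizeof(double));    /* my rows of RES   */

    for (int i = 0; i < B * N; i++) {                /* fill with something */
        Lp[i] = 1.0;
        Rp[i] = 2.0;
    }

    for (int k = 0; k < nprocs; k++) {
        /* round k: rank k's rows of R go to everyone */
        if (rank == k)
            memcpy(Rbuf, Rp, B * N * sizeof(double));
        MPI_Bcast(Rbuf, B * N, MPI_DOUBLE, k, MPI_COMM_WORLD);

        /* those rows of R pair with columns k*B .. k*B+B-1 of my rows of L */
        for (int i = 0; i < B; i++)
            for (int b = 0; b < B; b++) {
                double lv = Lp[i * N + k * B + b];
                for (int j = 0; j < N; j++)
                    RESp[i * N + j] += lv * Rbuf[b * N + j];
            }
    }

    if (rank == 0)
        printf("RES[0][0] = %g\n", RESp[0]);

    free(Lp); free(Rp); free(Rbuf); free(RESp);
    MPI_Finalize();
    return 0;
}

Something like mpicc bcast_matmul.c -o bcast_matmul followed by mpirun -np 4 bcast_matmul should build and launch it, give or take local paths.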
If you don't want the overhead of the PVM and MPI libraries and daemons, you can do essentially the same thing with sockets. Sockets will be faster, but also harder to use. They don’t come with groups, barriers, reductions, etc. You have to create these yourself.
SOCKETS
Think of file descriptors: sock = socket() ~ fd = fopen()
int sock = socket(Domain, Type, Protocol);
Domain:
  AF_INET – over the net
  AF_UNIX – local to a node
Type:
  SOCK_STREAM – two-ended connections, reliable, no size limit, i.e. TCP
  SOCK_DGRAM – connectionless, unreliable, ~1500 bytes, i.e. UDP
Protocol – like a flavor of the domain; these two just take 0
Basic Process for a Master Task
//open a socket, like a file descriptor
sock = socket(AF_INET, SOCK_STREAM, 0);
//bind your end to this machine's IP address and this program's PORT
int ret = bind(sock, (struct sockaddr *) &servAddr, sizeof(servAddr));
//let the socket listen for connections from remote machines
ret = listen(sock, BACKLOG);
//start remote programs
system("ssh labnix14 worker.exe");
TO BE CONTINUED …
Basic Process for a Worker
//put yourself in background and nohup, to let the master continue
ret = daemon(1,0);
//open a socket
int sock = socket(AF_INET, SOCK_STREAM, 0);
//bind your end with this machine's IP address and this program's PORT
ret = bind(sock, (struct sockaddr *) &cliAddr, sizeof(cliAddr));
//connect this socket to the listening one in the master
ret = connect(sock, (struct sockaddr *) &servAddr, sizeof(servAddr));
TO BE CONTINUED…
Basic Process for a Master Task, cont.
//accept each worker's connection to finish a new two-ended socket
children[c].sock = accept(sock, (struct sockaddr *) &children[c].cliAddr, &children[c].cliAddrLen);
//send and receive over the socket as you like
ret = send(children[c].sock, parms, 8*sizeof(double), 0);
ret = recv(children[c].sock, RES+rr*rsc, rpr*rpc, MSG_WAITALL);
//close the sockets when you are done with them
close(children[c].sock);
Basic Process for a Worker, cont.
//send and receive data as you please
ret = recv(sock, parms, 7*sizeof(int), 0);
ret = send(sock, (void *)RET, len2, 0);
//close the socket when you are done with it
close(sock);
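The snippets above use servAddr and cliAddr without showing how they get filled in. A hedged sketch of the sockaddr_in setup, assuming an arbitrary port (5000) and a placeholder master address:

#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

#define PORT 5000   /* illustrative port number */

/* Master side: bind/listen on any local interface, on the agreed port. */
static void fill_serv_addr(struct sockaddr_in *servAddr)
{
    memset(servAddr, 0, sizeof(*servAddr));
    servAddr->sin_family      = AF_INET;            /* IPv4 */
    servAddr->sin_port        = htons(PORT);        /* port in network byte order */
    servAddr->sin_addr.s_addr = htonl(INADDR_ANY);  /* any local address */
}

/* Worker side: same structure, but pointed at the master's host before connect(). */
static void fill_master_addr(struct sockaddr_in *servAddr)
{
    memset(servAddr, 0, sizeof(*servAddr));
    servAddr->sin_family      = AF_INET;
    servAddr->sin_port        = htons(PORT);
    servAddr->sin_addr.s_addr = inet_addr("192.168.0.13");  /* placeholder for the master's IP */
}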
Shared Memory on the cluster? SM code was so much simpler, so a lot of people have built DSM (distributed shared memory) systems. • Adsmith, CRL, CVM, DIPC, DSM-PM2, PVMSYNC, Quarks, SENSE, TreadMarks, to name a few…
Page-Based DSMs make use of the virtual memory manager. • Install a signal handler to catch segfaults. • Use mprotect to protect virtual memory pages assigned to remote nodes. • On a segfault, the process blocks, the segfault handler gets the page from a remote node, and control returns to the process. This scheme suffers (false sharing) when two or more nodes want to write to different, unrelated places on the same memory page.
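A single-node, hedged sketch of that mechanism in C; fetch_page_from_owner() is a made-up stand-in for the network request a real page-based DSM would issue:

/* dsm_page_fault.c – the page-based DSM trick on one node:
   protect a page, catch the SIGSEGV, pull the page "in", and let the access retry. */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

static char  *shared_page;
static size_t page_size;

static void fetch_page_from_owner(void *page)
{
    memset(page, 0, page_size);     /* a real DSM would copy the owner's data here */
}

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    /* round the faulting address down to its page */
    void *page = (void *)((uintptr_t)info->si_addr & ~(uintptr_t)(page_size - 1));
    mprotect(page, page_size, PROT_READ | PROT_WRITE);  /* make it accessible first */
    fetch_page_from_owner(page);                        /* then fill it in */
    (void)sig; (void)ctx;           /* returning retries the faulting access */
}

int main(void)
{
    page_size = sysconf(_SC_PAGESIZE);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* map one page and mark it "owned by a remote node" by removing all access */
    shared_page = mmap(NULL, page_size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    shared_page[0] = 42;            /* faults, the handler pulls the page in, the write retries */
    return 0;
}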
Object Based DSMs • Let the programmer define the unit of sharing and then provide each shared object with something like load, modify and save methods. • They can eliminate false sharing, but they often aren’t as easy to use.
DIPC • Distributed Inter-Process Communication • Page based. • It's an extension to the Linux kernel; specifically, it extends System V IPC.
SYSTEM V IPC? • Like an alternative to threads, it lets arbitrary unrelated processes work together. • Threads share the program's entire global space. • For shmem, processes explicitly declare what is shared. • SYSTEM V IPC also means messages and semaphores.
Basic idea
//create an object to share
volatile struct shared { int i; } *shared;
//make the object shareable
shmid = shmget(IPC_PRIVATE, sizeof(struct shared), (IPC_CREAT | 0600));
shared = ((volatile struct shared *) shmat(shmid, 0, 0));
shmctl(shmid, IPC_RMID, 0);
//start children; now they don't have copies of “shared”, they all actually access the original one
fork();
//all children can access the shared object whenever they want
shared->i = 0;
How would this change for DIPC?
#define IPC_DIPC 00010000
shmid = shmget(IPC_PRIVATE, sizeof(struct shared), (IPC_CREAT | IPC_DIPC | 0600));
//Same thing applies for semget and msgget.
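Since the slide notes that the same flag applies to semget, here is a hedged sketch of a plain System V semaphore used as a lock; the key, permissions, and function names are illustrative, and IPC_DIPC is left commented out because it only exists with the DIPC patch and the define above.

/* A System V semaphore used as a simple lock; adding IPC_DIPC to the semget
   flags is what would make it cluster-wide under DIPC (sketch only). */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };  /* Linux makes you declare this */

static int semid;

static void lock(void)
{
    struct sembuf op = { 0, -1, 0 };    /* P: decrement, block while zero */
    semop(semid, &op, 1);
}

static void unlock(void)
{
    struct sembuf op = { 0, +1, 0 };    /* V: increment */
    semop(semid, &op, 1);
}

int main(void)
{
    union semun arg;

    semid = semget(IPC_PRIVATE, 1, IPC_CREAT /* | IPC_DIPC */ | 0600);
    arg.val = 1;                         /* start unlocked */
    semctl(semid, 0, SETVAL, arg);

    lock();
    /* ... touch the shared structure ... */
    unlock();

    semctl(semid, 0, IPC_RMID);          /* clean up */
    return 0;
}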
DIPC works by adding a small modification to the Linux kernel. The kernel looks for IPC_DIPC structures and bumps them out to a user-level daemon; structures without the flag are treated normally. The daemon satisfies the request over the network and then returns the data to the kernel, which in turn returns the data to the user process.
The great thing about DIPC is that it is very compatible with normal Linux. A DIPC program will run just fine on an isolated machine without DIPC; the flag will simply be ignored. This means you can develop your software off the cluster and then just throw it on to make use of all the CPUs.
DIPC Problems? It enforces strict sequential consistency, which is very easy to program against but generates a lot of network traffic. The version for the 2.4.x kernel isn't finished yet.
Summary • CPU vs. COMMUNICATION • MP: PVM, MPI, SOCKETS • DSM: DIPC?, Quarks?, …
REFERENCES
PVM http://www.csm.ornl.gov/pvm
MPI http://www-unix.mcs.anl.gov/mpi/index.html
MPICH http://www-unix.mcs.anl.gov/mpi/mpich
DIPC http://wallybox.cei.net/dipc