1. BlueGene/L System Software
Derek Lieber
IBM T. J. Watson Research Center
February 2004
2. Topics
Programming environment
Compilation
Execution
Debugging
Programming model
Processors
Memory
Files
Communications
What happens under the covers
3. Programming on BG/L
A single application program image (see the MPI sketch below)
Running on tens of thousands of compute nodes
Communicating via message passing
Each image has its own copy of
Memory
File descriptors
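Each instance sees only its own memory and file descriptors; coordination happens through message passing. A minimal sketch in plain MPI C (nothing BG/L-specific is assumed):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        int counter;                                 /* private to this instance */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* which instance am I?     */
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);      /* how many instances?      */

        counter = rank;                              /* invisible to other instances */
        printf("instance %d of %d\n", rank, nprocs); /* merged into the job's stdout */

        MPI_Finalize();
        return 0;
    }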
4. Programming on BG/L
A “job” is encapsulated in a single host-side process
A merge point for compute node stdout streams
A control point for
Signaling (ctrl-c, kill, etc)
Debugging (attach, detach)
Termination (exit status collection and summary)
5. Programming on BG/L
Cross-compile the source code
Place executable onto BG/L machine’s shared filesystem
Run it
“blrun <job information> <program name> <args>”
Stdout of all program instances appears as stdout of blrun
Files go to user-specified directory on shared filesystem
blrun terminates when all program instances terminate
Killing blrun kills all program instances
6. Compiling and Running on BG/L
7. Programming Models
“Coprocessor model”
64k instances of a single application program
each has 255M address space
each with two threads (main, coprocessor)
non-coherent shared memory
“Virtual node model”
128k instances
127M address space
one thread (main)
8. Programming Model
Does a job behave like
A group of processes?
Or a group of threads?
A little bit of each
9. A process group?
Yes
Each program instance has its own
Memory
File descriptors
No
Can’t communicate via mmap, shmat
Can’t communicate via pipes or sockets
Can’t communicate via signals (kill)
Instances exchange data only via message passing (see the MPI sketch below)
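Since none of the usual IPC mechanisms exist, the only way one instance's data reaches another is a message. A hedged point-to-point sketch in plain MPI C (run with at least two instances):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;                      /* exists only in rank 0's memory */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);   /* the only way to see it */
        }

        MPI_Finalize();
        return 0;
    }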
10. A thread group?
Yes
Job terminates when
All program instances terminate via exit(0)
Any program instance terminates
Voluntarily, via exit(nonzero)
Involuntarily, via uncaught signal (kill, abort, segv, etc); see the sketch below
No
Each program instance has own set of file descriptors
Each has own private memory space
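An illustrative sketch of those termination rules (the error check is made up, not from the slides): a nonzero exit from any one instance ends the whole job, just as a crashing thread ends a process, while exit(0) from every instance is the normal ending.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (argc < 2) {                       /* hypothetical required argument */
            /* Nonzero exit from ANY instance terminates the whole job;
               an uncaught signal (segv, abort) would end it the same way. */
            fprintf(stderr, "rank %d: missing input file, ending job\n", rank);
            exit(1);
        }

        /* ... normal work ... */
        MPI_Finalize();
        return 0;                             /* exit(0) everywhere = normal end */
    }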
11. Compilers and libraries
GNU C, Fortran, and C++ compilers can be used with BG/L, but they do not exploit the second FPU
IBM xlf/xlc compilers have been ported to BG/L, with code generation and optimization features for the dual FPU (see the loop sketch below)
Standard glibc library
MPI for communications
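As an illustration (not from the slides): for a plain loop like the one below, the BG/L ports of xlc/xlf can pair the multiply-adds onto both pipes of the dual FPU, while the stock GNU compilers generate code for a single FPU only.

    /* daxpy-style kernel in plain ANSI C: y[i] = a*x[i] + y[i].
     * No directives or intrinsics are used; dual-FPU code generation
     * is left entirely to the xl back end. */
    void daxpy(int n, double a, const double *x, double *y)
    {
        int i;
        for (i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }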
12. System calls
Traditional ANSI + “a little” POSIX (usage sketch below)
I/O
Open, close, read, write, etc
Time
Gettimeofday, etc
Signal catchers
Synchronous (sigsegv, sigbus, etc)
Asynchronous (timers and hardware events)
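A small usage sketch (assumed, not from the slides) of that call surface: ordinary file I/O, gettimeofday, and a synchronous signal catcher; the file name is made up.

    #include <stdio.h>
    #include <signal.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    /* Synchronous signal catcher, e.g. for a bad pointer dereference. */
    static void on_segv(int sig)
    {
        fprintf(stderr, "caught signal %d\n", sig);
        _exit(1);
    }

    int main(void)
    {
        char buf[256];
        struct timeval tv;
        int fd;

        signal(SIGSEGV, on_segv);            /* signal catchers */
        gettimeofday(&tv, NULL);             /* time            */

        fd = open("input.dat", O_RDONLY);    /* file I/O, shipped to an IO node */
        if (fd >= 0) {
            long n = (long) read(fd, buf, sizeof buf);
            printf("read %ld bytes at t=%ld\n", n, (long) tv.tv_sec);
            close(fd);
        }
        return 0;
    }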
13. System calls
No “unix stuff”
fork, exec, pipe
mount, umount, setuid, setgid
No system calls needed to access most hardware
Tree and torus fifos
Global OR
Mutexes and barriers
Performance counters
Mantra
Keep the compute nodes simple
Kernel stays out of the way and lets the application program run
14. Software Stack in BG/L Compute Node
CNK controls all access to hardware, and enables bypass for application use
User-space libraries and applications can directly access torus and tree through bypass
As a policy, user-space code should not directly touch hardware, but there is no enforcement of that policy
15. What happens under the covers?
The machine
The job allocation, launch, and control system
The machine monitoring and control system
16. The machine
Nodes
IO nodes
Compute nodes
Link nodes
Communications networks
Ethernet
Tree
Torus
Global OR
JTAG
17. The IO nodes
1024 nodes
talk to outside world via ethernet
talk to inside world via tree network
not connected to torus
embedded linux kernel
purpose is to run
network filesystem
job control daemons
18. The compute nodes
64k nodes, each with 2 CPUs and 4 FPUs
application programs execute here
custom kernel
non-preemptive
application program has full control of all timing issues
kernel and application share same address space
kernel is memory protected
kernel provides
program load / start / debug / termination
file access
all via message passing to IO nodes
19. The link nodes
Signal routing, no computation
Stitch together cards and racks of IO and compute nodes into “blocks” suitable for running independent jobs
Isolate each block’s tree, torus, and global OR network
20. Machine configuration
21. Kernel booting and monitoring
22. Job execution
23. Blue Gene/L System Software Architecture
24. Conclusions
The BG/L system software stack has
Custom solution (CNK) on compute nodes for high performance
Linux solution on I/O nodes for flexibility and functionality
MPI as default programming model
BG/L system software must scale to very large machines
Hierarchical organization for management
Flat organization for programming
Mixed conventional/special-purpose operating systems
Many challenges ahead, particularly in performance, scalability and reliability