Using HPS Switch on Bassi
Jonathan Carter, User Services Group Lead, jtcarter@lbl.gov
NERSC User Group Meeting, June 12, 2006
Topics: IBM Switch Evolution; HPS Switch Configuration; Bassi Switch Configuration; IBM Software
IBM Software
• Parallel Environment (PE 4.2.2), which contains poe and MPI, remains unchanged
• Parallel System Support Package (PSSP 3.5.0), which contains LAPI, has been absorbed into the Reliable Scalable Clustering Technology (RSCT 2.4.2) software stack
IBM Software
• MPI 4.2.2
  • Uses LAPI as reliable transport layer
  • Uses threads, not signals, for asynchronous activities
  • Binary compatible
  • New performance characteristics
    • Eager
    • Bulk transfer
    • Collectives
IBM Software Stack
[Figure: software stack diagram with the labels Application, ESSL, PESSL, GPFS, Sockets, VSD, TCP, UDP, MPI, LAPI, IP, HAL, IF_LS, SMA3+ Adapter, HPS]
Communication Modes
• FIFO mode
  • Message is chopped into 2KB chunks on the host and copied by the CPU
• Remote Direct Memory Access (RDMA)
  • CPU offload
  • One I/O bus crossing
[Figure: diagram with the labels CPU, User Buffer, RDMA, FIFO, DMA, Adapter]
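The FIFO-mode chunking described above can be illustrated with a minimal Python sketch. It only models the splitting of a message into 2KB pieces; the function name `fifo_chunks` is hypothetical, and the real work happens inside the LAPI/HAL layers, not in user code.

```python
FIFO_CHUNK_BYTES = 2 * 1024  # 2KB chunk size quoted on the slide


def fifo_chunks(payload: bytes, chunk_bytes: int = FIFO_CHUNK_BYTES) -> list[bytes]:
    """Split a message into fixed-size chunks; the last chunk may be shorter.

    Illustrative model only: it mimics the host-side chopping of a message,
    with each slice standing in for one CPU copy into the adapter FIFO.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]


if __name__ == "__main__":
    message = bytes(5000)  # a 5000-byte message of zeros
    chunks = fifo_chunks(message)
    print(len(chunks), [len(c) for c in chunks])  # prints: 3 [2048, 2048, 904]
```

Every chunk costs a CPU copy in this mode, which is the work that RDMA offloads to the adapter.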
RDMA (Bulk transfer)
• Overlap of communication and computation possible
  • Asynchronous-messaging applications
  • One-sided communications
• Reduce CPU work
  • Offload fragmentation and reassembly
  • Minimize packet arrival interrupts
• Reduce memory subsystem load
  • Zero copy transport
• Striping across adapters
MPI Transfer Protocols
[Figure: message exchange between tasks P0 and P1; labels: data, req, ack]
• Eager: send data immediately; store in remote buffer
  • No synchronization
  • Only one message sent
  • Uses memory for buffering (less for application)
• Rendezvous: send message header; wait for recv to be posted; send data
  • No data copy may be required
  • No memory required for buffering (more for application)
  • More messages required
  • Synchronization (standard send blocks until recv posted)
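The two protocols can be contrasted with a small Python model. This is a sketch, not IBM's implementation: the function name `transfer_steps` is hypothetical, and the rule that messages up to and including the eager limit go eagerly is an assumption about the exact boundary.

```python
def transfer_steps(msg_bytes: int, eager_limit: int) -> tuple[str, list[str]]:
    """Return the protocol name and the sender-side steps for one message.

    Assumption: a message whose size is <= eager_limit is sent eagerly;
    anything larger uses rendezvous. The step text follows the slide.
    """
    if msg_bytes < 0 or eager_limit < 0:
        raise ValueError("sizes must be non-negative")
    if msg_bytes <= eager_limit:
        # Eager: no synchronization, but the receiver buffers the data.
        return "eager", ["send data immediately", "store in remote buffer"]
    # Rendezvous: the sender waits until the matching receive is posted.
    return "rendezvous", [
        "send message header",
        "wait for recv to be posted",
        "send data",
    ]


if __name__ == "__main__":
    limit = 32 * 1024  # illustrative 32KB eager limit
    for size in (1024, 1024 * 1024):
        protocol, steps = transfer_steps(size, limit)
        print(size, protocol, len(steps))
```

The extra steps in the rendezvous branch are the cost paid for not needing buffer memory on the receiving side.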
POE environment variables
• MP_SINGLE_THREAD
  • Set to Yes for a slight latency decrease; set to No for MPI I/O, OpenMP, etc.
• MP_USE_BULK_XFER
  • Defaults to Yes
• MP_BULK_MIN_MSG_SIZE
  • Defaults to ~150KB
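How the two bulk-transfer variables interact can be sketched in Python. This is a simplified model under stated assumptions: the helper name `uses_bulk_transfer` is hypothetical, the default of 150 * 1024 bytes stands in for the slide's "~150KB", the size is assumed to be given in plain bytes, and the comparison is assumed to be "at least the minimum size".

```python
from typing import Mapping

DEFAULT_BULK_MIN_MSG_SIZE = 150 * 1024  # stands in for the slide's "~150KB"


def uses_bulk_transfer(msg_bytes: int, env: Mapping[str, str]) -> bool:
    """Model whether a message of msg_bytes would go by bulk transfer (RDMA).

    Assumptions: MP_USE_BULK_XFER is read as yes/no (case-insensitive,
    default yes per the slide) and MP_BULK_MIN_MSG_SIZE is a byte count.
    """
    enabled = env.get("MP_USE_BULK_XFER", "yes").strip().lower() == "yes"
    min_size = int(env.get("MP_BULK_MIN_MSG_SIZE", DEFAULT_BULK_MIN_MSG_SIZE))
    return enabled and msg_bytes >= min_size


if __name__ == "__main__":
    print(uses_bulk_transfer(1024 * 1024, {}))                          # True
    print(uses_bulk_transfer(4096, {}))                                 # False
    print(uses_bulk_transfer(1024 * 1024, {"MP_USE_BULK_XFER": "no"}))  # False
```

Lowering MP_BULK_MIN_MSG_SIZE in this model moves more messages onto the RDMA path; whether that helps a given application has to be measured.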
POE environment variables
• MP_BUFFER_MEM
  • Default is 64MB
• MP_EAGER_LIMIT
  • Varies from 32KB to 1KB depending on job size; can be increased in conjunction with MP_BUFFER_MEM
• LAPI parameters for apps with many blocking sends of small messages:
  • MP_REXMIT_BUF_SIZE
    • Default is 128 bytes
  • MP_REXMIT_BUF_CNT
    • Default is 128 buffers
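Since MP_EAGER_LIMIT is raised in conjunction with MP_BUFFER_MEM, it can help to set both in one place before launching a job. The Python sketch below builds such an environment; the helper name `poe_env` is hypothetical, the values are written as plain byte counts, the example sizes are illustrative only, and the check that the eager limit does not exceed the buffer memory is a sanity check added here, not a documented PE rule. Valid ranges should be confirmed in the PE documentation.

```python
import os
from typing import Mapping, Optional


def poe_env(eager_limit_bytes: int, buffer_mem_bytes: int,
            base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with both variables set together."""
    if eager_limit_bytes <= 0 or buffer_mem_bytes <= 0:
        raise ValueError("sizes must be positive")
    if eager_limit_bytes > buffer_mem_bytes:
        # Added sanity check (assumption): one eager message must fit in the buffer.
        raise ValueError("eager limit larger than buffer memory")
    env = dict(os.environ if base is None else base)
    env["MP_EAGER_LIMIT"] = str(eager_limit_bytes)
    env["MP_BUFFER_MEM"] = str(buffer_mem_bytes)
    return env


if __name__ == "__main__":
    # Illustrative values: 64KB eager limit with 128MB of buffer memory.
    env = poe_env(64 * 1024, 128 * 1024 * 1024, base={})
    print(env)  # {'MP_EAGER_LIMIT': '65536', 'MP_BUFFER_MEM': '134217728'}
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when starting the parallel job from a script.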
IBM Documentation
• RSCT for AIX 5L LAPI Programming Guide (SA22-7936-03)
  • LAPI programming
• Parallel Environment for AIX 5L V4.2.2 Operation and Use, Vol 1 (SA22-7948-04)
  • Running jobs
• Parallel Environment for AIX 5L V4.2.2 Operation and Use, Vol 2 (SA22-7949-04)
  • Performance tools
• Parallel Environment for AIX 5L V4.2.2 MPI Programming Guide (SA22-7945-04)
  • IBM MPI implementation