Report on hardware and network upgrades and performance tests at TRIUMF, presented by Corrie Kost at the HEPiX conference in Edinburgh, May 2004. Covers server upgrades, high-speed I/O tests, and network enhancements.
TRIUMF SITE REPORT
Corrie Kost
TRIUMF Site Report for HEPiX, Edinburgh, 24-28 May 2004 – Corrie Kost
LINUX at TRIUMF
TRIUMF urges proper support for Scientific Linux
WAN
Replacement MRV units (10 Gb/sec capable)
Third Passport Router
WestGrid – UBC/TRIUMF Site
• 504 dual 3.06 GHz Xeon IBM blades
• Red Hat Linux 9 to allow GPFS (NFS nixed)
• OpenPBS scheduling with Maui (Moab)
• 10 TB disk storage
• 70 TB tape storage
• Direct Gigabit connection between sites
• Possible 10 Gigabit in future
• February 2004: opened for general use
WestGrid – UBC/TRIUMF Site (www.westgrid.ca)
• From a cold start:
• GPFS servers load in 5-10 min
• All nodes up in 60-90 min
• Bring up single nodes: 10 min
• Rebuild (disk) for node: 2 hrs
• Single node failure rate ~ 1/day
• Node disk failures dominate
• Utilization about 87%
Network / Servers
Servers Upgrade Program
LCG Grid Participant
High I/O Testbed
• Hardware nice but…
• 40-conductor IDE cable is a problem with 2.6 kernel
• Mounting bracket screws can short audio & halt boot
STORM1 & STORM2
• Dual 3.2 GHz Xeons
• 4 GB memory
• 4 3ware 8506-4LP controllers
• 16 SATA150 120 GB drives
• 20 GB ST92011A drive
• Intel 10GbE PXLA8590LR
High Speed I/O – Part 1
• Used ext2 for highest speeds (no journaling, but 2 TB file size limit)
• RH 9, one to four disks (writes), software RAID 0, 3-Ware controller: 50.6, 98, 124, 141 MB/sec respectively
• Four disks split over two 3-Ware controllers: 162 MB/sec writes
• Four disks on 1 hardware RAID 0 and software RAID 0: 138 MB/sec writes
• Adding 4 more disks on second 3-Ware: 250 MB/sec (slots 2,5), 247 MB/sec (slots 2,3)
• Adding 4 more disks on third 3-Ware: 273 MB/sec (slots 2,3,5), 265 MB/sec (slots 2,3,4)
• Adding 4 more disks on fourth 3-Ware: 283 MB/sec (slots 2,3,4,5)
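The one-to-four-disk write figures above can be read as a scaling curve. The sketch below is only an illustration, not part of the original talk: it divides each measured rate by the ideal rate (number of disks times the single-disk rate of 50.6 MB/sec). The function name `scaling_efficiency` is hypothetical, and the assumption is that linear scaling from the one-disk figure is the right baseline.

```shell
#!/bin/sh
# Scaling efficiency of the software RAID 0 write tests:
# measured rate / (disks x single-disk rate), as a percentage.
# The rates are the slide's figures in MB/sec; the baseline is an assumption.

scaling_efficiency() {
    # $1 = number of disks, $2 = measured MB/sec, $3 = single-disk MB/sec
    awk -v n="$1" -v rate="$2" -v single="$3" \
        'BEGIN { printf "%.1f\n", 100 * rate / (n * single) }'
}

single=50.6
disks=1
for rate in 50.6 98 124 141; do
    printf '%s disk(s): %s MB/sec, %s%% of linear scaling\n' \
        "$disks" "$rate" "$(scaling_efficiency "$disks" "$rate" "$single")"
    disks=$((disks + 1))
done
```

On these numbers the efficiency falls from about 97% with two disks to about 70% with four on a single controller, which is consistent with the controller being one of the limits named later in the talk.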
High Speed I/O – Part 2
• Using 4 3-Ware controllers in hardware RAID 0 mode, software-RAIDed by Linux
• dd if=/dev/zero of=/raid/8GB bs=81920 count=104857
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, HT, ext2 -T news: write 370 MB/sec
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, HT, reiserfs: write 227 MB/sec
• Loaded e2fs module 1.35-7.1 to fix largefile and largefile4 creation with mkfs -T largefile /dev/md0
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, HT, largefile ext2: write 349 MB/sec
• Fedora 1, non-SMP, 2.4.22-1.2188.nptl, no HT, largefile ext2: write 300 MB/sec
• Fedora 1, non-SMP, 2.6.6#1, HT, largefile ext2: write 375 MB/sec
• Replacing the 40-conductor IDE cable to the main disk with an 80-conductor cable allowed SMP to boot
• Fedora 1, SMP, 2.6.6#1, no HT, largefile ext2: write 309 MB/sec
• echo 262144 > /proc/sys/net/core/rmem_default
• echo 8388608 > /proc/sys/net/core/rmem_max
• echo 262144 > /proc/sys/net/core/wmem_default
• echo 8388608 > /proc/sys/net/core/wmem_max
• echo 300000 > /proc/sys/net/core/netdev_max_backlog
• echo 8388608 > /proc/sys/net/core/optmem_max
• sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
• sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
• sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
• iperf maxed out at 2.3 Gbits/sec with a 2.6.6 kernel recompiled for Web100
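The slide mixes two ways of setting the same kind of kernel parameter: writing to files under /proc/sys and calling sysctl -w. As a minimal sketch (not from the original talk), the helper below maps a /proc/sys path to its sysctl key and prints the six echo settings as sysctl.conf-style lines. The function name `proc_to_sysctl_key` is hypothetical, and the mapping assumes no path component contains a dot, which holds for the keys used here. Nothing is written to the system.

```shell
#!/bin/sh
# Convert a /proc/sys path to the equivalent sysctl key.
# Assumption: no path component itself contains a dot.
proc_to_sysctl_key() {
    printf '%s\n' "$1" | sed -e 's|^/proc/sys/||' -e 's|/|.|g'
}

# Print "key = value" lines for the slide's echo settings.
# This only prints; it does not change any kernel parameter.
while read -r value path; do
    printf '%s = %s\n' "$(proc_to_sysctl_key "$path")" "$value"
done <<'EOF'
262144 /proc/sys/net/core/rmem_default
8388608 /proc/sys/net/core/rmem_max
262144 /proc/sys/net/core/wmem_default
8388608 /proc/sys/net/core/wmem_max
300000 /proc/sys/net/core/netdev_max_backlog
8388608 /proc/sys/net/core/optmem_max
EOF
```

One point worth checking when reusing these values: on Linux, net.ipv4.tcp_mem is counted in memory pages rather than bytes, unlike tcp_rmem and tcp_wmem, so the same number means a much larger limit there.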
High Speed I/O – Part 3
[root@storm2 root]# time ttcp -t -b 6000000 -l 102400 storm1-10g </raid/8gb-a
ttcp-t: buflen=102400, nbuf=2048, align=16384/0, port=5001, sockbufsize=6000000 tcp -> storm1-10g
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: connect
ttcp-t: 8589934592 bytes in 42.80 real seconds = 195978.14 KB/sec +++
ttcp-t: 83887 I/O calls, msec/call = 0.52, calls/sec = 1959.80
ttcp-t: 0.0user 22.2sys 0:42real 52% 0i+0d 0maxrss 0+25pf 17854+622csw
ttcp disk to disk: 191 Mbytes/sec
Three walls:
• CPU: 100% seen
• 3Ware I/O controller (140 MB/sec instead of 4*50, 375 MB/sec instead of 4*140)
• 10Gbit Intel card using ixgb-1.0.65 driver (2.3 Gb/sec)
Ongoing tuning:
• Process affinity (using /usr/bin/run)
• Interrupt affinity (IRQs of 3-ware and 10GbE set to CPUs, e.g. /proc/irq/24/smp_affinity)
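For the interrupt-affinity tuning mentioned above, /proc/irq/<N>/smp_affinity takes a hexadecimal CPU bitmask in which bit k stands for CPU k. The sketch below shows how such a mask can be built. The function name `cpu_mask` and the choice of CPU 1 for IRQ 24 are illustrative assumptions; the talk does not say which CPUs were assigned. The script only prints the command it would run.

```shell
#!/bin/sh
# Build the hexadecimal bitmask expected by /proc/irq/<N>/smp_affinity.
# Bit k set means CPU k is allowed to service the interrupt.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '%x\n' "$mask"
}

# Hypothetical assignment: pin IRQ 24 (the slide's example path) to CPU 1.
# Printed rather than executed, since writing the file needs root.
irq=24
echo "would run: echo $(cpu_mask 1) > /proc/irq/$irq/smp_affinity"
```

For example, `cpu_mask 0 1` gives 3 (both CPUs of a dual-processor box) and `cpu_mask 1` gives 2 (the second CPU only).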
Misc. Developments
• Build a cheap hot-swap Serial ATA RAID 5 system
• 1 Promise FastTrak S150 SX4 controller, $233 Can
• 3 Promise SuperSwap 1100 drive enclosures for SATA/150, $112 Can
• 3 Maxtor 120 GB S-ATA drives (6Y120M0), $145 Can
• Test on cheap 1.7 GHz Celeron, Intel D845GVSLR, 256 MB memory
• Red Hat 9.0 base (won’t work on updated kernels)
• Read large file: 46.8 Mbytes/sec
• Write large file: 46.5 Mbytes/sec
• Able to pull a disk while active; auto-rebuilds in 75 min when replaced
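Two figures follow from the list above by simple arithmetic, sketched here under stated assumptions: the three 120 GB drives form a single RAID 5 set (one drive's worth of parity), capacities are decimal GB, and a rebuild rewrites the whole replacement drive. The function names are hypothetical and the results are estimates, not measurements from the talk.

```shell
#!/bin/sh
# Usable RAID 5 capacity: one drive's worth of space goes to parity.
raid5_usable_gb() {
    # $1 = number of drives, $2 = GB per drive
    echo $(( ($1 - 1) * $2 ))
}

# Average rebuild rate in MB/sec, assuming the whole drive is rewritten.
rebuild_mb_per_sec() {
    # $1 = drive size in GB (decimal), $2 = rebuild time in minutes
    awk -v gb="$1" -v min="$2" \
        'BEGIN { printf "%.1f\n", gb * 1000 / (min * 60) }'
}

echo "usable capacity: $(raid5_usable_gb 3 120) GB"
echo "average rebuild rate: $(rebuild_mb_per_sec 120 75) MB/sec"
```

Under those assumptions the array offers 240 GB and the 75 min rebuild averages roughly 27 MB/sec, somewhat below the 46.5 Mbytes/sec large-file write rate.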
Misc. Developments
• Remote power on/off using networked power bars (www.servertech.com)
Mail at TRIUMF
IMP Webmail
Squirrel Webmail