TRIUMF SITE REPORT Corrie Kost

TRIUMF Site Report for HEPiX, Edinburgh, 24-28 May 2004 – Corrie Kost


Presentation Transcript


  1. TRIUMF SITE REPORT Corrie Kost TRIUMF Site Report for HEPiX, Edinburgh, 24-28 May 2004 – Corrie Kost

  2. LINUX at TRIUMF
     • TRIUMF urges proper support for Scientific Linux

  3. WAN Replacement MRV units (10Gb/sec capable) Third Passport Router

  4. WestGrid – UBC/TRIUMF Site
     • 504 dual 3.06 GHz Xeon IBM blades
     • Red Hat Linux 9 to allow GPFS (NFS nixed)
     • OPENPBS Scheduling with (MOAB) Maui
     • 10 TB disk storage
     • 70 TB tape storage
     • Direct Gigabit connection between sites
     • Possible 10 Gb/sec in future
     • February 2004 – opened for general use

  5. WestGrid – UBC/TRIUMF Site (www.westgrid.ca)
     • From a cold start:
     • GPFS servers load in 5-10 min
     • All nodes up in 60-90 min
     • Bring up single nodes – 10 min
     • Rebuild (disk) for node – 2 hrs
     • Single node failure rate ~ 1/day
     • Node disk failures dominate
     • Utilization about 87%

  6. Network / Servers

  7. Servers Upgrade Program

  8. LCG Grid Participant

  9. High I/O Testbed
     • Hardware nice but…
     • 40pin IDE cable is a problem with 2.6 kernel
     • Mounting bracket screws can short audio & halt boot

  10. STORM1 & STORM2
     • Dual 3.2 GHz Xeons
     • 4GB memory
     • 4 3WARE 8506-4LP
     • 16 SATA150 120GB DRIVES
     • 20GB ST92011A DRIVE
     • INTEL 10GBE PXLA8590LR

  11. High Speed I/O – Part 1
     • Used ext2 for highest speeds (no journaling, but 2TB file size limit)
     • RH 9, one to four disks (writes), software RAID 0, 3-Ware controller: 50.6, 98, 124, 141 MB/sec respectively
     • Four disks split over two 3-Ware controllers: 162 MB/sec writes
     • Four disks on 1 hardware RAID 0 and software RAID 0: 138 MB/sec writes
     • Adding 4 more disks on second 3-Ware: 250 MB/sec (slots 2,5), 247 MB/sec (slots 2,3)
     • Adding 4 more disks on third 3-Ware: 273 MB/sec (slots 2,3,5), 265 MB/sec (slots 2,3,4)
     • Adding 4 more disks on fourth 3-Ware: 283 MB/sec (slots 2,3,4,5)
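
The one-to-four-disk figures above can be read as a scaling curve. The following is a minimal sketch, not part of the original slides, that compares each measured write rate with ideal linear scaling from the single-disk rate. The function name `scaling_efficiency` and the idea of using the one-disk figure as the baseline are assumptions made for illustration.

```python
# Write throughput (MB/sec) for 1-4 disks in software RAID 0 on one
# 3-Ware controller, copied from the slide.
measured = {1: 50.6, 2: 98.0, 3: 124.0, 4: 141.0}


def scaling_efficiency(rates):
    """Return {disks: measured rate / (disks * single-disk rate)}.

    Assumption (not from the slides): ideal scaling is linear in the
    number of disks, with the one-disk rate as the baseline.
    """
    base = rates[1]
    return {n: rate / (n * base) for n, rate in sorted(rates.items())}


if __name__ == "__main__":
    for n, eff in scaling_efficiency(measured).items():
        print(f"{n} disk(s): {measured[n]:6.1f} MB/sec, {eff:.0%} of linear")
```

Under this baseline, the four-disk figure comes out at roughly 70% of linear, which is consistent with the later remark that a single controller delivers about 140 MB/sec instead of 4*50.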

  12. High Speed I/O – Part 2
     • Using 4 3-Ware controllers in hardware RAID 0 mode, software RAIDed by Linux
     • dd if=/dev/zero of=/raid/8GB bs=81920 count=104857
     • Fedora1, non-SMP, 2.4.22-1.2188.nptl, HT, ext2 -T news: write 370 MB/sec
     • Fedora1, non-SMP, 2.4.22-1.2188.nptl, HT, reiserfs: write 227 MB/sec
     • Loaded e2fs module 1.35-7.1 to fix largefile and largefile4 creation with mkfs -T largefile /dev/md0
     • Fedora1, non-SMP, 2.4.22-1.2188.nptl, HT, largefile ext2: write 349 MB/sec
     • Fedora1, non-SMP, 2.4.22-1.2188.nptl, noHT, largefile ext2: write 300 MB/sec
     • Fedora1, non-SMP, 2.6.6#1, HT, largefile ext2: write 375 MB/sec
     • Replacing the 40 pin IDE cable to the main disk with an 80 pin one allowed SMP to boot
     • Fedora1, SMP, 2.6.6#1, noHT, largefile ext2: write 309 MB/sec
     • echo 262144 > /proc/sys/net/core/rmem_default
     • echo 8388608 > /proc/sys/net/core/rmem_max
     • echo 262144 > /proc/sys/net/core/wmem_default
     • echo 8388608 > /proc/sys/net/core/wmem_max
     • echo 300000 > /proc/sys/net/core/netdev_max_backlog
     • echo 8388608 > /proc/sys/net/core/optmem_max
     • sysctl -w net.ipv4.tcp_rmem="10000000 10000000 10000000"
     • sysctl -w net.ipv4.tcp_wmem="10000000 10000000 10000000"
     • sysctl -w net.ipv4.tcp_mem="10000000 10000000 10000000"
     • Iperf maxed out at 2.3 Gbits/sec with 2.6.6 kernel recompiled for WEB100
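
The slide mixes two ways of setting the same kind of kernel parameter: writing to a file under /proc/sys and calling `sysctl -w`. A path such as /proc/sys/net/core/rmem_max corresponds to the sysctl key net.core.rmem_max. The sketch below, which is illustrative only, collects the slide's values and renders them as `key = value` lines. Persisting them in a sysctl.conf-style file is an assumption; the slides only show the settings being applied by hand. The helper names are hypothetical.

```python
PROC_SYS = "/proc/sys/"

# Values copied from the slide: (path under /proc/sys, value).
PROC_SETTINGS = [
    ("/proc/sys/net/core/rmem_default", "262144"),
    ("/proc/sys/net/core/rmem_max", "8388608"),
    ("/proc/sys/net/core/wmem_default", "262144"),
    ("/proc/sys/net/core/wmem_max", "8388608"),
    ("/proc/sys/net/core/netdev_max_backlog", "300000"),
    ("/proc/sys/net/core/optmem_max", "8388608"),
]

# Values the slide sets with `sysctl -w`: (key, value).
SYSCTL_SETTINGS = [
    ("net.ipv4.tcp_rmem", "10000000 10000000 10000000"),
    ("net.ipv4.tcp_wmem", "10000000 10000000 10000000"),
    ("net.ipv4.tcp_mem", "10000000 10000000 10000000"),
]


def proc_path_to_sysctl_key(path):
    """Map a /proc/sys path to its dotted sysctl key (hypothetical helper)."""
    if not path.startswith(PROC_SYS):
        raise ValueError(f"not under {PROC_SYS}: {path}")
    return path[len(PROC_SYS):].replace("/", ".")


def render_sysctl_conf():
    """Render all settings as 'key = value' lines, one per setting."""
    pairs = [(proc_path_to_sysctl_key(p), v) for p, v in PROC_SETTINGS]
    pairs += SYSCTL_SETTINGS
    return "\n".join(f"{key} = {value}" for key, value in pairs)


if __name__ == "__main__":
    print(render_sysctl_conf())
```

Running the script only prints the fragment; it does not change any kernel setting.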

  13. High Speed I/O – Part 3
     [root@storm2 root]# time ttcp -t -b 6000000 -l 102400 storm1-10g </raid/8gb-a
     ttcp-t: buflen=102400, nbuf=2048, align=16384/0, port=5001, sockbufsize=6000000 tcp -> storm1-10g
     ttcp-t: socket
     ttcp-t: sndbuf
     ttcp-t: connect
     ttcp-t: 8589934592 bytes in 42.80 real seconds = 195978.14 KB/sec +++
     ttcp-t: 83887 I/O calls, msec/call = 0.52, calls/sec = 1959.80
     ttcp-t: 0.0user 22.2sys 0:42real 52% 0i+0d 0maxrss 0+25pf 17854+622csw
     • Ttcp disk to disk: 191 Mbytes/sec
     • Three walls:
       - CPU: 100% seen
       - 3Ware I/O controller (140MB/sec instead of 4*50, 375MB/sec instead of 4*140)
       - 10Gbit Intel card using ixgb-1.0.65 driver (2.3 Gb/sec)
     • Ongoing tuning:
       - Process affinity (using /usr/bin/run)
       - Interrupt affinity (IRQ of 3-ware and 10GbE set to CPUs, e.g. /proc/irq/24/smp_affinity)
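
The ttcp output reports KB/sec, while the slide quotes Mbytes/sec for disk-to-disk transfers and Gb/sec for the network card. The sketch below converts the reported rate so the figures can be compared. It assumes ttcp's KB means 1024 bytes (which matches the quoted 191 Mbytes/sec) and that the Gb/sec figures are decimal gigabits; the function names are illustrative.

```python
# Figures copied from the ttcp output on the slide.
TTCP_BYTES = 8589934592        # bytes transferred
TTCP_KB_PER_SEC = 195978.14    # reported throughput
IPERF_GBITS_PER_SEC = 2.3      # iperf ceiling quoted on the slides


def kb_per_sec_to_mbytes(kb_per_sec):
    """KB/sec -> Mbytes/sec, assuming 1 KB = 1024 bytes, 1 MB = 1024 KB."""
    return kb_per_sec / 1024


def kb_per_sec_to_gbits(kb_per_sec):
    """KB/sec -> Gbit/sec, assuming 1 KB = 1024 bytes, 1 Gbit = 1e9 bits."""
    return kb_per_sec * 1024 * 8 / 1e9


def elapsed_seconds(total_bytes, kb_per_sec):
    """Transfer time implied by the byte count and the reported rate."""
    return total_bytes / (kb_per_sec * 1024)


if __name__ == "__main__":
    mbytes = kb_per_sec_to_mbytes(TTCP_KB_PER_SEC)
    gbits = kb_per_sec_to_gbits(TTCP_KB_PER_SEC)
    print(f"disk to disk: {mbytes:.1f} Mbytes/sec = {gbits:.2f} Gbit/sec")
    print(f"fraction of iperf ceiling: {gbits / IPERF_GBITS_PER_SEC:.0%}")
    print(f"implied elapsed time: {elapsed_seconds(TTCP_BYTES, TTCP_KB_PER_SEC):.2f} s")
```

Under these unit assumptions the disk-to-disk rate works out to about 1.6 Gbit/sec, so it sits below the 2.3 Gb/sec memory-to-memory ceiling measured with iperf.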

  14. Misc. Developments
     • Build a cheap hot-swap Serial ATA RAID 5 system
     • 1 Promise Fasttrack S150 SX4 controller $233Can
     • 3 Promise Superswap 1100 Drive Enclosures for SATA/150 $112Can
     • 3 Maxtor 120GB S-ATA drives (6Y120M0) $145Can
     • Test on cheap 1.7GHz Celeron, Intel D845GVSLR, 256MB memory
     • Red Hat 9.0 base (won’t work on updated kernels)
     • Read large file – 46.8 Mbytes/sec
     • Write large file – 46.5 Mbytes/sec
     • Able to pull disk while active – auto rebuilds in 75 min when replaced
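
As a rough check on the numbers above, the sketch below works out the usable capacity of a three-drive RAID 5 set and the average rebuild rate implied by a 75-minute rebuild. It is an illustration, not something from the slides: it assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes) and that a rebuild rewrites the whole 120 GB replacement drive.

```python
DRIVES = 3              # Maxtor 120GB S-ATA drives, from the slide
DRIVE_GB = 120          # capacity per drive, from the slide
REBUILD_MINUTES = 75    # auto-rebuild time, from the slide


def raid5_usable_gb(drives, drive_gb):
    """RAID 5 keeps one drive's worth of parity: (n - 1) * size."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb


def implied_rebuild_mb_per_sec(drive_gb, minutes):
    """Average rate if the full drive is rewritten (assumption)."""
    return drive_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    print(f"usable capacity: {raid5_usable_gb(DRIVES, DRIVE_GB)} GB")
    rate = implied_rebuild_mb_per_sec(DRIVE_GB, REBUILD_MINUTES)
    print(f"implied rebuild rate: {rate:.1f} MB/sec")
```

Under these assumptions the implied rebuild rate is well below the measured 46.5 Mbytes/sec large-file write rate, which would be expected if the array stays in service while rebuilding.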

  15. Misc. Developments
     • Remote power on/off using networked power bars (www.servertech.com)

  16. Mail at TRIUMF

  17. IMP Webmail

  18. Squirrel Webmail
