
NIKHEF Data Processing Facility

Status Overview per 2004.10.27. David Groep, NIKHEF.


Presentation Transcript


  1. NIKHEF Data Processing Facility
     Status Overview per 2004.10.27
     David Groep, NIKHEF
     NEROC-TECH NDPF status overview

  2. A historical view
     • Started in 2000 with a dedicated farm for DØ
     • 50 Dual P3-800 MHz
     • tower model Dell Precision 220
     • 800 GByte “3ware” disk array

  3. Many different farms
     • 2001: EU DataGrid WP6 ‘Application’ test bed
     • 2002: addition of the ‘development’ test bed
     • 2003: LCG-1 production facility
     • April 2004: amalgamation of all nodes into LCG-2
     • September 2004: addition of
       • EGEE PPS
       • VL-E P4 CTB
       • EGEE JRA1 LTB

  4. Growth of resources
     • 2000: Intel Pentium III 800 MHz, 100 CPUs
     • 2001: Intel Pentium III 933 MHz, 40 CPUs
     • 2002: AMD Athlon MP2000+ ~2 GHz, 132 CPUs
     • 2003: Intel XEON 2.8 GHz, 54 CPUs
     • 2003: Intel XEON 2.8 GHz, 20 CPUs
     Total WN resources (raw): 353 THz hr/mo, ~200 kSI2k
     Total on-line disk cache: 7 TByte
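
The "raw" THz hr/mo unit on this slide is aggregate clock rate multiplied by hours in a month. A minimal sketch of that arithmetic follows. It assumes nominal clock rates, a 30-day month and every listed CPU counted; the slide does not state how its 353 THz hr/mo figure was derived, so this sketch gives a number of the same order, not the identical value.

```python
# (year, processor, nominal clock in GHz, CPU count), as listed on the slide.
# Assumption: the Athlon MP2000+ is counted at the slide's "~2 GHz" rating.
BATCHES = [
    (2000, "Intel Pentium III", 0.800, 100),
    (2001, "Intel Pentium III", 0.933, 40),
    (2002, "AMD Athlon MP2000+", 2.0, 132),
    (2003, "Intel XEON", 2.8, 54),
    (2003, "Intel XEON", 2.8, 20),
]

HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def total_cpus(batches):
    """Number of CPUs summed over all purchase batches."""
    return sum(count for _, _, _, count in batches)


def aggregate_ghz(batches):
    """Sum of clock rate times CPU count, in GHz."""
    return sum(clock * count for _, _, clock, count in batches)


def thz_hours_per_month(batches, hours=HOURS_PER_MONTH):
    """Raw capacity in THz hours per month."""
    return aggregate_ghz(batches) * hours / 1000.0


if __name__ == "__main__":
    print(total_cpus(BATCHES), "CPUs")
    print(round(aggregate_ghz(BATCHES), 2), "GHz aggregate")
    print(round(thz_hours_per_month(BATCHES), 1), "THz hr/mo")
```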

  5. Node types
     • 2U “pizza” boxes: PIII 933 MHz, 1 GByte RAM, 43 GByte disk
     • 1U GFRC (NCF): AMD MP2000+, 1 GByte RAM, 60 GByte disk; ‘thermodynamic challenges’
     • 1U Halloween: XEON 2.8 GHz, 2 GByte RAM, 80 GByte disk; first GigE nodes

  6. Connecting things together
     • Collapsed backbone strategy
     • Foundry Networks BigIron 15000
       • 14 GigE SX, 2x GigE LX
       • 16 1000BaseTX
       • 48 100BaseTX
     • Service nodes directly GigE connected
     • Farms connected via local switches
     • WN oversubscription typical 1:5 – 1:7
     • Dynamic re-assignment of nodes to facilities
       • DHCP Relay
       • built-in NAT support (for worker nodes)
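
An oversubscription ratio such as 1:5 compares a switch's uplink capacity with the summed capacity of its worker-node ports. The sketch below shows the arithmetic. The function name and the example switch (48 Fast Ethernet ports behind one GigE uplink) are hypothetical and not taken from the slides.

```python
def oversubscription(n_ports, port_mbps, uplink_mbps):
    """Return N such that the switch is oversubscribed 1:N.

    N is the total downstream port capacity divided by the uplink capacity.
    """
    if uplink_mbps <= 0:
        raise ValueError("uplink capacity must be positive")
    return n_ports * port_mbps / uplink_mbps


if __name__ == "__main__":
    # Hypothetical edge switch: 48 x 100 Mb/s ports, one 1000 Mb/s uplink.
    ratio = oversubscription(48, 100, 1000)
    print(f"1:{ratio:.1f}")
```

A ratio near 1:5 means the worker nodes on that switch can saturate the uplink only if about a fifth of them transmit at line rate at the same time.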

  7. NIKHEF Farm Network

  8. Network Uplinks
     NIKHEF links:
     • 1 Gb/s IPv4 & 1 Gb/s IPv6 SURFnet
     • 2 Gb/s WTCW (to SARA)
     SURFnet links: (figure not included in the transcript)

  9. NDPF Usage
     • Analyzed production batch logs since May 2002
     • total of 1.94 PHzHours provided in 306 000 jobs
     • experimental use and tests not shown
     Plot annotations: Added “Halloween”; LHC Data Challenges; Added NCF GFRC
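
To put the two totals on this slide in proportion, the average work per job can be computed directly from them. This is a sketch of that arithmetic only. The function names are hypothetical, and the conversion to hours assumes a job runs on a single CPU at its nominal clock rate.

```python
TOTAL_PHZ_HOURS = 1.94  # from the slide
TOTAL_JOBS = 306_000    # from the slide


def ghz_hours_per_job(total_phz_hours=TOTAL_PHZ_HOURS, jobs=TOTAL_JOBS):
    """Average work per job in GHz hours (1 PHz = 1e6 GHz)."""
    return total_phz_hours * 1e6 / jobs


def cpu_hours_on(clock_ghz, ghz_hours):
    """Hours needed on one CPU of the given clock rate for that much work."""
    return ghz_hours / clock_ghz


if __name__ == "__main__":
    avg = ghz_hours_per_job()
    print(round(avg, 2), "GHz hours per job on average")
    print(round(cpu_hours_on(2.8, avg), 2), "hours on one 2.8 GHz XEON")
```

The average says nothing about the distribution of job lengths, which the slides do not give.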

  10. Usage per Virtual Organisation
     • Dzero acts as “background fill”
     • Usage doesn’t (yet) reflect shares
     Real-time web info:
     www.nikhef.nl/grid/
     www.dutchgrid.nl/Org/Nikhef/farmstats.html

  11. Usage monitoring
     • Live viewgraphs
       • farm occupancy
       • per-VO distribution
       • network loads
     • Tools
       • Cricket (network)
       • home-grown scripts + rrdtool

  12. Central services
     • VO-LDAP services for LHC VOs
     • DutchGrid CA
     • “edg-testbed-stuff”:
       • Torque & Maui distribution
       • installation support components

  13. Some of the issues
     • Data access patterns in Grids
       • jobs tend to clutter $CWD
       • high load when shared over NFS
       • shared homes required for traditional batch & MPI
     • Garbage collection for “foreign” jobs
       • OpenPBS & Torque transient $TMPDIR patch
     • Policy management
       • Maui fair-share policies
       • CPU capping
       • max-queued-jobs capping
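
The idea of a transient $TMPDIR can be illustrated outside the batch system: give each job a private scratch directory, point $TMPDIR and the working directory at it, and delete it when the job ends. The sketch below is not the OpenPBS/Torque patch mentioned on the slide, which is not shown in the transcript. The function name `run_with_transient_tmpdir` and the example job are hypothetical.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_transient_tmpdir(argv, scratch_root=None):
    """Run argv in a private scratch directory and delete it afterwards.

    Returns (exit_code, path_of_the_removed_directory).
    """
    # Assumption: scratch_root is local disk, so clutter stays off NFS.
    job_dir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    env = dict(os.environ, TMPDIR=job_dir)
    try:
        result = subprocess.run(argv, cwd=job_dir, env=env)
        return result.returncode, job_dir
    finally:
        # Garbage collection: remove whatever the job left behind.
        shutil.rmtree(job_dir, ignore_errors=True)


if __name__ == "__main__":
    # Hypothetical job that clutters its working directory.
    clutter = "open('leftover.dat', 'w').write('x')"
    code, path = run_with_transient_tmpdir([sys.executable, "-c", clutter])
    print(code, os.path.exists(path))
```

Because the cleanup sits in a `finally` clause, the directory is removed even when the job fails or cannot be started.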

  14. Developments: work in progress
     • Parallel Virtual File Systems
     • From LCFGng to Quattor (Jeff)
     • Monitoring and ‘disaster recovery’ (Davide)

  15. Team
