Prague Site Report
Jiří Chudoba, Institute of Physics, Prague
23.4.2012, HEPiX meeting, Prague
Local Organization
• Institute of Physics:
  • 2 locations in Prague, 1 in Olomouc
  • 786 employees (281 researchers + 78 doctoral students)
• Department of Networking and Computing Techniques (SAVT)
  • networking up to the offices, mail and web servers, central services
• Computing Centre (CC)
  • large-scale calculations
  • part of SAVT (except for its leader, Jiří Chudoba)
• Division of Elementary Particle Physics
  • Department of Detector Development and Data Processing
    • head: Miloš Lokajíček
    • started the large-scale calculations, later transferred to the CC
    • the biggest hardware contributor (LHC computing)
    • participates in the CC operation
Server room I
• Server room I (Na Slovance)
  • 62 m², ~20 racks
  • 350 kVA motor generator, 200 + 2× 100 kVA UPS, 108 kW air cooling, 176 kW water cooling
  • continuous changes
  • hosts computing servers and central services
Other server rooms
• New server room for SAVT
  • located next to server room I
  • independent UPS (24 kW now, max 64 kW, n+1), motor generator (96 kW), cooling 25 kW (n+1)
  • dedicated to central services
  • 16 m², now 4 racks (room for 6)
  • very high reliability required
  • first servers moved in last week
• Server room Cukrovarnicka
  • in another building in Prague
  • 14 m², 3 racks (max 5), 20 kW central UPS, 2× 8 kW cooling
  • backup servers and services
• Server room UTIA
  • 3 racks, 7 kW cooling, 3 + 5× 1.5 kW UPS
  • dedicated to the Department of Condensed Matter Theory
Clusters in CC - Dorje
• Dorje: Altix ICE 8200, 1.5 racks
  • 512 cores on 64 diskless worker nodes, InfiniBand, 2 disk arrays (6 + 14 TB)
  • local users only: solid state physics, condensed matter theory
  • 1 administrator for both administration and user support
  • relatively small number of jobs; MPI jobs up to 256 processes
  • Torque + Maui, SLES 10 SP2, SGI Tempo, MKL, OpenMPI, ifort
  • users run mostly Wien2k, VASP, Fireball, apls
Cluster LUNA
• 2 SunFire X4600 servers
  • 8 CPUs, 32 cores, 256 GB RAM
• 4 SunFire V20z and V40z servers
• operated by CESNET Metacentrum, the distributed computing activity of NGI_CZ
• Metacentrum:
  • 9 locations
  • 3500 cores
  • 300 TB
Cluster Thsun, small group servers
• Thsun
  • a "private" cluster
  • small number of users
  • power users with root privileges
  • 12 servers of varying hardware
• servers for groups
  • managed by the groups in collaboration with the CC
Cluster Golias
• upgraded every year; several subclusters, each of identical hardware
• 3812 cores, 30700 HS06
• almost 2 PB of disk space
• the newest subcluster (March 2012), rubus (figures cross-checked in the sketch below):
  • 23 nodes SGI Rackable C1001-G13
  • 2× Opteron 6274 (16 cores each), 64 GB RAM, 2× 300 GB SAS
  • 374 W (full load)
  • 232 HS06 per node, 5343 HS06 total
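As a quick cross-check of the rubus figures above, a minimal sketch in Python; every input is copied from this slide, nothing here is measured:

```python
# Cross-check of the rubus subcluster figures quoted above.
# All inputs are the slide's own numbers, not measurements.

nodes = 23                  # SGI Rackable C1001-G13 nodes
cores_per_node = 2 * 16     # 2x Opteron 6274, 16 cores each
hs06_per_node = 5343 / 23   # ~232 HS06 per node, per the slide
watts_per_node = 374        # power draw at full load

print(f"total cores: {nodes * cores_per_node}")              # 736
print(f"total HS06:  {nodes * hs06_per_node:.0f}")           # 5343
print(f"HS06 per W:  {hs06_per_node / watts_per_node:.2f}")  # ~0.62
```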
Golias shares
[Charts: planned vs. real usage (walltime); subclusters' contributions to the total performance]
WLCG Tier2
• cluster Golias @ FZU + xrootd servers @ Rez
• 2012 pledges:
  • ATLAS: 10000 HS06, 1030 TiB pledged; 11861 HS06 and 1300 TB available
  • ALICE: 5000 HS06, 420 TiB pledged; 7564 HS06 and 540 TB available
  • delivery of almost 600 TB delayed due to floods
• 66% efficiency is assumed for WLCG accounting
  • sometimes we fall under 100% of the pledges
• low cputime/walltime ratio for ALICE
  • not only on our site
  • tests with limits on the number of concurrent jobs (last week); the ratio is computed as in the sketch below:
    • "no limit" (about 900 jobs): 45%
    • limit of 600 jobs: 54%
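The percentages above are cputime/walltime ratios summed over jobs. A minimal sketch of that computation in Python; the job records are hypothetical placeholders, real values come from the batch accounting:

```python
# CPU efficiency as quoted above: total cputime / total walltime.
# The job records below are hypothetical, not from our accounting.

def cpu_efficiency(jobs):
    """jobs: iterable of (cputime_s, walltime_s) pairs."""
    cpu = sum(c for c, _ in jobs)
    wall = sum(w for _, w in jobs)
    return cpu / wall if wall else 0.0

# I/O-bound analysis jobs spend much of their walltime waiting on data:
jobs = [(3600, 8000), (5400, 12000), (7200, 16000)]
print(f"efficiency: {cpu_efficiency(jobs):.0%}")  # 45%
```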
Utilization
• very high average utilization
• several different projects, different tools for production:
  • D0: production submitted locally by 1 user
  • ATLAS: PanDA, Ganga, local users; DPM
  • ALICE: VO box; xrootd
[Chart: utilization per experiment (D0, ALICE, ATLAS)]
Networking
• CESNET upgraded our main Cisco router
  • 6506 -> 6509
  • supervisor: SUP720 -> SUP2T
  • new 8× 10G X2 card
  • planned upgrade of power supplies: 2× 3 kW -> 2× 6 kW
  • (2 cards with 48× 1 Gbps, 1 card with 4× 10 Gbps, FW service module)
External connection
• exclusive: 1 Gbps (to FZK) + 10 Gbps (CESNET)
• shared: 10 Gbps (PASNET – GEANT)
[Charts: PASNET link; FZK -> FZU and FZU -> FZK transfer rates]
• not enough for the ATLAS T2D limit (5 MB/s to/from T1s); see the check sketched below
• perfSONAR installed
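The T2D requirement quoted above is an average transfer rate of at least 5 MB/s to and from the Tier-1s. A minimal sketch of the check in Python; the site list and rates are hypothetical placeholders, not perfSONAR output:

```python
# Check measured average rates against the ATLAS T2D threshold (5 MB/s).
# The rates below are hypothetical placeholders, not real monitoring data.

T2D_LIMIT_MBS = 5.0  # MB/s, required per direction

rates_mbs = {
    ("FZK", "to"):   7.2,  # hypothetical value
    ("FZK", "from"): 4.1,  # hypothetical value
}

for (tier1, direction), rate in rates_mbs.items():
    status = "OK" if rate >= T2D_LIMIT_MBS else "BELOW LIMIT"
    print(f"{tier1} ({direction}): {rate:.1f} MB/s -> {status}")
```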
Miscellaneous items
• Torque server performance
  • with many jobs, sometimes long response times
  • divide Golias into 2 clusters with 2 Torque instances?
  • memory limits for the ATLAS and ALICE queues
• CVMFS
  • used by ATLAS, works well
  • some older nodes have too small disks -> excluded for ATLAS
• management
  • Cfengine v2 used for production
  • Puppet used for the IPv6 testbed
• 2 new 64-core nodes
  • SGI Rackable H2106-G7, 128 GB RAM, 4× Opteron 6274 2.2 GHz, 446 HS06
  • frequent crashes when loaded with jobs
• another 2 servers with Intel Sandy Bridge CPUs expected
  • small subclusters with different hardware
Water cooling
• active vs. passive cooling doors
• 1 new rack with cooling doors
• 2 new cooling doors on APC racks
Water cooling
• good sealing is crucial
[Photos/temperature charts: worker nodes and diskservers with cooling on/off (divider added); rubus01; diskservers]
Distributed Tier2, Tier3s
• networking infrastructure (provided by CESNET) connects all Prague institutions involved:
  • Academy of Sciences of the Czech Republic
    • Institute of Physics (FZU, Tier-2)
    • Nuclear Physics Institute
  • Charles University in Prague
    • Faculty of Mathematics and Physics
  • Czech Technical University in Prague
    • Faculty of Nuclear Sciences and Physical Engineering
    • Institute of Experimental and Applied Physics
• now only NPI hosts resources visible in the Grid
  • many reasons why the others do not: manpower, suitable rooms, lack of IPv4 addresses
• Data Storage group at CESNET
  • deployment for the LHC projects discussed
Thanks to my colleagues for their help with the preparation of these slides:
• Marek Eliáš
• Lukáš Fiala
• Jiří Horký
• Tomáš Hrubý
• Tomáš Kouba
• Jan Kundrát
• Miloš Lokajíček
• Petr Roupec
• Jana Uhlířová
• Ota Velínský