Status Report on Tier-1 in Korea
Gungwon Kang, Sang-Un Ahn and Hangjin Jang (KISTI GSDC)
Korea Institute of Science and Technology Information, Global Science experiment Data hub Center
OUTLINE
• Computing Resources
• Operations
• Network
• Conclusion
15th CERN-Korea Committee
KISTI GSDC Tier-1 Team: ~9 people
Computing Resource Status
• 2013 pledge (CPU): 25,000 HepSpec06
• Current CPU capacity: 28,055 HepSpec06
• 2,524 job slots available (4 slots reserved for pilot jobs), with hyper-threading enabled
• 2013 pledge (tape storage): 1,500 TB
• Current tape capacity: 1,000 TB; the pledge will be met this year
• 2013 pledge (disk storage): 1,000 TB
• Current disk capacity: 966 TB (1,000 TB allocated, but usable space is slightly lower)
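The pledge figures above can be turned into fulfilment percentages with a few lines of arithmetic. The sketch below uses only the numbers quoted on this slide; the names `RESOURCES`, `pledge_fulfilment` and `report` are illustrative and not part of any KISTI tooling.

```python
# Pledged and currently installed capacity, as quoted on the slide.
# Each entry is (2013 pledge, current capacity).
RESOURCES = {
    "CPU (HS06)": (25_000, 28_055),
    "Tape (TB)": (1_500, 1_000),
    "Disk (TB)": (1_000, 966),
}


def pledge_fulfilment(pledged, current):
    """Return current capacity as a percentage of the pledge."""
    return 100.0 * current / pledged


def report(resources):
    """Build one summary line per resource."""
    lines = []
    for name, (pledged, current) in resources.items():
        pct = pledge_fulfilment(pledged, current)
        status = "met" if current >= pledged else "below pledge"
        lines.append(f"{name}: {current}/{pledged} = {pct:.1f}% ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report(RESOURCES)))
```

With these inputs, CPU comes out above the pledge, while tape and disk are below it (disk only slightly, because of the usable-space overhead noted above).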
Jobs
• Current capacity: 2,524 job slots, 28.1 kHS06
• 84 nodes, 32 (logical) cores per node, 11 HS06/core
• Maintenance issues
  • Worker node migration to 10GbE-equipped nodes
  • Middleware: EMI-3 migration (EMI-2 support ends on 30 April)
• Delivered full pledges for 2013
[Figure: total wall clock hours for ALICE jobs in the last 6 months (Oct 2013 to Apr 2014). Labels: KISTI 3.9% (including Tier-2); 3.58% (2013). Annotations: T1 worker node migration to 10GbE-equipped nodes; ALICE central service maintenance; EMI-3 migration and delivery of full pledges; job levels of ~800, ~1800 and ~2500.]
Site Reliability
KISTI Analysis Facility - KIAF
• Parallel analysis facility based on PROOF
• In operation since 2011, for ALICE use only
• 1 master and 8 worker nodes, with 12 cores and 22 TB of disk per node
• Similar in size and utilization to CAF, the CERN Analysis Facility
Plans for On-call Service
We plan to introduce an on-call service, likely built on three components.
• Alarm system
  • Nagios + e-mail notifications
  • Implementing an SMS plugin + night owl shift run by a private company
  • Tape system: hardware/software malfunctions are reported to IBM and a third-party company
  • 24/7 support, with interventions carried out within one day
  • Ongoing evaluation of monitoring frameworks, e.g. Icinga, Zabbix, etc.
• On-call scheme
  • One-week shift cycle with 5-6 personnel
  • Expecting 1 or 2 calls per cycle: alarms from the batch scheduler and services, WN servicing
  • From the daily monitoring report: detailed action list on services and hardware incidents
• Night owl shift
  • Private company contract: on-site support
  • If necessary, SMS and e-mail notification to off-site on-duty experts
  • The supercomputing division at KISTI has been running a similar system for years
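The one-week shift cycle described above can be sketched as a simple round-robin rota. This is a minimal illustration, not the scheme KISTI actually implemented: the staff names, the start date and the function name `weekly_rota` are all hypothetical placeholders.

```python
from datetime import date, timedelta
from itertools import cycle, islice

# Hypothetical placeholders; the slide only states "5-6 personnel".
STAFF = ["expert-1", "expert-2", "expert-3", "expert-4", "expert-5", "expert-6"]


def weekly_rota(staff, start, weeks):
    """Assign one person per one-week shift, cycling through the staff list.

    Returns a list of (shift_start, shift_end, person) tuples.
    """
    rota = []
    for i, person in enumerate(islice(cycle(staff), weeks)):
        shift_start = start + timedelta(weeks=i)
        shift_end = shift_start + timedelta(days=6)  # inclusive last day
        rota.append((shift_start, shift_end, person))
    return rota


if __name__ == "__main__":
    # Arbitrary start date chosen for illustration only.
    for shift_start, shift_end, person in weekly_rota(STAFF, date(2014, 4, 7), 8):
        print(f"{shift_start} to {shift_end}: {person}")
```

With six people, each expert is on call once every six weeks; at the expected 1 or 2 calls per cycle, that is a light individual load.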
Internal Network
• The internal network for Tier-1 is isolated from the computing centre service network
• Internal network restructuring done in Oct 2013 (3-week shutdown)
• Preparation for upgrading the external network bandwidth to 10 Gbps
• Main switch upgrade: bandwidth up to 2.5 Tbps
• HA configuration of the private network
• Removal of bottlenecks to storage
• Full 20 Gbps configuration (incoming/outgoing)
• Replacing all switches with 10 Gbps models; done for part of the service racks
• 1 Gbps switches remain in place for servers with 1 Gbps cards
• Worker nodes to be upgraded with 10 Gb cards
• Tape service nodes are being connected to the 10 Gbps switches
External Network
• Current bandwidth to CERN: 2 Gbps
• Dedicated link via Daejeon-Chicago-Amsterdam-Geneva
• Roadmap for the 10 Gbps upgrade presented to the WLCG MB and accepted
• Working on upgrading the bandwidth to 10 Gbps
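To give a feel for what the move from 2 Gbps to 10 Gbps means, the sketch below computes the ideal time to move a data set at full line rate. The 100 TB volume is an arbitrary illustrative figure, not one from the slides, and real transfers will be slower because of protocol overhead and shared links.

```python
def transfer_time_seconds(data_terabytes, link_gbps):
    """Ideal time to move data over a link at full line rate (no overhead)."""
    bits = data_terabytes * 1e12 * 8  # decimal terabytes -> bits
    return bits / (link_gbps * 1e9)   # Gbps -> bits per second


if __name__ == "__main__":
    volume_tb = 100  # hypothetical data volume, for illustration only
    for gbps in (2, 10):
        hours = transfer_time_seconds(volume_tb, gbps) / 3600
        print(f"{gbps} Gbps: {hours:.1f} h for {volume_tb} TB")
```

The time scales inversely with bandwidth, so the planned upgrade would cut the ideal transfer time by a factor of five.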
LHC OPN
• KISTI T1 network (134.75.125.0/24) included in LHC OPN
• BGP peering between the Kreonet router at KISTI and the LCG network at CERN
• perfSONAR has been deployed for measuring bandwidth and latency; a firewall policy issue persists for ports below 1024, e.g. 80 (http), 443 (https), 843 (bwctl)
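The prefix and port details above can be checked mechanically with Python's standard `ipaddress` module. This is a small sketch for illustration: the prefix and port numbers are copied from the slide as listed, while the helper names and the sample host addresses are hypothetical.

```python
import ipaddress

# Prefix announced into LHC OPN, as given on the slide.
T1_PREFIX = ipaddress.ip_network("134.75.125.0/24")

# Ports named on the slide as affected by the firewall policy issue.
PERFSONAR_PORTS = {"http": 80, "https": 443, "bwctl": 843}


def in_t1_prefix(address):
    """Return True if the address belongs to the KISTI T1 prefix."""
    return ipaddress.ip_address(address) in T1_PREFIX


def privileged_ports(ports):
    """Return the subset of ports below 1024."""
    return {name: port for name, port in ports.items() if port < 1024}


if __name__ == "__main__":
    # Sample addresses are made up for the example.
    for host in ("134.75.125.10", "134.75.126.1"):
        print(host, "in T1 prefix:", in_t1_prefix(host))
    print("Ports below 1024:", privileged_ports(PERFSONAR_PORTS))
```

A /24 prefix covers 256 addresses, and all three listed ports fall in the below-1024 range that the firewall policy restricts.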
Conclusion
• KISTI T1 was approved as a full T1 at the WLCG Overview Board meeting in Nov. 2013
• The progress in ramping up its capability as a T1 was appreciated by the ALICE community, and the roadmap to a 10G network was accepted
• In January, KISTI T1 joined LHC OPN
• Over the last 6 months, KISTI T1 has been “shape-shifting” in terms of its network
  • Core switches replaced (bandwidth: 0.9 Tbps → 2.5 Tbps)
  • Rack switches replaced (bandwidth: 1 Gbps → 10 Gbps)
  • Servers migrated to 10GbE-equipped ones