CSC Site Update HP Nordic TIG
April 2008
Janne Ignatius, Marko Myllynen, Dan Still
CSC at a Glance • Founded in 1970 as a technical support unit for the Univac 1108 • Reorganized as a company, CSC - Scientific Computing Ltd., in 1993 • All shares transferred to the Ministry of Education of Finland in 1997 • Operates on a non-profit principle • Facilities in Keilaniemi, Espoo, since March 2005 • MISSION: CSC is the national IT center for science, developing and providing services for universities, research institutes, and industry • VISION: CSC is well known and appreciated in Finland as well as abroad as a pioneer, collaboration partner, and center of competence in the field of IT technology for science
CSC’s Services • FUNET SERVICES • COMPUTING SERVICES • APPLICATION SERVICES • DATA SERVICES FOR SCIENCE AND CULTURE • INFORMATION MANAGEMENT SERVICES
Louhi - Cray XT4 Supercomputer • 1st phase installed 04/2007 • 1012 compute nodes, each with a 2.6 GHz dual-core AMD Opteron processor • High-bandwidth, low-latency interconnect (SeaStar2) • 1-2 GB memory per core • Peak performance 10.6 teraflops • Final configuration (to be installed Q3/2008): core count still open, 1-2 GB memory per core • Peak performance 70+ teraflops
Murska - HP CP4000 BL ProLiant Supercluster • Installed 04/2007, expanded 11/2007 • 544 compute nodes, each with two 2.6 GHz dual-core AMD Opteron processors • 2176 compute cores • 4x DDR InfiniBand interconnect • 5 TB total memory: 256 nodes with 4 GB, 128 with 8 GB, 128 with 16 GB, 32 with 32 GB • 100 TB SFS/Lustre file system • Peak performance 11.3 teraflops
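For reference, the quoted peak figure follows from the core count and clock rate under the common assumption of two double-precision floating-point operations per core per clock cycle (the slide itself does not spell this out):

2176 cores × 2.6 GHz × 2 flops/cycle ≈ 11.3 teraflops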
Murska - HP CP4000 BL ProLiant, cont. • RHEL 4-based HP XC 3.1 cluster operating system • SLURM/LSF • HP-MPI (a minimal usage sketch follows below) • PGI, PathScale, GNU, TotalView, ACML, … • HP Xtools, collectl, mpe2, … • Blade hardware working surprisingly well • Interconnect working nicely • Disk system also working OK after initial issues • MSA20 disk array failure recovery suboptimal • SFS quota still limited to 4 TB • System constantly in heavy use
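As a sketch of the user-facing toolchain listed above, a trivial MPI program of the kind built with HP-MPI and launched through SLURM/LSF on Murska could look as follows; the build and submit commands in the comments are illustrative assumptions, not taken from the slides.

/*
 * Minimal MPI "hello world" for an HP-MPI + SLURM/LSF stack such as Murska's.
 * Assumed (not from the slides) build and submit steps:
 *   mpicc hello.c -o hello              (HP-MPI compiler wrapper)
 *   bsub -n 8 mpirun -srun ./hello      (LSF submission; SLURM launches the ranks)
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}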
Murska - HP CP4000 BL Availability • Three unexpected breaks since the Nov 2007 upgrades • 29.1.2008: SFS hang, fixed with a disk array reset • 30.1.2008: Ethernet switch died (in the cabinet where several power supplies had died a few days earlier) • 12.3.2008: SFS hang, fixed with a disk array reset • System availability since Nov 2007: 95%-100% • System usage since Nov 2007: 30%-100%
Sepeli - HP ProLiant DL145 Cluster • Installed 2005 • 128 (earlier 256) compute nodes • 512 cores and 2 TB memory • 4x DDR InfiniBand / GigE interconnect • 4 TB PVFS2 / NFS disk system • Peak performance 3.1 teraflops • Earlier part of the national M-grid, now being dedicated to LHC use (particle collision data analysis)
Sepeli - HP ProLiant DL145 Cluster, cont. • RHEL 4-based Rocks 3.1 cluster operating system • SGE • Overall system lifespan price/performance quite satisfactory • InfiniBand hardware very stable • Tight SGE integration with multiple MPI flavors is labor-intensive • DL145 iLO initially unreliable, improved over time
Material Sciences National Grid Infrastructure (M-grid) • A joint project of CSC, 7 Finnish universities, and the Helsinki Institute of Physics, funded by the Finnish Academy under the National Research Infrastructure Program in the grid area • Aims to build a homogeneous PC-cluster environment with a theoretical peak of approx. 3 teraflops across about 350 nodes • Environment • Hardware: provided by HP; dual AMD Opteron 1.8-2.2 GHz nodes with 2-8 GB memory, 1-2 TB shared storage, separate 2x GE (communications and NFS), remote administration • OS: NPACI Rocks Cluster Distribution / 64-bit, based on Red Hat Enterprise Linux 3 and 4 • Grid middleware: NorduGrid ARC compiled with Globus 3.2.1 libraries, Sun Grid Engine as LRMS • Centrally managed configuration with Cfengine • CSC administration tasks: maintains the operating system, LRMS, grid middleware, and certain libraries; runs a separate small test cluster for testing new software releases; provides tools for system monitoring, integrity checking, etc.
Some international activities • PRACE • DEISA • EGEE, EGI, NDGF; HPC-EUROPA, …