Computing at Norwegian Meteorological Institute
Roar Skålin, Director of Information Technology, Norwegian Meteorological Institute
roar.skalin@met.no
CAS 2003 – Annecy, 10.09.2003
Norwegian Meteorological Institute
• Main office in Oslo
• Regional offices in Bergen and Tromsø
• Aviation offices at four military airports and Spitsbergen
• Three arctic stations: Jan Mayen, Bear Island and Hopen
• 430 employees (1.1.2004)
met.no Computing Infrastructure
[Diagram; sites shown: NTNU - Trondheim and met.no - Oslo]
• Servers shown: Production Dell 4/8 Linux; Production Dell 2/4 Linux; SGI O3800 512/512/7.2; SGI O3800 384/304/7.2; Backup Server SGI O200/2; Storage Server SGI O2000/2; Climate Storage 2/8/20; Scali Cluster 20/5/0.3; XX Cluster y/y/y
• Storage shown: CXFS; NetApp 790 GB; STA 5 TB; DLT 20 TB (two instances); S-AIT 33 TB
• Network shown: switches and routers; links of 1 GBit/s, 2.5 GBit/s, 100 MBit/s and 155 MBit/s
• Legend: x/y/z = processors/GB memory/TB disk
met.no Local Production Servers
Production Environment November 2003:
• Dell PowerEdge servers with two and four CPUs
• NetApp NAS
• Linux
• ECMWF Supervisor Monitor Scheduler (SMS)
• Perl, shell, Fortran, C++, XML, MySQL, PostgreSQL
• Cfengine
Linux replaces proprietary UNIX at met.no
Advantages:
• Off-the-shelf hardware replaces proprietary hardware
• Reduced cost of new servers
• Reduced operational costs
• Overall increased stability
• Easier to fix OS problems
• Changing hardware vendor becomes feasible
• met.no becomes an attractive IT employer with highly motivated employees
Disadvantages:
• Cost of porting software
• High degree of freedom: a Linux distribution can become as many different systems as there are users
Data storage: A critical resource
• We may lose N-1 production servers and still be up and running, but data must be available everywhere, all the time
• We used to duplicate data files, but increased use of databases reduces the value of this strategy
• met.no replaced a SAN with a NetApp NAS because of:
  • availability
  • Linux support
  • "sufficient" I/O bandwidth (40-50 MB/s per server)
Performance available to met.no
[Chart; systems shown: CRAY X-MP, CRAY Y-MP, CRAY T3E, SGI O3000]
met.no Production Compute Servers
SGI Origin 3800
Embla:
• 512 MIPS R14K processors
• 614 Gflops peak
• 512 GB memory
Gridur:
• 384 MIPS R14K processors
• 384 Gflops peak
• 304 GB memory
IRIX OS / LSF batch system
7.2 TB CXFS filesystem
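The peak and memory figures for the two machines can be put on a per-processor footing with a quick calculation. The sketch below uses only the numbers on this slide; the dictionary layout and the function name `per_processor` are illustrative choices, not anything from met.no.

```python
# Figures taken from the slide; the data layout is only for illustration.
systems = {
    "Embla": {"processors": 512, "peak_gflops": 614, "memory_gb": 512},
    "Gridur": {"processors": 384, "peak_gflops": 384, "memory_gb": 304},
}


def per_processor(system):
    """Return (peak Gflops per processor, GB memory per processor)."""
    p = system["processors"]
    return system["peak_gflops"] / p, system["memory_gb"] / p


for name, spec in systems.items():
    gflops, mem = per_processor(spec)
    print(f"{name}: {gflops:.2f} Gflops/processor, {mem:.2f} GB/processor")

# Combined capacity of the two machines.
total_processors = sum(s["processors"] for s in systems.values())
total_peak = sum(s["peak_gflops"] for s in systems.values())
print(f"Combined: {total_processors} processors, {total_peak} Gflops peak")
```

The result is roughly 1.2 Gflops per processor on Embla and 1.0 Gflops per processor on Gridur, so the two machines are not identical per processor even though both are listed as R14K systems.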
Production timeline
[Timeline chart; model runs shown: HIRLAM20, HIRLAM10, HIRLAM5, UM, MM5, ECOM3D/WAVE, ECOM3D, MIPOM22]
HIRLAM scales, or …?
• The forecast model without I/O and support programs scales reasonably well up to 512 processors on an SGI O3800
• In real life:
  • data transfer, support programs and I/O scale poorly
  • there are other users of the system
  • machine-dependent modifications to increase scaling have a high maintenance cost for a shared code such as HIRLAM
• For cost-efficient operational use, 256 processors seems to be the limit
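One common way to reason about why poorly scaling parts limit the useful processor count is Amdahl's law. The sketch below is a generic illustration, not a measurement of HIRLAM: the serial fraction `SERIAL_FRACTION` is a hypothetical value chosen only to show the shape of the curve.

```python
def amdahl_speedup(processors, serial_fraction):
    """Speedup predicted by Amdahl's law for a given non-parallel fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def parallel_efficiency(processors, serial_fraction):
    """Speedup divided by processor count (1.0 would be perfect scaling)."""
    return amdahl_speedup(processors, serial_fraction) / processors


# Hypothetical: 0.5 % of the run (I/O, data transfer, support programs)
# does not speed up with more processors. Not a measured HIRLAM value.
SERIAL_FRACTION = 0.005

for p in (64, 128, 256, 512):
    s = amdahl_speedup(p, SERIAL_FRACTION)
    e = parallel_efficiency(p, SERIAL_FRACTION)
    print(f"{p:4d} processors: speedup {s:6.1f}, efficiency {e:.2f}")
```

Even with such a small non-scaling share, efficiency falls steadily as processors are added, which is in line with the slide's point that doubling from 256 to 512 processors is hard to justify on cost.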
How to utilise 896 processors operationally?
• Split into two systems of 512 and 384 processors, used as primary and backup system
• Will test a system that overlaps model runs based on their dependencies
[Dependency diagram; models shown: HIRLAM 10, HIRLAM 5, HIRLAM 20, ECOM3D, WAVE]
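Overlapping runs based on dependencies can be sketched as a small scheduling problem: each job starts as soon as the jobs it depends on have finished. The dependency edges and run times below are hypothetical, since the slide's diagram does not survive in this text, and the sketch assumes there are always enough free processors for jobs to run side by side.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: job -> set of jobs it must wait for.
DEPENDS_ON = {
    "HIRLAM20": set(),
    "HIRLAM10": {"HIRLAM20"},
    "HIRLAM5": {"HIRLAM10"},
    "WAVE": {"HIRLAM20"},
    "ECOM3D": {"HIRLAM20"},
}
# Hypothetical run times in minutes.
MINUTES = {"HIRLAM20": 30, "HIRLAM10": 40, "HIRLAM5": 20, "WAVE": 15, "ECOM3D": 25}


def earliest_schedule(depends_on, minutes):
    """Return {job: (start, end)} with every job started as early as possible."""
    schedule = {}
    # static_order() yields each job only after all of its prerequisites.
    for job in TopologicalSorter(depends_on).static_order():
        start = max((schedule[dep][1] for dep in depends_on[job]), default=0)
        schedule[job] = (start, start + minutes[job])
    return schedule


schedule = earliest_schedule(DEPENDS_ON, MINUTES)
overlapped = max(end for _, end in schedule.values())
sequential = sum(MINUTES.values())

for job, (start, end) in sorted(schedule.items(), key=lambda item: item[1]):
    print(f"{job:9s} start {start:3d}  end {end:3d}")
print(f"Overlapped: {overlapped} min, one after another: {sequential} min")
```

A production scheduler would also have to respect processor limits and other users of the machine; this sketch only shows how dependencies alone determine the earliest possible start times.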
Overlapping Production Timeline
[Timeline chart; model runs shown: HIRLAM20, HIRLAM10, HIRLAM5, UM, MM5, ECOM3D/WAVE, ECOM3D, MIPOM22]
RegClim: Regional Climate Development Under Global Warming
Overall aim:
• Produce scenarios for regional climate change suitable for impact assessment
• Quantify uncertainties
Some keywords:
• Involves personnel from met.no, universities and research organisations
• Based on global climate scenarios
• Dynamical and empirical downscaling
• Regional and global coupled models
• Atmosphere – ocean – sea-ice
Climate Computing Infrastructure
[Diagram; sites shown: NTNU - Trondheim and Para//ab - Bergen]
• Servers shown: SGI O3000 512/512/7.2; SGI O3000 384/304/7.2; IBM Cluster 64/64/0.58; IBM p690 Regatta 96/320/7; Climate Storage 2/8/20
• Storage shown: CXFS; IBM 3584 12 TB; S-AIT 33 TB
• Network shown: routers; links of 2.5 GBit/s and 155 MBit/s
• Legend: x/y/z = processors/GB memory/TB disk
Climate Storage Server
Low-cost solution:
• Linux server
• Brocade switch
• Nexsan AtaBoy/AtaBeast RAID, 19.7 TB
• 34 TB Super-AIT library, tape capacity 0.5 TB uncompressed
GRID in Norway
• Testgrid comprising experimental computers at the four universities
• Globus 2.4 -> 3.0
• Two experimental portals: Bioinformatics and Gaussian
• Testing of large datasets (Storage Resource Broker) and metascheduling planned for autumn 2003
• Production supercomputers planned to be phased in during 2004