JASMIN/CEMS and EMERALD
Scientific Computing Developments at STFC
Peter Oliver, Martin Bly
Scientific Computing Department, Oct 2012
Outline
• STFC
• Compute and Data
• National and International Services
• Summary
STFC sites across the UK and overseas:
• Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire
• UK Astronomy Technology Centre, Edinburgh
• Polaris House, Swindon, Wiltshire
• Rutherford Appleton Laboratory, Harwell Oxford Science and Innovation Campus
• Chilbolton Observatory, Stockbridge, Hampshire
• Joint Astronomy Centre, Hawaii
• Isaac Newton Group of Telescopes, La Palma
What we do
• The nuts and bolts that make it work
• Enable scientists, engineers and researchers to develop world-class science, innovation and skills
SCARF
• Providing resources for STFC facilities, staff and their collaborators
  • ~2700 cores, InfiniBand interconnect, Panasas file system
  • Managed as one entity
  • ~50 peer-reviewed publications/year
• Additional capacity added each year for general use
  • Facilities such as CLF add capacity using their own funds
• National Grid Service partner
  • Local access using MyProxy SSO: users log in with their federal ID and password
  • UK e-Science Certificate access
NSCCS (National Service for Computational Chemistry Software)
• Providing national and international compute, training and support
• EPSRC mid-range service
• SGI Altix UV SMP system: 512 CPUs, 2TB shared memory
  • Large-memory SMP chosen over a traditional cluster as this best suits the computational chemistry applications
• Supports over 100 active users
  • ~70 peer-reviewed papers per year
• Over 40 applications installed
• Authentication using NGS technologies
• Portal to submit jobs
  • Gives access to less computationally aware chemists
Tier-1 Architecture
• >8000 processor cores
• >500 disk servers (10PB)
• Tape robot (10PB)
• >37 dedicated T10000 tape drives (A/B/C)
• Connected to the LHC Optical Private Network (OPN) and SuperJANET5 (SJ5)
• Separate CASTOR instances (CMS, ATLAS, LHCb, GEN) in front of the CPU and storage pools
e-Infrastructure South
• Consortium of UK universities: Oxford, Bristol, Southampton, UCL
  • Formed the Centre for Innovation, with STFC as a partner
• Two new services (£3.7M):
  • IRIDIS – Southampton – x86-64 cluster
  • EMERALD – STFC – GPGPU cluster
• Part of a larger investment in e-infrastructure:
  • A Midland Centre of Excellence (£1M), led by Loughborough University
  • West of Scotland Supercomputing Centre for Academia and Industry (£1.3M), led by the University of Strathclyde
  • E-Infrastructure Interconnectivity (£2.58M), led by the University of Manchester
  • MidPlus: A Centre of Excellence for Computational Science, Engineering and Mathematics (£1.6M), led by the University of Warwick
EMERALD
• Providing resources to the consortium and partners
  • Consortium of UK universities: Oxford, Bristol, Southampton, UCL, STFC
• Largest production GPU facility in the UK
  • 372 NVIDIA Tesla M2090 GPUs
• Scientific applications
  • Still under discussion; computational chemistry front runners: AMBER, NAMD, GROMACS, LAMMPS
  • Eventually hundreds of applications covering all sciences
EMERALD
• 6 racks
EMERALD Hardware I
• 15 x SL6500 chassis, each with 4 x GPU compute nodes
  • Each node: 2 x CPUs and 3 x NVIDIA M2090 GPUs = 8 CPUs & 12 GPUs per chassis; power ~3.9kW
• SL6500 scalable line chassis
  • 4 x 1200W power supplies, 4 fans
  • 4 x 2U, half-width SL390s servers
• SL390s nodes
  • 2 x Intel E5649 (2.53GHz, 6 cores, 80W)
  • 3 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
  • 48GB DDR3 memory
  • 1 x 146GB 15k SAS HDD
  • HP QDR InfiniBand & 10GbE ports
  • Dual 1Gb network ports
EMERALD Hardware II
• 12 x SL6500 chassis, each with 2 x GPU compute nodes
  • Each node: 2 x CPUs and 8 x NVIDIA M2090 GPUs = 4 CPUs & 16 GPUs per chassis; power ~4.6kW
• SL6500 scalable line chassis
  • 4 x 1200W power supplies, 4 fans
  • 2 x 4U, half-width SL390s servers
• SL390s nodes
  • 2 x Intel E5649 (2.53GHz, 6 cores, 80W)
  • 8 x NVIDIA M2090 GPGPUs (512 CUDA cores each)
  • 96GB DDR3 memory
  • 1 x 146GB 15k SAS HDD
  • HP QDR InfiniBand & 10Gb Ethernet
  • Dual 1Gb network ports
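As a quick cross-check of the headline GPU count quoted earlier (arithmetic added here, not on the original slides), the two hardware tranches together give

$$15 \times 4 \times 3 \;+\; 12 \times 2 \times 8 \;=\; 180 + 192 \;=\; 372 \ \text{M2090 GPUs}.$$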
EMERALD
• System software
  • Red Hat Enterprise Linux 6.x
  • Platform LSF
  • CUDA toolkit, SDK and libraries
  • Intel and Portland (PGI) compilers
• Scientific applications
  • Still under discussion; computational chemistry front runners: AMBER, NAMD, GROMACS, LAMMPS
  • Eventually hundreds of applications covering all sciences
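To make the software stack concrete, below is a minimal, illustrative CUDA sketch (not code from EMERALD; all names are hypothetical) of the kind of kernel the installed CUDA toolkit would compile and Platform LSF would then schedule onto the M2090 GPUs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Element-wise vector addition: one GPU thread per array element.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;               // 1M elements
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *h_a = (float*)malloc(bytes);
    float *h_b = (float*)malloc(bytes);
    float *h_c = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device buffers and host-to-device copies
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // 256 threads per block; enough blocks to cover all n elements
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();

    // Copy the result back and spot-check it
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f (expected 3.0)\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

On a Fermi-generation card such as the M2090 this would typically be built with something like `nvcc -arch=sm_20 vecadd.cu`, then submitted to the GPU nodes through an LSF batch job.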
EMERALD
• Managing a GPU cluster
  • GPUs are more power efficient and give more Gflops/Watt than x86_64 servers
  • Reality: true, but each dense 4U chassis works out at ~1.2kW per U of rack space
    • A full rack requires 40+ kW
    • Hard to cool: additional in-row coolers and cold-aisle containment needed
  • Uneven power demand stresses the air conditioning and power infrastructure
    • A 240-GPU job takes the cluster from 31kW idle to 80kW instantly
  • Measured GPU parallel MPI job (HPL) using 368 GPUs: ~1.4 Gflops/W
  • Measured X5675 cluster parallel MPI job (HPL): ~0.5 Gflops/W
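The ~1.2kW/U figure follows from the Hardware II numbers (arithmetic added here, assuming a standard 42U rack):

$$\frac{4.6\ \text{kW}}{4\ \text{U}} \approx 1.15\ \text{kW per U}, \qquad 42 \times 1.15 \approx 48\ \text{kW per rack},$$

which is consistent with the 40+ kW per full rack quoted above.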
JASMIN/CEMS
• CEDA data storage & services
  • Curated data archive
  • Archive management services
  • Archive access services (HTTP, FTP, helpdesk, ...)
• Data-intensive scientific computing
  • Global/regional datasets & models
  • High spatial and temporal resolution
• Private cloud
  • Flexible access to high-volume & complex data for the climate & earth observation communities
• Online workspaces
• Services for sharing & collaboration
JASMIN/CEMS Timeline
• Oct 2011 to 8 Mar 2012: BIS funding, tender, order, build, network complete
• Deadline (or funding gone!): 31 March 2012 for "doing science"
• Government procurement: £5M tender to order in under 4 weeks
• Machine room upgrades and the large cluster competed for time
• Bare floor to operation in 6 weeks
• 6 hours from power-off to 4.6PB of ActiveStor 11 mounted at RAL
• "Doing science" on 14 March
• 3 satellite site installs in parallel (Leeds 100TB, Reading 500TB, ISIC 600TB)
JASMIN/CEMS at RAL
• 12 racks with mixed servers and storage
  • 15kW/rack peak (180kW total)
  • Enclosed cold aisle + in-aisle cooling
  • 600kg/rack (7.2 tonnes total)
• Distributed 10Gb network (1 Terabit/s bandwidth)
• Single 4.5PB global file system
• Two VMware vSphere pools of servers with dedicated image storage
• 6 weeks from bare floor to working 4.6PB
JASMIN/CEMS Infrastructure Configuration
• Storage: 103 Panasas ActiveStor 11 shelves (2,208 x 3TB drives in total)
• Computing: 'cloud' of hundreds of virtual machines hosted on 20 Dell R610 servers
• Networking: 10Gb Gnodal throughout; "lightpath" dedicated links to UK and EU supercomputers
• Physical: 12 racks; enclosed aisle, in-row chillers
• Capacity: 4.6PB usable (6.6PB raw) at RAL, equivalent to 920,000 DVDs (a 1.47km-high tower of DVDs)
• High performance: 1.03 Tb/s total storage bandwidth, equivalent to copying 1500 DVDs per minute
• Single namespace: one single file system, managed as one system
• Status: the largest Panasas system in the world and one of the largest storage deployments in the UK
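The headline bandwidth is consistent with the per-shelf networking described on the Panasas slide below (arithmetic added here):

$$103\ \text{shelves} \times 10\ \text{Gb/s uplink per shelf} = 1030\ \text{Gb/s} \approx 1.03\ \text{Tb/s}.$$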
JASMIN/CEMS Networking
• Gnodal 10Gb networking
  • 160 x 10Gb ports in a 4 x GS4008 switch stack
• Compute
  • 23 Dell servers for VM hosting (VMware vCenter + vCloud) and HPC access to storage
  • 8 Dell servers for compute
  • Dell EqualLogic iSCSI arrays (VM images)
  • All 10Gb connected
• The 10Gb network has already been upgraded with 80 more Gnodal 10Gb ports for compute expansion
What is Panasas Storage?
• "A complete hardware and software storage solution" built from director blades and storage blades
• Ease of management
  • Single management console for 4.6PB
• Performance
  • Parallel access via DirectFlow, NFS, CIFS
  • Fast parallel reconstruction
• ObjectRAID
  • All files stored as objects
  • RAID level per file
  • Vertical, horizontal and network parity
• Distributed parallel file system
  • Parts (objects) of files on every blade
  • All blades transmit/receive in parallel
  • Global namespace
• Battery UPS
  • Enough to shut down cleanly
• 1 x 10Gb uplink per shelf
  • Performance scales with size
PanActive Manager
Panasas in Operation
• Performance
  • Random IO: 400MB/s per host
  • Sequential IO: 1GB/s per host
• External performance
  • 10Gb connected; sustained 6Gb/s
• Reliability
  • 1133 blades, 206 power supplies, 103 shelf network switches: 1442 components
  • Soak testing revealed 27 faults; 7 faults in operation, with no loss of service
  • ~0.6% failure per year, compared to ~5% per year for commodity storage
Infrastructure Solutions: Systems Management
• Backups: system and user data
• SVN: codes and documentation
• Monitoring: Ganglia, Cacti, power management
• Alerting: Nagios
• Security: intrusion detection, patch monitoring
• Deployment: Kickstart, LDAP, inventory database
• VMware: server consolidation, extra resilience
  • 150+ virtual servers supporting all e-Science activities
  • Development cloud
e-Infrastructures
• Lead role in national and international e-infrastructures
• Authentication
  • Lead and develop the UK e-Science Certificate Authority
  • ~30,000 certificates issued in total; ~3,000 current
  • Easy integration with the UK Access Management Federation
• Authorisation
  • Use existing EGI tools
• Accounting
  • Lead and develop EGI APEL accounting
  • 500M records, 400GB of data; ~282 sites publish records
  • ~12GB/day loaded into the main tables
  • Usually 13 months of detail held, but summary data since 2003
• Integrated into existing HPC-style services
e-Infrastructures
• Lead role in national and international e-infrastructures
• User management
  • Lead and develop the NGS UAS service
  • Common portal for project owners
  • Manage project and user allocations
  • Display trends, make decisions (policing)
• Information: what services are available?
  • Lead and develop the EGI information portal, GOCDB
  • 2180 registered GOCDB users belonging to 40 registered NGIs
  • 1073 registered sites hosting a total of 4372 services
  • 12663 downtime entries entered via GOCDB
• Training & support
  • Training Marketplace: a tool developed to promote training opportunities, resources and materials
  • SeIUCCR summer schools: supporting 30 students for a 1-week course (120 applicants)
Summary
• High-performance computing and data
  • SCARF
  • NSCCS
  • JASMIN
  • EMERALD
  • GridPP – Tier-1
• Managing e-infrastructures
  • Authentication, authorisation, accounting
  • Resource discovery
  • User management, help and training
Information
• Website: http://www.stfc.ac.uk/SCD
• Contact: Pete Oliver, peter.oliver at stfc.ac.uk

Questions?