Building a Regional Centre
A few ideas & a personal view
CHEP 2000 – Padova, 10 February 2000
Les Robertson, CERN/IT
Summary
• LHC regional computing centre topology
• Some capacity and performance parameters
• From components to computing fabrics
• Remarks about regional centres
• Policies & sociology
• Conclusions
Why Regional Centres?
• Bring computing facilities closer to home
  – final analysis on a compact cluster in the physics department
• Exploit established computing expertise & infrastructure
• Reduce dependence on links to CERN
  – full ESD available nearby, through a fat, fast, reliable network link
• Tap funding sources not otherwise available to HEP
• Devolve control over resource allocation
  – national interests? regional interests? at the expense of physics interests?
The MONARC RC Topology
[Diagram: CERN (Tier 0) linked to Tier 1 centres (IN2P3, RAL, FNAL); Tier 1 linked to Tier 2 centres (Lab a, Uni b, Lab c) and universities (Uni n); departments and desktops below, with link speeds of 155 Mbps, 622 Mbps and 2.5 Gbps between the tiers]

Tier 0 – CERN
• Data recording, reconstruction, 20% analysis
• Full data sets on permanent mass storage – raw, ESD, simulated data
• Hefty WAN capability
• Range of export/import media
• 24 x 7 availability

Tier 1 – established data centre or new facility hosted by a lab
• Major subset of data – all/most of the ESD, selected raw data
• Mass storage, managed data operation
• ESD analysis, AOD generation, major analysis capacity
• Fat pipe to CERN
• High availability
• User consultancy – Library & Collaboration Software support

Tier 2 – smaller labs, smaller countries, probably hosted by an existing data centre
• Mainly AOD analysis
• Data cached from Tier 1, Tier 0 centres
• No mass storage management
• Minimal staffing costs

University physics department
• Final analysis
• Dedicated to local users
• Limited data capacity – cached only via the network
• Zero administration costs (fully automated)

MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
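As a concrete reading of the tier roles above, here is a minimal Python sketch that models the hierarchy as plain data. The class and field names, and the "tier 3" label for the department level, are illustrative assumptions of mine; the centre names, roles and link speeds simply echo the diagram and bullets above.

```python
# Minimal sketch of the MONARC regional-centre hierarchy as plain data.
# Class/field names are illustrative; centres and link speeds follow the slide.
from dataclasses import dataclass, field

@dataclass
class Centre:
    name: str
    tier: int
    roles: list                       # what the centre is expected to do
    uplink_mbps: int = 0              # link towards the tier above
    children: list = field(default_factory=list)

cern = Centre("CERN", 0,
              ["data recording", "reconstruction", "20% analysis",
               "full data sets on mass storage"])

for t1_name in ("IN2P3", "RAL", "FNAL"):
    t1 = Centre(t1_name, 1,
                ["ESD analysis", "AOD generation", "managed mass storage"],
                uplink_mbps=2500)      # "fat pipe" to CERN
    cern.children.append(t1)

# One example Tier 2 centre and one department hung off the first Tier 1
cern.children[0].children.append(
    Centre("Lab a", 2, ["AOD analysis", "data cached from Tier 1/0"],
           uplink_mbps=622))
cern.children[0].children[0].children.append(
    Centre("Uni n physics dept", 3,    # department level, below Tier 2
           ["final analysis", "cached data only"], uplink_mbps=155))

def walk(centre, depth=0):
    """Print the hierarchy top-down with each centre's roles."""
    print("  " * depth + f"Tier {centre.tier} {centre.name}: "
          + ", ".join(centre.roles))
    for child in centre.children:
        walk(child, depth + 1)

walk(cern)
```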
More realistically – a Grid Topology
[Diagram: the same centres as in the MONARC picture – CERN (Tier 0), Tier 1 centres (IN2P3, RAL, FNAL, plus DHL), Tier 2 centres (Lab a, Uni b, Lab c), universities (Uni n), departments and desktops – interconnected as a grid rather than a strict hierarchy, with 155 Mbps to 2.5 Gbps links]
Capacity / Performance
[Table of estimated LHC capacity and performance parameters; annotations for scale: "all CERN today: ~15K SI95, ~25 TB, ~100 MB/sec" and "20% of CERN"]
** 1 SPECint95 = 10 CERN units = 40 MIPS
Capacity / Performance (annotations on the table)
• Approx. number of farm PCs at CERN today
• May not find disks as small as that! But we need a high disk count for access, performance, RAID/mirroring, etc. We will probably have to buy more disks, larger disks, and use the disks that come with the PCs – much more disk space
• Effective throughput of the LAN backbone
• 1.5% of the LAN
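As a back-of-the-envelope check on these numbers, here is a small Python sketch converting between the units in the footnote and estimating how many farm PCs ~15K SI95 corresponds to. The per-box rating of ~25 SI95 for a dual-CPU PC is an assumption taken from the year-2000 building block on a later slide, not a figure from this table.

```python
# Back-of-the-envelope unit conversions and farm sizing (assumptions noted).
SI95_TO_CERN_UNITS = 10      # 1 SPECint95 = 10 CERN units (slide footnote)
SI95_TO_MIPS = 40            # 1 SPECint95 = 40 MIPS       (slide footnote)

cern_today_si95 = 15_000     # "all CERN today ~15K SI95"
si95_per_dual_pc = 25        # assumed: ~900 SI95 / 36 dual-CPU boxes (2000 rack)

print(f"CERN today ~ {cern_today_si95 * SI95_TO_CERN_UNITS:,} CERN units "
      f"~ {cern_today_si95 * SI95_TO_MIPS:,} MIPS")
print(f"~ {cern_today_si95 / si95_per_dual_pc:.0f} dual-CPU farm PCs")
```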
Building a Regional Centre
Commodity components are just fine for HEP:
• Masses of experience with inexpensive farms
• LAN technology is going the right way
  – Inexpensive, high-performance PC attachments
  – Compatible with hefty backbone switches
• Good ideas for improving automated operation and management
Evolution of today's analysis farms
Computing & storage fabric built up from commodity components:
• Simple PCs
• Inexpensive network-attached disk
• Standard network interface (whatever Ethernet happens to be in 2006)
with a minimum of high(er)-end components:
• LAN backbone
• WAN connection
The same fabric, three ways:
• Standard components
• HEP's not special, just more cost conscious
• Limit the role of high-end equipment
Components – building blocks
2000 – standard office equipment:
• 36 dual CPUs ~ 900 SI95
• 120 x 72 GB disks ~ 9 TB
2005 – standard, cost-optimised, Internet warehouse equipment:
• 36 dual 200 SI95 CPUs = 14K SI95, ~ $100K
• 224 x 3.5" disks, 25-100 TB, $50K - $200K
For capacity & cost estimates see the 1999 Pasta Report:
http://nicewww.cern.ch/~les/pasta/welcome.html
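A small Python sketch of the arithmetic behind the 2005 building blocks, treating each block as one rack's worth of equipment (an assumption consistent with the rack counts on the Tier 1 slide later); the prices are just the ranges quoted above.

```python
# Arithmetic behind the 2005 building blocks (per-rack reading is an assumption).
cpus_per_rack = 36 * 2          # 36 dual-CPU boxes per rack
si95_per_cpu = 200              # projected 2005 CPU rating
rack_si95 = cpus_per_rack * si95_per_cpu
print(f"CPU rack: {rack_si95:,} SI95 (~$100K)")          # 14,400 ~ 14K SI95

disks_per_rack = 224            # 3.5-inch disks per rack
gb_low = 25_000 / disks_per_rack                          # 25 TB total
gb_high = 100_000 / disks_per_rack                        # 100 TB total
print(f"Disk rack: 25-100 TB, i.e. {gb_low:.0f}-{gb_high:.0f} GB per disk "
      "($50K-$200K)")
```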
The Physics Department System
• Two 19" racks & $200K
• CPU – 14K SI95 (10% of a Tier 1 centre)
• Disk – 50 TB (50% of a Tier 1 centre)
• A rather comfortable analysis machine
• Small Regional Centres are not going to be competitive
• Need to rethink the storage capacity at the Tier 1 centres
Tier 1, Tier 2 RCs, CERN
A few general remarks:
• A major motivation for the RCs is that we are hard pressed to finance the scale of computing needed for LHC
• We need to start now to work together towards minimising costs
• Standardisation among experiments, regional centres and CERN, so that we can use the same tools and practices to …
• Automate everything
  – Operation & monitoring
  – Disk & data management
  – Work scheduling
  – Data export/import (prefer the network to mail)
… in order to …
• Minimise operation & staffing
  – Trade off mass storage for disk + network bandwidth
  – Acquire contingency capacity rather than fighting bottlenecks
  – Outsource what you can (at a sensible price)
  – …
Keep it simple – Work together
The middleware
The issues are:
• Integration of this amorphous collection of Regional Centres
  – Data
  – Workload
  – Network performance
• Application monitoring
• Quality of the data analysis service
Leverage the "Grid" developments:
• Extending meta-computing to mass-computing
• Emphasis on data management & caching
• … and production reliability & quality
Keep it simple – Work together
A 2-experiment Tier 1 Centre
Requirement: 240K SI95, 220 TB
Basic equipment ~ $3M for CPUs/disks:
• Processors: 20 "standard" racks = 1,440 CPUs, 280K SI95
• Disks: 12 "standard" racks = 2,688 disks, 300 TB (with low-capacity disks)
[Floor-plan sketch: ~200 m2 for CPU/disk racks, network, and tape/DVD equipment]
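A quick Python check that the quoted rack counts meet the stated requirement, using the per-rack building blocks from the earlier slide; the per-disk capacity needed to reach 300 TB is derived here as an illustration, not a figure quoted in the talk.

```python
# Sanity check of the 2-experiment Tier 1 sizing against the per-rack blocks.
req_si95, req_tb = 240_000, 220

cpu_racks, si95_per_rack = 20, 36 * 2 * 200       # 14.4K SI95 per rack (2005 block)
disk_racks, disks_per_rack = 12, 224

total_si95 = cpu_racks * si95_per_rack            # 288,000 ~ 280K SI95
total_disks = disk_racks * disks_per_rack         # 2,688 disks
gb_per_disk_for_300tb = 300_000 / total_disks     # ~112 GB: "low capacity disks"

print(f"CPU: {total_si95:,} SI95 installed vs {req_si95:,} required")
print(f"Disk: {total_disks:,} disks; ~{gb_per_disk_for_300tb:.0f} GB each "
      f"gives ~300 TB vs {req_tb} TB required")
```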
The full costs?
• Space
• Power, cooling
• Software
• LAN
• Replacement/expansion – 30% per year
• Mass storage
• People
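Here is a rough Python sketch of what a "30% per year" replacement/expansion line can mean for the equipment budget; the initial cost and the interpretation (re-buying 30% of the installed base each year at flat prices) are illustrative assumptions, not figures from the talk.

```python
# Illustrative rolling replacement/expansion budget: re-buy 30% of the installed
# equipment value each year. Initial cost and flat-price assumption are mine.
initial_equipment_cost = 3.0        # $M, basic CPUs/disks of the 2-experiment Tier 1
replacement_rate = 0.30             # "30% per year" from the slide
years = 5

for year in range(1, years + 1):
    yearly_spend = replacement_rate * initial_equipment_cost
    print(f"Year {year}: ~${yearly_spend:.1f}M on replacement/expansion")

total = years * replacement_rate * initial_equipment_cost
print(f"Over {years} years: ~${total:.1f}M, "
      f"i.e. {total / initial_equipment_cost:.1f}x the initial purchase")
```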
Mass storage?
Do all Tier 1 centres really need a full mass storage operation?
• Tapes, robots, storage management software?
• Need support for export/import media
• But think hard before getting into mass storage
Rather:
• More disks, bigger disks, mirrored disks
• Cache data across the network from another centre (that is willing to tolerate the stresses of mass storage management)
Mass storage is person-power intensive – long-term costs
Consider outsourcing
• Massive growth in co-location centres, ISP warehouses, ASPs, storage renters, etc.
  – Level 3, Intel, Hot Office, Network Storage Inc, PSI, …
• There will probably be one near you
• Check it out – compare costs & prices
• Maybe personnel savings can be made
Policies & sociology
Access policy?
• Collaboration-wide? Or restricted access (regional, national, …)?
• A rich source of unnecessary complexity
Data distribution policies
Analysis models
• MONARC work will help to plan the centres
• But the real analysis models will evolve when the data arrives
Keep everything flexible – simple architecture, simple policies, minimal politics
Concluding remarks I
• Lots of experience with farms of inexpensive components
• We need to scale them up – lots of work, but we think we understand it
• But we have to learn how to integrate distributed farms into a coherent analysis facility
  – Leverage other developments
  – But we need to learn through practice and experience
• Retain a healthy scepticism for scalability theories
  – Check it all out on a realistically sized testbed
Concluding remarks II
• Don't get hung up on optimising component costs
• Do be very careful with head-count
  – Personnel costs will probably dominate
• Define clear objectives for the centre – efficiency, capacity, quality
• Think hard about whether you really need mass storage
• Discourage empires & egos
• Encourage collaboration & outsourcing
• In fact – maybe we can just buy all this as an Internet service