Building a Regional Centre. A few ideas & a personal view CHEP 2000 – Padova 10 February 2000 Les Robertson CERN/IT. Summary. LHC regional computing centre topology Some capacity and performance parameters From components to computing fabrics Remarks about regional centres
Building a Regional Centre A few ideas & a personal view CHEP 2000 – Padova 10 February 2000 Les Robertson CERN/IT
Summary • LHC regional computing centre topology • Some capacity and performance parameters • From components to computing fabrics • Remarks about regional centres • Policies & sociology • Conclusions
Why Regional Centres? • Bring computing facilities closer to home • final analysis on a compact cluster in the physics department • Exploit established computing expertise & infrastructure • Reduce dependence on links to CERN • full ESD available nearby - through a fat, fast, reliable network link • Tap funding sources not otherwise available to HEP • Devolve control over resource allocation • national interests? • regional interests? • at the expense of physics interests?
The MONARC RC Topology
[Diagram labels: CERN – Tier 0; Tier 1: IN2P3, RAL, FNAL; Tier 2: Uni n, Lab a, Uni b, Lab c; Department; Desktop. Link speeds shown: 2.5 Gbps, 622 Mbps, 155 Mbps.]
Tier 0 – CERN
• Data recording, reconstruction, 20% analysis
• Full data sets on permanent mass storage – raw, ESD, simulated data
• Hefty WAN capability
• Range of export-import media
• 24 x 7 availability
Tier 1 – established data centre or new facility hosted by a lab
• Major subset of data – all/most of the ESD, selected raw data
• Mass storage, managed data operation
• ESD analysis, AOD generation, major analysis capacity
• Fat pipe to CERN
• High availability
• User consultancy – Library & Collaboration Software support
Tier 2 – smaller labs, smaller countries, probably hosted by existing data centre
• Mainly AOD analysis
• Data cached from Tier 1, Tier 0 centres
• No mass storage management
• Minimal staffing costs
University physics department
• Final analysis
• Dedicated to local users
• Limited data capacity – cached only via the network
• Zero administration costs (fully automated)
MONARC report: http://home.cern.ch/~barone/monarc/RCArchitecture.html
More realistically - a Grid Topology
[Diagram labels: CERN – Tier 0; IN2P3, DHL, RAL, FNAL; Tier 1; Uni n, Lab a; Tier 2; Uni b, Lab c; Department; Desktop. Link speeds shown: 2.5 Gbps, 622 Mbps, 155 Mbps.]
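To get a feel for what the link speeds on the topology slides mean for moving data between centres, here is a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes) and an `efficiency` parameter, which is a hypothetical knob for the fraction of nominal bandwidth actually achieved; neither comes from the slides.

```python
# Link speeds quoted on the topology slides, in megabits per second.
LINK_MBPS = {"2.5 Gbps": 2500.0, "622 Mbps": 622.0, "155 Mbps": 155.0}


def transfer_hours(terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `terabytes` (decimal TB) over a link.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    really available to the transfer (1.0 = the full nominal rate).
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    for name, mbps in LINK_MBPS.items():
        print(f"1 TB over {name}: {transfer_hours(1.0, mbps):.1f} h at full nominal rate")
```

Halving `efficiency` doubles the transfer time, which is one way to see why a "fat, fast, reliable" link matters for a centre that caches its data over the network.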
Capacity / Performance • All CERN today: ~15K SI95, ~25 TB, ~100 MB/sec • 20% CERN • ** 1 SPECint95 = 10 CERN units = 40 MIPS
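The footnote on this slide gives the conversion between the CPU units in use: 1 SPECint95 = 10 CERN units = 40 MIPS. A small sketch of that conversion follows; the function names are illustrative only.

```python
# Conversion factors from the slide footnote:
# 1 SPECint95 (SI95) = 10 CERN units = 40 MIPS.
CERN_UNITS_PER_SI95 = 10
MIPS_PER_SI95 = 40


def si95_to_cern_units(si95: float) -> float:
    """Convert SPECint95 to CERN units."""
    return si95 * CERN_UNITS_PER_SI95


def si95_to_mips(si95: float) -> float:
    """Convert SPECint95 to MIPS."""
    return si95 * MIPS_PER_SI95


def cern_units_to_si95(cern_units: float) -> float:
    """Convert CERN units back to SPECint95."""
    return cern_units / CERN_UNITS_PER_SI95


if __name__ == "__main__":
    cern_today = 15_000  # "~15K SI95" quoted for all of CERN today
    print(si95_to_cern_units(cern_today), "CERN units")
    print(si95_to_mips(cern_today), "MIPS")
```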
Capacity / Performance • Approx. number of farm PCs at CERN today • May not find disks as small as that! But we need a high disk count for access, performance, RAID/mirroring, etc. • We probably have to buy more disks, larger disks, & use the disks that come with the PCs: much more disk space • Effective throughput of LAN backbone • 1.5% of LAN
Building a Regional Centre Commodity components are just fine for HEP • Masses of experience with inexpensive farms • LAN technology is going the right way • Inexpensive high performance PC attachments • Compatible with hefty backbone switches • Good ideas for improving automated operation and management
Evolution of today’s analysis farms Computing & Storage Fabric built up from commodity components • Simple PCs • Inexpensive network-attached disk • Standard network interface (whatever Ethernet happens to be in 2006) with a minimum of high(er)-end components • LAN backbone • WAN connection
Standard components
HEP’s not special, just more cost conscious
Limit the role of high-end equipment
Components building blocks
2000 – standard office equipment
• 36 dual cpus ~900 SI95
• 120 72GB disks ~9 TB
2005 – standard, cost-optimised, Internet warehouse equipment
• 36 dual 200 SI95 cpus = 14K SI95s ~ $100K
• 224 3.5” disks 25-100 TB $50K - $200K
For capacity & cost estimates see the 1999 Pasta Report: http://nicewww.cern.ch/~les/pasta/welcome.html
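The building-block figures can be checked with simple arithmetic. The sketch below assumes that "36 dual" means 36 boxes with two CPUs each, that the ~900 SI95 and 72 GB figures belong to 2000 and the 200 SI95 and 25-100 TB figures to 2005, and that 1 TB = 1000 GB; the function names are illustrative.

```python
def cpu_rack_si95(boxes: int, cpus_per_box: int, si95_per_cpu: float) -> float:
    """Total SI95 in a rack of identical multi-CPU boxes."""
    return boxes * cpus_per_box * si95_per_cpu


def disk_rack_tb(disks: int, gb_per_disk: float) -> float:
    """Total capacity in TB of a rack of identical disks (1 TB = 1000 GB)."""
    return disks * gb_per_disk / 1000.0


def gb_per_disk_needed(rack_tb: float, disks: int) -> float:
    """Disk size in GB needed to reach `rack_tb` with `disks` drives."""
    return rack_tb * 1000.0 / disks


if __name__ == "__main__":
    # 2005 CPU rack: 36 dual boxes at 200 SI95 per CPU.
    print("2005 CPU rack:", cpu_rack_si95(36, 2, 200), "SI95")
    # 2000 CPU rack: ~900 SI95 spread over 72 CPUs.
    print("2000 SI95 per CPU:", 900 / (36 * 2))
    # 2000 disk rack: 120 disks of 72 GB.
    print("2000 disk rack:", disk_rack_tb(120, 72), "TB")
    # 2005 disk rack: 224 disks holding 25 to 100 TB.
    for tb in (25, 100):
        print(f"{tb} TB over 224 disks:", round(gb_per_disk_needed(tb, 224)), "GB per disk")
```

On these assumptions the 2005 CPU rack comes to 14,400 SI95 (the slide's "14K") and the 2000 disk rack to 8.64 TB (the slide's "~9 TB").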
The Physics Department System • Two 19” racks & $200K • CPU – 14K SI95 (10% of a Tier1 centre) • Disk – 50TB (50% of a Tier1 centre) • Rather comfortable analysis machine • Small Regional Centres are not going to be competitive • Need to rethink the storage capacity at the Tier1 centres
Tier 1, Tier 2 RCs, CERN A few general remarks: • A major motivation for the RCs is that we are hard pressed to finance the scale of computing needed for LHC • We need to start now to work together towards minimising costs • Standardisation among experiments, regional centres, CERN so that we can use the same tools and practices to … • Automate everything • Operation & monitoring • Disk & data management • Work scheduling • Data export/import (prefer the network to mail) in order to … • Minimise operation, staffing • Trade off mass storage for disk + network bandwidth • Acquire contingency capacity rather than fighting bottlenecks • Outsource what you can (at a sensible price) • ……. Keep it simple. Work together.
The middleware The issues are: • integration of this amorphous collection of Regional Centres • Data • Workload • Network performance • application monitoring • quality of data analysis service Leverage the “Grid” developments • Extending Meta-computing to Mass-computing • Emphasis on data management & caching • … and production reliability & quality Keep it simple. Work together.
A 2-experiment Tier 1 Centre
[Diagram labels: cpu/disk, net, tape/DVD, cpus/disks; 200 m²]
Requirement: 240K SI95, 220 TB
• Processors: 20 “standard” racks = 1,440 cpus, 280K SI95
• Disks: 12 “standard” racks = 2,688 disks, 300 TB (with low capacity disks)
Basic equipment ~ $3m
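The Tier 1 sizing can be reproduced from the per-rack figures on the building-blocks slide. The sketch below assumes the 2005 rack figures (14,400 SI95 and ~$100K per CPU rack, 25 TB and ~$50K per low-capacity disk rack); the helper names are illustrative, and the costs cover racks only, not the items on the full-costs slide.

```python
import math

# Per-rack figures assumed from the 2005 building blocks.
CPU_RACK_SI95 = 36 * 2 * 200   # 36 dual boxes at 200 SI95 per CPU
CPU_RACK_COST = 100_000        # ~$100K per CPU rack
DISK_RACK_TB = 25              # low end of the 25-100 TB range
DISK_RACK_COST = 50_000        # ~$50K for the low-capacity disk rack


def min_racks(requirement: float, per_rack: float) -> int:
    """Smallest whole number of racks that meets the requirement."""
    return math.ceil(requirement / per_rack)


def equipment_cost(cpu_racks: int, disk_racks: int) -> int:
    """Basic equipment cost in dollars for the given rack counts."""
    return cpu_racks * CPU_RACK_COST + disk_racks * DISK_RACK_COST


if __name__ == "__main__":
    # Requirement on the slide: 240K SI95 and 220 TB.
    print("minimum CPU racks:", min_racks(240_000, CPU_RACK_SI95))
    print("minimum disk racks:", min_racks(220, DISK_RACK_TB))
    # Rack counts actually shown on the slide: 20 CPU racks, 12 disk racks.
    print("installed SI95:", 20 * CPU_RACK_SI95)
    print("installed TB:", 12 * DISK_RACK_TB)
    print("equipment cost ($):", equipment_cost(20, 12))
```

The slide's 20 and 12 racks sit above the bare minimum of 17 and 9, which is consistent with the earlier advice to acquire contingency capacity rather than fight bottlenecks. The rack cost on these assumptions is $2.6M, which the slide rounds to ~$3m.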
The full costs? • Space • Power, cooling • Software • LAN • Replacement/Expansion 30% per year • Mass storage • People
Mass storage? Do all Tier 1 centres really need a full mass storage operation? • Tapes, robots, storage management software? Need support for export/import media • But think hard before getting into mass storage • Rather • more disks, bigger disks, mirrored disks • cache data across the network from another centre (that is willing to tolerate the stresses of mass storage management) Mass storage is person-power intensive: long term costs
Consider outsourcing • Massive growth in co-location centres, ISP warehouses, ASPs, storage renters, etc. • Level 3, Intel, Hot Office, Network Storage Inc, PSI, …. • There will probably be one near you • Check it out – compare costs & prices • Maybe personnel savings can be made
Policies & sociology Access policy? • Collaboration-wide? or restricted access (regional, national, ….) • A rich source of unnecessary complexity Data distribution policies Analysis models • MONARC work will help to plan the centres • But the real analysis models will evolve when the data arrives Keep everything flexible – simple architecture - simple policies - minimal politics
Concluding remarks I • Lots of experience with farms of inexpensive components • We need to scale them up – lots of work but we think we understand it • But we have to learn how to integrate distributed farms into a coherent analysis facility • Leverage other developments • But we need to learn through practice and experience • Retain a healthy scepticism for scalability theories • Check it all out on a realistically sized testbed
Concluding remarks II • Don’t get hung up on optimising component costs • Do be very careful with head-count • Personnel costs will probably dominate • Define clear objectives for the centre – • Efficiency, capacity, quality • Think hard if you really need mass storage • Discourage empires & egos • Encourage collaboration & out-sourcing • In fact – maybe we can just buy all this as an Internet service