Fabric Management for CERN Experiments: Past, Present, and Future
Tim Smith, CERN/IT
Contents
• The Fabric of CERN today
• The new challenges of LHC computing
• What has this got to do with the GRID?
• Fabric Management solutions of tomorrow?
• The DataGRID Project
Fabric Elements
Functionalities:
• Batch and interactive services
• Disk servers
• Tape servers + devices
• Stage servers
• Home directory servers
• Application servers
• Backup service
Infrastructure:
• Job scheduler
• Authentication
• Authorisation
• Monitoring
• Alarms
• Console managers
• Networks
Fabric Technology at CERN
[Chart: multiplicity scale (1 to 10,000) versus year, 1989–2005, showing the evolution from mainframes (IBM, Cray) through RISC workstations, SMPs (SGI, DEC, HP, SUN) and scalable systems (SP2, CS2) to PC farms.]
Architecture Considerations
• Physics applications have ideal data parallelism:
  • a mass of independent problems
  • no message passing
  • throughput rather than performance
  • resilience rather than ultimate reliability
• Can build hierarchies of mass market components
• High Throughput Computing (sketched below)
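To make the "mass of independent problems" point concrete, here is a minimal, hypothetical Python sketch of throughput-oriented processing: events are farmed out to worker processes with no communication between them, so total throughput scales with the number of workers rather than with per-job speed. The function and event contents are illustrative stand-ins, not any experiment's actual code.

```python
# Hypothetical sketch of High Throughput Computing on independent events:
# each event is processed in isolation, with no inter-worker message passing.
from multiprocessing import Pool

def reconstruct(event):
    """Stand-in for an independent physics job."""
    return sum(hit * hit for hit in event)  # dummy per-event computation

if __name__ == "__main__":
    events = [[1.0, 2.0, 3.0]] * 10_000          # a mass of independent problems
    with Pool(processes=8) as pool:               # throughput scales with workers
        results = pool.map(reconstruct, events, chunksize=100)
    print(len(results), "events processed")
```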
Component Architecture
[Diagram: a high-capacity backbone switch interconnects application servers and their CPU nodes (behind 100/1000baseT switches), disk servers, and a row of tape servers (behind a 1000baseT switch).]
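One rough way to picture the hierarchy of mass-market building blocks is as a nested description like the one below. This is purely an illustration of the slide's diagram, with invented counts and labels, not an actual CERN configuration.

```python
# Illustrative (not actual) description of the farm component architecture.
fabric = {
    "backbone_switch": "high capacity",
    "application_servers": [
        {"uplink": "100/1000baseT switch", "cpus": 5},   # CPU nodes behind an edge switch
    ],
    "disk_servers": [{"uplink": "backbone"}],
    "tape_servers": [{"uplink": "1000baseT switch"} for _ in range(4)],
}

# e.g. count the boxes that would need managing
n_boxes = (sum(s["cpus"] for s in fabric["application_servers"])
           + len(fabric["disk_servers"]) + len(fabric["tape_servers"]))
print(n_boxes, "fabric elements")
```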
Analysis Chain: Farms
[Diagram: the detector delivers raw data to the event filter (selection & reconstruction); event reconstruction produces event summary data and processed data; batch physics analysis extracts analysis objects by physics topic, which feed interactive physics analysis; event simulation feeds the same chain.]
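The chain is a sequence of largely independent transformations over the event sample. The hypothetical Python sketch below shows the shape of such a pipeline; the function names, cuts and data fields are illustrative only, not the experiments' actual software.

```python
# Hypothetical sketch of the analysis chain; names and cuts are illustrative only.
def event_filter(raw):          # selection & reconstruction at the detector
    return [e for e in raw if e["energy"] > 10.0]

def reconstruct(selected):      # raw data -> event summary data (ESD)
    return [{"esd": e["energy"] * 0.98} for e in selected]

def batch_analysis(esd):        # ESD -> analysis objects by physics topic
    return [e for e in esd if e["esd"] > 50.0]

raw_data = [{"energy": float(i)} for i in range(100)]   # stand-in for detector output
analysis_objects = batch_analysis(reconstruct(event_filter(raw_data)))
print(len(analysis_objects), "analysis objects ready for interactive analysis")
```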
Multiplication!
[Chart: number of CPUs in CERN farms from Jul-97 to Jan-00, growing towards ~1200, stacked by experiment/service: alice, atlas, ccf, cms, eff, ion, l3c, lhcb, lxbatch, lxplus, mta, na45, na48, na49, nomad, pcsf, tapes, tomog.]
PC Farms
Shared Facilities
LHC Computing Challenge
• The scale will be different:
  • CPU: 10k SI95 → 1M SI95
  • Disk: 30 TB → 3 PB
  • Tape: 600 TB → 9 PB
• The model will be different: there are compelling reasons why some of the farms and some of the capacity will not be located at CERN
[Charts: estimated disk storage capacity and estimated CPU capacity at CERN, split into LHC and non-LHC, tracked against Moore's Law; today's CPU capacity is ~10K SI95 across 1200 processors.]
• Bad news, tapes: less than a factor-2 price reduction in 8 years, and a significant fraction of the total cost
• Bad news, IO: in 1996 a 4 GB disk streamed 10 MB/s, so 1 TB aggregated 2500 MB/s; in 2000 a 50 GB disk streams 20 MB/s, so 1 TB aggregates only 400 MB/s (arithmetic sketched below)
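The IO figures follow from simple arithmetic: a terabyte built from many small disks aggregates many spindles' bandwidth, while the same terabyte on fewer, larger disks does not. A quick check of the slide's numbers (the helper function is just for illustration):

```python
# Reproduce the slide's IO arithmetic: aggregate bandwidth of 1 TB of disk.
def tb_bandwidth(disk_gb, disk_mb_s, total_tb=1):
    n_disks = total_tb * 1000 / disk_gb        # spindles needed for the capacity
    return n_disks * disk_mb_s                 # each spindle streams independently

print(tb_bandwidth(4, 10))    # 1996: 4 GB @ 10 MB/s  -> 2500 MB/s per TB
print(tb_bandwidth(50, 20))   # 2000: 50 GB @ 20 MB/s ->  400 MB/s per TB
```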
Regional Centres: a Multi-Tier Model
[Diagram: CERN as Tier 0 connects to Tier 1 regional centres (IN2P3, RAL, FNAL), which connect onward to Tier 2 universities and labs, then departments and desktops; link speeds range from 2.5 Gbps at the top down to 622 Mbps and 155 Mbps further out.]
MONARC: http://cern.ch/MONARC
More realistically: a Grid Topology
[Diagram: the same centres (CERN Tier 0; Tier 1s IN2P3, RAL, FNAL; Tier 2 universities and labs; departments and desktops) interconnected as a grid rather than a strict hierarchy, with the same 155 Mbps to 2.5 Gbps links.]
DataGRID: http://cern.ch/grid
Can we build LHC farms?
• Positive predictions:
  • CPU and disk price/performance trends suggest that the raw processing and disk storage capacities will be affordable
  • raw data rates and volumes look manageable (perhaps not today for ALICE)
  • space, power and cooling issues?
• So probably yes… but can we manage them?
  • Understand costs: 1 PC is cheap, but managing 10,000 is not!
  • Building and managing coherent systems from such large numbers of boxes will be a challenge
(1999: CDR @ 45 MB/s for NA48! 2000: CDR @ 90 MB/s for ALICE!)
Management Tasks I
Supporting adaptability:
• Configuration Management (illustrated in the sketch below)
  • machine / service hierarchy
  • automated registration / insertion / removal
  • dynamic reassignment
• Automatic Software Installation and Management (OS and applications)
  • version management
  • application dependencies
  • controlled (re)deployment
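As an illustration only (this is not the DataGRID WP4 tooling), a configuration-management view of a node might couple its place in the machine/service hierarchy with the software versions and dependencies it should run, so that registration, reassignment and redeployment can be automated. All names and fields below are invented.

```python
# Hypothetical node description for automated configuration management.
node = {
    "hostname": "lxbatch042",            # illustrative name
    "cluster": "lxbatch",                # machine / service hierarchy
    "services": ["batch-worker"],
    "os": {"name": "linux", "version": "2.2"},
    "applications": {
        "batch-worker": {"version": "1.4", "depends": ["afs-client"]},
    },
    "state": "production",               # registration / insertion / removal
}

def reassign(node, new_cluster, new_services):
    """Dynamic reassignment: change the desired state; deployment tools converge on it."""
    node["cluster"], node["services"] = new_cluster, new_services
    return node

reassign(node, "lxplus", ["interactive-login"])
print(node["cluster"], node["services"])
```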
Management Tasks II
Controlling Quality of Service:
• System Monitoring (see the sketch below)
  • orientation to the service, NOT the machine
  • uniform access to diverse fabric elements
  • integrated with configuration (change) management
• Problem Management
  • identification of root causes (faults + performance)
  • correlate network / system / application data
  • highly automated
  • adaptive: integrated with configuration management
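A sketch of what "orientation to the service, not the machine" could mean in practice: per-node measurements are rolled up into a service-level view before alarms are raised. This is an illustration under invented data and thresholds, not the actual monitoring system.

```python
# Hypothetical service-oriented monitoring: aggregate node metrics per service.
node_metrics = [
    {"node": "disk01", "service": "stage", "up": True,  "load": 0.7},
    {"node": "disk02", "service": "stage", "up": False, "load": 0.0},
    {"node": "cpu001", "service": "batch", "up": True,  "load": 0.9},
]

def service_view(metrics):
    services = {}
    for m in metrics:
        s = services.setdefault(m["service"], {"nodes": 0, "up": 0})
        s["nodes"] += 1
        s["up"] += m["up"]
    return services

for name, s in service_view(node_metrics).items():
    ok = s["up"] / s["nodes"]
    # Alarm on degraded *service* capacity, not on any single machine.
    print(f"{name}: {ok:.0%} of nodes available", "ALARM" if ok < 0.8 else "OK")
```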
Relevance to the GRID?
• Scalable solutions needed in absence of the GRID!
• For the GRID to work it must be presented with information and opportunities
• Coordinated and efficiently run centres
• Presentable as a guaranteed-quality resource
• 'GRID'ification: the interfaces
Mgmt Tasks: A GRID Centre
• GRID enable: support external requests and services
  • publication: coordinated + 'map'able (example below)
  • security: authentication / authorisation
  • policies: allocation / priorities / estimation / cost
  • scheduling
  • reservation
  • change management
• Guarantees: resource availability / QoS
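What "publication" might look like: a centre advertises its capacity, policies and QoS guarantees in a form that grid schedulers can map and reserve against. The record below is purely illustrative, with invented site names and numbers; the real DataGRID interfaces were still being defined at the time.

```python
# Hypothetical resource advertisement a fabric might publish to the grid.
import json

advertisement = {
    "site": "example-tier1",                       # illustrative site name
    "capacity": {"cpu_si95": 20_000, "disk_tb": 100, "tape_tb": 500},
    "qos": {"availability": 0.95, "max_job_hours": 48},
    "policies": {"allocation": "per-experiment quotas", "priority": "fair-share"},
    "interfaces": {"authentication": "x509", "scheduling": "reservation supported"},
}

print(json.dumps(advertisement, indent=2))        # what a scheduler would consume
```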
Existing Solutions?
• The world outside is moving fast!
• Dissimilar problems:
  • virtual supercomputers (~200 nodes)
  • MPI, latency, interconnect topology and bandwidth
  • Roadrunner, LosLobos, Cplant, Beowulf
• Similar problems:
  • ISPs / ASPs (~200 nodes)
  • clustering: high availability / mission critical
• The DataGRID: Fabric Management WP4
WP4 Partners
• CERN (CH): Tim Smith
• ZIB (D): Alexander Reinefeld
• KIP (D): Volker Lindenstruth
• NIKHEF (NL): Kors Bos
• INFN (I): Michele Michelotto
• RAL (UK): Andrew Sansum
• IN2P3 (Fr): Denis Linglin
Concluding Remarks
• Years of experience in exploiting inexpensive mass market components
• But we need to marry these with inexpensive, highly scalable management tools
• Build the components back together as a resource for the GRID