Commodity Computing: Summary of Session E
John Gordon, IT Department, Rutherford Appleton Lab, CLRC, UK
Parallel Session E
• Commodity Hardware and Software & Integration in Farm and Large Systems
• Co-Chairs: Steve Wolbers, FNAL; John Gordon, RAL
5 Posters
• E058 Fast Transfer of Shared Data (JLAB)
• E165 SUE: A Special Purpose Computer for Spin Glass Models (Zaragoza, Prague, CERN)
• E173 Monitoring and management of Large Distributed Computing Systems at Fermilab
• E214 The Use of Open Source Tools in PHENIX Code Development
• E380 EuroStore mass storage project in the medical environment (TERA)
30 Talks

Compute Farms
70 Design and First Tests of the CDF Run 2 Farms (Wolbers, S.)
60 Report on the D0 Linux Production Farms (Schellman, H.)
23 The Compass Computing Farm Project (Lamanna, M.)
137 Full Online Event Reconstruction at HERA-B (Gellrich, A.)
150 A Linux PC Farm for Physics Batch Analysis in the ZEUS Experiment (Kowal, M.)

Tools for Batch Farms
186 Visualization Tools for Monitoring and Evaluation of Distributed Computing Systems (Cowan, R.)
362 Software for Batch Farms (Bird, I.)
168 A Mobile-Agent-Based Performance-Monitoring System at RHIC (Ibbotson, R.)
115 Sun Microsystems' AutoClient and management of computer farms at BaBar (Boeheim, C.)
386 MAP (Bowcock, T.)

Large Scale and Production Systems
185 RHIC Computing Facility Processing System (Gibbard, B.)
311 The D0 Monte Carlo Challenge (Graham, G.)
48 PHENIX Computing Center in Japan (CC-J) (Ichihara, T.)
301 AMUN: A Practical Application Using the NILE Control System (Baker, R.)
176 Fermilab Computing Division Systems for Scientific Data Storage and Movement (Petravick, D.)

Administration and Management
141 Automating Linux Installations at CERN (Reguero, I.)
309 Elephant, meet Penguin: Bringing up Linux in BaBar (Gowdy, S.)
237 A Modular Administration Tool for Large Clusters of Linux Computers (Yocum, D.)
369 Large Scale Parallel Print Service (Deloose, I.)
248 Software Sharing at Fermilab - Experiences from Run II, KITS and Fermitools (Pordes, R.)

Data GRIDs
163 Harnessing the Capacity of Computational Grids for High Energy Physics (Livny, M.)
277 The Promise of Computational Grids in the LHC Era (Avery, P.)
345 Using NetLogger for Performance Analysis of the BABAR Data Analysis System (Tierney, B.)

R&D
327 Lattice QCD with Commodity Hardware and Software (Holmgren, D.)
348 Status report on Open Source/Open Science 1999 (Johnson, M.)
255 Designing a PC farm to simultaneously process separate computations through different network topologies interconnecting the individual PCs (Dreher, P.)
41 Scalable Parallel Implementation of GEANT4 Using Commodity Hardware and Task Oriented Parallel C (Cooperman, G.)
164 An HS-Link Network Interface Board for Parallel Computing (Ungil, C.)
191 Farms Batch System and Fermi Inter-Process Communication (Mandrichenko, I.)
272 CORBA/RMI Issues in the Java Implementation of the Nile Distributed Operating System (Zhou, L.)
Topics
• Compute Farms
• Tools for Batch Farms
• Large Scale and Production Systems
• Administration and Management
• Data GRIDs
• R&D
Compute Farms
70 Design and First Tests of the CDF Run 2 Farms (Wolbers, S.)
60 Report on the D0 Linux Production Farms (Schellman, H.)
23 The Compass Computing Farm Project (Lamanna, M.)
137 Full Online Event Reconstruction at HERA-B (Gellrich, A.)
150 A Linux PC Farm for Physics Batch Analysis in the ZEUS Experiment (Kowal, M.)
386 MAP (Bowcock, T.)
Compute Farms
• They exist, lots of them
• They are growing, both for existing experiments and as prototypes for new ones
• People seem confident about meeting their CPU requirements
• They are still surprised at how powerful and cheap PCs are
Farm Batch System Monitor
[chart residue: farm CPU utilisation plot; labels “100% of dual”, “50% of dual”; annotation “Jobs use 100% of CPU”]
[diagram residue, E070: CDF Run 2 farm layout; an input stream (x8) feeding farm nodes A1-A7]
Large Scale and Production Systems
185 RHIC Computing Facility Processing System (Gibbard, B.)
311 The D0 Monte Carlo Challenge (Graham, G.)
48 PHENIX Computing Center in Japan (CC-J) (Ichihara, T.)
301 AMUN: A Practical Application Using the NILE Control System (Baker, R.)
176 Fermilab Computing Division Systems for Scientific Data Storage and Movement (Petravick, D.)
Large Scale and Production Systems
• They are big: RHIC is ready for 1 PB/year
• Commodity CPUs, but not commodity disk or tape (except at FNAL)
• RIKEN CC-J is doing WAN data tests
• AMUN uses NILE to give fault-tolerant use of spare resources
Tools for Batch Farms
186 Visualization Tools for Monitoring and Evaluation of Distributed Computing Systems (Cowan, R.)
362 Software for Batch Farms (Bird, I.)
168 A Mobile-Agent-Based Performance-Monitoring System at RHIC (Ibbotson, R.)
115 Sun Microsystems' AutoClient and management of computer farms at BaBar (Boeheim, C.)
191 Farms Batch System and Fermi Inter-Process Communication (Mandrichenko, I.)
JLAB Batch
• Reported that PBS can replace LSF for straightforward batch farms
• A UI layer hides both PBS and LSF (see the sketch below)
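The shape of such a UI layer is simple enough to sketch. The following is a minimal illustration, not the actual JLab tool (the function name and job script are invented for the example); it maps one generic submit call onto PBS's qsub or LSF's bsub:

import subprocess

def submit(script, queue, backend="pbs"):
    """Submit a batch script, hiding the scheduler behind one call."""
    if backend == "pbs":
        # PBS takes the job script as a command-line argument
        return subprocess.run(["qsub", "-q", queue, script], check=True)
    if backend == "lsf":
        # LSF conventionally reads the job script on stdin
        with open(script) as f:
            return subprocess.run(["bsub", "-q", queue], stdin=f, check=True)
    raise ValueError("unknown backend: %s" % backend)

# Example: submit("reco_job.sh", queue="production", backend="pbs")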
Administration and Management
141 Automating Linux Installations at CERN (Reguero, I.)
309 Elephant, meet Penguin: Bringing up Linux in BaBar (Gowdy, S.)
237 A Modular Administration Tool for Large Clusters of Linux Computers (Yocum, D.)
369 Large Scale Parallel Print Service (Deloose, I.)
248 Software Sharing at Fermilab - Experiences from Run II, KITS and Fermitools (Pordes, R.)
Administration and Management
• Strong interest in Linux administration
• Many solutions for installation, monitoring and tracking
• Others looking at printing and batch
Systracker: The basis (E237)
• Presume that one can install a system to a base configuration and take a snapshot of it as the system baseline. This is the fundamental assumption.
• That baseline is almost never “good enough”:
  • an irresistible urge to customize “personal” computers
  • an obstinate refusal to use “standard” methods of administration
  • frequently enough local (legitimate) customization that restoring it manually takes longer than reinstalling the base system
Systracker: The method
• Use tripwire mechanisms to monitor system files and directories for changes, and check updates into an RCS repository (sketched below)
• Modified RPM to archive RPMs to a repository
• Create a module that builds a “replay” script from the differences between baseline and target
• Working on installation scripts to replay the “replay”
• NB: none of this is inherently Linux-specific
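In outline the mechanism is small; here is a toy sketch of the tripwire-plus-RCS step (illustrative only, not the Fermilab code; the watched paths are examples and RCS's ci command is assumed to be installed):

import hashlib, os, subprocess

WATCHED = ["/etc/passwd", "/etc/fstab", "/etc/inetd.conf"]  # example paths
baseline = {}  # path -> digest of the baseline snapshot

def digest(path):
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def snapshot():
    """Record the baseline right after a clean base install."""
    for path in WATCHED:
        baseline[path] = digest(path)

def check_in_changes():
    """Tripwire-style pass: check any changed file into RCS.
    'ci -l' deposits a revision and keeps a locked working copy;
    RCS wants the log message fused to the -m option."""
    for path in WATCHED:
        if os.path.exists(path) and digest(path) != baseline.get(path):
            subprocess.run(["ci", "-l", "-msystracker update", path], check=True)

The differences accumulated in the RCS repository are exactly what the “replay” script is then generated from.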
Elephant, meet Penguin
• Linux is now deployed as a BaBar platform
• Complete validation is ongoing; user feedback is important
• The collaboration has significant CPU capacity on Linux, not yet used by BaBar
• Remaining worries:
  • Objectivity builds are not available for glibc 2.1
  • Large file support: not critical for current operating procedures, but it rules out Linux servers for Objectivity at SLAC
Data GRIDs
163 Harnessing the Capacity of Computational Grids for High Energy Physics (Livny, M.)
277 The Promise of Computational Grids in the LHC Era (Avery, P.)
345 Using NetLogger for Performance Analysis of the BABAR Data Analysis System (Tierney, B.)
Data GRIDs
• 163: an INFN-wide Condor grid across Italy
• 277: the balance between Tier2 centres, Tier1 centres and CERN
• 345: NetLogger uses a standard format for logging messages; instrument your application with it and use their visualiser (sketched below)
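For illustration, NetLogger-style instrumentation amounts to emitting one timestamped keyword=value record per interesting event, which the visualiser can then correlate across hosts. A minimal sketch (the DATE/HOST/PROG/NL.EVNT field names follow the ULM convention NetLogger is based on; treat the details as assumptions rather than the real API):

import socket, sys, time

def nl_write(event, **fields):
    """Emit one ULM-style keyword=value log record for this event."""
    ts = time.time()
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(ts))
    record = {
        "DATE": "%s.%06d" % (stamp, int((ts % 1) * 1e6)),
        "HOST": socket.gethostname(),
        "PROG": sys.argv[0] or "app",
        "NL.EVNT": event,
    }
    record.update(fields)
    print(" ".join("%s=%s" % kv for kv in record.items()))

# Bracket each phase of the application with start/end events:
nl_write("ReadEvent.start", RUN=1234)
# ... read and process the event ...
nl_write("ReadEvent.end", RUN=1234)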
R&D
327 Lattice QCD with Commodity Hardware and Software (Holmgren, D.)
348 Status report on Open Source/Open Science 1999 (Johnson, M.)
255 Designing a PC farm to simultaneously process separate computations through different network topologies interconnecting the individual PCs (Dreher, P.)
41 Scalable Parallel Implementation of GEANT4 Using Commodity Hardware and Task Oriented Parallel C (Cooperman, G.)
164 An HS-Link Network Interface Board for Parallel Computing (Ungil, C.)
272 CORBA/RMI Issues in the Java Implementation of the Nile Distributed Operating System (Zhou, L.)
Issues
• Commodity
• Linux
• Data Distribution
• Management
Commodity
• We heard a lot about commodity hardware
• Some about software to manage commodity hardware (PCs, printers) and commodity software (Linux)
• Little about commodity software itself
• The management software needs to become a commodity, even if only within the HEP world; there is far too much parallel development (e.g. SRT)
• The GRID will help with this
Linux
• A lot of work on, and interest in, Linux (kernels?)
• A number of different solutions to administration, none perfect yet
• Notable that most current and upcoming systems include something else for I/O or databases; Compass comes closest
• Still an operating system for commodity computers, not a commodity-class operating system: there are still too many dependencies on software and kernel levels
What about NT?
• (Almost) no mention of NT
• But we know it is being used a lot
• It is still a hot topic in computer centres
• So why no mention here? It is not of active interest to physicists for producing physics
NT
[chart: FNAL System Census 1999]
FNAL
• Collaboration between the Run II experiments:
  • common farm hardware development
  • common software installation tools
• Collaboration with DESY:
  • caching filesystem
  • Enstore
Remote Use
• The remote use of data is not seriously addressed by current and upcoming experiments
• The requirement is acknowledged, but not firmly embedded in plans or priorities
• The JLAB experience: planned, but it never happened
• GRIDs and MONARC look promising, but remote use needs to stay a high priority
• It will need persistence
The Future?
• Operating systems
  • Linux? Probably, but not yet
• Hardware
  • Compute power will get cheaper and more powerful
  • Disk will get cheaper and higher density
  • Tape will get higher density
• But will they be cheap, powerful, and compact enough?
  • Probably, but the power of commodity computing is that it can be installed cheaply and easily, as long as the design is right
The Future?
• Will computer centres or experiments ever share software, or better still develop it together?
• There are some encouraging signs
• A challenge for CHEP2001: let’s see more