The High Performance Cluster for QCD Calculations: System Monitoring and Benchmarking Lucas Fernandez Seivane quevedin@mail.desy.de Summer Student 2002 IT Group, DESY Hamburg Supervisor: Andreas Gellrich Oviedo University (Spain)
Topics • Some Ideas of QM • The QFT Problem • Lattice Field Theory • What can we get? • Approaches to the computing • lattice.desy.de: • Hardware • Software • The stuff we made: Clumon • Possible improvements
Let’s do some physics… • QM, “real behavior” of the world: ‘fuzzy world’ • Relativity means causality (cause must precede consequence!) • Any complete description of Nature must combine both ideas • The only consistent way of doing this is … QUANTUM FIELD THEORY
The QFT Problem • Impossible to solve exactly • PERTURBATIVE APPROACH • Requires a small coupling constant (like $\alpha_{em} \approx 1/137$) • Example: QED (the strange theory of light and matter) • Taylor series: $\alpha_{em} + \alpha_{em}^2/2 + \alpha_{em}^3/6 + \dots$
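To see why a small coupling matters, the schematic series on this slide can be evaluated numerically. This is a minimal sketch, assuming terms of the form α^n/n! exactly as written on the slide (these are not actual QED coefficients); `series_terms` is a hypothetical helper name, and the value 1.0 merely stands in for a coupling that is not small.

```python
import math

def series_terms(alpha, n_terms):
    """Return the first n_terms of the schematic series alpha^n / n! (n = 1, 2, ...)."""
    return [alpha ** n / math.factorial(n) for n in range(1, n_terms + 1)]

small = series_terms(1.0 / 137.0, 3)  # QED-like coupling
large = series_terms(1.0, 3)          # assumed stand-in for a coupling of order one

# Size of the third term relative to the first:
# negligible for alpha = 1/137, but not for alpha = 1.
ratio_small = small[2] / small[0]
ratio_large = large[2] / large[0]
print(ratio_small, ratio_large)
```

With a small coupling the higher terms die off quickly, so truncating the series is harmless; with a coupling of order one they do not, which is the situation the next slide describes for QCD.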
… but for QCD • The coupling constant is not small (at least at low energies) • We cannot explain a proton (at least analytically)!!! • We need something exact (the LATTICE is EXACT*)
Lattice field theory • Generic tool for approaching non-perturbative QFT • Most needed in QCD (non-perturbative aspects) • Also of purely theoretical interest (Wilson approach)
What can we get? • We are interested in the spectra (bound states, masses of particles) • We can get them from correlation functions: if we could calculate them exactly, we would have solved the theory • They are extracted from Path Integrals (foil1) • The problem is calculating Path Integrals • The lattice can calculate Path Integrals
A Naïve Approach • Discretize space-time • Monte Carlo methods for choosing field configurations (random number generators) • Numerical evaluation of Path Integrals and correlation functions!!! (typical lattice parameters: a = 0.05-0.1 fm, i.e. 1/a ≈ 2-4 GeV, L = 32) but…
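The three steps above can be sketched on a toy problem. This is an illustrative example only, assuming a one-dimensional harmonic oscillator (m = ω = ħ = 1) on a periodic time lattice instead of QCD, with a plain Metropolis update; all function names and parameter values are hypothetical choices, not taken from the talk.

```python
import math
import random

def exact_x2(n_sites, a):
    """Exact <x^2> of the discretized oscillator (m = omega = 1, periodic lattice)."""
    return sum(1.0 / ((4.0 / a) * math.sin(math.pi * k / n_sites) ** 2 + a)
               for k in range(n_sites)) / n_sites

def local_action(xi, left, right, a):
    """Part of the Euclidean action that depends on the value xi at one site."""
    kinetic = ((xi - left) ** 2 + (right - xi) ** 2) / (2.0 * a)
    potential = a * xi * xi / 2.0
    return kinetic + potential

def metropolis_x2(n_sites=32, a=0.5, n_sweeps=3000, n_therm=500, step=1.0, seed=1):
    """Monte Carlo estimate of <x^2> from configurations weighted by exp(-S)."""
    rng = random.Random(seed)
    x = [0.0] * n_sites                      # discretized path x(t_i)
    total, count = 0.0, 0
    for sweep in range(n_sweeps):
        for i in range(n_sites):
            left, right = x[(i - 1) % n_sites], x[(i + 1) % n_sites]
            new = x[i] + rng.uniform(-step, step)
            d_s = local_action(new, left, right, a) - local_action(x[i], left, right, a)
            # Metropolis rule: always accept if the action decreases,
            # otherwise accept with probability exp(-dS).
            if d_s <= 0.0 or rng.random() < math.exp(-d_s):
                x[i] = new
        if sweep >= n_therm:                 # skip thermalization sweeps
            total += sum(v * v for v in x) / n_sites
            count += 1
    return total / count

estimate = metropolis_x2()
print(estimate, exact_x2(32, 0.5))
```

For this toy model the exact lattice answer is known, so the Monte Carlo estimate can be checked against it; in QCD no such closed form exists, which is why the numerical evaluation is needed.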
…but • Huge computing power is needed • High-dimensional integrals • The calculation requires computing the inverse of an “infinite”-dimensional matrix, which takes a lot of CPU time and RAM • That’s why we need clusters, supercomputers or special machines (to divide the work) • The amount of data transferred is not so important; the deciding factors are the LATENCY of the network and the scalability above 1 TFlops
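In practice a matrix this large is not inverted explicitly; a common approach is to solve the linear system A x = b iteratively, applying the matrix without ever storing it. The following is a minimal sketch of the conjugate gradient method under stated assumptions: the operator is a toy one-dimensional lattice operator (negative Laplacian plus a mass term), not the QCD Dirac operator, and all names and values are hypothetical.

```python
def apply_operator(v, mass2=0.1):
    """Toy lattice operator: (-Laplacian + mass2) on a periodic 1D lattice."""
    n = len(v)
    return [(2.0 + mass2) * v[i] - v[(i - 1) % n] - v[(i + 1) % n] for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given only as a function."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:  # converged
            break
        ap = apply_a(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Point source at site 0: the solution is one column of the inverse matrix.
source = [1.0] + [0.0] * 63
solution = conjugate_gradient(apply_operator, source)
residual = max(abs(u - s) for u, s in zip(apply_operator(solution), source))
print(residual)
```

Each iteration only needs nearest-neighbour values, so on a cluster the lattice can be split across nodes that exchange small boundary messages every iteration; this is one reason network latency matters more than the volume of data.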
How can we get it? • General-purpose supercomputers: very expensive; rigid (hardware upgrades are difficult) • Fully custom parallel machines: completely optimized; single-purpose (difficult to recycle); the hardware and software must be designed, developed and built (or modified) • Commodity clusters: “cheap PC” components; completely customizable; easy to upgrade / recycle
Machines • Commercial supercomputers: Cray T3E, Fujitsu VPP77, NEC SX-4, Hitachi SR8000… • Parallel machines: APEmille/apeNEXT (INFN/DESY), QCDSP/QCDOC (CU/UKQCD/Riken), CP-PACS (Tsukuba/Hitachi) • Commodity clusters + fast networking: low latency (fast networking); high speed; standard software and programming environments
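Why low latency matters for small messages can be illustrated with the simplest transfer-time model, time = latency + size / bandwidth. This is a rough sketch only: the latency and bandwidth figures below are assumed round numbers for a "slow" and a "fast" network, not measurements of Fast Ethernet, Myrinet or the DESY cluster, and the function names are hypothetical.

```python
def transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple model: fixed start-up latency plus size divided by bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s

def latency_fraction(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time spent on latency."""
    return latency_s / transfer_time(n_bytes, latency_s, bandwidth_bytes_per_s)

# Assumed, illustrative network parameters (not measured values).
slow_net = {"latency_s": 100e-6, "bandwidth_bytes_per_s": 12.5e6}
fast_net = {"latency_s": 10e-6, "bandwidth_bytes_per_s": 125e6}

small_message = 64          # bytes, e.g. a small boundary exchange
large_message = 10_000_000  # bytes

small_fraction = latency_fraction(small_message, **slow_net)
large_fraction = latency_fraction(large_message, **slow_net)
print(small_fraction, large_fraction)
```

In this model a small message on the slow network spends almost all of its time waiting on latency, while a large transfer is dominated by bandwidth; a parallel code that exchanges many small messages therefore benefits mostly from a low-latency interconnect.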
Lattice cluster@DESY • Cluster bought from a company (Megware), Beowulf type (1 master, 32 slaves) • Before upgrade (some weeks ago), 32 nodes: Intel Xeon P4 1.7 GHz, 256 KB cache; 1 GB Rambus RAM; 2 64-bit PCI slots; 18 GB SCSI hard disks; Fast Ethernet switch (normal networking, NFS disk mounting); Myrinet network (low latency) • Upgrade (August 2002): 16 nodes with 2 Intel Xeon P4 1.7 GHz, 256 KB cache; 16 nodes with 2 Intel Xeon P4 2.0 GHz, 512 KB cache
Lattice cluster@DESY(2) • Software: SuSE Linux (modified by Megware) • MPICH-GM (implementation of MPICH, MPI-Chameleon, for the Myrinet GM system) • Megware Clustware (modified OpenSCE/SCMS): tool for monitoring and administration (but no logs)
Lattice cluster@DESY(3) http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl • First version (Andreas Gellrich): • Provides logs and monitoring • Written in Perl (customizable)
Lattice cluster@DESY(4) http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl • New version (Andreas Gellrich and me): • Also provides graphical data and another log measure • Uses MRTG to graph the data
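MRTG can graph values produced by an external script, which it expects to print four lines: the first value, the second value, an uptime string and the target name. The slides do not describe how Clumon feeds MRTG, so the following is only a sketch of that four-line convention, written in Python rather than Clumon's Perl; the function names, the load-average scaling and the node name `node01` are all hypothetical.

```python
def scaled_load(load_average, factor=100):
    """MRTG works with integers, so scale a load average such as 0.42 to 42."""
    return int(round(load_average * factor))

def mrtg_report(first_value, second_value, uptime, target_name):
    """Format two integer values in the four-line layout read by MRTG."""
    lines = [str(int(first_value)), str(int(second_value)), uptime, target_name]
    return "\n".join(lines) + "\n"

# Example with made-up values: 1-minute and 5-minute load of a hypothetical node.
report = mrtg_report(scaled_load(0.42), scaled_load(1.5), "3 days", "node01")
print(report, end="")
```

A script like this would be called from the MRTG configuration for each monitored quantity; on a real node the values could come from, for example, the system load average instead of the constants used here.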
Work done (in progress) • Getting the flavor of a really high-performance cluster • Learning Perl (more or less) to understand Andreas’s tool • Playing around with Andreas’s tool • Searching for how to graph this kind of data • Learning how to use MRTG/RRDtool • Some tests and preliminary versions • Only the final touches (polishing) remain: time info of the cluster; better documentation of the tools • Play around this last week with other stuff • Prepare the talk, the document and the write-up
Possible Improvements • The cluster is not connected to DESY AFS • Need for backups / archiving of the stored data (dCache theoc01) • Maybe reinstall the cluster with DESY Linux (to fully know what’s in it) • Play around with other cluster software: OpenSCE, OSCAR, ROCKS…