
The High Performance Cluster for QCD Calculations: System Monitoring and Benchmarking

Presentation Transcript


  1. The High Performance Cluster for QCD Calculations: System Monitoring and Benchmarking Lucas Fernandez Seivane quevedin@mail.desy.de Summer Student 2002 IT Group, DESY Hamburg Supervisor: Andreas Gellrich Oviedo University (Spain)

  2. Topics • Some Ideas of QM • The QFT Problem • Lattice Field Theory • What can we get? • Approaches to the computing • lattice.desy.de: • Hardware • Software • The stuff we made: Clumon • Possible improvements

  3. Let’s do some physics… • QM, “real behavior” of the world: ‘fuzzy world’ • Relativity means causality (cause must precede consequence!) • Any complete description of Nature must combine both ideas • The only consistent way of doing this is … QUANTUM FIELD THEORY

  4. The QFT Problem • Impossible to solve it exactly • PERTURBATIVE APPROACH • Requires a small coupling constant (like αem ≈ 1/137) • Example: QED (the strange theory of light and matter) • Taylor expansion in αem: αem + αem²/2 + αem³/6 + …
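
The Taylor series quoted on this slide came through the transcript garbled; as a hedged reading, it presumably stands for the usual perturbative expansion of a QED observable in powers of the small coupling, of the generic form below (the coefficients c_n are placeholders coming from Feynman diagrams at each order, not numbers from the original talk):

% Generic perturbative expansion in the QED coupling (illustrative form)
O(\alpha_{\mathrm{em}}) \;=\; c_1\,\alpha_{\mathrm{em}}
  \;+\; c_2\,\alpha_{\mathrm{em}}^{2}
  \;+\; c_3\,\alpha_{\mathrm{em}}^{3} \;+\; \dots ,
\qquad \alpha_{\mathrm{em}} \approx \tfrac{1}{137} \ll 1 .

Such a series is only useful when the coupling is small; for QCD at low energies the coupling is of order one, which is exactly the obstruction the next slide points out.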

  5. … but for QCD • The coupling constant is not small (at least at low energies) • We cannot explain (at least analytically) a proton!!! • We do need something exact (the LATTICE is EXACT*)

  6. Lattice field theory • Generic tool for approaching non-perturbative QFT • Even more necessary in QCD (non-perturbative aspects) • Also of purely theoretical interest (Wilson's approach)

  7. What can we get? • We are interested in the spectra (bound states, masses of particles) • We can do it by means of correlation functions: if we could calculate them exactly, we would have solved the theory • They are extracted out of Path Integrals (foil 1) • The problem is calculating Path Integrals • The lattice can calculate Path Integrals
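
As a reminder of what "extracting the spectrum from correlation functions" means in formulas, here is the textbook Euclidean path-integral expression and the exponential decay from which a mass is read off; this is standard lattice field theory background rather than material taken from the slides:

% Euclidean path-integral expectation value of an observable O
\langle O \rangle \;=\; \frac{1}{Z}\int \mathcal{D}\phi\; O[\phi]\, e^{-S[\phi]},
\qquad Z \;=\; \int \mathcal{D}\phi\; e^{-S[\phi]} .

% The mass of the lightest state with the quantum numbers of O follows from
% the large-time decay of the two-point correlator:
C(t) \;=\; \langle O(t)\, O^{\dagger}(0) \rangle \;\longrightarrow\; A\, e^{-m\,t}
\quad (t \to \infty).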

  8. A Naïve Approach • Discretize space-time • Monte-Carlo methods for choosing field configurations (random number generators) • Numerical evaluation of Path Integrals and correlation functions!!! (typical lattice sizes: a = 0.05–0.1 fm, 1/a ≈ 2 GeV, L = 32) but…
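
To make "discretize and sample with Monte Carlo" concrete, here is a minimal, self-contained sketch in C: a Metropolis update of a free scalar field on a one-dimensional periodic lattice. It only illustrates the accept/reject idea; it is not the QCD code run at DESY, and all names, sizes and parameters are invented for the example.

/* Toy Metropolis Monte Carlo: free scalar field on a 1-D periodic lattice. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define L 32          /* number of lattice sites                  */
#define SWEEPS 10000  /* Monte-Carlo sweeps                       */
#define DELTA 1.0     /* maximum size of a proposed field change  */

static double phi[L];           /* the field configuration        */

/* Local Euclidean action of site i with trial value `value`
 * (nearest-neighbour kinetic term plus mass term, lattice spacing a = 1). */
static double local_action(int i, double value)
{
    double left  = phi[(i - 1 + L) % L];
    double right = phi[(i + 1) % L];
    return 0.5 * ((value - left) * (value - left)
                + (right - value) * (right - value))
         + 0.5 * value * value;
}

int main(void)
{
    double sum_phi2 = 0.0;
    int accepted = 0;

    srand(12345);
    for (int i = 0; i < L; ++i) phi[i] = 0.0;   /* cold start */

    for (int sweep = 0; sweep < SWEEPS; ++sweep) {
        for (int i = 0; i < L; ++i) {
            double old   = phi[i];
            double trial = old + DELTA * (2.0 * rand() / RAND_MAX - 1.0);
            double dS    = local_action(i, trial) - local_action(i, old);
            /* Metropolis accept/reject: keep the change with probability e^{-dS} */
            if (dS <= 0.0 || rand() / (double)RAND_MAX < exp(-dS)) {
                phi[i] = trial;
                ++accepted;
            }
        }
        for (int i = 0; i < L; ++i) sum_phi2 += phi[i] * phi[i];
    }

    printf("acceptance = %.2f\n", (double)accepted / ((double)SWEEPS * L));
    printf("<phi^2>    = %.4f\n", sum_phi2 / ((double)SWEEPS * L));
    return 0;
}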

  9. …but • Huge computing power needed • Very high-dimensional integrals • The calculation requires computing the inverse of an “infinite”-dimensional matrix, which takes a lot of CPU time and RAM • That’s why we need clusters, supercomputers or special machines (to divide the work) • The amount of data transferred is not so important; the deciding factors are the LATENCY of the network and the scalability above 1 TFlops
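
The "inverse of an infinite-dimensional matrix" is in practice never formed explicitly; one solves the linear system M x = b with an iterative Krylov method such as conjugate gradient. A toy sketch of the CG iteration on a tiny dense symmetric positive-definite matrix follows; the production solvers work on a huge sparse fermion matrix distributed across the cluster nodes, and everything here (matrix, size, tolerance) is illustrative only.

/* Toy conjugate-gradient solver for M x = b, M symmetric positive definite. */
#include <stdio.h>
#include <math.h>

#define N 4

static void matvec(const double m[N][N], const double x[N], double y[N])
{
    for (int i = 0; i < N; ++i) {
        y[i] = 0.0;
        for (int j = 0; j < N; ++j) y[i] += m[i][j] * x[j];
    }
}

static double dot(const double a[N], const double b[N])
{
    double s = 0.0;
    for (int i = 0; i < N; ++i) s += a[i] * b[i];
    return s;
}

int main(void)
{
    /* small symmetric positive-definite test matrix and right-hand side */
    double m[N][N] = {{4,1,0,0},{1,3,1,0},{0,1,2,1},{0,0,1,2}};
    double b[N] = {1, 2, 3, 4};
    double x[N] = {0, 0, 0, 0};          /* initial guess */
    double r[N], p[N], mp[N];

    matvec(m, x, r);
    for (int i = 0; i < N; ++i) { r[i] = b[i] - r[i]; p[i] = r[i]; }
    double rr = dot(r, r);

    for (int iter = 0; iter < 100 && sqrt(rr) > 1e-12; ++iter) {
        matvec(m, p, mp);
        double alpha = rr / dot(p, mp);          /* step length along p    */
        for (int i = 0; i < N; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * mp[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;               /* new search direction   */
        for (int i = 0; i < N; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }

    for (int i = 0; i < N; ++i) printf("x[%d] = %.6f\n", i, x[i]);
    return 0;
}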

  10. How can we get it? • General Purpose Supercomputers: • Very expensive • Rigid (difficult hardware upgrades) • Fully customized parallel machines: • Completely optimized • Single-purpose (difficult to recycle) • Need to design, develop and build (or modify) the hardware & software • Commodity clusters: • “Cheap PC” components • Completely customizable • Easy to upgrade / recycle

  11. Machines • Commercial Supercomputers: Cray T3E, Fujitsu VPP77, NEC SX4, Hitachi SR8000… • Parallel machines: APEmille/apeNEXT (INFN/DESY), QCDSP/QCDOC (CU/UKQCD/RIKEN), CP-PACS (Tsukuba/Hitachi) • Commodity clusters + Fast Networking: • Low latency (fast networking) • High speed • Standard software and programming environments

  12. Lattice cluster@DESY • Cluster bought from a company (Megware), Beowulf type (1 master, 32 slaves) • Before upgrade (some weeks ago): 32 nodes: Intel XEON P4 1.7 GHz, 256 KB cache, 1 GB Rambus RAM, 2 × 64-bit PCI slots, 18 GB SCSI hard disks; Fast Ethernet switch (normal networking, NFS disk mounting); Myrinet network (low latency) • Upgrade (August 2002): 16 nodes: 2 × Intel XEON P4 1.7 GHz, 256 KB cache; 16 nodes: 2 × Intel XEON P4 2.0 GHz, 512 KB cache

  13. Lattice cluster@DESY(2) • Software: SuSE Linux (modified by Megware) • MPICH-GM (implementation of MPI-CHameleon for the Myrinet GM system) • Megware Clustware (OpenSCE/SCMS modified): tool for monitoring and administration (but no logs)
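
Since the talk is about benchmarking and the network latency was called the deciding factor, a generic MPI "ping-pong" microbenchmark is sketched below. It uses only standard MPI calls, so it should compile against MPICH-GM with mpicc and run on two nodes (mpirun -np 2); it is a hedged illustration, not a benchmark from the original work, and the repetition count is arbitrary.

/* Minimal MPI ping-pong latency microbenchmark (run with at least 2 ranks). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    char byte = 0;
    int rank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < reps; ++i) {
        if (rank == 0) {
            /* send 1 byte to rank 1 and wait for the echo */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)   /* one-way latency = half the round-trip time */
        printf("average one-way latency: %.2f us\n",
               (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}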

  14. Lattice cluster@DESY(3) http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl • Andreas Gellrich’s first version: • Provides logs and monitoring • Written in Perl (customizable)

  15. Lattice cluster@DESY(4) http://lattice.desy.de/cgi-bin/clumon/cgi_clumon.pl • New version by Andreas Gellrich and me: • Also graphical data and another log measure • Uses MRTG to graph the data
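
MRTG is usually fed by SNMP, but it can also graph values printed by an external program named in a Target[...] line; such a program writes four lines to stdout (two values, an uptime string, a target name). Below is a minimal feeder sketch in C that reports the 1-minute load average from /proc/loadavg; the target name and scaling are invented, and the actual Clumon scripts (written in Perl) are not reproduced here.

/* Sketch of an external data feeder for MRTG: prints the four lines MRTG
 * expects from an external command (two values, uptime string, target name).
 * Here both values are the 1-minute load average scaled by 100. */
#include <stdio.h>

int main(void)
{
    double load1 = 0.0;
    FILE *f = fopen("/proc/loadavg", "r");

    if (f != NULL) {
        fscanf(f, "%lf", &load1);   /* first field: 1-minute load average */
        fclose(f);
    }

    printf("%d\n", (int)(load1 * 100.0));  /* value 1: load * 100          */
    printf("%d\n", (int)(load1 * 100.0));  /* value 2: same (MRTG needs 2) */
    printf("unknown\n");                   /* uptime string (unused here)  */
    printf("node-load\n");                 /* target name shown by MRTG    */
    return 0;
}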

  16. Clumon v2.0 (1)

  17. Clumon v2.0 (2)

  18. Work done (in progress) • Getting the flavor of a really high-performance cluster • Learning Perl (more or less) to understand Andreas’ tool • Playing around with Andreas’ tool • Searching for ways to graph this kind of data • Learning how to use MRTG/RRDtool • Some tests and previous versions • Only the last retouches (polishing) remain: • Time info of the cluster • Better documentation of the tools • Play around with other stuff this last week • Prepare the talk and write up the document

  19. Possible Improvements • The cluster is not connected to DESY AFS • Need for backups / archiving of the stored data (dCache theoc01) • Maybe reinstall the cluster with DESY Linux (to fully know what’s in it) • Play around with other cluster stuff: OpenSCE, OSCAR, ROCKS…
