Building Large Scale Fabrics – A Summary
Marcel Kunze, FZK
Observation
• Everybody seems to need unprecedented amounts of CPU, disk, and network bandwidth
• Trend toward PC-based computing fabrics and commodity hardware:
  • LCG (CERN), L. Robertson
  • CDF (Fermilab), M. Neubauer
  • D0 (Fermilab), I. Terekhov
  • Belle (KEK), P. Krokovny
  • HERA-B (DESY), J. Hernandez
  • LIGO, P. Shawhan
  • Virgo, D. Busculic
  • AMS, A. Klimentov
• Considerable cost savings w.r.t. RISC-based farms, which give not enough 'bang for the buck' (M. Neubauer)
AMS02 Benchmarks
• Execution time of the AMS "standard" job compared to CPU clock [benchmark table not recovered]
• Source: V. Choutko, A. Klimentov, AMS note 2001-11-01
Fabrics and Networks: Commodity Equipment
Needed for LHC at CERN in 2006:
• Storage: raw recording rate of 0.1–1 GB/s, accumulating 5–8 PetaBytes/year; 10 PetaBytes of disk
• Processing: 200,000 of today's (2001) fastest PCs
• Networks: 5–10 Gbps between main Grid nodes
• Distributed computing effort to avoid congestion: 1/3 at CERN, 2/3 elsewhere
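As a rough cross-check of these figures, a sketch in Python assuming the usual ~10^7 seconds of accelerator live time per year (a rule of thumb, not a number from the talk):

```python
# Back-of-the-envelope check of the quoted LHC storage numbers.
# Assumes ~1e7 seconds of live time per year (an assumption, not a
# figure from the talk).
live_seconds_per_year = 1e7

for rate_gb_s in (0.1, 1.0):
    petabytes = rate_gb_s * live_seconds_per_year / 1e6  # 1 PB = 1e6 GB
    print(f"{rate_gb_s:.1f} GB/s -> {petabytes:.0f} PB/year")

# Prints 1 PB/year and 10 PB/year, bracketing the quoted
# 5-8 PB/year accumulation.
```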
PC Cluster 5 (Belle): 1U servers, Pentium III 1.2 GHz, 256 CPUs (128 nodes)
PC Cluster 6 (Belle): 3U blade servers, LP Pentium III 700 MHz, 40 CPUs (40 nodes)
Disk Storage [figure not recovered]
IDE Performance [figure not recovered]
Basic Questions
• Compute farms contain several thousand computing elements
• Storage farms contain thousands of disk drives
• How to build scalable systems?
• How to build reliable systems?
• How to operate and maintain large fabrics?
• How to recover from errors?
• EDG deals with the issue (P. Kunszt)
• IBM deals with the issue (N. Zheleznykh)
• Project Eliza: self-healing clusters (see the sketch below)
• Several ideas and tools are already on the market
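A minimal sketch of the self-healing idea behind efforts like Project Eliza: a watchdog probes every node and automatically drains the ones that fail. The node list and the helpers check_node and drain_node are illustrative names, not part of any cited tool:

```python
import subprocess

NODES = [f"node{i:03d}" for i in range(1, 129)]  # hypothetical node list

def check_node(host: str) -> bool:
    """Ping-style health probe; True if the node answers."""
    return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          capture_output=True).returncode == 0

def drain_node(host: str) -> None:
    """Placeholder for the self-healing action: take the node out of
    the batch system and schedule a repair/reinstall."""
    print(f"draining {host} and scheduling repair")

for host in NODES:
    if not check_node(host):
        drain_node(host)
```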
Storage Scalability
• Difficult to scale to systems of thousands of components while keeping a single system image: NFS automounter, symbolic links, etc. (M. Neubauer, CAF: ROOTD does not need this and allows direct worldwide access to distributed files without mounts; see the sketch below)
• Scalability in size and throughput by means of storage virtualisation
• Allows setting up non-TCP/IP based systems to handle multiple GB/s
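ROOT's remote file access illustrates the ROOTD point: the client opens a file by URL and the root daemon serves it over the network, with no NFS mount in between. A minimal PyROOT sketch; the server name and path are made up:

```python
import ROOT

# Open a remote file through the root daemon instead of an NFS mount.
# Host and path are hypothetical.
f = ROOT.TFile.Open("root://datasrv.example.org//data/run42/events.root")
if f and not f.IsZombie():
    f.ls()      # list the file contents, read over the network
    f.Close()
```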
Virtualisation of Storage [architecture diagram]: data servers mount virtual storage as a SCSI device; input arrives from Internet/intranet through a load-balancing switch; shared data access (Oracle, PROOF) over a Storage Area Network (FCAL, InfiniBand, …); 200 MB/s sustained; scalable.
Storage Elements (M. Gasthuber)
• PNFS = Perfectly Normal FileSystem
• Stores metadata with the data: 8 hierarchies of file tags (see the sketch below)
• Migration of data (hierarchical storage systems): dCache
• A development of DESY and Fermilab
• ACLs, Kerberos, ROOT-aware
• Web monitoring
• Cached as well as direct tape access
• Fail-safe
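PNFS exposes its per-directory metadata tags as "magic" virtual file names, which is roughly how the "metadata with the data" point works in practice. A sketch, assuming a hypothetical PNFS mount point and an example tag name:

```python
# Reading a PNFS directory tag: the virtual file ".(tag)(<name>)"
# inside a PNFS-mounted directory returns the tag value.
# Mount point and tag name below are examples.
from pathlib import Path

pnfs_dir = Path("/pnfs/example.org/data/experiment")
tag = (pnfs_dir / ".(tag)(sGroup)").read_text().strip()
print(f"storage group tag: {tag}")
```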
Necessary Administration Tools (A. Manabe)
• System (SW) installation/update: Dolly++ (image cloning)
• Configuration: Arusha (http://ark.sourceforge.net), LCFGng (http://www.lcfg.org)
• Status monitoring / system health check:
  • CPU/memory/disk/network utilization: Ganglia (http://ganglia.sourceforge.net), Palantir (http://www.netsonde.com)
  • (Sub-)system service sanity check: Pikt (http://pikt.org), Pica (http://pica.sourceforge.net/wtf.html), cfengine (see the sketch below)
• Command execution: WANI, a web-based remote command executor
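For flavor, a minimal sketch of the kind of periodic sanity check that tools like Pikt or cfengine automate on each node; the thresholds and checks are illustrative, not taken from any of those tools:

```python
# Per-node sanity check: warn when disk usage or load average
# crosses a (made-up) threshold.
import os

def disk_usage_percent(path: str = "/") -> float:
    st = os.statvfs(path)
    return 100.0 * (st.f_blocks - st.f_bfree) / st.f_blocks

load1, _, _ = os.getloadavg()
disk = disk_usage_percent()

if disk > 90.0:
    print(f"WARNING: root filesystem {disk:.0f}% full")
if load1 > (os.cpu_count() or 1) * 2:
    print(f"WARNING: load average {load1:.1f} is high")
```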
WANI is implemented on the 'Webmin' GUI [screenshot: node selection, command input, start button]
Command execution result [screenshot: results from 200 nodes on one page, listed by host name]
[screenshot: clicking a host name opens its stdout and stderr output]
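Underneath the Webmin GUI, what WANI provides amounts to fanning a command out to many nodes and collecting per-host results. A command-line sketch of that core loop; the host names and the ssh transport are assumptions, not WANI internals:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

HOSTS = [f"node{i:03d}" for i in range(1, 201)]  # hypothetical hosts

def run_remote(host: str, command: str = "uptime"):
    """Run one command on one host over ssh, capturing the result."""
    try:
        r = subprocess.run(["ssh", host, command],
                           capture_output=True, text=True, timeout=30)
        return host, r.returncode, r.stdout.strip(), r.stderr.strip()
    except subprocess.TimeoutExpired:
        return host, -1, "", "timeout"

with ThreadPoolExecutor(max_workers=32) as pool:
    for host, rc, out, err in pool.map(run_remote, HOSTS):
        status = "OK" if rc == 0 else f"FAIL({rc})"
        print(f"{host:10s} {status:8s} {out or err}")
```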
CPU Scalability
• The current tools scale up to ~1000 CPUs (in the previous example, 10,000 CPUs would require checking 50 pages)
• Autonomous operation required
• Intelligent self-healing clusters (a reporting sketch follows below)
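Part of the scaling problem above is a reporting problem: at 10,000 nodes, show only the exceptions plus a one-line summary rather than 50 pages of per-node output. A sketch with synthetic probe results standing in for real ones:

```python
# Exception-based reporting for a 10,000-node farm.
# 'results' stands in for (host, returncode) pairs gathered by a
# probe like the one in the previous example.
results = [(f"node{i:05d}", 1 if i % 997 == 0 else 0)
           for i in range(10_000)]

failures = [host for host, rc in results if rc != 0]
print(f"{len(results) - len(failures)} OK, {len(failures)} failing")
for host in failures:
    print(f"  FAIL {host}")
```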
Resource Scheduling
• Problem: how to access local resources from the Grid?
• Local batch queues vs. global batch queues
• Extension of Dynamite (University of Amsterdam) to work with Globus: Dynamite-G (I. Shoshmina); see the submission sketch below
• Open question: how do we deal with interactive applications on the Grid?
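For concreteness, a sketch of submitting work through Globus GRAM rather than a local batch queue, using the classic globus-job-run client; the gatekeeper contact string is hypothetical:

```python
# Submit a trivial job to a remote gatekeeper via Globus GRAM.
# Contact string (host/jobmanager) is an example, not a real site.
import subprocess

contact = "gridnode.example.org/jobmanager-pbs"
r = subprocess.run(["globus-job-run", contact, "/bin/hostname"],
                   capture_output=True, text=True)
print(r.stdout or r.stderr)
```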
Conclusions
• A lot of tools exist
• A lot of work remains to be done in the fabric area to obtain reliable, scalable systems