Status of the Bologna Computing Farm and GRID-related activities Vincenzo M. Vagnoni Thursday, 7 March 2002
Outline • Currently available resources • Farm configuration • Performance • Scalability of the system (in view of the DC) • Resources Foreseen for the DC • Grid middleware issues • Conclusions
Current resources • Core system (hosted in two racks at INFN-CNAF) • 56 CPUs hosted in dual-processor machines (18 PIII 866 MHz + 32 PIII 1 GHz + 6 PIII Tualatin 1.13 GHz), 512 MB RAM • 2 Network Attached Storage (NAS) systems • 1 TB in RAID5, with 14 IDE disks + hot spare • 1 TB in RAID5, with 7 SCSI disks + hot spare • 1 Fast Ethernet switch with Gigabit uplink • Ethernet-controlled power distributor for remote power cycling • Additional resources provided by INFN-CNAF • 42 CPUs in dual-processor machines (14 PIII 800 MHz, 26 PIII 1 GHz, 2 PIII Tualatin 1.13 GHz)
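As a quick sanity check, a minimal Python tally of the CPU inventory listed above (clock speeds and counts taken from the slide; the script itself is illustrative only):

# Tally of the CPUs listed above (core system + additional INFN-CNAF machines).
core = {0.866: 18, 1.0: 32, 1.13: 6}          # clock in GHz -> number of PIII CPUs
additional = {0.8: 14, 1.0: 26, 1.13: 2}

total_cpus = sum(core.values()) + sum(additional.values())
total_ghz = sum(g * n for g, n in core.items()) + sum(g * n for g, n in additional.items())

print(f"CPUs available: {total_cpus}")            # 98
print(f"Aggregate clock: {total_ghz:.1f} GHz")    # ~93.8 GHz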
Farm Configuration (I) • Diskless processing nodes with the OS centralized on a file server (root over NFS) • Makes introducing or removing a node trivial, i.e. no software installation on local disks is needed (see the sketch below) • Allows easy interchange of CEs when resources are shared (e.g. among various experiments), and permits dynamic allocation of the latter without additional work • Very stable! No real drawback observed in about 1 year of running • Improved security • Use of private network IP addresses and an Ethernet VLAN • High level of isolation • Access to external services (afs, mccontrol, bookkeeping db, servlets of various kinds, …) provided by means of NAT on the GW • The most critical systems (single points of failure), though not yet all of them, have been made redundant • Two NAS in the core system with RAID5 redundancy • GW and OS server: operating systems installed on two RAID1 disks (mirroring)
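A minimal sketch of why node installation becomes trivial with root over NFS: adding a node reduces to generating a DHCP/bootp entry and an NFS export on the central servers. Hostnames, MAC addresses, paths and option names below are hypothetical, not the actual Bologna configuration:

# Hypothetical sketch: with root-over-NFS, "installing" a node reduces to
# generating a DHCP entry and an NFS export on the central OS server.
# Hostnames, MAC addresses and paths below are made up for illustration.

NODES = {
    "node01": "00:30:48:aa:bb:01",
    "node02": "00:30:48:aa:bb:02",
}

def dhcp_entry(host, mac, ip):
    # Entry for the DHCP/bootp server on the gateway (private network).
    return (f"host {host} {{ hardware ethernet {mac}; "
            f"fixed-address {ip}; option root-path \"/diskless/{host}\"; }}")

def nfs_export(host, ip):
    # Line for /etc/exports on the OS file server: each node mounts its own root.
    return f"/diskless/{host} {ip}(rw,no_root_squash,sync)"

for i, (host, mac) in enumerate(sorted(NODES.items()), start=1):
    ip = f"192.168.1.{i}"          # private addresses, NATted on the GW
    print(dhcp_entry(host, mac, ip))
    print(nfs_export(host, ip))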
[Photo: Fast Ethernet switch, rack of 1U dual-processor motherboards, 1 TB NAS, Ethernet-controlled power distributor (32 channels)]
Performance • The system has been fully integrated in the LHCb MC production since August 2001 • 20 CPUs until December, 60 CPUs until last week, 100 CPUs now • Produced mostly bb inclusive DST2 with the classic detector (SICBMC v234 and SICBDST v235r4, about 1.5 M) + some 100k-event channel data sets for LHCb light studies • Typically about 20 hours are needed on a 1 GHz PIII for the full chain (minbias RAWH + bbincl RAWH + bbincl piled-up DST2) for 500 events • The farm is capable of producing about (500 events/day)*(100 CPUs) = 50000 events/day, i.e. 350000 events/week, i.e. 1.4 TB/week (RAWH + DST2); see the worked check below • Data transfer to CASTOR at CERN is done with standard ftp (15 Mbit/s out of an available bandwidth of 100 Mbit/s), but tests with bbftp reached very good throughput (70 Mbit/s) • Still waiting for IT to install a bbftp server at CERN
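The production and transfer figures above can be checked with a few lines of arithmetic (a sketch; decimal TB and a flat 7-day week are assumed):

# Worked check of the production and transfer numbers quoted above.
events_per_cpu_day = 500          # full chain, ~20 h on a 1 GHz PIII
cpus = 100

events_per_day  = events_per_cpu_day * cpus        # 50,000
events_per_week = events_per_day * 7               # 350,000
tb_per_week     = 1.4                              # RAWH + DST2, as quoted

# Sustained WAN rate needed to ship 1.4 TB/week to CASTOR at CERN:
bits_per_week = tb_per_week * 1e12 * 8
required_mbit_s = bits_per_week / (7 * 24 * 3600) / 1e6

print(events_per_day, events_per_week)                          # 50000 350000
print(f"sustained rate needed: {required_mbit_s:.0f} Mbit/s")   # ~19 Mbit/s
# -> plain ftp at ~15 Mbit/s cannot quite keep up; bbftp at ~70 Mbit/s can.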
Scalability • Production tests were made in recent days with 82 MC processes running in parallel • Using the two NAS systems independently (instead of sharing the load between them) • Each NAS worked at 20% of full performance, i.e. each of them can be scaled up by much more than a factor of 2 • By distributing the load, we are confident that the system can handle more than 200 CPUs working at 100% at the same time (i.e. without bottlenecks); see the estimate below • For the analysis we want to test other technologies • We plan to test a Fibre Channel network (SAN, Storage Area Network) on some of our machines, with a nominal 1 Gbit/s bandwidth to Fibre Channel disk arrays
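The headroom behind the ">200 CPUs" claim can be made explicit; the sketch below assumes the NAS load scales roughly linearly with the number of concurrent processes:

# Rough headroom estimate, assuming NAS load scales roughly linearly
# with the number of concurrent MC processes.
processes        = 82      # processes run in the scalability test
nas_systems      = 2       # used independently during the test
observed_load    = 0.20    # fraction of full NAS performance observed

procs_per_nas    = processes / nas_systems          # ~41
capacity_per_nas = procs_per_nas / observed_load    # ~205 processes
total_capacity   = capacity_per_nas * nas_systems   # ~410 processes

print(f"estimated capacity: ~{total_capacity:.0f} concurrent processes")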
Resources for the DC • Additional resources from INFN-CNAF are foreseen for the DC period • We will join the DC with on the order of 150-200 CPUs (around 1 GHz or more), 5 TB of disk storage and a local tape storage system (CASTOR-like? Not yet officially decided) • Some work is still needed to make the system fully redundant
Grid issues (A. Collamati) • 2 nodes are reserved at the moment for tests of GRID middleware • The two nodes form a mini-farm, i.e. they have exactly the same configuration as the production nodes (one master node and one slave node) and can run MC jobs as well • Globus has been installed, and first trivial tests of job submission through PBS were successful • Next step: test job submission via Globus on a larger scale by extending the PBS queue of the Globus test farm to all our processing nodes (see the sketch below) • No interference with the working distributed production system
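For illustration, a minimal Python wrapper around a trivial Globus-to-PBS test submission of the kind described above; the gatekeeper host name is invented, and the sketch assumes the Globus Toolkit 2 command-line clients and a valid grid proxy are available:

# Hypothetical sketch of submitting a test job through Globus to the PBS
# jobmanager.  The gatekeeper host name is made up; it assumes the
# globus-job-run client is installed and a proxy exists (grid-proxy-init).
import subprocess

GATEKEEPER = "gridtest01.bo.infn.it/jobmanager-pbs"   # illustrative contact string

def submit_test_job(executable="/bin/hostname"):
    # globus-job-run submits synchronously and returns the job's stdout.
    result = subprocess.run(
        ["globus-job-run", GATEKEEPER, executable],
        capture_output=True, text=True)
    return result.returncode, result.stdout

if __name__ == "__main__":
    rc, out = submit_test_job()
    print("exit code:", rc)
    print(out)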
Conclusions • Bologna is ready to join the DC with a reasonable amount of resources • Scalability tests were successful • The farm configuration is quite stable • We need the bbftp server installed at CERN to fully exploit WAN connectivity and throughput • We are waiting for CERN's decision on the DC period before the final allocation of INFN-CNAF resources • Work on GRID middleware has started, and the first results are encouraging • We plan to install Brunel ASAP