  1. Status of the Bologna Computing Farm and GRID related activities Vincenzo M. Vagnoni Thursday, 7 March 2002

  2. Outline • Currently available resources • Farm configuration • Performance • Scalability of the system (in view of the DC) • Resources Foreseen for the DC • Grid middleware issues • Conclusions

  3. Current resources • Core system (hosted in two racks at INFN-CNAF) • 56 CPUs hosted in dual-processor machines (18 PIII 866 MHz + 32 PIII 1 GHz + 6 PIII Tualatin 1.13 GHz), 512 MB RAM • 2 Network Attached Storage systems • 1 TB in RAID5, with 14 IDE disks + hot spare • 1 TB in RAID5, with 7 SCSI disks + hot spare • 1 Fast Ethernet switch with Gigabit uplink • Ethernet-controlled power distributor for remote power cycling • Additional resources provided by INFN-CNAF • 42 CPUs in dual-processor machines (14 PIII 800 MHz, 26 PIII 1 GHz, 2 PIII Tualatin 1.13 GHz)

  4. Farm Configuration (I) • Diskless processing nodes with the OS centralized on a file server (root over NFS; see the sketch after this slide) • Makes adding or removing a node trivial, i.e. no software installation needed on local disks • Allows easy interchange of CEs in case of shared resources (e.g. among various experiments), and permits dynamic allocation of the latter without additional work • Very stable! No real drawback observed in about 1 year of running • Improved security • Use of private network IP addresses and an Ethernet VLAN • High level of isolation • Access to external services (afs, mccontrol, bookkeeping db, servlets of various kinds, …) provided by means of NAT on the GW • The most important critical systems (single points of failure), though not yet all of them, have been made redundant • Two NAS in the core system with RAID5 redundancy • GW and OS server: operating systems installed on two RAID1 disks (mirroring)
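
A minimal illustration of the two mechanisms described above (root over NFS for diskless nodes, and NAT on the gateway). All hostnames, IP ranges, interface names and paths in this sketch are hypothetical, not the actual Bologna farm settings:

# Sketch: generate per-node NFS-root export lines for the OS file server and
# the NAT (masquerading) rule used on the gateway for the private farm VLAN.
# All addresses and paths are invented for illustration only.

PRIVATE_NET = "192.168.1.0/24"       # assumed private VLAN of the farm
OS_SERVER_ROOT = "/exports/nodes"    # assumed location of the centralized OS trees

def exports_entry(node_id: int) -> str:
    """One /etc/exports line giving a node read-write access to its root tree."""
    ip = f"192.168.1.{100 + node_id}"
    return f"{OS_SERVER_ROOT}/node{node_id:02d}  {ip}(rw,no_root_squash,sync)"

if __name__ == "__main__":
    # Adding or removing a node amounts to adding or removing one export line,
    # which is why no per-node software installation is needed.
    for node in range(1, 5):
        print(exports_entry(node))
    # On the gateway, traffic from the private VLAN towards external services
    # (afs, bookkeeping db, ...) is masqueraded:
    print(f"iptables -t nat -A POSTROUTING -s {PRIVATE_NET} -o eth0 -j MASQUERADE")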

  5. Farm Configuration (II)

  6. [Photos of the hardware: Fast Ethernet switch, rack of 1U dual-processor motherboards, 1 TB NAS, Ethernet-controlled power distributor (32 channels)]

  7. Performance • System has been fully integrated in the LHCb MC production since August 2001 • 20 CPUs until December, 60 CPUs until last week, 100 CPUs now • Produced mostly bb inclusive DST2 with the classic detector (SICBMC v234 and SICBDST v235r4, 1.5 M) + some 100k channel data sets for LHCb light studies • Typically about 20 hours needed on a 1 GHz PIII for the full chain (minbias RAWH + bbincl RAWH + bbincl piled-up DST2) for 500 events • Farm capable of producing about (500 events/day)*(100 CPUs) = 50,000 events/day, i.e. 350,000 events/week, i.e. 1.4 TB/week (RAWH + DST2); see the check below • Data transfer to CASTOR at CERN realized with standard ftp (15 Mbit/s over an available bandwidth of 100 Mbit/s), but tests with bbftp reached very good throughput (70 Mbit/s) • Still waiting for IT to install a bbftp server at CERN
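
A quick check of the production arithmetic quoted on this slide. The per-event size is not stated explicitly and is inferred here from the quoted 1.4 TB/week, so it should be read as an estimate:

# Back-of-the-envelope check of the production rate and data volume.
events_per_cpu_day = 500                 # ~20 h per 500 events on a 1 GHz PIII
n_cpus = 100

events_per_day = events_per_cpu_day * n_cpus        # 50,000 events/day
events_per_week = 7 * events_per_day                # 350,000 events/week

tb_per_week = 1.4                                   # quoted RAWH + DST2 volume
mb_per_event = tb_per_week * 1e6 / events_per_week  # ~4 MB/event (inferred)

print(events_per_day, events_per_week, round(mb_per_event, 1))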

  8. Scalability • Production tests made these days with 82 MC processes running in parallel • Using the two NAS systems independently (instead of sharing the load between them) • Each NAS worked at 20% of full performance, i.e. each of them can be scaled up by much more than a factor of 2 • By distributing the load, we are pretty sure this system can handle more than 200 CPUs working at the same time at 100% (i.e. without bottlenecks); see the estimate below • For the analysis we want to test other technologies • We plan to test a fibre channel network (SAN, Storage Area Network) on some of our machines, with a nominal 1 Gbit/s bandwidth to fibre channel disk arrays
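
A sketch of the reasoning behind the "more than 200 CPUs" estimate, assuming the 82 test jobs were split evenly between the two NAS systems (the exact split is an assumption):

# If each NAS served ~41 jobs at only ~20% of its capacity, each one has
# roughly a factor 5 of headroom before becoming a bottleneck.
jobs_in_test = 82
n_nas = 2
nas_utilisation = 0.20                     # observed fraction of full NAS performance

jobs_per_nas = jobs_in_test / n_nas                    # ~41 jobs per NAS
jobs_per_nas_at_full_load = jobs_per_nas / nas_utilisation
supported_cpus = n_nas * jobs_per_nas_at_full_load     # ~410 concurrent jobs

print(int(supported_cpus))                 # comfortably above the 200 CPU target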

  9. Resources for the DC • Additional resources provided by INFN-CNAF are foreseen for the DC period • We’ll join the DC with on the order of 150-200 CPUs (around 1 GHz or more), 5 TB of disk storage and a local tape storage system (CASTOR-like? Not yet officially decided) • Still need some work to make the system fully redundant

  10. Grid issues (A. Collamati) • 2 nodes reserved at the moment for tests on GRID middleware • The two nodes form a mini-farm, i.e. they have exactly the same configuration as the production nodes (one master node and one slave node) and can run MC jobs as well • Globus has been installed and the first trivial tests of job submission through PBS were successful • Next step: test job submission via Globus on a large scale by extending the PBS queue of the Globus test farm to all our processing nodes (a submission sketch follows this slide) • No interference with the working distributed production system
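
A minimal sketch of how an MC job could be handed to the PBS queue used in these tests. The queue name, resource request and wrapper script below are hypothetical, and the Globus route is only indicated in a comment:

import subprocess
import textwrap

# Hypothetical PBS batch script wrapping one MC production job.
pbs_script = textwrap.dedent("""\
    #!/bin/sh
    #PBS -q lhcb
    #PBS -l nodes=1
    cd $PBS_O_WORKDIR
    ./run_mc_job.sh
    """)

with open("mcjob.pbs", "w") as f:
    f.write(pbs_script)

# Local submission to the PBS queue:
subprocess.run(["qsub", "mcjob.pbs"], check=True)

# The same queue can also be reached through the Globus PBS jobmanager
# (e.g. globus-job-run <gatekeeper>/jobmanager-pbs ...), which is what the
# large-scale submission tests mentioned above would exercise.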

  11. Conclusions • Bologna is ready to join the DC with a reasonable amount of resources • Scalability tests were successful • The farm configuration is pretty stable • We need the bbftp server installed at CERN to fully exploit the WAN connectivity and throughput • We are waiting for CERN to decide on the DC period before the final allocation of INFN-CNAF resources • Work on GRID middleware has started, and the first results are encouraging • We plan to install Brunel ASAP
