Running the multi-platform, multi-experiment cluster at CCIN2P3 Wojciech A. Wojcik IN2P3 Computing Center e-mail: wojcik@in2p3.fr URL: http://webcc.in2p3.fr
IN2P3 Computing Center • Provides the computing and data services for the French high energy and nuclear physicists: • IN2P3 – 18 physics laboratories (spread across the major French cities) • CEA/DAPNIA • French groups are involved in 35 experiments at CERN, SLAC, FNAL, BNL, DESY and other sites (including astrophysics). • Specific situation: our CC is not directly attached to an experimental facility such as CERN, FNAL, SLAC, DESY or BNL.
General rules • All groups/experiments share the same interactive and batch (BQS) clusters and the other types of services (disk servers, tapes, HPSS and networking). Some exceptions later… • /usr/bin and /usr/lib (OS and compilers) are local • /usr/local/* is on AFS, specific to each platform • /scratch – local temporary disk space • System, group and user profiles set up the proper environment
General rules • Each user has an AFS account with access to the following AFS disk spaces: • HOME – backed up by CC • THRONG_DIR (up to 2 GB) – backed up by CC • GROUP_DIR (n * 2 GB) – no backup • Data reside on disks (GROUP_DIR, Objectivity), on tapes (xtage system) or in HPSS • Data exchange on the following media: • DLT, 9840 • Network (bbftp – see the sketch below) • ssh/ssf – recommended for access to/from external domains.
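As an illustration, a hedged sketch of a bbftp transfer. The host name, account and file paths are hypothetical; the flags (-u remote user, -p number of parallel streams, -e control command) follow commonly documented bbftp usage and should be checked against the installed version:

```
# Hypothetical example: fetch one file over 5 parallel TCP streams.
bbftp -u someuser -p 5 -e "get /group/demo/run001.dat ./run001.dat" ccbbftp.in2p3.fr
```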
Supported platforms • Linux (RedHat 6.1, kernel 2.2.17-14smp) with several egcs/gcc compilers (gcc 2.91.66, gcc 2.91.66 with a patch for Objectivity 5.2, gcc 2.95.2 – installed under /usr/local), as requested by different experiments • Solaris 2.6, with 2.7 coming soon • AIX 4.3.2 • HP-UX 10.20 – the end of this service has already been announced
Support for experiments • About 35 different High Energy, Astrophysics and Nuclear Physics experiments. • LHC experiments: CMS, Atlas, Alice and LHCb. • Big non-CERN experiments: BaBar, D0, STAR, PHENIX, AUGER, EROS II.
Disk space • Need to make the disk storage independent of the operating system. • Disk servers based on: • A3500 from Sun with 3.4 TB • VSS from IBM with 2.2 TB • ESS from IBM with 7.2 TB • 9960 from Hitachi with 21.0 TB
Mass storage • Supported media (all in the STK robots): • 3490 • DLT4000/7000 • 9840 (Eagles) • Limited support for Redwood • HPSS – local developments: • Interface with RFIO (see the C sketch below): • API: C, Fortran (via cfio from CERNLIB) • API: C++ (iostream) • bbftp – secure parallel ftp using the RFIO interface
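To make the RFIO route concrete, here is a minimal sketch of reading an HPSS-resident file through the RFIO C API, assuming the CERN SHIFT library (shift.h) is installed; the file path is hypothetical and error handling is shortened:

```c
/* Minimal sketch: read an HPSS file through the RFIO C API.
 * Assumes the CERN SHIFT library; the path below is hypothetical. */
#include <stdio.h>
#include <fcntl.h>
#include <shift.h>          /* rfio_open, rfio_read, rfio_close, ... */

int main(void)
{
    char buf[32768];
    int  n;
    /* rfio_open mirrors open(2); the RFIO layer resolves whether the
     * path is local, remote or handled by HPSS. */
    int fd = rfio_open("/hpss/in2p3.fr/group/demo/run001.dat", O_RDONLY, 0);
    if (fd < 0) {
        rfio_perror("rfio_open");    /* RFIO's analogue of perror(3) */
        return 1;
    }
    while ((n = rfio_read(fd, buf, sizeof(buf))) > 0) {
        /* ... process n bytes ... */
    }
    rfio_close(fd);
    return 0;
}
```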
Mass storage • HPSS – test and production services: • $HPSS_TEST_SERVER:/hpsstest/in2p3.fr/… • $HPSS_SERVER:/hpss/in2p3.fr/… • HPSS – usage: • BaBar – via ams/oofs and RFIO • EROS II – already 1.6 TB in HPSS • AUGER, D0, ATLAS, LHCb • Other experiments under test: SNovae, DELPHI, ALICE, PHENIX, CMS
Networking - LAN • Fast Ethernet (100 Mb/s full duplex) --> to interactive and batch services • Gigabit Ethernet (1 Gb/s full duplex) --> to disk servers and the Objectivity/DB server
Networking - WAN • Academic public network “Renater 2” based on virtual networking (ATM) with guaranteed bandwidth (VPN on ATM) • Lyon - CERN at 34 Mb/s (155 Mb/s in June 2001) • Lyon - US traffic goes through CERN • Lyon - ESnet (via STAR TAP), 30-40 Mb/s, reserved for traffic to/from ESnet, except FNAL.
BAHIA - interactive front-end Based on multi-processor machines: • Linux (RedHat 6.1) -> 10 PentiumII 450 MHz + 12 PentiumIII 1 GHz (2 processors each) • Solaris 2.6 -> 4 Ultra-4/E450 • Solaris 2.7 -> 2 Ultra-4/E450 • AIX 4.3.2 -> 6 F40 • HP-UX 10.20 -> 7 HP9000/780/J282
Batch system - BQS Batch service based on BQS (a CCIN2P3 product) • In constant development, in use for 7 years • POSIX compliant, platform independent (portable) • Resources can be declared for a job, and the scheduler computes the job's class as a function of them (a hedged submission sketch is shown below): • CPU time, memory • CPU bound or I/O bound • Platform(s) • System resources: local scratch disk, stdin/stdout size • User resources (switches, counters)
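For flavour only, a sketch of what a submission might look like; the option names and resource keywords below are assumptions made for illustration, not verified BQS syntax:

```
# Hypothetical BQS submission: ask for a Linux worker, a CPU-time
# limit and a memory limit (option names are illustrative only).
qsub -l platform=LINUX -l t=20000 -l M=256MB myjob.sh
```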
Batch system - BQS • The scheduler takes into account: • Targets for groups (declared twice a year for the big production runs) • CPU time consumed over recent periods (month, week, day), per user and per group • Proper aging and interleaving in the class queues • A worker can be opened to any combination of classes.
Batch system - configuration • Linux (RedHat 6.1) -> 96 dual PIII 750 MHz + 110 dual PIII 1 GHz • Solaris 2.6 -> 25 * Ultra60 • Solaris 2.7 -> 2 * Ultra60 (test service) • AIX 4.3.2 -> 29 * RS390 + 20 * 43P-B50 • HP-UX 10.20 -> 52 * HP9000/780
Regional Center for: • EROS II (Expérience de Recherches d’Objets Sombres par effet de lentilles gravitationnelles) • BaBar • Auger (PAO) • D0
EROS II • Raw data (from the ESO site in Chile) arrive on DLTs (tar format). • The data are restructured from DLT to 3490 or 9840, with metadata created in an Oracle DB. • Data server (under development): currently 7 TB of data, 20 TB expected by the end of the experiment – using HPSS plus a WEB server.
BaBar • AIX and HP-UX are not supported by BaBar; we provide Solaris 2.6 with Workshop 4.2 and Linux (RedHat 6.1), with Solaris 2.7 in preparation. • Data are stored in Objectivity/DB; import/export of data is done with bbftp (import/export on tapes has been abandoned). • Objectivity (ams/oofs) servers dedicated to BaBar have been installed (10 servers). • HPSS is used for staging the Objectivity/DB files.
PAO - AUGER • CCIN2P3 acts as the AECC (AUGER European CC). • Access is granted to all AUGER users (AFS accounts provided). • A CVS repository for the AUGER software has been installed at CCIN2P3, accessible from AFS (from the local and non-local cells) and from non-AFS environments using ssh (see the sketch below). • Linux is the preferred platform. • Simulation software is based on Fortran programs.
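For illustration, hedged examples of the two access paths; the repository path, module name and host are all hypothetical:

```
# From inside AFS (hypothetical repository path and module name):
cvs -d /afs/in2p3.fr/throng/auger/cvsmaster checkout augersim

# From a non-AFS site, tunnelled over ssh (host name hypothetical):
export CVS_RSH=ssh
cvs -d :ext:user@ccauger.in2p3.fr:/afs/in2p3.fr/throng/auger/cvsmaster checkout augersim
```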
D0 • Linux is one of the D0 supported platforms and is available at CCIN2P3. • The D0 software uses the KAI C++ compiler. • Import/export of D0 data (stored in the internal Enstore format) is complicated; we will try to use bbftp as the file transfer program.
Import/export • CCIN2P3 (HPSS) exchanges data with CERN (CASTOR, HPSS), SLAC (HPSS), FNAL (ENSTORE, SAM) and BNL (HPSS); how to interface these mass-storage systems is still an open question.
Problems • Adding new Objectivity servers (for other experiments) is very complicated: it requires new separate machines, with modified port numbers in /etc/services (see the illustrative snippet below). Under development for CMS. • Diverging OS versions and patch levels. • Diverging compiler versions (mainly for Objectivity, which differ between experiments). • Solutions?
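Purely to illustrate the port juggling, a hypothetical /etc/services fragment; the service names and port numbers are invented for this example:

```
# Hypothetical /etc/services entries for per-experiment Objectivity servers
oolock-babar   6101/tcp   # lock server for the BaBar machines (invented port)
oolock-cms     6201/tcp   # lock server for the CMS machines (invented port)
```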
Conclusions • Data exchange should rely on standards (e.g. files or tapes) and common access interfaces (bbftp and RFIO are good examples). • Better coordination is needed, with similar requirements on supported system and compiler levels across experiments. • The choice of CASE technology is out of the control of our CC acting as a Regional Computer Center. • GRID will require a more uniform configuration of the distributed elements. • Who can help? HEPCCC? HEPiX? GRID?