
Tier1 status at INFN-CNAF







  1. Tier1 status at INFN-CNAF
  Giuseppe Lo Re
  INFN – CNAF, Bologna
  Offline Week, 3-9-2003

  2. INFN – Tier1
  • INFN computing facility for the HEP community
  • Location: INFN-CNAF, Bologna (Italy)
  • One of the main nodes on the GARR network
  • Ending the prototype phase this year
  • Fully operational next year
  • Personnel: ~10 FTEs
  • Multi-experiment
    • LHC experiments, Virgo, CDF, BaBar
    • Resources dynamically assigned to experiments according to their needs
  • Main (~50%) Italian resource for LCG
  • Coordination with the other Tier1s (management, security, etc.)
  • Coordination with the Italian Tier2s and Tier3s
  • Participation in grid test-beds (EDG, EDT, GLUE)
  • GOC (deployment in progress)

  3. Networking
  • CNAF interconnected to the GARR-B backbone at 1 Gbps
    • Giga-PoP co-located
  • GARR-B backbone at 2.5 Gbps
  • Manager site for the ALICE Network Stress Test

  4. New Location
  • CNAF upgrade for Tier1 activity -> new computing room
  • The present location (at CNAF office level) is not suitable, mainly due to:
    • Insufficient space
    • Weight (~700 kg per 0.5 m² for a standard rack with 40 1U servers)
  • Moving to the final location within this month
  • New hall in the basement (2nd underground floor) almost ready
    • Easily accessible by lorries from the road
    • ~1000 m² of total space
    • Not suitable for office use (remote control)

  5. Computing units (1)
  • 160 1U rack-mountable Intel dual-processor servers
    • 800 MHz – 2.2 GHz
  • 160 1U dual-processor Pentium IV 2.4 GHz servers to be shipped this month
  • 1 switch per rack
    • 40 Fast Ethernet ports
    • 2 Gigabit uplinks
    • Interconnected to the core switch via 2 pairs of optical fibres
  • 1 network power control per rack
    • 380 V three-phase power as input
    • Outputs 3 independent 220 V lines
    • Remotely manageable via web

  6. Computing units (2)
  • OS: Linux Red Hat (6.2, 7.2, 7.3, 7.3.2)
  • Experiment-specific library software
    • Goal: have generic computing units
    • Experiment-specific library software in a standard location (e.g. /opt/alice)
  • Centralized installation system
    • LCFG (EDG WP4)
    • Integration with the central Tier1 db (see below)
  • Each farm on a distinct VLAN
    • Moving a server from one farm to another changes its IP address (not its name)
  • Queue manager: PBS
    • Not possible to use the "Pro" version (it is free only for edu)
    • Free version not flexible enough
    • Tests of integration with MAUI in progress
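The farm-per-VLAN scheme above implies that a reassigned server keeps its hostname while its address follows the farm's subnet. A minimal sketch of that bookkeeping, with invented farm names and address ranges purely for illustration:

```python
import ipaddress

# Hypothetical farm -> VLAN subnet map (names and ranges are illustrative,
# not the actual CNAF addressing plan)
FARM_SUBNETS = {
    "alice": ipaddress.ip_network("192.168.10.0/24"),
    "cms": ipaddress.ip_network("192.168.20.0/24"),
}

def assign_address(farm: str, host_index: int) -> str:
    """Derive a server's IP from its farm's subnet; the hostname never changes."""
    subnet = FARM_SUBNETS[farm]
    return str(subnet.network_address + host_index)

# Moving server 42 from the ALICE farm to the CMS farm changes only its IP
ip_before = assign_address("alice", 42)
ip_after = assign_address("cms", 42)
```

This captures the design choice on the slide: the farm membership, not the hostname, determines the address, so recabling is replaced by a database update and a switch-port VLAN change.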

  7. Tier1 Database
  • Resource database and management interface
    • Hw server characteristics
    • Sw server configuration
    • Server allocation
  • Postgres database as back end
  • Web interface (apache + mod_ssl + php)
    • Possible direct access to the db for some applications
  • Monitoring system
    • nagios
  • Interface to configure switches and prepare LCFG profiles (preliminary tests done)
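The resource database above tracks hardware characteristics, software configuration, and allocation per server. A minimal relational sketch of that idea; the schema and names are invented for illustration, and SQLite stands in for the actual Postgres back end only to keep the example self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE server (
    name   TEXT PRIMARY KEY,  -- e.g. 'wn042'
    cpu    TEXT,              -- hardware characteristics
    ram_mb INTEGER,
    os     TEXT,              -- software configuration
    farm   TEXT               -- current experiment allocation
);
""")
conn.execute(
    "INSERT INTO server VALUES ('wn042', '2x PIV 2.4 GHz', 1024, 'RedHat 7.3', 'alice')"
)
# Reallocating a server to another experiment is a single update,
# which the web interface (or a direct-db application) can issue
conn.execute("UPDATE server SET farm = 'cms' WHERE name = 'wn042'")
farm = conn.execute("SELECT farm FROM server WHERE name = 'wn042'").fetchone()[0]
```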

  8. Monitoring/Alarms
  • Monitoring system developed at CNAF
    • Socket server on each computer
    • Centralized collector
    • 100 variables collected every 5 minutes
    • Data archived to flat files
    • In progress: XML structure for the data archives
    • User interface: http://tier1.cnaf.infn.it/monitor/
    • Next release: Java interface
  • Critical parameters periodically checked by nagios
    • Connectivity (i.e. ping), system load, bandwidth use, ssh daemon, pbs, etc.
    • User interface: http://tier1.cnaf.infn.it/nagios/
    • In progress: configuration interface
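The collector described above appends one record per host per 5-minute cycle to a flat file. A hedged sketch of reading such an archive back into per-host records; the line layout is an assumption for illustration, not the actual CNAF format:

```python
# Parse a flat-file monitoring archive into per-host records.
# Assumed line format: "<timestamp> <host> <variable>=<value> ..."
def parse_archive(lines):
    records = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed or empty lines
        ts, host, *pairs = parts
        values = dict(p.split("=", 1) for p in pairs)
        records.setdefault(host, []).append((ts, values))
    return records

sample = [
    "2003-09-03T12:00 wn001 load=0.42 disk_free=123456",
    "2003-09-03T12:05 wn001 load=0.38 disk_free=123400",
]
data = parse_archive(sample)  # {'wn001': [(timestamp, {...}), ...]}
```

A flat-file layout like this is trivial to append to from a socket server, which is presumably why it was the first choice before the XML archive structure mentioned on the slide.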

  9. Storage
  • Access to on-line data: DAS, NAS, SAN
    • 32 TB (> 70 TB this month)
    • Data served via NFS v3
  • Tests of several hw technologies (EIDE, SCSI, FC)
  • Study of large file-system solutions and load-balancing/failover architectures
    • PVFS: easy to install and configure, but needs tests for scalability and reliability
    • GPFS: not so easy to install and configure; needs performance tests
    • "SAN on WAN" tests (collaboration with CASPUR)

  10. Mass Storage Resources
  • StorageTek library with 9840 and LTO drives
    • 180 tapes (100 GB each)
  • StorageTek L5500 with 2000–5000 slots on order
    • 6 I/O drives
    • 500 tapes ordered (200 GB each)
  • CASTOR as front-end software for archiving
    • Direct access for end users
    • Oracle as back end

  11. CASTOR
  • Features
    • Needs a staging area on disk (~20% of tape capacity)
    • ORACLE database as back end for full capability (a MySQL interface is also included)
    • ORACLE database is under daily-policy backup
    • Every client needs to install the CASTOR package (works on almost all major OSes, including Windows)
    • Access via the rfio command
  • CNAF setup
    • Experiment access from Tier1 farms via rfio, with UID/GID protection, from a single server
    • National Archive support via rfio with UID/GID protection from a single server (moving to bbFTP for security reasons)
    • Grid-EDG SE tested
    • AliEn SE tested and working well
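The ~20% staging rule above gives a quick disk-sizing check. The sketch below applies it to the tape figures from the previous slide (180 × 100 GB installed, 500 × 200 GB on order); the function is an illustration of the rule of thumb, not a CASTOR tool:

```python
def staging_area_gb(n_tapes: int, tape_gb: int, fraction: float = 0.20) -> float:
    """Disk staging area sized as a fraction (~20%) of total tape capacity."""
    return n_tapes * tape_gb * fraction

current = staging_area_gb(180, 100)   # 18 TB of tape -> 3600 GB of staging disk
on_order = staging_area_gb(500, 200)  # 100 TB of tape -> 20000 GB of staging disk
```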

  12. CASTOR at CNAF
  [Diagram: STK L180 library (2 × 9840 drives, 4 LTO Ultrium drives), robot access via SCSI under ACSLS control; LEGATO NSR (backup); CASTOR server with 2 TB staging disk on the LAN]

  13. Present ALICE resources
  • CPU: 6 dual-processor 2.4 GHz worker nodes + 1 dual-processor 800 MHz AliEn server (to be verified). Some of the ALICE CPUs have been assigned to CMS for its DC.
  • Disk: 4.2 TB, but only 800 GB used by AliEn
  • Tape: 2.4 TB, of which 1 TB used
  • Computing and disk resource requests for 2004 at the Italian Tier1 and Tier2s (Catania and Torino) already submitted to the INFN referees. Feedback expected in a couple of weeks.

  14. Summary & conclusions
  • INFN Tier1 is closing the prototype phase
    • But still testing new technological solutions
  • Moving the resources to the final location
  • Starting integration with LCG
  • We are awaiting input for the preparation of ADC04

  15. Networking
  • CNAF interconnected to the GARR-B backbone at 1 Gbps
    • Giga-PoP co-located
    • GARR-B backbone at 2.5 Gbps
  • LAN: star topology
  • Computing elements connected via Fast Ethernet to the rack switch
    • 3 Extreme Summit, 48 FE + 2 GE ports
    • 3 Cisco 3550, 48 FE + 2 GE ports
    • Enterasys, 48 FE + 2 GE ports
  • Servers connected to a GE switch
    • 1 3Com L2, 24 GE ports
    • Uplink via GE to the core switch
  • Core switches
    • Extreme 7i with 32 GE ports
    • Enterasys ER16 Gigabit switch router
  • Disk servers connected via GE to the core switch

  16. [Network diagram: GARR reached via a 1 Gbps link from the CNAF LAN (Switch-lanCNAF); Tier1 LAN built on Catalyst 6500 and SSR2000 core switches; farm switches FarmSW1, FarmSW2, FarmSW3, FarmSWG1 and LHCBSW1 (vlan tagging enabled); NAS2/NAS3 at 131.154.99.192/193 with 2 TB SCSI and 8 TB FC storage; disk servers Fcds1, Fcds2, Fcds3]

  17. VLAN Tagging
  • Define VLANs across switches
    • Independent of switch brand (standard 802.1q)
  • Adopted solution for complete granularity
    • Each switch port is associated with one VLAN identifier
    • Each rack-switch uplink propagates the VLAN information
    • VLAN identifiers are propagated across switches
    • Each farm has its own VLAN
  • Avoids recabling (or physically moving) hw to change the topology
  • Level 2 isolation of farms
    • Aid for the enforcement of security measures
  • Possible to define multi-tag ports (for servers)
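The brand-independence claimed above comes from the 802.1q standard, which inserts a fixed 4-byte tag (TPID 0x8100 followed by priority, DEI, and the 12-bit VLAN ID) after the source MAC address of an Ethernet frame. An illustrative sketch of that frame-level operation:

```python
import struct

def tag_frame(frame: bytes, vid: int, priority: int = 0) -> bytes:
    """Insert an 802.1q tag after the 12-byte destination+source MAC addresses."""
    if not 0 <= vid <= 4094:
        raise ValueError("VLAN ID must be 0-4094")
    tci = (priority << 13) | vid           # PCP (3 bits) | DEI (1 bit) | VID (12 bits)
    tag = struct.pack("!HH", 0x8100, tci)  # TPID 0x8100 marks the frame as tagged
    return frame[:12] + tag + frame[12:]

# Dummy untagged frame: dst MAC + src MAC + EtherType (IPv4) + payload
untagged = bytes(12) + b"\x08\x00" + b"payload"
tagged = tag_frame(untagged, vid=10)  # frame grows by exactly 4 bytes
```

Because the tag travels inside the frame, any 802.1q-capable switch on the uplink path can recover the farm membership of the traffic, which is what makes the per-farm VLANs propagate across switch brands.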

  18. Remote control
  • KVM switches permit remote control of server consoles
  • 2 models under test
    • Paragon UTM8 (Raritan)
      • 8 analog (UTP/fibre) output connections
      • Supports up to 32 daisy chains of 40 servers (needs UKVMSPD modules)
      • Costs: 6 kEuro + 125 Euro/server (UKVMSPD module)
      • IP-Reach (expansion to support IP transport): 8 kEuro
    • AutoView 2000R (Avocent)
      • 1 analog + 2 digital (IP transport) output connections
      • Supports connections for up to 16 servers
      • 3 switches needed for a standard rack
      • Costs: 4.5 kEuro
  • NPCs (Network Power Controls) permit remote and scheduled power cycling via snmp calls or web
    • Bid under evaluation
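The two KVM options above can be compared per rack (40 servers) directly from the quoted prices. A small sketch of that arithmetic, using only the figures on the slide:

```python
def paragon_cost(n_servers: int, with_ip_reach: bool = False) -> int:
    """Raritan Paragon UTM8: 6 kEuro base + 125 Euro/server (UKVMSPD modules),
    plus an optional 8 kEuro IP-Reach expansion for IP transport."""
    return 6000 + 125 * n_servers + (8000 if with_ip_reach else 0)

def avocent_cost(n_servers: int) -> int:
    """Avocent AutoView 2000R: 4.5 kEuro per switch, each serving up to 16 servers."""
    switches = -(-n_servers // 16)  # ceiling division: 40 servers -> 3 switches
    return 4500 * switches

rack = 40
paragon = paragon_cost(rack)                    # 11000 Euro, analog access only
paragon_ip = paragon_cost(rack, with_ip_reach=True)  # 19000 Euro with IP transport
avocent = avocent_cost(rack)                    # 13500 Euro, IP transport included
```

Per rack, the Paragon is cheaper for analog-only access but the Avocent comes out ahead once IP transport is required, which may explain why both were kept under test.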

  19. Raritan

  20. Avocent

  21. TAPE HARDWARE

  22. Electric Power
  • 220 V single-phase needed for the computers
    • 4–8 kW per standard rack (with 40 dual-processor servers) → 16–32 A
  • 380 V three-phase for the other devices (tape libraries, air conditioning, etc.)
  • To avoid blackouts, Tier1 has standard protection systems, installed in the new location:
    • UPS (Uninterruptible Power Supply)
      • Located in a separate room (conditioned and ventilated)
      • 800 kVA (~640 kW)
    • Electric generator
      • 1250 kVA (~1000 kW) → enough for 80–160 racks
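The per-rack current above follows from I = P / V at 220 V single-phase. A quick check of the arithmetic; the exact values come out at roughly 18–36 A, so the slide's 16–32 A appears to be a rounded working range:

```python
def current_amps(power_w: float, voltage_v: float = 220.0) -> float:
    """Current drawn by a single-phase load: I = P / V."""
    return power_w / voltage_v

low = current_amps(4000)   # ~18.2 A for a 4 kW rack
high = current_amps(8000)  # ~36.4 A for an 8 kW rack
```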

  23. STORAGE CONFIGURATION
  [Diagram: client side (a gateway, or the whole farm, must access the storage) over the WAN or Tier1 LAN; CASTOR server + staging with the STK180 library (100 LTO tapes, 10 TB native); RAIDTEC 1800 GB with 2 SCSI interfaces; IDE NAS4 (nas4.cnaf.infn.it, 1800 GB) for CDF and LHCb; CMS fileserver diskserv-cms-1.cnaf.infn.it and fileserver fcds3.cnaf.infn.it with fail-over support (or more in a cluster or HA configuration); FC switch on order; PROCOM NAS2 (nas2.cnaf.infn.it, 8100 GB) for Virgo and ATLAS; PROCOM NAS3 (nas3.cnaf.infn.it, 4700 GB) for ALICE and ATLAS; DELL PowerVault, 7100 GB, 2 FC interfaces; AXUS Browie, ~2200 GB, 2 FC interfaces]
