
PHENIX Computing Center in Japan (CC-J)



  1. PHENIX Computing Center in Japan (CC-J) Takashi Ichihara (RIKEN and RIKEN BNL Research Center). Presented on 8 February 2000 at the CHEP2000 conference, Padova, Italy

  2. Contents
  1. Overview
  2. Concept of the system
  3. System requirements
  4. Other requirements as a regional computing center
  5. Plan and current status
  6. Working group for constructing the CC-J (CC-J WG)
  7. Current configuration of the CC-J
  8. Photographs of the CC-J
  9. Linux CPU farm
  10. Linux NFS performance vs. kernel
  11. Current HPSS configuration
  12. HPSS performance test
  13. WAN performance test
  14. Summary

  3. PHENIX CC-J: Overview
  • PHENIX Regional Computing Center in Japan (CC-J) at RIKEN
  • Scope
    - Principal site of computing for PHENIX simulation
    - The CC-J aims to cover most of the simulation tasks of the whole PHENIX experiment
    - Regional Asian computing center
    - Center for the analysis of RHIC spin physics
  • Architecture
    - Essentially follows the architecture of the RHIC Computing Facility (RCF) at BNL
  • Construction
    - R&D for the CC-J started in April 1998 at RBRC
    - Construction began in April 1999 over a three-year period
    - One third of the full-scale CC-J will be operational in April 2000

  4. Concept of the CC-J System

  5. System Requirements for the CC-J
  • CPU (SPECint95)
    - Simulation: 8,200
    - Simulation reconstruction: 1,300
    - Simulation analysis: 170
    - Theoretical models: 800
    - Data analysis: 1,000
    - Total: 11,470
  • Annual data amount
    - DST: 150 TB
    - micro-DST: 45 TB
    - Simulated data: 30 TB
    - Total: 225 TB
  • Hierarchical storage system
    - Handles a data amount of 225 TB/year
    - Total I/O bandwidth: 112 MB/s
    - HPSS system
  • Disk storage system
    - 15 TB capacity, all RAID
    - I/O bandwidth: 520 MB/s
  • Data duplication facility
    - Export/import of DST and simulated data
  (A back-of-the-envelope check of these rates appears in the sketch below.)
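The slide's bandwidth figures can be sanity-checked against the annual volume. A minimal sketch of that arithmetic in Python; only the 225 TB/year and 112 MB/s figures come from the slide, and decimal units are assumed:

```python
# Back-of-the-envelope check of the storage requirements quoted above.
SECONDS_PER_YEAR = 365 * 24 * 3600

def sustained_rate_mb_s(tb_per_year: float) -> float:
    """Average rate (MB/s) needed to move tb_per_year terabytes in one year."""
    return tb_per_year * 1e6 / SECONDS_PER_YEAR  # 1 TB = 1e6 MB (decimal units)

rate = sustained_rate_mb_s(225.0)
print(f"225 TB/year is a sustained {rate:.1f} MB/s")         # ~7.1 MB/s
print(f"headroom at 112 MB/s total I/O: {112 / rate:.0f}x")  # ~16x
```

The wide margin between the ~7 MB/s average and the 112 MB/s design bandwidth is presumably what absorbs peak load, reprocessing passes, and concurrent users.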

  6. Other Requirements as a Regional Computing Center
  • Software environment
    - The software environment of the CC-J should be compatible with the PHENIX offline software environment at the RHIC Computing Facility (RCF) at BNL
    - AFS accessibility (/afs/rhic)
    - Objectivity/DB accessibility (replication to be tested soon)
  • Data accessibility
    - Need to exchange 225 TB/year of data with the RCF
    - Most of the data exchange will be done with SD3 tape cartridges (50 GB/volume)
    - Some of the data exchange will be done over the WAN
    - The CC-J will use the Asia-Pacific Advanced Network (APAN) for the US-Japan connection (http://www.apan.net/)
    - APAN currently has 70 Mbps of bandwidth on the Japan-US connection
    - Expecting that 10-30% of the APAN bandwidth (7-21 Mbps) can be used for this project: 75-230 GB/day (27-82 TB/year) will be transferred over the WAN, as the conversion sketch below illustrates
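The GB/day and TB/year figures follow directly from the assumed 10-30% share of the 70 Mbps link. A minimal conversion sketch; all input numbers are from the slide, decimal units assumed:

```python
# Reproduce the slide's WAN transfer estimate from the APAN bandwidth share.
def gb_per_day(mbps: float) -> float:
    """GB/day moved at a sustained rate of `mbps` megabits per second."""
    return mbps / 8 * 86400 / 1000  # Mbit/s -> MB/s -> MB/day -> GB/day

for share in (0.10, 0.30):
    mbps = 70 * share
    daily = gb_per_day(mbps)
    print(f"{mbps:4.1f} Mbps -> {daily:5.1f} GB/day -> {daily * 365 / 1000:4.1f} TB/year")
# output:  7.0 Mbps ->  75.6 GB/day -> 27.6 TB/year
#         21.0 Mbps -> 226.8 GB/day -> 82.8 TB/year
```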

  7. Plan and current status of the CC-J

  8. Working Group for the CC-J Construction (CC-J WG)
  • The CC-J WG is the main body constructing the CC-J
  • It holds regular bi-weekly meetings at RIKEN Wako to discuss technical items, project plans, etc.
  • A mailing list for the CC-J WG has been created (mail traffic: 1,600 mails/year)

  9. Current configuration of the CC-J

  10. Photographs of the PHENIX CC-J at RIKEN

  11. Linux CPU Farms
  • Memory requirement: 200-300 MB/CPU for a simulation chain
  • Node specification
    - Motherboard: ASUS P2B
    - Dual CPUs per node (currently 64 CPUs in total): 32 Pentium II (450 MHz) + 32 Pentium III (600 MHz)
    - 512 MB memory/node (1 GB swap/node)
    - 14 GB HD/node (system 4 GB, work 10 GB)
    - 100BaseT Ethernet interface (DECchip Tulip)
  • Linux Red Hat 5.2 (kernel 2.2.11 + NFSv3 patch)
  • Portable Batch System (PBS V2.1) for batch queuing
  • AFS is accessed through NFS (no AFS client is installed on the Linux PCs)
    - Daily mirroring of the /afs/rhic contents to a local disk file system is carried out (a sketch of such a job follows below)
  • PC assembly (Alta cluster)
    - Remote hardware reset/power control, remote CPU temperature monitoring
    - Serial-port login from the next node (minicom) for maintenance (fsck etc.)
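The slides do not say how the daily /afs/rhic mirror is implemented. The following is a hypothetical sketch of such a nightly job, assuming rsync is available on a machine that does run an AFS client; the host and path names are placeholders, not the CC-J's actual configuration:

```python
# Hypothetical nightly mirror of /afs/rhic to a local file system that the
# farm nodes mount over NFS. Host and paths below are illustrative only.
import subprocess
import sys

SOURCE = "afs-gateway:/afs/rhic/"    # placeholder host with an AFS client
TARGET = "/export/mirror/afs/rhic/"  # placeholder local copy, NFS-exported

def mirror() -> int:
    """One rsync pass; archive mode, deleting files removed upstream."""
    cmd = ["rsync", "-a", "--delete", SOURCE, TARGET]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    sys.exit(mirror())  # intended to be run once a day from cron
```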

  12. Linux NFS Performance vs. Kernel
  • NFS performance test using the bonnie benchmark with a 2 GB file
  • NFS server: Sun Enterprise 450 (Solaris 2.6), 4 CPUs (400 MHz), 1 GB memory
  • NFS client: Linux RH5.2, dual Pentium II (600 MHz), 512 MB memory
  • The NFS performance of recent Linux kernels seems to have improved
  • The NFSv3 patch is still useful for the recent kernel (2.2.14)
  • We are currently using kernel 2.2.11 + the NFSv3 patch
  • The NFSv3 patch is available from http://www.fys.uio.no/~trondmy/src/
  (A simplified throughput probe in the spirit of these bonnie runs is sketched below.)
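As a rough stand-in for the bonnie runs above, the following probe measures the same quantity, sequential write and read throughput of one large file on an NFS mount. The path is a placeholder; the 2 GB size follows the slide's choice, which keeps the working set larger than the client's RAM so caching does not inflate the numbers:

```python
# Simplified sequential-throughput probe in the spirit of bonnie.
import os
import time

PATH = "/mnt/nfs/bonnie.dat"  # placeholder: a file on the NFS mount under test
SIZE_MB = 2048                # 2 GB, as in the slide's bonnie runs
CHUNK = b"\0" * (1 << 20)     # 1 MB per write

def timed_mb_s(fn) -> float:
    start = time.time()
    fn()
    return SIZE_MB / (time.time() - start)

def write() -> None:
    with open(PATH, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the server

def read() -> None:
    with open(PATH, "rb") as f:
        while f.read(1 << 20):
            pass

print(f"write: {timed_mb_s(write):6.1f} MB/s")
print(f"read:  {timed_mb_s(read):6.1f} MB/s")
```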

  13. Current HPSS Hardware Configuration
  • IBM RS6000-SP
    - 5 nodes (silver nodes: quadruple PowerPC 604e 332 MHz CPUs/node)
    - Core server: 1, disk movers: 2, tape movers: 2
    - SP switch (300 MB/s) and 1000BaseSX NIC (OEM of Alteon)
  • A StorageTek Powderhorn tape robot
    - 4 Redwood drives and 2,000 SD3 cartridges (100 TB) dedicated to HPSS
    - The robot is shared with other HSM systems (6 drives and 3,000 cartridges for the other HSM systems)
  • Gigabit Ethernet
    - Alteon ACE180 switch for jumbo frames (9 kB MTU); use of jumbo frames reduces the CPU utilization for transfers
    - Cisco Catalyst 2948G for distribution to 100BaseT
  • Cache disk: 700 GB total, 5 components
    - 3 SSA loops (50 GB each)
    - 2 FW-SCSI RAIDs (270 GB each)

  14. Performance Test of Parallel FTP (pftp) to HPSS
  • pput from the Sun E450: 12 MB/s for one pftp connection
    - Gigabit Ethernet, jumbo frames (9 kB MTU)
  • pput from Linux: 6 MB/s for one pftp connection
    - 100BaseT - Gigabit Ethernet - jumbo frames (defragmentation on a switch)
  • In total, ~50 MB/s pftp performance was obtained for pput across concurrent connections (the idea is illustrated below)
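pftp reaches its aggregate rate by moving data over several streams at once. As a hedged illustration of that idea using plain ftplib rather than HPSS's pftp client (host, credentials, and file names below are placeholders):

```python
# Aggregate throughput of N concurrent FTP puts, in the spirit of pftp.
import time
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP

HOST, USER, PASSWD = "hpss.example.org", "user", "secret"  # placeholders
LOCAL_FILE, SIZE_MB, STREAMS = "test.dat", 512, 4          # placeholders

def put(i: int) -> None:
    """One independent FTP session storing one copy of the test file."""
    ftp = FTP(HOST)
    ftp.login(USER, PASSWD)
    with open(LOCAL_FILE, "rb") as f:
        ftp.storbinary(f"STOR test_{i}.dat", f)
    ftp.quit()

start = time.time()
with ThreadPoolExecutor(max_workers=STREAMS) as pool:
    list(pool.map(put, range(STREAMS)))
rate = STREAMS * SIZE_MB / (time.time() - start)
print(f"aggregate: {rate:.1f} MB/s over {STREAMS} streams")
```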

  15. WAN Performance Test
  • Path: RIKEN (12 Mbps) - IMnet - APAN (70 Mbps) - STAR TAP - ESnet - BNL
  • Round-trip time for RIKEN-BNL: 170 ms
  • The file transfer rate is 47 kB/s with an 8 kB TCP window size (the Solaris default)
  • A large TCP window size is necessary to obtain a high transfer rate
    - RFC 1323 (TCP Extensions for High Performance, May 1992) describes the method of using a large TCP window size (> 64 kB)
  • A large ftp rate (641 kB/s = 5 Mbps) was obtained for a single ftp connection using a large TCP window size (512 kB) across the Pacific Ocean (RTT = 170 ms); see the window-limited throughput sketch below
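These measurements are consistent with the standard window-limited bound for a single TCP stream, rate <= window / RTT. A minimal sketch of the arithmetic, followed by the socket options an application would set to request 512 kB buffers (the RTT and window sizes are from the slide; window scaling per RFC 1323 must also be enabled in the OS):

```python
# Window-limited TCP throughput: at most one window of data per round trip.
import socket

RTT = 0.170  # seconds, RIKEN-BNL round-trip time from the slide

def max_rate_kb_s(window_kb: float) -> float:
    """Upper bound on a single stream's throughput for a given window."""
    return window_kb / RTT

print(f"  8 kB window: {max_rate_kb_s(8):7.1f} kB/s")    # ~47 kB/s, matching the measurement
print(f"512 kB window: {max_rate_kb_s(512):7.1f} kB/s")  # ~3000 kB/s ceiling; 641 kB/s was achieved

# Requesting large per-connection buffers from the kernel:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 512 * 1024)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 512 * 1024)
```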

  16. Summary
  • The construction of the PHENIX Computing Center in Japan (CC-J) at the RIKEN Wako campus, which will extend over a three-year period, began in April 1999.
  • The CC-J is intended as the principal site of computing for PHENIX simulation, a regional PHENIX Asian computing center, and a center for the analysis of RHIC spin physics.
  • The CC-J will handle about 225 TB of data per year, and the total CPU performance is planned to reach 10,000 SPECint95 in 2002.
  • The CPU farm of 64 processors (RH5.2, kernel 2.2.11 with the NFSv3 patch) is stable.
  • About 50 MB/s pftp performance was obtained for HPSS access.
  • A large ftp rate (641 kB/s = 5 Mbps) was obtained for a single ftp connection using a large TCP window size (512 kB) across the Pacific Ocean (RTT = 170 ms).
  • Stress tests of the entire system were carried out successfully.
  • Replication of the Objectivity/DB over the WAN will be tested soon.
  • CC-J operation will start in April 2000.
