The Gelato Federation: What is it exactly? Sverre Jarp, March 2003
Gelato is a collaboration • Goal: • Promote Linux on Itanium-based systems • Sponsor • Hewlett-Packard • Others coming • Members • 13 (right now) • Mainly from the High Performance/High Throughput Community • Expected to grow rapidly SJ – Mar 2003
Current members • North America • NCAR (National Center for Atmospheric Research) • NCSA (National Center for Supercomputing Applications) • PNNL (Pacific Northwest National Lab) • PSC (Pittsburgh Supercomputer Center) • University of Illinois-Urbana/Champaign • University of Waterloo • Europe • CERN • DIKU (Datalogisk Institut, University of Copenhagen) • ESIEE (École Supérieure d’Ingénieurs, near Paris) • INRIA (Institut National de Recherche en Informatique et en Automatique) • Far-East/Australia • Bio-informatics Institute (Singapore) • Tsinghua University (Beijing) • University of New South Wales (Sydney) SJ – Mar 2003
Center of gravity • Web portal (http://www.gelato.org) • Rich content • (Pointers to) Open source IA-64 applications • Examples: • ROOT (from CERN) • OSCAR (Cluster mgmt software from NCSA) • OpenImpact compiler (UIUC) • News • Information, advice, hints • Related to IPF, Linux kernel, etc. • Member overview • Who is who, etc. SJ – Mar 2003
Current development focus • Six “performance” areas: • Single system scalability • From 2-way to 16-way (HP, Fort Collins) • Cluster Scalability and Performance Mgmt • Up to 128 nodes: NCSA • Parallel File System • BII • Compilers • UIUC • Performance tools, management • HP Labs SJ – Mar 2003
CERN Requirement # 1 (chart: projection for Madison @ 1.5 GHz) • Gelato focus: Better C++ performance through • Better compilers • Faster systems • Both! SJ – Mar 2003
Further Gelato Research and Development • Linux memory management • Superpages • TLB sharing between processes • IA-64 pre-emption support • Compilers/Debuggers • OpenImpact C compiler (UIUC) • Open Research Compiler enhancements (Tsinghua) • Fortran, C, C++ • Parallel debugger (Tsinghua) SJ – Mar 2003
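Superpages, mentioned in the memory-management item above, let one TLB entry map a large region instead of many small pages, which matters for TLB-hungry HPC codes. As a rough illustration only (not one of the Gelato deliverables), the sketch below shows how Linux exposes superpages through a hugetlbfs mount; the mount point /mnt/huge and the file name are hypothetical.

```c
/* Illustrative sketch: map a 256 MB buffer from hugetlbfs so the kernel
 * backs it with superpages (huge pages) instead of ordinary small pages.
 * Assumes hugetlbfs is mounted at /mnt/huge (hypothetical path) and that
 * enough huge pages have been reserved by the administrator. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LENGTH (256UL * 1024 * 1024)

int main(void)
{
    int fd = open("/mnt/huge/buffer", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* The mapping is backed by superpages because the file lives on hugetlbfs. */
    void *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    memset(addr, 0, LENGTH);   /* touching 256 MB costs far fewer TLB misses */
    printf("mapped %lu MB with superpages at %p\n", LENGTH >> 20, addr);

    munmap(addr, LENGTH);
    close(fd);
    unlink("/mnt/huge/buffer");
    return 0;
}
```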
The “opencluster” and the “openlab” Sverre Jarp, IT Division, CERN SJ – Mar 2003
Definitions • The “CERN openlab for DataGrid applications” is a framework for evaluating and integrating cutting-edge technologies or services in partnership with industry, focusing on potential solutions for the LCG. The openlab invites members of industry to join and contribute systems, resources or services, and to carry out with CERN large-scale, high-performance evaluations of their solutions in an advanced integrated environment. • “opencluster” project: The openlab is constructing a pilot ‘compute and storage farm’ called the opencluster, based on HP's dual-processor servers, Intel's Itanium Processor Family (IPF) processors, Enterasys's 10-Gbps switches and, at a later stage, a high-capacity storage system. SJ – Mar 2003
Technology onslaught • Large amounts of new technology will become available between now and LHC start-up. A few HW examples: • Processors • SMT (Simultaneous Multi-Threading) • CMP (Chip Multiprocessor) • Ubiquitous 64-bit computing (even in laptops) • Memory • DDR II-400 (fast) • Servers with 1 TB (large) • Interconnect • PCI-X → PCI-X2 → PCI-Express (serial) • Infiniband • Computer architecture • Chipsets on steroids • Modular computers • ISC2003 Keynote Presentation: Building Efficient HPC Systems from Catalog Components, Justin Rattner, Intel Corp., Santa Clara, USA • Disks • Serial-ATA • Ethernet • 10 GbE (NICs and switches) • 1 Terabit backplanes • Not all, but some of this will definitely be used by LHC SJ – Mar 2003
Vision: A fully functional GRID cluster node • (Diagram: CPU servers and a storage system on a multi-gigabit LAN, with a gigabit long-haul link over the WAN to a remote fabric) SJ – Mar 2003
opencluster strategy • Demonstrate promising technologies • LCG and LHC on-line • Deploy the technologies well beyond the opencluster itself • 10 GbE interconnect in the LHC Testbed • Act as a 64-bit Porting Centre • CMS and Alice already active; ATLAS is interested • CASTOR 64-bit reference platform • Storage subsystem as CERN-wide pilot • Focal point for vendor collaborations • For instance, in the “10 GbE Challenge” everybody must collaborate in order to be successful • Channel for providing information to vendors • Thematic workshops SJ – Mar 2003
The opencluster today • Three industrial partners: • Enterasys, HP, and Intel • A fourth partner joining soon • Data storage subsystem • Which would “fulfill the vision” • Technology aimed at the LHC era • Network switches at 10 Gigabits • Rack-mounted HP servers • 64-bit Itanium processors • Cluster evolution: • 2002: Cluster of 32 systems (64 processors) • 2003: 64 systems (“Madison” processors) • 2004/05: Possibly 128 systems (“Montecito” processors) SJ – Mar 2003
Activity overview • Over the last few months • Cluster installation, middleware • Application porting, compiler installations, benchmarking • Initialization of “Challenges” • Planned first thematic workshop • Future • Porting of grid middleware • Grid integration and benchmarking • Storage partnership • Cluster upgrades/expansion • New generation network switches SJ – Mar 2003
opencluster in detail • Integration of the cluster: • Fully automated network installations • 32 nodes + development nodes • RedHat Advanced Workstation 2.1 • OpenAFS, LSF • GNU, Intel, ORC Compilers (64-bit) • ORC (Open Research Compiler, based on code originally from SGI) • CERN middleware: Castor data mgmt • CERN Applications • Porting, Benchmarking, Performance improvements • CLHEP, GEANT4, ROOT, Sixtrack, CERNLIB, etc. • Database software (MySQL, Oracle?) Many thanks to my colleagues in ADC, FIO and CS SJ – Mar 2003
The compute nodes • HP rx2600 • Rack-mounted (2U) systems • Two Itanium-2 processors • 900 or 1000 MHz • Field upgradeable to next generation • 2 or 4 GB memory (max 12 GB) • 3 hot pluggable SCSI discs (36 or 73 GB) • On-board 100 Mbit and 1 Gbit Ethernet • 4 PCI-X slots: • full-size 133 MHz/64-bit slot(s) • Built-in management processor • Accessible via serial port or Ethernet interface SJ – Mar 2003
rx2600 block diagram • (Block diagram: the zx1 memory & I/O controller links the two Itanium 2 processors (6.4 GB/s) and 12 DIMMs to zx1 I/O adapters (4.3 GB/s aggregate, 1 GB/s links) serving four PCI-X 133/64 slots, a management/service processor card with 10/100 LAN and 3 serial ports, Gbit LAN, 10/100 LAN, Ultra160 SCSI with 3 internal drives, USB 2.0, VGA and an IDE CD/DVD) SJ – Mar 2003
Benchmarks • Comment: • Note that 64-bit benchmarks will pay a performance penalty for LP64, i.e. 64-bit pointers. • Need to wait for AMD systems that can run natively either a 32-bit OS or a 64-bit OS to understand the exact cost for our benchmarks. SJ – Mar 2003
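To make the LP64 point concrete: in a 64-bit Linux/IA-64 build, long and pointer types double in size compared with a 32-bit IA-32 build, so pointer-rich data structures grow and put more pressure on caches and memory bandwidth. A minimal illustration (not one of the CERN benchmarks):

```c
/* Compare fundamental type sizes under ILP32 (32-bit IA-32) and LP64
 * (64-bit IA-64 Linux). Under LP64, long and pointers are 8 bytes, so
 * pointer-heavy structures occupy more cache lines. */
#include <stdio.h>

struct node {                /* a typical linked-list node */
    struct node *next;
    int          payload;
};

int main(void)
{
    printf("int    : %zu bytes\n", sizeof(int));
    printf("long   : %zu bytes\n", sizeof(long));
    printf("void * : %zu bytes\n", sizeof(void *));
    /* ILP32: node is 8 bytes; LP64: 16 bytes (8-byte pointer + padding). */
    printf("node   : %zu bytes\n", sizeof(struct node));
    return 0;
}
```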
Benchmark-1: Sixtrack (SPEC) • Small is best! • Projection: Madison @ 1.5 GHz: ~ 81 s • This is what we would have liked to see for all CERN benchmarks. SJ – Mar 2003
Benchmark-2: CRN jobs/FTN • Big is best! • Projections: Madison @ 1.5 GHz: ~ 585 CU; P4 Xeon @ 3.0 GHz: ~ 620 CU SJ – Mar 2003
Benchmark-3: Rootmarks/C++ • Projections: Madison @ 1.5 GHz: ~ 660 RM; Pentium 4 @ 3.0 GHz/512 KB: ~ 750 RM • René’s own 2.4 GHz P4 is normalized to 600 RM. • Stop press: We have just agreed on a compiler improvement project with Intel SJ – Mar 2003
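The slides do not say how the Madison projections were derived; a common first-order estimate is to scale a measured Itanium 2 result linearly with clock frequency, ignoring memory effects and the larger Madison cache. The sketch below applies that assumption to a Sixtrack-style timing; the 1.0 GHz baseline value is a hypothetical input, not a number quoted in the talk.

```c
/* First-order clock-scaling projection (an assumption, not the method
 * stated in the talk): projected_time = measured_time * f_old / f_new.
 * The baseline time below is hypothetical, for illustration only. */
#include <stdio.h>

int main(void)
{
    double measured_time_s = 121.5; /* hypothetical time on a 1.0 GHz Itanium 2 */
    double f_old_ghz = 1.0;
    double f_new_ghz = 1.5;         /* Madison clock frequency */

    double projected = measured_time_s * f_old_ghz / f_new_ghz;
    printf("projected Madison time: ~%.0f s\n", projected);  /* ~81 s */
    return 0;
}
```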
opencluster - phase 1 • Perform cluster benchmarks: • Parallel ROOT queries (via PROOF) • Observed excellent scaling: • 2 → 4 → 8 → 16 → 32 → 64 CPUs • To be reported at CHEP2003 • “1 GB/s to tape” challenge • Network interconnect via 10 GbE switches • Opencluster may act as CPU servers • 50 StorageTek tape drives in parallel • “10 Gbit/s network Challenge” • Groups together all Openlab partners • Enterasys switch • HP servers • Intel processors and n/w cards • CERN Linux and n/w expertise SJ – Mar 2003
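For reference, "excellent scaling" is usually quantified as speedup S(n) = T(1)/T(n) and parallel efficiency E(n) = S(n)/n over the measured CPU counts. The sketch below computes both from a set of query times; the timings are hypothetical placeholders, not the PROOF results reported at CHEP2003.

```c
/* Compute speedup S(n) = T(1)/T(n) and efficiency E(n) = S(n)/n for a
 * parallel query run on 1..64 CPUs. The timings are hypothetical
 * placeholders; real numbers come from the PROOF measurements. */
#include <stdio.h>

int main(void)
{
    int    cpus[]   = { 1, 2, 4, 8, 16, 32, 64 };
    double time_s[] = { 640.0, 322.0, 162.0, 82.0, 42.0, 22.0, 12.0 }; /* hypothetical */
    int    n = sizeof(cpus) / sizeof(cpus[0]);

    for (int i = 0; i < n; i++) {
        double speedup    = time_s[0] / time_s[i];
        double efficiency = speedup / cpus[i];
        printf("%2d CPUs: speedup %5.1f, efficiency %4.0f%%\n",
               cpus[i], speedup, 100.0 * efficiency);
    }
    return 0;
}
```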
10 GbE Challenge SJ – Mar 2003
Network topology in 2002 • (Diagram: disk servers and 96 FastEthernet-connected nodes, in groups of 12, aggregated by four E1 OAS switches via multiple Gigabit copper and fiber uplinks; legend: Gig copper, Gig fiber, 10 Gig) SJ – Mar 2003
Enterasys extension 1Q2003 • (Diagram: the 2002 topology extended with 10 Gigabit links, adding the 32-node Itanium cluster and a 200+ node Pentium cluster alongside the disk servers and 96 FastEthernet nodes) SJ – Mar 2003
Why a 10 GbE Challenge? • Demonstrate LHC-era technology • All necessary components available inside the opencluster • Identify bottlenecks • And see if we can improve • We know that Ethernet is here to stay • 4 years from now 10 Gbit/s should be commonly available • Backbone technology • Cluster interconnect • Possibly also for iSCSI and RDMA traffic We want to advance the state-of-the-art ! SJ – Mar 2003
Demonstration of openlab partnership • Everybody contributes: • Enterasys • 10 Gbit switches • Hewlett-Packard • Server with its PCI-X slots and memory bus • Intel • 10 Gbit NICs plus driver • Processors (i.e. code optimization) • CERN • Linux kernel expertise • Network expertise • Project management • IA32 expertise • CPU clusters, disk servers on multi-Gbit infrastructure SJ – Mar 2003
“Can we reach 400 – 600 MB/s throughput?” • Bottlenecks could be: • Linux CPU consumption • Kernel and driver optimization • Number of interrupts; TCP checksumming; IP packet handling, etc. • Definitely need TCP offload capabilities • Server hardware • Memory banks and speeds • PCI-X slot and overall speed • Switch • Single transfer throughput • Aim: • Identify bottleneck(s) • Measure peak throughput and the corresponding cost: processor, memory, switch, etc. SJ – Mar 2003
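One way to probe these bottlenecks is a minimal memory-to-memory TCP test: one node streams a buffer as fast as it can while the achieved rate and the sender's CPU load are recorded. The sketch below is an illustrative sender only (the receiver, socket-buffer tuning, MTU settings and CPU accounting are omitted); the address, port and transfer size are arbitrary choices, not the values used in the openlab tests.

```c
/* Minimal TCP throughput sender: stream a fixed amount of data to a
 * receiver and report MB/s. Illustrative only; the real tests also tuned
 * socket buffers and interrupt handling, and measured CPU cost per byte. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    const char  *receiver = "10.0.0.2";   /* hypothetical receiver address */
    const size_t bufsize  = 1 << 20;      /* 1 MB send buffer */
    const long   chunks   = 4096;         /* 4 GB total */

    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(5001);
    if (inet_pton(AF_INET, receiver, &addr.sin_addr) != 1) {
        fprintf(stderr, "bad address\n"); return 1;
    }
    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect"); return 1;
    }

    char *buf = calloc(1, bufsize);
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    for (long i = 0; i < chunks; i++) {
        size_t sent = 0;
        while (sent < bufsize) {           /* send() may write less than asked */
            ssize_t n = send(s, buf + sent, bufsize - sent, 0);
            if (n <= 0) { perror("send"); return 1; }
            sent += (size_t)n;
        }
    }

    gettimeofday(&t1, NULL);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    double mb   = (double)chunks * bufsize / 1e6;
    printf("sent %.0f MB in %.1f s -> %.0f MB/s\n", mb, secs, mb / secs);

    close(s);
    free(buf);
    return 0;
}
```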
Gridification SJ – Mar 2003
Opencluster - future • Porting and validation of EDG 2.0 software • Joint project with CMS • Integrate opencluster alongside EDG testbed • Porting, Verification • Relevant software packages (hundreds of RPMs) • Understand chain of prerequisites • Exploit possibility to leave control node as IA-32 • Interoperability with EDG testbeds and later with LCG-1 • Integration into existing authentication scheme • GRID benchmarks • To be defined later • Fact sheet: HP joined openlab mainly because of their interest in Grids SJ – Mar 2003
Opencluster time line • (Timeline, Jan 03 – Jan 06: install 32 nodes and start phase 1, with systems expertise in place and openCluster integration; complete phase 1 and start phase 2, ordering/installing G-2 upgrades and 32 more nodes, with EDG and LCG interoperability; complete phase 2 and start phase 3, ordering/installing G-3 upgrades and adding nodes) SJ – Mar 2003
Recap:opencluster strategy • Demonstrate promising IT technologies • File system technology to come • Deploy the technologies well beyond the opencluster itself • Focal point for vendor collaborations • Channel for providing information to vendors SJ – Mar 2003
Storage Workshop • Data and Storage Mgmt Workshop (Draft Agenda) • March 17th – 18th 2003 (Sverre Jarp) • Organized by the CERN openlab for Datagrid applications and the LCG • Aim: Understand how to create synergy between our industrial partners and LHC Computing in the area of storage management and data access. • Day 1 (IT Amphitheatre) • Introductory talks: • 09:00 – 09:15 Welcome. (von Rueden) • 09:15 – 09:35 Openlab technical overview (Jarp) • 09:35 – 10:15 Gridifying the LHC Data: Challenges and current shortcomings (Kunszt) • 10:15 – 10:45 Coffee break • The current situation: • 10:45 – 11:15 Physics Data Structures and Access Patterns (Wildish) • 11:15 – 11:35 The Andrew File System Usage in CERN and HEP (Többicke) • 11:35 – 12:05 CASTOR: CERN’s data management system (Durand) • 12:05 – 12:25 IDE Disk Servers: A cost-effective cache for physics data (NN) • 12:25 – 14:00 Lunch • Preparing for the future • 14:00 – 14:30 ALICE Data Challenges: On the way to recording @ 1 GB/s (Divià) • 14:30 – 15:00 Lessons learnt from managing data in the European Data Grid (Kunszt) • 15:00 – 15:30 Could Oracle become a player in the physics data management? (Shiers) • 15:30 – 16:00 CASTOR: possible evolution into the LHC era (Barring) • 16:00 – 16:30 POOL: LHC data Persistency (Duellmann) • 16:30 – 17:00 Coffee break • 17:00 – Discussions and conclusion of day 1 (All) • Day 2 (IT Amphitheatre) • Vendor interventions; One-on-one discussions with CERN SJ – Mar 2003
THANK YOU SJ – Mar 2003