
Report about HEPIX Roma April 3-7

Report about HEPIX Roma, April 3-7. SFT Group Meeting, May 5, 2006. René Brun, CERN. http://hepix.caspur.it/spring2006/. HEPIX Spring 2006 in Rome.



  1. Report about HEPIX Roma, April 3-7. SFT Group Meeting, May 5, 2006. René Brun, CERN. http://hepix.caspur.it/spring2006/

  2. HEPIX Spring 2006 in Rome • The meeting was held at the Italian National Research Council (CNR), in a very comfortable auditorium, although the networking was less stable all week than it should have been. Initially there were hardware problems, but by mid-week the instability was caused by locally broadcasting nodes in the room which could not be traced. • Alongside the traditional HEPiX sessions there were a number of special meetings, such as the LCG GDB and the OPN working group, so the total registration count was over 120, although not everyone was present all week. • Unlike previous meetings, this one was mostly organised into topics, with a convener appointed for each. Also, as in the past two HEPiX meetings in Europe, this meeting attracted a noticeable number of representatives of LCG Tier 2 sites, especially from across Europe.

  3. [image-only slide]

  4. Highlights • Computer room cooling and air conditioning systems were mentioned in a majority of site reports. Several sites are having to build or equip new computer rooms to get around capacity restrictions in existing facilities. • As usual at recent HEPiX meetings, a number of benchmarks were presented, with very detailed overheads well worth a look if you are interested in performance or costs. • New format for HEPiX: half-day sessions on dedicated topics such as networking, performance optimization and databases, with corresponding invited speakers. • Collaboration on and re-use of HEP-developed tools was not particularly emphasized. On the other hand there were, as is often the case, a few examples of wheels being re-invented for no obvious reason. • Also some random tools CERN/IT might want to look at: Imperia for web page content management (PSI site report); Subversion, mentioned several times (notably by the DESY group) as a possible replacement for CVS for code management, seems to have arrived at a couple of HEP sites at least. • Virtualisation, Virtualisation, Virtualisation • What to do about bird flu, by Bob Cowles (security talk)

  5. Site Reports • TRIUMF, CASPUR, RAL, CERN, DESY, FZK, CNAF, JLAB, LAL, NIKHEF, PSI, RZG, SLAC, BNL • Nearly all sites are installing thousands of Opteron machines.

  6. Plenary talks • LCG status by Les • CPU technologies (Bernd Panzer) • Power consumption issues (Yannick Perret, IN2P3) • Dual-core batch nodes (Manfred Alef, FZK) • Benchmarking AMD64 and EM64T (Ian Fisk) • Networking technologies

  7. INTEL and AMD roadmaps • INTEL has now moved to 65 nm fabrication • new micro-architecture based on mobile processor development, the Merom design (Israel) • Woodcrest (Q3) claims +80% performance compared with the current 2.8 GHz parts, together with a 35% decrease in power • some focus on SSE improvements (included in the 80%) • AMD will move to 65 nm fabrication only next year • focus on virtualization and security integration • needs to catch up in the mobile processor area • currently AMD processors are about 25% more power efficient • INTEL and AMD offer a wide variety of processor types • hard to keep track of all the new code names

  8. Multi-core developments • dual-core dual-CPU systems are available right now • quad-core dual-CPU is expected at the beginning of 2007 • 8-core CPU systems are under development, but not expected to come onto the market before 2009 (http://www.multicore-association.org/) • we must cope with the change in programming paradigm: multi-threading and parallel programming (see the sketch below) • heterogeneous and dedicated multi-core systems • Cell processor system: PowerPC + 8 DSP cores • Vega 2 from Azul Systems: 24/48 cores for Java and .Net • CSX600 from ClearSpeed (PCI-X, 96 cores, 25 Gflops, 10 W) • Rumour: AMD is in negotiations with ClearSpeed to use their processor board, a revival of the co-processor!?
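To make the paradigm shift concrete, here is a minimal sketch (mine, not from the talk; written in modern C++ with std::thread, and with a hypothetical processEvents function) of splitting work across the cores of a multi-core node instead of waiting for higher clock speeds:

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-core worker: processes a slice of "event" weights.
double processEvents(const std::vector<double>& events,
                     std::size_t begin, std::size_t end) {
    return std::accumulate(events.begin() + begin, events.begin() + end, 0.0);
}

int main() {
    std::vector<double> events(1000000, 1.0);
    unsigned nCores = std::thread::hardware_concurrency();
    if (nCores == 0) nCores = 2;                 // fall back if unknown
    std::vector<std::thread> workers;
    std::vector<double> partial(nCores, 0.0);
    const std::size_t chunk = events.size() / nCores;
    for (unsigned i = 0; i < nCores; ++i) {
        std::size_t b = i * chunk;
        std::size_t e = (i + 1 == nCores) ? events.size() : b + chunk;
        // Each thread gets its own slice and its own output slot,
        // so there is no shared mutable state and no locking needed.
        workers.emplace_back([&events, &partial, i, b, e] {
            partial[i] = processEvents(events, b, e);
        });
    }
    for (auto& t : workers) t.join();
    double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    return total == static_cast<double>(events.size()) ? 0 : 1;
}
```

The point of the sketch is the structural change: a loop that used to run serially must be decomposed into independent chunks, which is exactly the work the slide warns about.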

  9. Game machines • Microsoft Xbox 360 (available, ~450 CHF) • PowerPC based, 3 cores (3.2 GHz each), 2 hardware threads per core • 512 MB memory • peak performance ~1000 GFLOPS • Sony Playstation 3 (Nov 2006) • Cell processor, PowerPC + 8 DSP cores • 512 MB memory • peak performance ~1800 GFLOPS • problems for High Energy Physics: Linux on the Xbox, the focus on floating-point calculation and graphics manipulation, and the limited memory with no upgrades possible • INTEL P4 3.0 GHz ~12 GFLOPS; ATI X1800XT graphics card ~120 GFLOPS • use the GPU as a co-processor: 32-node cluster at Stony Brook • CPU for task parallelism, GPU for data parallelism • a compiler exists and quite some code has already been ported • www.gpgpu.org
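A back-of-envelope note (my arithmetic, not from the talk) on where such peak numbers come from: peak GFLOPS is roughly clock rate times floating-point operations per cycle, so the P4 figure is consistent with 3.0 GHz × 4 single-precision SSE operations per cycle ≈ 12 GFLOPS. The much larger console figures are vendor peak numbers which, as far as I know, include the GPU, which is why the GPU-as-co-processor idea in the last bullets is attractive despite the programming-model cost.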

  10. Market trends • The combined market share of AMD + INTEL in the desktop PC, notebook PC and server areas is about 98% (21% + 77%) • On the desktop the relative share is INTEL = 18%, AMD = 82% (this is the inverse of the ratio of their respective total revenues) • In the notebook area INTEL leads with 63% • AMD's share of the server market is growing, currently 14% • The largest growth capacity is in the notebook (mobile) market

  11.-14. [image-only slides]

  15. Batch Systems • ATLAS (Laura Perini) • CMS (Stefano Belforte) • LHCb (Andrei Tsaregorodtsev) • ALICE (Federico Carminati)

  16. Databases (convener Dirk) • Introduction: Dirk described how LCG databases are kept up to date via asynchronous replication using Streams. He compared the concerns of local and central site managers and how these must be reconciled to provide an overall reliable service. • Database Service for Physics at CERN (Luca Canali) • Database Deployment at CNAF (Barbara Martelli) • Database Deployment at RAL (Gordon Brown)

  17. Optimisation and Bottlenecks (convener Wojciech Wojcik) • Performance and Bottleneck Analysis (Sverre Jarp): this is work done in the framework of the CERN openlab collaboration with industry. One of the first choices to make is which compiler gets the best performance from your chip; then, which compiler parameters have which effect? Having explained the methodology and emphasized the importance of selecting good tools, knowing the chip architecture and how your algorithm maps to it, he then presented some results obtained from the openlab collaboration with Intel. • Code/Compiler Problems (Rene Brun): threading and the importance of making programs thread-safe in order to take full advantage of multi-core chips (a small illustration follows below). • Controlling Bottlenecks with BQS (Julien Deveny) • Optimising dCache and the DPM (Greg Cowan): each Tier 2 site has unique policies and constraints, which leads to various combinations of middleware components. The University of Edinburgh chose dCache and the LCG DPM (Disk Pool Manager). Using XFS showed noticeably better performance in the DPM tests, but not in the dCache tests.
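As a small illustration of the thread-safety point (my sketch, with hypothetical function names, not code from the talk): routines that cache results in a shared static buffer silently break when called from several threads, while returning by value or explicitly guarding shared state does not.

```cpp
#include <cstdio>
#include <mutex>
#include <string>
#include <thread>

// NOT thread-safe: every caller shares one buffer, so two threads
// formatting at the same time can interleave and corrupt the result.
const char* formatTrackIdUnsafe(int id) {
    static char buf[32];
    std::snprintf(buf, sizeof(buf), "track-%d", id);
    return buf;
}

// Thread-safe: each call returns its own storage; nothing is shared.
std::string formatTrackId(int id) {
    return "track-" + std::to_string(id);
}

// Where shared state is unavoidable, serialise access explicitly.
std::mutex gLogMutex;
void logLine(const std::string& line) {
    std::lock_guard<std::mutex> lock(gLogMutex);
    std::fprintf(stderr, "%s\n", line.c_str());
}

int main() {
    std::thread a([] { logLine(formatTrackId(1)); });
    std::thread b([] { logLine(formatTrackId(2)); });
    a.join();
    b.join();
    return 0;
}
```

Legacy code full of patterns like formatTrackIdUnsafe is why making programs thread-safe is real work rather than a recompile.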

  18. Storage day (1) • Tape Technology (Don Petravick): at Fermi, tape capacity doubles every 18-24 months; LTO-3 drives currently store 400 GB, and there is no inherent tape density limit as there is for disc technology. In summary, he claims tape offers high-quality retention technology and simple, reliable units of expansion, but it does complicate Hierarchical Storage Management data handling and requires specialised skills to manage and operate. The future roadmap appears to face no fundamental engineering limitations. • Disc Technology (Martin Gasthuber): he presented various disc configurations such as FC SAN, SCSI FC and others. Important components are not only the discs themselves but also the interconnects and the disc and network controllers. Expected performance is 40 MB/s of throughput per TB of storage (see the worked example below). He listed issues to consider when acquiring discs: discs are getting just too slow, and price per GB is flattening out. He offered some predictions: no further increase in FC use, but rather Serial Attached SCSI (SAS), which will come with smaller form factors; SATA will be around for a while but there will be no real improvement in performance. He ended by describing Object Storage Devices (OSD), which he believes will arrive in the coming years: storage in a box, offering multiple protocols.
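A worked example of that rule of thumb (my arithmetic, not from the talk): a 20 TB disc pool would be expected to sustain roughly 20 × 40 MB/s = 800 MB/s in aggregate, so the interconnects and network controllers he lists as important components must be sized for the pool's capacity, not just for its disc count.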

  19. Storage day (2) • Hardware Potpourri (Andrei Maslennikov): Andrei described what he called a fat disc server contender. He compared what CERN requires for CASTOR performance with what his configuration can achieve, and he believes it could satisfy the needs of CASTOR at a cheaper price. • GPFS and StoRM (Luca dell'Agnello) • Local File Systems (Peter Kelemen): a comparison of XFS and ext3. • AFS/OSD Project (Ludovico Giammarino): this is being developed at CASPUR in conjunction with CERN and FZK. The principal goal is to improve AFS performance and scalability. • WAN Access to a Distributed File System (Hartmut Reuter) • Disk to Tape Migration Introduction (Michael Ernst) • CASTOR 2 (Sebastien Ponce): a quick overview of CASTOR 2 and how it has changed from version 1. • dCache (Patrick Fuhrmann) • HPSS (Andrei Moskalenko)

  20. Virtual Servers for Windows (Alberto Pace) • Alberto started with a demo of creating a couple of virtual systems on his desktop (one Windows, one Linux using SLC) and, while they were being created, he began the presentation with a history of how virtual computers have long been a dream of computer scientists. • As the Intel X86 architecture is becoming by far the most commonly found system in our environments, running virtual X86 systems on real X86 systems is more attractive than previous implementations of virtual computers. • At CERN there is an ever-increasing number of requests for dedicated servers running individual applications or services, but limitations of space, management overhead and the fact that many of these servers are under-used make virtualisation an interesting option. • The CERN team has built a number of different configurations of Windows 2003-based servers and Linux (both SLC3 and SLC4) virtual systems which can be called up on demand. The scheme uses the Microsoft Virtual Hosting Server. The user can configure the hardware down to the size of memory, the presence of a floppy or CD/DVD, the number of discs, etc., and can request use of the server for a finite time or long-term; more options will be offered in the future.

  21. Why Virtual servers • More and more requests for dedicated servers in the CERN computer centre • Excellent network connectivity, to the internet and to the CERN backbone (10 Gbit/s) • Uninterruptible power supply • 24x365 monitoring with operator presence • Daily backup with fast tape drives • Hardware maintenance, transparent for the “customer” • Operating system maintenance, patches, security scans • The “customer” focuses only on “his application” • The customer is not willing to share his server with others, but is ready to pay a lot of $$, €€, CHF • Framework for this server hosting service: http://cern.ch/Win/Help/?kbid=251010

  22. However, after an inside look … • Installing and maintaining custom servers is time consuming … • Lots of management overhead • Space in the computer centre is a scarce resource • Several of these servers are under-used • Hardly more than 2-3% CPU usage • An excellent candidate for virtualization

  23. Goal of virtualization • Clear separation of hardware management from server (software) management • Could even be done by independent teams • Hardware management • Ensures enough server hardware is globally available to satisfy the global CPU + storage demand • Manages a large pool of identical machines • Hardware maintenance • Server (software) management • Manages server configuration • Allocates server images to machines in the pool • Plenty of optimization possible • Automatic reallocation to different HW according to past performance (a toy sketch of such a placement decision follows below) • Little overhead • Emulation of a PC on a real PC is very efficient
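A toy sketch (mine, not CERN's implementation; all names hypothetical) of the kind of placement decision this separation enables: the server-management layer simply picks a host from the hardware pool by observed past load, with no knowledge of how that hardware is bought or maintained.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical view of one machine in the hardware pool.
struct Host {
    std::string name;
    double avgCpuLoad;   // observed past utilisation, 0.0 - 1.0
    int imagesHosted;    // virtual server images currently placed here
};

// Place a new server image on the least-loaded host in the pool.
// Automatic reallocation after poor past performance is the same
// decision, simply re-run against fresh load measurements.
Host* chooseHost(std::vector<Host>& pool) {
    auto it = std::min_element(pool.begin(), pool.end(),
        [](const Host& a, const Host& b) {
            return a.avgCpuLoad < b.avgCpuLoad;
        });
    if (it == pool.end()) return nullptr;   // empty pool
    ++it->imagesHosted;
    return &*it;
}

int main() {
    std::vector<Host> pool = {
        {"vmhost01", 0.03, 4},   // the 2-3% CPU figures from slide 22
        {"vmhost02", 0.45, 9},
        {"vmhost03", 0.10, 6},
    };
    return chooseHost(pool) != nullptr ? 0 : 1;
}
```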

  24. Server on Demand • Choose from a set of “predefined” images • Windows Server 2003 • Windows Server 2003 + IIS + SOAP + streaming • Windows Server 2003 + Terminal Server services • … • Scientific Linux CERN 3 or 4 • … • Takes resources from the pool of available HW • Multiple different OSes can be hosted on the same box • Available within 10 minutes • Before: between one week and one month • Cost: much cheaper, especially in manpower • Performance: unnoticeable difference

  25. What’s next? • We can expect requests for more “server types” • Various combinations of OS and applications • We can expect requests for custom server types • The user creates and manages his own server images • Future server on demand • “I need 20 servers with this image for one month” • “I need an image for this server replicated 10 times” • “I need more CPU / memory for my server” • “I do not need my server for 2 months, give me an image I can reuse later” • “I need a test environment, OS version n+1, to which I can migrate my current production services” • “I need 10 Macintosh instances …” • …

  26. Conclusion • Server virtualization is a strategic direction for (Windows) server management at CERN • HW and SW management can be independent • We can expect consequences also for traditional batch systems • Instead of allocating CPU time for jobs submitted to a rigid OS configuration, one could allocate bare “virtual PC time” • Users would submit a “PC image hosting the job”. The farm becomes independent of the OS, with fewer security implications (for the farm management) and unprecedented flexibility for users

  27. Scientific Linux • Status and Plans (Troy Dawson): current usage of SL is at least 16,000 installations (SL3 and SL4 combined). Fermilab itself is standardising on SLF 4.2 and trying to phase out all the unsupported distributions (those before SL3). They are gearing up for SL5, although they are bound by the Red Hat release date for RHEL 5, and they realise it will not arrive in time to be packaged and deployed before LHC startup. He asked if there is a long-term need for Itanium releases or any other architecture; the answer, at least from this audience, was no. • SLC (Jarek Polok): 2100 individual SLC3 installations, 3559 centrally-managed installations and 2400 SLC3 installations outside CERN. SLC 4.3 is just coming into use after its official release at the beginning of April. As explained above, the projected release date of RHEL 5 (only next year) means that SLC4 will be the officially supported release for LHC startup. It is planned to start migrating the central clusters to it in September this year.

  28. Security (Bob Cowles) • Bob covered a range of topics, starting with the dangers and risks of Skype, especially of becoming a Supernode when connected to a powerful network; apparently this does not happen to systems behind NAT boxes. Skype is banned at CERN and monitored at SLAC. • Turning to topical matters: service providers should be concerned about the risks of a bird flu epidemic. If people start to get seriously infected and have to stay home, how do you run the operation, and what happens if they use infected home PCs to log in? • He displayed the list of some 30 passwords he had sniffed during the week from among the HEPiX attendees. • He listed 10 tips to improve security (see overheads).

  29. Passwords!! (as sniffed, by protocol) • POP3: kastela3, Romania2, ecdMJee4dD, baum2kid, ghbghb, 1@roma06, ubc789, 84relax, 4q63wbg, light2484, tDsfCxJs • IMAP: Dadoes63, cal1pat0 dnow12i, Bruck5BD * hoFK87, 1etsg0, 21 filipch ckmckmir, obheyto, authum1808 R2gsumb0, rugbybear v3sm9r-EGEE, k7u0na Dad123Red345, 123456 Tuesday, ippin, nk0 • SMTP: lworib4u, iosara44, tuesday, ha66il33 • ICQ: gg147231, lalamisi xircom12, power0 123stell, B7A8 • FTP

  30. Next Meeting • Next meeting at Jefferson Lab, 9th October, followed by DESY in spring 2007.
