
DCS Test campaign (Talk based on presentation to TB in December 2004)


Presentation Transcript


1. DCS Test Campaign (talk based on a presentation to the TB in December 2004)
Peter Chochula, DCS Workshop, Geneva, February 28, 2005

2. Purpose of the tests
• The DCS is requested to provide information on the system scale
• Tests of hardware compatibility (e.g. PCI controllers installed in PCI risers) and component verification before the mass purchase
• Test procedures for the components delivered by the sub-detectors
• Test procedures for hardware components before installation

3. The system scale
• Two main activity areas of the DCS tests:
  • Performance and stability tests
  • Resource consumption (implications for the system scale and hardware requirements)
• The tests also cover the DCS core computing (database, domain controllers, RAS…), system management and security

4. Performance tests with impact on DCS scale planning
• PVSS II is a system which can be distributed and/or scattered across many CPUs
• Two extreme approaches are possible:
  • Group all processes on one machine: even if this configuration runs stably for some systems, problems could appear when a peak load occurs
  • Dedicate one machine per task (LV, HV…): computer resources would surely be wasted
• Defining the optimal balance between performance and system size requires tests with realistic hardware and data (a rough sizing sketch follows below)
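
A back-of-the-envelope sizing sketch in Python (not from the talk): it contrasts the two extremes using the ~40000 DPE-per-machine capacity quoted from the SUP results later in this talk; the per-subsystem DPE counts are invented placeholders.

```python
# Rough sizing sketch contrasting the two extreme deployment strategies.
# PER_MACHINE_CAPACITY comes from the SUP results quoted on slide 7;
# the per-subsystem DPE counts below are hypothetical.

PER_MACHINE_CAPACITY = 40_000  # DPEs one PVSS system handled in the SUP tests

subsystems = {"HV": 12_000, "LV": 18_000, "FERO": 25_000, "Cooling": 3_000}

# Extreme 1: group all processes on one machine (risky at peak load).
total = sum(subsystems.values())
state = "over" if total > PER_MACHINE_CAPACITY else "within"
print(f"one machine: {total} DPEs ({state} capacity)")

# Extreme 2: one machine per task (resources wasted on small subsystems).
for name, dpes in subsystems.items():
    print(f"{name:8s}: {dpes:6d} DPEs -> "
          f"{dpes / PER_MACHINE_CAPACITY:4.0%} of a machine used")
```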

5. Who is doing what?
• The ACC follows the tests performed by other groups and provides feedback to the sub-detectors
• The ACC performs tests which complement the work of JCOP and other groups. This includes:
  • Tests not performed by external groups
  • Tests for which the external planning is incompatible with our schedule (e.g. OS management)
  • ALICE-specific tests (FERO)

6. PVSS performance tests in the frame of the SUP
• Communication between 130 PVSS systems
• The influence of heavy load on PVSS has been studied:
  • Alarm absorption, display, cancellation and acknowledgment
  • Data archival
  • Trending performance
  • Influence of heavy network traffic
[Figure: the SUP test hierarchy; data generated by the leaf nodes was transported to the top-level machines]

7. (Some) SUP results
• 130 systems (with ~5 million DPEs defined) were interconnected successfully
• Connection of 100 UIs to a project generating 1000 changes/s has been demonstrated
  • These tests were later repeated by our team in order to understand the remote-access mechanism
• Performance tests on realistic systems:
  • 40000 DPEs per machine, equivalent to 5 CAEN crates
  • 1000 alerts generated on a leaf machine in a burst lasting 0.18 s, repeated after a 1 s delay (the burst pattern is sketched below)
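
A minimal sketch of the alert-burst pattern described above, in plain Python rather than PVSS CTRL; emit_alert() is a stub standing in for whatever actually raised the alerts on the leaf node.

```python
import time

def emit_alert(i: int) -> None:
    """Stub for the real alert-raising call on the leaf node."""
    pass

def burst_load(n_alerts: int = 1000, burst_len: float = 0.18,
               delay: float = 1.0, repeats: int = 5) -> None:
    """Raise n_alerts spread over burst_len seconds, wait, repeat."""
    gap = burst_len / n_alerts       # ~0.18 ms between alerts within a burst
    for _ in range(repeats):
        for i in range(n_alerts):
            emit_alert(i)
            time.sleep(gap)
        time.sleep(delay)            # 1 s quiet period between bursts

burst_load()
```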

8. Alert absorption by the PVSS system
• The PVSS system was able to absorb all alarms generated on a leaf node (effective rates are worked out below):
  • Display of 5000 'came' alerts: 26 s
  • Cancellation of 5000 alerts: 45 s
  • Acknowledgment of 10000 alerts: 2 min 20 s
• ETM is implementing a new alarm-handling mechanism which includes alarm filtering and summary alarms and will provide higher performance
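
Expressed as effective rates (simple arithmetic on the figures above; 2 min 20 s = 140 s):

```python
# Effective alarm-handling rates implied by the measurements above.
for action, n, seconds in [("display ('came')", 5_000, 26),
                           ("cancellation",     5_000, 45),
                           ("acknowledgment",  10_000, 140)]:
    print(f"{action:17s}: ~{n / seconds:3.0f} alerts/s")
```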

9.
• The archiving was fully efficient during these tests (no data loss)
• We would like to see the performance with the RDB archive
• Trending performance depends on the queue settings. The evasive PVSS action (protecting the PVSS system from overloading) can disturb the trend, but the data can be recovered from the archive once the avalanche is gone.
• Alert avalanches are memory hungry (>200 B per simple DP; a rough estimate follows below)
• The ACC participated in additional tests (December 2004) in which the network was flooded
  • No performance drop was observed in the above-mentioned tests
  • A report was published (Paul)
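
To put the >200 B per simple datapoint figure in perspective, a rough memory estimate for avalanches of various sizes (the avalanche sizes themselves are hypothetical):

```python
BYTES_PER_DP = 200  # lower bound quoted above for a simple DP in an avalanche

for avalanche in (10_000, 100_000, 1_000_000):  # hypothetical avalanche sizes
    mib = avalanche * BYTES_PER_DP / 2**20
    print(f"{avalanche:9,d} alerts -> more than {mib:6.1f} MiB")
```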

10. Test setup installed in the DCS lab
• Prototypes of the DCS core computers
• Worker nodes (rental)
[Diagram: the DCS test setup; the CERN network connects via a terminal server and a router to a domain controller, two database servers and ten worker nodes]

11. DCS core computers
• In order to operate the DCS, the following core computers will be needed:
  • Windows DC
  • Application gateway for remote access
  • Database server(s) with mass storage
  • DCS infrastructure node (prototype available)
  • Central operator's computer
• Prototypes for all components are available; the database servers need further testing

12. The DCS lab
[Photos: the frontend prototype, the pre-installation servers and the backend prototype]

13. Backend systems
[Photos: the pre-installation servers and the backend prototype]

14. Complementary tests performed by the ACC: remote access
• The tests performed by the SUP indicated that a large number of UIs can be connected to a running project
  • No interference was observed up to 100 UIs; the tests did not go further
• Our tests tried to simulate external access to a heavily loaded system using W2k3 Terminal Services and to observe the effects on:
  • The terminal server
  • The running project
• Remark: the terminal server is the technology recommended by CERN security. The tests performed by ALICE were presented to JCOP.

15. Computer infrastructure for the remote-access tests
[Diagram: remote users on Windows XP Pro machines reach a Windows Server 2003 terminal server via the CERN network; a router connects it to the DCS private network (192.168.39.0), where the PVSS master project runs]

16. Computer loads for a large number of remote clients
• The master project contained 50000 datapoints and generated 3000 updates/s
• Each remote client displayed 50 values at a time (the per-client update arithmetic is sketched below)
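
Under a uniform-update assumption (which real detector data need not satisfy), the per-client load implied by these numbers is modest:

```python
# If 3000 updates/s are spread uniformly over 50000 datapoints, a client
# displaying 50 of them sees on average:
total_dps, update_rate, displayed = 50_000, 3_000, 50
per_client = update_rate * displayed / total_dps
print(f"~{per_client:.1f} visible updates/s per client")  # ~3.0
```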

17. Conclusions on the remote-access tests
• The prototype performed well
• Memory consumption: ~35 MB per 'heavy' session (a sessions-per-server estimate follows below)
• CPU usage was reasonable; one Xeon CPU running at 3 GHz can handle the load
• Stability was tested over weeks
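
At ~35 MB per session, memory rather than CPU is likely to bound the number of concurrent sessions; the server RAM sizes below are assumptions, and OS overhead is ignored:

```python
SESSION_MB = 35           # measured per 'heavy' terminal-server session
for ram_gb in (2, 4, 8):  # hypothetical terminal-server RAM sizes
    print(f"{ram_gb} GB RAM -> ~{ram_gb * 1024 // SESSION_MB} sessions")
```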

18. Additional possible bottlenecks to be tested
• The SUP tests focused only on the performance of the PVSS system
• What is different in a real DCS configuration? Please remember the discussion we had this morning (a toy queuing model follows below).
[Diagram: a UI, EM, CM and VA above several OPC client / OPC server / hardware chains, annotated with the open questions: peak load? data queuing?]
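
A toy model of the data-queuing question (pure Python; the burst sizes and drain rate are invented, not measured): a producer bursts values into a queue while the consumer drains at a fixed rate, so the backlog outlives the burst.

```python
from collections import deque

queue = deque()
DRAIN_PER_TICK = 300                    # hypothetical consumer throughput
burst_schedule = [1000, 1000, 0, 0, 0]  # hypothetical peak, then quiet ticks

for tick, produced in enumerate(burst_schedule):
    queue.extend(range(produced))                     # peak load arrives
    for _ in range(min(DRAIN_PER_TICK, len(queue))):  # consumer drains
        queue.popleft()
    print(f"tick {tick}: backlog {len(queue)}")       # backlog persists
```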

19. OPC tests
• Results from the CAEN OPC tests performed by HMPID and JCOP were made available in December 2004
• The test setup covered the full controls hierarchy
• Time to switch 200 channels: 8 s (rate arithmetic below)
• Recommendation: a maximum of 4 fully equipped crates (~800 channels) per computer
• The tests provided a very useful comparison between real hardware and software simulators
• See Giacinto's talk for details
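
In rate terms (simple arithmetic; the extrapolation to 800 channels assumes linear scaling, which the tests did not necessarily demonstrate):

```python
channels, seconds = 200, 8
rate = channels / seconds   # 25 channels/s
full_load = 800             # 4 fully equipped crates, as recommended above
print(f"{rate:.0f} channels/s; ~{full_load / rate:.0f} s to switch "
      f"a fully loaded 4-crate computer (if linear)")
```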

20. What happens next
• We are preparing an inventory of the hardware (numbers of channels, crates, etc.; a sketch of such a record follows below)
  • The data are available on the DCS page; the detectors are regularly requested to update the information
• More tests, not only of performance, are scheduled for early 2005
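
A minimal sketch of the kind of per-detector inventory record being collected (the field names and example numbers are invented; the real data live on the DCS page):

```python
from dataclasses import dataclass

@dataclass
class DetectorInventory:
    """Hypothetical per-detector hardware inventory record."""
    detector: str
    hv_channels: int
    lv_channels: int
    crates: int

# Invented example entry, for illustration only.
spd = DetectorInventory(detector="SPD", hv_channels=120,
                        lv_channels=240, crates=2)
print(spd)
```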

21. Additional activities in the lab
• Component compatibility tests:
  • Will the PCI cards work together?
  • Are they compatible with the PCI riser cards and rack-mounted chassis?
  • How do they perform in a rack?
• Obvious but non-trivial questions:
  • How do we arrange the components in the racks?
  • What about cables, air flow, …?
• And of course: preparing for your software

22. Component compatibility tests
[Photos: a 2U PCI riser, a PCI CAN controller and an NI MXI-2 VME master]

23. Trying to arrange the components in the racks…

24. Test schedule (as presented to the TB)
[Gantt chart: timeline from 12/04 to 6/05 covering ConfDB tests, ArchDB tests, full system configuration, OPC stability, alarms, the ArchDB connection to IT, the FERO (SPD prototype), the mixed system environment, patch deployment and network security tests]
• An additional delay (~2 months) with respect to this planning has accumulated due to the late delivery of computers, but we need to catch up

25.
• Input from the sub-detectors is essential
• Most unexpected problems are typically discovered only during operation
  • This experience cannot be obtained in the lab
• The pre-installation is a very important period for the DCS
• Efficient tests can be performed only with realistic hardware
  • Components are missing
• We do not have enough manpower to perform tests for all possible combinations of hardware at CERN
