CREAM CE Certification and Testing
Di Qing, SA3, Academia Sinica & CERN
Geneva, 2008
CREAM CE certification and testing - Qing

Introduction
• Goals
  • Verify installation and configuration
  • Pass normal certification procedures
  • 5 days of unattended, continuous stress testing
    • 50 users
    • Less than 0.5% failures
• Patches
  • 1755, CREAM server
  • 1790, CREAM client
• Testing started at the end of May
• The test scripts were provided by INFN
• One wiki page set up for test results
  • https://twiki.cern.ch/twiki/bin/view/EGEE/CREAMTest
• CE checklist
  • https://twiki.cern.ch/twiki/bin/view/EGEE/CECheckList

Testbed setup
• One separate Torque server
  • BLParser server installed there by hand
• 11 WNs
  • 110 virtual CPUs
• One UI
• One physical CREAM CE for stress testing
  • 4 cores at 2.2 GHz
  • 4 GB memory
• One CREAM CE for installation and configuration tests

Tests performed
• Installation and configuration
  • Followed the formal installation procedure, checking package dependencies in particular
  • Tested different installation scenarios
  • Configured with YAIM
• Basic functionality
  • Submission through the CLI, job status checks, proxy delegation, etc.
• Stress testing
  • Submission through the CLI
  • 9800 jobs per day with 49 users
  • Jobs accumulated in queues as fast as possible
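
To put the stress-test load in perspective, the quoted figures (9800 jobs per day, 49 users) can be broken down into average rates. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the per-hour and per-minute values assume submissions spread evenly over 24 hours, which the slides do not state (jobs were in fact queued as fast as possible).

```python
# Figures quoted on the slide.
JOBS_PER_DAY = 9800
USERS = 49


def submission_rates(jobs_per_day: int, users: int) -> dict:
    """Break a daily job total into average per-user and per-time rates.

    Assumption (not from the slides): submissions are spread evenly
    over a 24-hour day, so these are averages, not peak rates.
    """
    return {
        "jobs_per_user_per_day": jobs_per_day / users,
        "jobs_per_hour": jobs_per_day / 24,
        "jobs_per_minute": jobs_per_day / (24 * 60),
    }


rates = submission_rates(JOBS_PER_DAY, USERS)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With these inputs each user accounts for 200 jobs per day, and the CE sees on average slightly fewer than 7 submissions per minute.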

Test results
• Basic functionality tests passed
• Missing dependencies on some packages
  • tomcat, mysql-server and mysql-connector-java
• Configuration issues
  • Configuration of the BLParser server and blah
  • LCAS and LCMAPS generate too many logs
  • Issue with the upgrade of Glexec
    • fixed
• Other issues
  • Tomcat needs to be restarted if CAs are updated
    • problems in trustmanager
  • Too many files left under the home directories of pool accounts
  • Authentication fails with a new type of VO attribute in the VOMS proxy
    • fixed

Stress test results
• System load
  • CPU load is quite low, below 1 most of the time; it reaches 9 only when massive numbers of jobs are submitted to the CREAM CE
  • Memory usage is low too, less than 2 GB
  • Disk usage can increase
    • Can be solved by purging jobs and limiting the log level of services
• Job submission
  • Job submission to the CREAM CE can fail
    • Happened in the last two days of testing, when more than 3% of jobs could not be submitted
• Job success rate
  • More than 99.5% of jobs succeeded on 6 of the 10 test days
  • The worst failure rate was about 9.5%
  • Most failures give only the error message "blah error"
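
The acceptance goal (5 consecutive unattended days with less than 0.5% failures) can be checked mechanically from daily job counts. The sketch below is one possible way to do it, not the procedure used in the certification. The function names are hypothetical, and the daily counts are invented purely for illustration; they are not the measured results.

```python
THRESHOLD = 0.005   # 0.5% failure limit, from the stated goals
REQUIRED_DAYS = 5   # consecutive unattended days, from the stated goals


def day_passes(submitted: int, failed: int, threshold: float = THRESHOLD) -> bool:
    """A day passes if jobs were submitted and the failure rate is below the limit."""
    return submitted > 0 and failed / submitted < threshold


def longest_passing_run(daily_counts, threshold: float = THRESHOLD) -> int:
    """Return the longest streak of consecutive passing days."""
    best = current = 0
    for submitted, failed in daily_counts:
        if day_passes(submitted, failed, threshold):
            current += 1
            best = max(best, current)
        else:
            current = 0  # a failing day breaks the streak
    return best


# HYPOTHETICAL (submitted, failed) counts per day, for illustration only.
sample = [
    (9800, 20), (9800, 30), (9800, 120), (9800, 10),
    (9800, 40), (9800, 45), (9800, 931),
]

run = longest_passing_run(sample)
print(f"longest passing run: {run} days, goal met: {run >= REQUIRED_DAYS}")
```

Counting the longest consecutive streak, rather than the total number of good days, matters here: 6 good days out of 10 do not meet the goal unless at least 5 of them are in a row.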

Conclusion
• More work is still needed to reach production quality
  • We never reached 5 days of unattended, continuous stress testing with a failure rate below 0.5% on the certification testbed
• Can now be released to PPS for users to test and gain some experience, once the installation and configuration issues are sorted out
  • In principle, it can be done today
• The tests have been done only with the CREAM CLI
  • Tests through WMS will be done when ICE is ready
  • A recipe on how to set up a WMS plus ICE is available
• A new CREAM CE patch is in preparation for certification