Development of test suites for the certification of EGEE-II Grid middleware
• Task 2: Development of testing procedures focused on special details of various software features
• Task 4: Creation of a specialized testbed for developing test suites
• Task 5: Preparation of intermediate and final reports
PNPI – Yu. Ryabov, N. Klopov
Plans for the second year
1. Development of stress and performance tests for WMS and CE, in accordance with requests from developers and/or the certification team
2. Installation of the new gLite 3.1 middleware on the testbed
Requirements for the test
• Submit a large number of jobs simultaneously.
• Submit jobs from one or many users.
• Monitor the load on the CE and WMS during testing.
• Monitor job status (passage through the system's components) on the CE and WMS during testing.
• Store status information for all submitted jobs.
• Allow express visual analysis of the results.
Functional schema of the test
[Diagram. Components: UI, Job submitter, Data collector, WMS, CE, Monitoring (on the CE and on the WMS). Labelled flows: parametric job, monitoring data, jobs logging info.]
Job submission
The job submission program (several scripts) has the following input parameters:
• u: the number of users
• x: path to the directory with the users' proxy certificates (x1: path to a single user proxy certificate)
• n: the number of subjobs from each user
• s: time interval between job status requests
• t: maximum time of the test execution
• a: the time a subjob will execute on the WN
• l: path to the logfile
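The parameter list above can be sketched as a command-line parser. This is an illustrative sketch only, not the actual PNPI scripts: the single-letter flags come from the slide, while the `dest` names, the defaults, the use of seconds as the time unit, and the helper `total_subjobs` are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the submitter options listed on the slide.

    Only the single-letter flags come from the source; the destination
    names, defaults, and the choice of seconds are assumptions.
    """
    p = argparse.ArgumentParser(description="Job submitter options (sketch)")
    p.add_argument("-u", dest="users", type=int, default=1,
                   help="number of users")
    p.add_argument("-x", dest="proxy_dir",
                   help="directory with the users' proxy certificates")
    p.add_argument("-n", dest="subjobs", type=int, default=1,
                   help="number of subjobs from each user")
    p.add_argument("-s", dest="status_interval", type=int, default=60,
                   help="interval between job status requests (assumed seconds)")
    p.add_argument("-t", dest="max_time", type=int, default=3600,
                   help="maximum test execution time (assumed seconds)")
    p.add_argument("-a", dest="subjob_runtime", type=int, default=0,
                   help="time a subjob runs on the WN (assumed seconds)")
    p.add_argument("-l", dest="logfile", help="path to the logfile")
    return p


def total_subjobs(args):
    """Total number of subjobs the test will submit: users x subjobs per user."""
    return args.users * args.subjobs
```

For example, `build_parser().parse_args(["-u", "3", "-n", "1000"])` describes the three-user PNPI configuration of 1000 jobs per user, i.e. 3000 subjobs in total.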
Monitoring
These scripts run on the CE and WMS; they collect and save information about the load average and the names of system processes. The script runs with the following parameters:
• t: poll time (polling interval)
• l: request the load average
• p: request the process names
Load average: approximately the number of active processes (as defined in UNIX).
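A minimal sketch of such a load-average poller, assuming a Unix host. It is not the original monitoring script: the function names and the record layout are hypothetical. `os.getloadavg()` is the standard-library call that returns the 1-, 5- and 15-minute load averages, and `/proc/loadavg` is the Linux file with the same values.

```python
import os
import time


def parse_loadavg(text):
    """Parse the first three fields of a /proc/loadavg-style line."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def sample_load(poll_seconds, samples, read=os.getloadavg, sleep=time.sleep):
    """Record (timestamp, 1-minute load average) pairs.

    `read` and `sleep` are injectable so the loop can be exercised
    without waiting or depending on the real system load.
    """
    records = []
    for i in range(samples):
        records.append((time.time(), read()[0]))
        if i < samples - 1:
            sleep(poll_seconds)
    return records
```

Calling `sample_load(10, 6)` would take six readings ten seconds apart; the resulting list can be written to a file for the data collector to copy later.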
Data collector
The Data collector script is executed after all jobs have finished and does the following:
• copies the monitoring data from the WMS and CE;
• requests the event time information for each subjob, using the glite-wms-job-logging-info command;
• performs preliminary data processing (formatting).
Parametric job
Parametric job functionality was used to solve the problem of submitting a large number of jobs to the CE simultaneously. A parametric job is a set of jobs (subjobs) with identical descriptions apart from the values of the parametric attributes.
    JobType = "Parametric";
    Executable = "tst.sh";
    InputSandbox = {"tst.sh", "input_PARAM_.txt"};
    StdOutput = "out_PARAM_.txt";
    StdError = "err_PARAM_.txt";
    OutputSandbox = {"out_PARAM_.txt", "err_PARAM_.txt"};
    Parameters = 1000;
    ParameterStart = 0;
    ParameterStep = 1;
The parametric attributes take values from 0 to 999. The WMS creates an individual subjob for each parameter value, so N = (Parameters - ParameterStart) / ParameterStep subjobs are created (here, 1000). Both the main parametric job and its subjobs have unique IDs.
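The expansion of a parametric job can be illustrated with a short sketch. This mimics what the slide describes, not the WMS implementation itself; the function names are hypothetical, and the sketch assumes that Parameters acts as an exclusive upper bound, which matches the slide's formula whenever the range divides evenly by the step.

```python
def parameter_values(parameters, start=0, step=1):
    """Values taken by the parametric attribute, one per subjob."""
    return list(range(start, parameters, step))


def expand(template, value):
    """Substitute one parameter value for the _PARAM_ placeholder."""
    return template.replace("_PARAM_", str(value))


values = parameter_values(1000, start=0, step=1)
# One input file name per subjob, derived from the JDL template.
input_files = [expand("input_PARAM_.txt", v) for v in values]
```

With the JDL above this yields 1000 subjobs, with parameter values running from 0 to 999.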
Testbed
gLite 3.1 middleware was installed on the testbed.
[Diagram. Nodes: UI, WMS+LB+BDII, CE, and four WNs.]
Test usage
Measurement of the "load average" as a function of time under the following condition: N jobs from each of K users.
Test usage at PNPI: 1000 jobs from each user (1 user, 3 users) for the "old" and "new" versions of gLite:
• Old: the version used until January 2008
• New (with marshal patches): the version used since January 2008
The new version with marshal patches was released to production on 10 April 2008 (gLite update 23). The marshal patches were developed by A. Kiryanov (PNPI).
Marshal patches for LCG CE
Aim: improve the behavior of LCG CEs under load by regulating requests from job managers (hence the term ‘marshal’). The patches:
• Eliminate the need to recompile heavy Perl code on every job manager invocation
• Use memory-persistent daemons to handle the requests
• Control the number of simultaneously running job manager queries
• Decrease the load on the file system and batch system
• Prevent CE overload by WMSes
• Decrease system losses
• Make jobs complete faster, which is especially visible with a large number of short jobs
Express visual analysis (Web viewer)
• Each job passes through the different WMS components; the corresponding events are generated and stored in LB. Examples of these events: "RegJob, NetworkServer", "Match, WorkloadManager", ..., "Done, LogMonitor".
• This makes it possible to evaluate the performance of the WMS components.
• The Web viewer was developed to provide a visual representation of the event timestamps for jobs running through the different components.
This viewer provides the following functions:
• choose the event type by whose timestamp the jobs will be sorted;
• choose the data file with the logging info data;
• plot, for each job, the event time since job registration in the WMS;
• choose an additional event type (shown on the same graph);
• get and store the graph data as a text file for future analysis;
• get the ID and logging info data for the subjobs that are missing the chosen events;
• view the monitoring data.
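The two core computations of such a viewer, event time since registration and detection of subjobs missing an event, might look like the sketch below. It is not the viewer's actual code: the `(job_id, event, timestamp)` record layout and the function names are assumptions, and the event source (for example NetworkServer) is omitted for brevity.

```python
def elapsed_since_registration(records, event, reg_event="RegJob"):
    """Return (job_id, seconds from registration to `event`), sorted by time.

    `records` is an iterable of (job_id, event_name, timestamp) tuples;
    only the first occurrence of each event per job is used.
    """
    registered = {}
    seen = {}
    for job_id, name, ts in records:
        if name == reg_event:
            registered.setdefault(job_id, ts)
        elif name == event:
            seen.setdefault(job_id, ts)
    pairs = [(job_id, seen[job_id] - registered[job_id])
             for job_id in seen if job_id in registered]
    return sorted(pairs, key=lambda pair: pair[1])


def jobs_missing_event(records, event):
    """IDs of jobs that appear in the records but never logged `event`."""
    all_jobs = {job_id for job_id, _, _ in records}
    with_event = {job_id for job_id, name, _ in records if name == event}
    return sorted(all_jobs - with_event)
```

The sorted pairs are what would be plotted on the graph; the second function corresponds to listing the subjobs that are missing the chosen event.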
Express visual analysis
[Graph. Events shown: Transfer (source: LogMonitor, destination: LRMS) and Accepted (source: LogMonitor).]
Express visual analysis
The monitoring data can also be viewed.
Summary
• The testbed was created with gLite 3.1.
• A complex test was developed which provides the following:
  • Submission of a large number of jobs from many users
  • Load average monitoring on the WMS and CE
  • Acquisition of the test results
• The developed test has been run on concrete sets of input parameters.
• An HTML viewer was developed for the presentation of test results.
Summary (first year of the grant)
• A set of WMS tests (functionality checks) was developed at the request of the gLite certification team for the following job types: parametric, interactive, checkpointable, partitionable.
• A long and complex JDL stress test was developed (for estimating the critical file size).
• Some of the tests were included in the certification SAM framework.
• 5 bugs were found and submitted to Savannah.
Conclusion
• Task 2 (PNPI): done
• Task 4 (PNPI): done
• Task 5: under preparation (together with collaborating teams)