PROOF on Condor and Dataset Based File Redistribution. Mengmeng Chen, Annabelle Leung, Miron Livny, Bruce Mellado, Neng Xu and Sau Lan Wu University of Wisconsin-Madison PROOF WORKSHOP 2007. Special thanks to Gerardo Ganis, Jan Iwaszkiewicz and Fons Rademakers(PROOF)
PROOF on Condor and Dataset Based File Redistribution
Mengmeng Chen, Annabelle Leung, Miron Livny, Bruce Mellado, Neng Xu and Sau Lan Wu
University of Wisconsin-Madison
PROOF WORKSHOP 2007
Special thanks to Gerardo Ganis, Jan Iwaszkiewicz and Fons Rademakers (PROOF), Andy Hanushevsky and Fabrizio Furano (Xrootd), and the Condor Team.
Neng Xu, University of Wisconsin-Madison
Why PROOF and Computing On Demand (COD)?
• COD can claim CPUs immediately (in less than 2 seconds).
• Running jobs are SUSPENDED and resume after the COD session finishes.
• The PROOF system can easily be integrated into the ATLAS production system.
• COD provides good authentication protection:
  - User ID
  - Kerberos
  - GSI authentication
The Old PROOF+COD Model (Developed in 2003 by PROOF and Condor)
• Xrootd was not used for storage at that time.
• The user submits COD requests to claim the machines.
• The user needs to start the PROOF service on the claimed machines.
• The user submits the PROOF job.
• Jobs read data from a central storage area (NFS, dCache, CASTOR).
The Old PROOF+COD Model (Developed in 2003)
[Diagram. Labels: Condor pool; normal production Condor jobs; Condor Master; COD requests; PROOF requests; PROOF jobs; PROOF Master; storage pool of centralized storage servers (NFS, Xrootd, dCache, CASTOR); heavy I/O load.]
The Disadvantages of The Old Model
• The user has to decide how many machines to claim.
• Starting the PROOF service on the claimed machines is complicated.
• The user has to release the machines after the PROOF job finishes.
• Reading data from a central storage area causes network traffic and is not very efficient.
• COD doesn’t affect the Condor priority system.
• There is no central control of the PROOF jobs. The number of COD requests and PROOF jobs can grow out of control if there are too many users.
The Newer PROOF+COD Model
[Diagram. Labels: Condor pool; normal production Condor jobs; Condor Master; COD requests; PROOF requests; PROOF jobs; PROOF/Xrootd Redirector; storage pool of centralized storage servers (NFS, Xrootd, dCache, CASTOR); heavy I/O load.]
The Newer PROOF+COD Model
Pros:
• The number of COD requests and PROOF jobs can be controlled by the PROOF redirector.
• The user doesn’t need to decide how many machines to use.
• The user doesn’t need to release the machines after the PROOF job finishes.
Cons:
• COD claims the machines in sequence.
• Reading data from a central storage area causes network traffic and is not very efficient.
• PROOF was not yet working with xrootd at that time.
• COD doesn’t affect the Condor priority system.
The Developing PROOF+COD Model
[Diagram. Labels: Condor + Xrootd + PROOF pool; normal production Condor jobs; Condor Master; COD requests; PROOF requests; PROOF jobs; PROOF Master; the local storage on each machine.]
The Developing PROOF+COD Model
• The user only submits PROOF jobs in the normal way.
• The PROOF redirector decides how many COD requests to send for each PROOF job.
• The PROOF redirector also decides which nodes (slots) the PROOF job should go to, because it knows the location of the files.
• The PROOF redirector also considers the status of the whole Condor pool: it tries to use free machines first.
• The PROOF redirector can also take the Condor priority system into consideration. (Hopefully, we can make COD affect the Condor priority system.)
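The node selection described above might look roughly like the following Python sketch. It is an illustration only, not the redirector's actual logic: the function name `choose_nodes`, the in-memory dictionaries, and the exact ranking rule (free nodes first, then nodes holding more of the job's files) are all assumptions.

```python
def choose_nodes(file_locations, node_busy, max_workers):
    """Pick up to max_workers nodes for a PROOF job (hypothetical helper).

    file_locations: dict mapping input file name -> node holding that file.
    node_busy: dict mapping node -> True if it is running other jobs.
    """
    # Count how many of the job's input files each node holds.
    files_per_node = {}
    for node in file_locations.values():
        files_per_node[node] = files_per_node.get(node, 0) + 1

    # Rank: free nodes first (False sorts before True), then nodes with
    # more local files, then by name so the result is deterministic.
    ranked = sorted(
        files_per_node,
        key=lambda n: (node_busy.get(n, False), -files_per_node[n], n),
    )
    return ranked[:max_workers]
```

Only nodes that hold at least one input file are considered, which reflects the idea that the redirector sends work to where the data is.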
The model we (the Wisconsin Group) are implementing
[Diagram. Labels: job priority axis from low to high; GRID/PANDA production jobs, which the gatekeeper takes from the Grid and submits to the local pool (low priority); local jobs, users’ own jobs submitted to the whole pool; PROOF jobs (high priority); Condor Master; XROOTD/PROOF Master; Condor pool + PROOF/Xrootd pool (CPU + storage); computing nodes (no storage); storage pool (mainly for backups); less I/O load.]
Multi-layer Condor System
Queues, from high to low job priority:
• PROOF COD Queue (users’ PROOF jobs to the Xrootd pool): for PROOF jobs; covers all the CPUs; no effect on the Condor queue; jobs get the CPU immediately.
• Fast Queue: for high-priority private jobs; no limit on the number of jobs; run time is limited; covers all the CPUs; high priority.
• I/O Queue: for I/O-intensive jobs; no limit on the number of jobs; no run time limit; covers the CPUs in the Xrootd pool; higher priority.
• Local Job Queue (local job submission, users’ own jobs to the whole pool): for private jobs; no limit on the number of jobs; no run time limit; covers all the CPUs; higher priority.
• Production Queue (the gatekeeper takes the production jobs from the Grid and submits them to the local pool): no pre-emption; covers all the CPUs; maximum 3 days; no limit on the number of jobs.
Why we like this new model
• Users don’t even need to think about COD.
• It is very easy for users to run PROOF jobs.
• The redirector controls the COD requests and PROOF requests centrally, so it can do the scheduling better.
• With the Xrootd technology, the network traffic can be really light.
• With COD authentication control, the PROOF pool can easily be “partitioned” among different groups.
Dataset Based File Redistribution (DBFR)
Why do we need data redistribution?
• Case 1: One of the data servers is dead and all the data on it is lost. It is replaced with a new data server.
• Case 2: When we extend the Xrootd pool, we add new data servers to the pool.
When new data arrives, most of it goes to the new servers because of the load balancing function of Xrootd. The problem is that if we run PROOF jobs on the new data, all the PROOF jobs read from those new servers.
[Diagram. Labels: “This one is down”; new machine to replace the bad one; old machines; new machines.]
Xrootd Level File Redistribution
• Xrootd has good load balancing, but it is not dataset based.
• Xrootd can do the file redistribution, but:
  - It puts a heavy load on the data servers.
  - It is not optimized for PROOF.
An example of Xrootd file distribution
[Plot: number of files on each machine. Annotations: “WHY?”; “This machine was down”.]
All the files were copied through the Xrootd redirector.
The PROOF Performance On This Dataset
[Plot: PROOF performance on this dataset. Annotation: “Here is the problem”.]
The Explanation
[Plots versus running time: processing rate for a packet (#events/sec); number of workers accessing files. Annotation: “Only a few workers active”.]
After File Redistribution
[Plot: number of files on each machine.]
The Performance after Redistribution
[Plot: PROOF performance on the dataset after file redistribution.]
The Explanation
[Plots versus running time: processing rate for a packet (#events/sec); number of workers accessing files. Annotation: “Problem is gone”.]
Before and after
[Plots versus running time: number of workers accessing files before file redistribution; number of workers accessing files after file redistribution.]
The Basic Idea of DBFR
• Register the location of all the files of every dataset in the database (MySQL).
• With this information, we can easily get the file distribution of each dataset.
• Calculate the average number of files each data server should handle.
• Get a list of files that need to move out of overloaded servers.
• Get a list of machines that have fewer files than the average.
• Match these two lists and move the files.
• Register the new location of those files.
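The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: an in-memory dictionary stands in for the MySQL file catalog, the function names `plan_redistribution` and `apply_moves` are hypothetical, and only file counts are balanced (file sizes and server capacities are ignored).

```python
def plan_redistribution(file_locations, servers):
    """Plan moves that even out the number of files per data server.

    file_locations: dict mapping file name -> server currently holding it
                    (a stand-in for the database catalog; an assumption).
    servers: list of all data servers in the pool, including empty ones.
    Returns a list of (file, source_server, destination_server) tuples.
    """
    files_on = {s: [] for s in servers}
    for name, server in sorted(file_locations.items()):
        files_on[server].append(name)

    # Average number of files per server; any remainder is left on the
    # servers that already hold the most files, to reduce the moves.
    base, extra = divmod(len(file_locations), len(servers))
    fullest_first = sorted(servers, key=lambda s: len(files_on[s]), reverse=True)
    target = {s: base + (1 if i < extra else 0)
              for i, s in enumerate(fullest_first)}

    # List of files that need to move out of overloaded servers.
    surplus = []
    for s in servers:
        over = len(files_on[s]) - target[s]
        if over > 0:
            surplus.extend((name, s) for name in files_on[s][-over:])

    # Match surplus files with servers holding fewer files than their target.
    moves = []
    for s in servers:
        for _ in range(max(target[s] - len(files_on[s]), 0)):
            name, source = surplus.pop()
            moves.append((name, source, s))
    return moves


def apply_moves(file_locations, moves):
    """Return the updated catalog after registering the new locations."""
    updated = dict(file_locations)
    for name, _source, destination in moves:
        updated[name] = destination
    return updated
```

Running `plan_redistribution` once per dataset, rather than once for the whole pool, is what would make the balancing dataset based: each dataset ends up spread over all servers.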
When should we do Dataset Based File Redistribution?
• After creating the datasets, we should do the redistribution.
• In Case 1, before letting people start to use the new server, one should force a redistribution of all the datasets.
• In Case 2, before letting people start to use the new servers, one should force a redistribution of all the datasets.
What should be taken into consideration
• The number of CPU cores on each data server.
• The disk space on the data servers.
• The disk I/O speed.
• The memory size.
• The size of the files in the datasets.
• The usage of the whole xrootd pool.
• The priority of the datasets.
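One way such factors could enter the calculation is to replace the plain average with a weighted share per server. The sketch below is only an assumption about how that might be done: it uses a single weight per server (for example its number of CPU cores) and a largest-remainder rule to get whole file counts; the name `weighted_targets` is hypothetical, and how to combine several factors into one weight is left open.

```python
def weighted_targets(n_files, weights):
    """Split n_files across servers in proportion to their weights.

    weights: dict mapping server -> positive weight (e.g. CPU cores).
    Returns a dict mapping server -> target number of files.
    """
    total_weight = sum(weights.values())
    exact = {s: n_files * w / total_weight for s, w in weights.items()}

    # Start from the rounded-down share of each server.
    targets = {s: int(share) for s, share in exact.items()}

    # Hand the remaining files to the servers with the largest
    # fractional remainders (ties broken by server name).
    leftover = n_files - sum(targets.values())
    by_remainder = sorted(
        weights, key=lambda s: (exact[s] - targets[s], s), reverse=True
    )
    for s in by_remainder[:leftover]:
        targets[s] += 1
    return targets
```

For example, a server with 8 cores would be assigned twice as many files of a dataset as a server with 4 cores.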
The Implementation of DBFR
• We are working on a MySQL+Python based system.
• Only the basic functions are implemented so far.
• Hopefully, this system can be implemented at the PROOF level because PROOF already works with datasets.
[Diagram. Labels: xrootd pool (CPU cores + storage); MySQL database server; xrootd redirector; dataset manager; steps numbered 0 to 4.]
Summary
• The file distribution is optimized for PROOF.
• It helps the scalability of the xrootd pool and makes the pool easy to extend.
• Low system load:
  - The file moving decision is based on the database.
  - The file moving can be done based on the load of the whole pool.
  - The redistribution can be scheduled according to the usage of the dataset.
• We are trying to integrate this system into the LRC (Local Replica Catalog) database because the LRC already associates files with datasets.
• It is good not only for PROOF but also for production jobs.
• We can also consider dataset-level “RAID”, which would provide a kind of file redundancy.
Additional slides
The principle of the I/O Queue
[Diagram. Labels: MySQL database server; Xrootd pool (CPU cores + big disks); submitting node; Condor master; steps numbered 0 to 5.]
0. The cron job provides all the file locations in the Xrootd pool.
1. The submission node asks the MySQL database for the input file location.
2. The database provides the location of the file and also the validation info of the file.
3. The submission node adds the location to the job requirements and submits the job to the Condor system.
4. Condor sends the job to the node where the input file is stored.
5. The node runs the job and also puts the output file on the local disk.
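Steps 1 to 3 could be sketched in Python as follows. This is an illustration under assumptions, not the actual submission tool: a dictionary stands in for the MySQL database, the function name `build_submit_description`, the catalog layout, and the host and file names are hypothetical, and the job is pinned to a node with a Condor `requirements` expression on the `Machine` attribute.

```python
def build_submit_description(input_file, catalog, executable):
    """Build a Condor submit description that pins the job to the node
    holding input_file (hypothetical helper).

    catalog: dict mapping file name -> {"host": str, "valid": bool},
             standing in for the file location database.
    """
    entry = catalog.get(input_file)
    # Refuse to submit if the file is unknown or failed validation,
    # so no CPU time is spent on a job that cannot read its input.
    if entry is None or not entry["valid"]:
        raise LookupError("input file not available: " + input_file)

    host = entry["host"]
    lines = [
        "executable = " + executable,
        "arguments = " + input_file,
        # Send the job to the machine where the input file is stored.
        'requirements = (Machine == "' + host + '")',
        "queue",
    ]
    return "\n".join(lines)
```

A call such as `build_submit_description("esd_001.root", catalog, "convert.sh")` returns the text of a submit file; writing it to disk and handing it to `condor_submit` is left out of the sketch.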
Types of jobs
• Direct Access
  - Jobs go to the machines where the input files reside.
  - Accesses ESD files directly and converts them to CBNTAA files.
  - Copies the output file to xrootd on the same machine using xrdcp.
  - Each file has 250 events?
• xrdcp
  - Jobs go to any machine, not necessarily one that has the input files.
  - Copies input and output files via xrdcp to/from the xrootd pool.
  - Converts the input ESD file to CBNTAA.
• cp_nfs
  - Jobs go to any machine.
  - Copies input and output files to/from NFS.
  - Converts the input ESD file to CBNTAA.
Test Configuration
• Input file (ESD) size: ~700 MB
• Output file (CBNTAA) size: ~35 MB
• Each machine has ~10 ESD files
• 42 running nodes
• 168 CPU cores
Test Results
[Plot; axis label: number of jobs. Annotation: “Time saved per job: ~230 sec”.]
Test Results
[Plot; axis label: number of jobs.]
Advantages of the I/O queue
• It avoids a large amount of data transfer, especially over the network. Even the local file transfer can be skipped, because input and output files can be accessed directly on the local storage disk.
• During job submission, users can see which files are unavailable, and a job is not submitted if its input file cannot be found. This reduces wasted CPU cycles.
• Better use of the Xrootd+PROOF pool. (Before, the PROOF pool only stored analyzable data, like CBNT, AOD, PDP. Now it can also store RDO, ESD and raw data for production.)
• Users can use dq2-like commands to submit jobs.
• Better usage of CPU resources. CPU cycles are not wasted waiting for file transfers.
• No big change to job submission. (Users don’t need to worry about the database.)