P20 Test Run
Parag Mhashilkar, Fermi National Accelerator Laboratory
Overview
• Basic Architecture for Reprocessing
• New features included in Samgrid
• Job Performance Analysis
Basic Architecture for Reprocessing
• OSG Station: osg-ouhep on d0srv047.fnal.gov
• Station Caches: ouhep00.nhn.ou.edu, d0srv015.fnal.gov, d0rsam01.fnal.gov
• Durable Location: ouhep00.nhn.ou.edu, d0srv063.fnal.gov, d0srv065.fnal.gov
• OSG Job Forwarding: d0srv047.fnal.gov
• OSG Sites: Fermilab, USCMS Farm, Oklahoma University, Indiana University, University of Nebraska-Lincoln, …
• Samgrid Client Site: d0mino0x.fnal.gov
[Diagram labels: Flow of Job Submission, Offers Services, SAM Services, Samgrid, SAM-Grid / OSG Forwarding Node]
New features included in Samgrid
• Enhanced data movement scheme using fcp queues.
• Support for "affinity" mode in the storage negotiator, which selects input/output storage locations close to a given cluster.
• Use of "sam upload" to store unmerged thumbnails from the worker nodes directly to the durable location. The FSS buffer area is not used in this case, which bypasses one data movement hop.
• Use of SRM at UNL and SPRACE through SRM-enabled SAM services at the forwarding node.
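The slides do not describe how the storage negotiator decides which location is "close" to a cluster. The sketch below illustrates one possible notion of affinity, not Samgrid's actual logic: it assumes closeness can be approximated by the number of trailing DNS labels a storage host shares with the cluster host. The function names and the worker hostnames are hypothetical. The cache hostnames are the station caches listed on the architecture slide.

```python
def shared_suffix_labels(host_a, host_b):
    """Count the trailing DNS labels two hostnames have in common."""
    labels_a = reversed(host_a.lower().split("."))
    labels_b = reversed(host_b.lower().split("."))
    count = 0
    for a, b in zip(labels_a, labels_b):
        if a != b:
            break
        count += 1
    return count


def pick_by_affinity(cluster_host, storage_hosts):
    """Return the storage host sharing the longest DNS suffix with the cluster.

    Assumption for illustration only: DNS proximity stands in for "close".
    Ties go to the earliest entry in storage_hosts.
    """
    return max(storage_hosts, key=lambda h: shared_suffix_labels(cluster_host, h))


# Station caches from the architecture slide.
STATION_CACHES = ["ouhep00.nhn.ou.edu", "d0srv015.fnal.gov", "d0rsam01.fnal.gov"]

# Hypothetical worker node hostnames.
print(pick_by_affinity("node12.nhn.ou.edu", STATION_CACHES))
print(pick_by_affinity("worker.fnal.gov", STATION_CACHES))
```

Under this assumption, a worker in the nhn.ou.edu domain is matched to the Oklahoma cache, while a worker at fnal.gov is matched to the first Fermilab cache in the list.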
Job Performance Analysis
• As of January 8, 2007:
• Total OSG jobs submitted = 2959
• Jobs held = 179 (~6%)
• Initial success rate = ~78%
• Total success rate dropped to ~67% because of recent file transfer errors.
• Most of the failures were timeouts while transferring either job files or data files to the worker node.
• Some jobs failed because the disk at the forwarding node filled up faster than anticipated.
• The Samgrid forwarding node will be moved to d0srv066.fnal.gov, which is being set up. This machine has more local disk, with 2 TB allocated to the Samgrid installation and log files.
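As a quick check of the figures above, the held-job percentage follows directly from the two job counts, and the change in success rate is a difference of the two quoted percentages. The variable names are illustrative. The slides do not say which denominator the success rates use, so no job counts are derived from them here.

```python
# Figures quoted on the slide (as of January 8, 2007).
total_submitted = 2959
held = 179
initial_success_pct = 78   # approximate, as quoted
total_success_pct = 67     # approximate, as quoted

held_fraction = held / total_submitted
drop_points = initial_success_pct - total_success_pct

print(f"Held jobs: {held_fraction:.1%} of submitted")    # Held jobs: 6.0% of submitted
print(f"Success rate drop: {drop_points} percentage points")  # 11 percentage points
```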