
P20 Test Run

Parag Mhashilkar, Fermi National Accelerator Laboratory

Presentation Transcript


  1. P20 Test Run. Parag Mhashilkar, Fermi National Accelerator Laboratory (Fermilab)

  2. Overview
     • Basic Architecture for Reprocessing
     • New features included in Samgrid
     • Job Performance Analysis

  3. Basic Architecture for Reprocessing
     • OSG Station: osg-ouhep on d0srv047.fnal.gov
     • Station Caches: ouhep00.nhn.ou.edu, d0srv015.fnal.gov, d0rsam01.fnal.gov
     • Durable Location: ouhep00.nhn.ou.edu, d0srv063.fnal.gov, d0srv065.fnal.gov
     • Samgrid Client Site: d0mino0x.fnal.gov
     • SAM-Grid / OSG Forwarding Node (OSG Job Forwarding): d0srv047.fnal.gov
     • OSG Sites: Fermilab, USCMS Farm, Oklahoma University, Indiana University, University of Nebraska – Lincoln, …
     • Diagram labels: Flow of Job Submission; Offers Services; SAM Services; Samgrid
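The layout on this slide can be written down as a small data structure, which makes it easier to see which hosts play more than one role. The following is only an illustrative sketch: the dictionary keys and the helper `cache_and_durable` are hypothetical names chosen here, not part of Samgrid or SAM; the host and site names are copied from the slide.

```python
# Hypothetical representation of the reprocessing layout from the slide.
# Key names are illustrative; host and site names are taken from the slide.
TOPOLOGY = {
    "client_site": "d0mino0x.fnal.gov",
    "forwarding_node": "d0srv047.fnal.gov",
    "station": {
        "name": "osg-ouhep",
        "caches": [
            "ouhep00.nhn.ou.edu",
            "d0srv015.fnal.gov",
            "d0rsam01.fnal.gov",
        ],
    },
    "durable_locations": [
        "ouhep00.nhn.ou.edu",
        "d0srv063.fnal.gov",
        "d0srv065.fnal.gov",
    ],
    # The slide's site list ends with an ellipsis, so this list is partial.
    "osg_sites": [
        "Fermilab",
        "USCMS Farm",
        "Oklahoma University",
        "Indiana University",
        "University of Nebraska – Lincoln",
    ],
}


def cache_and_durable(topology):
    """Return hosts listed both as a station cache and as a durable location."""
    durable = set(topology["durable_locations"])
    return [host for host in topology["station"]["caches"] if host in durable]


if __name__ == "__main__":
    print(cache_and_durable(TOPOLOGY))
```

With the hosts listed on the slide, the only machine that appears in both roles is ouhep00.nhn.ou.edu.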

  4. New features included in Samgrid
     • Enhanced data movement scheme using fcp queues.
     • Support for an "affinity" mode in the storage negotiator, which selects input/output storage locations close to a given cluster.
     • Use of “sam upload” to store unmerged thumbnails from the worker nodes directly to the durable location. The FSS buffer area is not used in this case, which bypasses one data movement hop.
     • Use of SRM at UNL and SPRACE through SRM-enabled SAM services at the forwarding node.
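The slide does not say how the storage negotiator decides that a storage location is "close" to a cluster. As a rough sketch of the idea only, the example below assumes closeness means sharing the last two labels of the DNS name, and falls back to the first candidate otherwise. The functions `domain_of` and `pick_by_affinity` and the worker host names are hypothetical; this is not the Samgrid implementation.

```python
def domain_of(host):
    """Return the last two DNS labels of a host name, e.g. 'fnal.gov'."""
    return ".".join(host.split(".")[-2:])


def pick_by_affinity(cluster_host, storage_hosts):
    """Pick a storage host in the same domain as the cluster (assumed rule).

    Falls back to the first candidate when no storage host shares the domain.
    """
    if not storage_hosts:
        raise ValueError("no storage hosts to choose from")
    cluster_domain = domain_of(cluster_host)
    for host in storage_hosts:
        if domain_of(host) == cluster_domain:
            return host
    return storage_hosts[0]


# Station caches as listed on the architecture slide.
CACHES = ["ouhep00.nhn.ou.edu", "d0srv015.fnal.gov", "d0rsam01.fnal.gov"]

if __name__ == "__main__":
    # Worker host names below are made up for illustration.
    print(pick_by_affinity("worker01.nhn.ou.edu", CACHES))
    print(pick_by_affinity("worker01.fnal.gov", CACHES))
```

A real negotiator would more likely use configured site-to-storage mappings or measured transfer performance; matching on domain is simply the smallest rule that shows the selection step.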

  5. Job Performance Analysis
     • As of January 8, 2007:
       • Total OSG jobs submitted = 2959
       • Jobs held = 179 (~6%)
       • Initial success rate = ~78%
       • Total success rate dropped to ~67% because of recent file transfer errors.
     • Most of the failures were timeouts while transferring either job files or data files to the worker node.
     • Some jobs failed because the disk at the forwarding node filled up faster than we anticipated.
     • The Samgrid forwarding node will be moved to d0srv066.fnal.gov, which has more local disk (2 TB) allocated to the Samgrid installation and log files. This machine is being set up.
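The held-job percentage on this slide can be checked directly from the two counts it gives. The sketch below also converts the quoted success rates into approximate job counts, under the assumption (not stated on the slide) that both rates are relative to the total number of submitted jobs; the helper `implied_jobs` is a hypothetical name.

```python
# Figures quoted on the slide (as of January 8, 2007).
total_submitted = 2959
held = 179

# Held fraction, which the slide rounds to ~6%.
held_pct = 100 * held / total_submitted


def implied_jobs(rate_pct, total=total_submitted):
    """Approximate job count for a quoted rate.

    Assumes the rate uses total submitted jobs as its denominator,
    which the slide does not confirm.
    """
    return round(total * rate_pct / 100)


if __name__ == "__main__":
    print(f"held: {held_pct:.1f}% of submitted jobs")
    print(f"implied successes at ~78%: about {implied_jobs(78)}")
    print(f"implied successes at ~67%: about {implied_jobs(67)}")
```

Because the slide gives the rates only to the nearest percent, the implied counts are estimates rather than reported numbers.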
