
AMOD Report

Doug Benjamin, Duke University


Presentation Transcript


  1. AMOD Report
     Doug Benjamin, Duke University

  2. Hourly jobs running during the last week (plot; axis labels: 0, 140 K)
     • Blue – MC simulation
     • Yellow – data processing
     • Red – user analysis
     • Magenta – group production
     • Grey – group analysis

  3. DDM data flows during the last week (plot; axis labels: 0 TB, 10 TB, 800 TB)

  4. Notable activities
     • Monday – Recovery from slow T0 export to RAL and TRIUMF over the weekend
     • Both switched over to the backup OPN over the weekend; the cause was never understood
     • TRIUMF: slower link; RAL: asymmetric link
     • Tuesday – SARA T0 export and T1 stage-from-tape issues
     • Wednesday – Unplanned power cut at RAL; CERN LSF job submission slowness
     • Thursday – RAL power restored, recovery from the outage; CERN LSF job submission slowness continues
     • Friday – CERN LSF job submission slowness
     • Saturday – Rain, lots of it (flooding: R1, my office building, SPS; took beam offline)

  5. Other notable events
     • ND cloud: local storage problems
     • Currently trying to recover 70k files to avoid declaring them lost. Most tasks are being resubmitted, and Rob subscribed to the missing RAW input files.
     • RAL: worked to recover several ATLAS pools affected by the power cut (159 files declared lost)

  6. Bulk reprocessing
     • Originally planned to start with period D, then B, then A and then C
     • Instead, period D was started, followed by periods B, A and C together, to keep jobs running in all clouds, but...
     • This processing pattern has caused disk space problems at Tier 1 sites
     • Stopped the early submission of periods A and C; D and B continue
     • As of Sunday: period D 98.5% done (before merge), period B 68% done
     • Over the weekend, disk space at the Tier 1 sites became an issue

  7. T1 data disk space
     • Due to low free disk space, PIC, SARA and FZK were all removed from SANTA CLAUS; 4 T1 sites are now excluded (DE, ES, NL, IT clouds)
     • Saturday – Stephane Jezequel triggered cleaning (Victor has been running very slowly recently)
     • The situation at FZK and SARA has improved
     • Monday (12-Nov) – SARA will migrate 60 TB from scratch disk to data disk
     • PIC was still an issue as of Sunday night
     • Stephane is moving MC datasets away

  8. LSF
     • LSF job dispatch speed caused problems all week (plot; axis labels: 60 K, 6 K)

  9. GGUS tickets

  10. Conclusions
     • Thanks to the experts, sites and shifts (Comp@p1, ADCOS, ADCOS expert)
     • Bulk reprocessing is proceeding relatively smoothly
     • LSF job submission speed is causing headaches for the Tier 0 team
     • DATA disk space at the Tier 1 sites is an issue; it needs to be monitored so as not to affect bulk reprocessing
