AMOD Report
ADC weekly, CERN, 29 November 2011
Alexei Klimentov, Brookhaven National Lab
DDM
• Throughput (clouds) [plot, MB/s]
• Throughput (activities) [plot, MB/s]
• Extra TOP datasets planned replicas
Alexei Klimentov – ADC weekly
MC, Group Production and Data Processing
• 0.86M completed jobs, of which 0.57M were MC production
Grid Analysis
• 1.6M completed jobs
“Databases”
• Tue Nov 22: ~30 min ADCR outage
  • A transparent intervention was announced. It was not transparent because of a human error. The issue was quickly fixed by the IT DB team.
• Wed Nov 23: the LFC database at CERN contained ~9500 files without a parent file id. The issue was fixed. No impact on database or application performance.
• Wed Nov 23 (21:45): ADCR problem caused by multiple disk failures. Some records (8 records in a PanDA table) were corrupted. The issue was fixed by the IT DB team; the corrupted rows were cleaned by Gancho.
• Thu Nov 24: LFC problem (prod-lfc-atlas.cern.ch) and high ADCR rate (matview refresh issue)
  • Both issues were caused by the previous problem. Fixed by the IT DB team.
• Fri Nov 25: ADCR high load (aggressive deletion service)
• Fri Nov 25 22:00 to Sat Nov 26 11:00: ‘DDM dashboard statistics is not updated’
  • LCGR database issue. Marcin Blaszyk and David Tuckett worked during Saturday night. The problem was identified and fixed by 11:00; the statistics were regenerated (David).
  • Dashboard agent instructions to be reviewed
  • ddmusr01 has no access to the dashboard machine
Tier-1s
• Nov 21, 19:00: SARA and INFN-T1 are in full production
• Nov 28: 6 min UPS outage at IN2P3-CC; the site was back in full production 3 h later
Tier-2s
• DE, FR, IL, IT and US Tier-2s and Tier-3s off/on in DDM and Production
• Subscriptions were found to Grid sites that are not in DDM FT. The missing sites were added to DDM FT.
False Security Alarm
• Fri Nov 26. Excellent work by the CERN IT security team, Central Services (SB) and Operations (ADiGi)
Misc.
• CASTOR to EOS migration (ADiGi, GN)
  • Physics groups space
  • Started on Thu Nov 24
  • Final step Mon Nov 25
• Issue with the Python version of dq2.cfg on one of the VOBOXes; production was affected. Fixed (TM, SB)
• Distributed Analysis monitoring tables were empty (reported by users). Fixed (VF)
ADCoS and COMP@P1 Shifts
• Many issues were identified and covered by the ADC shift team.
• Excellent work by the shifters and excellent organization.