FAX update 26th August 2013
Content • Running issues • FAX failover • Moving to new AMQ server • Informing on endpoint status • Monitoring developments • Monitoring validation • dCache monitor 5.0.0 • Collector • Dashboard • 50 shades of green Ilija Vukotic ivukotic@uchicago.edu
Running issues • Dead endpoints: • Frascati, Manchester, LAL • cmsd services are dead at: • Taiwan-lcg2, LPSC, Protvino, SWT2_CPB • /atlas/dq2/user/gangarbtlookups • Made half of the federation endpoints inaccessible from upstream redirectors. • Johannes will explain this in more detail. • Remaining issues with x509 • We are communicating our wish to get it turned on • BU, DESY-HH, DESY-ZN, FZK, LRZ-LMU, MPPMU, Freiburg, Wuppertal, Geogrid
Running issues • Rather green, considering it's August! • Quite a bit of traffic, considering it's August! • The new functional HC tests should not contribute much, AFAIK
FAX failover • FAX failover works: http://pandamon.cern.ch/fax/failover • Developments: • Cloud is shown and queue names are corrected • Side menu • In the works: • Filtering on user • Graphing • To ponder: • Site admins are not aware of this possibility. How do we communicate to them that it is in their best interest to turn it on?
FAX failover • Production jobs failing over to FAX • FAX dedicated submenu • Will add PANDA-brokered job statistics here
Moving to new AMQ server • All FAX-related info was sent to pilot.msg.cern.ch • There was no authentication • Moved to the Dashboard test broker • Consumer now uses STOMP+SSL • Required a change to the new stomp version • Will move to the production server this week
Informing on endpoint status • Mailing from SSB works and gives results. • Do we want SAM updates too? • What would it take? • Who would do it?
Monitoring developments • There is a need to check remotely whether cmsd works. • We had (and still have) sites showing green for direct access and red for downstream redirection. • Investigation shows that the cmsd's are actually dead or not responding. • Need a way to probe cmsd's directly • Andy will look into ways to do it. • New columns to develop for SSB: • xRootD version • Rucio support • Monitoring status
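As a rough illustration of a remote check, the sketch below only tests whether a TCP port accepts connections. This is an assumption about what a first-level probe could look like, not the method Andy will propose: an open port shows that a process is listening, not that cmsd answers redirection requests correctly. The function name, host and port are hypothetical placeholders.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper: it detects a dead or unreachable service,
    but cannot tell whether a listening cmsd is actually healthy.
    """
    try:
        # create_connection resolves the host and honours the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, unreachable or unresolvable
        return False
```

A site that is green for direct access while this check fails on its cmsd port would match the pattern described above; a site where the check passes but redirection still fails would need a protocol-level probe.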
Monitoring validation • First step is validating that the results shown by Matevz's collector are correct. • I have been sending xrootd summary messages to the collector and checking what shows up in the plots. The messages arrive and are shown, but something is wrong in how the summaries are calculated or plotted.
dCache monitor 5.0.0 • dCache monitor mostly rewritten: • dCache-compatible logging • UDP messaging from same ports • Sends “=” stream • Sends more data (substitutes DN \CN with username etc.) • Made compatible with collector • Tested at MWT2. Very good results. • At the end of the week, an RPM will be produced and placed in the WLCG repository. CMS will be informed about the new version.
Collector • New version being prepared by Matevz • New AMQ version • BIG ISSUE: • Some CMS sites are sending info to our collector. Will be raised with Brian B.
dCache monitor 5.0.0 • Now gives really important and actionable information. Just during debugging I noticed: • Files opened, a small percentage read, and then kept open for hours. • Same file opened twice in the same session (?!) • Rather small usage of vector reads.
In dashboard • Why the difference between table and plots? • What is the idea of the “Site history” tab? • Need to investigate why CMS sites appear here (CERN-CMSTEST)
PANDA re-brokering • Discussed at the last CERN S&C week • We agreed to provide PANDA with an estimate of the cost of moving data over the WAN, so that it could re-broker jobs from very long queues to sites with free slots that have a good connection to the data. • Cost matrix exists in SSB. • Code that reads it from SSB and applies exponential decay smoothing is running and sends info to AGIS. • Have to check scalability of AGIS bulk update. • Waiting for Artem to code moving data from AGIS to schedconfig. • Next step is Tadashi making use of that table from schedconfig to actually re-broker. • Finally, we'll have to monitor it the same way we do with failover. • No developments
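The smoothing step can be pictured as an exponential moving average over successive cost measurements. The sketch below is only an illustration under assumptions: the function name, the dictionary layout keyed by (source, destination) pairs and the weight alpha are hypothetical and are not taken from the code that actually reads SSB.

```python
def smooth_costs(previous, latest, alpha=0.3):
    """Blend the latest cost measurements into the running estimates.

    previous, latest: dicts mapping (source, destination) -> cost.
    alpha: weight of the newest measurement (assumed value); older
    measurements decay by a factor (1 - alpha) on every update.
    """
    smoothed = dict(previous)  # copy so the caller's estimates stay untouched
    for link, value in latest.items():
        if link in smoothed:
            smoothed[link] = alpha * value + (1 - alpha) * smoothed[link]
        else:
            # first measurement for this link: nothing to smooth against
            smoothed[link] = value
    return smoothed
```

A larger alpha follows changes in the network faster but passes more measurement noise to the broker; a smaller alpha gives steadier values that react more slowly.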
50 shades of green • Green color in any of the FAX SSB monitor metrics is based on one and the same file. • This involves a lot of cached information. • Need to find the percentage of successfully obtained files from a much larger file pool while avoiding caching effects. • Simple code developed to test all endpoints having FDR datasets. It does _file0->ls() on each of the ~800 files, sequentially. • Currently run by hand. • You may find it in FAXtools/FAXtestsFDR of our CERN FAX git repo.
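The per-endpoint percentage could be computed along the following lines. This is a minimal sketch, not the code in FAXtools/FAXtestsFDR: the function name and the can_open callback are hypothetical, with the callback standing in for whatever actually opens and lists a file at an endpoint.

```python
def success_rates(endpoints, files, can_open):
    """Return {endpoint: percentage of files successfully opened}.

    can_open(endpoint, file_name) -> bool is a hypothetical callback
    that wraps the real file access. Files are probed sequentially,
    one endpoint after another.
    """
    rates = {}
    for endpoint in endpoints:
        opened = sum(1 for name in files if can_open(endpoint, name))
        # guard against an empty file pool
        rates[endpoint] = 100.0 * opened / len(files) if files else 0.0
    return rates
```

Instead of a single green or red cell, each endpoint then gets a number between 0 and 100 that can be mapped to a shade.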