CREAM deployment news John Gordon, Antonio Retico GDB 10-Feb-10 - CERN
Agenda Good afternoon! Status of deployment Open questions • a) How is the experiment testing going? • b) Are there still blockers to phasing out lcg-CE? • c) Were scalability and reliability proven? • d) Why don't sites upgrade? GDB - 10 Feb 10 - CERN
Deployment status Sites encouraged to deploy CREAM at various levels • MB, GDB, SA1, Pilot Service • Nick’s list around for 1 year now (Happy Birthday!) • lcg-CE still far from replacement ~50 instances around in Feb 2010 • Clearly not very popular yet 6 European T1s have installed some • Do we expect ASGC, TRIUMF, BNL and FNAL to use it? Time for a mini-review?
Open questions a) How is the experiment testing going? b) Are there still blockers to phasing out lcg-CE? c) Were scalability and reliability proven? d) Why don't sites upgrade?
Open questions a) How is the experiment testing going?
Experiment@Work (ALICE) Historically the happiest On the way to deprecating lcg-CEs at their sites • Also for submission via WMS Can they do it? • Would that affect A/R metrics (see next slides)?
Experiment@Work (LHCb) Submission to CREAM seamlessly enabled SAM tests show many sites still failing • ~40% of sites are passing the tests • Mostly faulty configuration of the LHCb queues • Not a bug but widespread inexperience with CREAM configuration at sites
Experiment@Work (CMS) Considerable testing activity registered recently Trying to use PROD Agent with ICE-CREAM A couple of issues reported • Bookkeeping • Problems in updating the job status • Jobs that have actually finished are still reported as running • Operations • Services started in the wrong order by YAIM after updates The first is seen as a showstopper for production • Not a bug but a CREAM/Blparser configuration issue • CREAM 1.6 (patch 3179) will make configuration easier • Fix for ICE bug #61405 expected
Experiment@Work (ATLAS) lcg-CE required until end 2010 Outcome of first CondorG submission testing • Testing promising but inconclusive • Only find problems by heavy usage • Shift expert support from LCG CE to CREAM CE • Keep LCG CE but recommend sites with >1 CE install CreamCE (Rod Walker @ ATLAS Tier-1/2/3 Jamboree)
A note about the release CREAM 1.6 expected to come with many bug fixes • Most of them found by developers Still with the developers (now Product Team) • They will do certification of 1.6 Entered pre-certification today (10-Feb) First release with a new delivery process • May take some time
Open questions b) Are there still blockers to phasing out lcg-CE? (My view on Nick’s list. Comments welcome)
CondorG (point B) Testing of the CondorG submission path is taking off now • Issues are still under analysis Need to wait
Operations tools (point D) SAM/Nagios tests are there What about A/R metrics? • Can a site run only CREAM (and still count as CE provider to WLCG)? Long transition period to be expected • With CEs we cannot use the ‘SRMv2 test approach’ • Wait for enough CREAMs to be there • Switch the A/R to use CREAM “overnight” What is the CE availability for a site? • Av[CE] = OR [Av(CREAM), Av(lcg-CE), Av(ARC)] or • Av[CE] = AND [Av(CREAM), Av(lcg-CE), Av(ARC)] ? • Something new to be implemented in Gridview
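The difference between the two candidate definitions of Av[CE] can be illustrated with a minimal Python sketch. It is not Gridview code: the function name `site_ce_availability`, the `mode` argument, and the choice to count only the CE flavours a site actually deploys are all assumptions made for illustration.

```python
from typing import Dict


def site_ce_availability(flavour_up: Dict[str, bool], mode: str) -> bool:
    """Combine per-flavour availability into one site-level CE availability.

    flavour_up maps a CE flavour name (e.g. "CREAM", "lcg-CE", "ARC") to
    whether that flavour passed its tests in the sampling interval.
    Only flavours the site deploys are expected in the map (an assumption).
    """
    if not flavour_up:
        # A site with no CE of any flavour cannot be an available CE provider.
        return False
    if mode == "or":
        # Site counts as available if at least one flavour is up.
        return any(flavour_up.values())
    if mode == "and":
        # Site counts as available only if every deployed flavour is up.
        return all(flavour_up.values())
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical site in transition: CREAM is healthy, lcg-CE is failing.
mixed_site = {"CREAM": True, "lcg-CE": False}
or_result = site_ce_availability(mixed_site, "or")
and_result = site_ce_availability(mixed_site, "and")

# Hypothetical site running only CREAM.
cream_only = {"CREAM": True}
cream_only_or = site_ce_availability(cream_only, "or")
cream_only_and = site_ce_availability(cream_only, "and")
```

Under these assumptions the two definitions agree for a CREAM-only site and differ only for sites running several flavours, where OR tolerates one failing flavour and AND penalises it.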
Graceful failure (point O) Still some developments expected to fix point O • “Graceful failure or self-limiting behavior when the CE load reaches its maximum” • Problem probably hit at KIT (pending jobs) • New limiter expected in 1.6
Open questions c) Were scalability and reliability proven?
Scalability/Reliability (various points) Scalability • Which sites can report a production experience at significant scale? Reliability • Issues still being found affecting version 1.5 • Mainly bad configurations concerning WMS submission path • Mostly fixed with CREAM 1.6 + a new version of ICE
Open questions d) Why don't sites upgrade?
Some hypotheses No pressure from the experiments • Are the experiments happy with the current scale? New “latest and greatest” updates always in the pipeline • One could say that time is still needed to mature lcg-CE works for now (don’t fix it!) lcg-CE is still the only reference for site computing quality reports Others?
Questions?