Report on HEPCAL II
F.Carminati, Geneva, May 28, 2003
GAG membership
• Chair: F.Carminati
• ALICE: P.Buncic, PG.Cerello
• ATLAS: L.Perini, C.Tull
• CMS: R.Cavanaugh, C.Grandi
• LHCb: A.Tsaregorodtsev, N.Brook
• GDA: F.Donno, A.Sciabà
• GTA: D.Foster
• App Area: J.Beringer, O.Smirnova, A.Pfeiffer
• Relations with EU Projects: F.Harris
• Relations with US Projects: R.Pordes
• Experiment-independent expert: J.Templon
Planning
• May 5-9: HEPCAL marathon
• May 27
• June 28
Plan of the document
• Introduction
• Basic concepts
• The analysis activity
• Analysis scenarios
  • Definition
  • Scenario
  • Metrics
• Analysis execution models
  • Support for queries by common layers
  • Support for analysis job execution by a common layer
  • Interactive vs batch grid activity
• System requirements
  • Provenance and job traceability
  • Interactive environment support
  • Deviant flows and errors
• Service models
• Analysis software deployment
• Collaborative work
• Recommendations
• Conclusions
• Description of use cases
What is analysis?
1. Perform queries on the Dataset Metadata Catalogue(s) (DMC) to determine the input LDN(s).
2. Query the input set of LDNs, selecting the event components of interest and the events of interest, using event-level metadata.
3. Optionally save the results of step 2 for further use.
4. Perform the iterative analysis activity, looping over the event components selected in step 2.
5. Optionally save the results of step 4 and publish them:
  • "Create TAG LDN" ("shallow copy")
  • "Make new reclustered DS" ("deep copy")
  • "Create new DS" ("deep copy from reprocess")
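As an illustration only, the sketch below strings these steps together in Python; the DatasetMetadataCatalogue class, the event_store mapping and the selection/algorithm callables are hypothetical stand-ins, not HEPCAL or middleware interfaces.

```python
# Hypothetical sketch of the analysis steps listed above; none of these
# names correspond to real HEPCAL or middleware interfaces.

class DatasetMetadataCatalogue:
    """Toy stand-in for the DMC: maps metadata queries to LDNs."""
    def __init__(self, entries):
        self.entries = entries  # {ldn: {metadata key: value}}

    def query(self, **criteria):
        return [ldn for ldn, meta in self.entries.items()
                if all(meta.get(k) == v for k, v in criteria.items())]


def analyse(dmc, event_store, dataset_query, event_selection, algorithm):
    # Step 1: query the DMC to determine the input LDNs.
    ldns = dmc.query(**dataset_query)

    # Step 2: select events of interest using event-level metadata.
    selected = [ev for ldn in ldns
                for ev in event_store[ldn] if event_selection(ev)]

    # Step 3 (optional): the selected event list could be saved here,
    # e.g. as a TAG / "shallow copy" dataset.

    # Step 4: iterative analysis loop over the selected event components.
    results = [algorithm(ev) for ev in selected]

    # Step 5 (optional): save and publish the results, e.g. as a new
    # reclustered dataset ("deep copy").
    return results
```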
Support for queries
• "Giant" metadata table divided into several tables and two categories:
  • Metadata -> LDN
  • Metadata -> Object pointer
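A minimal sketch of this split, assuming a two-level lookup in which dataset-level metadata resolves to LDNs and event-level metadata resolves to object pointers; the dictionaries, keys and LDN strings are purely illustrative.

```python
# Illustrative split of the "giant" metadata table into the two
# categories above; all names and LDNs are made up.

# Dataset-level metadata: query attributes -> logical dataset names.
dataset_metadata = {
    ("run_period", "2003A"): ["ldn:expt/esd/2003A/ds001",
                              "ldn:expt/esd/2003A/ds002"],
}

# Event-level metadata: query attributes -> object pointers
# (an LDN plus an identifier inside the dataset).
event_metadata = {
    ("min_tracks", 100): [("ldn:expt/esd/2003A/ds001", 42),
                          ("ldn:expt/esd/2003A/ds002", 7)],
}

def resolve(dataset_key, event_key):
    """Resolve a query first to LDNs, then to object pointers."""
    ldns = set(dataset_metadata.get(dataset_key, []))
    return [ptr for ptr in event_metadata.get(event_key, [])
            if ptr[0] in ldns]

print(resolve(("run_period", "2003A"), ("min_tracks", 100)))
```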
Support for analysis job execution by a common layer
• In HEPCAL there is no special support for this type of analysis job
• If there is no special middleware support, the job may not benefit from being run on the Grid
• Analysis may even take a step backward from pre-Grid days
• We describe scenarios for how the Workload Management System (WMS), i.e. experiment-independent middleware, might support the execution of analysis jobs as described above
• Of course we trespass into implementation… but
Support for analysis job execution by a common layer
• "WN does it all"
  • The user submits the job, which is sent to the WN where the query is executed. Input data are accessed via the standard mechanisms described in HEPCAL; relying on the files being local is not possible, since the WMS does not know the list of LDNs and hence cannot submit the job to a CE close to an SE holding local copies.
• "WMS queries DMC and then submits job"
  • The user submits the job, the WMS performs the query on the DMC to optimise the CE selection, and the job is sent to the WN together with the experiment-dependent query and the list of LDNs; job execution then starts as in the previous case. A sketch of this scenario follows the list below.
• The next items are implemented using the DAG mechanism. Experiment-dependent tools that can be invoked by the WMS (plugins) are required.
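A minimal sketch of the "WMS queries DMC and then submits job" scenario, under the assumption of hypothetical wms, dmc and replica_catalogue objects; none of these calls correspond to a real middleware API.

```python
# Sketch of the "WMS queries DMC and then submits job" scenario.
# Every interface used here (wms, dmc, replica_catalogue, job) is an
# assumed stand-in, not an existing middleware API.

def submit_analysis_job(wms, dmc, replica_catalogue, job):
    # The WMS resolves the dataset-level part of the query into LDNs.
    ldns = dmc.query(**job.dataset_query)

    # For each LDN, find the SEs holding a physical copy.
    storage_elements = {se for ldn in ldns
                        for se in replica_catalogue.locations(ldn)}

    # Choose a CE "close" to one of those SEs (cost model left open).
    ce = wms.closest_compute_element(storage_elements)

    # Ship the job together with the experiment-dependent query and the
    # resolved LDN list; the WN then proceeds as in "WN does it all".
    return wms.submit(job, compute_element=ce, input_ldns=ldns)
```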
Support for analysis job execution by a common layer
• "WMS queries DMC, submits multi-jobs and merges output"
  • The user submits the job, and the WMS performs the DMC query as in b). The WMS generates several sub-jobs, one for each CE close to at least one of the input PDNs.
  • Each of the sub-jobs runs the algorithm on its local data.
  • Experiment-dependent merging of the sub-job outputs must take place at the end (see the sketch below).
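One possible shape of this split-and-merge scenario, again using assumed wms/dmc/replica_catalogue interfaces and an experiment-supplied merge plugin; grouping LDNs by the closest CE is only one conceivable policy.

```python
# Sketch of "WMS queries DMC, submits multi-jobs and merges output".
# All interfaces are assumptions; the final merge is an
# experiment-dependent plugin invoked through the DAG mechanism.

def split_and_merge_outputs(wms, dmc, replica_catalogue, job, merge_plugin):
    ldns = dmc.query(**job.dataset_query)

    # Group the LDNs by the CE closest to their physical replicas.
    ldns_by_ce = {}
    for ldn in ldns:
        ce = wms.closest_compute_element(replica_catalogue.locations(ldn))
        ldns_by_ce.setdefault(ce, []).append(ldn)

    # One sub-job per CE, each running the algorithm on its local data.
    sub_jobs = [wms.submit(job, compute_element=ce, input_ldns=local)
                for ce, local in ldns_by_ce.items()]

    # A final node in the DAG merges the sub-job outputs.
    return wms.submit_merge(merge_plugin, depends_on=sub_jobs)
```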
Support for analysis job execution by a common layer
• "WMS queries DMC, performs multi-queries and merges input"
  • The user submits the job, and the WMS performs the DMC query as in b). The WMS generates several sub-jobs as in c).
  • Each of these sub-jobs selects the events matching the experiment-dependent part of the query and places them on its output.
  • This output is merged by a final job in the pipeline, where it forms the input of the user-specified algorithm (see the sketch below).
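The corresponding sketch for this variant, with the same assumed interfaces as above; here the sub-jobs only evaluate the event selection, and the user algorithm runs once on the merged input.

```python
# Sketch of "WMS queries DMC, performs multi-queries and merges input".
# Same hypothetical interfaces as in the previous sketches.

def split_queries_merge_input(wms, dmc, replica_catalogue, job):
    ldns = dmc.query(**job.dataset_query)

    # Group the LDNs by closest CE, as in the previous scenario.
    ldns_by_ce = {}
    for ldn in ldns:
        ce = wms.closest_compute_element(replica_catalogue.locations(ldn))
        ldns_by_ce.setdefault(ce, []).append(ldn)

    # Sub-jobs perform only the experiment-dependent event selection.
    selection_jobs = [wms.submit(job.event_selection, compute_element=ce,
                                 input_ldns=local)
                      for ce, local in ldns_by_ce.items()]

    # The merged selection output becomes the input of a final job that
    # runs the user-specified algorithm.
    return wms.submit(job.algorithm, depends_on=selection_jobs)
```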
Provenance
Interactive vs batch
• Trying to define interactive vs batch computing turned out to be rather complicated
• But we have a really good understanding now
• Spectrum of response times: full interactive (instant), interactive batch (fast), real-time batch (slow), black box (glacial)
System requirements
• Provenance and job traceability
  • Difficult to say what we want without saying "how" we want it
  • Basically, how to reproduce a given result
• Interactive environment support
  • Log books and persistent sessions
• Deviant flows and errors
Varia
• Service model
• Software deployment?
  • What are the "real" requirements?
  • Not clear whether and what we should put here!
• Collaborative work
  • Important for analysis, but is this really the right place?
• Recommendations
• Use cases
Conclusion
• Whatever happens next… up to now it has been a lot of (constructive) fun
• There is probably too much "open stuff" in the document and we have to descope
• But we have interesting pointers for HEPCAL-II-Prime!
• The end-of-June date is of course maintained