Provenance Challenge: gLite Job Provenance. Ludek Matyska, on behalf of the CESNET team (Czech Republic). GridWorld 2006, 13th September 2006
The Team • Ales Krenek – chair (Brno) • Jiri Sitera (Pilsen) • Frantisek Dvorak (Pilsen) • Milos Mulac (Pilsen) • Miroslav Ruda (Brno) • Zdenek Salvet (Brno) • Daniel Kouril (Brno)
gLite and Jobs • gLite is the middleware developed within the EU EGEE project • The EGEE Grid is strictly job oriented • Submitting a job is the only way users interact with the resources • Each job is described using the Job Description Language (JDL), based on the ClassAd syntax • Very complex descriptions are possible, including proximity to the storage of input/output files, environment settings, etc. • Job collections are also possible, forming simple workflows in the form of Directed Acyclic Graphs (DAGs) • Each DAG is completely described by a nested JDL as a set of its nodes (jobs) and the execution dependencies among them (an illustrative JDL sketch follows below)
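As an illustration only (these are not the team's actual files), the snippet below writes out a single-job JDL and a small DAG JDL from Python. The attribute names follow common gLite JDL usage, the values are invented, and the exact nesting of the DAG description may differ between gLite versions.

```python
# Illustrative only: JDL (ClassAd syntax) for a single job and for a small DAG.
# Attribute names follow common gLite JDL usage; values are invented and the
# exact DAG nesting may differ between gLite versions.
job_jdl = """[
  Executable    = "align_warp";
  Arguments     = "-m 12 -q";
  StdOutput     = "align_warp.out";
  StdError      = "align_warp.err";
  OutputSandbox = { "align_warp.out", "align_warp.err" };
]"""

dag_jdl = """[
  Type  = "dag";
  nodes = [
    align_warp_1 = [ file = "align_warp_1.jdl"; ];
    reslice_1    = [ file = "reslice_1.jdl"; ];
    dependencies = { { align_warp_1, reslice_1 } };
  ];
]"""

for name, text in (("job.jdl", job_jdl), ("dag.jdl", dag_jdl)):
    with open(name, "w") as f:   # write the descriptions so they could be submitted
        f.write(text)
```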
Job Processing in gLite • A job is submitted through a User Interface • The Workload Manager queues the job and starts looking for an appropriate Computing Element • The job is passed to the selected Computing Element (to its queue) • The job runs • After the run, the user can retrieve the job output (collected in the output sandbox) • All actions on a job are tracked by the Logging and Bookkeeping (LB) service, which provides the job state and related information • After retrieval of the output sandbox, all the middleware data (including the whole LB data) are transferred to the Job Provenance (JP) • Users can add annotations as tags (name/value pairs) to a job, either via LB (while the job is on the Grid) or via JP (any time afterwards)
Challenge Workflow • Implemented as a gLite DAG • Procedures become nodes of the DAG (gLite jobs) • Dependencies among procedures become dependencies among the DAG jobs • Data handling is implicit: each job is responsible for downloading its inputs from, and uploading its outputs to, an appropriate storage element • We set up a GridFTP server and all data were uploaded and downloaded using the gsiftp:// protocol • This means all data are identified by their full URLs (a staging sketch follows below)
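A minimal sketch of one node's data handling, assuming the standard globus-url-copy GridFTP client is available on the worker node; the server URL and file names are placeholders, not the actual challenge setup.

```python
# Illustrative node wrapper: each DAG node stages its own inputs and outputs
# over GridFTP using the standard globus-url-copy client. The server URL and
# file names are placeholders, not the actual challenge setup.
import os
import subprocess

GRIDFTP = "gsiftp://gridftp.example.org/provenance-challenge"  # placeholder server

def download(name):
    """Fetch one input file from the GridFTP server into the working directory."""
    dest = "file://" + os.path.join(os.getcwd(), name)
    subprocess.run(["globus-url-copy", GRIDFTP + "/" + name, dest], check=True)

def upload(name):
    """Push one output file back; its full gsiftp:// URL then identifies the data."""
    src = "file://" + os.path.join(os.getcwd(), name)
    subprocess.run(["globus-url-copy", src, GRIDFTP + "/" + name], check=True)

if __name__ == "__main__":
    for f in ("anatomy1.img", "anatomy1.hdr"):   # placeholder input names
        download(f)
    # ... run the stage's program on the downloaded files here ...
    upload("warp1.warp")                          # placeholder output name
```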
Provenance Trace • gLite Job Provenance is primarily a storage and retrieval service for provenance data • Currently no GUI, only a command-line interface • Optimized to store large amounts of provenance data • Mostly events recorded during the job lifetime • WORM semantics for the primary data • User annotations • New annotations can be added at any time • Annotations are also “distilled” from the primary data • An extensible framework, where specific metadata processing is available through plug-ins that can be added at any time • Participation in the Provenance Challenge stressed the metadata interpretation side • more work in this area has been done and is still needed
Attribute Classes • Most work was done on the annotations (processed raw events) • Four classes of annotations were used: • JP system attributes • E.g. JobID or registration time • Attributes digested from the LB trace • E.g. the time when the job ran • Attributes digested from the JDL • E.g. Ancestor and Successor from the DAG description • Unqualified user tags • All attributes can occur multiple times • E.g. “softmean” has 4 ancestor annotations (with the value “reslice”); see the sketch below
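A hypothetical, flattened view of the attributes one node (softmean) might end up with; the namespaces and values are simplified placeholders, but the four "reslice" ancestors match the case mentioned above.

```python
# Hypothetical, flattened view of one job's JP attributes, illustrating the four
# annotation classes and the fact that an attribute may occur several times.
# Namespaces and values are simplified placeholders, not the real JP names.
softmean_attrs = {
    # JP system attributes
    "jp:jobId":     ["https://lb.example.org:9000/softmean-node"],
    "jp:regTime":   ["2006-08-14T09:12:03Z"],
    # digested from the LB trace
    "lb:runTime":   ["2006-08-14T09:30:41Z"],
    # digested from the JDL (DAG structure): four ancestors, all "reslice"
    "jdl:ancestor": ["reslice", "reslice", "reslice", "reslice"],
    # unqualified user tags
    "IPAW_STAGE":   ["3"],
    "IPAW_PROGRAM": ["softmean"],
}
```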
Specific Tags • We used 6 specific user tags for the Provenance Challenge • IPAW_OUTPUT • IPAW_INPUT • IPAW_STAGE • IPAW_PROGRAM • IPAW_PARAM • IPAW_HEADER • They carried the values specified by the Provenance Challenge description • They were fed via the LB interface
JP Queries • JP Primary Server (JP PS) • Keeps the primary data • Supports only direct data retrieval; the JobID must be known • JP Index Server (JP IS) • Configurable cache of a subset of jobs and their attributes • Can search for jobs matching specific query criteria • Comparison of an attribute with a constant value • Multiple JP Index Servers can be associated with one JP Primary Server
Query #1 • Find the process that led to the Atlas X Graphic • Input • URL of the queried Atlas X Graphic file • Outputs • List of nodes (DAG jobs) that contributed to the queried file • Their input and output files (as URLs) • Stage of the workflow, program name and parameter values • Implementation • Recursive graph search (sketched below) • Results: • The above-mentioned list of nodes and their attributes • Low readability, no GUI to manipulate the result • However, all the relevant information is available
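A minimal sketch of the recursive graph search, assuming a hypothetical query_jpis() helper that stands in for the JP Index Server call (the real interface is a web service); the attribute names used here are simplified, not the real JP annotations.

```python
# Sketch of the recursive graph search behind Query #1. query_jpis() is a
# hypothetical stand-in for a JP Index Server query (attributes compared with
# constants); the attribute names below are simplified placeholders.
def query_jpis(conditions):
    """Return attribute records of all cached jobs matching all conditions."""
    raise NotImplementedError("stand-in for the JP IS web-service call")

def process_leading_to(output_url):
    """Collect every DAG node that contributed to the queried Atlas X Graphic."""
    seen, trace = set(), []
    frontier = query_jpis({"IPAW_OUTPUT": output_url})
    while frontier:
        job = frontier.pop()
        if job["jobid"] in seen:
            continue
        seen.add(job["jobid"])
        trace.append(job)                       # stage, program, parameters, I/O URLs
        for parent in job.get("ancestor", []):  # follow Ancestor annotations upward
            frontier.extend(query_jpis({"node_name": parent}))
    return trace                                # the list of contributing nodes
```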
Query #3 • Find the Stage 3, 4, and 5 details of the process that led to the Atlas X Graphic • Same as Query #1, with the output restricted to the stages specified above • Comment • More efficient processing is possible if we know the relationship between stages (i.e. that Stage 3 precedes Stage 4) • The implementation is generic enough to handle stages identified by unstructured names, not only by numeric values
Query #4 • Find all invocations of procedure align_warp using a 12th order nonlinear 1365 parameter model that ran on a Monday • Outputs • Time, stage, program name, inputs, outputs • Implementation • The JP IS is queried for jobs matching IPAW_PROGRAM=“align_warp” and IPAW_PARAM=“-m 12” • The output is then filtered for Monday runs (see the sketch below)
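A minimal sketch of Query #4, again assuming a hypothetical query_jpis() helper in place of the JP Index Server call; the run-time attribute name and its ISO-8601 timestamp format are assumptions made only for the sketch.

```python
# Sketch of Query #4: ask a (hypothetical) JP IS helper for both tag conditions,
# then filter for Monday on the client side. The run-time attribute name and its
# ISO-8601 format are assumptions.
from datetime import datetime

def query_jpis(conditions):
    """Return attribute records of all cached jobs matching all conditions."""
    raise NotImplementedError("stand-in for the JP IS web-service call")

def monday_align_warps():
    jobs = query_jpis({"IPAW_PROGRAM": "align_warp", "IPAW_PARAM": "-m 12"})
    return [j for j in jobs
            if datetime.fromisoformat(                      # tolerate a trailing "Z"
                   j["run_time"].replace("Z", "+00:00")
               ).weekday() == 0]                            # 0 = Monday
```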
Query #8 • Annotated anatomy images • Not directly possible • JP does not deal with data directly, only with jobs • No annotations on data are available • Possible solution (not implemented, but similar to the one used to answer Query #9); sketched below: • Introduce “dummy” jobs that have the particular data file assigned as their input • Associate the annotations with these jobs • Process job annotations instead of data annotations
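A sketch of the proposed (unimplemented) workaround: represent each annotated anatomy image by a dummy job whose only input is that file, and hang the annotations on the job. register_job() and add_annotation() are hypothetical stand-ins for the LB/JP registration and annotation interfaces.

```python
# Sketch of the proposed workaround for Query #8 (not implemented by the team):
# a dummy job per data file carries the annotations that would otherwise belong
# to the data. register_job() and add_annotation() are hypothetical stand-ins.
def register_job(attributes):
    raise NotImplementedError("stand-in for registering a (dummy) job")

def add_annotation(job_id, name, value):
    raise NotImplementedError("stand-in for adding a JP user annotation")

def annotate_image(image_url, annotations):
    """Create a dummy job for the image and attach the annotations to it."""
    job_id = register_job({"IPAW_INPUT": image_url, "DUMMY": "true"})
    for name, value in annotations.items():
        add_annotation(job_id, name, value)
    return job_id
```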
Summary • gLite Job Provenance was usable to answer all queries but one • gLite JP is focused on efficient metadata storage and retrieval • In semi-production operation on the EGEE preview testbed • gLite JP is usable as the lowest layer of more complex provenance systems • Some processing is currently done on the client side • Support for more complex workflows is tied to the introduction and support of complex workflows in the EGEE environment • New challenge: precise re-run of a past job (complex environment setup)