AJDL: Abstract Job Description Language
PPDG Collaboration Meeting, Williams Bay
David Adams, BNL
June 29, 2004
Contents
• Model
• Components
• Implementation
AJDL PPDG Collaboration Meeting
Model
• Job-based model
  • User selects an input dataset
  • User selects or constructs a transformation (xform) to apply to this dataset
  • Distributed analysis system constructs a job to apply the xform to the dataset
  • Result is a new dataset
    • Partial results may be available during processing
  • User examines the result
• From this, identify the components of AJDL:
  • Dataset
  • Transformation (e.g. application and task)
  • Job (xform, dataset, job preferences)
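The job-based model can be sketched as a small data model. This is a minimal Python illustration that assumes hypothetical class and field names (`Dataset`, `Transformation`, `Job`, `run`), none of which come from a published AJDL schema. `run` only derives the name of the output dataset, which is enough to show that applying an xform produces a new dataset and leaves the input unchanged.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Dataset:
    """Immutable description of a dataset (hypothetical fields)."""
    name: str
    files: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Transformation:
    """An xform: the application to run plus the task that configures it."""
    application: str
    task: str


@dataclass
class Job:
    """A job binds an xform to an input dataset, plus optional preferences."""
    xform: Transformation
    dataset: Dataset
    preferences: Dict[str, str] = field(default_factory=dict)

    def run(self) -> Dataset:
        # Placeholder for real processing: only the identity of the output
        # dataset is derived, to show that the result is a *new* dataset.
        return Dataset(name=f"{self.dataset.name}.{self.xform.application}")


job = Job(Transformation("histos", "myalg"), Dataset("run42", ("lfn:a", "lfn:b")))
result = job.run()
print(result.name)  # run42.histos
```

Both `Dataset` and `Transformation` are frozen here because the user picks them before submission and the system should not modify them.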
Model (cont)
• Abstract means:
  • The user's job definition should be suitable for invocation at any site using any WMS (workload management system)
  • Specify what to do, not how to do it
• Analysis service
  • Receives an abstract job request
  • Splits it into sub-jobs
    • Typically by splitting the input dataset
  • Maps the transformation to a local executable and runtime environment
  • Runs the executable on each sub-dataset
  • Gathers and merges the results from each sub-job
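The analysis-service steps (split, run on each sub-dataset, gather and merge) can be sketched as follows. The sketch rests on three assumptions: a dataset is just a list of file names, the "local executable" is a Python callable, and merging means summing counts that share a key. A real service would launch the sub-jobs through a WMS rather than calling a function in-process.

```python
from typing import Callable, Dict, List


def split(files: List[str], n_per_subjob: int) -> List[List[str]]:
    """Split the input dataset into sub-datasets of at most n files each."""
    return [files[i:i + n_per_subjob] for i in range(0, len(files), n_per_subjob)]


def merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Merge per-sub-job results by summing values that share a key."""
    merged: Dict[str, int] = {}
    for part in partials:
        for key, value in part.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def run_job(files: List[str],
            executable: Callable[[List[str]], Dict[str, int]],
            n_per_subjob: int = 2) -> Dict[str, int]:
    """Split, run the executable once per sub-dataset, then merge."""
    sub_datasets = split(files, n_per_subjob)
    partials = [executable(sub) for sub in sub_datasets]  # one sub-job each
    return merge(partials)


def count_files(sub_dataset: List[str]) -> Dict[str, int]:
    """Stand-in executable: report how many files it processed."""
    return {"files": len(sub_dataset)}


print(run_job(["f1", "f2", "f3", "f4", "f5"], count_files))  # {'files': 5}
```

Splitting the dataset instead of the executable is what lets the same abstract request run unchanged at sites with different amounts of parallelism.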
Components
• Dataset
  • Identity
    • Dataset is immutable
  • Location
    • Typically a list of LFNs (logical file names)
    • May be absent (virtual dataset)
      • The DRC then provides the location
  • Content
    • Which events
    • Type of data in each event (raw, tracks, jets, aod, …)
  • Compound structure
    • List of sub-datasets
    • Can be a tree structure
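A compound dataset with these components might look like the sketch below, which uses a hypothetical `Dataset` class. `frozen=True` mirrors the immutability requirement. `lfns=None` marks a leaf with no location, i.e. a virtual dataset whose location would have to be supplied from outside (per the slide, by the DRC). `all_lfns` walks the sub-dataset tree.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    ident: str
    lfns: Optional[Tuple[str, ...]] = None   # None: no location recorded
    content: Tuple[str, ...] = ()            # e.g. ("raw", "jets")
    subdatasets: Tuple["Dataset", ...] = ()  # compound structure (a tree)

    def is_virtual(self) -> bool:
        """A leaf dataset with no recorded location is treated as virtual."""
        return self.lfns is None and not self.subdatasets

    def all_lfns(self) -> Tuple[str, ...]:
        """Collect LFNs of this dataset and, recursively, of its children."""
        own = self.lfns or ()
        nested = tuple(lfn for sub in self.subdatasets for lfn in sub.all_lfns())
        return own + nested


a = Dataset("a", lfns=("lfn:a1", "lfn:a2"), content=("aod",))
b = Dataset("b", content=("aod",))           # virtual: no location yet
top = Dataset("top", subdatasets=(a, b))
print(top.all_lfns())   # ('lfn:a1', 'lfn:a2')
print(b.is_virtual())   # True
```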
Components (cont)
• Application
  • Script to process a dataset
    • Output is another dataset
  • List of software packages
    • Assumes a package management service provides the location of a specified package
    • May have automatic installation
  • Application advertises the required content
    • Compared with the content of the input dataset to verify compatibility
  • Second script to build the task before processing
    • E.g. compile provided sources
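The compatibility check between an application's advertised content and the input dataset's content reduces to a set difference. Below is a minimal sketch, assuming a hypothetical `Application` record. The script and package names are placeholders, and content types are plain strings like those on the dataset slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Application:
    run_script: str                              # processes a dataset
    build_script: str = ""                       # builds the task first
    packages: Tuple[str, ...] = ()               # resolved by a package service
    required_content: FrozenSet[str] = frozenset()


def missing_content(app: Application, dataset_content: FrozenSet[str]) -> FrozenSet[str]:
    """Content types the application needs but the dataset does not provide."""
    return app.required_content - dataset_content


def is_compatible(app: Application, dataset_content: FrozenSet[str]) -> bool:
    return not missing_content(app, dataset_content)


app = Application("run.sh", build_script="build.sh",
                  packages=("analysis_pkg",),    # hypothetical package name
                  required_content=frozenset({"aod", "jets"}))
print(is_compatible(app, frozenset({"aod", "jets", "tracks"})))  # True
print(sorted(missing_content(app, frozenset({"aod"}))))         # ['jets']
```

Returning the missing types, rather than a bare boolean, gives the service something concrete to report when it rejects a job.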
Components (cont)
• Task
  • Carries the data used to configure the application
  • At present the task carries embedded text files
    • E.g. myalg.cxx
  • May add named parameters
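A task that carries embedded text files has to write them out before the application's build script runs. This sketch assumes a hypothetical `Task` class that maps file names to contents and holds optional named parameters. `materialize` writes the files into a working directory and rejects names containing directory components.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class Task:
    files: Dict[str, str] = field(default_factory=dict)       # name -> text
    parameters: Dict[str, str] = field(default_factory=dict)  # named params

    def materialize(self, workdir: Path) -> List[Path]:
        """Write each embedded file into workdir and return the paths."""
        written = []
        for name, text in self.files.items():
            if Path(name).name != name:
                raise ValueError(f"embedded file name must be bare: {name!r}")
            path = workdir / name
            path.write_text(text)
            written.append(path)
        return written


task = Task(files={"myalg.cxx": "// user analysis code\n"},
            parameters={"nevents": "1000"})
with tempfile.TemporaryDirectory() as tmp:
    paths = task.materialize(Path(tmp))
    print([p.name for p in paths])  # ['myalg.cxx']
```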
Components (cont)
• Job preferences
  • Allow the user to provide hints for processing
    • Location for output data
    • User role
    • Desired response time
  • System may ignore or freely interpret these
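Job preferences are hints, not requirements, so every field can be optional and each site can apply its own interpretation. The sketch assumes a hypothetical `JobPreferences` class and a made-up site policy that maps the desired response time onto "short" and "long" queues. The queue names and the 600 s threshold are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobPreferences:
    output_location: Optional[str] = None
    user_role: Optional[str] = None
    response_time_s: Optional[int] = None  # desired response time, seconds


def choose_queue(prefs: JobPreferences) -> str:
    """One hypothetical site's interpretation of the response-time hint."""
    if prefs.response_time_s is not None and prefs.response_time_s <= 600:
        return "short"
    return "long"  # no hint given, or a long response time is acceptable


print(choose_queue(JobPreferences(response_time_s=300)))  # short
print(choose_queue(JobPreferences()))                     # long
```

This policy reads only one hint and silently ignores the others, which the slide explicitly allows.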
Components (cont)
• Job
  • ID
  • Current state (initializing, running, done, failed, …)
  • Start and stop times
  • List of sub-job IDs
  • Input application, task and dataset
  • Output dataset
    • Partial result if the job is not complete
  • Access to control the job
    • Suspend/resume
    • Kill
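The job state and the control operations fit naturally into a small state machine. The sketch below is hypothetical. The slide lists only initializing, running, done and failed, so the `SUSPENDED` and `KILLED` states and the allowed transitions are assumptions made to support suspend/resume and kill.

```python
from enum import Enum
from typing import List


class JobState(Enum):
    INITIALIZING = "initializing"
    RUNNING = "running"
    SUSPENDED = "suspended"  # assumed state for suspend/resume
    DONE = "done"
    FAILED = "failed"
    KILLED = "killed"        # assumed state for kill


# Control operation -> (states it may be applied in, resulting state).
_TRANSITIONS = {
    "suspend": ({JobState.RUNNING}, JobState.SUSPENDED),
    "resume": ({JobState.SUSPENDED}, JobState.RUNNING),
    "kill": ({JobState.INITIALIZING, JobState.RUNNING, JobState.SUSPENDED},
             JobState.KILLED),
}


class Job:
    def __init__(self, job_id: str):
        self.job_id = job_id
        self.state = JobState.INITIALIZING
        self.subjob_ids: List[str] = []

    def control(self, op: str) -> None:
        """Apply suspend/resume/kill, rejecting invalid transitions."""
        allowed, target = _TRANSITIONS[op]
        if self.state not in allowed:
            raise ValueError(f"cannot {op} a job in state {self.state.value}")
        self.state = target


job = Job("job-1")
job.state = JobState.RUNNING  # normally set by the service
job.control("suspend")
job.control("resume")
job.control("kill")
print(job.state.value)  # killed
```

Completed jobs (done, failed, killed) accept no further control operations, so a kill request that arrives late fails loudly and is never silently ignored.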
Implementation
• Extensibility
  • Must be extensible to support different types of datasets and jobs
    • AtlasPoolEventDataset, RootHistogramDataset, …
    • ProcessJob, LsfJob, CondorJob, EgeeJob, …
  • Can we use the same schema for all types?
    • So far, yes for jobs
    • Probably for applications and tasks
    • Not clear for datasets
• Data representation
  • XML description for each type
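A common XML element with a `type` attribute is one way to let a single schema carry many dataset or job subtypes. This round-trip sketch uses the standard-library `xml.etree.ElementTree`. The element and attribute names (`dataset`, `type`, `id`, `location`, `lfn`) are assumptions, not a published AJDL schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def dataset_to_xml(ds_type: str, ident: str, lfns: List[str]) -> str:
    """Serialize a dataset; the subtype is recorded in the 'type' attribute."""
    root = ET.Element("dataset", {"type": ds_type, "id": ident})
    location = ET.SubElement(root, "location")
    for lfn in lfns:
        ET.SubElement(location, "lfn").text = lfn
    return ET.tostring(root, encoding="unicode")


def dataset_from_xml(text: str) -> Dict[str, object]:
    """Parse the common fields; subtype-specific parsing could dispatch on 'type'."""
    root = ET.fromstring(text)
    return {
        "type": root.get("type"),
        "id": root.get("id"),
        "lfns": [e.text for e in root.iter("lfn")],
    }


xml_text = dataset_to_xml("RootHistogramDataset", "hist-001", ["lfn:h1.root"])
print(dataset_from_xml(xml_text))
```

Whether one schema can cover all dataset types is still open (see the slide), so a reader would likely need per-type handlers keyed on `type` in addition to the common fields shown here.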
Implementation (cont)
• Classes
  • Provide class interfaces for each type
    • C++, Python and maybe Java
    • C++ from DIAL
    • Python binding to C++ using lcgdict (GANGA)
  • Convenience for implementing clients and services
  • Add operations to take action
    • E.g. fetch local replicas of files in a dataset
    • Update status or kill a job
  • May add functionality for subtypes
    • Extract histograms for a RootHistogramDataset
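The idea of shared operations on a base type, with extra functionality on subtypes, can be illustrated in pure Python. This is not the DIAL C++ classes or their lcgdict binding. In this sketch the replica catalog is a plain dict standing in for a real catalog service (LFN to list of (site, physical file) pairs), and the site names, paths and histogram names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a replica catalog: LFN -> [(site, physical file)].
ReplicaCatalog = Dict[str, List[Tuple[str, str]]]


class Dataset:
    type_name = "Dataset"

    def __init__(self, ident: str, lfns: List[str]):
        self.ident = ident
        self.lfns = list(lfns)

    def local_replicas(self, catalog: ReplicaCatalog, site: str) -> List[str]:
        """Operation shared by all types: physical files available at a site."""
        return [pfn for lfn in self.lfns
                for replica_site, pfn in catalog.get(lfn, [])
                if replica_site == site]


class RootHistogramDataset(Dataset):
    type_name = "RootHistogramDataset"

    def __init__(self, ident: str, lfns: List[str], histogram_names: List[str]):
        super().__init__(ident, lfns)
        self._histograms = list(histogram_names)

    def histograms(self) -> List[str]:
        """Subtype-specific operation; a real one would read the ROOT files."""
        return list(self._histograms)


catalog = {"lfn:h1.root": [("BNL", "/data/h1.root"), ("CERN", "/eos/h1.root")]}
ds = RootHistogramDataset("hist-001", ["lfn:h1.root"], ["pt", "eta"])
print(ds.local_replicas(catalog, "BNL"))  # ['/data/h1.root']
print(ds.histograms())                    # ['pt', 'eta']
```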