280 likes | 447 Views
DIAL: Distributed Interactive Analysis of Large Datasets. DOSAR meeting. David Adams – BNL September 16, 2005. DIAL project Implementation Components Dataset Transformation Job Scheduler Catalogs User interfaces. Contents. Status Interactivity Current results Conclusions
E N D
DIAL: Distributed Interactive Analysisof Large Datasets DOSAR meeting David Adams – BNL September 16, 2005
DIAL project Implementation Components Dataset Transformation Job Scheduler Catalogs User interfaces Contents • Status • Interactivity • Current results • Conclusions • More information • Contibutors D. Adams DOSAR meeting DIAL
DIAL project • DIAL = Distributed Interactive Analysis of Large datasets • Goal is to demonstrate the feasibility of doing interactive analysis of large data samples • Analysis means production of analysis objects (e.g. histograms or ntuples) from HEP event data • Interactive means that user request is processed in a few minutes • Large is whatever is available and useful for physics studies • How large can we go? • Approach is to distribute processing over many nodes using the inherent parallelism of HEP data • Assume each event can be independently processed • Each node runs an independent job to processes a subset of the events • Results from each job are merged into the overall result D. Adams DOSAR meeting DIAL
Implementation • DIAL software provides a generic end-to-end framework for distributed analysis • Generic means few assumptions about the data to be processed or application that carries out the application • Experiment and user must provide extensions to the system • Able to support a wide range of processing systems • Local processing using batch systems such as LSF, Condor and PBS • Grid processing using different flavors of CE or WMS • Provides friendly user interfaces • Integration with root • Python binding (from GANGA) • GUI for job submission and monitoring (from GANGA) • Mostly written in C++ • Releases packaged 3-4 times/year D. Adams DOSAR meeting DIAL
Components • AJDL • Abstract job definition language • Dataset and transformation (application + task) • Job • C++ interface and XML representation for each • Scheduler • Handles job processing • Catalogs • Hold job definition objects and their metadata • Web services • User interfaces • Monitoring and accounting D. Adams DOSAR meeting DIAL
Dataset • Generic data description includes • Description of the content (type of data) • E.g. reconstructed objects, electrons, jets with cone size 0.7,… • Number of events and their ID’s • Data location • Typically a list of logical file names • List of constituent datasets • Model is inherently hierarchical • DIAL provides the following classes • Dataset – defines the common interface • GenericDataset – provides data and XML representation • TextDataset – embedded collection of text files • SingleFileDataset – Data location is a single file • EventMergeDataset – Collection of single file event datasets D. Adams DOSAR meeting DIAL
Dataset (cont) • Current approach for experiment-specific event datasets • Add a GenericDataset subclass that is able to read experiment files and fill the appropriate data • Use EventMergeDataset to describe large collections • DIAL can carry out processing using the GenericDataset interface and implementation • Allowed for experiment to provide its own implementation of the Dataset interface • If GenericDataset is not sufficient • Processing components then require this implementation D. Adams DOSAR meeting DIAL
Transformation • A transformation acts on a dataset to produce another dataset • Fundamental user interaction with the system • Transformation has two components • Application describes the action to take • Task provides data to configure the application • Task is a collection of named text files • Application holds two scripts • run – Carries out transformation • Input is dataset.xml and output is result.xml • Has access to the transformation • build_task – Creates transformation data from the files in the task • E.g. compiles to build library • Build is platform specific D. Adams DOSAR meeting DIAL
Job • The Job class hold the following data: • Job definition: application, task, dataset and job preferences • Status of the corresponding job • Initialized, running, done, failed or killed • Other history information • Compute node, native ID, start and stop times, return code, … • Job provides interface to • Start or submit job • Update the status and history information • Kill job • Fetch the result (output dataset) D. Adams DOSAR meeting DIAL
Job (cont) • Subclasses • Provide connection to underlying processing system • Implement these methods • Currently support fork, LSF and Condor • ScriptedJob • Subclass that implements these methods by calling a script • Scripts may be then written to support different processing systems • Alternative to providing subclass • Added in release 1.20 • Job copies • Jobs may be copied locally or remotely • Copy includes only the base class: • All common data available but job cannot be started, updated or killed • Use scheduler for these interactions D. Adams DOSAR meeting DIAL
Scheduler • Scheduler carries out processing • The following interface is defined • add_task builds the task • Input: application, task • Output: job ID • submit creates and starts a job • Input: application, task, dataset and job preferences • Output: job ID • job(JobId) returns a reference to the job • Or to a copy if the scheduler is remote • kill(JobId) to kill a job D. Adams DOSAR meeting DIAL
Scheduler (cont) • Subclasses implement this interface • LocalScheduler • Use a single job type and queue to handle processing, e.g. LSF • MasterScheduler • Splits input dataset • Uses an instance of LocalScheduler to process subjobs • Merges results from subjobs • Analysis service • Web service wrapper for Scheduler • This service may run any scheduler with any job type • WsClientScheduler subclass acts as a client to the service • Same interface for interacting with local fork, local batch or remote analysis service • Typical mode of operation is to use a remote analysis service D. Adams DOSAR meeting DIAL
Scheduler (cont) Typical structure of a job running on a MasterScheduler D. Adams DOSAR meeting DIAL
Scheduler (cont) • Scheduler hierarchy • Plan to add a subclass to forward requests to a selected scheduler based on job characteristics • This will make it possible to create a tree (or web) of analysis services • Should allow us to scale to arbitrarily large data samples (limited only by available computing resources) D. Adams DOSAR meeting DIAL
Scheduler (cont) Example of a possible scheduler tree for ATLAS D. Adams DOSAR meeting DIAL
Catalogs • DIAL provides support for different types of catalogs • Repositories to hold instances of application, task, dataset and job • Selection catalogs associate names and metadata with instances of these objects • And others • MySql implementations • Populated with ATLAS transformations and Rome AOD datasets D. Adams DOSAR meeting DIAL
User interfaces • Root • Almost all DIAL classes are available at the root command line • User can access catalogs, create job definitions, and submit and monitor jobs • Python interface available in PyDIAL • Most DIAL classes are available as wrapped python classes • Work done by GANGA using LCG tools • GUI • There is a GUI that supports job specification, submission and monitoring • Provided by GANGA using the PyDIAL D. Adams DOSAR meeting DIAL
Status • Releases • Most of the previous is available in the current release (1.20) • Release 1.30 expected next month • Release with service hierarchy in January • Use in ATLAS • Ambition is to use DIAL to define a common interface for distributed analysis with connections to different systems • See figure • ATLAS Distributed analysis is under review • Also want to continue with the original goal of providing a system with interactive response for appropriate jobs • Integration with the new USATLAS PANDA project D. Adams DOSAR meeting DIAL
Possible analysis service model for ATLAS D. Adams DOSAR meeting DIAL
Status (cont) • Use in other experiments • Some interest but no serious use that I know of • Current releases depend on ATLAS software but I am willing to build a version without that dependency • Not difficult • A generic system like this might be of interest to smaller experiments that would like to have the capabilities but do not have the resources to build a dedicated system D. Adams DOSAR meeting DIAL
Interactivity • What does it mean for a system to be interactive? • Literal meaning suggests user is able to directly communicate with jobs and subjobs • Most users will be satisfied if their jobs complete quickly and do not need to interact with subjobs or care if some other agent is doing so • Important exception is the capability to a job and all its subjobs • Interactive (responsive) impose requirements on the processing system (batch, WMS, …) • High job rates (> 1 Hz) • Low submit-to-result job latencies (< 1 minute) • High data input rate from SE to farm (> 10 MB/job) D. Adams DOSAR meeting DIAL
Current results • ATLAS generated data samples for the Rome physics meeting in June • Data includes AOD (analysis oriented data) that is a summary of the reconstructed event data • AOD size is 100 kB/event • All Rome AOD datasets are available in DIAL • An interactive analysis service is deployed at BNL using a special LSF queue for job submission • The following plot shows processing times for datasets of various sizes • LSF SUSY (green triangles) is a “real AOD analysis” producing histograms • LSF big produces ntuples and extra time is for ntuple merging (not parallelized) • Largest jobs take 3 hours to run serially and are processed in about 10 minutes with the DIAL interactive service D. Adams DOSAR meeting DIAL
Conclusions • DIAL analysis service running at at BNL • Serves clients from anywhere • Processing done locally • Rome AOD datasets available • Robust and responsive since June • Next steps • Similar service running at UTA (using PBS) • Service to select between these • BNL service to connect to OSG CE (in place of local LSF) • Integration with new ATLAS data management system • Integration with PANDA D. Adams DOSAR meeting DIAL
More information • DIAL home page: http://www.usatlas.bnl.gov/~dladams/dial • ADA (ATLAS Distributed Analysis) home page: http://www.usatlas.bnl.gov/ADA • Current DIAL release: http://www.usatlas.bnl.gov/~dladams/dial/releases/1.20 D. Adams DOSAR meeting DIAL
Contributors • GANGA • Karl Harrison, Alvin Tan • DIAL • David Adams, Wensheng Deng, Tadashi Maeno, Vinay Sambamurthy, Nagesh Chetan, Chitra Kannan • ARDA • Dietrich Liko • ATPROD • Frederic Brochu, Alessandro De Salvo • AMI • Solveig Albrand, Jerome Fulachier • ADA • (plus those above), Farida Fassi, Christian Haeberli, Hong Ma, Grigori Rybkine, Hyunwoo Kim D. Adams DOSAR meeting DIAL