DIAL
PPDG meeting
Interactive analysis
David Adams
BNL
December 19, 2002

Contents
• Goals of DIAL
• What is DIAL?
• DIAL interactions
• Dataset properties
• Application properties
• DIAL status
• Future
DIAL PPDG Interactive Analysis

Goals of DIAL
• Demonstrate the feasibility of interactive analysis of large datasets
  • Large means too big for interactive analysis on a single CPU
• Set requirements for GRID middleware
• Provide ATLAS with a tool to analyze DC1 and DC2 event data
  • More than just ntuples
  • Large samples
  • Distributed data and processing

What is DIAL?
• Distributed
  • Data and processing
• Interactive
  • Prompt response (seconds rather than hours)
• Analysis of
  • Fill histograms, select events, …
• Large datasets
  • Any event data (not just ntuples or tag)

What is DIAL? (cont)
• DIAL provides a connection between
  • Interactive analysis framework
    • E.g. ROOT
  • Data processing application
    • Athena for ATLAS
• User supplies task
  • Defines result
    • E.g. histogram
  • C++ code snippet to fill result

What is DIAL? (cont)
• Scheduler
  • Accepts dataset, task and application from user
  • Splits dataset along file boundaries
  • Creates and submits a job for each sub-dataset
  • Concatenates results from jobs
  • Makes combined result available to the user
  • Provides status reports
    • Fraction of events processed
    • Estimated time to completion
    • Partial results

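The scheduler steps listed on this slide can be illustrated with a minimal Python sketch. It is not DIAL code: the `Dataset` class, `split_by_file`, `fill_histogram` and `run_scheduler` are hypothetical names chosen for illustration, jobs run one after another in the same process, and the "result" is assumed to be a simple histogram stored as bin index to count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical dataset: maps a file name to the event values it holds."""
    files: dict


def split_by_file(dataset):
    """Split along file boundaries: one sub-dataset per file."""
    return [Dataset({name: events}) for name, events in dataset.files.items()]


def fill_histogram(sub_dataset, bin_width):
    """Hypothetical task: fill a histogram (bin index -> count)."""
    hist = Counter()
    for events in sub_dataset.files.values():
        for value in events:
            hist[int(value // bin_width)] += 1
    return hist


def run_scheduler(dataset, task, **task_args):
    """Run one job per sub-dataset and merge the job results.

    Yields (fraction of events processed, partial combined result)
    after each job, as a stand-in for status reports.
    """
    total = sum(len(events) for events in dataset.files.values())
    combined, done = Counter(), 0
    for sub in split_by_file(dataset):
        combined.update(task(sub, **task_args))  # gather: add job counts
        done += sum(len(events) for events in sub.files.values())
        fraction = done / total if total else 1.0
        yield fraction, Counter(combined)        # copy, so reports stay fixed
```

Iterating over `run_scheduler(dataset, fill_histogram, bin_width=1.0)` gives one status report per file; the last report carries the combined result. A real scheduler would submit the jobs to remote nodes rather than run them in a loop.
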
DIAL interactions
[Figure: interaction diagram. Components: Analyzer (e.g. ROOT), Dataset, Application (e.g. ATHENA), Task, Scheduler, Dataset 1, Dataset 2, Job 1, Job 2, Result. Labeled steps: 1. Create or locate; 2. select; 3. Create or select; 4. select; 5. submit(app,tsk,ds); 6. split; 7. create; 8. create(app,tsk,ds1) and create(app,tsk,ds2); 9. fill; 10. gather.]

Dataset properties
• From this interaction we deduce the following properties for datasets:
  • Dataset is a collection of data objects
  • Dataset has content
  • Dataset has location
  • Dataset has an identity
  • Dataset is portable
• For details, see following talk
  • http://www.usatlas.bnl.gov/~dladams/dataset/talks/021219_dataset.ppt

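One way to picture these properties is a small record type. The sketch below is an assumption-laden illustration, not the DIAL dataset interface: the field names `dataset_id`, `content` and `files` are invented here, and "portable" is modeled simply as a JSON round trip so that the same description could be handed to another node.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Dataset:
    """Hypothetical dataset description (field names are illustrative)."""
    dataset_id: str   # identity
    content: list     # kinds of data objects held
    files: list       # location: where the data objects are stored

    def to_json(self):
        """Serialize to text so the description can be sent elsewhere."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild the description from its serialized form."""
        return cls(**json.loads(text))
```

A description that survives `Dataset.from_json(ds.to_json())` unchanged is portable in this limited sense.
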
Application properties
• Current specification is
  • Name
    • E.g. athena
  • Version
    • E.g. 5.10.01
  • List of shared libraries
    • E.g. libRawData, libInnerDetectorReco

Application properties (cont)
• Each DIAL compute node provides an application description database
  • Indexed by application name and version
• Application description includes
  • Location of executable
  • Run time environment (env variables)
    • Including executable and shared library paths
  • Command to build shared library from task source code
• Can be shared by nodes with the same OS/compiler

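The relationship between the application specification and the per-node description database might look like the following Python sketch. All class and field names are hypothetical, as is the idea of storing the descriptions in an in-memory dictionary; the slides only state that the database is indexed by application name and version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """What the user specifies: name, version, shared libraries."""
    name: str
    version: str
    libraries: tuple = ()


@dataclass
class ApplicationDescription:
    """What a compute node knows about an application (illustrative fields)."""
    executable: str      # location of the executable on this node
    environment: dict    # run time environment variables
    build_command: str   # command to build a shared library from task source


class DescriptionDatabase:
    """Hypothetical per-node database indexed by (name, version)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, version, description):
        self._entries[(name, version)] = description

    def lookup(self, app):
        """Return the node-local description; raises KeyError if unknown."""
        return self._entries[(app.name, app.version)]
```

With this split, the user-side `Application` stays the same on every node, while each node (or group of nodes with the same OS and compiler) resolves it to its own paths and build command through `lookup`.
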
Application properties (cont)
• Alternative view: Packages
• Application specifies
  • Software packages (e.g. ROOT or ATLAS)
  • Executable
  • Shared libraries
  • Data files
• Task build command and run time environment are extracted from the package(s)
  • Requires common package interface
• No need to distribute application definitions

DIAL status
• All components in place
  • http://www.usatlas.bnl.gov/~dladams/dial
• But scheduler is very simple
  • local (same node)
  • creates a single processing job

Future
• Refine application definition
• Scheduler
  • Remote processing
  • Multiple jobs (splitting input dataset)
  • Multiple sites using GRID
• GRID integration
  • Identify components
  • Set requirements
  • Incorporate existing products