LHCb Development Glenn Patrick Rutherford Appleton Laboratory GRIDPP9
LHCb - Reminder. 1.2M electronic channels. Weight ~4,000 tonnes. Detector diagram (scale marker: 20 m): VELO, RICH1, Magnet, Tracking stations (inner and outer), RICH2, Calorimeters, Muon System. Inset: B meson and anti-B meson quark content (b, d).
LHCb GridPP Development LHCb development has been taking place on three fronts: • MC Production Control and Monitoring Gennady Kuznetsov (RAL) • Data Management Carmine Cioffi (Oxford) Karl Harrison (Cambridge) • GANGA Alexander Soroko (Oxford) Karl Harrison (Cambridge) All developed in tandem with LHCb Data Challenges GRIDPP9
Data Challenge DC03 • 65M events processed. • Distributed over 19 different centres. • Averaged 830,000 events/day. • Equivalent to 2,300 × 1.5GHz computers. • 34% processed in UK at 7 different institutes. • All data written to CERN. “Physics” Data Challenge. Used to redesign and optimise detector … (Diagram labels: VELO, RICH1, TT, RICH2.)
The LHCb Detector. Reduced number of layers for M1 (4 → 2). Reduced number of tracking stations behind the magnet (4 → 3). No tracking chambers in the magnet. No B field shielding plate. Full Si station. Reoptimized RICH-1 design. Reduced number of VELO stations (25 → 21). Changes were made for material reduction and L1 trigger improvement.
“Detector” TDRs completed Only Computing TDR remains GRIDPP9
Data Challenge 2004 “Computing” Data Challenge. April – June 2004 Produce 10 × more events. At least 50% to be done via LCG. Store data at nearest Tier-1 (i.e. RAL for UK institutes) Try out distributed analysis. Test computing model and write computing TDR. Require stable LCG2 release with SRM interfaced to RAL DataStore GRIDPP9
DC04: UK Tier-2 Centres. NorthGrid: Daresbury, Lancaster, Liverpool, Manchester, Sheffield. SouthGrid: Birmingham, Bristol, Cambridge, Oxford, RAL PPD. ScotGrid: Durham, Edinburgh, Glasgow. LondonGrid: Brunel, Imperial, QMUL, RHUL, UCL.
DIRAC Architecture. Diagram labels: User Interface, API, Auditing, Job Provenance, Information Service, Authentication, Authorisation, Accounting, Metadata Catalogue, Grid Monitoring, File Catalogue. DIRAC components: Workload Management, Package Manager, Data Management. Other project components: AliEn, LCG, … Resources (LCG, LHCb production sites): Storage Element, Computing Element.
DIRAC Distributed Infrastructure with Remote Agent Control MC Control Status Gennady Kuznetsov Control toolkit breaking down production workflow into components – modules, steps. To be deployed in DC04. SUCCESS! GRIDPP9
DIRAC v1.0: original scheme. “Pull” rather than “Push”. Diagram: Agents at Sites A, B, C and D get jobs from the Production service, and send monitoring info to the Monitoring service and bookkeeping data to the Bookkeeping service.
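The “pull” scheme above can be sketched in a few lines of Python. This is only an illustration, not DIRAC code: the `ProductionService` and `Agent` classes, their methods and the job names are hypothetical, and an in-memory queue stands in for the remote service.

```python
from collections import deque


class ProductionService:
    """Hypothetical stand-in for the central production service."""

    def __init__(self, jobs):
        self._queue = deque(jobs)

    def get_job(self):
        # Hand out the next waiting job, or None when the queue is empty.
        return self._queue.popleft() if self._queue else None


class Agent:
    """Hypothetical site agent that asks for work while it has free slots."""

    def __init__(self, site, free_slots):
        self.site = site
        self.free_slots = free_slots
        self.taken = []

    def poll(self, service):
        # The agent initiates every request; the service never pushes jobs.
        while self.free_slots > 0:
            job = service.get_job()
            if job is None:
                break
            self.free_slots -= 1
            self.taken.append(job)


service = ProductionService(["job-%d" % i for i in range(5)])
agents = [Agent("Site A", 2), Agent("Site B", 1), Agent("Site C", 3)]
for agent in agents:
    agent.poll(service)
    print(agent.site, agent.taken)
```

In this sketch a site only receives as many jobs as it asks for, so the central service needs no knowledge of site capacity.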
Components – MC Control (Gennady Kuznetsov). Diagram levels: Production, Workflow, Step, Module, with Jobs generated from Steps. Module is the basic component of the architecture. This structure allows the Production Manager to construct any algorithm as a combination of modules. • Levels of usage: • Module – Programmer • Step – Production Manager • Workflow – User/Production manager. Each step generates a job as a Python program.
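A minimal sketch of the module/step/workflow layering follows, assuming that a step simply concatenates the Python source of its modules into one program. The `Module`, `Step` and `Workflow` classes and the example module code are hypothetical and are not the actual control toolkit.

```python
class Module:
    """Hypothetical module: a named fragment of Python source."""

    def __init__(self, name, code):
        self.name = name
        self.code = code


class Step:
    """Hypothetical step: an ordered combination of modules."""

    def __init__(self, name, modules):
        self.name = name
        self.modules = modules

    def generate(self):
        # Concatenate the module sources into a single Python program.
        lines = ["# step: %s" % self.name]
        for module in self.modules:
            lines.append("# module: %s" % module.name)
            lines.append(module.code)
        return "\n".join(lines)


class Workflow:
    """Hypothetical workflow: an ordered list of steps."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def generate_jobs(self):
        # One generated Python program per step.
        return [step.generate() for step in self.steps]


simulate = Module("Simulate", "events = list(range(n_events))")
count = Module("Count", "result = len(events)")
workflow = Workflow("Demo", [Step("Simulation", [simulate, count])])

program = workflow.generate_jobs()[0]
namespace = {"n_events": 3}
exec(program, namespace)  # run the generated job in its own namespace
print(program)
print(namespace["result"])
```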
Module Editor Stored as XML file Module Name Description Python code of single module. Can be many classes. Module variables. GRIDPP9 Gennady Kuznetsov
Step Editor Stored as XML file, where all modules are embedded Definitions of Modules Step Name Instances of Modules Description Selected instance Variables of currently selected instance Step variables. GRIDPP9 Gennady Kuznetsov
Workflow Editor Workflow Name Stored as XML file Step Definitions Step Instances Selected Step Instance Description Variables of currently selected Step Instance Workflow Variables. GRIDPP9 Gennady Kuznetsov
Job Splitting (Gennady Kuznetsov). The input value for the job splitting is a Python list object. Every single (top level) element of this list applies to the Workflow Definition, propagates through the code and generates a single element of the production (one or several jobs). Diagram: Python List and Workflow Definition (Steps) combine to give the Production (Jobs).
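The splitting rule can be illustrated as below. The sketch assumes that a nested list in the input yields several jobs for one production element and that a plain value yields one job; that reading, the `split_production` function and the dictionary layout are all assumptions made for illustration.

```python
def split_production(workflow_definition, split_list):
    """Return one production element (a list of jobs) per top-level item."""
    production = []
    for element in split_list:
        # Assumption: a nested list means several jobs for this element.
        values = element if isinstance(element, list) else [element]
        jobs = [dict(workflow_definition, value=value) for value in values]
        production.append(jobs)
    return production


workflow_definition = {"workflow": "Demo", "steps": ["Simulation", "Reconstruction"]}
production = split_production(workflow_definition, [1001, [1002, 1003], 1004])

for element in production:
    print([job["value"] for job in element])
```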
Future: Production Console. Once an agent has received a workflow, the Production Manager has no control over any function in a remote centre. The Local Manager must perform all of the configurations and interventions at each individual site. Develop a “Production Console” which will provide extensive control and monitoring functions for the Production Manager. Monitor and configure remote agents. Data replication control. Intrusive system – need to address Grid security mechanisms and provide robust environment.
DIRAC v1.0 Architecture Production Manager GRIDPP9
DIRAC v2.0 WMS Architecture. Based on central queue service. Production Service. Also data stored remotely.
Data Management Status (Carmine Cioffi). File catalogue browser for POOL. Integration of POOL persistency framework into GAUDI. New EventSelector interface. SUCCESS!
Main Panel, LFN Mode Browsing. Controls: show the metadata schema, with the possibility to change it; import a fragment of a catalogue; reload the catalogue; list all the metadata values of the catalogue; write mode selection; list the files selected; read the next and previous bunch of files from the catalogue. Filter text bar. Search text bar. Left sub-panel: list of LFNs. Right sub-panel: list of PFNs associated with the LFN selected on the left. Tabs for LFN / PFN mode selection. Browser allows user to interact with catalogue via GUI. POOL file catalogue provides LFN & PFN association. Can save list of LFNs for job sandbox.
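The LFN-to-PFN association that the browser displays can be pictured with a small in-memory structure. This is not the POOL API: the `FileCatalogue` class, its methods and the file names are hypothetical and only show how one logical name maps to several physical replicas.

```python
from collections import defaultdict


class FileCatalogue:
    """Hypothetical in-memory catalogue mapping each LFN to its PFN replicas."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, pfn):
        # Register a PFN for an LFN, ignoring exact duplicates.
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def pfns(self, lfn):
        # Return a copy so callers cannot modify the catalogue by accident.
        return list(self._replicas.get(lfn, []))

    def filter_lfns(self, text):
        # Mimic the filter text bar: keep LFNs containing the given text.
        return sorted(lfn for lfn in self._replicas if text in lfn)


catalogue = FileCatalogue()
catalogue.register("lfn:/demo/run1.dst", "pfn://site-a/data/run1.dst")
catalogue.register("lfn:/demo/run1.dst", "pfn://site-b/data/run1.dst")
catalogue.register("lfn:/demo/run2.dst", "pfn://site-a/data/run2.dst")

selected = catalogue.filter_lfns("run1")
print(selected)
print(catalogue.pfns(selected[0]))
```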
Main Panel, PFN Mode Browsing. In PFN mode, the files are browsed in the same way as in Windows Explorer. The folders are shown on the left sub-panel and the contents of the selected folder on the right sub-panel. A sub-menu offers three operations on the selected file. The write mode button opens the WrFCBrowser frame, allowing the user to write to the catalogue…
Write Mode Panel Remove a LFN Add LFN Add metadata value Delete a PFN Rollback Commit Add a PFN replica Register a PFN Show the action performed GRIDPP9
This frame allows setting of the metadata value PFN register frame Frame to show and change the metadata schema of the catalog GRIDPP9
This frame shows the metadata value of the PFN Myfile This frame shows the attribute value of the PFN Shows the list of the files selected GRIDPP9
GAUDI/POOL Integration • Benefit from investment in LCG. • Retire parts of Gaudi to reduce maintenance. • Designed and implemented a new interface for the LHCb EventSelector. Selection criteria: • One or more “datasets” (e.g. list of runs, list of files matching given criteria). • One or more “EventTagCollections” with extra selection based on Tag values. • One or more physical files. Result of an event selection is a virtual list of event pointers.
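The idea of a “virtual list of event pointers” can be sketched with a Python generator, which yields references to events on demand instead of copying event data. This is not the Gaudi EventSelector interface: the `select_events` function, the file names and the tag values are hypothetical.

```python
from itertools import islice


def select_events(files, tag_cut=None):
    """Lazily yield (file_name, event_index) pointers; no event data is copied."""
    for file_name, tags in files.items():
        for index, tag in enumerate(tags):
            if tag_cut is None or tag_cut(tag):
                yield (file_name, index)


# Hypothetical input: one tag value per event in each file.
files = {
    "file_a": [0.3, 1.2, 3.1],
    "file_b": [0.1, 2.5],
}

pointers = select_events(files, tag_cut=lambda tag: tag > 1.0)
first_two = list(islice(pointers, 2))  # only these pointers are evaluated
print(first_two)
```

Because the selection is lazy, a job can start processing the first pointers before the whole selection has been resolved.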
Physicist’s View of Event Data. Diagram labels: Bookkeeping; Gaudi; Files (RAW2-1/1/2008, RAW3-22/9/2007, RAW4-2/2/2008, …); Collection Set (B -> ππ Candidates (Phy), B -> J/Ψ (μ+ μ-) Candidates, …); Datasets (Event 1, Event 2, … Event 3); File (Event 1, Event 2, … Event N); Event tag collection (Tag 1: 5, 0.3; Tag 2: 2, 1.2; … Tag M: 8, 3.1).
Future: Data to Metadata. File catalogue holds only a minimal amount of metadata. LHCb deploys a separate “bookkeeping” database service to store the metadata for datasets and event collections. Based on central ORACLE server at CERN with query service through XML-RPC interface. Not scalable, particularly for the Grid, so a completely new metadata solution is required. ARDA based system will be investigated. Vital that this development is optimised for LHCb and synchronised with data challenges. Corresponds to ARDA Job Provenance DB and Metadata Catalogue.
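For readers unfamiliar with XML-RPC, the sketch below shows what a query looks like on the wire, using only Python's standard `xmlrpc.client` module and no network connection. The method name `bookkeeping.query` and the query parameters are hypothetical; the real bookkeeping service interface is not described in these slides.

```python
import xmlrpc.client

# Hypothetical method name and query parameters, for illustration only.
query = {"event_type": "B -> pi pi", "year": 2004}
request = xmlrpc.client.dumps((query,), methodname="bookkeeping.query")
print(request)  # the XML document a client would POST to the server

# A server decodes the same document back into a method name and parameters.
params, method = xmlrpc.client.loads(request)
print(method, params[0])
```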
Metadata: Data Production. Information flow diagram labels: Prod.Mgr; Configuration (• Build new configuration • Selection of defaults); Job.xml; DIRAC; Production Jobs; Production done; Bookkeeping; File Catalogue.
Metadata: Data Analysis. Information flow diagram labels: User; Configuration (pick up default configuration, modify defaults, select input data); Job.opts; DIRAC; User Job; Bookkeeping; File Catalogue.
LHCb GANGA Status Alexander Soroko, Karl Harrison LHCb ATLAS + Alvin Tan Janusz Martyniak BaBar User Grid Interface. First prototype released in April 2003. To be deployed for LHCb 2004 Data Challenge. SUCCESS! GRIDPP9
GANGA for LHCb. GANGA will allow the LHCb user to perform standard analysis tasks: Data queries. Configuration of jobs, defining the job splitting/merging strategy. Submitting jobs to the chosen Grid resources. Following the progress of jobs. Retrieval of job output. Job bookkeeping.
GANGA job flow (diagram labels). Local Client: GANGA User Interface; Job Factory (Job Registry Class) with Ganga Job objects; Job Options Editor; Data Selection (Input/Output Files); Strategy Selection; Job Requirements (LSF Resources, etc); Database of Standard Job Options; Strategy Database (Splitting scripts). Submit job: Job script, JDL file, Job Options file. Grid/Batch System: Gatekeeper, Worker nodes. Send job output: File Transfer, Storage Element. Get Monitoring Info. Get job output.
GANGA components. • User has access to functionality of Ganga components through GUI and CLI, layered one over the other above a Software Bus. • Software Bus itself is a Ganga component implemented in Python. • Components used by Ganga fall into 3 categories: • Ganga components of general applicability or Core Components (to right in diagram) • Ganga components providing specialised functionality (to left in diagram) • External components (at bottom in diagram). Diagram labels: GUI, CLI, Software Bus; Job Definition, Job Registry, Job Handling, File Transfer; Gaudi/Athena Job Definition, Gaudi/Athena Job Options Editor, BaBar Job Definition and Splitting; Python Native, PyMagda, Python Root, Gaudi Python, PyAMI, PyCMT.
GUIs Galore GRIDPP9
DIRAC WMS Architecture GANGA GRIDPP9
Future Plans. Refactorisation of Ganga, with submission on remote client. Motivation: • Ease integration of external components • Facilitate multi-person, distributed development • Increase customizability/flexibility • Permit GANGA components to be used externally more simply. 2nd GANGA prototype ~ April 2004. Diagram labels: Local Client; Remote Client; Software/Component Server (Software Cache, Component Cache); Job Factory (Machinery for Generating XML Descriptions of Multiple Jobs); Job Collection (XML Description); Job-Options Editor; Job-Options Template; Job-Options Knowledge Base; Dataset; Dataset Selection; Dataset Catalogue; Strategy Selection; Strategy Database (Splitter Algorithms); User Requirements; Derived Requirements; Job Requirements (LSF Resources, etc); Database of Job Requirements; Database of Standard Job Options; Scheduler Service; Scheduler Proxy; Dispatcher; Remote-Client Scheduler; Grid/Batch-System Scheduler; JDL, Classads; Agent (Runs/Validates Job); Execution node; back ends: NorduGrid, Local, DIAL, DIRAC, LSF, PBS, EDG, USG, Other.
Future: GANGA Develop into generic front-end capable of submitting a range of applications to the Grid. Requires central core and modular structure (started with version 2 re-factorisation) to allow new frameworks to be plugged in. Enable GANGA to be used in complex analysis environment over many years for many users. Hierarchical structure, import/export facility, schema evolution, etc. Interact with multiple Grids (e.g. LCG, NorduGrid, EGEE…). Needs to keep pace with development of Grid services. Synchronise with ARDA developments. Interactive analysis? ROOT, PROOF GRIDPP9