ATLAS Production
Kaushik De, University of Texas at Arlington
LHC Computing Workshop, Ankara, May 2, 2008
Outline
• Computing Grids
• Tiers of ATLAS
• PanDA Production System
• MC Production Statistics
• Distributed Data Management
• Operations Shifts
• Thanks to T. Maeno (BNL), R. Rocha (CERN), and J. Shank (BU) for some of the slides presented here
EGEE Grid + NorduGrid - NDGF (Nordic countries)
OSG – US Grid
Tiers of ATLAS
• 10 Tier 1 centers
  • Canada, France, Germany, Italy, Netherlands, Nordic Countries, Spain, Taipei, UK, USA
• ~35 Tier 2 centers
  • Australia, Austria, Canada, China, Czech R., France, Germany, Israel, Italy, Japan, Poland, Portugal, Romania, Russian Fed., Slovenia, Spain, Switzerland, Taipei, UK, USA
• ? Tier 3 centers
Tiered Example – US Cloud
Data Flow in ATLAS
Storage Estimate
Production System Overview
• Tasks are requested by the Physics Working Groups and approved in ProdDB; Panda receives the jobs and registers output files in DQ2
• Clouds: CA, DE, ES, FR, IT, NL, TW, UK, US (NDGF coming)
Panda
• PanDA = Production ANd Distributed Analysis system
• Designed for analysis as well as production
• Project started Aug 2005, prototype Sep 2005, in production Dec 2005
• Works with both OSG and EGEE middleware
• A single task queue and pilots
  • Apache-based central server
  • Pilots retrieve jobs from the server as soon as CPU is available → low latency
• Highly automated, has an integrated monitoring system, and requires little operations manpower
• Integrated with the ATLAS Distributed Data Management (DDM) system
• Not exclusively ATLAS: has its first non-ATLAS OSG user, CHARMM
Cloud (diagram): Panda sends jobs to the Tier 1 and its Tier 2s; input files flow from Tier 1 storage to the Tier 2s, and output files are aggregated back to Tier 1 storage
Panda/Bamboo System Overview (diagram): Bamboo pulls jobs from ProdDB and submits them to the Panda server over HTTPS; end-users submit directly; Autopilot submits pilots to sites A and B via condor-g; pilots on the worker nodes pull jobs over HTTPS; the server interacts with DQ2, LRC/LFC, and a logger
Panda server
• Central queue for all kinds of jobs
• Assign jobs to sites (brokerage)
• Set up input/output datasets
  • Create them when jobs are submitted
  • Add files to output datasets when jobs are finished
• Dispatch jobs
(Diagram: clients and pilots communicate with the Apache + gridsite server over HTTPS; the server uses DQ2, LRC/LFC, PandaDB, and a logger)
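To make the brokerage step concrete, here is a minimal sketch (not the actual PanDA server code) of how a central queue might pick a site for a job, preferring sites that have the required release and already hold the input files; the class names, fields, and ranking are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    releases: set          # ATLAS software releases installed at the site
    resident_files: set    # input files already available in the site's storage
    free_cpus: int = 0

@dataclass
class Job:
    release: str
    input_files: list = field(default_factory=list)

def broker(job, sites):
    """Pick a site with the required release, preferring data locality,
    then free CPU slots. Purely illustrative ranking."""
    candidates = [s for s in sites if job.release in s.releases]
    if not candidates:
        return None  # job waits until some site provides the release
    def score(site):
        local = sum(1 for f in job.input_files if f in site.resident_files)
        return (local, site.free_cpus)
    return max(candidates, key=score)

if __name__ == "__main__":
    sites = [
        Site("BNL",  {"14.2.0"}, {"EVNT.01._0001.pool.root"}, free_cpus=120),
        Site("SWT2", {"14.2.0"}, set(), free_cpus=300),
    ]
    job = Job("14.2.0", ["EVNT.01._0001.pool.root"])
    print(broker(job, sites).name)   # -> BNL (already holds the input file)
```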
Bamboo
• Get jobs from prodDB to submit them to Panda
• Update job status in prodDB
• Assign tasks to clouds dynamically
• Kill TOBEABORTED jobs
• A cron triggers the above procedures every 10 min
(Diagram: Bamboo talks to prodDB via cx_Oracle and to the Panda server over HTTPS; it runs under Apache + gridsite, driven by cron)
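A rough sketch of the cron-driven cycle described above, with placeholder functions standing in for the real prodDB and Panda interfaces; everything here is assumed for illustration only.

```python
# Placeholder stand-ins for the real prodDB and Panda interfaces; the whole
# sketch only shows the shape of one Bamboo-style cycle.

def fetch_new_jobs_from_proddb():
    return [{"id": 1, "status": "submitted"}]

def submit_to_panda(job):
    print("submitting job", job["id"], "to Panda")

def update_proddb_status(job, status):
    print("job", job["id"], "->", status)

def kill_aborted_jobs():
    print("killing TOBEABORTED jobs")

def one_cycle():
    """One pass: pull work from prodDB, hand it to Panda, write the status
    back, and clean up aborted jobs. A cron entry would run this every 10 min."""
    for job in fetch_new_jobs_from_proddb():
        submit_to_panda(job)
        update_proddb_status(job, "submitted to Panda")
    kill_aborted_jobs()

if __name__ == "__main__":
    one_cycle()
```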
Client-Server Communication
• HTTP/S-based communication (curl + grid proxy + Python)
• GSI authentication via mod_gridsite
• Most communications are asynchronous
  • The Panda server spawns Python threads as soon as it receives HTTP requests and sends responses back immediately; the threads do heavy procedures (e.g., DB access) in the background → better throughput
• Several are synchronous
(Diagram: the client serializes a Python object with cPickle, sends it over HTTPS as x-www-form-urlencoded data through mod_deflate to mod_python in the server's UserIF, which deserializes it back into a Python object and returns the response)
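The serialization step can be illustrated with the standard library alone; this is a hedged sketch of a cPickle-style round trip over a form-encoded body, not the real Panda wire format (the 'payload' field name and the base64 wrapping are assumptions).

```python
import pickle, base64, urllib.parse

def encode_request(obj):
    """Client side: pickle a Python object and wrap it in a form-encoded
    body, roughly in the spirit of the cPickle + x-www-form-urlencoded
    exchange described above."""
    blob = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    return urllib.parse.urlencode({"payload": blob})

def decode_request(body):
    """Server side: recover the Python object from the form-encoded body."""
    fields = urllib.parse.parse_qs(body)
    blob = fields["payload"][0]
    return pickle.loads(base64.b64decode(blob))

if __name__ == "__main__":
    job_spec = {"taskID": 1234, "transformation": "csc_evgen_trf.py"}
    wire = encode_request(job_spec)          # what would travel over HTTPS
    assert decode_request(wire) == job_spec
    print("round trip OK:", decode_request(wire))
```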
Data Transfer
• Rely on ATLAS DDM
  • Panda sends requests to DDM
  • DDM moves files and sends notifications back to Panda
  • Panda and DDM work asynchronously
• Dispatch input files to T2s and aggregate output files to the T1
• Jobs get ‘activated’ when all input files are copied, and pilots pick them up
  • Pilots don’t have to wait for data arrival on WNs
  • Data transfer and job execution can run in parallel
(Diagram: the submitter submits the job → Panda subscribes the T2 to the dispatch dataset → data transfer → callback → the pilot gets the job, runs it, and finishes it → files are added to the destination datasets → data transfer → callback)
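A minimal sketch, with an invented Job class, of the activation logic described above: a job only becomes visible to pilots after the last transfer callback arrives.

```python
# Not PanDA code: illustrates the 'assigned -> activated -> running' idea,
# where a job is handed to pilots only once every input file has been
# confirmed at the site by a DDM-style transfer callback.

class Job:
    def __init__(self, job_id, input_files):
        self.job_id = job_id
        self.pending = set(input_files)   # files still in transit
        self.status = "assigned"

    def on_transfer_callback(self, filename):
        """Called when DDM reports that one input file has arrived."""
        self.pending.discard(filename)
        if not self.pending and self.status == "assigned":
            self.status = "activated"     # now a pilot may pick it up

    def pick_up(self):
        """A pilot asks for work; only activated jobs are handed out."""
        if self.status == "activated":
            self.status = "running"
            return True
        return False

if __name__ == "__main__":
    job = Job(42, ["EVNT._0001.pool.root", "EVNT._0002.pool.root"])
    print(job.status, job.pick_up())              # assigned False
    job.on_transfer_callback("EVNT._0001.pool.root")
    job.on_transfer_callback("EVNT._0002.pool.root")
    print(job.status, job.pick_up(), job.status)  # activated True running
```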
Pilot and Autopilot (1/2)
• Autopilot is a scheduler that submits pilots to sites via condor-g/glidein (pilot → gatekeeper)
• Pilots are scheduled to the site batch system and pull jobs from the Panda server as soon as CPUs become available
• Pilot submission and job submission are different: the job is the payload for the pilot
Pilot and Autopilot (2/2)
• How the pilot works
  • Sends several parameters to the Panda server for job matching (HTTP request)
    • CPU speed
    • Available memory size on the WN
    • List of ATLAS releases available at the site
  • Retrieves an ‘activated’ job (HTTP response to the above request): activated → running
  • Runs the job immediately, because all input files should already be available at the site
  • Sends a heartbeat every 30 min
  • Copies output files to the local SE and registers them in the Local Replica Catalogue
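A toy pilot loop, assuming hypothetical server calls, that mirrors the steps above: advertise the worker node, receive an activated job, run it, heartbeat, and stage out the outputs.

```python
# Illustrative sketch of a pilot's life cycle (not the real ATLAS pilot).
# The functions are stand-ins for the HTTPS requests described above.

def request_job(cpu_speed, memory_mb, releases):
    """Ask the server for an activated job matching this worker node."""
    return {"jobID": 7, "transformation": "csc_reco_trf.py", "outputs": ["AOD.root"]}

def send_heartbeat(job_id):
    print("heartbeat for job", job_id)

def copy_to_se_and_register(filename):
    print("copied", filename, "to the local SE and registered it in the LRC")

def run_pilot():
    job = request_job(cpu_speed=2500, memory_mb=2048, releases=["14.2.0"])
    if job is None:
        return  # nothing activated for this site; the pilot simply exits
    # Input files are already at the site, so the payload starts immediately.
    print("running", job["transformation"])
    # A real pilot would heartbeat every 30 minutes while the payload runs;
    # one heartbeat is shown here to keep the sketch short.
    send_heartbeat(job["jobID"])
    for out in job["outputs"]:
        copy_to_se_and_register(out)

if __name__ == "__main__":
    run_pilot()
```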
Production vs Analysis
• Run on the same infrastructure
  • Same software, monitoring system, and facilities
  • No duplicated manpower for maintenance
• Separate computing resources
  • Different queues → different CPU clusters
  • Production and analysis don’t have to compete with each other
• Different policies for data transfers
  • Analysis jobs don’t trigger data transfer: jobs go to sites that hold the input files
  • For production, input files are dispatched to T2s and output files are aggregated to the T1 via DDM asynchronously
  • Controlled traffic
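The brokerage policy difference can be sketched as below; the data structures and site names are purely illustrative, not ATLAS configuration.

```python
# Hedged sketch of the policy split described above: analysis jobs are only
# brokered to sites that already hold the input data, while production jobs
# may go to any suitable site and the inputs are subscribed afterwards.

def choose_site(job_type, input_dataset, site_replicas, production_sites):
    """site_replicas: dataset name -> set of sites holding a replica."""
    if job_type == "analysis":
        holders = site_replicas.get(input_dataset, set())
        return sorted(holders)[0] if holders else None  # no data, no brokerage
    # production: pick a production site; DDM will dispatch the inputs
    site = production_sites[0]
    print(f"subscribe {site} to the dispatch dataset for {input_dataset}")
    return site

if __name__ == "__main__":
    replicas = {"mc08.EVNT.dataset": {"BNL"}}
    print(choose_site("analysis", "mc08.EVNT.dataset", replicas, ["SWT2", "BNL"]))
    print(choose_site("production", "mc08.EVNT.dataset", replicas, ["SWT2", "BNL"]))
```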
Current PanDA production – Past Week
PanDA production – Past Month
MC Production 2006-07
ATLAS Data Management Software – Don Quijote
• The second generation of the ATLAS DDM system (DQ2)
  • DQ2 developers: M. Branco, D. Cameron, T. Maeno, P. Salgado, T. Wenaus, …
  • The initial idea and architecture were proposed by M. Branco and T. Wenaus
• DQ2 is built on top of Grid data transfer tools
• Moved to a dataset-based approach
  • Dataset: an aggregation of files plus associated DDM metadata
  • The dataset is the unit of storage and replication
• Automatic data transfer mechanisms using distributed site services
  • Subscription system
  • Notification system
• Current version: 1.0
DDM Components (diagram): DDM end-user tools (T. Maeno, BNL: dq2_ls, dq2_get, dq2_cr), the DQ2 dataset catalog, DQ2 “Queued Transfers” with the File Transfer Service, Local File Catalogs, and DQ2 subscription agents
DDM Operations Mode
• US ATLAS DDM operations team: BNL (H. Ito, W. Deng, A. Klimentov, P. Nevski), GLT2 (S. McKee, MU), MWT2 (C. Waldman, UC), NET2 (S. Youssef, BU), SWT2 (P. McGuigan, UTA), WT2 (Y. Wei, SLAC), WISC (X. Neng, WISC)
• All Tier-1s have predefined (software) channels with CERN and with each other
• Tier-2s are associated with one Tier-1 and form a cloud
• Tier-2s have a predefined channel with the parent Tier-1 only
(Diagram: T1-T1 and T1-T2 associations according to the ATLAS tier associations, showing the CERN, ASGC, LYON, and BNL clouds; each site runs a VO box, a dedicated computer running the DDM services)
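A small illustrative model of the channel rules above (T1s talk to CERN and to each other, a T2 only to its parent T1); the cloud contents are abbreviated and the function is not an ATLAS tool.

```python
# Which transfer channels exist between sites, under the rules stated above.
CLOUDS = {
    "BNL":  ["GLT2", "MWT2", "NET2", "SWT2", "WT2", "WISC"],
    "LYON": ["lpc", "lapp", "Tokyo", "Beijing", "Romania"],
}
TIER1S = set(CLOUDS)

def channel_allowed(src, dst):
    if "CERN" in (src, dst) and (src in TIER1S or dst in TIER1S):
        return True                       # T1 <-> CERN
    if src in TIER1S and dst in TIER1S:
        return True                       # T1 <-> T1
    for t1, t2s in CLOUDS.items():
        if (src == t1 and dst in t2s) or (dst == t1 and src in t2s):
            return True                   # T2 <-> its parent T1 only
    return False

if __name__ == "__main__":
    print(channel_allowed("BNL", "LYON"))    # True  (T1-T1)
    print(channel_allowed("BNL", "SWT2"))    # True  (parent T1 - T2)
    print(channel_allowed("LYON", "SWT2"))   # False (foreign T1 - T2)
```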
Activities: Data Replication
• Centralized and automatic (according to the computing model)
  • Simulated data: AOD/NTUP/TAG (current data volume ~1.5 TB/week)
    • BNL has a complete set of dataset replicas
    • US Tier-2s define what fraction of the data they will keep, from 30% to 100%
  • Validation samples: replicated to BNL for SW validation purposes
• Critical data replication
  • Database releases: replicated to BNL from CERN and then from BNL to the US ATLAS T2s; data volume is relatively small (~100 MB)
  • Conditions data: replicated to BNL from CERN
• Cosmic data
  • BNL requested 100% of cosmic data; data replicated from CERN to BNL and to the US Tier-2s
• Data replication for individual groups, universities, and physicists
  • A dedicated Web interface is set up
Data Replication to Tier 2’s
You’ll never walk alone – Weekly Throughput: 2.1 GB/s out of CERN (from Simone Campana)
Subscriptions
• Subscription
  • A request for the full replication of a dataset (or dataset version) at a given site
  • Requests are collected by the centralized subscription catalog
  • And are then served by a set of agents – the site services
• Subscription on a dataset version
  • One-time-only replication
• Subscription on a dataset
  • Replication triggered on every new version detected
  • Subscription closed when the dataset is frozen
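A hedged sketch of the two subscription flavours: a dataset-version subscription replicates once and closes, while a dataset subscription follows new versions until the dataset is frozen. The classes are invented for illustration, not the DQ2 schema.

```python
class Dataset:
    def __init__(self, name):
        self.name, self.version, self.frozen = name, 1, False
    def new_version(self):
        if not self.frozen:
            self.version += 1
    def freeze(self):
        self.frozen = True

class Subscription:
    def __init__(self, dataset, site, pin_version=None):
        self.dataset, self.site = dataset, site
        self.pin_version = pin_version       # set -> dataset-version subscription
        self.done_versions = set()
        self.closed = False

    def serve(self):
        """Called periodically by the site services."""
        if self.closed:
            return
        target = self.pin_version or self.dataset.version
        if target not in self.done_versions:
            print(f"replicate {self.dataset.name} v{target} to {self.site}")
            self.done_versions.add(target)
        if self.pin_version or self.dataset.frozen:
            self.closed = True               # one-shot, or dataset frozen

if __name__ == "__main__":
    ds = Dataset("mc08.AOD.dataset")
    sub = Subscription(ds, "SWT2")           # subscription on the dataset
    sub.serve(); ds.new_version(); sub.serve()
    ds.freeze(); sub.serve()
    print("closed:", sub.closed)
```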
Site Services
• Agent-based framework
• Goal: satisfy subscriptions
• Each agent serves a specific part of a request
  • Fetcher: fetches new subscriptions from the subscription catalog
  • Subscription Resolver: checks whether a subscription is still active, for new dataset versions, for new files to transfer, …
  • Splitter: creates smaller chunks from the initial requests and identifies files requiring transfer
  • Replica Resolver: selects a valid replica to use as the source
  • Partitioner: creates chunks of files to be submitted as a single request to the FTS
  • Submitter/PendingHandler: submits and manages the FTS requests
  • Verifier: checks the validity of files at the destination
  • Replica Register: registers the new replica in the local replica catalog
  • …
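The agent pipeline can be sketched as a chain of small functions, each doing one part of the work; the names follow the list above, but the code is only an illustration of the pattern, not DQ2 itself.

```python
# Minimal illustration of an agent pipeline where each stage handles one
# part of a subscription request and passes its output onward.

def fetcher(catalog):
    return [sub for sub in catalog if sub["active"]]

def splitter(subscription, chunk_size=2):
    files = subscription["files"]
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]

def replica_resolver(filename, replicas):
    sources = replicas.get(filename, [])
    return sources[0] if sources else None   # pick any valid source replica

def submitter(chunk, replicas, destination):
    for f in chunk:
        src = replica_resolver(f, replicas)
        if src:
            print(f"FTS request: {f} from {src} to {destination}")

if __name__ == "__main__":
    catalog = [{"active": True, "site": "SWT2", "files": ["f1", "f2", "f3"]}]
    replicas = {"f1": ["BNL"], "f2": ["BNL"], "f3": ["CERN"]}
    for sub in fetcher(catalog):
        for chunk in splitter(sub):
            submitter(chunk, replicas, sub["site"])
```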
Typical Deployment
• Central catalogs, with site services deployed per cloud (diagram)
• Deployment at Tier-0 is similar to the Tier-1s
• LFC and FTS services at Tier-1s
• SRM services at every site, including Tier-2s
Interaction with the Grid Middleware
• File Transfer Service (FTS)
  • One deployed per Tier-0 / Tier-1 (matches the typical site services deployment)
  • Triggers the third-party transfer by contacting the SRMs; needs to be constantly monitored
• LCG File Catalog (LFC)
  • One deployed per Tier-0 / Tier-1 (matches the typical site services deployment)
  • Keeps track of local file replicas at a site
  • Currently used as the main source of replica information by the site services
• Storage Resource Manager (SRM)
  • Used directly once pre-staging comes into the picture
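A hedged sketch of how site services might drive a third-party transfer through catalog lookup, submission, and polling; the ReplicaCatalog and TransferService classes are stand-ins for the LFC and FTS roles, not their real client APIs.

```python
import random

class ReplicaCatalog:                       # LFC-like role: file -> replicas
    def __init__(self, replicas):
        self.replicas = replicas
    def lookup(self, lfn):
        return self.replicas.get(lfn, [])

class TransferService:                      # FTS-like role: bulk transfers
    def submit(self, pairs):
        print("submitted", len(pairs), "file transfers")
        return "request-001"
    def status(self, request_id):
        return random.choice(["Active", "Done"])

def transfer(lfns, catalog, fts, destination):
    pairs = [(catalog.lookup(f)[0], destination) for f in lfns if catalog.lookup(f)]
    req = fts.submit(pairs)
    while fts.status(req) != "Done":        # "needs to be constantly monitored"
        pass
    print("transfer", req, "finished")

if __name__ == "__main__":
    catalog = ReplicaCatalog({"AOD._0001.pool.root": ["srm://bnl.example/path"]})
    transfer(["AOD._0001.pool.root"], catalog, TransferService(), "srm://swt2.example")
```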
DDM – Current Issues and Plans
• Dataset deletion
  • Non-trivial, although critical
  • First implementation uses a central request repository
  • Being integrated into the site services
• Dataset consistency
  • Between storage and the local replica catalogs
  • Between the local replica catalogs and the central catalogs
  • A lot of effort put into this recently – tracker, consistency service
• Pre-staging of data
  • Currently done just before file movement
  • Introduces high latency when the file is on tape
• Messaging
  • More asynchronous flow (less polling)
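The storage-versus-catalog consistency check reduces to a set comparison; the sketch below (with made-up listings) reports "dark" files known only to storage and "lost" files known only to the catalog.

```python
def check_consistency(storage_listing, catalog_listing):
    """Compare what the storage reports with what the replica catalog
    believes, for one site. Inputs are plain lists of file names."""
    storage, catalog = set(storage_listing), set(catalog_listing)
    return {
        "dark": sorted(storage - catalog),   # on disk, unknown to the catalog
        "lost": sorted(catalog - storage),   # registered, missing from disk
    }

if __name__ == "__main__":
    storage = ["AOD._0001.pool.root", "AOD._0002.pool.root", "tmp.leftover"]
    catalog = ["AOD._0001.pool.root", "AOD._0002.pool.root", "AOD._0003.pool.root"]
    report = check_consistency(storage, catalog)
    print("dark files:", report["dark"])
    print("lost files:", report["lost"])
```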
ADC Operations Shifts
• ATLAS Distributed Computing Operations Shifts (ADCoS)
  • World-wide shifts
  • To monitor all ATLAS distributed computing resources
  • To provide Quality of Service (QoS) for all data processing
  • Shifters receive official ATLAS service credit (OTSMoU)
• Additional information
  • http://indico.cern.ch/conferenceDisplay.py?confId=22132
  • http://indico.cern.ch/conferenceDisplay.py?confId=26338
Typical Shift Plan
• Browse recent shift history
• Check performance of all sites
  • File tickets for new issues
  • Continue interactions about old issues
• Check status of current tasks
  • Check all central processing tasks
  • Monitor analysis flow (not individual tasks)
  • Overall data movement
• File software (validation) bug reports
• Check Panda and DDM health
• Maintain an elog of shift activities
Shift Structure
• Shifter on call
  • Two consecutive days
  • Monitor – escalate – follow up
  • Basic manual interventions (site on/off)
• Expert on call
  • One week duration
  • Global monitoring
  • Advises the shifter on call
  • Major interventions (service on/off)
  • Interacts with other ADC operations teams
  • Provides feedback to ADC development teams
• Tier 1 expert on call
  • Very important (e.g., Rod Walker, Graeme Stewart, Eric Lancon, …)
Shift Structure Schematic (by Xavier Espinal)
ADC Inter-relations (diagram): ADCoS alongside Production (Alex Read), Tier 0 (Armin Nairz), Central Services (Birger Koblitz), Operations Support (Pavel Nevski), DDM (Stephane Jezequel), Tier 1 / Tier 2 (Simone Campana), and Distributed Analysis (Dietrich Liko)