ATLAS Grid Activities: Preparing for Data Analysis
Jim Shank
Overview
• ATLAS Monte Carlo production in 2008
• Data (cosmic and single beam) in 2008
• Production and Distributed Analysis (PanDA) system
• Some features of the ATLAS Computing Model
• Analysis model for the US
• Distributed Analysis Worldwide: Ganga/PanDA and HammerCloud + other readiness tests
• Tier 3 centers in the US
First ATLAS Beam Events, 10 Sept. 2008
[Plots: data exports to Tier 1s, throughput in MB/s and number of errors]
Effect of concurrent data access from centralized transfers and user activity (overload of a disk server): the CERN storage system was overloaded, but DDM worked. Subsequently we limited user access to the storage system.
PanDA Production (Monte Carlo Simulation/Reconstruction), 2008
Grouped by cloud: a cloud = a Tier 1 center plus all its associated Tier 2 centers.
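For reference, a minimal sketch of this cloud grouping as a data structure (the US cloud is shown with an illustrative site list; BNL, AGLT2, and MWT2 appear elsewhere in this talk, while the remaining Tier 2 names are assumptions):

```python
# Illustrative only: a cloud groups one Tier 1 center with its associated
# Tier 2 centers. Site names beyond BNL, AGLT2, and MWT2 are assumptions.
clouds = {
    "US": {
        "tier1": "BNL",
        "tier2s": ["AGLT2", "MWT2", "NET2", "SWT2", "WT2"],
    },
    # ... other clouds (each Tier 1 with its Tier 2s) follow the same pattern
}

def sites_in_cloud(name):
    """Return the Tier 1 plus all associated Tier 2s for a given cloud."""
    cloud = clouds[name]
    return [cloud["tier1"]] + cloud["tier2s"]

print(sites_in_cloud("US"))  # ['BNL', 'AGLT2', 'MWT2', 'NET2', 'SWT2', 'WT2']
```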
U.S. Production in 2008
More than our share, which indicates that other sites were not delivering at their expected levels.
DDM: Data Replication (BNL & AGLT2)
ATLAS beam and cosmics data replication from CERN to Tier 1s and calibration Tier 2s, Sep-Nov 2008.
[Plots: data replication to Tier 2s, US Tier 2s, dataset subscription intervals]
DDM: Data Replication between Tier 1s
Functional test: Tier 1 to Tier 1 data replication status. FZK experienced problems with dCache, so its data export was affected.
Data reprocessing: Tier 1 to Tier 1 and prestaging data replication status. All Tier 1s operational. Red: data transfer completion at 95% (data staging at CNAF).
PanDA Overview
Workload management system for Production ANd Distributed Analysis
• Launched 8/05 by US ATLAS to achieve a scalable, data-driven WMS
• Designed for analysis as well as production
• Insulates users from distributed computing complexity
• Low entry threshold
• US ATLAS production since late '05
• US analysis since Spring '06
• ATLAS-wide production since early '08
• ATLAS-wide analysis still rolling out
• OSG WMS program since 9/06
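As a rough illustration of the pull-style pattern behind such a system (a schematic sketch only, not PanDA's actual interface or code; job fields, dataset names, and site names are hypothetical):

```python
# Schematic pull-style workload management: users submit job descriptions to
# a central queue and per-site agents pull work when they have free slots.
# This is a generic sketch, not PanDA's real API.
from collections import deque

job_queue = deque()  # central broker's pending jobs

def submit(user, task, input_dataset):
    """User side: describe the work; no knowledge of sites is required."""
    job_queue.append({"user": user, "task": task, "inDS": input_dataset})

def pull_jobs(site, free_slots):
    """Site side: an agent takes jobs only when local resources are free."""
    pulled = []
    while job_queue and len(pulled) < free_slots:
        job = job_queue.popleft()
        job["site"] = site
        pulled.append(job)
    return pulled

submit("alice", "simulation", "mc08.example.EVNT")  # hypothetical dataset
submit("bob", "analysis", "user.bob.example.AOD")   # hypothetical dataset
print(pull_jobs("BNL", free_slots=1))  # one job pulled by an idle site
```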
PanDA/pathena Users
• 4 million jobs in the last 6 months
• 473 users in the last 6 months
• 352 users in the last 3 months
• 90 users in the last month
• 271 users with >1000 jobs
• 96 users with >10000 jobs
ATLAS Data Types • Still evolving…
Analysis Readiness Tests: US Tier 2 sites
Ideas for a Stress Test (1)
• Initiated by Jim Cochran (US ATLAS Analysis Support Group Chair).
• Below is a summary of plans from Akira Shibata (March 10th).
• Goal: stress testing of the analysis queues at the Tier 2 sites with analysis jobs that are as realistic as possible in both volume and quality. We want to make sure that the Tier 2 sites are ready to accept real data, and that the analysis queues are ready to analyze it.
• Time scale: sometime near the end of May 2009.
• Outline of the exercise:
  • To make the exercise more useful and interesting, we will generate and simulate (Atlfast-II) a large mixed sample at the Tier 2s.
  • We are currently trying to define the jobs for this exercise and expect this to be finalized after the BNL jamboree this week.
  • The mixed sample is a blind mix of all Standard Model processes, which we call "data" in this exercise.
  • For the one-day stress test, we will invite people with existing analyses to try to analyze the data using Tier 2 resources only.
  • We will compile a list of people who are able to participate.
Nurcan Ozturk
Ideas for a Stress Test (2)
• Estimate of data volume: a very rough estimate is 100M-1B events. Assuming 100 kB/event (realistic, given no truth info and no trigger info), this sets an upper limit of 100 TB in total, split among 5 Tier 2 sites (a back-of-the-envelope check is sketched below). This is probably an upper limit given the current availability of USER/GROUP disk at the Tier 2s (which is in addition to MC/DATA/PROD and CALIB disk).
• Estimate of computing capability: there are "plenty" of machines assigned to analysis, though the current load on the analysis queues is rather low. The compute nodes are usually shared between production and analysis, typically configured with an upper limit and a priority. For example, MWT2 has 1200 cores and is set up to run analysis jobs with priority up to an upper limit of 400 cores; if production jobs are not coming in, the number of running analysis jobs can exceed this limit.
• Site configuration: configuration varies among the Tier 2 sites. We will compile a table showing the configuration of each analysis queue (direct reading versus local copying, xrootd versus dCache, etc.) and compare the performance of the queues based on their configuration.
Nurcan Ozturk
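A back-of-the-envelope check of the data-volume numbers above (a sketch only; the event count, event size, and number of Tier 2 sites are the assumptions quoted in the estimate):

```python
# Rough stress-test sizing using the assumptions quoted above.
n_events = 1e9            # upper end of the 100M-1B event range
event_size_bytes = 100e3  # ~100 kB/event (no truth info, no trigger info)
n_tier2_sites = 5         # number of Tier 2 sites sharing the sample

total_tb = n_events * event_size_bytes / 1e12  # bytes -> TB
per_site_tb = total_tb / n_tier2_sites

print(f"total volume ~ {total_tb:.0f} TB")     # ~100 TB upper limit
print(f"per Tier 2   ~ {per_site_tb:.0f} TB")  # ~20 TB per site
```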
Four Types of Tier 3 Systems
• T3gs: Tier 3 with Grid Services (details in next slides)
• T3g: Tier 3 with Grid Connectivity (details in next slides)
• T3w: Tier 3 Workstation (unclustered workstations with OSG, DQ2 client, ROOT, etc.)
• T3af: Tier 3 system built into a lab or university analysis facility
Conclusions
• Monte Carlo simulation/reconstruction is working well worldwide with the PanDA submission system
• Data reprocessing with PanDA is working, but further tests of file staging from tape are needed
• The analysis model is still evolving
• In the U.S., there is a big emphasis on getting Tier 3s up and running
• Analysis stress test coming in May-June
• Ready for collision data in late 2009
PanDA Operation
[Diagram: data management, ATLAS production, and analysis] (T. Maeno)
Analysis with PanDA: pathena
Running the ATLAS software:
• Locally: athena <job opts>
• On PanDA: pathena --inDS <input dataset> --outDS <output dataset> <job opts>
Outputs can be sent to an xrootd/PROOF farm, directly accessible for PROOF analysis.
Tadashi Maeno
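For example, a minimal sketch of driving such a pathena submission from Python (the dataset names and job-options file are hypothetical placeholders; only the --inDS/--outDS options shown above are used):

```python
# Illustrative wrapper around the pathena command shown above.
# Dataset names and the job options file are hypothetical placeholders.
import subprocess

in_ds = "mc08.123456.example_sample.AOD"     # hypothetical input dataset
out_ds = "user.jsmith.stresstest.ntuple.v1"  # hypothetical output dataset
job_options = "MyAnalysis_jobOptions.py"     # hypothetical Athena job options

cmd = [
    "pathena",
    "--inDS", in_ds,    # input dataset to analyze
    "--outDS", out_ds,  # output dataset to be created
    job_options,
]
print(" ".join(cmd))             # inspect the command before submitting
subprocess.run(cmd, check=True)  # requires a configured PanDA client
```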