
STAR grid activities and São Paulo experience



1. STAR grid activities and São Paulo experience
   Alexandre A. P. Suaide, VI DOSAR workshop, São Paulo, 2005

2. • NERSC-PDSF: ~ 500 CPU, ~ 150 TB, SGE batch
   • São Paulo: test cluster with 10 CPU and 3 TB, SGE batch (upgrade project: ~ 50 CPU and ~ 40 TB)
   • BNL (2 sites): ~ 1100 CPU, ~ 400 TB, LSF batch

3. The size of the raw data
   • STAR Au+Au event statistics (raw): ~ 2-3 MB/event at ~ 20-40 events/s
   • Total 2004 Au+Au: 20-30 M events, ~ 65 TB
   • Cu+Cu run:
     • ~ 70 M events @ 200 GeV
     • ~ 40 M events @ 62 GeV
     • ~ 4 M events @ 22 GeV
   • Plus all the p+p, d+Au and previous runs
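
   As a quick consistency check of these numbers, a back-of-the-envelope sketch (the 25 M events and 2.5 MB/event below are simply the midpoints of the ranges quoted above, not official STAR figures):

     # Rough cross-check of the 2004 Au+Au raw-data volume quoted above.
     # Midpoint assumptions: 25 M events (of 20-30 M) at 2.5 MB/event (of 2-3 MB).
     events = 25e6
     mb_per_event = 2.5
     total_tb = events * mb_per_event / 1e6  # MB -> TB in decimal units
     print(f"~{total_tb:.0f} TB of raw Au+Au data")  # ~62 TB, close to the ~65 TB quoted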

4. The reconstruction, simulation, etc.
   • Reconstruction
     • Basically done at BNL
     • The Au+Au reconstruction is estimated to take 18 months (only 60% is complete); compare with one new run every year
     • A physics-ready production needs ~ 2 production rounds (calibrations, improvements, etc.)
   • Simulation and embedding
     • Done at PDSF; the simulation output is transferred to BNL
   • STAR takes more data than it can currently make available for analysis
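
   To make the shortfall concrete, a back-of-the-envelope sketch using only the numbers above (18 months per pass, ~ 2 passes for a physics-ready production, one new run per year):

     # If a physics-ready production needs ~2 passes at ~18 months each,
     # one run costs ~36 months of reconstruction, while a new run arrives
     # every ~12 months, so the backlog grows with every run.
     months_per_pass = 18
     passes_needed = 2
     months_between_runs = 12
     shortfall = months_per_pass * passes_needed - months_between_runs
     print(f"reconstruction falls ~{shortfall} months further behind per run")  # ~24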

5. Analysis
   • Real-data analysis is done at RCF
   • Simulation and embedding analysis is done at PDSF
   • Small fractions of the datasets are scattered over many institutions, mainly for analysis development at PDSF

6. Why do we need the grid?
   • If STAR wants to keep production and analysis running at a speed compatible with data taking, other institutions need to share computing power
     • In the next run STAR will take at least one order of magnitude more events than last year
     • The RCF/PDSF farms do not grow at the same rate
   • From the user's point of view:
     • More time available for physics: data will be available earlier
     • More computing power for analysis: analyses will run faster
     • Submit your jobs from your home institution and get the output there
       • No need to know where the data is
       • No need to log on to RCF or PDSF
       • You manage your own disk space

7. STAR grid
   • Three-level structure
     • Tier0 site (BNL): dedicated to reconstruction, simulation and analysis
     • Tier1 site (PDSF): runs reconstruction on demand; receives all the reconstructed files for analysis; simulation and embedding
     • Tier2 sites (all other facilities, including São Paulo): receive a fraction of the files for analysis; eventually run reconstruction depending on demand

8. Needs
   • Reconstruction and file distribution
     • Tier0 production: ALL EVENT files are copied to HPSS at the end of a job
     • This strategy implies IMMEDIATE replication of the datasets: as soon as a file is registered, it becomes available for "distribution"
   • Two levels of data distribution: local and global
     • Local: all analysis files are on disks; notion of distributed disk, a cost-effective solution
     • Global: Tier1 sites (all files) and Tier2 sites (partial)
   • Cataloging is fundamental
     • Must know where the files are; the catalog is the only central connection between users and files
     • Central and local catalogs; the database should be updated right after each file transfer
   • Customized scheduler (see the sketch below)
     • Finds out where the data is upon a user request and redirects jobs to the cluster where the data is stored
     • Job submission should not be random but highly coordinated with other users' requests
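
   A minimal sketch of the catalog-driven scheduling idea described above (the catalog contents, file names and site names are made up for illustration; STAR's real catalog and scheduler are of course far richer):

     # Route each job to the site already holding the most replicas of its
     # input files, instead of submitting at random. Illustrative data only.
     from collections import Counter

     catalog = {  # file name -> sites holding a replica
         "mudst_0001.root": ["BNL", "PDSF"],
         "mudst_0002.root": ["BNL", "SaoPaulo"],
         "mudst_0003.root": ["BNL"],
     }

     def pick_site(input_files):
         """Pick the site with the largest number of local replicas."""
         votes = Counter()
         for f in input_files:
             votes.update(catalog.get(f, []))
         return votes.most_common(1)[0][0]

     print(pick_site(list(catalog)))  # -> BNL, which holds all three files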

9. What is STAR doing on the grid?
   • For STAR, grid computing is in EVERY-DAY production use
     • Data transfer using SRM, RRS, ... (an illustrative transfer is sketched below)
     • We run simulation production on the grid (easy); resources are still reserved for DATA production, which is done traditionally
     • No real technical difficulties; mostly fears related to uncoordinated access and massive transfers
   • User analysis
     • Chaotic in nature; requires accounting, quotas, privileges, etc.
     • Increasing interest from some institutions
     • Already successful under controlled conditions
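
   For illustration, a single SRM transfer of one reconstructed file might be driven like this (a sketch only: the hosts and paths are placeholders, not real STAR endpoints, and the srmcp client options deployed at each site may differ):

     # Shell out to the srmcp SRM client to copy one file from a BNL
     # endpoint to local PDSF disk. All hosts and paths are placeholders.
     import subprocess

     src = "srm://srm.bnl.example.org:8443/star/reco/st_physics.MuDst.root"
     dst = "file:///pdsf/data/star/st_physics.MuDst.root"
     subprocess.run(["srmcp", src, dst], check=True)  # basic source -> destination copy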

10. STAR jobs in the grid

11. Accomplishments in the last few months
   • Full database mirrors at many institutions
     • Hold detector conditions, calibrations, status, etc.
     • Heavily used during user analysis
   • File catalog and scheduler available outside BNL
     • Users can query files and submit jobs through the grid
     • Still some pitfalls for general user analysis
   • Integration between sites
     • Tools to keep grid certificates, batch systems and local catalogs updated
     • Library distribution is done automatically using AFS or a local copy (updated on a daily basis)
   • Full integration of the 3 sites (BNL, PDSF and SP) with OSG

12. User analysis in the grid
   • STAR analysis schema: 99% based on ROOT applications; each user develops personal analysis code that processes the data
   • Steps to properly submit analysis jobs in the grid (summarized in the sketch after this list):
     • Select the proper cluster in the grid
     • Transfer the analysis code to that cluster and compile it there
     • Use the file catalog to select the input files
     • Run the jobs (as many as necessary)
       • The nodes the jobs run on and the number of jobs are defined by the scheduler and depend on the cluster size, the number of events and the time to process each event; all this information is managed through the file catalog
     • Transfer the output back to the local site
   • Many of these steps are not yet fully functional, but progressing fast
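
   The steps above can be summarized in a hypothetical end-to-end sketch (none of these helper names are real STAR tools; each stand-in marks where the scheduler, file catalog and transfer services would act):

     # Hypothetical walk-through of the five submission steps; the helper
     # bodies are stand-ins for the real scheduler/catalog/transfer services.

     def pick_site(query):
         return "PDSF"  # 1. select the cluster in the grid that holds the data

     def ship_and_build(code_dir, site):
         print(f"compile {code_dir} at {site}")  # 2. transfer and compile the code there

     def query_catalog(query):
         return [f"mudst_{i:04d}.root" for i in range(4)]  # 3. catalog selects input files

     def split_into_jobs(files, files_per_job=2):
         # 4. the scheduler decides how many jobs to run and which files each gets
         return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]

     def fetch_output(jobs, home_site):
         print(f"copy output of {len(jobs)} jobs back to {home_site}")  # 5. bring results home

     site = pick_site("collision=AuAu200")
     ship_and_build("MyAnalysis/", site)
     jobs = split_into_jobs(query_catalog("collision=AuAu200"))
     fetch_output(jobs, "SaoPaulo")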

13. Current status and to-do list
   • The grid between PDSF and RCF works quite well; mainly used for simulation jobs
   • São Paulo, BNL and LBL are fully integrated
     • Libraries, file catalog, scheduler, OSG, etc.
     • Being used to test user analysis on the grid
   • Activities for the next few months
     • Integrate the SGE batch system into the grid framework
       • Still some problems reporting the right numbers to GridCat
       • Problems keeping jobs alive after a few hours
     • Development of authentication tools (RCF at BNL and PDSF at LBL are part of DOE labs)
     • User analysis
