
STAR computing & resource overview

STAR computing & resource overview. STAR Asian Computing Center: why, how, when SACC fits … Jérôme LAURET, STAR Software & Computing Leader. Outline. RHIC/STAR introduction. Tier model, layer by layer. Tier-0 and the S&C core mission, growth model; resource outlook.


Presentation Transcript


  1. STAR computing & resource overview. STAR Asian Computing Center: why, how, when SACC fits … Jérôme LAURET, STAR Software & Computing Leader. STAR Regional Meeting, Pusan, Korea, May 6-9 2009

  2. Outline
     • RHIC/STAR introduction
     • Tier model, layer by layer
     • Tier-0 and the S&C core mission, growth model; resource outlook
     • Tier-1 model – NERSC/PDSF
     • Tier-2 model – Prague NPI/ASCR
     • SACC & KISTI
     • Conclusion

  3. The Solenoidal Tracker At RHIC

  4. A RHIC experiment (RHIC experiments: BRAHMS, PHOBOS, PHENIX, STAR)
     Scientific program: Heavy-Ion program and Spin program
     • Heavy Ion: QGP
       • Provides unique insight into how quarks and gluons behaved collectively at the very first moment our universe was born
       • Critical temperature Tc ≈ 2×10^12 K (Tc ≈ 170 MeV)
       • The Sun's core is ~ 10^7 K
     • Spin program
       • Understanding how mass and spin combine into the building blocks of nature
     • Versatile machine: flexibility is key to understanding complicated problems
       • Polarized protons, sqrt(sNN) = 50-500 GeV
       • Nuclei from d to Au (U), sqrt(sNN) = 20-200 GeV
     Unique & versatile machine, unique physics program
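The two values quoted for Tc on slide 4 are the same quantity in different units, related by T = E / k_B. A minimal sketch of that conversion follows; the function and variable names are illustrative and not from the talk, and the Boltzmann constant is the standard CODATA value in eV/K.

```python
# Boltzmann constant in eV per kelvin (CODATA 2018 value).
K_B_EV_PER_K = 8.617333262e-5


def mev_to_kelvin(energy_mev: float) -> float:
    """Convert a thermal energy in MeV to a temperature in kelvin via T = E / k_B."""
    return energy_mev * 1.0e6 / K_B_EV_PER_K


tc_kelvin = mev_to_kelvin(170.0)   # critical temperature quoted on the slide
sun_core_kelvin = 1.0e7            # order of magnitude quoted on the slide

print(f"Tc ~ {tc_kelvin:.2e} K")
print(f"Tc / T(Sun core) ~ {tc_kelvin / sun_core_kelvin:.1e}")
```

With these inputs the result is close to 2×10^12 K, roughly five orders of magnitude above the Sun's core temperature quoted on the slide.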

  5. Program timeline & status
     (Timeline figure, 1992-2015: CDR; inauguration and first collisions; announcement of the Perfect Liquid (DOE, BNL), full story video available; RHIC upgrades. See STAR Near Future Physics Program, Nu Xu.)
     • Detector upgrades
       • To address the more complex Physics
     • Machine / luminosity upgrade (Luminosity <~> more data)
       • First installment in 2007
     • Staged DAQ upgrade for STAR (x100 in 2004, x1000 in 2008)
     • Data size
       • DAQ rate + runtime => sample size
       • RHIC (from p+p to Au+Au) along LHC’s range (p+p or Pb+Pb)
       • STAR sustained 400-600 MB/sec, ATLAS 300 MB/sec
       • STAR current program requires 250-300 MB/sec (22 weeks) in 2012
     PB/year raw data sample recorded on tape. Will double (luminosity) in the coming years.
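The relation "DAQ rate + runtime => sample size" on slide 5 can be sketched as a simple product. The function below is a hypothetical helper, not STAR's planning tool, and the duty factor (the fraction of wall-clock time actually spent recording) is an assumed parameter that the slide does not quote; with the default of 1.0 the result is an upper bound.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def raw_sample_pb(rate_mb_per_s: float, weeks: float, duty_factor: float = 1.0) -> float:
    """Raw data volume in PB (decimal, 1 PB = 1e9 MB) for a sustained DAQ rate.

    duty_factor is an assumption: the fraction of the run spent taking data.
    """
    total_mb = rate_mb_per_s * weeks * SECONDS_PER_WEEK * duty_factor
    return total_mb / 1.0e9


# Rates and run length quoted on the slide for the 2012 program.
for rate in (250, 300):
    print(f"{rate} MB/sec over 22 weeks: up to {raw_sample_pb(rate, 22):.2f} PB")
```

At continuous recording, 250-300 MB/sec over 22 weeks corresponds to a few PB, consistent with the PB/year scale mentioned on the slide; any realistic duty factor below 1.0 lowers the figure proportionally.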

  6. Tier model
     • Tier-0: BNL/RACF; STAR S&C core team, STAR S&C Leader
     • Tier-1: NERSC/PDSF
     • Tier-2: Wayne State University, NPI/ASCR Prague, …
     A top “core” team. Organizations in self-sustained Tiers. Service oriented. Data flows from top to bottom.

  7. Tier-0

  8. Tier-0 – the heart of the experiment
     • RHIC Computing Facility (RCF) at BNL. Primary mission: online recording of raw data
     • Service
       • Long-term (permanent) archiving and serving of all data
       • Production reconstruction of most (all) raw data; provides resources for data analysis (one-pass data reconstruction)
     • Not responsible for
       • Providing the resources required for resource-intensive simulated data needs
       • Providing additional user analysis passes beyond one pass
     • Has a full support structure, but is staffed within the limits of the RHIC operations budget
     • BNL S&C team = core team

  9. S&C core team – a condensed overview
     • Mission
       • Ensures the data is taken, QA-ed, safely archived and reconstructed
       • Ensures proper resources at Tier-0 within funding guidance and mission statement
     • Provides an overall core activity, guidance, supervision and leadership
       • Reconstruction, simulation, calibration, infrastructure, database, visualization, …
       • Base support: user (local) support, end-user software support
     • Involves research and design to address future challenges
       • Mission-driven (detector project support, obsolescence, new techniques, …)
       • Others (Grid & Cloud, database scalability, data transfer, virtualization) typically supported by
         • External (non-operation) funds
         • Synergistic activities with Computer Science (PhD, projects like OSG, other labs)
     • Team of 9 people

  10. Growth
      • STAR is not a TAX-based experiment (pay XX $ per author to join)
        • You pay NOTHING; STAR counts on your willing participation
        • You may be supported to work at BNL
      • Has no room for “do all” and relies heavily on external workforce
        • No effort for sub-systems, minimal analysis support
        • No support for Tier centers: no tax = fixed team / no growth possible
      • Communication
        • Weekly S&C meeting, run preparation or calibration meeting, Grid operations, …
        • Mailing lists, ticket system, Web pages, tutorials, …
      • STAR base principle: new projects => bring own workforce
        • Helps spread knowledge and burden over multiple sites
        • Recognizes collaborators (scientific publications, public relations, …)

  11. Resource outlook @ BNL
      • Computing model: the requirement is the Physics program and objectives. 2nd resource planning made in 2008
        • Limited by funding guidance
        • Knowledge from the past (empirical data)
      • Executive summary
        • Overall, program is doable within
        • Shortfalls in 2010 and 2011 in CPU processing power; 65 MSI2k in 2012. Numbers may change with the RHIC revised plan
      Based on current guidance, problems seen up to 2012

  12. Storage outlook. Space needed for each year. Since analyses tend to cover all years, the integrated total matters. Our target model: light green. Problems achieving it until 2013.

  13. Storage outlook

  14. Tier-0 conclusions
      • Relatively thin team for a large set of tasks
        • Concentrates on coordination
      • No built-in growth (with activities, with collaborators)
        • New projects / activities need to bring their own workforce
        • Working at BNL is supported
      • Resource outlook
        • CPU difficulties until 2012
        • Storage difficulties until 2013
      • Need outsourcing to ensure continuous Tier-0 operations

  15. Tier-1&2

  16. What defines a Tier-1?
      • Resources
        • Provides some persistent storage, at the level of 10% or more
        • Provides backup copies for Tier-0
        • Provision of 15% or more of resource cycles to perform a core task (data production, intensive simulations)
        • MUST allow redistribution of data to Tier-2
      • Provides a significant added value to the processing capabilities of STAR
        • Examples: half of the analysis power AND/OR support for running embedding and simulation AND/OR a fraction of data reconstruction
      • Typically requires a support structure
        • Dedicated staff for maintenance, operations and upgrades
        • If supporting many users: user accounts, tickets, response, …
        • A facility POC for STAR S&C management and a local STAR POC

  17. Tier-2 – everything else …
      • Would host transient datasets; requires only several tens of TB
      • Mostly for local groups; provides analysis power for specific topics (supplemental analysis passes)
      • Cycles (opportunistic) for simulation
      • Low requirement for Grid operation support or common projects
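The resource thresholds given for a Tier-1 on slide 16, with Tier-2 as "everything else", can be sketched as a small classification rule. This is an illustrative reading only: the `Site` class, its field names and the example sites are hypothetical, and treating the three resource criteria as jointly required is an assumption (the slides do not say whether all must hold, and the support-structure criteria are not modeled).

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    storage_fraction: float        # share of persistent storage provided (0..1)
    cycle_fraction: float          # share of resource cycles for a core task (0..1)
    redistributes_to_tier2: bool   # can the site redistribute data to Tier-2 centers?


def classify(site: Site) -> str:
    """Return "Tier-1" if the assumed resource criteria all hold, else "Tier-2"."""
    is_tier1 = (
        site.storage_fraction >= 0.10      # "10% or more" persistent storage
        and site.cycle_fraction >= 0.15    # "15% or more" resource cycles
        and site.redistributes_to_tier2    # "MUST allow redistribution"
    )
    return "Tier-1" if is_tier1 else "Tier-2"


# Hypothetical sites with made-up numbers, for illustration only.
big_site = Site("ExampleLab", storage_fraction=0.12, cycle_fraction=0.20,
                redistributes_to_tier2=True)
small_site = Site("ExampleUniversity", storage_fraction=0.01, cycle_fraction=0.02,
                  redistributes_to_tier2=False)

for s in (big_site, small_site):
    print(s.name, "->", classify(s))
```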

  18. NERSC/PDSF (Tier-1)
      • Target role: sustain the Monte-Carlo embedding process, provide 0.5 pass of user analysis (any users)
      • STAR environment maintained locally
        • Software & upgrades, database: the site is completely autonomous
        • Relies entirely on the instructions we provide
        • Team subscribes to our list; liaison-to-the-facility model
      • Provides support for users as well (including help desk)
        • This is NOT a Tier-1 requirement
      • Done with a combination of ~ 1.5-2 FTE of professionals
        • Library and software support (user) ~ 30%
        • Team lead (leveraged with other project activities) ~ 25%
        • System support (base 700 CPUs) ~ 30%
        • User support ~ 20%
        • Peripheral OSG work, indirect benefit to STAR ~ 50%
      • 1 FTE for STAR data production and data management
        • Funds from local grants
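As a quick consistency check on slide 18, the five effort shares can be summed and compared with the quoted "~ 1.5-2 FTE". The sketch assumes each percentage is a fraction of one FTE and that all five lines belong to the same total, which the slide suggests but does not state; the dictionary keys are shortened labels.

```python
# Effort shares from slide 18, in percent of one FTE (assumed interpretation).
effort_percent = {
    "library and software support (user)": 30,
    "team lead": 25,
    "system support": 30,
    "user support": 20,
    "peripheral OSG work": 50,
}

total_percent = sum(effort_percent.values())
total_fte = total_percent / 100

print(f"Total: {total_percent}% of one FTE, i.e. about {total_fte:.2f} FTE")
```

Under that reading the shares add up to about 1.55 FTE, at the lower end of the quoted range, with the 1 FTE for data production and data management counted separately.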

  19. NPI/ASCR (Tier-2)
      • Czech Republic institute wanted to ramp up analysis and presence
        • Target: local user analysis, Monte-Carlo simulations on spare cycles
        • STAR software installed at the local site (by a student, with minimal guidance)
      • Phase II: the institution wanted more visibility and to expand
        • Proposed to fold in synergistic activities with Computer Science (CS)
        • Created HIGH-visibility activities via specialized/dedicated projects (important for creating and presenting a “program” vision to the local agency)
        • Scalla/Xrootd, HPSS data retrieval improvements, multi-site data transfer
        • Mutually beneficial projects (student research at BNL): one master's, one PhD
      • Benefits
        • Local group is self-sufficient and has doubled in size since (more attractive to students than “log in from remote”)
        • Participating universities highly praised in the international community and at conferences (Xrootd work)
        • STAR has the largest Xrootd data aggregation deployment
      See talk from Michal Zerola – Tier-2 / Xrootd / Data transfer

  20. SACC: STAR Asian Computing Center

  21. Implications of the resource projections
      • Our approach
        • Encourage Tier-2 centers and consolidate science via access to data (early DataGrid adopter, early successes)
        • Create opportunities, leverage local workforce, harvest from other science fields
      • From principles to “need to”
        • “Savings” are needed at the Tier-0 to meet the challenges until 2013 (NB: preventing demanding Physics topics is not an option)
        • Be more efficient at the Tier-0
        • Seek additional resources: Tier-1
        • Squeeze out user analysis & maximize the use of Tier-2 facilities
      • Opened discussion with Pusan in 2006
      • Approached KISTI in 2007 (at the same time as a tour of China)
        • Our message was well understood & received: data redistribution hub in Asia
      • KISTI joined STAR in mid-2008 as a full author member
        • LEE, Sang-dong, Ph.D., Theoretical Particle Physics, Project Manager
        • KIM, Hyun-woo, Ph.D., High Energy Physics
      Hyunwoo Kim – KISTI’s effort to STAR’s data processing

  22. STAR – An international collaboration
      • Geographically spread – census 2008
        • 27% of our institutions are in Asia
      • Distribution of theses per region
        • Asia here includes India
        • 1/3rd the US, ½ Europe
      • We believe in the great potential of the region & would like to create more opportunities

  23. KISTI
      • A “perfect” Tier-1-like site for STAR
        • Korean science and Korean groups would benefit from local computing
        • At the doorstep of Asia, good networking (Gloriad)
      • KISTI
        • Has CPU resources, has storage, has permanent storage
        • Mission to support science; aspires to become a world leader in science and technology
        • Hope for potential in CS R&D: STAR provides plenty of topical activities à la NPI/ASCR, with enhanced visibility for KISTI (if mutual interests)
      • MOU resource shaped. Requirements
        • CPU: Type=shared ; Profile (in kSI2k): 240 (2009), 800 (2010), 1120 (2011)
        • Storage: Type=dedicated ; Profile
          • Disk requirement (in TB): 100 (2009), 200 (2010), 500 (2011)
          • Tape requirement (in TB): 200 (2009), 500 (2010), 1000 (2011)
        • Networking: Type=WAN on-demand ; Requirement: 1 Gbps (2010), 1.21 Gbps (2011)
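The resource profile listed under the MOU requirements on slide 23 can be written down as a small table to make the year-over-year ramp explicit. The dictionary layout, key names and `growth` helper below are illustrative choices, not part of the MOU; only the numbers come from the slide.

```python
# Requested KISTI resource profile per year, as quoted on slide 23.
kisti_profile = {
    "cpu_ksi2k": {2009: 240, 2010: 800, 2011: 1120},   # shared CPU
    "disk_tb":   {2009: 100, 2010: 200, 2011: 500},    # dedicated disk
    "tape_tb":   {2009: 200, 2010: 500, 2011: 1000},   # dedicated tape
    "wan_gbps":  {2010: 1.0, 2011: 1.21},              # on-demand WAN
}


def growth(resource: str, start: int, end: int) -> float:
    """Ratio of the requested amount in year `end` to that in year `start`."""
    series = kisti_profile[resource]
    return series[end] / series[start]


for name in ("cpu_ksi2k", "disk_tb", "tape_tb"):
    print(f"{name}: x{growth(name, 2009, 2011):.2f} from 2009 to 2011")
```

Read this way, the request grows by roughly a factor of five in CPU, disk and tape between 2009 and 2011, while the network requirement rises only modestly from 2010 to 2011.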

  24. Recognitions
      • Authors are all cited & institutions are named on every publication
        • Activation period required (6 months): KISTI is about to appear
      • Acknowledgements have always recognized resources provided by facilities. Currently reads: “We thank the RHIC Operations Group and RCF at BNL, and the NERSC Center at LBNL and the resources provided by the Open Science Grid consortium for their support.”
      • The S&C project ensures Public Relations articles
        • From BNL to KISTI: Establishing High Performance Data Transfer From the US to Asia
        • ESnet Connects STAR to Asian Collaborators

  25. Observation 1: calibration takes ~ 20% of one data reconstruction pass
      Observation 2: KISTI provides ~ 20% of the CPU power for one-pass reconstruction
      Goals
      • Migrate calibration support (automated) to KISTI (09 & 10)
      • Use resources to speed up data reconstruction passes
      • Provide “first served” datasets to Asian colleagues
      • Provide storage for MuDST for Tier-2 in the Asian region
      Focused, well-defined role. Data calibration is one of the most important parts of the data production process. The faster we can do it, the sooner the data is available to all.

  26. Feasibility? The laundry list …
      Technical
      • Network data transfer [outstanding]
      • Software installation and support
      • Data processing “tuning”, real-time
      Less technical
      • Sustainable interest (Physics and the Computer Science aspects)
        • Are R&D projects associated with the processing of large data samples of interest? Because we have them … now!
        • Could Korea’s Computer Science activities be brought into the mix (leveraging opportunities through STAR)?
      • Interest in the region
        • The STAR approach creates opportunities & counts on you to seize them
        • Support may span outside Korea (student, PhD, …)
      See talk from John Hover, Status of Data Transfer

  27. Conclusions
      • STAR @ RHIC: an exciting Physics program and a “live” (i.e. real and available now) data pool
        • With opportunities for both Physicists and Computer Scientists
      • Tier-0 funding profile: program is doable, some difficult years
        • Would benefit from supplemental resources until 2013
      • SACC & KISTI
        • To be a success, support needs to come along with new projects
        • We are appreciative of the commitments
        • We are not quite there yet. More discussions and strategic planning needed?
        • Would play an important role for Korea, Asia, STAR and for KISTI (international visibility)
        • Creating unique opportunities for science
