Future of Distributed Production in US Facilities

Kaushik De, Univ. of Texas at Arlington. US ATLAS Distributed Facility Workshop, Santa Cruz, November 13, 2012.

Presentation Transcript


  1. Future of Distributed Production in US Facilities. Kaushik De, Univ. of Texas at Arlington. US ATLAS Distributed Facility Workshop, Santa Cruz, November 13, 2012

  2. Background
     • Distributed production requires many different ATLAS-specific SW components/applications
       • Athena and Transformations – core software
       • ProdSys – task management system
       • AMI – Production Tags and Metadata
       • PanDA – job execution system
       • DQ2 – data management system
       • Monitoring of tasks, data and jobs
     • They utilize common tools like Globus, VDT, XRootD, dCache, CVMFS, … deployed at our facilities

  3. Overview
     • Many distributed production components used in ATLAS are being upgraded after ~5 years of continuous use
     • In this talk we will focus on their evolution in 2013-2014
       • Athena on many fronts: AthenaMP, Athena64, AthenaGPU, AthenaPhi, Athena event service
       • trf -> tf
       • DQ2 -> Rucio
       • ProdSys -> ProdSys II
       • PanDA -> CAF
       • PanDA -> BigData
       • New monitoring capabilities

  4. AthenaXX
     • Many future paths for Athena are driven by hardware – will not talk about them here
     • Interesting topic for distributed production – event service
       • Basic unit of measurement in HEP is events – not bits, bytes or files
       • Multi-core is the new paradigm (same as the old one)
       • Caching technologies may be best optimized at event level
     • Started discussions during SW week for event service
       • Client-server architecture in Athena desirable long term
       • PanDA server with Athena client will be first step to try
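
The event service idea, a server handing work to Athena clients in units of events rather than files, can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: the EventServer and EventRange classes, the range size and the in-process client loop are hypothetical and do not reflect the actual PanDA or Athena interfaces.

```python
from dataclasses import dataclass
from threading import Lock
from typing import Dict, List, Optional


@dataclass(frozen=True)
class EventRange:
    """Hypothetical unit of work: events first..last (inclusive) of one input file."""
    file: str
    first: int
    last: int

    @property
    def n_events(self) -> int:
        return self.last - self.first + 1


class EventServer:
    """Toy server that dispatches event ranges instead of whole files."""

    def __init__(self, file_events: Dict[str, int], range_size: int) -> None:
        if range_size <= 0:
            raise ValueError("range_size must be positive")
        self._lock = Lock()  # several clients may ask for work concurrently
        self._done: List[EventRange] = []
        # Cut every input file into ranges of at most range_size events.
        self._pending: List[EventRange] = [
            EventRange(name, start, min(start + range_size, n) - 1)
            for name, n in file_events.items()
            for start in range(0, n, range_size)
        ]

    def next_range(self) -> Optional[EventRange]:
        """Hand out the next unassigned range, or None when all work is assigned."""
        with self._lock:
            return self._pending.pop(0) if self._pending else None

    def report_done(self, r: EventRange) -> None:
        with self._lock:
            self._done.append(r)

    def events_done(self) -> int:
        with self._lock:
            return sum(r.n_events for r in self._done)


def run_client(server: EventServer) -> int:
    """Toy client loop: pull a range, process it, report back, until no work remains."""
    processed = 0
    while (r := server.next_range()) is not None:
        processed += r.n_events  # placeholder for running Athena over these events
        server.report_done(r)
    return processed
```

With two files of 250 and 100 events and range_size=100, the server creates four ranges and a single client processes all 350 events. A real service would also have to reissue ranges held by clients that disappear; the sketch leaves that out.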

  5. Job Transforms
     • Job transforms (trf) – workflow wrapper around Athena
       • All production jobs use trf
       • Most major ATLAS workloads are supported
         • Including multi-step jobs
         • New workloads like overlay, FTK … are being added
     • Major changes underway
       • See recent talks by Graeme Stewart
         • https://indico.cern.ch/getFile.py/access?contribId=35&sessionId=19&resId=0&materialId=slides&confId=169697
         • https://indico.cern.ch/getFile.py/access?contribId=7&resId=0&materialId=slides&confId=214562
       • Highlights of future changes in next few slides
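
As a rough illustration of what a "workflow wrapper" around multi-step jobs does, the sketch below chains steps, checks that each step's output format matches the next step's input, and runs them in order. The Step and MultiStepTransform classes, the fake_athena helper and the RAW/ESD/AOD step names are hypothetical stand-ins, not the trf or tf API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    """One stage of a hypothetical multi-step transform."""
    name: str
    input_format: str
    output_format: str
    run: Callable[[str], str]  # takes an input file name, returns the output file name


class MultiStepTransform:
    """Toy workflow wrapper: checks that steps chain correctly, then runs them in order."""

    def __init__(self, steps: List[Step]) -> None:
        for prev, nxt in zip(steps, steps[1:]):
            if prev.output_format != nxt.input_format:
                raise ValueError(
                    f"{prev.name} produces {prev.output_format}, "
                    f"but {nxt.name} expects {nxt.input_format}"
                )
        self.steps = steps

    def execute(self, input_file: str) -> List[str]:
        """Feed each step's output into the next step; return every output in order."""
        outputs: List[str] = []
        current = input_file
        for step in self.steps:
            current = step.run(current)
            outputs.append(current)
        return outputs


def fake_athena(output_format: str) -> Callable[[str], str]:
    """Stand-in for an Athena job: only swaps the file extension."""
    return lambda f: f.rsplit(".", 1)[0] + "." + output_format
```

Checking the chain when the transform is constructed means a misconfigured multi-step job fails before any processing starts, rather than partway through.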

  14. https://indico.cern.ch/getFile.py/access?contribId=1&sessionId=5&resId=2&materialId=slides&confId=169697

  19. What is ProdSys?
     • Task management system
       • Interface to request production tasks
       • Generates jobs for execution by PanDA
       • Manages task completion
     • Consists of many scripts
       • Web interface for task request
       • Bulk task submission interface
       • Auto generation of jobs from tasks
       • Scripts for task completion
       • Interacts with AMI and DQ2
     • And add-ons
       • Task-list creation scripts developed by production managers
       • Task monitoring
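
Slide 19 lists auto generation of jobs from tasks, and slide 25 contrasts JeDi with jobs that are "predefined with fixed number of events". The minimal sketch below models that static style of splitting under this assumption: every job is defined up front with the same event count, and the last job takes the remainder. JobDef and generate_static_jobs are hypothetical names, not ProdSys code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class JobDef:
    """Hypothetical job definition produced from a task."""
    task_id: int
    job_index: int
    first_event: int
    n_events: int


def generate_static_jobs(task_id: int, total_events: int, events_per_job: int) -> List[JobDef]:
    """Pre-define every job with a fixed event count; the last job takes the remainder."""
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    return [
        JobDef(task_id, i, first, min(events_per_job, total_events - first))
        for i, first in enumerate(range(0, total_events, events_per_job))
    ]
```

For a 1050-event task with 500 events per job, this yields jobs of 500, 500 and 50 events regardless of where they will run, which is the rigidity that dynamic job definition is meant to remove.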

  20. Current System
     [Diagram: the Production Manager submits tasks to ProdSys, which sends jobs to PanDA; users submit jobs to PanDA through Bamboo]

  21. What is ProdSys II?
     • Split ProdSys into two parts
       • DEfT – task request and task definition
         • Some components will be taken from current ProdSys
       • JeDi – dynamic job definition and task execution
         • Integrated with PanDA (replaces Bamboo)
         • Will also be the engine for user analysis tasks
     • Need to work closely with Transforms & Rucio groups
       • All three systems should evolve together
     • Integration with monitoring
       • Will be planned from the beginning

  22. Future System
     [Diagram. Components shown: Production Manager, DEfT, JeDi, PanDA, users]

  23. DEfT
     • Key features
       • Web UI for simplified interactive task request
       • Task request system based on physics requirements
         • Managers/users insulated from execution details
       • Deprecate/remove script-based task submission
       • Error checking of task requests
       • Built-in authentication and approval mechanisms
       • Creates tasks according to a new simplified schema
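
A minimal sketch of "error checking of task requests" combined with an approval step, assuming a physics-level request that carries no site or job details. The TaskRequest fields, the Status values and the approve function are all hypothetical; the real DEfT schema is not described in these slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class TaskRequest:
    """Hypothetical physics-level task request: no sites, queues or job sizes."""
    requester: str
    input_dataset: str
    output_tag: str
    n_events: int
    status: Status = Status.PENDING

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the request looks well formed."""
        errors: List[str] = []
        if not self.requester:
            errors.append("requester is required")
        if not self.input_dataset:
            errors.append("input_dataset is required")
        if not self.output_tag:
            errors.append("output_tag is required")
        if self.n_events <= 0:
            errors.append("n_events must be positive")
        return errors


def approve(req: TaskRequest, approver: str, approvers: Set[str]) -> TaskRequest:
    """Toy approval step: only authorized approvers act, and only valid requests are approved."""
    if approver not in approvers:
        raise PermissionError(f"{approver} is not authorized to approve tasks")
    req.status = Status.APPROVED if not req.validate() else Status.REJECTED
    return req
```

Returning all problems at once, rather than stopping at the first one, lets a web UI show the requester everything that needs fixing in a single round trip.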

  24. Tasks, Meta-tasks, Basket-tasks
     • New extensions to the concept of task
     • Task – basic unit
       • Input dataset -> Output dataset
     • Meta-task – chain of tasks, which will be auto-generated
       • Manager/user makes single request
       • Successive processing steps (transforms) created by DEfT
       • Intermediate steps in chain may be specified as transient
     • Basket-task – group of related tasks (e.g. same tag)
       • Manager/user can define basket of tasks
       • Manager/user makes single request for execution
     • Ability to clone tasks, meta-tasks and basket-tasks
       • From previous tasks, meta-tasks and basket-tasks
       • Or from predefined templates
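
The three task concepts map naturally onto small data structures. The sketch below is illustrative only: the Task, Basket, make_meta_task and clone names and the dataset naming scheme are assumptions, not the DEfT schema. A meta-task expands one request into a chain where each output feeds the next input, intermediates can be flagged transient, and cloning copies a previous task while overriding selected fields.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    """Basic unit: one transform turning an input dataset into an output dataset."""
    transform: str
    input_dataset: str
    output_dataset: str
    transient: bool = False  # True for intermediate outputs that need not be kept


def make_meta_task(input_dataset: str, steps: Sequence[str],
                   transient_intermediates: bool = True) -> List[Task]:
    """Expand one request into a chain of tasks, one per processing step."""
    chain: List[Task] = []
    current = input_dataset
    for i, step in enumerate(steps):
        output = f"{current}.{step}"  # hypothetical dataset naming scheme
        is_intermediate = i < len(steps) - 1
        chain.append(Task(step, current, output,
                          transient=transient_intermediates and is_intermediate))
        current = output
    return chain


@dataclass(frozen=True)
class Basket:
    """Group of related tasks (e.g. sharing a tag) submitted with a single request."""
    tag: str
    tasks: Tuple[Task, ...]


def clone(task: Task, **changes) -> Task:
    """Clone a previous task or template, overriding selected fields."""
    return replace(task, **changes)
```

For example, make_meta_task("raw.data", ["reco", "merge"]) produces a reco task writing "raw.data.reco" (transient) followed by a merge task reading it and writing "raw.data.reco.merge".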

  25. JeDi
     • Key features
       • JeDi will be the core component of PanDA
       • Generate jobs dynamically from DEfT tasks
         • Jobs are defined to match execution environment and specified constraints (e.g. number of cores, duration, file size, dataset size…)
         • Number of events varies per job
         • Jobs are not predefined with fixed number of events – key feature
       • PanDA responsible for optimal task execution
       • PanDA responsible for task completion
         • Auto-merging if requested
       • Data will be collected by PanDA to optimize job execution and completion (expanded concept of scout jobs)
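
One way to picture "number of events varies per job" is to size each job from per-event costs measured by scout jobs and the limits of the site it will run on. The sketch assumes ideal multi-core scaling, a hypothetical 0.8 safety margin on walltime, and simple round-robin site assignment; ScoutStats, SiteConstraints and both functions are illustrative names, not JeDi code.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoutStats:
    """Per-event costs measured by scout jobs (hypothetical fields)."""
    cpu_seconds_per_event: float
    output_bytes_per_event: float


@dataclass(frozen=True)
class SiteConstraints:
    """Hypothetical execution-environment limits for one queue."""
    n_cores: int
    max_walltime_s: float
    max_output_bytes: float


def events_per_job(stats: ScoutStats, site: SiteConstraints, safety: float = 0.8) -> int:
    """Largest event count that fits both the walltime and output-size limits."""
    # Assumes ideal multi-core scaling: wall time per event = CPU time / cores.
    wall_per_event = stats.cpu_seconds_per_event / site.n_cores
    by_time = math.floor(safety * site.max_walltime_s / wall_per_event)
    by_size = math.floor(site.max_output_bytes / stats.output_bytes_per_event)
    return max(1, min(by_time, by_size))


def split_dynamically(total_events: int, stats: ScoutStats,
                      sites: List[SiteConstraints]) -> List[int]:
    """Carve jobs off the remaining events, sizing each for the next site (round-robin)."""
    sizes: List[int] = []
    remaining, i = total_events, 0
    while remaining > 0:
        n = min(events_per_job(stats, sites[i % len(sites)]), remaining)
        sizes.append(n)
        remaining -= n
        i += 1
    return sizes
```

With 10 CPU-seconds and 1 MB per event, an 8-core site with a one-hour limit and 2 GB output cap gets 2000-event jobs (output-limited), while a single-core site with the same walltime gets 288-event jobs (time-limited). The same task therefore yields jobs of different sizes depending on where they land.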

  26. Common Analysis Framework
     • Task force to evaluate suitability of PanDA for an LHC common user analysis framework
       • Latest report: https://indico.cern.ch/getFile.py/access?contribId=7&sessionId=19&resId=1&materialId=slides&confId=169697

  33. Conclusion
     • Many updates/improvements planned for 2013-2014
       • Some applications will be completely rewritten
       • But based on past 5 years of LHC experience
     • Plans and teams are in place
       • Will lead to better software running at facilities
     • Waiting for current LHC run to end
     • Stay tuned for more
