
ALICE Computing TDR Questions and Answers

A summary of the milestones achieved in 2005 and the future plans for ALICE Computing, including releases, testing, and implementation of new technologies.



Presentation Transcript


  1. ALICE Computing TDR Questions and Answers, Federico Carminati, October 8, 2005

  2. Q1: Milestones • Have you met your 2005 milestones?

  3. Q1: Milestones • MS1-May 2005: PDC05 - Start of event production (phase 1) • Started only in September, in order to • Synchronise with SC3 • Improve the integration with LCG middleware and increase the usage of common components • Working with LCG via a combined task force toward a stable long-term solution for our distributed computing environment • Delayed from May to September 2005

  4. Q1: Milestones • MS2-June 2005: AliRoot framework release • Released in summer 2005, in time for PDC05 • FLUKA is the standard transport model • Detector geometry via the ROOT Geometrical Modeller (TGeo) • Calibration and Alignment framework and the prototype for the Condition infrastructure implemented • Milestone has been met as planned
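
To make the TGeo point above concrete, here is a minimal, self-contained sketch of building a toy geometry with the ROOT Geometrical Modeller; the volume, material and file names are invented for the example and this is not the ALICE geometry itself.

```cpp
// Minimal TGeo sketch: a toy "detector" made of one box volume.
// Run as a ROOT macro, e.g.  root -l toygeo.C
#include "TGeoManager.h"
#include "TGeoMaterial.h"
#include "TGeoMedium.h"
#include "TGeoVolume.h"

void toygeo()
{
   // The manager owns the full geometry tree.
   new TGeoManager("toy", "Toy geometry built with the ROOT Geometrical Modeller");

   // A dummy material/medium pair (vacuum) for the mother volume.
   TGeoMaterial *matVac = new TGeoMaterial("Vacuum", 0., 0., 0.);
   TGeoMedium   *vac    = new TGeoMedium("Vacuum", 1, matVac);

   // Top volume: a 2 m x 2 m x 6 m box (half-lengths in cm).
   TGeoVolume *top = gGeoManager->MakeBox("TOP", vac, 100., 100., 300.);
   gGeoManager->SetTopVolume(top);

   // Voxelise and validate the geometry; after this it can be navigated
   // or handed to a transport engine such as FLUKA through the VMC layer.
   gGeoManager->CloseGeometry();
   gGeoManager->Export("toygeo.root");   // persistent geometry in a ROOT file
}
```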

  5. Q1: Milestones • MS3-June 2005: Computing TDR submitted to the LHCC • This milestone has been met as planned

  6. Q1: Milestones • MS4-July 2005: PDC05 – Start of combined test with SC3 (phase 2) • Goal • Distributed production and merging of signal and underlying events and the subsequent reconstruction of the merged event • No additional developments or new services are required for this phase • Delayed to December 2005 (MS1)

  7. Q1: Milestones • MS5-September 2005: PDC05 – Start of distributed analysis (phase 3) • Goal • Non-organised distributed analysis of ESD data by many users • The delay has allowed us to further integrate with LCG • All components ready • User interface to the Storage Index (gShell), ROOT API for SI access and deployment of PROOF • Will be released to selected users at the end of 2005 or early 2006 • Tests are ongoing • Batch and interactive distributed analysis will be demoed at SC'05 • Delayed to January 2006
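
For a flavour of the ROOT-side Grid access mentioned in this milestone, the fragment below sketches how a user session might connect to AliEn and open a catalogued file from ROOT; it is an illustrative sketch, not the released analysis interface, and the logical file name is a placeholder.

```cpp
// Sketch: connecting ROOT to AliEn and opening a catalogued file.
// Assumes a valid Grid proxy/token; the LFN below is a placeholder.
#include <cstdio>
#include "TGrid.h"
#include "TFile.h"
#include "TTree.h"

void grid_open()
{
   // Establish the Grid session (the AliEn plugin is loaded on demand).
   if (!TGrid::Connect("alien://")) {
      std::printf("Could not connect to AliEn\n");
      return;
   }

   // Open a file through its logical file name in the catalogue.
   TFile *f = TFile::Open("alien:///alice/sim/some_production/ESD.root");
   if (!f || f->IsZombie()) return;

   // From here on it is plain ROOT I/O, e.g. reading the ESD tree.
   TTree *esdTree = (TTree*) f->Get("esdTree");
   if (esdTree) std::printf("Entries: %lld\n", esdTree->GetEntries());
}
```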

  8. Q1: Milestones • MS7-December 2005: Condition infrastructure deployed • Initial user requirements collected • Prototype of the condition and tag infrastructure demonstrated • Further development according to user feedback ongoing • First release scheduled for December • No delay foreseen for this milestone
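
As an illustration of the intended condition-access pattern, the sketch below follows the style of the AliRoot CDB prototype; the storage URI, calibration path and run number are placeholders, and the class and method names should be read as assumptions rather than the final released interface.

```cpp
// Sketch of retrieving a calibration object from the condition database.
// Storage location, path and run number are placeholders for illustration.
#include "TObject.h"
#include "AliCDBManager.h"
#include "AliCDBEntry.h"

void get_conditions()
{
   AliCDBManager *man = AliCDBManager::Instance();

   // Point the manager at a storage (a local directory or a Grid folder).
   man->SetDefaultStorage("local://$ALICE_ROOT/OCDB");
   man->SetRun(0);   // conditions are selected by run-number validity

   // Fetch an entry by its three-level path "Detector/Type/Object".
   AliCDBEntry *entry = man->Get("TPC/Calib/Pedestals");
   if (entry) {
      TObject *calib = entry->GetObject();   // the actual payload
      calib->Print();
   }
}
```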

  9. Q1: Milestones • MS8-December 2005: Preliminary implementation of algorithms for alignment and calibration ready for all detectors • Alignment and calibration framework prototype available • Implementation of the detector algorithms has started on the prototype • Good results obtained for the TPC • Milestone may be delayed

  10. Q2: CDC VII • Page 12 mentions the next Data Challenge (CDC VII) but the schedule on page 77 does not seem to mention it; when is this planned to take place? Also, the goals for CDC VII include testing new network technologies; what are these technologies?

  11. Q2: CDC VII • The planning could not be fixed at the time of the TDR • Network equipment to be purchased in 2005 after a large market survey • Planning discussed and agreed with IT • Nov-Dec '05: initial tests of the network equipment in the computing centre • April '06: generation of data in the DAQ at the experimental area; recording in the computing centre • Software will include • DATE V5, AliRoot, ROOT data formatting, algorithms from HLT, Linux SLC3 in 2005 (possibly SLC4 in 2006) • CASTOR2 • New technologies to be tested • 10 Gbit Ethernet router and Fibre Channel network for the storage

  12. Q3: Tag and Grid Collector • In Chapter 2, the TAG and Grid Collector indexing mechanism looks very similar to functionalities provided by a relational DB. What are the reasons for your choice? Do you have proof that this system is scalable? Do you have quantitative results on achieved performances from the DCs?

  13. Q3: Tag and Grid Collector • The GC is based on compressed bitmap index technology • We do not need most of the RDBMS functionality • e.g. concurrent read and write access • Large gain in performance over classical RDBMS queries • Great benefit from a single I/O technology, i.e. ROOT files • Scalability and performance demonstrated by STAR • We have performed several standalone tests • They show the system is suited to ALICE needs • They confirm the STAR performance results • Framework developed in collaboration with ROOT and STAR • GC and Index Builder will be included in PDC06
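
To illustrate the principle behind bitmap-index selection (leaving out compression and the actual Grid Collector API), the toy fragment below builds one bitmap per tag predicate and answers a query with bitwise ANDs; all names and values are invented.

```cpp
// Toy illustration of bitmap-index event selection (uncompressed bitmaps).
// Real systems such as the Grid Collector use compressed bitmaps, but the
// query logic -- one bitwise AND per predicate -- is the same idea.
#include <vector>
#include <cstdio>

int main()
{
   const int nEvents = 8;

   // One bit per event and per predicate, built once from the event tags.
   std::vector<bool> highMultiplicity = {1,0,1,1,0,0,1,0};  // "Ntracks > threshold"
   std::vector<bool> dimuonTrigger    = {1,1,0,1,0,1,1,0};  // "trigger class == dimuon"

   // Query "high multiplicity AND dimuon trigger" = bitwise AND of the bitmaps.
   std::vector<bool> selected(nEvents);
   for (int i = 0; i < nEvents; ++i)
      selected[i] = highMultiplicity[i] && dimuonTrigger[i];

   // The surviving bit positions are the event indices to read back,
   // so only the matching events are fetched from the ROOT files.
   for (int i = 0; i < nEvents; ++i)
      if (selected[i]) std::printf("read event %d\n", i);
   return 0;
}
```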

  14. Q3: Tag and Grid Collector • From J. Wu's presentation at the September 2005 ROOT workshop • http://agenda.cern.ch/askArchive.php?base=agenda&categ=a055638&id=a055638s2t10/transparencies

  15. Q4: PROOF requirements • Page 30: what are the hardware and software constraints associated with the PROOF system in the remote computing centres? Could you please give more details on the required architecture in an analysis centre?

  16. Q4: PROOF requirements • Software constraints • All components (ROOT, proofd, proofserv and xrootd) are part of ROOT • Hardware constraints • Dictated by the target performance of the system • PROOF scales linearly up to a few hundred nodes • Nodes can simply be added to increase the performance • Commodity components • High-end CPUs (top-end P4 or AMD64) • A few hundred GB of SATA disk • A few GB of RAM and Gigabit Ethernet • A cluster of several tens of nodes can already deliver considerable performance for ad-hoc analysis • ALICE plans to instrument the CERN AF as a PROOF-enabled cluster • Intention to test a large PROOF cluster with SC4 / ALICE PDC06
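
For context, a user-side PROOF session looks roughly like the sketch below; the master host, file URLs and selector name are placeholders, and the calls assume a recent ROOT release.

```cpp
// Sketch of running a selector-based analysis on a PROOF cluster.
// "proofmaster.example.org", the file list and the selector are placeholders.
#include "TProof.h"
#include "TChain.h"

void run_proof()
{
   // Start (or attach to) the PROOF session on the cluster master.
   TProof::Open("proofmaster.example.org");

   // Build the input dataset as an ordinary ROOT chain of ESD files.
   TChain chain("esdTree");
   chain.Add("root://someserver//data/run1/AliESDs.root");
   chain.Add("root://someserver//data/run2/AliESDs.root");

   // Hand the chain to PROOF: the same Process() call now runs in
   // parallel on the workers instead of locally.
   chain.SetProof();
   chain.Process("MyEsdSelector.C+");
}
```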

  17. Q4: PROOF requirements • Already in use at PHOBOS; see M. Ballintijn's presentation during the ROOT workshop • http://agenda.cern.ch/askArchive.php?base=agenda&categ=a055638&id=a055638s2t5/transparencies

  18. Q5: MONARC and Cloud models • How do you envisage the migration from the MONARC model to the cloud model as mentioned on page 63? Is this migration compatible with other LHC experiments' plans? The computing model described in the TDR seems to fit very well with the hierarchical model; why do you think that the cloud model is better?

  19. Q5: MONARC and Cloud models • ALICE still sees the cloud model as appealing • Redundancy and resilience to failures • Flexible in optimising the usage of resources • Tested successfully in PDC04 • The computing model follows a more “hierarchised” pattern • The LCG infrastructure is developing in a hierarchical fashion • Funding agencies plan around the large T1s • Resource evaluation and planning is easier • Strongly recommended to us by the LHCC during the Computing Model Review in January 2005, which suggested that we not rely entirely on a cloud-enabled Grid and instead adopt a stricter hierarchical model

  20. Q6: First pass reconstruction • Page 65 states the first pass reconstruction will be done on Tier 0 at CERN for both pp and AA data. As a consequence the AA reconstructed data will not be available before at least 4 months after data taking. This delay is rather uncomfortable for fast physics feedback. Is it impossible to foresee a faster distributed first pass reconstruction on Tier 1's, maybe not in the first year when data will be scarce, but for the following years?

  21. Q6: First pass reconstruction • After the HI run the T1s will be busy reprocessing previous years' data • pp and HI reconstruction, organised analysis • “Pushing” more computation outside CERN would penalise ongoing physics activities • And it would critically depend on the performance of the Grid! • In the current model enough time is available to provide feedback for the running conditions of the next heavy-ion run • First significant results will be obtained from a subset of the data, allowing for early discovery • This is one of the goals of the CERN AF cluster

  22. Q7: Tier 2 bandwidth • The computing model described implies reconstruction will be mainly done at Tier 1 while analysis will be done at Tier 2. Page 70 says the data at Tier 2 will be copied (and hence deleted to make space) as required. However, the implications of this in terms of extra bandwidth do not seem to have been included fully; can they be estimated?

  23. Q7: Tier 2 bandwidth • T1s do the second and third reconstruction passes and organised analysis • T2s do MC generation / reconstruction and non-scheduled analysis • T2s export the data to the nearby T1 MSS and keep the ESD/AOD • One copy of the current reconstruction pass is kept until the new one is produced • The distributed analysis splits jobs to maximise data locality • Minimisation of additional data traffic • This has been taken into account in the estimation of the network traffic • It relies on our disk space requirements at the T2s being satisfied
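
The locality-driven splitting mentioned above can be summarised by the toy sketch below: input files are grouped by the site holding a replica and one sub-job is created per site, so jobs run next to their data; this is a simplification, not the actual middleware logic, and all names are invented.

```cpp
// Toy sketch of locality-driven job splitting: one sub-job per storage site.
#include <map>
#include <string>
#include <vector>
#include <cstdio>

struct InputFile { std::string lfn; std::string site; };  // site = where a replica lives

int main()
{
   std::vector<InputFile> inputs = {
      {"/alice/data/run1/ESD_001.root", "CERN"},
      {"/alice/data/run1/ESD_002.root", "CNAF"},
      {"/alice/data/run1/ESD_003.root", "CERN"},
   };

   // Group the input files by the site holding them.
   std::map<std::string, std::vector<std::string>> subjobs;
   for (const auto &f : inputs) subjobs[f.site].push_back(f.lfn);

   // One sub-job per site: each reads only local data, so the remaining
   // traffic is just shipping the small job description and the output.
   for (const auto &sj : subjobs)
      std::printf("sub-job at %s with %zu file(s)\n",
                  sj.first.c_str(), sj.second.size());
   return 0;
}
```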

  24. Q8: MC data movement • Similarly, page 69 says a copy of every MC event will go to Tier 1 (for reconstruction) and then these are copied back to the Tier 2 where they were produced. There will be further MC events moved around as they are analysed; while signal can be easily produced for a particular analysis, the large numbers of generic background events needed will have to be pooled and hence will be needed at all Tier 2s. Has the bandwidth required been included? If not, what would this add in terms of rate in and out of Tier 2 as well as disk space needs at Tier 2?

  25. Q8: MC data movement • T2s do MC generation and reconstruction • Underlying events are generated and shipped to the T1 MSS • Signal events are generated on-the-fly and merged with underlying events from the “local” pool • MC ESDs generated by the T2s are shipped to the T1 MSS but also kept at the T2s for subsequent analysis • T2-T2 traffic should be very low • This has been taken into account for the network traffic estimation

  26. Q9: Data volume in 2007 • Page 62 states the assumptions for 2007 are 40% pp and 20% AA of a standard year. However, the event rate will be kept at nominal by loosening the triggers, which will allow studies of them. What are the financial implications of this, in terms of resources at the Tier 0 (or elsewhere) which need to be purchased in time for the 2007 run rather than delayed until the 2008 run, when they will be cheaper? How much would be saved by e.g. a factor of two reduction in the event rate, which would still give large amounts of data to debug the detector with? Page 72 states the full rate of looser triggers is needed to allow the discovery physics to be done, which implies the triggers are not very selective and many important events will be lost with the nominal settings. This needs further justification; specify the critical physics which is essential to do during the 2007 run which could not be done with e.g. half the event sample.

  27. Q9: Data volume in 2007 • Event samples & triggers presented to the LHCC • PPR vol 1 (CERN/LHCC 2003-049) • LHCC special session June 2002 (CERN/LHCC 2002-023, http://sks.home.cern.ch/sks/LHCbeamreq.ppt) • Large cross-section processes, global event properties • Measured in MB or central events (5-10% of MB) • First physics (e.g. multiplicity distributions) can be extracted from several hundreds of events • Rare probes (e.g. -meson pt, charm mesons) or signals with a very small signal-to-background ratio (e.g. -mesons at low pt, thermal photons) • Require several 10^7 MB and central events • Rare events with specific triggers in ALICE • J/ψ or Υ decays in the central detector and the muon arm • High-pt photons, jets, etc. • Require selective triggers, good DAQ lifetime and maximum integrated luminosity; will take a longer time to address • Given the multiplicity ratio we need two orders of magnitude more MB pp than heavy-ion events for comparable statistical errors (signal dependent)

  28. Q9: Data volume in 2007 • Maximum rate limited by the SDD dead time to ~500 Hz • Can be 1 kHz for pp (reducing the SDD sampling rate) • DAQ bandwidth (1.2 GB/s) limits the HI rate • 100 Hz of MB Pb-Pb or 25 Hz of central, assuming dNch/dy = 4000 for central Pb-Pb • DAQ and trigger guarantee a good lifetime for rare triggers and fill the bandwidth with MB • We are not rate-limited • DAQ bandwidth is a compromise between technical / financial constraints and the running time needed to accumulate a few 10^7 Pb-Pb MB and central events (10^9 for pp)
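
As a cross-check of the figures quoted above, the small program below works out the event sizes implied by a 1.2 GB/s DAQ bandwidth at the quoted Pb-Pb rates, and the running time needed to collect 10^7 minimum-bias events at 100 Hz; it only restates the slide's own numbers.

```cpp
// Back-of-the-envelope check of the DAQ rate figures quoted above.
#include <cstdio>

int main()
{
   const double bandwidth = 1.2e9;       // DAQ bandwidth in bytes/s

   // Implied average event sizes at the quoted Pb-Pb rates.
   const double mbEventSize      = bandwidth / 100.0;  // 100 Hz MB      -> ~12 MB/event
   const double centralEventSize = bandwidth /  25.0;  //  25 Hz central -> ~48 MB/event

   // Time to collect 1e7 minimum-bias Pb-Pb events at 100 Hz.
   const double tPbPb = 1e7 / 100.0;     // = 1e5 s, i.e. of the order of a heavy-ion run

   std::printf("MB Pb-Pb event size      : %.0f MB\n", mbEventSize / 1e6);
   std::printf("central Pb-Pb event size : %.0f MB\n", centralEventSize / 1e6);
   std::printf("time for 1e7 MB events   : %.0e s\n", tPbPb);
   return 0;
}
```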

  29. Q9: Data volume in 2007 • Short setup time in 2007 with cosmic triggers and single beams • Main systems (e.g. trigger scintillators and TPC) ready for MB events soon after • RHIC collected physics data within days of the first collisions and published less than four weeks later. We intend to do at least the same! • Initial low-luminosity pp of particular interest • At L < 10^29 we have no pile-up in the TPC -- cleaner and smaller events • We intend to take MB events (pp and, if possible, Pb-Pb) at the maximum possible DAQ rate for physics analysis • Even at ultra-low luminosities, the rate will be limited by the experiment, not the machine • Limiting the event rate is not an efficient use of a detector that is expensive to build and a machine that is expensive to operate • This is a one-off chance • The LHC luminosity (and therefore the event pile-up) will increase • Limiting the rate in 2007 would be harmful to the quality of the physics • The number of events, and the CPU requirements, depend on the initial LHC running time (pp and HI) • At 500 Hz we can collect 4 × 10^8 MB pp events (40% of a standard year) in ≤ 10^6 seconds

  30. Q10: AliEn • How much of the AliEn software is (or will become) common with LCG is not made clear. What is the overlap of these efforts and how do they coordinate with each other? How much effort is it to maintain AliEn? How much LCG code is foreseen to be incorporated into AliEn over the next two years? When is it expected that AliEn will be phased out completely?

  31. Q10: AliEn • A coherent set of modular services • Used in production in 2001-2004 • Common Grid projects have progressively offered the opportunity to replace some AliEn services • Consistent with the plan announced by ALICE since 2001 • This will continue as suitable components become available • ALICE is taking an active part in the definition and testing of these components • Whenever possible, we will use “common” services • AliEn offers a single interface for ALICE users into the complex, heterogeneous (multiple grids and platforms) and fast-evolving Grid reality

  32. Q10: AliEn • AliEn interfaces to the LCG services • LCG data management components (LFC, SRM) • Workload Management System (Resource Broker) • gLite Data Management components (FTS) • Virtual Organisation Membership Service (VOMS) • Common authentication model (Globus) • Discovery service (planned) • Discussed by the BS WG, coordinated by the ALICE-LCG-TF, tested in the DC • An interface with ARC is in progress and we are in discussion with OSG • The services provided by AliEn are • The ALICE job database and related distributed tools and services • The ALICE file and dataset catalogue and related distributed tools and services • ALICE-specific monitoring services • Essential components for distributed data processing • Their functionality is ALICE-specific and not found elsewhere • They are an integral part of the ALICE Computing Environment • We do not foresee phasing out these elements

  33. Services for SC3 timeframe

  34. ALICE Computing
