ALICE: Offline Planning and personnel resources LHCC Manpower Review of Computing September 3, 2003 ALICE : planning & resources
Questions to be answered • Profile of available and required manpower at CERN / Regional Centers / Institutes • Other existing and potential resources • Computing elements that will not be provided if the required manpower and resources are not available • Measures of progress in producing the necessary software • Management tools to track progress • Verification of the quality of the LCG software
Foreword • The lack of personnel in LHC computing (experiments and common HW/SW infrastructure) was emphasized by the LHC Computing Review (2001) and judged “extremely worrying” • “CERN and the Collaborations together must do all that they can to provide the HR that are needed” for Core Software development • The shortage has been alleviated for the LCG project by an influx of computing professionals funded by member countries • No such mechanism exists yet for the experiments, where the personnel shortage remains a problem • ALICE has re-profiled its planning • The data shown here represent a bare minimum below which readiness for data processing cannot be guaranteed.
Menu: Planning & Resources • ALICE Offline organization & management • Strategy for the Offline project, DC & milestones • Personnel resources: available and requested • Answers to the questions & conclusions
Organization • Offline project mandate: • Prepare the software and computing infrastructure for the experiment’s data processing (with the DAQ and HLT projects); • Provide and maintain a complete infrastructure for simulation, reconstruction and analysis already during the construction phase; • Offline personnel for software development: • Core Offline project: a minority, full time, located at CERN; • Detector projects: most of the personnel, part time (also preparing the apparatus), located in collaboration institutes; • LCG provides the common hardware and software infrastructure for LHC computing. Strict coordination is required to make the best use of the available personnel.
Organization: management structure (chart). Elements shown: LCG (SC2, GDB, POB); DAQ; HLT; US Grid Coordination; EU Grid Coordination; Offline Board; Int. Comp. Board; Software projects; Detector projects; Regional Tiers; Core Offline (Project Leader & Deputy; Planning Coordination; Resources Coordination; Production Environment Coordination; Framework & Infrastructure Coordination; Simulation Coordination; Reconstruction & Physics Coordination).
Core Offline Work Packages • Framework and infrastructure coordination • Simulation coordination • Reconstruction and physics coordination • Production environment coordination
Organization: Management structure • Single line of development and complete transition to C++ in 1998 • Light-weight, single structure • Efficient use of available personnel • High adaptability to rapidly changing technology • Merge framework developers (service providers) and physics-algorithm developers (consumers) • Maximize communication • Economy of personnel (polymorphism of software experts) • Rapid feedback to users’ requirements
Planning Strategy • Dynamic management of the work schedule • Develop a long-term software infrastructure • Maintain the infrastructure in a working state during detector construction • Constraints • Dependence on the planning of external projects (LCG, EDG, EGEE) • Most developers belong to detector projects • Take advantage of the latest developments in fast-evolving technology • No personnel available for in-depth planning activity • The majority of the personnel in the Core Offline project is temporary, with unpredictable skills. Light-weight and opportunistic strategy, with flexible data challenges as high-level milestones
Core team @ CERN • A choice, not a necessity • Need for a strong, centralized team of experts • To facilitate coordination with all detector projects and all regional centers • CERN, more than other ALICE groups, has the critical mass of people with the right skills • Benefit from co-location with ALICE management • And with LCG management • Benefit from the attraction CERN exerts on young people with the right profile
Development strategy • Minimize the effective amount of development • Choose mature and well-tested products • ROOT: common HEP solution for data persistency at the file level, interfaces to various libraries, visualization, graphical user interface, virtual Monte-Carlo, geometrical modeler • AliEn: the ALICE distributed computing environment, built entirely from Open Source components based on Open Standards; 2 FTE for development, 0.5 FTE for operation, in production since 2002 • Reduce staff and rely on temporary personnel • However, there is a minimum threshold for staff • Delegate well-identified, modular packages to teams outside the Core group • Detector database • EDG/EGEE test bed
Data Challenges • Stress-test the ALICE data model, the DAQ hardware and the software infrastructure with prototypes of increasing complexity until the 2007 objectives are reached. • Computing DC: record heavy-ion (HI) data at 1.2 GB/s and export it outside CERN for quasi-online processing • Physics DC: provide the infrastructure for organized Monte-Carlo production and worldwide random data analysis
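For scale, a sustained 1.2 GB/s corresponds to roughly a hundred terabytes per day. The short sketch below does the conversion; it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB) and, by default, continuous recording. The function name and the duty-cycle parameter are illustrative assumptions, not part of any ALICE tool.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_tb(rate_gb_per_s: float, duty_cycle: float = 1.0) -> float:
    """Volume (decimal TB) recorded per day at a sustained rate.

    duty_cycle is an assumed fraction of the day spent recording (1.0 = continuous).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    gb_per_day = rate_gb_per_s * SECONDS_PER_DAY * duty_cycle
    return gb_per_day / 1_000  # 1 TB = 1000 GB in decimal units


if __name__ == "__main__":
    print(f"{daily_volume_tb(1.2):.2f} TB/day")       # 103.68 TB/day
    print(f"{daily_volume_tb(1.2, 0.5):.2f} TB/day")  # 51.84 TB/day
```

Real data taking is not continuous, so the duty cycle is left as an explicit, assumed parameter rather than folded into the rate.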
Computing Data Challenge • ALICE & IT: • Assess the mass-storage (MS) requirements and evaluate available products (1998); • Evaluate the functions of the DAQ, Offline and HLT projects; • Large-scale, high-throughput distributed DC (4) to: • Prototype the DAQ, Offline and HLT computing systems • Verify their integration • Assess technologies and computing models • Test hardware and software components in a realistic environment • Achieve an early integration of the overall computing infrastructure
Milestones
Physics Data Challenge • Objectives: • Prototype and test the scalability of the components needed to simulate, reconstruct, and analyze data on distributed computing resources • Three interlinked components: • ROOT • AliRoot • AliEn
Milestones (* fraction of events simulated in one year of standard data taking)
PDC-III resources estimate • Simulation • 10^5 Pb-Pb + 10^7 p-p events • Distributed production, (partial) data replication at CERN • Reconstruction and analysis • Data source is CERN: 5×10^6 Pb-Pb + 10^7 p-p events • Reconstruction at CERN and outside, depending on resource availability • Resources (CPU and storage) • 2004 Q1: 1354 KSI2K and 165 TB • 2004 Q2: 1400 KSI2K and 301 TB • Bandwidth • Simulation in 2004 Q1 • ~90 TB will be shipped to CERN in about 2 months; ~10 days using 10% of the CERN bandwidth.
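The bandwidth figures above are easier to compare as average transfer rates. The sketch below converts a data volume and a transfer window into the sustained rate required, assuming decimal units (1 TB = 10^6 MB), a transfer spread evenly over the window, and 60 days for "about 2 months". It evaluates both the ~2-month and the ~10-day windows quoted on the slide without assuming how the two are related; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(volume_tb: float, days: float) -> float:
    """Average sustained rate (decimal MB/s) needed to move volume_tb in `days`."""
    if days <= 0:
        raise ValueError("days must be positive")
    return volume_tb * 1e6 / (days * SECONDS_PER_DAY)  # 1 TB = 1e6 MB


if __name__ == "__main__":
    # Assumed windows: ~2 months taken as 60 days, and the quoted ~10 days.
    for days in (60, 10):
        rate = average_rate_mb_per_s(90, days)
        print(f"90 TB in {days} days: {rate:.1f} MB/s ({rate * 8:.0f} Mbit/s)")
```

Under these assumptions, 90 TB over about 60 days needs roughly 17.4 MB/s (about 139 Mbit/s) sustained, while the same volume in 10 days needs roughly 104.2 MB/s (about 833 Mbit/s).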
PDC-III resources profile
PDC-III resources (USA quota to be confirmed) • Details in the “ALICE Data Challenges” paper, taking into account • Results of previous PDCs • Estimate of the simulations in a standard year (2009) • Storage: 200 TB must be kept beyond the end of the PDC! • The numbers indicating the LCG resources for ALICE assume simultaneous use of the resources by all the experiments! • Dynamic resource allocation would easily solve the deficit
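The last point can be illustrated with a toy model. If each experiment receives a fixed, equal share of a common pool, an experiment whose peak falls in a given quarter runs short even while other shares sit idle; if idle capacity can be reassigned, only the combined demand in each period matters. The sketch below compares the two policies on hypothetical demand figures for four unnamed experiments (none of the numbers come from LCG or ALICE planning) and assumes work within a period can be freely redistributed.

```python
from typing import Dict, List

Demand = Dict[str, List[float]]  # experiment -> demand per period (arbitrary units)


def _periods(demand: Demand) -> int:
    """Number of periods; every experiment must cover the same periods."""
    lengths = {len(series) for series in demand.values()}
    if len(lengths) != 1:
        raise ValueError("all experiments need the same number of periods")
    return lengths.pop()


def static_deficit(demand: Demand, capacity: float) -> float:
    """Unmet demand when the pool is split into fixed, equal shares."""
    _periods(demand)
    share = capacity / len(demand)
    return sum(max(0.0, d - share) for series in demand.values() for d in series)


def dynamic_deficit(demand: Demand, capacity: float) -> float:
    """Unmet demand when idle capacity in a period can go to any experiment."""
    return sum(
        max(0.0, sum(series[t] for series in demand.values()) - capacity)
        for t in range(_periods(demand))
    )


# Hypothetical: four experiments whose peaks fall in different quarters.
demand = {
    "exp_A": [250.0, 50.0, 50.0, 50.0],
    "exp_B": [50.0, 250.0, 50.0, 50.0],
    "exp_C": [50.0, 50.0, 250.0, 50.0],
    "exp_D": [50.0, 50.0, 50.0, 250.0],
}

if __name__ == "__main__":
    print("static :", static_deficit(demand, capacity=400.0))   # 600.0
    print("dynamic:", dynamic_deficit(demand, capacity=400.0))  # 0.0
```

The total demand never exceeds the pool here, so the entire static deficit comes from the fixed partitioning, not from a lack of capacity.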
Tracking progress • Milestones set by the needs of preparing the Physics Performance Report • Full and fast simulation • Detector reconstruction • Global reconstruction • Progress monitored by the Physics DC • Central coordination at CERN (architect, librarian, multi-platform compatibility) • The Offline Board takes decisions on framework evolution and reviews progress • Developers implement during Offline weeks • Code reviewed by experts
Verification of LCG software quality Grid technology area
Verification of LCG software quality Grid deployment area
Verification of LCG software quality Fabric area
ALICE Offline Planning Today
Personnel Profile (task oriented) • 4 permanent staff persons • The profile is built up under the assumption that temporary personnel are NOT replaced* • Evolution reported since 1998 * Unrealistic scenario, used to emphasize the fragility of the structure
Personnel Profile (task oriented) - 1/5
Personnel Profile (task oriented) - 2/5
Personnel Profile (task oriented) - 3/5
Personnel Profile (task oriented) - 4/5: Summary for the Core Offline team
Personnel Profile (task oriented) - 5/5 • Long build-up time • Must sustain the plateau after 2003
Personnel Profile (post oriented) • 4 permanent CERN staff • Temporary CERN personnel (no replacement assumed*) • Staff LD (limited duration) • Technical and Physics students • CERN Fellows • Temporary CERN Project Associates (direct contribution from collaboration institutes + ALICE CERN exploitation budget; no replacement assumed*) * Unrealistic scenario, used to emphasize the fragility of the structure
Personnel Profile (post oriented) - 1/5 • Mostly temporary personnel • Substantial contribution from collaboration institutes • ROOT effect in 1999, AliEn effect in 2003
Personnel Profile (post oriented) - 2/5 • Only 25% permanent personnel • More than 60% are short/medium term personnel
Out-sourced projects - 1/3 • Detector DB by the Physics Department and the Computer Science Department at Warsaw University: a single DB (economy of personnel) common to all detectors in the experiment
Out-sourced projects - 2/3 • EDG testbed validation and participation in various GRID projects by ALICE/Italy, ALICE/US, and the EDG/DataTAG project; to be continued with EGEE
Out-sourced projects - 3/3 • AliEn: basis of the ALICE distributed computing infrastructure; coordination and main development by the Core Offline group, but several specific sub-tasks delegated to individuals at remote sites
Resources summary • Distribution of personnel for common offline activities • About 40% of the work is distributed outside CERN
HLT Software • Only personnel working on algorithms and simulation in collaboration with the Offline project • Part of the missing personnel should come from PhD students
LCG projects in the application area • ALICE has already made most of its choices on critical issues (persistency, data DB, tracking, geometry descriptor, distributed computing, etc.) • It does not need to rely on common LCG applications • However, ALICE contributes to common developments: • To come: AliEn coupled with PROOF as a generic architecture for LCG interactive analysis
Other resources • EU project: one person working full time on EDG for ALICE • Industry: • Code checker (partner not specified) • Ericsson: AliEn (contribution not specified) • NASA: one person full time on the Virtual Monte-Carlo (to be confirmed)
Offline in detector projects - 1/3 (diagram: Core CERN team, Detector Groups) • AliRoot: an object-oriented framework that directly uses ROOT and provides: • Many event generators • Tracking using the Virtual Monte-Carlo • IO infrastructure • Steering functionalities • Global reconstruction • Tracking and reconstruction for each detector (13) • Analysis
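“Tracking using the Virtual Monte-Carlo” refers to an abstraction layer: user code (geometry, hits, steering) is written once against an abstract transport interface, and the concrete engine is selected at run time. In ROOT this role is played by the TVirtualMC class, with implementations for engines such as GEANT3 and Geant4. The Python sketch below only illustrates the design pattern; the class names, the straight-track model and the step sizes are hypothetical and do not reproduce the ROOT API.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class TransportEngine(ABC):
    """Hypothetical engine-independent interface (not the ROOT TVirtualMC API)."""

    @abstractmethod
    def steps(self, track_length_cm: float) -> List[float]:
        """Split a straight track into transport steps (cm)."""


def _split(length: float, max_step: float) -> List[float]:
    """Cut `length` into steps of at most `max_step`."""
    out: List[float] = []
    remaining = length
    while remaining > max_step:
        out.append(max_step)
        remaining -= max_step
    if remaining > 0:
        out.append(remaining)
    return out


class CoarseEngine(TransportEngine):
    """Toy engine taking steps of at most 10 cm."""

    def steps(self, track_length_cm: float) -> List[float]:
        return _split(track_length_cm, 10.0)


class FineEngine(TransportEngine):
    """Toy engine taking steps of at most 1 cm."""

    def steps(self, track_length_cm: float) -> List[float]:
        return _split(track_length_cm, 1.0)


def simulate(engine: TransportEngine, track_length_cm: float) -> Tuple[int, float]:
    """User code written once against the interface: (number of steps, total length)."""
    s = engine.steps(track_length_cm)
    return len(s), sum(s)


if __name__ == "__main__":
    for engine in (CoarseEngine(), FineEngine()):
        print(type(engine).__name__, simulate(engine, 25.0))
```

Swapping engines changes how the track is stepped (3 steps versus 25 here) without touching `simulate`, which is the property that lets detector code stay independent of the transport package.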
Offline in detector projects - 2/3 • No full-time dedicated developers • Schedule defined by global milestones (DC) • Planning is task oriented rather than personnel oriented
Offline in detector projects - 3/3: Summary
Total: 39.7 | 37.3 | 35.8 | 35.8
Needed: 8.6 | 13.3 | 14.4 | 14.4
Personnel resources in Offline project • About 16% of the personnel is at CERN and the remainder in collaboration institutes; there is no experiment-dedicated personnel at regional centers.
Personnel resources in Offline project: CERN (16%), outside institutes (84%)
How to mitigate the lack of personnel • The ALICE Offline project is committed to providing the collaboration with adequate software to take and analyze data starting in 2007. • The project has already adapted its strategy to the lack of personnel and aims at the bare minimum that still enables it to fulfill its tasks. • The Core team cannot afford to lose more personnel without endangering its goals. • The severe lack of personnel in the detector projects will translate into a lack of readiness, both in the accuracy of the algorithms and in the availability of whole categories of algorithms. • Such a deplorable situation will have a negative impact on the quality of the physics results.
ALICE priorities - 1/4 • Core Offline group at CERN: • Less than 1/4 of the personnel in the Core Offline group at CERN is permanent • More than 50% are temporary personnel • Dependence on the availability of short-term CERN positions • Uncertainty about renewals • Loss of knowledge; difficulty of knowledge transfer • Difficulty covering key positions with people with the appropriate profile • Competition within ALICE in a fixed-quota situation
ALICE priorities - 2/4 • Core Offline group at CERN: • Have at least 1/3 long-term personnel and limit the use of fellows and students to 1/2, without changing the target number of FTEs • Ensure coverage of key areas by converting two area coordinators (Production Environment, Framework & Infrastructure), now on temporary positions, into CERN permanent staff • Alleviate the “volatility” of the Core Offline team with at least two long-term (6 years, LD-like) positions at CERN to replace short-term ones (detaching LCG personnel to ALICE would be a natural solution) • (Which profile/task?)
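The two composition targets above (at least 1/3 long-term personnel, at most 1/2 fellows and students, total FTEs unchanged) can be checked mechanically. The sketch below does so for a hypothetical 16-FTE team before and after two fellow posts become long-term positions; the team size, the category labels and the split are illustrative assumptions, not ALICE figures.

```python
from typing import Dict, Set

# Hypothetical category labels; "long_term" stands for 6-year, LD-like positions.
LONG_TERM: Set[str] = {"permanent", "long_term"}
FELLOWS_AND_STUDENTS: Set[str] = {"fellow", "student"}


def share(team: Dict[str, float], categories: Set[str]) -> float:
    """Fraction of the team's FTEs that fall into the given categories."""
    total = sum(team.values())
    if total <= 0:
        raise ValueError("team must have a positive FTE total")
    return sum(fte for cat, fte in team.items() if cat in categories) / total


def meets_targets(team: Dict[str, float]) -> bool:
    """True if >= 1/3 of FTEs are long-term and <= 1/2 are fellows or students."""
    return (share(team, LONG_TERM) >= 1 / 3
            and share(team, FELLOWS_AND_STUDENTS) <= 1 / 2)


# Hypothetical 16-FTE team before and after converting two fellow posts.
before = {"permanent": 4, "fellow": 6, "student": 4, "project_associate": 2}
after = {"permanent": 4, "long_term": 2, "fellow": 4, "student": 4, "project_associate": 2}

if __name__ == "__main__":
    for label, team in (("before", before), ("after", after)):
        print(label, sum(team.values()), meets_targets(team))
```

In this toy case the "before" team has 25% long-term and 62.5% fellows and students, so it fails both targets; the "after" team, with the same 16 FTEs, reaches 37.5% long-term and exactly 50% fellows and students.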
ALICE priorities - 3/4 • Core Offline group at CERN:
ALICE priorities - 4/4 • Detector Offline at collaboration institutes: • About 10 FTEs missing in the subdetector projects for software development • This is the responsibility of the institutes in charge of the subdetector projects • We are working hard to find these people • Additional resources from funding agencies will have to be discussed case by case