This physics support service covers database deployment, persistency frameworks, and coordination efforts for ATLAS, CMS, LHCb, and other experiments. Key topics include distributed deployment, service-level management, and manpower allocation. Collaboration with CERN research teams and integration with experiment databases ensure efficient operation of the physics analysis systems. The project also focuses on streamlining service provision, alerting, and monitoring. Oracle Streams optimization and additional replication options are evaluated according to experiment needs.
PSS – Physics Services Support: Goals 2007 (all services are part of LCG)
• Physics Database Service at CERN
  • ATLAS, CMS, LHCb, LCG (+ ALICE, COMPASS, HARP, …)
• Distributed Deployment of Databases (3D)
  • Coordinating the global effort
• Persistency Framework
  • POOL, CORAL, COOL
• ARDA/EIS
  • Dashboard
  • Analysis Systems (e.g. GANGA)
  • Experiment integration & support
Personnel funding (tightly coupled):
• CERN 'normal' – 10
• LCG (various) – 12
• EGEE (NA4, JRA1, SA1, overh.) – 9
• Openlab (Oracle) – 2
• Other (Doct, India) – 3
Physics Database Service
• Finalize offline database requirements for LHC start-up
• Conditions deployment ramp-up (ATLAS, CMS & LHCb)
• Service level, piquet service
  • Implement the outcome of the task force
• Streamline contact with the experiment DB responsibles
  • ATLAS DB project, CMS DB core team(?), LHCb-IT meeting
• Complete integration of 3D procedures
  • Automated alerting, GGUS integration, SAM tests (a minimal probe is sketched below)
• Continue common service meetings with DES
• RCF/PDB for the foreseeable future
  • xx FTE, (1 MCHF in FIO - please check budget)
  • Missing 1 FTE to match service growth and pending requests
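The automated alerting and SAM tests mentioned above amount to periodic availability probes against each database service. A minimal sketch of such a probe, assuming cx_Oracle is installed; the DSN, account and credentials are placeholders and this is not the actual SAM test code:

```python
# Minimal availability probe for a physics database service (illustrative only;
# the real SAM/alerting tests are more elaborate). Connection details are placeholders.
import sys
import cx_Oracle  # assumes the Oracle client libraries are installed

DSN = "db-node.cern.ch:1521/PHYSDB"   # hypothetical service descriptor
USER, PASSWORD = "probe_reader", "secret"

def probe():
    """Return True if a trivial query succeeds within a fresh Oracle session."""
    try:
        conn = cx_Oracle.connect(USER, PASSWORD, DSN)
        cur = conn.cursor()
        cur.execute("SELECT 1 FROM dual")   # cheapest possible round trip
        ok = cur.fetchone() is not None
        conn.close()
        return ok
    except cx_Oracle.DatabaseError as exc:
        print("probe failed:", exc, file=sys.stderr)
        return False

if __name__ == "__main__":
    sys.exit(0 if probe() else 1)   # a non-zero exit would trigger the alerting chain
```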
3D Project
• Continue to coordinate the LCG database setup and its evolution
• 3D service responsibilities (and the associated manpower) moving to the physics database service team
  • Streams monitoring and integration with Dashboard / SAM (see the monitoring sketch below)
• Ensure service provisioning at the 3D phase 2 sites
  • Delays at NDGF and PIC
• Experiment DB production planning with the sites
  • Resource reviews, DB workshops, Oracle licenses
• Oracle Streams optimization with Oracle development; document the resulting site configuration
• Frontier/Squid – continue evaluation & optimization
• Evaluate additional replication options (transportable tablespaces, logical standby) – only if required by the experiments
  • Role of the ATLAS TAG database
• Manpower: xx FTE
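Streams monitoring of the kind mentioned above typically boils down to polling the Oracle Streams data-dictionary views for capture/apply latency and publishing the numbers to the dashboard. A rough sketch, assuming cx_Oracle and a read-only monitoring account; the view and column names are recalled from the Oracle documentation and should be checked against the release in use, and this is not the production 3D monitoring code:

```python
# Illustrative Streams latency check: poll v$streams_capture on the source database
# and report how far capture lags. Connection details are placeholders.
import cx_Oracle

def capture_latency(dsn, user, password):
    conn = cx_Oracle.connect(user, password, dsn)
    cur = conn.cursor()
    cur.execute("""
        SELECT capture_name, state,
               (SYSDATE - capture_message_create_time) * 86400 AS latency_seconds
          FROM v$streams_capture
    """)
    rows = cur.fetchall()
    conn.close()
    return rows

for name, state, latency in capture_latency("src-db.cern.ch:1521/SRCDB",
                                             "mon_reader", "secret"):
    # Publishing into the Experiment Dashboard / SAM would happen here;
    # for the sketch we simply print the lag per capture process.
    print(name, state, "lag=%.0fs" % (latency or 0))
```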
Persistency Framework: POOL & CORAL
• POOL
  • Object-relational mapping completed
  • Baseline for CMS conditions, used by ATLAS conditions
  • Schema evolution (CMS use cases)
  • Maintenance mode
• CORAL
  • Common database layer now moving into distributed deployment
  • DB service lookup, authorization, monitoring, connection pooling, retry and failover to be deployed at large scale (see the connection sketch below)
  • Successful collaboration with RRCAT on the LFC integration and the Python interface
  • Frontier client integrated – needs further consolidation
    • Baseline for T0->T1->T2 in CMS, successfully used in CSA06
    • ATLAS evaluating it for T1->T2
  • Distributed deployment by the experiments will require adaptations/improvements of the lookup and failover policies
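As a rough illustration of the CORAL features listed above (logical service lookup plus retry/failover), a client opens a session through the ConnectionService against a logical alias rather than a physical connection string. A minimal sketch using the PyCoral bindings; the method and constant names are recalled from the CORAL documentation of that era and should be verified against the installed release, and "CondDB" is a placeholder alias:

```python
# Minimal PyCoral usage sketch (names recalled from CORAL docs; verify locally).
# "CondDB" stands for a logical alias resolved via dblookup.xml/authentication.xml.
import coral

svc = coral.ConnectionService()

# Retry/failover knobs live on the connection-service configuration object.
cfg = svc.configuration()
cfg.setConnectionRetrialPeriod(10)     # seconds between retries (illustrative values)
cfg.setConnectionRetrialTimeOut(60)    # give up after one minute

# Read-only session against the logical service; CORAL picks a working replica.
session = svc.connect("CondDB", coral.access_ReadOnly)
session.transaction().start(True)      # True = read-only transaction

schema = session.nominalSchema()
print("tables in schema:", schema.listTables())

session.transaction().commit()
del session                            # returns the connection to the pool
```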
Persistency Framework 2007
• COOL (ConditionsDB)
  • Baseline conditions implementation for ATLAS and LHCb (a minimal usage sketch follows below)
  • Replication tests with the experiments and the 3D sites are promising
  • Manpower shortage in COOL after two active contributors from ATLAS left
    • LCG AA has been asked by the internal review to rebalance until replacement manpower becomes available from ATLAS
    • This has caused a backlog of functional, performance and scalability improvements requested by ATLAS
• Persistency Framework totals
  • Distributed production milestones established for ATLAS (Feb '07) and LHCb (April '07)
  • No new developments – focus on stability for distributed deployment
  • RCF/APPS (2005 decision) for the foreseeable future
  • x FTE (+ 0.5 FTE ATLAS + 0.2 FTE LHCb + 1 FTE LCG)
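For context, reading conditions back from COOL means iterating over payload objects tagged with an interval of validity and a channel. A minimal read-only sketch with the PyCool bindings; API names are recalled from the COOL documentation of that period and should be checked locally, and the connection string and folder path are placeholders:

```python
# Illustrative COOL (conditions DB) read-back via PyCool; not taken from the slides.
from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
# Second argument True = open read-only; an SQLite file stands in for the Oracle service.
db = dbSvc.openDatabase("sqlite://;schema=conditions.db;dbname=CONDTEST", True)

folder = db.getFolder("/Detector/Temperature")      # hypothetical folder path
since, until = cool.ValidityKeyMin, cool.ValidityKeyMax

# Iterate over the stored conditions objects and print payload + interval of validity.
it = folder.browseObjects(since, until, cool.ChannelSelection.all())
while it.goToNext():
    obj = it.currentRef()
    # payload() returns the stored record; "value" is a hypothetical field name
    print(obj.channelId(), obj.since(), obj.until(), obj.payload()["value"])
it.close()

db.closeDatabase()
```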
Open Issues (old and new)
• Experiment requirements are still in flux
  • The ramp-up from grid jobs is still to come
• New 3D service responsibilities on the DBA team
  • The Openlab-funded LD replacement of the 3D fellow moves with it
  • The project continues DB coordination between the experiments and the T1 sites
• A MySQL high-availability prototype has been implemented on standard Physics DB Services hardware
  • Is GD still interested in a resource-broker backend DB?
  • Estimated effort if run as a service: ~1 FTE
• Online databases are still an open issue
  • DBA courses and OCP training have been organised for the experiments
  • Provide setup docs and temporary h/w for experiment scaling tests
  • No service/manpower available outside the CC
  • Clarify the IT policy on database support (if in B513)
Issues – persistency framework
• Experiment contributions are down to the few-10% level
  • The PF has shrunk to the size planned for maintenance
  • Basically all PF developers are concentrated in PSS
  • All will face the FT review next year
  • Losing PF expertise for this central part of the LCG s/w stack will almost certainly cause some disruption during next year's ramp-up
• SEAL maintenance vs. migration
  • The LCG AA strategy for SEAL is still being defined
  • POOL/CORAL/COOL are based on the SEAL component model, plugin loader and message service
  • (Almost) no manpower is assigned to these
  • An agreement is being discussed
ARDA/EIS
• Two main activities
  • ARDA is a collection of projects
    • Includes supporting the experiments
  • EIS is a support activity
    • Participating in projects as well (LCG Task Forces)
• The line between the two is more and more blurred
  • A natural evolution (coming closer to LHC start-up, with analysis becoming part of the experiments' activities)
  • Collaboration between EIS and ARDA members
• Geographically, ARDA sits in bldg. 510 (next to the main building), EIS in bldg. 28
2006 achievements
• Substantial progress on the analysis systems of all 4 experiments (including GANGA for ATLAS and LHCb)
  • All the experiments progressed substantially during 2006 (as demonstrated in the September LHC review)
  • The weak point is still the ATLAS involvement in GANGA: it was positively commented on in the recent ATLAS Data Model review, but we do not see many users yet
• Continuation of the integration and support work with all 4 experiments
  • This activity is very successful and appreciated by the 4 experiments. The collaboration is organised differently with each experiment (it goes through the EGEE task forces in most cases) and the details of the operations vary widely, but in all cases the feedback from the experiments is very positive
• Dashboard
  • Originally started as an ARDA-CMS project, it is becoming our flagship project: ATLAS joined at the beginning of 2006, and recently LHCb and ALICE have asked to participate. In CMS and ATLAS it is part of the standard computing environment
2006 achievements (highlights)
• Several successful examples of common projects across the LHC experiments
  • Experiment Dashboard (4 experiments)
  • GANGA (2 experiments)
  • WMS activities in the CMS and ATLAS Task Forces (task forces in the summer)
• The EGEE User Forum was a success
• Collaboration on data management with other sciences/areas
  • AMGA as metadata catalogue for medical images and climatology (in HEP used by LHCb and GANGA); becoming part of gLite 3.1
  • GANGA/DIANE as a tool to drive productions on the grid
    • Successful ITU collaboration for the RRC conference
    • Other very good examples: UNOSAT, G4, BirdFlu, … all using technologies "recycled" from HEP
2007 goals (1)
• Continue the experiment integration and support
  • This is the major focus of the EIS team
  • In the year of start-up these activities will be critical, and appropriate effort has been allocated (including training effort for the new fellows joining at the beginning of 2007)
• Demonstrate Ganga
  • The activity has progressed well during the year; progress was good on LHCb but slower on ATLAS
  • One of the reasons for the slow ATLAS start is problems on the data management side (not many data sets actually available in the required format)
  • The potential of the tool is high, and we should use the first part of 2007 to come to a final decision on continued support, based on the experiments' feedback
  • Ganga already supports (for ATLAS and LHCb) simulation and analysis jobs over a variety of back ends (LCG, DIRAC, Panda-OSG, plus "local" facilities such as LSF) – a minimal job-submission sketch follows below
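To make the back-end abstraction concrete, a minimal Ganga job from the GPI of that era might look like the following. The sketch is illustrative rather than taken from the presentation: the executable and arguments are placeholders, and switching from the LCG back end to DIRAC or a local batch system is essentially a one-line change.

```python
# Minimal Ganga GPI sketch (to be run inside a Ganga session, where Job, Executable,
# LCG etc. are already in scope). The payload is a placeholder /bin/echo call.
j = Job()
j.name = 'hello-grid'
j.application = Executable(exe='/bin/echo', args=['Hello from the grid'])
j.backend = LCG()            # swap for Local(), LSF() or an experiment back end
j.submit()

# Later, inspect progress from the same session:
print(j.status)              # 'submitted', 'running', 'completed', ...
```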
2007 goals (2)
• Streamline Ganga/DIANE as a production tool
  • Very successful activity in supporting Geant4, HARP, ITU, BirdFlu and other NA4 activities
  • Useful for attracting new communities, ad-hoc productions, etc.
  • But it does not scale!
  • Streamline the system; first step: Geant4 (large-scale regression testing)
• Extend the Experiment Dashboard
  • LHCb and ALICE have just joined – a 4-experiment common project
  • Interest from other communities/groups (NIKHEF: D0 experiment, Biomed)
  • IT/GD is preparing working groups on grid monitoring
    • Job/FTS reliability studies based on dashboard technology
    • The extension of the Experiment Dashboard should pass through the above-mentioned working group; ED has selected the person to participate in it
  • What we have learnt so far is that correlating information from different sources is the key, therefore even more collaboration is welcome (see the toy correlation sketch below)
  • Existing examples:
    • Dashboard clients (Grid Reliability) provide a breakdown of sites, grid errors etc. for the operations teams to act on
    • A similar client does the same for FTS transfers
    • Collaboration with 3D and SAM to include some of their information in the dashboard
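The point about correlating information from different sources can be illustrated with a toy example: join job records coming from the workload system with transfer records coming from FTS on a common job identifier, so that a single view shows where failures actually occur. Everything below (record layout, field names) is invented for illustration and is not the real dashboard schema:

```python
# Toy correlation of monitoring records from two sources, keyed on job id.
# The record layouts are hypothetical; the real Experiment Dashboard collects
# this kind of information from the WMS, FTS, SAM, 3D, etc.
wms_records = [
    {"job_id": "j001", "site": "CERN", "grid_status": "Done"},
    {"job_id": "j002", "site": "FZK",  "grid_status": "Aborted"},
]
fts_records = [
    {"job_id": "j001", "transfer_status": "FINISHED"},
    {"job_id": "j002", "transfer_status": "FAILED", "reason": "SRM timeout"},
]

def correlate(wms, fts):
    """Merge the two views into one record per job."""
    by_id = {r["job_id"]: dict(r) for r in wms}
    for r in fts:
        by_id.setdefault(r["job_id"], {}).update(r)
    return by_id

for job_id, record in sorted(correlate(wms_records, fts_records).items()):
    print(job_id, record.get("site"), record.get("grid_status"),
          record.get("transfer_status"), record.get("reason", ""))
```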
Staff issues
• The ED effort is built up entirely from:
  • LCG staff (matching funds) and EGEE staff
  • Fellows (all approaching the end-of-contract limit), PJAS, students and visitors
  • Only one staff member is not on an LD contract
• Practically all LDs will come to an end in 2008 (2+2 year EGEE contracts)
  • In sync with very many people in the department
• Practical problems
  • Obvious…
• Bureaucratic problems
  • What happens to people entering the 5th year of an LD with EGEE-3, after having been assessed for a long-term contract during their 3rd year of LD?
  • What happens to people whose 3-year contract is simply not renewed because the "temporary" source of money is no longer available? Not having the "right" to be evaluated does not sound fair
  • What is the absolute limit for an LD contract (cases of people entering EGEE after a 3-year contract and potentially aiming at a 3+2+2+2 = 9-year contract)?
• Questions
  • Can we have some flexibility on the staff review?
  • Is there any leeway out there (in HR)?
Two-way dependencies
• Inside the group
  • DB services: essential ingredients for the dashboards
  • (XYZ) Common communication path to the experiments
• Inside IT
  • GD group
    • Operations, MW deployment testing
    • Data management
  • FIO
    • Service machines – purchase, OS & HW maintenance
    • Access to log information (RBs at CERN)
    • Backup (TSM)
  • DES
    • Development service
    • Oracle commercial contact
    • OEM service for CERN DB instances
• Outside IT
  • LHC experiments
  • NA4
  • SFT
  • Oracle
  • Training
SWOT analysis
Strengths:
• We have built collaborations with the LHC experiments based on trust
• Technically outstanding team
• Agile structure, tools and know-how to help any application start on, or improve its use of, the grid
• Exposed to (and often mastering) the latest technologies in use in the experiments
• Privileged collaborations with Taipei, MonALISA (Romania), universities (UniGe, EPFL, Innsbruck, …)
Weaknesses:
• Need to further improve flexibility (across the two teams, across projects, across experiments)
• Avoid being locked into projects which are not mainstream
  • Not always obvious… see threats
• Guarantee long-term support of developed products
Opportunities:
• Privileged point of view inside the experiments
• Privileged position in EGEE and other grid initiatives
Threats:
• All staff are on limited-duration contracts!!!
• Almost all staff are approaching the 4-year term (at the same time)
• Need to demonstrate a progressive refocusing on support issues
• Ever-changing experiment goals/strategies