Preparing to Change the Baseline for CMS Persistency
Vincenzo Innocente
Workshop CMS-Italia Computing & Software
Roma, Nov 23 2001
VI/ Rome Nov 23
Baselining Architecture/Framework/Toolkits
The current schedule is to baseline CMS "Offline" Software in time for the Physics TDR. The major activity in the next 12-18 months will be to define and prototype the initial production software for LHC operation:
• Review the architecture
• Choose products
• Prototype and implement middleware
• Implement framework and toolkits
A primary goal is to ensure that the architecture will support, and take advantage of, the evolution of IT technology at negligible cost to CMS physics software. Some components are harder to change:
• Programming language
• Data Management layer
The Object Store plays a central role in the CMS computing model:
• Persistency cannot be regarded as just another basic computing service
• We must ensure that we can access the data for the whole lifetime of the experiment (and even longer)
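One way to keep the Data Management layer changeable, as the slide asks for, is to let physics code talk only to an abstract store interface so the concrete backend (Objectivity, ROOT I/O, Oracle, ...) can be swapped later. A minimal sketch, with hypothetical names not taken from CMS software:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch: physics modules see only this abstraction,
// never the concrete persistency product underneath.
struct Blob { std::vector<char> bytes; };   // opaque serialized object

class ObjectStore {                          // abstract data-management layer
public:
  virtual ~ObjectStore() = default;
  virtual std::string put(const Blob& b) = 0;       // returns an object id
  virtual Blob get(const std::string& oid) const = 0;
};

// Trivial backend standing in for a real product, for illustration only.
class InMemoryStore : public ObjectStore {
  std::vector<Blob> objects_;
public:
  std::string put(const Blob& b) override {
    objects_.push_back(b);
    return std::to_string(objects_.size() - 1);
  }
  Blob get(const std::string& oid) const override {
    return objects_.at(std::stoul(oid));
  }
};
```

Swapping the backend then means providing another `ObjectStore` implementation, with no change to the algorithm code that calls `put`/`get`.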
Coherent Analysis Environment
[Diagram: visualization tools, analysis tools, reconstruction and simulation connected through network services, batch services and persistency services to a distributed data store of files; a data browser, analysis-job wizards and software-development tools sit on top.]
CMS Data Analysis Model
[Diagram: the Event Filter and Object Formatter store reconstructed objects and calibrations in a Persistent Object Store Manager (Database Management System), which serves "request part of event" queries from quasi-online reconstruction (with environmental data from Detector Control), online monitoring, data quality, calibrations, group analysis, simulation, and on-demand user analysis, leading ultimately to the physics paper.]
[Diagram: software layers — physics modules (reconstruction algorithms, event filter, physics analysis, data monitoring) sit on specific frameworks; a generic Analysis & Reconstruction application framework manages event, calibration and configuration objects; a utility toolkit of adapters and extensions wraps the ODBMS, the C++ standard library and extension toolkit, Geant3/4, CLHEP, and a PAW replacement.]
HEP Data
• Event-Collection Meta-Data (luminosity, selection criteria, …)
• Environmental data: detector and accelerator status, calibrations, alignments
• Event Data, User Data
[Diagram: event collections with collection meta-data; events navigating to tracks, electrons, user tags (N-tuples), datasets with dataset meta-data, tracker alignment and ECAL calibration.]
Navigation is essential for an effective physics analysis.
This complexity requires coherent access mechanisms.
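The navigation requirement on this slide can be illustrated with a toy data model: a collection carries its own meta-data, and analysis code walks from collection to event to named parts without knowing the physical layout. All names here are hypothetical, not CMS classes:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of the slide's data model: events hold named
// parts ("Tracks", "Electrons", ...), and the collection carries
// meta-data such as luminosity or selection criteria.
struct Event {
  long id;
  std::map<std::string, std::vector<double>> parts;
};

struct EventCollection {
  std::map<std::string, std::string> metaData;  // e.g. selection criteria
  std::vector<Event> events;

  // Navigation: gather one kind of part across the whole collection.
  std::vector<double> gather(const std::string& part) const {
    std::vector<double> out;
    for (const auto& e : events) {
      auto it = e.parts.find(part);
      if (it != e.parts.end())
        out.insert(out.end(), it->second.begin(), it->second.end());
    }
    return out;
  }
};
```

The point of the sketch is that "coherent access mechanisms" means exactly this kind of uniform traversal, independent of where each part is physically stored.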
[Diagram: layered data access — the application's transient representation maps through a (distributed) DBMS client to the persistent data representation, crosses the network to the DBMS server's internal representation, and reaches database storage (server + files) and tertiary tape storage, managed by a GRID-enabled distributed, hierarchical file storage system.]
Objectivity
• Currently about 30 TB in Objectivity DBs
• Experience with writing into the DB with up to 300 CPUs in parallel
• Little experience to date with large numbers of parallel readers
• We are confident that we could make an Objectivity-based solution work
Commercial Considerations
• Object databases have not taken off as forecast
• Objectivity is the only major vendor of an ODBMS
• CMS envisages the possibility of continuing to use the product for maybe 1 year if the company should disappear (such that all support disappeared)
Baseline Software
• Milestone at the end of 2002 to go into the Physics and CCS TDR process
• Make changes to the baseline before the PTDR, not during or after (if we can avoid it)
Oracle Assessment
• The latest Oracle version (9i) implements the key features of an object-relational database
  • Ability to store instances of user-defined classes as objects
  • C++ and Java interfaces as well as SQL
  • Resilient server architectures; interesting new developments on cluster architectures
  • Interesting development plans for total storage-system management
• But currently heavy size (and time?) overheads to store our sort of data
  • Laborious, multi-step process for specifying our complex objects to the DB
• CMS has submitted (to IT/DB) a list of approximately 50 areas of concern, with the intention of rapidly determining whether there are any "show-stoppers" before investing major effort
• One full-time CMS CERN Fellow is working with the IT/DB Oracle team to build CMS expertise on the possible ways to store our sort of data
  • We can now store and retrieve ORCA SimHits in Oracle
• IT/DB plans to report back in January on the "show-stoppers"
• Probably not a solution we can adopt in the next year or more
  • Interesting R&D, but probably not somewhere CMS can afford to spend its limited manpower
Data Access Problems
• All productions (at CERN) and analyses (everywhere) have encountered debilitating data-access problems
  • Disk failures: the lack of reliable hardware puts production and analysis in conflict
  • Castor+RFIO immaturity
  • Network limitations
  • Late delivery of hardware
• At CERN, many problems, some of which we associate with an inadequate manpower situation
  • Wrong LSF parameters, daemons killing servers, signals trapped by LSF wrappers, etc.; complex systems with non-understood interconnections
• The net effect is job failure rates in the >15% range
• These problems are features of the large amount of data we have now, the small amount of disk, and long CPU times
  • The same problems arise with or without Objectivity
  • But the tools Objectivity gives us to relocate files, optimize their serving, and compensate for hardware failure have been very useful, and give guidance on how our systems should be set up in the future
Data access has had a serious impact at CERN, FNAL and DESY this year. The cheapest service is not working anywhere for our sort of challenges.
CMS-ROOTIO Workshop (Oct 10/11)
• The ROOT team has built a powerful system for object streaming and object description in the I/O files. We want both types of feature.
• Users have built, and ROOT is now implementing, inter-file references (OIDs!)
  • See http://www.AmbySoft.com/mappingObjects.pdf for example; strong emphasis on having control of your "OIDs"
  • Do not allow a proprietary layer to do this!
  • Much easier to change persistency later
• We need a true DB layer, at least for the OID mapping, collections, etc.
• We also need the functionality currently supplied by the Objectivity AMS/RRP etc. to delegate "files" and "file responsibilities" to a very low level of the system
  • GRID should concentrate the mind on this issue, as the whole concept of physical and logical files requires addressing
• With these three layers one could build a sustainable persistency solution that would satisfy most reasonable use-cases for LHC
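The "control of your OIDs" point above can be made concrete with a thin, experiment-owned mapping layer: the OID format belongs to CMS, and a small DB maps each OID to a logical-file location that the grid layer later resolves to a physical replica. A minimal sketch under those assumptions (all names hypothetical):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical sketch: the experiment, not the storage product,
// defines the object id, so swapping the streaming layer later only
// changes how a located entry is read back.
struct OID {
  std::uint32_t collection;   // which event collection
  std::uint32_t entry;        // which entry within it
  bool operator<(const OID& o) const {
    return collection < o.collection ||
           (collection == o.collection && entry < o.entry);
  }
};

struct Location {
  std::string logicalFile;    // resolved to a physical file elsewhere
  std::uint64_t offset;       // byte offset or entry number in that file
};

// The "true DB layer" for OID mapping, reduced to its essentials.
class OidMap {
  std::map<OID, Location> map_;
public:
  void bind(const OID& oid, const Location& loc) { map_[oid] = loc; }
  const Location& resolve(const OID& oid) const { return map_.at(oid); }
};
```

In a real system this map would live in a database and the logical-to-physical file resolution would be the grid/file-management layer's job, matching the three-layer split described on the slide.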
ROOTIO/CMS Mismatches?
CMS has some important technical issues to resolve with the ROOT team:
• Degree of intrusiveness
  • Resolvable
• Technical C++ issues
  • Global state, threads, exception handling, …
• Use of external software
  • Rather than subsumption of it (zlib is an example)
• Definitions of modularity have to be agreed
  • Long-term scalability and maintenance are at issue here
• Development model
  • Management, oversight, SW packaging, priorities, "ownership", and so on
• Legacy support
  • May be an issue by 2005-2010
These issues should be tractable.
Key Issues
Object Model vs Data Model
• Schema, dictionary (how it is generated; where and how it is stored)
• Streamers, converters (generated or user-provided)
Object Identification (OID, URL, …)
• How will an object be identified in the CMS "universal" storage system?
  • From a transient application
  • By internal navigation in the storage itself
• How will replication and re-clustering (reorganization of the physical data storage) be supported?
Storage System Administration
• OS vs "DBA"
Management and Future Developments
• Product ownership
• Architecture, modularity and packaging
• Collaboration with other experiments
• (Undesirable) backward compatibility
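The schema/dictionary and streamer/converter questions above can be illustrated with a toy dictionary: a lookup from class name to a writer (streamer) and a reader (converter), whichever way these get generated. This is a hypothetical sketch, not the ROOT or CMS dictionary mechanism:

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Buffer = std::vector<char>;

// Hypothetical dictionary: per-class streamers and converters,
// registered by name, whether generated or user-provided.
struct Dictionary {
  std::map<std::string, std::function<Buffer(const void*)>> streamers;
  std::map<std::string, std::function<void(const Buffer&, void*)>> converters;
};

struct Hit { double x, y, z; };   // example persistent-capable class

// A (generated or hand-written) streamer/converter pair for Hit.
// Raw byte copy is enough here because Hit is plain data.
Buffer streamHit(const void* p) {
  const char* c = static_cast<const char*>(p);
  return Buffer(c, c + sizeof(Hit));
}
void convertHit(const Buffer& b, void* p) {
  std::copy(b.begin(), b.end(), static_cast<char*>(p));
}
```

The open questions on the slide — who generates these functions, and where the schema describing `Hit` is stored so it can be read back years later — are exactly what a real dictionary layer has to answer.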
Plan Approved in Joint Technical Board
• Maintain all required support for the DAQ TDR
  • But wherever possible avoid new developments
• Now to Xmas:
  • Estimate the program of work to reach a new baseline for 2003
    • ROOTIO-based object storage for event and possibly other objects (e.g. calibration objects?)
    • Oracle or a similar DB and OID-mapping layer
    • Low-level file-handling tools for data management
  • Try to establish common projects with the LHC community
    • All of these will be common requirements for the LHC experiments
• New Year: evaluate progress
  • Estimate the impact on post-DAQ-TDR milestones
• Establishing a working group now
  • Vital to ensure this is a team effort, both within CMS and in LHC
  • Maximize intellectual contributions; avoid a jamboree
  • Concentrated timescale; aim for participation from CMS (C,P&T) + ROOT + IT/DB
  • Get the main players involved together from the start
A HEPIO Standard?
• The physical and logical format of the object store should (?) be an LHC (or even HEP?) specification
• The ROOTIO format may be a concrete area we can start on now
  • Document and specify the ROOTIO physical and logical format
  • Identify missing concepts and iterate with the ROOT team
    • For example the OID issues, long and short references
  • The intention is to keep the ROOTIO solutions, possibly with some extensions
• Establish this format and an agreed mechanism for changing it
  • We must know that we can write/read/interpret this data and schema "forever"
• With this standard agreed:
  • We establish an important component of ROOT as a guaranteed layer
  • We start to collaborate in a concrete technical way
  • ROOT itself of course interfaces to this format now, so is in a strong position
  • But one can imagine building, now or in the future, other products: ROOT++, and/or something quite different
• We need to identify, now, someone in CMS willing to do much of this work in collaboration with the ROOT team
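What "document and specify the physical format" could pin down can be shown with a toy record header: a few fixed fields at the start of every stored record, so any future reader — ROOT-based or not — can interpret the bytes. This layout is invented for illustration and is not the ROOTIO format:

```cpp
#include <cstdint>

// Hypothetical on-disk record header for a specified object format.
// Changing any field's meaning would require the agreed change
// mechanism the slide calls for.
struct RecordHeader {
  std::uint32_t magic;     // identifies the container format
  std::uint16_t version;   // format version, changed only by agreement
  std::uint32_t classId;   // index into the stored schema/dictionary
  std::uint64_t length;    // payload length in bytes
};
```

The value of such a specification is that the schema (`classId` → class description) travels with the data, which is what makes "read this forever" a realistic promise.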
Why Now?
• The LHC community is ready to find common solutions
  • Prototyping is over; we need real solutions within finite resources
• We need to use external intellectual effort effectively
  • Alleviate our manpower problems
• There are many technical issues to resolve on which we have specific requirements
  • Act now and get most of what we want, or wait and get what we get
• We have the most advanced experience in LHC on real production issues
  • Put that at the disposal of the other groups
• We have to go into the Physics TDR with a baseline that we expect to last
  • And that will probably take a year's work for a small team
Current Activities
Root Prototype
• Prototype using as much as possible from ROOT without modifying the current architecture
  • SimEvent (tracks, vertices, hits)
  • Crossing building (including 10^34 pileup)
  • Transient digitization
Technical (informal) forum (LHC experiments, ROOT, IT/DB)
• Compare current architectures (and implementations)
• Discover possible common components
• Discuss "missing" functionalities in ROOT
• Discuss alternatives
SC2 TAG
• CMS will actively promote a common project on a HEP solution for a Data(Base) Management System