Summary of the Persistency RTAG Report (heavily based on David Malon's slides) - and some personal remarks
Dirk Düllmann dirk.duellmann@cern.ch
CCS Meeting, April 18th 2002
SC2 mandate to the RTAG
• Write the product specification for the Persistency Framework for Physics Applications at LHC
• Construct a component breakdown for the management of all types of LHC data
• Identify the responsibilities of Experiment Frameworks, existing products (such as ROOT) and as-yet-to-be-developed products
• Develop requirements/use cases to specify (at least) the metadata/navigation component(s)
• Estimate resources (manpower) needed to prototype missing components
Guidance from the SC2
• The RTAG may decide to address all types of data, or may decide to postpone some topics for other RTAGs, once the components have been identified.
• The RTAG should develop a detailed description at least for the event data management.
• Issues of schema evolution, dictionary construction and storage, object and data models should be addressed.
RTAG Composition
• One member from each experiment, one from IT/DB, one from the ROOT team:
  • Fons Rademakers (ALICE)
  • David Malon (ATLAS)
  • Vincenzo Innocente (CMS)
  • Pere Mato (LHCb)
  • Dirk Düllmann (IT/DB)
  • Rene Brun (ROOT)
Quoting Vincenzo's report at CMS Week (6 March 02):
  "Collaborative, friendly atmosphere"
  "Real effort to define a common product"
This is already an accomplishment.
Response of RTAG to mandate and guidance (excerpted from report)
• Intent of this RTAG is to assume an optimistic posture regarding the potential for commonality among the LHC experiments in all areas related to data management
• Limited time available to the RTAG precludes treatment of all components of a data management architecture at equal depth
  • will propose areas in which further work, and perhaps additional RTAGs, will be needed
Response of RTAG to mandate and guidance (excerpted from report)
• Consonant with SC2 guidance, the RTAG has chosen to focus its initial discussions on the architecture of a persistence management service based upon a common streaming layer, and on the associated services needed to support it
• Even if we cannot accomplish everything we aspire to, we want to ensure that we have provided a solid foundation for a near-term common project
• While our aim is to define components and their interactions in terms of abstract interfaces that any implementation must respect, it is not our intention to produce a design that requires a clean-slate implementation
Response of RTAG to mandate and guidance (excerpted from report)
• For the streaming layer and related services, we plan to provide a foundation for an initial common project that can be based upon the capabilities of existing implementations, and upon ROOT's I/O capabilities in particular
• While new capabilities required of an initial implementation should not be daunting, we do not wish at this point to underestimate the amount of repackaging and refactoring work required to support common project requirements
Status
• Reasonable agreement on design criteria, e.g.:
  • Component oriented; communication through abstract interfaces; no back channels; components make no assumptions about the implementation technology of components with which they communicate
  • Persistence for C++ data models is the principal target, but our environments are already multilingual; should avoid constructions that make language migration and multi-language support difficult
  • Architecture should not preclude multiple persistence technologies
  • Experiments' transient data models should not need compile-time/link-time dependencies on persistence technology in order to use persistence services
Status II
• Reasonable agreement on design criteria, e.g.:
  • Transient object types may have several persistent representations; the type of a transient object restored from a persistent one may differ from the type of the object that was saved; a persistent object cannot assume it "knows" what type of transient object will be built from it
  • …more…
• Component discussions and requirement discussions have been uneven: extremely detailed and highly technical in some areas, with other areas neglected thus far for lack of time
• Primary focus has been on issues and components involved in defining a common persistence service
  • Cache manager, persistence manager, storage manager, streamer service, placement service, dictionary service(s), …
  • Object identification, navigation, …
The proposed initial project
• Charge to the initial project is to deliver the components of a common file-based streaming layer sufficient to support persistence for all four experiments' event models, with management of the resulting files hosted in a relational layer
• Elaboration of what this means appears in the report
• Note that the persistence service is intended to support all kinds of data; it is not specific to event data
Initial project components
• The specification in the report describes agreed-upon common project components, including:
  • persistence manager
  • placement service
  • streaming service
  • storage manager
  • externalizable technology-independent references
  • services to support event collections
  • connections to grid-provided replica management and LFN/PFN services
Initial project implementation technologies
• Streaming layer should be implemented using the ROOT framework's I/O services
• Components with relational implementations should make no deep assumptions about the underlying technology
  • Nothing intentionally proposed that precludes implementation using such open source products as MySQL
Persistence Manager
• Principal point of contact between experiment-specific frameworks and persistence services
• Handles requests to store the state of an object, returning a token (e.g., a persistent address)
• Handles requests to retrieve the state of an object corresponding to a token
• Like the Gaudi/Athena persistence service
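To make the store/retrieve contract concrete, here is a minimal C++ sketch. It is purely illustrative: the report does not fix any signatures, and the names IPersistencyMgr and Token are hypothetical.

// Hypothetical sketch only; not an interface defined by the RTAG report.
#include <string>
#include <typeinfo>

// Technology-neutral persistent address returned when an object is stored.
struct Token {
    std::string external;   // externalizable form, opaque to the framework
};

// Principal contact point between an experiment framework and persistence.
class IPersistencyMgr {
public:
    virtual ~IPersistencyMgr() = default;

    // Store the state of a transient object; the dictionary entry for
    // 'type' tells the streaming layer how to build the persistent form.
    virtual Token store(const void* object,
                        const std::type_info& type,
                        const std::string& placementHint) = 0;

    // Rebuild a transient object from a token; the returned object may be
    // of a different transient type than the one originally saved.
    virtual void* retrieve(const Token& token,
                           const std::type_info& expectedType) = 0;
};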
Dictionary services
• Manages descriptions of transient (persistence-capable) classes and their persistent representations
• Interface to obtain reflective information about classes
• Entries in the dictionary may be generated by disparate sources
  • Rootcint, ADL, LHCb XML, …
• With automatic converter/streamer generation, it is likely that some persistent representations will be derivable from the transient representation, but the possibility of multiple persistent representations suggests separate dictionaries
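A hedged sketch of what a reflective class-dictionary interface might look like; ClassDescription, FieldDescription and IDictionarySvc are illustrative names and are not taken from the report.

// Illustrative only; the actual reflection API (fed by Rootcint, ADL,
// LHCb XML, ...) was still to be defined at the time of the report.
#include <cstddef>
#include <string>
#include <vector>

struct FieldDescription {
    std::string name;        // data member name
    std::string typeName;    // transient C++ type of the member
    std::size_t offset;      // byte offset within the transient object
};

struct ClassDescription {
    std::string name;                        // fully qualified class name
    std::vector<std::string> bases;          // base classes
    std::vector<FieldDescription> fields;    // data members
};

class IDictionarySvc {
public:
    virtual ~IDictionarySvc() = default;
    // Transient description of a persistence-capable class,
    // or nullptr if the class is unknown to the dictionary.
    virtual const ClassDescription* transientDescription(
        const std::string& className) const = 0;
    // A class may have several persistent representations ("shapes").
    virtual std::vector<std::string> persistentShapes(
        const std::string& className) const = 0;
};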
Streaming or conversion service
• Consults transient and persistent representation dictionaries to produce a persistent (or "persistence-ready") representation of a transient object, or vice versa
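The following sketch shows one way such a converter could be shaped, assuming the dictionaries are addressed by class and shape name; IStreamerSvc and its methods are hypothetical.

// Illustrative sketch: maps between a transient object and a
// persistence-ready buffer, guided by dictionary entries.
#include <string>
#include <vector>

class IStreamerSvc {
public:
    virtual ~IStreamerSvc() = default;
    // Stream the transient object into a technology-neutral buffer, using
    // the named transient class and persistent shape from the dictionaries.
    virtual std::vector<unsigned char> stream(const void* transientObj,
                                              const std::string& transientClass,
                                              const std::string& persistentShape) = 0;
    // Inverse: materialize a transient object from a persistent buffer.
    virtual void* unstream(const std::vector<unsigned char>& buffer,
                           const std::string& persistentShape,
                           const std::string& transientClass) = 0;
};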
Placement service
• Supports runtime control of physical placement, equivalent to "where" hints in ODMG
• Intended to support physical clustering within an event, and to separate events written to different physics streams
• Interface seen by the experiments' application frameworks is independent of persistence technology
• Insufficiently specified in the RTAG report
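Since the slide itself notes this component is under-specified, the following is only a guess at how a technology-neutral "where" hint could look; PlacementHint and IPlacementSvc are invented names.

// Hedged sketch, not an agreed design.
#include <string>

// A technology-neutral placement hint, resolved by the persistence
// layer into a concrete file/container choice.
struct PlacementHint {
    std::string stream;      // physics output stream the event belongs to
    std::string container;   // logical cluster of objects within an event
};

class IPlacementSvc {
public:
    virtual ~IPlacementSvc() = default;
    // Resolve a hint into an opaque placement token understood by the
    // storage manager; the application framework never sees technology details.
    virtual std::string resolve(const PlacementHint& hint) = 0;
};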
References and reference services
• Common class for encapsulating persistent addresses
  • Externalizable
  • Independent of storage technology, likely with an opaque payload that is technology-specific
• In the hybrid model, the intent is that from a Ref one can determine what "file" is needed without consulting the particular storage technology
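A minimal sketch of such a reference class, assuming a layout of a visible file identifier plus an opaque payload; the string encoding used here is purely illustrative.

// Hypothetical layout of the technology-independent reference.
#include <string>

class Ref {
public:
    // The file identifier is visible so the logical file can be located
    // (via the file catalog) without opening the storage technology.
    std::string fileId() const { return m_fileId; }

    // Externalize / internalize so a Ref can be embedded in other
    // persistent objects.
    std::string toString() const { return m_fileId + "|" + m_payload; }
    static Ref fromString(const std::string& s) {
        Ref r;
        const std::string::size_type sep = s.find('|');
        r.m_fileId  = s.substr(0, sep);
        r.m_payload = (sep == std::string::npos) ? std::string() : s.substr(sep + 1);
        return r;
    }

private:
    std::string m_fileId;    // grid-style file identifier
    std::string m_payload;   // technology-specific, opaque to clients
};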
Store manager
• Stores and retrieves variable-length streams of bytes
• Deals with issues at the file level
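A sketch of the byte-stream view the store manager might expose; IStorageMgr and the locator convention are assumptions, not part of the report.

// Minimal illustrative interface: the storage manager sees only untyped
// byte streams and file-level bookkeeping.
#include <string>
#include <vector>

class IStorageMgr {
public:
    virtual ~IStorageMgr() = default;
    // Append a byte stream to the given file; return an opaque locator
    // that a Ref can carry as its technology-specific payload.
    virtual std::string write(const std::string& fileId,
                              const std::vector<unsigned char>& bytes) = 0;
    // Read back the byte stream identified by the locator.
    virtual std::vector<unsigned char> read(const std::string& fileId,
                                            const std::string& locator) = 0;
};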
RefLFN service
• Translates an Object Reference into the logical filename of the file containing the referenced objects
• Expected that a Ref will have some kind of file id that can be used to determine the logical file name in the grid sense (see the sketch after the next slide)
LFN{PFN}PosixName service
• Translates a logical filename into the POSIX name of one physical replica of this file
• Expect to get this from the grid projects, though the common project may need to deliver a service that hides several possible paths behind a single interface
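The two catalogue lookups on this and the previous slide could be sketched as follows; the interfaces and names are hypothetical, since the concrete grid-provided services were still to come.

// Illustrative only.
#include <string>

class IRefLFNSvc {
public:
    virtual ~IRefLFNSvc() = default;
    // Map the file id carried by an object reference to a logical file name.
    virtual std::string logicalFileName(const std::string& fileId) = 0;
};

class ILFNPosixNameSvc {
public:
    virtual ~ILFNPosixNameSvc() = default;
    // Choose one physical replica of the logical file and return a POSIX
    // path that the storage manager can open.
    virtual std::string posixName(const std::string& logicalFileName) = 0;
};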
Event collection services
• Support for explicit event collections (not just collections by containment)
• Support for collections of event references
• Queryable collections: like a list of events, together with queryable tags
  • Possibly indexed
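One possible shape for a queryable collection of event references, sketched under the assumption that tags are simple named attribute values; EventTag, IEventCollection and the query syntax are invented for illustration.

// Hedged sketch of an explicit, queryable event collection.
#include <string>
#include <utility>
#include <vector>

struct EventTag {
    std::string eventRef;                                     // externalized Ref to the event
    std::vector<std::pair<std::string, double>> attributes;   // queryable tag values
};

class IEventCollection {
public:
    virtual ~IEventCollection() = default;
    virtual void add(const EventTag& tag) = 0;
    // Return the references of events whose tags satisfy the query,
    // e.g. "nMuons > 2 && missingEt > 50" (possibly served from an index).
    virtual std::vector<std::string> select(const std::string& query) = 0;
};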
[Diagram: one possible mapping of the proposed components and interfaces (PersistencyMgr, PlacementSvc, StorageMgr, DictionarySvc, StreamerSvc, CacheMgr, FileCatalog, MetaData Catalog, SelectorSvc, IteratorSvc; IPers, ICache, IPlacement, IReflection, ICnv, IFCatalog, IMCatalog, IReadWrite) onto a ROOT implementation (TFile, TDirectory, TTree, TChain, TBuffer, TMessage, TRef, TKey, TStreamerInfo, TClass, TEventList, TDSet, TGrid, TSocket); under discussion]
Clarify Resources, Responsibilities & Risks
• Expected resources at project start and their evolution
  • Commitments from experiments, the ROOT team and IT
• Who does what (and until when)?
  • Who develops which software component(s)?
  • Who maintains those components afterwards?
  • Who develops production services around those?
  • What is the procedure for dropping any of those services?
• Any "in house" development involves a significant maintenance commitment by somebody and risk for somebody else
  • Need to agree on these commitments
Components & Interfaces
• Need to rapidly agree on concrete component interfaces
• Could we classify/prioritize interfaces by risk?
  • e.g. by damage caused by a later interface change
• At least the major (aka external) interfaces need the official blessing of the Architects' Forum and cannot be modified without its agreement
• No component infrastructure defined so far (see the sketch below)
  • Component inheritance hierarchy, component factories, component mapping to shared libraries etc.
• LCG approach needs to be "compatible" with
  • several LCG sub-projects
  • several different experiment frameworks
  • several existing HEP packages, e.g. ROOT I/O
  • several RDBMS implementations
• May need to assume some instability until a solid foundation is accepted for the LCG applications area
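To illustrate what "component infrastructure" refers to, here is a purely hypothetical example of a component factory registry; nothing of this kind was agreed, and all names are invented.

// Purely illustrative; the slide notes this infrastructure is undefined.
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

class IComponent {
public:
    virtual ~IComponent() = default;
};

// Registry mapping component names (possibly provided by shared libraries)
// to factory functions that create them behind abstract interfaces.
class ComponentRegistry {
public:
    using Factory = std::function<std::unique_ptr<IComponent>()>;
    void declare(const std::string& name, Factory f) {
        m_factories[name] = std::move(f);
    }
    std::unique_ptr<IComponent> create(const std::string& name) const {
        auto it = m_factories.find(name);
        if (it == m_factories.end()) return nullptr;
        return it->second();
    }
private:
    std::map<std::string, Factory> m_factories;
};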
Components & Interfaces (cont.)
• Fast adoption of existing "interface" classes may be tempting but is also very risky
  • Should not just bless existing header files which were conceived as implementation headers
• Should take the opportunity to (re-)design a minimal but complete set of abstract interfaces
  • And then implement them using existing technology
Timescales, Functionality & Technologies of an Initial Prototype
• Experiments have somewhat differing timescales for their first use of LCG components
  • Synchronization of the initial release schedule would definitely improve the chances of success
• Experiments may favor different subsets of the full functionality for a first prototype
  • Need to agree on the main requirements for the prototype s/w and associated services to guide implementation and technology choices
• Synchronization of feature content and implementation technology is required
  • Which RDBMS backend? What are the deployment requirements?
  • "Lightweight" system (end-user managed): maybe reduced requirements on scalability and fault tolerance, and even on functionality
  • Fully managed production system: based on established database services (incl. backup, recovery from h/w faults, …)
  • May need prototype implementations for both
Summary
• Persistency RTAG has delivered its final report to the SC2
• Significant agreement on requirements and a component breakdown has been achieved
• Report does not define all components and their interactions in full or equal depth
• A common project on object persistency is proposed
• Possible next steps:
  • Clarify available resources and responsibilities
  • Agree on scope, timescale and deployment model of the first project prototype
  • Rapidly agree on a concrete set of component interfaces and spawn work packages to implement them
  • Continue to resolve remaining open questions