File sharing requirements of the physics community
• Background
• General requirements
• Visitors
• Laptops
• Software development and physics analysis
• Web services
• Concerns and questions
• Summary
Marco Cattaneo - EP Forum - 11th June 2001

Background
• IT divisional planning report (CERN-IT-DLO-2000-005)
  • “A three person team, composed of representatives from the physics user community, PDP and IS groups should produce a concise document consolidating the existing user requirements for Data Sharing by April 2001”
  • Triggered by unclear future of AFS
  • MC, Bernd Panzer-Steindel, Alberto Pace
• Draft URD written by M. Cattaneo
  • With input from a few users in a number of experiments and roles
  • Atlas, CMS, LHCb, (Aleph), IT/CO
  • Physicists, Librarians, Software developers, external users
• Basis for discussion in IT (C5), EP (today), DTF, FOCUS
• Not official position of anybody!
• Probably too CERN-centric!
• Not addressing sharing of bulk physics data and “group n-tuples”

General requirements
• Transparent file access from any node on site
  • Interactive and batch, Windows and Unix
  • Same files and directory structure, same access rights, no ‘stale’ files
• “Native” access to files
  • Use native OS commands, native access to files from applications
• Customisable protections
  • ACLs for individuals and groups
  • Modifiable by users, via scriptable tools, at file or directory granularity
  • Groups can be “corporate”, user defined, overlapping
• Authentication
  • Single site-wide login, remote command execution
  • Transmitted to batch and scheduled jobs
• Fast and reliable
  • << 1 sec for file retrieval and directory browsing, >> 99% up time
  • Regular backups (including open files), restore in a few minutes
• Possibility to execute unique site-wide login script
  • With user, group, experiment customisation
• Source control and versioning
  • Not only for code but also, e.g. user guides, papers, minutes
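
The “customisable protections” requirement (ACLs for individuals and for overlapping groups, at file or directory granularity) can be illustrated with a small model. This is only a sketch, not a description of AFS or of any system considered in the talk: the `Acl` class, the `effective_rights` function, the user and group names, and the rule that a file-level ACL overrides the directory ACL are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Hypothetical ACL: maps a user or group name to a set of rights."""
    entries: dict = field(default_factory=dict)

    def grant(self, principal, *rights):
        # Scriptable modification: add rights for one user or one group.
        self.entries.setdefault(principal, set()).update(rights)


def effective_rights(user, groups, dir_acl, file_acl=None):
    """Union of the rights granted to the user and to every group containing the user.

    Assumption: a file-level ACL, when present, replaces the directory ACL.
    """
    acl = file_acl if file_acl is not None else dir_acl
    principals = {user} | {g for g, members in groups.items() if user in members}
    rights = set()
    for principal in principals:
        rights |= acl.entries.get(principal, set())
    return rights


# Overlapping groups: user_b belongs to both a "corporate" and a user-defined group.
groups = {"lhcb": {"user_a", "user_b"}, "librarians": {"user_b"}}
release_dir = Acl()
release_dir.grant("lhcb", "read")
release_dir.grant("librarians", "read", "write")

print(sorted(effective_rights("user_a", groups, release_dir)))  # ['read']
print(sorted(effective_rights("user_b", groups, release_dir)))  # ['read', 'write']
```

Taking the union over all groups is one possible policy; a real system might also support negative rights or inheritance from parent directories.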

Visitors (see also G. Bagliesi’s slides)
• Efficient read/write access to personal files
  • Home institute files from CERN, CERN files from home institute
  • Ideally, the same physical home directory (or mirror)
  • Not yet possible due to network response
  • Transparent read/write access to remote directory acceptable alternative
  • Maximum once/day authentication at remote site
• Technology chosen at CERN should be:
  • Installable at home labs (if they choose to do so!)
  • At reasonable cost
  • Allow simple porting of scripts and other tools to home-lab file system
  • Allow simple mirroring of selected directories to home-lab file system
  • Ideally part of a single HEP-wide solution

Laptops
• Possibility to install authentication and file-sharing software on laptop
  • Access to CERN files when on CERN intranet
  • Access to home-lab files when on home-lab intranet
  • without major reconfiguration of laptop
• Automatic synchronisation of selectable sets of files upon connection to CERN intranet
  • Allow laptop user to continue working seamlessly in CERN-like environment when not connected.
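
The synchronisation requirement above (a selectable set of files, brought up to date when the laptop reconnects) might look like the following sketch. The function name `sync_selected`, the two-directory layout and the “newest modification time wins” rule are assumptions for illustration; the slides do not specify a mechanism, and conflicts where both copies changed while disconnected are not handled here.

```python
import shutil
from pathlib import Path


def sync_selected(laptop, site, selected):
    """Copy each selected file in whichever direction is newer.

    laptop, site: Path objects for the two directory trees (hypothetical layout).
    selected: relative paths chosen by the user.
    Returns a list of the actions taken.
    """
    actions = []
    for rel in selected:
        a, b = laptop / rel, site / rel
        if not a.exists() and not b.exists():
            continue  # nothing to synchronise
        if not b.exists() or (a.exists() and a.stat().st_mtime > b.stat().st_mtime):
            src, dst, label = a, b, "laptop -> site"
        elif not a.exists() or b.stat().st_mtime > a.stat().st_mtime:
            src, dst, label = b, a, "site -> laptop"
        else:
            continue  # same modification time: assume already in sync
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        actions.append(f"{label}: {rel}")
    return actions
```

Because `shutil.copy2` preserves the modification time, running the function a second time with no intervening edits performs no copies.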

Software development (including for physics analysis) (See also Maya’s talk)
• Requirements for code repository
  • Experiment wide (or even world) read access from anywhere in the world
  • Write access rights based on individual and/or group identification
• Requirements for software build and release
  • Highly efficient, native, source code access by build process
    • Not necessarily shared
    • But must be possible to install binaries on shared file system
  • Automatic submission of builds on other platforms
    • And other sites?
  • Access to same files via multiple paths
    • aka Unix soft links
• Requirements for release areas
  • Site-wide, native, access to source code and binaries
    • Compile, link against released software, load shareable libraries at run-time
  • Site-wide access to both general purpose and experiment specific software
    • On all interactive and batch nodes
  • World-wide access to official release areas
    • Via world-wide access to CERN file system
    • Especially for “nightly builds” or rapidly evolving software
    • Via automatic mirroring, or distribution kits
  • World wide access to automatic code documentation
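
One common use of “access to same files via multiple paths” is a release area in which a stable alias points at the current versioned directory. The sketch below shows that idea with a Unix soft link; the function `publish_release`, the alias name `pro` and the directory layout are hypothetical and are not taken from the talk. It assumes a POSIX file system.

```python
import os
from pathlib import Path


def publish_release(area, version, alias="pro"):
    """Point a soft link named `alias` at the versioned release directory.

    area: Path of the release area (hypothetical layout: area/<version>/...).
    Returns the Path of the alias.
    """
    (area / version).mkdir(parents=True, exist_ok=True)
    link = area / alias
    tmp = area / f".{alias}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    # Relative target, so the link stays valid if the area is moved or mirrored.
    os.symlink(version, tmp)
    # Renaming over the old link switches the alias in one step on POSIX.
    os.replace(tmp, link)
    return link
```

After `publish_release(area, "v2")`, the same files are reachable both as `area/v2/...` and as `area/pro/...`, while older versions stay in place for jobs still linked against them.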

Web access
• Possibility to make any file accessible from web, by simple manipulation
• Possibility to make any directory accessible from the web
  • With or without directory browsing
• Possibility to restrict access
  • With similar range of protection categories as for home directories
  • Ideally with same authentication mechanism
• Possibility to have multiple authors
• Possibility to maintain web sites remotely
  • i.e. from outside CERN for site hosted at CERN
• Possibility to edit sites from any platform
  • i.e. from both Windows and Linux
  • Especially important for sites with multiple authors

Concerns and questions (following C5)
• Requirements validation
  • How representative are these requirements?
    • Are the EP Forum, DTF and FOCUS enough?
    • What about other large sites (e.g. FNAL)?
    • With their own agendas, with their own experiments
  • How much of requested functionality is essential?
    • And how much is just due to habit?
• Security
  • Open access to files (e.g. Web) can lead to attacks (e.g. spam)
  • Access control requires CERN registration
    • Reasonable for write access (e.g. from grey book data)
    • Problem for world read access to e.g. published data, code
• Inter-site replication and consistency
  • Read only copies, or also modify?
  • Granularity: record or file or directory level synchronisation?
  • Latency: what triggers synchronisation?
    • E.g. CVS commit, AFS cache
  • Need to understand this requirement better!
• Windows/Unix interoperability
  • Do we need full functionality of AFS client?

Will the GRID help?
• Don’t wait for the GRID!
  • Should not expect the GRID to solve all our problems
• Should split our requirements into different areas
  • With the possibility of different solutions
  • CVS servers, web services, home directories, release areas, etc.
• OpenAFS
  • Gaining acceptance.
  • Should we invest manpower?
  • Weak spots
    • Web authentication, laptop synchronisation
• LDAP for common authentication
  • Highly political
  • Requires a “central” registry

Summary
• File sharing requirements of the physics community
  • Site wide, cross platform (Unix/Windows) data sharing
  • Application transparent file access, including file manipulation with native commands
  • Access control from all platforms, with authentication based on groups as well as individuals.
  • Access from outside CERN
  • Document aggregation (“directories”) to allow sharing of a large number of (small) files as a single object.
  • Fast and reliable file access
  • Source control and automated versioning
• Many open questions
  • IT have asked for a four page summary
  • Keep the status quo until “the GRID” arrives?