The EGRID technical journey Building the Italian national pilot Grid facility for research in Economics and Finance
The EGRID technical journey • Part I: Context • Technical goals • Strategic choices • Operational goals • Part II: Initial September 2004 release • The natural solution within the EGEE Infrastructure and why it was not feasible • The initial EGRID solution within the EGEE Infrastructure and its shortcomings • Part III: Current EGRID release • Preconditions • StoRM • ELFI • Portal • The current EGRID solution within the EGEE Infrastructure • Part IV: Further remarks on the EGRID Portal
Technical goals Ministerial requests: • To be the IT fabric on top of which partner projects carry out Economic and Financial research. • Constraints: • do not buy new hardware; • do not develop/research new grid middleware. • Not a research project on the Grid: a working infrastructure must be supplied to researchers! • Pilot facility: the proposed technical solution must be realistically generalisable. • Co-ordinate and resolve the technical needs of researchers.
Technical goals Researchers' requests: • Manage ~2 TB of Stock Exchange data (LSE, NYSE, Milano, Madrid, etc.) for researchers to study. • The grid solution must enforce strict Stock Exchange data disclosure policies. • Supply grid training. • Provide means to lower resistance to the adoption of new technology.
Strategic choices Fact: • The High Energy Physics community in Italy and in Europe has in place a large scientific grid infrastructure: EGEE. • It serves the computing needs of CERN's Large Hadron Collider (LHC) experiments, soon to begin operation. Choice: • Rent computing + storage resources from Italy's INFN (part of EGEE). • Use the same MW: originally called EDG, now called gLite. • Scientific grid: our researchers compete for resources with other scientific communities.
Strategic choices Consequence: • Given the ubiquity of the EGEE Infrastructure, if EGRID is successful the facility can expand beyond Italy. • This suggests a litmus test for verifying whether the Pilot solution can be generalised: “Whatever technical arrangement we come up with, can it be used across the European scientific grid?”
Strategic choices Focus on: • The tactical question the project would answer: “To what extent is the EGEE Infrastructure, built around the HEP community's needs for CERN's LHC experiments, suited to the needs of our sample of Economics and Finance researchers?” • All technical answers EGRID would give to our researchers would be in the context of the EGEE Infrastructure.
Operational goal Satisfy, within the EGEE Infrastructure, the Stock Exchange requirements: • Research groups have private deals with Stock Exchanges; those outside the deal cannot access the data. • Researchers from different groups freely form temporary partnerships that share some pre-processed data; those outside the partnership cannot access the data. • Researchers need a personal private area to store in-progress work; nobody else can access the data. (These three rules are sketched below.)
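The three rules amount to a simple access policy. The sketch below is plain Python, not EGRID code; users, groups and dataset labels are hypothetical. It only restates the requirement the grid solution must reproduce:

```python
# Conceptual sketch of the three EGRID access rules; all names are invented.
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set


@dataclass
class Dataset:
    owner: Optional[str] = None        # set for a researcher's private area
    group: Optional[str] = None        # set for contracted Stock Exchange data
    partnership: FrozenSet[str] = field(default_factory=frozenset)  # shared pre-processed data


def can_access(user: str, user_groups: Set[str], data: Dataset) -> bool:
    """Grant access only if one of the three rules above applies."""
    if data.owner is not None:         # private in-progress work
        return user == data.owner
    if data.group is not None:         # Stock Exchange contract
        return data.group in user_groups
    if data.partnership:               # temporary partnership
        return user in data.partnership
    return False


# Hypothetical researcher "anna", member of the "lse-deal" contract group.
assert can_access("anna", {"lse-deal"}, Dataset(group="lse-deal"))
assert not can_access("bob", {"nyse-deal"}, Dataset(group="lse-deal"))
assert can_access("anna", set(), Dataset(partnership=frozenset({"anna", "carlo"})))
assert can_access("anna", set(), Dataset(owner="anna"))
```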
The natural solution The natural solution within the EGEE Infrastructure: • Each user is identified by a Digital Certificate. • Each group of people is a Virtual Organization (VO). • Different VOs do not see each other's files. • To belong to different VOs, different Digital Certificates must be used. • Being a member of different VOs guarantees data separation.
The natural solution Our Economics/Finance researchers would have needed to: • Have 1 certificate to access contracted Stock Exchange data. • Have 1 certificate for each temporary group they work in. • Have 1 certificate for their private data. • Switch certificates whenever a different data set is needed. • No easy way to work on different data sets at the same time.
The natural solution Unfeasible for our researchers: • Too many certificates to handle. • Impractical and limited switching. • Lengthy bureaucratic procedure to release new certificates. • Slow procedure to have grid sites accept new VOs. The creation of our research groups is more dynamic and volatile than in HEP.
The initial EGRID solution Two technical observations: • A grid user is identified by a certificate, but to use a specific data resource the user is mapped to an arbitrary local account: the Pool Account Mechanism. • Access control to data resources is enforced through the Group Membership Mechanism.
The initial EGRID solution Solution: • Disabled pool accounts and enabled the less-used direct account mapping configuration: • the grid user is better identified within the data resource. • Pushed group-membership fine tuning to the extreme: • a carefully crafted directory tree allowed us to impose controlled access to files (sketched below).
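As a rough illustration of what that configuration amounts to on the Storage Element, here is a sketch (not the actual EGRID scripts; paths, Unix group names and the mapped account are hypothetical, and it would have to run as root on the SE host) of how a crafted directory tree plus group ownership and permissions reserves an area for each contract group and each researcher:

```python
# Illustrative sketch only: per-group and per-user areas enforced with plain
# Unix ownership and permissions on a single Storage Element.
import os
import shutil
import stat

SE_ROOT = "/flatfiles/egrid"   # hypothetical SE data root

# Unix group -> directory reserved for that group's contracted data
GROUP_AREAS = {
    "egrid_lse": "stock-exchange/lse",
    "egrid_nyse": "stock-exchange/nyse",
}


def make_group_area(group: str, relpath: str) -> None:
    path = os.path.join(SE_ROOT, relpath)
    os.makedirs(path, exist_ok=True)
    shutil.chown(path, group=group)     # owning Unix group
    # rwx for owner and group, nothing for others; setgid so new files
    # inherit the group ownership.
    os.chmod(path, stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID)


def make_private_area(user: str) -> None:
    path = os.path.join(SE_ROOT, "home", user)
    os.makedirs(path, exist_ok=True)
    shutil.chown(path, user=user)
    os.chmod(path, stat.S_IRWXU)        # owner only


if __name__ == "__main__":
    for group, relpath in GROUP_AREAS.items():
        make_group_area(group, relpath)
    make_private_area("aresearcher")    # directly mapped local account
```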
The initial EGRID solution Result: • 1 Digital Certificate per researcher: only 1 VO for all. • Each group of researchers had exclusive access to its contracted Stock Exchange data. • Each researcher had a private area. • A further elaboration of the solution + LDAP technology + scripting allowed partial freedom in creating directories with access restricted to specific research groups.
The initial EGRID solution Litmus test for the Pilot Facility: cannot be generalised • A single SE is heavily micro-managed: not guaranteed to be possible on all SEs. • If multiple SEs are involved: difficult synchronisation/replication of the directory tree and group permissions.
Consequence on initial grid topology Simplification needed: • Technical difficulty of replication/synchronisation. • Limited bandwidth at Palermo. [Topology diagram: central Padova site with RB, CE, SE (2.6 TB) and WNs (100 CPUs); smaller CE+SE+WN sites at Firenze, Trieste, Palermo, ...]
Consequence on initial grid topology Logical names introduced: • To allow researchers to address a file with the same name, regardless of its physical location. • To mask as much as possible the complexity of the security mechanism. • Achieved through the RLS grid service: • a catalogue of correspondences between one logical name and several physical names; • flat namespace; • synchronisation of files not enforced. • At client level (sketched below): • 1 logical name : 1 physical name; • simulated directory tree.
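A minimal sketch of that client-side view follows: the catalogue is a flat logical-to-physical map, the client enforces the 1:1 policy, and the directory tree is simulated on top. All logical names and SURLs are invented for illustration:

```python
# Conceptual sketch of an RLS-like flat catalogue plus the client-side policy.
# flat catalogue: logical file name -> physical file names (replicas)
catalogue = {
    "lfn:/egrid/lse/trades-2003.csv":
        ["srm://se.padova.example.org/egrid/lse/trades-2003.csv"],
    "lfn:/egrid/home/anna/draft.dat":
        ["srm://se.padova.example.org/egrid/home/anna/draft.dat"],
}


def resolve(lfn):
    """Client policy: exactly one physical name per logical name."""
    replicas = catalogue[lfn]
    assert len(replicas) == 1, "replicas are not synchronised, so they are avoided"
    return replicas[0]


def list_directory(prefix):
    """Simulate a directory listing on top of the flat namespace."""
    base = prefix.rstrip("/")
    entries = set()
    for lfn in catalogue:
        if lfn.startswith(base + "/"):
            rest = lfn[len(base) + 1:]
            entries.add(rest.split("/", 1)[0])
    return sorted(entries)


print(list_directory("lfn:/egrid"))              # ['home', 'lse']
print(resolve("lfn:/egrid/lse/trades-2003.csv"))
```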
Consequence on initial grid topology Litmus test for Pilot Facility: cannot be generalised • Rigid topology: can’t see rest of grid.
Preconditions Single biggest obstacle for Pilot Facility: • Data Security in the sense of flexible and efficient control over access to files • Problematic in itself. • Severely limited grid topology.
Preconditions Technical improvements of the MW: • EDG evolved into LCG and then gLite. • Modern OS introduced. • Installation procedure fully revisited and documented. • VOMS technology introduced: • Digital Certificates enriched with extra information: all VO memberships, roles, groups (sketched below). • But group membership remained the main enforcing mechanism. • RLS replaced by LFC: • the catalogue became hierarchical, with ACL support on logical names. • Still no synchronisation of copies, and ACLs not at the physical level: no real control on file access. • SRM protocol becoming the standard for access to grid files.
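To make the VOMS + LFC combination concrete, the sketch below (plain Python, not the real middleware APIs; FQANs and paths are hypothetical) shows the kind of match between proxy attributes and logical-name ACLs that authorisation reduces to, and why it is still insufficient without a matching ACL on the physical copy:

```python
# Sketch of the idea behind VOMS attributes + LFC logical ACLs.
# Attributes (FQANs) carried by a VOMS proxy; values are invented.
voms_fqans = [
    "/egrid",                      # VO membership
    "/egrid/lse-partnership",      # temporary research group
    "/egrid/Role=DataManager",     # role
]

# ACLs attached to logical names in an LFC-like hierarchical catalogue.
logical_acls = {
    "/grid/egrid/lse/trades-2003.csv": {"/egrid/lse-partnership"},
    "/grid/egrid/madrid/ticks-2004.csv": {"/egrid/madrid-deal"},
}


def authorised(lfn, fqans):
    """Logical-level check only: without StoRM there is no matching ACL
    on the physical copy, so this alone gives no real control on access."""
    allowed = logical_acls.get(lfn, set())
    return any(fqan in allowed for fqan in fqans)


print(authorised("/grid/egrid/lse/trades-2003.csv", voms_fqans))    # True
print(authorised("/grid/egrid/madrid/ticks-2004.csv", voms_fqans))  # False
```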
Preconditions INFN-CNAF's help request: • The StoRM SRM server was at the conceptual/prototype stage. • Help was needed with the data access functions. • A welcome opportunity to make a low-level MW intervention.
StoRM The solution: • Originally conceived by Riccardo Murri, EGRID Team. • Use of a Just-in-Time (JiT) physical ACL setup mechanism. • Use of LFC's logical grid ACLs. • Result: with StoRM there are effective physical ACLs for grid users belonging to the same VO.
StoRM StoRM's main feature: to leverage existing GPFS installations. • For fast POSIX access to LHC data sets present locally. • For native space reservation support: vital during WAN transfers of large LHC data sets. But GPFS also natively supports POSIX ACLs.
StoRM Technically: • JiT allows the usual pool account mechanism to be used: • StoRM singles out the specific account within the pool; • StoRM sets a temporary ACL for that account (sketched below). • JiT removes the need for group membership + directory structure as the enforcing mechanism.
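A minimal sketch of the JiT idea follows, using the standard setfacl tool rather than StoRM's internal code; the pool account and file path are hypothetical and error handling is omitted:

```python
# Sketch of Just-in-Time POSIX ACLs: grant the mapped pool account a
# temporary ACL on the requested file, then remove it afterwards.
import subprocess


def grant_jit_acl(pool_account, physical_path, perms="rw"):
    # e.g. setfacl -m u:egrid001:rw /gpfs/egrid/lse/trades-2003.csv
    subprocess.run(
        ["setfacl", "-m", f"u:{pool_account}:{perms}", physical_path],
        check=True,
    )


def revoke_jit_acl(pool_account, physical_path):
    # e.g. setfacl -x u:egrid001 /gpfs/egrid/lse/trades-2003.csv
    subprocess.run(
        ["setfacl", "-x", f"u:{pool_account}", physical_path],
        check=True,
    )


if __name__ == "__main__":
    grant_jit_acl("egrid001", "/gpfs/egrid/lse/trades-2003.csv")
    # ... the transfer or job runs under the pool account ...
    revoke_jit_acl("egrid001", "/gpfs/egrid/lse/trades-2003.csv")
```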
StoRM Architecturally: • Within a StoRM server, users of the same VO are guaranteed data security through ACLs. • Having different StoRM servers share the same LFC: • all installations implicitly and automatically have the same directory structure and permissions! • Any change in the LFC is automatically picked up by all StoRMs! • Users continue to work exclusively with logical names + operations in the LFC catalogue.
StoRM Litmus test of the Pilot Facility: • LFC ACLs fully accommodate the needs of our researchers: • Stock Exchange data, private areas, dynamic and volatile research group formation. • The two-tiered topology disappears: • all EGEE Infrastructure sites can be used. • INFN-CNAF is an important partner in EGEE: StoRM is official MW.
ELFI User interaction: • 3 services involved (LFC, StoRM, Transfer). • 3 clients needed. • Unfeasible for our researchers. • ELFI clients developed. [Diagram: the user facing separate LFC, StoRM and Transfer clients, consolidated behind a single ELFI client.]
ELFI Technically: “ELFI is a Linux filesystem interface to the EGEE Logical Filename Catalog (LFC) and Storage Elements (SE).” • Characterising property: it shows the user a POSIX interface by means of the FUSE Linux technology. • Users and applications see files in grid storage through local mount points as if they were locally present.
ELFI Architecturally: • Centred around logical file names. • Supplies a directory tree view: all SEs and the LFC are at the root. • Files are displayed with their logical name, not the physical one. • All files appear under the LFC root; they also appear under an SE if contained in that SE. • ELFI forbids copying a file from an SE to the homologous position in another SE: this avoids replicas, since the LFC does not synchronise them. (A conceptual sketch follows.)
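To convey the approach, here is a conceptual sketch, not ELFI's code: a read-only FUSE filesystem written with the fusepy Python bindings that exposes a tiny invented logical-name catalogue through a local mount point. The real ELFI resolves names through the LFC and fetches data via SRM/transfer protocols:

```python
# Conceptual ELFI-like sketch: a logical-name catalogue served as a
# read-only FUSE filesystem. Catalogue contents are invented.
import errno
import stat
import sys

from fuse import FUSE, FuseOSError, Operations   # pip install fusepy

# logical path -> file contents (stand-in for LFC lookup + SE access)
CATALOGUE = {
    "/lfc/egrid/lse/trades-2003.csv": b"date,price\n2003-01-02,101.5\n",
}


class LogicalNameFS(Operations):
    def getattr(self, path, fh=None):
        if path == "/" or any(k.startswith(path + "/") for k in CATALOGUE):
            return {"st_mode": stat.S_IFDIR | 0o555, "st_nlink": 2}
        if path in CATALOGUE:
            return {"st_mode": stat.S_IFREG | 0o444,
                    "st_nlink": 1, "st_size": len(CATALOGUE[path])}
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        prefix = "" if path == "/" else path
        entries = {k[len(prefix) + 1:].split("/", 1)[0]
                   for k in CATALOGUE if k.startswith(prefix + "/")}
        return [".", ".."] + sorted(entries)

    def read(self, path, size, offset, fh):
        return CATALOGUE[path][offset:offset + size]


if __name__ == "__main__":
    # usage: python elfi_sketch.py /some/mount/point
    FUSE(LogicalNameFS(), sys.argv[1], foreground=True, ro=True)
```

Once mounted, ordinary tools such as ls and cat can browse the logical names, which is exactly the property that lets non-grid applications run without porting.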
ELFI ELFI is remarkable: • It is a POSIX interface to grid data. • All non-grid applications that access files can be run on the grid without specific porting. • It is important for legacy applications.
ELFI Litmus test for the Pilot Facility: • Requires FUSE, which will soon be part of the grid's OS. • ELFI is under consideration as standard grid software. • But: • a subset of core ELFI functionality was turned into the EGRID Clients; • the EGRID Clients can be installed at grid sites through the standard Experiment Software procedure; • users keep the Logical File Name centric view and the transparent LFC+StoRM+Transfer functionality.
Portal Grid usability: • EGRID considered it important to have a Portal: • portals have proved to be an effective way to let users interact with an organisation's information system. • It would be the main entrance to the new EGRID infrastructure: graphical, closer to the user's way of working, all grid tools in one place, accessed through a web browser. • PGRADE technology chosen: sufficiently sophisticated as a starting point, but it required improvements.
Portal EGRID improvements Digital Certificate management improvement: • PGRADE first uploads the digital certificate to the portal host, then interacts with the MyProxy grid service, before allowing grid access. • It is risky to have the Digital Certificate travel around: EGRID developed a Java WebStart application that interacts with the MyProxy server directly from the researcher's workstation (the equivalent command-line flow is sketched below).
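For comparison, the equivalent command-line flow with the standard MyProxy client tools is sketched below; the EGRID tool itself is a Java WebStart GUI, and the server name and username here are hypothetical:

```python
# Sketch only: delegate a short-lived proxy to MyProxy directly from the
# workstation, so the long-lived certificate never reaches the portal host.
import subprocess

MYPROXY_SERVER = "myproxy.example.org"   # hypothetical MyProxy host
PORTAL_USERNAME = "anna"                 # name the portal will use to retrieve it

subprocess.run(
    ["myproxy-init",
     "-s", MYPROXY_SERVER,     # MyProxy server to delegate to
     "-l", PORTAL_USERNAME],   # username associated with the stored proxy
    check=True,
)
# The portal later retrieves a short-lived proxy from MyProxy on the user's
# behalf; the private key stays on the workstation throughout.
```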
Portal EGRID improvements SRM data management addition: • PGRADE lacks SRM and LFC support: only classic SEs and files local to the workstation. • ELFI gives access to StoRM and the logical file name centric view. • ELFI was installed on the Portal host and a portlet was developed to allow browsing of the Portal host's local mount points. • Behind the scenes, ELFI takes care of all the data grid protocols involved.
The current EGRID solution The solution in place: • The EGRID Portal as the main entrance to the Pilot Facility. • Portal host with ELFI and the Data Management Portlet. • StoRM at INFN Padova, and soon in production at INFN Catania. • Users can see the full EGEE Infrastructure: Italy is accessible + Europe on the way (contract being finalised). • INFN Padova volunteered to install full ELFI on their production computing nodes. • EGRID Clients installed in all other grid nodes as official Experiment Software.
In one picture: The current EGRID solution
Further remarks on EGRID portal Job submission in the portal: • By drawing graphs representing workflows. • Each node is a job executed on a single computing node. • A handful of graph nodes engages only that handful of computing nodes.
Further remarks on EGRID portal The EGEE Infrastructure is best suited to LHC-style calculations: • The same code executed on different chunks of data. • Thousands of normal jobs needed to process a big data set. • The Portal's graphical formalism implies drawing thousands of nodes in order to fully leverage the EGEE Infrastructure's computing potential. Impractical! • PGRADE is addressing the issue with Parametric Jobs (sketched below), but the feature is still immature.
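The sketch below illustrates the parametric-job idea in plain Python: one template is expanded into one job description per data chunk, instead of drawing thousands of graph nodes by hand. The JDL fields shown are a reduced, illustrative subset, and the file names are invented:

```python
# Sketch of parametric-job expansion: one template, many concrete jobs.
JDL_TEMPLATE = """Executable = "analyse.sh";
Arguments = "{chunk}";
InputSandbox = {{"analyse.sh"}};
StdOutput = "out_{index}.txt";
StdError = "err_{index}.txt";
"""


def expand(chunks):
    """Yield one JDL-like description per data chunk."""
    for i, chunk in enumerate(chunks):
        yield JDL_TEMPLATE.format(chunk=chunk, index=i)


# Hypothetical data set split into 1000 chunks addressed by logical name.
chunks = [f"lfn:/egrid/lse/trades-2003.part{i:04d}" for i in range(1000)]
for i, jdl in enumerate(expand(chunks[:2])):     # show the first two only
    print(f"--- job {i} ---\n{jdl}")
```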
Further remarks on EGRID portal Use the portal for: • Data management. • The experimental and exploratory phase of research. • Learning the grid. Use the grid's CLI to fully leverage the computing potential of the EGEE Infrastructure.
Further remarks on EGRID portal General observation: • The Portal's job submission is general purpose: a web interface to CLI grid commands. • The real difference comes when the web interface exposes the input parameters of specific programmes. • This requires applications that are frequent tools of analysis for researchers. • Our research community tailors an application to a specific interest, then tailors a new application to a new aspect, and so on. • This somewhat limited the added value of the portal. • In general it is best to have a framework of portal grid components that can be reused to tailor-make web interfaces for specific researcher needs.
Acknowledgements • StoRM collaboration with INFN-CNAF of the grid.IT project: Dr. Mirco Mazzuccato, Dr. Antonia Ghiselli. • P-GRADE team headed by Prof. Peter Kacsuk of MTA SZTAKI, Hungarian Academy of Sciences. • EGRID project leaders: Dr. Alvise Nobile of ICTP, Dr. Stefano Cozzini of INFM Democritos. • Past members of the EGRID Team: Angelo Leto, Cristian Zoicas. • EGRID team: Alessio Terpin, Antonio Messina, Ezio Corso, Massimo Sponza, Riccardo di Meo, Riccardo Murri.