
The new EGRID infrastructure


Presentation Transcript


  1. The new EGRID infrastructure An update on the status of the EGRID project

  2. The new EGRID infrastructure The EGRID project: • To implement the Italian national grid facility for processing Economic and Financial data. • The underlying fabric on top of which partner projects develop Economic and Financial applications.

  3. The new EGRID infrastructure Summary: • Original user requirements • The first EGRID release • Operating problems • Redesigning EGRID • A web portal to access EGRID

  4. I. Original user requirements

  5. Original user requirements • HW infrastructure to store+manage 2 TB of Stock Exchange data: NYSE, LSE, Borsa di Milano, etc. • Privacy: legally binding disclosure policies. • Users do not have the same read rights: one research group has a contract with the NYSE for a specific company; another group has a contract with the LSE for all companies; etc. • Two classes of users: those who upload stock exchange raw data and may remove it, and those who work on the data. • Facility organised for raw data pre-processing and end-user applications.

  6. II. The first EGRID release

  7. The first EGRID release Meeting the HW infrastructure requirement: • Bulk computing power and bulk storage rented from INFN Padova, part of the physics grid! • Employed the same EDG middleware that INFN uses. • Two-tiered topology dictated by network connectivity: • Partner projects have limited connectivity: installed peripheral sites supply local services. • Cache area for large data transfers. • Job execution points for non-CPU-intensive data processing. • INFN Padova has good connectivity: supplies services to the whole community.

  8. The first EGRID release Meeting data privacy: EDG’s data access mechanism required critical and fragile fine-tuning. • Classic SE: local files exposed through GridFTP. • GridFTP allows file manipulation compatible with the underlying Unix filesystem permissions. • The underlying filesystem must be carefully managed: • Users mapped to specific local accounts, not pool accounts. • Users partitioned into especially created groups that reflect data access patterns. • A carefully crafted directory tree guides data access. • Users have the same UID across all SEs. • Replication/synchronisation of the directory structure across all SEs. • Users supplied with tools to manage permissions coherently across all SEs.
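
The sketch below is an illustrative example, not the actual EGRID tooling: it shows the kind of group-partitioned directory tree and Unix permissions such a scheme relies on. The group names, exchange directories and storage root are hypothetical, and a real run needs root privileges and pre-existing local groups (Python 3, for shutil.chown).

```python
# Illustrative sketch only: a group-partitioned directory tree of the kind
# described above. Names are hypothetical; requires root and existing groups.
import os
import shutil

# Each research group may read only the data its contract covers.
LAYOUT = {
    "nyse/company_x": "grp_nyse_x",   # hypothetical group: one NYSE company only
    "lse":            "grp_lse_all",  # hypothetical group: all LSE data
}

def build_tree(root):
    for subdir, group in LAYOUT.items():
        path = os.path.join(root, subdir)
        os.makedirs(path)
        shutil.chown(path, group=group)  # group ownership encodes the access pattern
        os.chmod(path, 0o750)            # group members may read, everyone else may not

if __name__ == "__main__":
    build_tree("/storage/egrid")         # hypothetical storage root on one SE
```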

  9. The first EGRID release Meeting the pre-processing requirement: supported with a tailor-made wrapper component. • Developers can more easily grid-enable pre-processing operations. • Users can more easily run grid pre-processing on given datasets. • Common Unix commands such as cat, cut and grep were adapted to operate on grid-stored files.
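
As a rough illustration of how such a wrapper can work (this is not the EGRID wrapper itself), the sketch below stages a grid-stored file to local scratch space with GridFTP and then runs an ordinary Unix tool on the copy; it assumes the standard globus-url-copy client is installed and that the gsiftp:// URL supplied by the caller is valid.

```python
# Rough sketch of a grid-enabled "grep": stage the file locally via GridFTP,
# then run the ordinary Unix command on the staged copy. Not the EGRID tool.
import os
import subprocess
import sys
import tempfile

def grid_grep(pattern, gsiftp_url):
    staged = tempfile.NamedTemporaryFile(delete=False)
    staged.close()
    # Copy the remote file to the local scratch file with the standard
    # GridFTP client; the gsiftp:// URL is supplied by the caller.
    subprocess.check_call(["globus-url-copy", gsiftp_url, "file://" + staged.name])
    try:
        subprocess.call(["grep", pattern, staged.name])  # plain Unix grep
    finally:
        os.unlink(staged.name)

if __name__ == "__main__":
    grid_grep(sys.argv[1], sys.argv[2])  # usage: grid_grep PATTERN gsiftp://host/path
```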

  10. The first EGRID release Meeting user needs: • User applications are specific to research interests: programmes and function libraries were developed to aid porting of applications. • To facilitate installation of grid client SW, LiveCD technology was employed.

  11. III. Operating problems

  12. Operating problems HW infrastructure: • Only one large computing site: insufficient to demonstrate the grid’s potential for distributed resource allocation. • Two-tiered topology problematic: maintenance fell on a designated local user, and EGRID could not dedicate enough manpower to the job.

  13. Operating problems Privacy: • EDG and its successor middleware, LCG, still lacked a data access mechanism strong enough for EGRID. • The implemented solution is complex and does not scale: a real account for each user on each SE, filesystem permissions that make tree replication tricky, etc. The middleware did not allow a solution in line with a pervasive grid view.

  14. Operating problems User needs: • Only a small part of the community used the tailor-made command-line tools. • The UI distributed on LiveCD spared users from reinstalling their workstations, but: • users complained of awkward usage • it interfered with their usual way of working

  15. IV. Redesigning EGRID

  16. Redesigning EGRID • Driving factors: • Leaner and more general infrastructure • Robust privacy • Thoroughly re-examined grid usability

  17. Redesigning EGRID HW infrastructure: • Added a second large computing centre: INFN Catania. • Dropped the two-tiered topology.

  18. Redesigning EGRID Privacy: • Classic SE replaced with a specific implementation of the Storage Resource Manager (SRM) protocol, currently being completed. • The implementation is the result of the StoRM collaboration with INFN-CNAF. • Not a proprietary solution: SRM is becoming the standard for grid disk access, so the security solution is compatible with mainstream grid trends.

  19. Redesigning EGRID • How StoRM solves privacy: • All file requests are brokered through the SRM protocol. • When StoRM receives an SRM request for a file: • StoRM asks the policy source for the access rights of the given grid credentials to the given file. • The check is made at the grid-credential level, not for a local user as before! • Physical enforcement through just-in-time ACL setup: • By default files have no ACLs set up: no user can access them. • The local Unix account corresponding to the grid credentials is determined. • An ACL granting the requested access is set up for that local user. • The ACL is removed when the file is no longer needed. • StoRM leverages the grid’s Logical File Catalogue (LFC) as its policy source: compatible with mainstream grid trends.
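
A minimal sketch of the just-in-time ACL idea described on this slide follows. It is not StoRM source code: the policy lookup and the credential-to-account mapping are placeholders, and only the setfacl calls reflect the actual enforcement mechanism (POSIX ACLs on the underlying filesystem).

```python
# Minimal sketch of just-in-time ACL enforcement; NOT StoRM source code.
# lookup_policy() and map_credential_to_account() are placeholders.
import subprocess

def lookup_policy(grid_dn, path):
    # Placeholder: StoRM queries its policy source (the LFC) here.
    return "r"  # e.g. read-only access for this DN on this file

def map_credential_to_account(grid_dn):
    # Placeholder for the grid-credential -> local Unix account mapping.
    return "egrid001"

def grant_access(grid_dn, path):
    perm = lookup_policy(grid_dn, path)        # check at grid-credential level
    user = map_credential_to_account(grid_dn)  # only then resolve a local user
    # Just-in-time ACL: by default the file carries no user ACLs at all.
    subprocess.check_call(["setfacl", "-m", "u:{0}:{1}".format(user, perm), path])
    return user

def revoke_access(user, path):
    # Remove the ACL again once the file is no longer needed.
    subprocess.check_call(["setfacl", "-x", "u:{0}".format(user), path])
```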

  20. Redesigning EGRID • Completing data privacy: • The ELFI tool was developed to give access to grid files through the classic POSIX I/O software interface. • ELFI is a FUSE filesystem implementation: grid resources are seen through local mount points. • ELFI speaks the SRM protocol, addressing the current lack of SRM clients.
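
The fragment below illustrates why this matters for applications: once grid storage is visible under a FUSE mount point, plain POSIX I/O is enough. The mount point and file path are purely hypothetical examples, not the actual ELFI configuration.

```python
# Once ELFI (or any FUSE filesystem) exposes grid storage under a mount point,
# ordinary POSIX I/O works; no grid API calls, no explicit file transfers.
MOUNT = "/mnt/egrid"  # hypothetical ELFI mount point

def first_lines(relative_path, n=5):
    with open(MOUNT + "/" + relative_path) as f:   # plain open()/readline()
        return [f.readline() for _ in range(n)]

if __name__ == "__main__":
    for line in first_lines("nyse/2005/trades.csv"):  # hypothetical grid file
        print(line.rstrip())
```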

  21. Redesigning EGRID • ELFI allows more: • All existing file management tools work automatically with grid files: • Text tools: cat, grep, etc. • Graphical tools: Konqueror, etc. • Helps RAD/prototyping: developers need not learn new APIs when porting applications. • Sites supporting ELFI on WNs: applications are spared the need to explicitly run grid file transfer commands.

  22. Redesigning EGRID Grid usability: • A web portal is the key solution: portals have long proved to be an effective way to let users interact with an organisation’s information system. • Old command-line tools will remain: • For backwards compatibility. • For the few users that eagerly adopted them. • New development will concentrate on the web portal.

  23. V. A web portal to access EGRID

  24. A web portal to access EGRID • Main entrance to the new EGRID infrastructure. • All tools in one place + graphical UI: • Closer to users’ way of working. • Lowers resistance to new technology. • No need to install grid SW on users’ workstations: • Interaction is through the portal as displayed in a web browser. • P-grade chosen as the portal technology: • Sufficiently sophisticated as a starting point to meet EGRID requirements. • Does not fully meet EGRID requirements: extra development needed.

  25. A web portal to access EGRID P-grade’s GUI simplifies many routine tasks and masks complexity: • No need to manually handle job identification strings. • The display keeps track of launched jobs and their status, and allows output retrieval, job cancelling, etc. • Easily choose the Broker for automatic job submission, or specific CEs. • Enough flexibility to allow direct JDL attribute specification. • Graphical browsing of grid resources + file management: no need for distinct tools.

  26. A web portal to access EGRID P-grade portal adds new functionality: • Although MPI jobs can also be run from the CLI, P-grade supplies a special API that allows a graphical report on such jobs to be displayed. • Workflow manager: • Graphically specify several jobs. • Define connections among them showing the data flow. • The portal takes care of retrieving job output and feeding it to linked jobs. • Monitoring of the workflow is done graphically, showing the data flow.

  27. A web portal to access EGRID Extra development needed: • Improved proxy management • SRM data management • SRM support in Workflow • Support for special workflow jobs: swarm jobs

  28. A web portal to access EGRID Improved proxy management: • P-grade first uploads the user’s private key to the host where the portal resides, then transfers it to the MyProxy server. • To lower security risks, EGRID needs the key to be transferred directly from the user’s workstation to the MyProxy server. • A Java WebStart application was developed by EGRID and seamlessly integrated into the P-grade credentials portlet.
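
A minimal sketch of the intended credential flow is shown below: the proxy is delegated to the MyProxy server directly from the user's workstation, so the private key never passes through the portal host. It simply drives the standard myproxy-init client; the server name and username are placeholders, and the actual EGRID solution is a Java WebStart application rather than this script.

```python
# Sketch of direct credential delegation from the user's workstation.
# Drives the standard myproxy-init client; server and username are placeholders.
import subprocess

def delegate_credential(myproxy_server, username):
    # myproxy-init reads the user's certificate and private key locally and
    # delegates a proxy credential to the MyProxy server, so the private key
    # never leaves the workstation (it is never copied to the portal host).
    subprocess.check_call(["myproxy-init", "-s", myproxy_server, "-l", username])

if __name__ == "__main__":
    delegate_credential("myproxy.example.org", "egrid_user")  # hypothetical values
```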

  29. A web portal to access EGRID SRM data management: • P-grade allows browsing of files in the classic SE + files local to the user’s workstation. • P-grade does not support SRM + does not support browsing of files on the portal hosting machine. • ELFI allows access to StoRM through a local mount point. • It is easier to write a portlet that browses portal-local resources than one that deals with the new SRM protocol. • EGRID developed a new portlet to allow such browsing.
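
The snippet below sketches why a portal-local browsing portlet suffices (it is not the EGRID portlet code): with ELFI mounted on the portal host, listing grid files reduces to listing a local directory; the mount point is a hypothetical example.

```python
# Sketch: with ELFI mounted on the portal host, a browsing portlet only needs
# local directory listings; no SRM client code. Mount point is hypothetical.
import os

ELFI_MOUNT = "/mnt/egrid"  # hypothetical ELFI mount point on the portal host

def list_grid_directory(relative_path=""):
    return sorted(os.listdir(os.path.join(ELFI_MOUNT, relative_path)))
```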

  30. A web portal to access EGRID SRM support in Workflow: • Workflow definition requires input and output files to be defined for each job. • For each file the respective location must be specified. • P-grade supports: classic SEs + the user’s workstation. • SRM is not supported. • New file location to be supported in P-grade: the host containing the portal itself… StoRM will be accessed through the ELFI local mount point! • Ongoing collaboration with the P-grade developers to better define the requirement and study feasibility.

  31. A web portal to access EGRID Swarm Workflow jobs: • Swarm jobs: an application is run repeatedly on different datasets + a final job collects the results and carries out the final aggregate computation. • Currently P-grade workflows allow only manual job parameter specification: an automatic mechanism is needed. • This feature is already present in P-grade’s release schedule.
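
The fragment below sketches the swarm pattern itself (it is not P-grade workflow code): the same analysis runs over several datasets and a final collector aggregates the partial results; the functions and datasets are illustrative placeholders.

```python
# Sketch of the swarm-job pattern: fan out one job per dataset, then a final
# collector job aggregates the partial results. Not P-grade workflow code.

def analyse(dataset):
    # Placeholder for the per-dataset computation (one grid job per dataset).
    return sum(dataset) / float(len(dataset))

def aggregate(partials):
    # Placeholder for the final collector job's aggregate computation.
    return max(partials)

if __name__ == "__main__":
    datasets = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical in-memory datasets
    partials = [analyse(d) for d in datasets]      # fan-out phase
    print(aggregate(partials))                     # fan-in / collection phase
```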

  32. A web portal to access EGRID Possible drawback: • Java technology is used extensively on the client side too: Applets and Java WebStart are used for certain operations… users must have a Java Virtual Machine installed. • Given the ubiquitous nature of Java… this should not be a big problem.

  33. Acknowledgements • StoRM collaboration with INFN-CNAF of the grid.IT project: Dr. Mirco Mazzuccato, Dr. Antonia Ghiselli. • P-grade team headed by Prof. Peter Kacsuk of MTA Sztaki, Hungarian Academy of Sciences. • EGRID project leaders: Dr. Alvise Nobile of ICTP, Dr. Stefano Cozzini of INFM Democritos. • EGRID team: Alessio Terpin, Angelo Leto, Antonio Messina, Ezio Corso, Riccardo di Meo, Riccardo Murri.
