
Data Management on the GRID



Presentation Transcript


  1. Data Management on the GRID • Peter Z. Kunszt • CERN Database Group • EU DataGrid – Data Management • LCGP 13.3.2002

  2. Personal Information • PhD in theoretical physics (Lattice QCD) at U of Bern • ‘Builder of the SDSS Project’ – design and implementation work on the SDSS science archive SX, on both Objectivity and MS SQL Server • CERN Database Group • Activity task leader for Grid Data Management • Management of WP2 (Data Management) of the EDG Project

  3. Scope of Data Management • Data Transfer: transport protocols • Data Access: remote I/O, security / policies • Data Storage: hierarchical storage, mass storage • Replication: peer-to-peer, centralized, distributed, automatic • Metadata management: scalable, distributed, consistent • Persistency: Grid-enabled databases and data stores, independent of back-end implementation • Optimisation: data access optimisation, cost minimisation

  4. Vision of Grid Data Management • Distributed Shared Data Storage • Ubiquitous Data Access • Transparent Data Transfer and Migration • Consistency and Robustness • Optimisation

  5. Vision of Grid Data Management • Distributed Shared Data Storage • Different architectures • Heterogeneous data stores • Self-describing data and metadata

  6. Vision of Grid Data Management • Ubiquitous Data Access • Global Namespace • Transparent security control and enforcement • Access at any time from anywhere; the physical data location is irrelevant • Automatic Data Replication and Validation
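The global namespace idea on this slide can be illustrated with a toy replica catalogue that maps a logical file name to the physical copies registered for it. This is only a sketch under stated assumptions: the class `ReplicaCatalog`, its methods, and the example names and URLs are hypothetical and are not the API of the Globus or EDG replica catalogues.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy in-memory catalogue: logical file name -> set of physical file names.

    Hypothetical illustration only, not a real Grid middleware interface.
    """

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn, pfn):
        """Record that physical copy `pfn` holds logical file `lfn`."""
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn, pfn):
        """Forget one physical copy; drop the entry once no copies remain."""
        copies = self._replicas.get(lfn)
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            del self._replicas[lfn]

    def lookup(self, lfn):
        """Return all known physical copies of `lfn`, sorted for stable output."""
        return sorted(self._replicas.get(lfn, ()))


catalog = ReplicaCatalog()
# The logical name and storage URLs below are made up for the example.
catalog.register("lfn:run42/events.db", "gsiftp://site-a.example.org/data/events.db")
catalog.register("lfn:run42/events.db", "gsiftp://site-b.example.org/store/events.db")
print(catalog.lookup("lfn:run42/events.db"))
```

A client only ever names the logical file; which physical copy is read is decided after the lookup, which is what makes the data location irrelevant to the user.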

  7. Vision of Grid Data Management • Transparent Data Transfer and Migration • Protocol negotiation and multiple protocol support • Management of data formats and database versions

  8. Vision of Grid Data Management • Consistency and Robustness • Replicated data is reasonably up-to-date • Reliable data transfer • Self-detecting and self-correcting mechanisms upon data corruption
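One common way to make corruption self-detecting is to store a checksum with each file and compare every replica against it; a corrupt copy can then be overwritten from an intact one. The sketch below assumes SHA-256 checksums and holds replica contents in memory; the function names and site labels are hypothetical, and the slides do not say which mechanism the projects actually use.

```python
import hashlib


def sha256_hex(data):
    """Hex digest used as the reference checksum for a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_replicas(replicas, expected_checksum):
    """Return the sites whose copy does not match the catalogued checksum."""
    return sorted(
        site for site, data in replicas.items()
        if sha256_hex(data) != expected_checksum
    )


def repair(replicas, expected_checksum):
    """Overwrite corrupt copies from an intact one; return the repaired sites."""
    corrupt = find_corrupt_replicas(replicas, expected_checksum)
    intact = sorted(site for site in replicas if site not in corrupt)
    if not intact:
        raise RuntimeError("no intact replica left to repair from")
    source = replicas[intact[0]]
    for site in corrupt:
        replicas[site] = source
    return corrupt


original = b"event data block"
checksum = sha256_hex(original)
# Site names are made up; site-b holds a damaged copy.
replicas = {"site-a": original, "site-b": b"event dat\x00 block", "site-c": original}
print(repair(replicas, checksum))
```

In a real deployment the comparison would run against files on storage elements rather than byte strings in a dictionary, and the repair step would be a reliable file transfer.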

  9. Vision of Grid Data Management • Optimisation • Customisation or self-adaptation to specific access patterns • Distributed Querying, Data Analysis and Data Mining

  10. Grid Data Management Dependencies • Media • Hardware • Operating System • Local File System • Network • Software • Protocols • Storage System • Performance • Reliability • Availability • Usability

  11. Existing Middleware for Grid Data Management - Overview (not exhaustive) • Globus: GridFTP, Replica Catalog, Replica Manager • EU DataGrid: GDMP, Replica Catalog, Replica Manager, Spitfire • Condor: NeST • PPDG: Magda, JASMine, GDMP • Griphyn/iVDGL: Virtual Data Toolkit • Storage Resource Broker • Storage Resource Manager • ROOT Alien • Nimrod-G

  12. Globus Data Management • GridFTP • Fast, parallel file transfer • Towards self-optimising system • Work on reliable file transfer on top • Replica Catalog – jointly with EDG WP2 • Configurable • Distributed, hierarchical • Scalable • Replica Manager • Security infrastructure

  13. European DataGrid WP2 • GDMP – with PPDG • In production with CMS for Objectivity replication • Subscription-based replication • Scalable architecture • Replica Catalog with Globus • Replica Manager and Optimiser • Take Globus RM as core • Additional modules for pre- and postprocessing of data • Replica Selection in the WP2 Optimisation task • Simulator to test replica selection • Spitfire • Unified front-end to databases • Suitable for Grid and Application Metadata
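Replica selection can be pictured as choosing the copy with the lowest estimated access cost. The sketch below assumes a very simple cost model, latency plus file size divided by bandwidth; the model, the `Replica` fields, and the site figures are illustrative assumptions and not the WP2 optimiser's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str               # hypothetical site label
    bandwidth_mb_s: float   # assumed sustained throughput to the client
    latency_s: float        # assumed fixed start-up delay (e.g. tape staging)


def transfer_cost(replica, size_mb):
    """Estimated seconds to fetch `size_mb` megabytes from this replica."""
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def select_replica(replicas, size_mb):
    """Pick the replica with the smallest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas to choose from")
    return min(replicas, key=lambda r: transfer_cost(r, size_mb))


candidates = [
    Replica("near-disk", bandwidth_mb_s=10.0, latency_s=0.1),
    Replica("far-fast-link", bandwidth_mb_s=100.0, latency_s=2.0),
]
print(select_replica(candidates, size_mb=1).site)
print(select_replica(candidates, size_mb=1000).site)
```

With these made-up numbers the low-latency site wins for a small file and the high-bandwidth site wins for a large one, which is the kind of trade-off a simulator for replica selection would explore.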

  14. WP2 Replica Manager Architecture • Replica Catalogue • Metadata Catalogue • Core API • Optimisation API

  15. Condor Data Management • Condor Matchmaking • Find optimal resource • Condor Network Storage (NeST) • Generic access to storage – abstract storage interface • Virtual Protocol Layer • User Management and Reservation • Chirp • Minimum set of file access requests • Meta-management requests • Condor Bypass
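Condor matchmaking pairs jobs with resources by having both sides advertise attributes, requirements and a rank. The sketch below is loosely modelled on that idea using plain Python dictionaries and callables; it does not use the ClassAd language or any Condor API, and all attribute names and values are made up.

```python
def matches(job, machine):
    """A match needs both sides' requirements to accept the other's attributes."""
    return (job["requirements"](machine["attrs"])
            and machine["requirements"](job["attrs"]))


def best_match(job, machines):
    """Among mutually acceptable machines, return the one the job ranks highest."""
    candidates = [m for m in machines if matches(job, m)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: job["rank"](m["attrs"]))


# Hypothetical job: needs 50 GB of free disk and prefers more of it.
job = {
    "attrs": {"owner": "alice", "disk_needed_gb": 50},
    "requirements": lambda m: m["free_disk_gb"] >= 50,
    "rank": lambda m: m["free_disk_gb"],
}
# Hypothetical machines; the largest one refuses jobs from this owner.
machines = [
    {"attrs": {"name": "node1", "free_disk_gb": 40},
     "requirements": lambda j: True},
    {"attrs": {"name": "node2", "free_disk_gb": 80},
     "requirements": lambda j: True},
    {"attrs": {"name": "node3", "free_disk_gb": 200},
     "requirements": lambda j: j["owner"] != "alice"},
]
chosen = best_match(job, machines)
print(chosen["attrs"]["name"] if chosen else "no match")
```

The point of the two-sided check is that the resource owner's policy counts as much as the job's needs, so the "optimal" resource is the best-ranked one among those willing to accept the job.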

  16. PPDG / Griphyn Data Management • Globus, Condor, SRB • GDMP – with EDG • Magda • To be used in ATLAS data challenges • Metadata catalog • JASMine (JLAB Asynchronous Storage Manager) • Storage Management and Resource • Replica catalog based on MySQL, as Web Service • Replication service • File Server • Griphyn Virtual Data System

  17. SRB, SRM • SDSC Storage Resource Broker • Advanced resource techniques • Replica Catalog based on Oracle; the catalog itself is replicated using Oracle’s replication mechanism • Storage Resource Manager (LBNL) • Interfaces to any Storage System • Joint functional definition with EDG, PPDG, Griphyn

  18. Reference Technologies • P2P technology: Gnutella, Napster, Freenet, Oceanstore, CHORD, CAN, JXTA Search, Mojo Nation • Database technology: replication, distributed heterogeneous databases, query planning and optimization • Storage: Unitree, DMF, HPSS, Castor, Enstore, Eurostore, SAM • File Systems: AFS, Coda, Intermezzo, NFS, GPFS, CXFS, GFS, DFS, DAFS, SlashGrid

  19. Application to LCG Project • Bridge the gap between immediate needs of experiments for production quality grid middleware and existing prototype middleware • Evolve existing grid middleware into production quality services • LCG Project is a Deployment Grid – nevertheless we will need to do some development • Specialization of existing Grid Middleware to the LHC environment – explicitly to the tiered architecture model • Very close relations to Application Area Physics Data Management task

  20. Issues / Dangers • Commonalities – solving the same problems again and again; potential for duplication of effort • Think in Virtual Organisations • RTAGs, like Common Persistency Framework • Security – I can see what you can’t see • EDG Security Group – see Dave Kelsey’s talk • SciDAC • Building Trust relationships • Standardisation – bringing it all together and agree, agree, agree • OGSA • GGF • Consensus – too many cooks spoil the broth • Making decisions in time • Keeping agreements, sticking to standards • Avoid Micromanagement
