Data Management on the GRID
Peter Z. Kunszt, CERN Database Group
EU DataGrid – Data Management
LCGP 13.3.2002
Personal Information
• PhD in theoretical physics (Lattice QCD) at the University of Bern
• ‘Builder of the SDSS Project’: design and implementation of the SDSS science archive SX on both Objectivity and MS SQL Server
• CERN Database Group
• Activity task leader for Grid Data Management
• Management of WP2 (Data Management) of the EDG Project
Scope of Data Management
• Data Transfer: transport protocols
• Data Access: remote I/O, security / policies
• Data Storage: hierarchical storage, mass storage
• Replication: peer-to-peer, centralized, distributed, automatic
• Metadata management: scalable, distributed, consistent
• Persistency: Grid-enabled databases and data stores, independent of back-end implementation
• Optimisation: data access optimisation, cost minimisation
Vision of Grid Data Management
• Distributed Shared Data Storage
• Ubiquitous Data Access
• Transparent Data Transfer and Migration
• Consistency and Robustness
• Optimisation
Vision of Grid Data Management: Distributed Shared Data Storage
• Different architectures
• Heterogeneous data stores
• Self-describing data and metadata
Vision of Grid Data Management: Ubiquitous Data Access
• Global Namespace
• Transparent security control and enforcement
• Access anytime, from anywhere; the physical location of the data is irrelevant
• Automatic Data Replication and Validation
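The global namespace idea can be illustrated with a replica catalog: users refer to a file only by its logical file name (LFN), and the catalog resolves it to the physical file names (PFNs) of its copies at different sites. This is a minimal in-memory sketch under that assumption; the class, its methods and the `lfn:`/`gsiftp://` example names are hypothetical and are not the Globus or EDG Replica Catalog API.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Hypothetical in-memory catalog: one logical name -> many physical copies."""

    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        # Add a physical copy under the global logical name.
        self._replicas[lfn].add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        # Remove one copy; drop the LFN entirely once no copies remain.
        copies = self._replicas.get(lfn)  # .get() does not create an entry
        if copies is None:
            return
        copies.discard(pfn)
        if not copies:
            del self._replicas[lfn]

    def lookup(self, lfn: str) -> list[str]:
        # Callers only know the logical name; physical locations are resolved here.
        return sorted(self._replicas.get(lfn, ()))


rc = ReplicaCatalog()
rc.register("lfn:/cms/run42/events.db", "gsiftp://se.cern.ch/data/events.db")
rc.register("lfn:/cms/run42/events.db", "gsiftp://se.fnal.gov/store/events.db")
print(rc.lookup("lfn:/cms/run42/events.db"))
```

Keeping the logical name as the only handle exposed to users is what makes the physical location irrelevant: copies can be added or removed without changing any client code.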
Vision of Grid Data Management: Transparent Data Transfer and Migration
• Protocol negotiation and multiple protocol support
• Management of data formats and database versions
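In its simplest form, protocol negotiation means picking the first protocol in the client's preference order that the server also supports. The sketch below assumes exactly that rule; the function name and the protocol names are illustrative assumptions, not part of any specific Grid middleware.

```python
from typing import Optional


def negotiate_protocol(client_prefs: list[str], server_supported: set[str]) -> Optional[str]:
    """Return the client's most preferred protocol that the server also supports."""
    for proto in client_prefs:  # client_prefs is ordered, best first
        if proto in server_supported:
            return proto
    return None  # no protocol in common


# Hypothetical example: the client prefers GridFTP, but the server lacks it.
print(negotiate_protocol(["gsiftp", "rfio", "http"], {"http", "rfio"}))  # rfio
```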
Vision of Grid Data Management: Consistency and Robustness
• Replicated data is reasonably up-to-date
• Reliable data transfer
• Self-detecting and self-correcting mechanisms for data corruption
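Reliable transfer with self-detection of corruption can be sketched as "fetch, verify a checksum, retry on mismatch". This is a minimal sketch assuming the expected checksum is known in advance (for example, stored in a catalog); SHA-256, the function name and the retry limit are illustrative choices, and the `fetch` callable stands in for a real network transfer.

```python
import hashlib
from typing import Callable


class TransferError(Exception):
    """Raised when no attempt produced data with the expected checksum."""


def reliable_fetch(fetch: Callable[[], bytes], expected_sha256: str,
                   max_attempts: int = 3) -> bytes:
    """Call `fetch` until the returned bytes match `expected_sha256`."""
    for _ in range(max_attempts):
        data = fetch()
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Checksum mismatch: treat this copy as corrupt and try again.
    raise TransferError(f"checksum mismatch after {max_attempts} attempts")


payload = b"event data"
good = hashlib.sha256(payload).hexdigest()
responses = iter([b"event dat\x00", payload])  # first copy arrives corrupted
print(reliable_fetch(lambda: next(responses), good))  # b'event data'
```

Self-correction here is simply a retry; with replicas available, a real system could instead retry against a different copy.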
Vision of Grid Data Management: Optimisation
• Customisation or self-adaptation to specific access patterns
• Distributed querying, data analysis and data mining
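Self-adaptation to access patterns could, for example, mean creating a local replica once a site keeps reading a file remotely. The policy below (a fixed threshold on remote reads per file and site) is a hypothetical illustration, not a mechanism described in this talk.

```python
from collections import Counter


class AccessMonitor:
    """Hypothetical policy: suggest replicating a file to a site after
    `threshold` remote reads of that file from that site."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.remote_reads: Counter[tuple[str, str]] = Counter()

    def record_remote_read(self, lfn: str, site: str) -> bool:
        key = (lfn, site)
        self.remote_reads[key] += 1
        if self.remote_reads[key] >= self.threshold:
            # Reset so that one suggestion is made per threshold crossing.
            self.remote_reads[key] = 0
            return True
        return False


monitor = AccessMonitor(threshold=3)
for _ in range(3):
    suggest = monitor.record_remote_read("lfn:/cms/run42/events.db", "RAL")
print(suggest)  # True: the third remote read from RAL crosses the threshold
```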
Grid Data Management Dependencies
• Media, Hardware, Operating System, Local File System, Network, Software, Protocols, Storage System
• Performance, Reliability, Availability, Usability
Existing Middleware for Grid Data Management: Overview (not exhaustive)
• Globus: GridFTP, Replica Catalog, Replica Manager
• EU DataGrid: GDMP, Replica Catalog, Replica Manager, Spitfire
• Condor: NeST
• PPDG: Magda, JASMine, GDMP
• Griphyn/iVDGL: Virtual Data Toolkit
• Others: Storage Resource Broker, Storage Resource Manager, ROOT, Alien, Nimrod-G
Globus Data Management
• GridFTP
  • Fast, parallel file transfer
  • Moving towards a self-optimising system
  • Work on reliable file transfer layered on top
• Replica Catalog (jointly with EDG WP2)
  • Configurable
  • Distributed, hierarchical
  • Scalable
• Replica Manager
• Security infrastructure
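The idea behind parallel transfer is to split a file into byte ranges and move each range over its own stream, then write every block back at its offset. This sketch only illustrates that idea with threads and an in-memory source; it is not the GridFTP wire protocol, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size: int, streams: int) -> list[tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous (offset, length) blocks."""
    if size <= 0:
        return []
    streams = max(1, min(streams, size))
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source: bytes, streams: int) -> bytes:
    """Fetch each block concurrently, then write it back at its offset."""
    out = bytearray(len(source))

    def fetch(block: tuple[int, int]) -> tuple[int, bytes]:
        offset, length = block
        # Stand-in for a network read of one block on its own stream.
        return offset, source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=max(1, streams)) as pool:
        for offset, chunk in pool.map(fetch, byte_ranges(len(source), streams)):
            out[offset:offset + len(chunk)] = chunk
    return bytes(out)


print(byte_ranges(10, 3))  # [(0, 4), (4, 3), (7, 3)]
```

Because each block carries its own offset, blocks may arrive in any order and still reassemble into the original file.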
European DataGrid WP2
• GDMP (with PPDG)
  • In production with CMS for Objectivity replication
  • Subscription-based replication
  • Scalable architecture
• Replica Catalog (with Globus)
• Replica Manager and Optimiser
  • Takes the Globus Replica Manager as its core
  • Additional modules for pre- and post-processing of data
• Replica selection in the WP2 Optimisation task
  • Simulator to test replica selection
• Spitfire
  • Unified front-end to databases
  • Suitable for Grid and application metadata
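Subscription-based replication, as in GDMP, can be pictured as a publish/subscribe relationship between sites: a consumer subscribes to a producer, and every file the producer publishes afterwards is replicated to the consumer. The sketch below is a simplified model under that assumption; the `Site` class and its methods are hypothetical and the transfer step is only simulated.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare sites by identity, not by field values
class Site:
    name: str
    files: set[str] = field(default_factory=set)
    subscribers: list["Site"] = field(default_factory=list)

    def subscribe(self, consumer: "Site") -> None:
        # The consumer will receive every file this site publishes from now on.
        if consumer not in self.subscribers:
            self.subscribers.append(consumer)

    def publish(self, lfn: str) -> None:
        # A new local file is recorded and all subscribers are notified.
        self.files.add(lfn)
        for consumer in self.subscribers:
            consumer.on_new_file(self, lfn)

    def on_new_file(self, producer: "Site", lfn: str) -> None:
        # Stand-in for the actual transfer from `producer` (e.g. over GridFTP).
        self.files.add(lfn)


cern, fnal = Site("CERN"), Site("FNAL")
cern.subscribe(fnal)
cern.publish("lfn:/cms/run42/events.db")
print(sorted(fnal.files))  # ['lfn:/cms/run42/events.db']
```

The design scales because the producer only keeps a subscriber list; consumers decide for themselves which producers they follow.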
WP2 Replica Manager Architecture
• Replica Catalogue
• Metadata Catalogue
• Core API
• Optimisation API
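Replica selection as cost minimisation can be sketched by estimating the transfer cost of every known replica and choosing the cheapest. The cost model below (setup latency plus file size divided by bandwidth) and all names in it are assumptions for illustration; the actual WP2 Optimisation API is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    pfn: str
    latency_s: float       # assumed setup cost, in seconds
    bandwidth_mb_s: float  # assumed achievable throughput, in MB/s


def transfer_cost(replica: Replica, size_mb: float) -> float:
    # Hypothetical cost model: setup latency plus size / bandwidth.
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def best_replica(replicas: list[Replica], size_mb: float) -> Replica:
    if not replicas:
        raise ValueError("no replicas registered for this file")
    return min(replicas, key=lambda r: transfer_cost(r, size_mb))


near = Replica("gsiftp://near.example.org/f", latency_s=0.01, bandwidth_mb_s=10.0)
far = Replica("gsiftp://far.example.org/f", latency_s=0.2, bandwidth_mb_s=100.0)
print(best_replica([near, far], size_mb=1).pfn)     # small file: near wins
print(best_replica([near, far], size_mb=1000).pfn)  # large file: far wins
```

Even this toy model shows why selection must be file-specific: latency dominates for small files, bandwidth for large ones.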
Condor Data Management
• Condor Matchmaking
  • Finds the optimal resource
• Condor Network Storage (NeST)
  • Generic access to storage: abstract storage interface
  • Virtual Protocol Layer
  • User management and reservation
• Chirp
  • Minimum set of file access requests
  • Meta-management requests
• Condor Bypass
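Condor matchmaking is symmetric: a job and a resource each state requirements on the other, and among the mutually acceptable resources the job's rank picks the best. The sketch models this with plain Python dicts and lambdas; it is a simplified stand-in for the ClassAd language, and the attribute names (`owner`, `disk_gb`) are hypothetical.

```python
from typing import Any, Optional

Ad = dict[str, Any]  # hypothetical: an "ad" is a plain dict of attributes


def matches(job: Ad, resource: Ad) -> bool:
    # Symmetric match: each side's requirements must accept the other's ad.
    return job["requirements"](resource) and resource["requirements"](job)


def matchmake(job: Ad, resources: list[Ad]) -> Optional[Ad]:
    candidates = [r for r in resources if matches(job, r)]
    if not candidates:
        return None
    # Among acceptable resources, choose the one the job ranks highest.
    return max(candidates, key=job["rank"])


job = {"owner": "atlas",
       "requirements": lambda r: r["disk_gb"] >= 50,
       "rank": lambda r: r["disk_gb"]}
resources = [
    {"name": "A", "disk_gb": 20, "requirements": lambda j: True},
    {"name": "B", "disk_gb": 100, "requirements": lambda j: j["owner"] == "cms"},
    {"name": "C", "disk_gb": 80, "requirements": lambda j: True},
]
print(matchmake(job, resources)["name"])  # C
```

Resource B has the most disk but only accepts CMS jobs, so the symmetric check rules it out before ranking.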
PPDG / Griphyn Data Management
• Globus, Condor, SRB
• GDMP (with EDG)
• Magda
  • To be used in the ATLAS data challenges
  • Metadata catalog
• JASMine (JLAB Asynchronous Storage Manager)
  • Storage and resource management
  • Replica catalog based on MySQL, exposed as a Web Service
  • Replication service
  • File server
• Griphyn Virtual Data System
SRB, SRM
• SDSC Storage Resource Broker
  • Advanced resource techniques
  • Replica catalog based on Oracle; the catalog itself is replicated using Oracle’s replication mechanism
• Storage Resource Manager (LBNL)
  • Interfaces to any storage system
  • Joint functional definition with EDG, PPDG, Griphyn
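An interface "to any storage system" can be sketched as an abstract class that hides whether a file sits on disk or must first be recalled from tape: the client asks for a file and gets back a transfer URL. All class and method names below are hypothetical and simplified; they are not the SRM specification, and tape staging is only simulated.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical uniform interface; clients never see the storage type."""

    @abstractmethod
    def prepare_to_get(self, path: str) -> str:
        """Make `path` readable and return a transfer URL for it."""


class DiskStorage(StorageBackend):
    def __init__(self, host: str) -> None:
        self.host = host

    def prepare_to_get(self, path: str) -> str:
        # Files on disk are already online.
        return f"gsiftp://{self.host}{path}"


class TapeStorage(StorageBackend):
    def __init__(self, host: str) -> None:
        self.host = host
        self.disk_cache: set[str] = set()

    def prepare_to_get(self, path: str) -> str:
        if path not in self.disk_cache:
            # Stand-in for recalling the file from tape into the disk cache.
            self.disk_cache.add(path)
        return f"gsiftp://{self.host}{path}"


def fetch_url(storage: StorageBackend, path: str) -> str:
    # Client code is identical for every backend.
    return storage.prepare_to_get(path)


for store in (DiskStorage("se.example.org"), TapeStorage("mss.example.org")):
    print(fetch_url(store, "/data/events.db"))
```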
Reference Technologies
• P2P technology: Gnutella, Napster, Freenet, Oceanstore, CHORD, CAN, JXTA Search, Mojo Nation
• Database technology: replication, distributed heterogeneous databases, query planning and optimization
• Storage: Unitree, DMF, HPSS, Castor, Enstore, Eurostore, SAM
• File systems: AFS, Coda, Intermezzo, NFS, GPFS, CXFS, GFS, DFS, DAFS, SlashGrid
Application to LCG Project
• Bridge the gap between the experiments' immediate need for production-quality grid middleware and the existing prototype middleware
• Evolve existing grid middleware into production-quality services
• The LCG Project is a deployment Grid; nevertheless, some development will be needed
• Specialise existing Grid middleware to the LHC environment, specifically to the tiered architecture model
• Very close relations with the Application Area's Physics Data Management task
Issues / Dangers
• Commonalities: solving the same problems again and again; potential for duplication of effort
  • Think in Virtual Organisations
  • RTAGs, like the Common Persistency Framework
• Security: I can see what you can't see
  • EDG Security Group (see Dave Kelsey's talk)
  • SciDAC
  • Building trust relationships
• Standardisation: bringing it all together and agree, agree, agree
  • OGSA
  • GGF
• Consensus: too many cooks spoil the broth
  • Making decisions in time
  • Keeping agreements, sticking to standards
  • Avoiding micromanagement