Federating Grid and Cloud Storage in EUDAT. International Symposium on Grids and Clouds 2014, 23-28 March 2014. Shaun de Witt, STFC; Maciej Brzeźniak, PSNC; Martin Hellmich, CERN. Agenda. Introduction … … … Test results Future work. Introduction.
Federating Grid and Cloud Storage in EUDAT
International Symposium on Grids and Clouds 2014, 23-28 March 2014
Shaun de Witt, STFC
Maciej Brzeźniak, PSNC
Martin Hellmich, CERN
Agenda
• Introduction
• …
• …
• …
• Test results
• Future work
3rd EUDAT Technical meeting in Bologna 7th February 2013
Introduction
• We present and analyze the results of Grid and Cloud Storage integration
• In EUDAT we used:
  • iRODS as the Grid Storage federation mechanism
  • OpenStack Swift as a scalable object storage solution
• Scope:
  • Proof of concept
  • Pilot OpenStack Swift installation at PSNC
  • Production iRODS servers at PSNC (Poznan) and EPCC (Edinburgh)
EUDAT project introduction
• Partners: data centers & communities
• Pan-European data storage & management infrastructure
• Long-term data preservation:
  • Storage safety, availability: replication, integrity control
  • Data accessibility: visibility, possibility to refer to data over years
EUDAT challenges:
[Figure: various storage systems federated under iRODS]
• Federate heterogeneous data management systems:
  • dCache, AFS, DMF, GPFS, SAM-FS
  • File systems, HSMs, file servers
  • Object Storage systems (!)
  while ensuring:
  • Performance, scalability
  • Data safety, durability, HA, fail-over
  • Unique access, federation transparency
  • Flexibility (rule engine)
• Implement the core services:
  • Safe and long-term storage: B2SAFE
  • Efficient analysis: B2STAGE
  • Easy deposit & sharing: B2SHARE
  • Data & meta-data exploration: B2FIND
Grid - Cloud storage integration
• Need to integrate Grids and Cloud/Object storage:
  • Grids get another cost-effective, scalable backend
  • Many institutions and initiatives are testing object storage and using it in production
  • Most Cloud Storage uses the Object Storage concept
  • Object Storage solutions have limited support for federation, which is well addressed in Grids
• In EUDAT we integrated:
  • An object storage system: OpenStack Swift
  • iRODS servers and federations
Context: Object Storage Concept
• The concept enables building low-cost, scalable, efficient storage:
  • Within a data centre
  • DR / distributed configurations
• Reliability thanks to redundancy of components:
  • Many cost-efficient storage servers with disk drives (12-60 HDD/SSD)
  • Typical (cheap) network: 1/10 Gbit Ethernet
• Limitations of traditional approaches:
  • High investment and maintenance cost
  • Vendor lock-in, closed architecture, limited scalability
  • Slower adoption of new technologies than in the commodity market
Context: Object Storage importance
• Many institutions and initiatives (DCs, NRENs, companies, R&D projects) are testing object storage and using it in production, including:
  • Open source / private cloud:
    • OpenStack Swift
    • Ceph / RADOS
    • Sheepdog, Scality…
  • Commercial:
    • Amazon S3, Rackspace Cloud Files…
    • MS Azure Object Storage…
• Most promising open source: OpenStack Swift & Ceph
Object Storage: Architectures
[Figure: two architecture diagrams. Ceph: apps and clients on a host/VM reach RADOS through LibRados, RBD, RadosGW or CephFS; the cluster is made of MDS (MDS.1 … MDS.n), OSD (OSD.1 … OSD.n) and MON (MON.1 … MON.n) daemons. OpenStack Swift: user apps upload and download through a load balancer, proxy nodes and storage nodes.]
Object Storage: concepts:
• No meta-data lookups, no meta-data DB! Data placement/location is computed!
• Swift: Ring: represents the space of all possible computed hash values divided into equal parts (partitions); partitions are spread across storage nodes
• Ceph: CRUSH map: list of storage devices, failure domain hierarchy (e.g., device, host, rack, row, room) and rules for traversing the hierarchy when storing data
[Figure: OpenStack Swift Ring. Source: The Riak Project]
[Figure: Ceph's CRUSH map. Source: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/]
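The "placement is computed, not looked up" idea can be illustrated with a minimal sketch loosely modelled on the Swift ring description above. It is not Swift's actual ring code: the hash function (SHA-256), the partition count, the node names and the round-robin assignment are all assumptions chosen for brevity.

```python
import hashlib

PART_POWER = 4   # assumption: 2**4 = 16 partitions, tiny for illustration
REPLICAS = 3     # assumption: three copies of every partition
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical names


def partition_for(name: str, part_power: int = PART_POWER) -> int:
    """Map an object name to a partition using the top bits of its hash."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    top32 = int.from_bytes(digest[:4], "big")
    return top32 >> (32 - part_power)


def build_ring(nodes, part_power: int = PART_POWER, replicas: int = REPLICAS):
    """Assign every partition to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        part: [nodes[(part + r) % len(nodes)] for r in range(replicas)]
        for part in range(2 ** part_power)
    }


def locate(name: str, ring) -> list:
    """Compute where an object lives; no metadata database is consulted."""
    return ring[partition_for(name)]


ring = build_ring(NODES)
print(locate("container/object.dat", ring))
```

Any client holding the same ring computes the same node list for a given name, which is why no central metadata lookup is needed.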
Grid - Cloud storage integration
• Most cloud/object storage solutions expose:
  • S3 interface
  • Other native interfaces: OSS: Swift; Ceph: RADOS
• S3 (by Amazon) is the de facto standard in cloud storage:
  • Many petabytes, global systems
  • Vendors use it (e.g. Dropbox) or provide it
  • Large take-up
• Similar concepts:
  • CDMI: Cloud Data Management Interface, a SNIA standard with not many implementations: http://www.snia.org/cdmi
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • Rackspace Cloud Files: www.rackspace.com/cloud/files/
S3 and S3-like in commercial systems:
• S3 re-sellers:
  • Lots of services
  • Including Dropbox
• Services similar to the S3 concept:
  • Nimbus.IO: https://nimbus.io
  • MS Azure Blob Storage: http://www.windowsazure.com/en-us/manage/services/storage/
  • Rackspace Cloud Files: www.rackspace.com/cloud/files/
• S3 implementations 'in the hardware':
  • Xyratex
  • Amplidata
Why build PRIVATE S3-like storage?
• Features / benefits:
  • Reliable storage on top of commodity hardware
  • User still able to control the data
  • Easy scalability, possible to grow the system
  • Adding resources and redistributing data possible in a non-disruptive way
• Open source software solutions and standards available:
  • e.g. OpenStack Swift: OpenStack native API and S3 API
  • Other S3-enabled storage: e.g. RADOS
  • CDMI: Cloud Data Management Interface
Why federate iRODS with S3/OpenStack?
• We were asked to consider cloud storage:
  • From the EUDAT 1st year review report
• Some communities have data stored in OpenStack:
  • VPH is building a reliable storage cloud on top of OpenStack Swift within the pMedicine project (together with PSNC)
  • These data should be available to EUDAT
• Data Staging: Cloud -> EUDAT -> PRACE HPC and back
• Data Replication: Cloud -> EUDAT -> other back-end storage
• We could apply the rule engine to data in the cloud and assign PIDs
VPH case analysis:
[Figure: data flows in the VPH case. iRODS clients and S3/OSS clients (OSS API, S3 API) ingest and access data; the iRODS servers of EUDAT's iRODS federation use an S3 driver and other storage drivers to reach the storage systems; replication and data staging to an HPC system; registration with EUDAT's PID Service (PID assigned).]
Our 7.2 project
• Purpose:
  • To examine the existing iRODS-S3 driver
  • (Possibly) to improve it / provide another one
• Steps/status:
  • 1st stage:
    • Play with what is there: done for OpenStack/S3 + iRODS
    • Examine functionality
    • Evaluate scalability: found some issues already
  • Follow-up:
    • Try to improve the existing S3 driver
      • Functionality
      • Performance
    • Implement a native OpenStack driver?
    • Get in touch with iRODS developers
iRODS-OpenStack tests
[Figure labels: iRODS server(s), S3/OpenStack API, S3 API]
TEST SETUP:
• iRODS server:
  • Cloud as compound resources
  • Disk cache in front of it
• OpenStack Swift:
  • 3 proxies, 1 with S3
  • 5 storage nodes
  • Extensive functionality and performance tests
• Amazon S3:
  • Only limited functionality tests
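The "disk cache in front of the cloud" arrangement in the test setup can be pictured with a small in-memory sketch. This is only an illustration of the cache-plus-backend idea, not iRODS code: the class names, the dict-based stores and the `trim` method are all hypothetical.

```python
class ObjectStore:
    """Stand-in for an S3/Swift backend: an in-memory dict that counts reads."""

    def __init__(self):
        self._objects = {}
        self.reads = 0

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        self.reads += 1
        return self._objects[key]


class CompoundResource:
    """Hypothetical cache-in-front-of-backend resource."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}

    def put(self, key, data):
        self.cache[key] = data        # data lands in the cache first
        self.backend.put(key, data)   # then it is synced to the backend

    def get(self, key):
        if key not in self.cache:     # cache miss: stage from the backend
            self.cache[key] = self.backend.get(key)
        return self.cache[key]

    def trim(self, key):
        """Drop the cached copy; the backend copy remains."""
        self.cache.pop(key, None)


store = ObjectStore()
resource = CompoundResource(store)
resource.put("run1/data.bin", b"payload")
resource.get("run1/data.bin")   # served from the cache
resource.trim("run1/data.bin")
resource.get("run1/data.bin")   # staged from the backend once
print(store.reads)
```

Reads that hit the cache never touch the backend, which is the effect behind a cached download being faster than a download from the object store.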
iRODS-OpenStack test
TEST RESULTS:
• S3 vs native OSS overhead:
  • Upload: ~0%
  • Download: ~8%
• iRODS overhead:
  • Upload: ~19%
  • Download:
    • From compound S3: ~0%
    • Cached: SPEEDUP: 230% (cache resources faster than S3)
Conclusions and future plans:
• Conclusions:
  • Performance-wise, iRODS does not add much overhead for files <2GB
  • Problems arise for files >2GB: the iRODS-S3 driver has no support for multipart upload, which prevents iRODS from storing files >2GB in clouds
  • Some functional limits (e.g. imv problem)
  • Using iRODS to federate S3 clouds at large scale would require improving the existing driver or developing a new one
• Future plans:
  • Test the integration with VPH's cloud using the existing driver
  • Ask SAF to support the driver development
  • Get in touch with iRODS developers to ensure the sustainability of our work
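To make the multipart point concrete, here is a rough sketch of the bookkeeping a multipart upload needs: splitting one large file into numbered byte ranges that can be sent separately. It is not taken from the iRODS-S3 driver; the function name and the part size are assumptions, and the actual upload calls are left out.

```python
GIB = 1024 ** 3


def plan_parts(total_size: int, part_size: int):
    """Return (part_number, offset, length) tuples covering total_size bytes.

    Part numbers start at 1, as in S3 multipart uploads; the last part
    may be shorter than part_size.
    """
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size > 0")
    parts = []
    offset = 0
    number = 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


# A 5 GiB file split into 1 GiB parts (the part size is an arbitrary choice).
for number, offset, length in plan_parts(5 * GIB, 1 * GIB):
    print(number, offset, length)
```

A driver with multipart support would upload each range as its own part and then ask the service to assemble them, so no single request has to carry the whole file.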
Object storage on top of iRODS?
• Problems:
  • Data organisation mapping:
    • Filesystem vs objects
    • Big files vs fragments
  • Identity mapping?
    • S3 keys/accounts vs X.509?
  • Out of scope of EUDAT?
    • A lot of work needed
[Figure: iRODS clients (iRODS API) and S3/OSS clients (S3 API) ingest and access data through the iRODS servers of EUDAT's iRODS federation, which use an S3 driver and other storage drivers to reach the storage systems.]
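The "filesystem vs objects" mapping problem can be sketched as follows. This is one possible convention, not something proposed in the slides: treating the first path component as the bucket is an assumption, and all function names and paths are hypothetical. Directory listings are emulated over a flat key namespace with a prefix and a delimiter, the same idea S3 listings use.

```python
from typing import Iterable, List, Tuple


def path_to_object(logical_path: str) -> Tuple[str, str]:
    """Split '/zone/home/user/file' into (bucket, key).

    Assumption: the first path component plays the role of the bucket.
    """
    parts = [p for p in logical_path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("path needs a bucket component and a key")
    return parts[0], "/".join(parts[1:])


def object_to_path(bucket: str, key: str) -> str:
    """Inverse of path_to_object for normalised paths."""
    return "/" + bucket + "/" + key


def list_collection(keys: Iterable[str], prefix: str,
                    delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Emulate a directory listing over flat keys.

    Returns (files directly under prefix, sub-collection prefixes).
    """
    files, subcollections = set(), set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subcollections.add(prefix + head + delimiter)
        else:
            files.add(key)
    return sorted(files), sorted(subcollections)


keys = ["home/alice/a.dat", "home/alice/run1/b.dat", "home/bob/d.dat"]
print(path_to_object("/tempZone/home/alice/a.dat"))
print(list_collection(keys, "home/alice/"))
```

The sketch covers only naming; the other problems listed on the slide (fragmenting big files, mapping S3 keys/accounts to X.509 identities) are not addressed by it.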