Metadata Organization and Management for Globalization of Data Access with

Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń, Łukasz Opioła, Darin Nikolow, Łukasz Dutka, Renata Słota, Jacek Kitowski. ACC Cyfronet AGH


Presentation Transcript


  1. Metadata Organization and Management for Globalization of Data Access with Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń, Łukasz Opioła, Darin Nikolow, Łukasz Dutka, Renata Słota, Jacek Kitowski ACC Cyfronet AGH Department of Computer Science, AGH-UST PPAM 2015 Krakow, Poland, September 6-9, 2015

  2. Agenda • Motivation • Problems with Global Data Access • Is a new tool needed? • Onedata • Design Assumptions • Key Aspects of Data Access • Global data organization • Globally distributed metadata • Results • Conclusions

  3. Motivation Scientific communities require global access that integrates independently managed resources. Metadata organization and management is the key to making global access effective, simple and convenient.

  4. Problems with Global Data Access • Storage heterogeneity and delay/bandwidth issues. • Manual transfer of data before/after computations. • No account integration: • Difficult access (security issues). • Problematic data sharing.

  5. Is a new tool needed? Globus Connect, iRODS, Gluster, Google Drive, PanFS, LFC, Dropbox, BeeFS, Parrot

  6. Onedata - Design Assumptions • All organizations (providers) supporting a user have access to all data and metadata concerning that user. • No central metadata server, for the sake of performance and availability. • No replication of everything to everyone; data redundancy is managed optimally. • Data access efficiency: • Minimal overhead when the data is close to the client. • Efficient fragment access when the data is remote.

  7. Onedata - Key Aspects of Data Access • Global data organization • Hides complexity of data distribution from users • Indicates which remote data should be observed by each organization • Globally distributed metadata • No trust between providers • Caching vs. coherency

  8. Global data organization Easy management and sharing of data for users. Limits the metadata that each provider needs to know.

  9. Global metadata distribution • 3 metadata levels • Metadata used to coordinate providers’ cooperation • File metadata stored by each provider • Current usage metadata • Usage optimization • Lower level -> more frequent usage -> higher distribution • Caching and aggregation of changes • Pushing changes to caches

  10. Global metadata distribution Level 1 • Supports cooperation (user account integration) • Provides information on which lower-level metadata should be synchronized with whom (space metadata) • Stored by the Global Registry, a distributed application that works as a trusted mediator
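
The role of Level 1 metadata described on this slide can be illustrated with a minimal sketch. It assumes that space metadata boils down to a mapping from a space to the providers supporting it; the class `GlobalRegistry`, its methods and the identifiers `space-A`, `provider-1` and so on are hypothetical and are not taken from Onedata.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalRegistry:
    """Hypothetical Level 1 store: which providers support which space."""
    space_providers: dict[str, set[str]] = field(default_factory=dict)

    def support(self, space: str, provider: str) -> None:
        # Record that `provider` supports `space`.
        self.space_providers.setdefault(space, set()).add(provider)

    def sync_peers(self, space: str, provider: str) -> set[str]:
        """Providers that `provider` would exchange Level 2 metadata with."""
        supporters = self.space_providers.get(space, set())
        if provider not in supporters:
            # A provider that does not support the space learns nothing about it.
            return set()
        return supporters - {provider}


registry = GlobalRegistry()
registry.support("space-A", "provider-1")
registry.support("space-A", "provider-2")
registry.support("space-B", "provider-3")
```

In this sketch a provider only learns about peers for spaces it supports, which mirrors the idea of limiting the metadata each provider has to know.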

  11. Global metadata distribution Level 2 • File metadata • Description of file part locations • Stored by each provider that supports a particular space • Fast access to needed metadata • Limited number of synchronization operations • Propagation of changes on the basis of Level 1 metadata • Aggregation of changes • Automatic conflict resolution • Level 1 metadata caching
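
A file part location description could look roughly like the sketch below. It assumes a file is described as a list of byte ranges, each held by one provider; the `FilePart` type, the `providers_for_range` helper and the provider names are illustrative assumptions, not the Onedata data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilePart:
    start: int      # first byte offset, inclusive
    end: int        # end byte offset, exclusive
    provider: str   # hypothetical id of the provider holding this range


def providers_for_range(parts, start, end):
    """Return (start, end, provider) pieces of `parts` overlapping [start, end)."""
    pieces = []
    for part in sorted(parts, key=lambda p: p.start):
        lo, hi = max(part.start, start), min(part.end, end)
        if lo < hi:  # keep only non-empty overlaps
            pieces.append((lo, hi, part.provider))
    return pieces


file_parts = [
    FilePart(0, 1024, "provider-1"),
    FilePart(1024, 4096, "provider-2"),
]
```

With such a description, a read of bytes 512 to 2048 can be split into one local piece and one remote piece, so only the needed fragment has to be fetched from the remote provider.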

  12. Global metadata distribution Level 3 • Metadata about current file usage • Who should be notified about a file change • Where data is currently being modified • Stored by providers, cached by clients • First aggregation on the client side, second on the provider side • Updates Level 2 metadata
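
The two-stage aggregation mentioned on this slide can be sketched as follows. The sketch assumes that a change is a written byte range `(file_id, start, end)` and that aggregation means merging overlapping or adjacent ranges per file; all function names and the event format are assumptions made for illustration only.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, end) byte ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Extend the previous range instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def aggregate(events):
    """First stage (client): collapse write events into one entry per file."""
    per_file = {}
    for file_id, start, end in events:
        per_file.setdefault(file_id, []).append((start, end))
    return {file_id: merge_ranges(r) for file_id, r in per_file.items()}


def aggregate_at_provider(client_batches):
    """Second stage (provider): merge aggregated batches from several clients."""
    events = [
        (file_id, start, end)
        for batch in client_batches
        for file_id, ranges in batch.items()
        for start, end in ranges
    ]
    return aggregate(events)


client_a = aggregate([("f1", 0, 10), ("f1", 10, 20), ("f2", 5, 8)])
client_b = aggregate([("f1", 15, 30)])
level2_update = aggregate_at_provider([client_a, client_b])
```

Here four write events end up as two entries in the update that would be applied to Level 2 metadata, at the cost of the delay spent collecting them.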

  13. Global metadata distribution: summary More changes -> lower level -> more power • Caching & aggregation vs. time needed to gain global consistency • Set balance at provider level (dynamic client reconfiguration) • Locks for immediate consistency [Diagram labels: Global Registry (Level 1); Provider 1 and Provider 2 (each with Level 1 Cache, Level 2, Level 3); Client (Level 3 Cache)]

  14. Results: Simplicity • Easy organization of data • Global distribution hidden • Easy publishing of results

  15. Results Cooperation

  16. Results Efficiency

  17. Conclusions • Data organization hides global distribution from users while preserving providers’ independence • Ready for global user cooperation • Efficient enough for computations • Onedata status • Onedata v1 installed in the production environment of ACC Cyfronet AGH • Onedata v2 currently being tested by international organizations

  18. Thank you. Onedata homepage: http://www.onedata.org
