Metadata Organization and Management for Globalization of Data Access with Onedata
Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń, Łukasz Opioła, Darin Nikolow, Łukasz Dutka, Renata Słota, Jacek Kitowski
ACC Cyfronet AGH; Department of Computer Science, AGH - UST
PPAM 2015, Krakow, Poland, September 6-9, 2015
Agenda
• Motivation
• Problems with Global Data Access
• Is a new tool needed?
• Onedata
  • Design Assumptions
  • Key Aspects of Data Access
    • Global data organization
    • Globally distributed metadata
• Results
• Conclusions
Motivation
Scientific communities require global access that integrates independently managed resources. Metadata organization and management are key to making global access effective, simple and convenient.
Problems with Global Data Access
• Storage heterogeneity and latency/bandwidth issues.
• Manual transfer of data before/after computations.
• No integration of accounts:
  • Difficult access (security issues).
  • Problematic data sharing.
Is a new tool needed?
Existing tools: Globus Connect, iRODS, Gluster, Google Drive, PanFS, LFC, Dropbox, BeeFS, Parrot
Onedata - Design Assumptions
• All organizations (providers) supporting a user have access to all data and metadata concerning that user.
• No central metadata server, for the sake of performance and availability.
• No replication of everything to everyone; data redundancy is managed optimally.
• Data access efficiency:
  • Minimal overhead when the data is close to the client.
  • Efficient access to fragments of remote data.
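The fragment-access assumption can be illustrated with a small interval computation. This is a minimal sketch, not Onedata code: it assumes the provider knows which byte ranges of a file it already stores (the hypothetical `local` list) and computes which parts of a read request would still have to be fetched remotely, so only the missing fragments cross the network.

```python
"""Hypothetical sketch: which byte ranges of a read must be fetched remotely."""

from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte interval [start, end)


def missing_ranges(offset: int, size: int, local: List[Range]) -> List[Range]:
    """Return the parts of [offset, offset + size) not covered by `local`.

    `local` lists byte intervals already stored at the client's provider;
    it may be unsorted and overlapping.
    """
    end = offset + size
    gaps: List[Range] = []
    cursor = offset  # first byte not yet known to be covered
    for start, stop in sorted(local):
        if stop <= cursor:
            continue  # interval lies entirely before the uncovered part
        if start >= end:
            break  # this interval and all later ones lie after the request
        if start > cursor:
            gaps.append((cursor, start))  # uncovered hole before this interval
        cursor = max(cursor, stop)
        if cursor >= end:
            break
    if cursor < end:
        gaps.append((cursor, end))  # uncovered tail of the request
    return gaps


# A 1 MiB read where the provider already holds two fragments of the file.
print(missing_ranges(0, 1 << 20, [(0, 262144), (524288, 786432)]))
# [(262144, 524288), (786432, 1048576)]
```

When the data is fully local the result is empty and no remote transfer is needed, which corresponds to the "minimal overhead" case.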
Onedata - Key Aspects of Data Access
• Global data organization:
  • Hides the complexity of data distribution from users
  • Indicates which remote data should be observed by each organization
• Globally distributed metadata:
  • No trust between providers
  • Caching vs. coherency
Global data organization
• Easy management and sharing of data for users
• Limits the metadata each provider needs to know
Global metadata distribution
• 3 metadata levels:
  • Metadata used to coordinate providers’ cooperation
  • File metadata stored by each provider
  • Current usage metadata
• Usage optimization:
  • Lower level -> more frequent usage -> higher distribution
  • Caching and aggregation of changes
  • Pushing changes to caches
Global metadata distribution: Level 1
• Supports cooperation (integration of user accounts)
• Indicates which lower-level metadata should be synchronized with whom (spaces metadata)
• Stored by the Global Registry, a distributed application that acts as a trusted mediator
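One way to picture how Level 1 metadata limits what each provider must know is a map from spaces to the providers that support them. The sketch below is hypothetical (the names `SPACE_SUPPORT` and `sync_peers` and the sample spaces are illustrative, not part of Onedata): it derives, for one provider, the peers with which it would synchronize Level 2 metadata for each space it supports.

```python
from typing import Dict, Set

# Hypothetical Level 1 metadata held by the Global Registry:
# space name -> providers that support that space.
SPACE_SUPPORT: Dict[str, Set[str]] = {
    "climate-data": {"provider-1", "provider-2"},
    "genomics": {"provider-2", "provider-3"},
    "private-alice": {"provider-1"},
}


def sync_peers(provider: str, support: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each space `provider` supports, return the other providers it must
    exchange Level 2 (file) metadata with. Spaces it does not support are
    omitted, so the provider never needs their file metadata."""
    return {
        space: providers - {provider}
        for space, providers in support.items()
        if provider in providers
    }


print(sync_peers("provider-2", SPACE_SUPPORT))
# {'climate-data': {'provider-1'}, 'genomics': {'provider-3'}}
```

A space supported by a single provider yields an empty peer set: its file metadata stays with that provider and generates no synchronization traffic.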
Global metadata distribution: Level 2
• File metadata
• Description of where file parts are located
• Stored by each provider that supports a particular space
• Fast access to needed metadata
• Limited number of synchronization operations
• Propagation of changes based on Level 1 metadata
• Aggregation of changes
• Automatic conflict resolution
• Caching of Level 1 metadata
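The Level 2 mechanisms (aggregation, automatic conflict resolution, propagation driven by cached Level 1 metadata) can be sketched as follows. This is an illustration under stated assumptions, not Onedata's actual protocol: the `Change` record, its version counter, and the rule "highest version wins, ties broken by provider name" are hypothetical choices that make every provider converge on the same winner.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Change:
    """One hypothetical Level 2 metadata update."""
    space: str
    file: str
    key: str        # e.g. "size" or "mtime"
    value: object
    version: int    # per-file counter, assumed monotonic at the origin
    origin: str     # provider that produced the change


def wins(a: Change, b: Change) -> bool:
    """Assumed conflict rule: higher version wins; ties are broken by provider
    name so that every provider picks the same winner."""
    return (a.version, a.origin) > (b.version, b.origin)


def aggregate(changes: Iterable[Change]) -> List[Change]:
    """Collapse all changes to the same (space, file, key) into the winner."""
    latest: Dict[Tuple[str, str, str], Change] = {}
    for ch in changes:
        k = (ch.space, ch.file, ch.key)
        if k not in latest or wins(ch, latest[k]):
            latest[k] = ch
    return list(latest.values())


def outbox(changes: Iterable[Change], me: str,
           support: Dict[str, Set[str]]) -> Dict[str, List[Change]]:
    """Group aggregated changes per destination provider, using cached
    Level 1 metadata (`support`: space -> supporting providers)."""
    out: Dict[str, List[Change]] = {}
    for ch in aggregate(changes):
        for peer in support.get(ch.space, set()) - {me}:
            out.setdefault(peer, []).append(ch)
    return out


support = {"climate-data": {"provider-1", "provider-2"}}
batch = [
    Change("climate-data", "/run1.nc", "size", 100, 1, "provider-1"),
    Change("climate-data", "/run1.nc", "size", 250, 2, "provider-1"),
]
# Two local updates collapse into one message (size=250) for provider-2.
print(outbox(batch, "provider-1", support))
```

Aggregating before sending is what keeps the number of synchronization operations low: a burst of updates to one attribute costs a single message per peer.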
Global metadata distribution: Level 3
• Metadata about current file usage:
  • Who should be notified about a file change
  • Where data is currently being modified
• Stored by providers, cached by clients
• First aggregation on the client side, second at the provider
• Updates Level 2 metadata
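The two aggregation stages of Level 3 can be sketched with byte-range merging. This is a hypothetical model (class and method names are illustrative, not Onedata's): each client collapses its own overlapping or adjacent writes before reporting, and the provider merges reports from all clients into one record of where each file is currently being modified, which it could then use to update Level 2 metadata.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open byte interval [start, end)


def merge(ranges: List[Range]) -> List[Range]:
    """Merge overlapping or touching byte ranges into a minimal sorted list."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


class ClientUsageCache:
    """First aggregation: a client buffers the ranges it writes per file."""

    def __init__(self) -> None:
        self.dirty: Dict[str, List[Range]] = {}

    def record_write(self, file: str, offset: int, size: int) -> None:
        self.dirty.setdefault(file, []).append((offset, offset + size))

    def flush(self) -> Dict[str, List[Range]]:
        """Report merged ranges to the provider and reset the buffer."""
        report = {f: merge(r) for f, r in self.dirty.items()}
        self.dirty.clear()
        return report


class ProviderUsage:
    """Second aggregation: the provider merges reports from all clients."""

    def __init__(self) -> None:
        self.modified: Dict[str, List[Range]] = {}

    def apply(self, report: Dict[str, List[Range]]) -> None:
        for f, ranges in report.items():
            self.modified[f] = merge(self.modified.get(f, []) + ranges)


a, b = ClientUsageCache(), ClientUsageCache()
a.record_write("/data.bin", 0, 4096)
a.record_write("/data.bin", 4096, 4096)
b.record_write("/data.bin", 6000, 4000)
provider = ProviderUsage()
provider.apply(a.flush())  # client a reports [(0, 8192)]
provider.apply(b.flush())  # client b reports [(6000, 10000)]
print(provider.modified)   # {'/data.bin': [(0, 10000)]}
```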
Global metadata distribution: Summary
More changes -> lower level -> more power
[Figure: the Global Registry holds Level 1 metadata; Provider 1 and Provider 2 each hold a Level 1 cache together with Level 2 and Level 3 metadata; the client holds a Level 3 cache.]
• Caching and aggregation vs. time needed to reach global consistency
• Balance set at the provider level (dynamic reconfiguration of clients)
• Locks for immediate consistency
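The trade-off between aggregation and the time needed to reach global consistency can be expressed as a client-side flush policy that the provider may change at run time. A minimal sketch, assuming a hypothetical `FlushPolicy` with a change-count threshold and a maximum delay; a `locked` flag models the case where a lock requires every change to be sent immediately. The clock is injectable so the behaviour can be checked deterministically.

```python
import time
from typing import Callable, List, Optional


class FlushPolicy:
    """Hypothetical client-side settings pushed by the provider."""

    def __init__(self, max_changes: int, max_delay_s: float) -> None:
        self.max_changes = max_changes
        self.max_delay_s = max_delay_s


class BufferedUpdates:
    def __init__(self, policy: FlushPolicy, send: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.policy = policy
        self.send = send
        self.clock = clock
        self.pending: List[str] = []
        self.first_at: Optional[float] = None  # time of oldest pending change
        self.locked = False  # True while a lock demands immediate consistency

    def reconfigure(self, policy: FlushPolicy) -> None:
        """Provider-driven change of the caching/consistency balance."""
        self.policy = policy
        self._maybe_flush()

    def add(self, change: str) -> None:
        if not self.pending:
            self.first_at = self.clock()
        self.pending.append(change)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so delay-based flushes happen without new writes."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self.pending or self.first_at is None:
            return
        too_many = len(self.pending) >= self.policy.max_changes
        too_old = self.clock() - self.first_at >= self.policy.max_delay_s
        if self.locked or too_many or too_old:
            self.send(self.pending)
            self.pending = []  # new list, so the sent batch is not mutated
            self.first_at = None
```

Larger thresholds mean fewer, bigger messages but a longer window in which other providers see stale metadata; lowering them through `reconfigure` shortens that window at the cost of more traffic.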
Results: Simplicity
• Easy organization of data
• Global distribution hidden
• Easy publishing of results
Results: Cooperation
Results: Efficiency
Conclusions
• Data organization hides global distribution from users while keeping providers independent
• Ready for global cooperation of users
• Efficient enough for computations
• Onedata status:
  • Onedata v1 installed in the production environment of ACC Cyfronet AGH
  • Onedata v2 currently being tested by international organizations
Thank you
Onedata homepage: http://www.onedata.org