Data and Storage Services. G. Cancio, D. Duellmann, J. Iven, M. Lamanna, A. Pace, A.J. Peters, R. Toebbicke. D.Duellmann from DM & SM TEG meeting 24.Feb.2012.
Data and Storage Services G. Cancio, D. Duellmann, J. Iven, M. Lamanna, A. Pace, A.J. Peters, R.Toebbicke
Storage, Services, Data management
• Storage services:
  • Object store, key–value storage, cloud storage
  • Includes an (arbitrary) reliability layer
  • Does not hide hardware performance; known latency; perfectly scalable
[Diagram: three service layers: Data management services, Data and file access services, Storage services]
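The idea of a key–value object store with a reliability layer on top can be sketched as follows. This is a toy, in-memory illustration only: the class `ReplicatedObjectStore`, its methods, and the hash-based placement are hypothetical names and choices made for this example, not something taken from the slides or from any of the systems they mention.

```python
import hashlib


class ReplicatedObjectStore:
    """Toy key-value object store that keeps several copies of each object.

    Hypothetical illustration: each dict stands in for one storage node.
    """

    def __init__(self, num_nodes=4, replicas=2):
        if not 1 <= replicas <= num_nodes:
            raise ValueError("replicas must be between 1 and num_nodes")
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas

    def placement(self, key):
        # Deterministic placement: hash the key, then take consecutive nodes.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, data):
        # Reliability layer: write the object to every replica node.
        for node in self.placement(key):
            self.nodes[node][key] = data

    def get(self, key):
        # Read from the first replica node that still holds the object.
        for node in self.placement(key):
            if key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(key)

    def fail_node(self, node):
        # Simulate losing all data on one node.
        self.nodes[node].clear()
```

With `replicas=2`, an object stays readable after the loss of any single node that holds it; the number of copies is the knob that trades cost against reliability.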
Storage, Services, Data management
• Data and file access services:
  • File access protocols, metadata management, can be mounted as a file system
  • Security: user authentication, access control, accounting, quotas
  • Interfaces to allow storage federation, redirection services, data placement
Storage, Services, Data management
• Data management services:
  • Global namespace, hierarchical storage, automatic data placement, SRM, pinning
Why multiple services?
• Managed storage – Castor
• Low latency, hundreds of metadata ops/sec, thousands of simultaneous users, high performance – AFS
• Is this enough? Why more?
File / Storage levels of service?
• Cost
  • Cost of hardware, energy, space
  • Cost of the operations team
• Reliability / availability
  • Reliability: probability of losing data
  • Availability: probability of temporarily not being able to access the data
• Performance
  • Latency: fixed delay per request
  • Throughput: transfer speed seen by the client
  • Scalability in terms of size
  • Scalability in terms of number of users
• Consistency: different clients may see different data
• Richness of service: authentication, authorization, quota, accounting, client-side caching
• Namespace federation …
[Diagram labels: Performance, Scalability, Throughput, Cost, Reliability, Latency, Consistency, HW cost, Ops cost]
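The trade-off between reliability and hardware cost listed above can be made concrete with a small calculation. The sketch below assumes a deliberately simplified model that is not in the slides: each replica is lost independently with the same probability over the period considered, and nothing is repaired in between. The function names are hypothetical.

```python
def loss_probability(p_replica_loss, replicas):
    """Probability that every copy of an object is lost.

    Assumes independent replica failures and no repair (a simplification).
    """
    if not 0.0 <= p_replica_loss <= 1.0:
        raise ValueError("p_replica_loss must be a probability")
    if replicas < 1:
        raise ValueError("at least one replica is required")
    return p_replica_loss ** replicas


def raw_cost_per_usable_tb(cost_per_raw_tb, replicas):
    """Hardware cost per usable terabyte when every object is stored `replicas` times."""
    return cost_per_raw_tb * replicas
```

Under these assumptions, going from one to two replicas doubles the hardware cost per usable terabyte while squaring the loss probability, which is why different data classes can justify different replica counts.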
“Agile” storage infrastructure
• Storage services instantiated with custom levels of service:
  • Large pools with low latency, multiuser, for analysis
  • Highly available, highly reliable, multiuser pools, for condition data
  • Low-cost pools (low reliability) for scratch data that can be lost because it is already replicated
  • Low-cost pools, highly reliable, no multiuser, high latency, for archiving
• Services instantiated from (uniform) sets of unreliable hardware
  • Resilient to hardware failures; human intervention limited to hardware resource provisioning; minimum cost of operation
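The four pool types above can be written down as declarative service-level profiles, from which a pool is picked by stating requirements. The sketch below is only an illustration: `PoolProfile`, `POOLS`, `select_pools` and the pool names are hypothetical, and any attribute the slide does not state for a pool is left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoolProfile:
    """Service-level profile of a storage pool; None means 'not stated on the slide'."""
    name: str
    low_latency: Optional[bool] = None
    multiuser: Optional[bool] = None
    high_reliability: Optional[bool] = None
    high_availability: Optional[bool] = None
    low_cost: Optional[bool] = None


# The four example pools, with only the attributes the slide gives.
POOLS = [
    PoolProfile("analysis", low_latency=True, multiuser=True),
    PoolProfile("conditions", multiuser=True, high_reliability=True,
                high_availability=True),
    PoolProfile("scratch", high_reliability=False, low_cost=True),
    PoolProfile("archive", low_latency=False, multiuser=False,
                high_reliability=True, low_cost=True),
]


def select_pools(**required):
    """Return the names of pools whose stated attributes equal every requirement."""
    return [
        pool.name
        for pool in POOLS
        if all(getattr(pool, attr) == value for attr, value in required.items())
    ]
```

For example, `select_pools(low_cost=True, high_reliability=True)` matches only the archive profile, while `select_pools(multiuser=True)` matches the analysis and conditions profiles. A pool whose attribute is unstated never matches a requirement on that attribute.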
Expected requirements in 2014+
• SRM, hierarchical storage, pinning, automatic data placement
• On-demand storage, negotiable levels of service
• Global access to files and data with metadata management, authentication, access control, accounting, quota
• Global access to a scalable object store with no metadata management
• Interfaces to federate storage from independent clusters?
Which architecture?
• How much optimization is required?
[Diagram: two alternative arrangements of the Data management services, Data and file access services, and Storage services layers, shown side by side ("versus"). Systems named in the diagram: Openstack, Huawei, EOS, Hadoop, AFS, Castor]
Which architecture?
• If we want to combine storage and file systems … three options, each pairing Storage services with Data and file access services:
  • S3 cloud service independent from AFS / EOS
  • AFS / EOS services delivered from the S3 cloud service
  • S3 cloud service as an additional interface on top of existing AFS / EOS services
[Diagram labels: Fully optimized, Optimized for storage, Optimized for file access]
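The third option, an object interface layered on top of an existing file service, can be sketched as a thin adapter that maps bucket and key names onto paths in a file namespace. This is a hypothetical illustration using a local directory in place of a real file service; `ObjectInterfaceOverFiles` and its method names are invented for the example and are not the S3, AFS, or EOS APIs.

```python
from pathlib import Path


class ObjectInterfaceOverFiles:
    """Toy object-style interface backed by an ordinary directory tree."""

    def __init__(self, root):
        self.root = Path(root).resolve()

    def _path(self, bucket, key):
        # Map (bucket, key) to a file path and refuse keys that escape the root.
        path = (self.root / bucket / key).resolve()
        if self.root not in path.parents:
            raise ValueError("key escapes the storage root")
        return path

    def put_object(self, bucket, key, data):
        path = self._path(bucket, key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get_object(self, bucket, key):
        return self._path(bucket, key).read_bytes()

    def list_objects(self, bucket):
        # Keys are the file paths relative to the bucket directory.
        base = self.root / bucket
        return sorted(
            p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
        )
```

In this arrangement the file service keeps its own metadata, access control and layout, and the object interface inherits them; the second option on the slide inverts the roles, with the file namespace built on top of the object store.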
Facilities for storage services
• Does storage require specific facilities, or can it leverage generic ones?
• In large facilities, small systematic inefficiencies generate significant costs
• Many of the existing infrastructures provide classes of hardware to match service requirements (tests needed; storage cannot move and relies on replicas)
• Understand gains versus increased service complexity
[Diagram: Storage services, Batch services, Database services; Facilities management services (common hardware)]
Questions to answer:
• What is the optimal architecture to deliver these services in the coming years?
• What tests and research activities do we need to carry out to answer the question above?
• How do we get there?